alvis/oO - oO - AgapGit

alvis/oO

Author	SHA1	Message	Date
alvis	f8d66aa01f	chore: remove Airflow completely from the stack Drop all four Airflow containers (db, init, webserver, scheduler) from the mlops compose profile, leaving MLflow as the sole mlops service. Remove AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav links and DAG-run links in the admin UI, the two Airflow DAG files (bench_dag.py, sim_dag.py), and all related docs/ADR references. Simulations now run exclusively via the subprocess path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-03 16:38:46 +00:00
alvis	bad1bb2cba	feat(simulate): MLflow tracking, Airflow DAG integration, health checks for mlflow/airflow - sim_runs schema: add judge_mode, n_policies, airflow_dag_run_id, mlflow_run_id columns - admin health endpoint: add mlflow + airflow checks (Basic auth for Airflow API) - admin nav: add Simulations page link; rename section label - runner.py: optional MLflow experiment tracking; multi-policy support - sim_dag.py: Airflow DAG for offline sim pipeline - admin simulate page + API client methods for sim runs - shared-types tsconfig: exclude test files from build Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 12:08:36 +00:00
alvis	4a42a6aabf	feat(admin): profile freshness panel in data-quality (#81 phase B.4) Adds a per-feature freshness summary to /admin/data-quality so the admin can spot features that are systematically stale or never computed: totalEligible — distinct users with tip_views in the last 30 days missing — eligible users with no row stored for the feature stale — eligible users whose stored row is past its TTL Backend exposes summarizeProfileFreshness() in profile/builder.ts; one query per feature joins eligible users LEFT JOIN profile rows. Coverage = (eligible − missing − stale) / eligible, colored green/yellow/red via the new PctGood helper (high-is-good, opposite of the existing Pct used for missing-feature/stale-token rates). Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:34:46 +00:00
alvis	9e96540bcc	feat(admin): per-user profile view + rebuild action (#81 phase B.1) Surfaces phase A's profile features in /admin/users/:id so we can verify they're actually computing useful values before investing in bandit consumption. The detail GET now includes profile rows joined with registry metadata (name, value, age, fresh badge, ttlSec, description). Read does NOT trigger compute — staleness must be visible. A new POST .../profile/rebuild button force-recomputes and is audit-logged like reset-bandit. Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:27:08 +00:00
alvis	7d4c29e137	feat(profile): user-profile feature registry + builder (phase A) Centralizes user-level features (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that owns both definition and SQL aggregation, since the data lives in the TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps recommend latency bounded; values persist in user_profile_features (KV). ml/serving accepts profile_features on /score + /generate but does not yet consume them — extending the bandit feature vector changes D and resets every user's learned state, so that's a deliberate phase-B step. Includes ml/features/profile_schema.py as a contract mirror with a sync test that diffs name sets against registry.ts. ADR-0011 records the data-locality reasoning (registry in TS, not Python as the issue originally suggested). Phase B (deferred): event-driven incremental updates, bandit consumption with state migration, admin per-user profile page, staleness alerts. Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:22:22 +00:00
alvis	430804e9a5	feat(ml): prompt registry + per-request variant selection Replaces the hardcoded "v1" label with a real prompt registry: ml/serving/prompts.py — keyed by version: v1 (baseline), v2-mentor (calm/specific persona), v3-few-shot (v1 persona + curated examples) ml/serving/main.py — POST /generate accepts optional prompt_version, 422 on unknown, echoes the version actually used back in the response services/api/src/config.ts — TIP_PROMPT_VERSION: empty / single / comma-list (uniform random per request) services/api/src/routes/recommender.ts — pickPromptVersion() drives selection; the response's prompt_version (not a stale TS constant) is what lands in tip_scores so the #92 reward-analytics dashboard shows real per-variant reaction rates Closes #84. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 15:44:04 +00:00
alvis	aa4bdd8f09	feat(admin): LLM tip quality dashboard — per-model/prompt/kind breakdowns /admin/reward-analytics now surfaces served count, reaction rate, and avg reward grouped by llm_model, prompt_version, and tip_kind — closing the loop so model/prompt iterations in M2 are legible next to the bandit policy view. Data comes from the tip_scores columns added in `ffdf707` and tip_feedback.reward_milli; bandit-only tips show as "(bandit-only)". Closes #92. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 15:24:52 +00:00
alvis	e3ca3ba733	feat: SignalSource abstraction — generalize signal ingestion beyond Todoist (#78 ) - Add Signal + SignalSource interfaces to packages/shared-types - TipCandidate.features widened to Record<string,number\|boolean> to match Signal - TodoistSignalSource: encapsulates fetch, cache, 401 handling, bus events, and act() - SignalAggregator: parallel fan-out across sources with per-source failure isolation - Recommender refactored to consume Signal[] via aggregator; source action dispatch via aggregator.act() - ADR-0009: signal normalization strategy Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 01:11:56 +00:00
alvis	4c8ef9ad86	fix: consentGiven boolean in test fixture (was number, broke docker build) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:14:07 +00:00
alvis	ffdf70733f	feat: M2 AI tips — LiteLLM gateway, context assembler, end-to-end generation pipeline Issues closed: #86, #87, #88, #89, #90, #91, #79, #80, #82 infra: - docker-compose `ai` profile: Ollama + LiteLLM services - infra/litellm/litellm_config.yaml: tip-generator / embedder / judge aliases - .env.example: LITELLM_URL, LITELLM_MASTER_KEY, OLLAMA_URL ml/serving: - POST /generate: calls LiteLLM tip-generator alias, returns TipCandidate[] - JSON retry loop (2 retries with correction prompt on malformed response) - _parse_llm_json strips markdown fences ml/features: - context.py: build_context() assembles user signals → PromptContext (sorts overdue/high-priority tasks first for LLM prompt quality) shared-types: - TipKind, TipSource, TipCandidate types - Tip gains kind + rationale fields services/api: - recommender: 3-stage pipeline (assemble → score → serve) Stage 1: Todoist tasks + LLM candidates fetched in parallel Stage 2: egreedy bandit scores merged candidate pool Stage 3: serve + log with prompt_version, llm_model, tip_kind - tip_scores: prompt_version, llm_model, tip_kind columns + migrations - config: LITELLM_URL added - integrations: surface token_status in /integrations response tests: - ml/serving/tests/test_generate.py: 13 tests (retry, 502/503, fence variants) - ml/features/test_context.py: 9 tests (sorting, edge cases) - services/api recommender.unit.test.ts: 16 pure-function tests (inferReward, dueAgeDays) - services/api recommender.test.ts: 4 integration tests (tip_scores columns, LLM fallback) - shared-types: TipCandidate, rationale, full TipFeedback action set docs: - ADR-0008: LiteLLM AI gateway decision - overview.md: M2 pipeline description updated - ml/README.md: serving + features roles updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:09:02 +00:00
alvis	faf44c18fc	feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework - Promote egreedy-v1 to active serving policy (ADR-0007): /score/egreedy + /reward/egreedy replaces linucb-v1 endpoints after offline sim shows +10.7% mean reward (−0.548 vs −0.606) - Replace explicit helpful/not_helpful feedback with dwell-time inferred reward (inferReward): dismiss=−1.0, snooze=+0.1, done<15s=−0.3, done 15s–2min=+1.0, done 2–10min=+0.6, done>10min=+0.3 - Add ml/serving ε-greedy endpoints: /score/egreedy, /reward/egreedy, /stats/egreedy/{user_id} with d=7 feature vector (base 5 + sin/cos day-of-week encoding) - Add offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-phase score+reward, synthetic personas, task generator; results stored in sim_runs/sim_events - Add /admin/simulations page: start runs, live-poll status, reward curve SVG, action/persona tables - Fix egreedy day_of_week training skew: reward endpoint now uses actual dow instead of hardcoded 0 - Fix runner.py proxy bypass: httpx.Client(trust_env=False) for localhost ML calls - Add dwellMs to TipFeedbackEvent contract and bus.test.ts fixture - Schema: sim_runs, sim_events tables; tip_feedback gains dwell_ms, reward_milli columns - ADR-0006: admin console framework; ADR-0007: egreedy-v1 policy selection rationale Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 07:44:37 +00:00

11 Commits