chore: remove Airflow completely from the stack

Drop all four Airflow containers (db, init, webserver, scheduler) from the mlops compose profile, leaving MLflow as the sole mlops service. Remove AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav links and DAG-run links in the admin UI, the two Airflow DAG files (bench_dag.py, sim_dag.py), and all related docs/ADR references. Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:38:46 +00:00
parent ce1c8bde57
commit f8d66aa01f
27 changed files with 663 additions and 719 deletions
--- a/docs/adr/0013-multi-agent-recommendation.md
+++ b/docs/adr/0013-multi-agent-recommendation.md
@@ -0,0 +1,106 @@
+# ADR-0013 — Multi-agent recommendation: pre-computed agent snippets + orchestrator LLM
+
+**Status:** Accepted
+**Date:** 2026-05-01
+**Supersedes:** ADR-0007, ADR-0012
+
+## Context
+
+The ε-greedy bandit (ADR-0007, promoted to v2 in ADR-0012) was the first recommendation
+policy. It served adequately during early M1 testing but carries structural problems that
+become more acute as the user base grows:
+
+- **Training signal sparsity.** The median user generates fewer than 5 reward signals per
+  week. Ridge regression on a 12-dimensional feature vector needs far more signal than
+  that to converge to a meaningful θ before the user loses interest.
+- **Cold-start cost.** Every new user starts with an uninformed identity matrix. Early tips
+  are essentially random for the first weeks of use — precisely when first impressions
+  matter most.
+- **Opacity.** The bandit cannot explain why it chose a tip. An orchestrator that reasons
+  explicitly over named agent outputs ("3 overdue tasks + peak hour approaching") is
+  interpretable by design.
+- **Split between generation and selection.** The current pipeline generates candidates, then
+  scores them, so the scoring step is decoupled from the LLM's reasoning. Giving the LLM the
+  full pre-computed context directly is a simpler and more capable design.
+
+## Decision
+
+Replace the RL bandit with a **multi-agent pipeline**:
+
+### Sub-agents (async, pre-computed)
+
+Multiple domain-specialized Python agents each analyze user state from one angle and
+produce a **prompt snippet** — a short natural-language paragraph describing what they
+found. They do not produce tips. They run periodically (every 15 minutes) and store
+results in the new `agent_outputs` table with per-agent TTLs.
+
+Initial agent set:
+
+| Agent | ID | TTL |
+|---|---|---|
+| OverdueTaskAgent | `overdue-task` | 1h |
+| MomentumAgent | `momentum` | 6h |
+| TimeOfDayAgent | `time-of-day` | 15m |
+| RecentPatternsAgent | `recent-patterns` | 24h |
+| FocusAreaAgent | `focus-area` | 12h |
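
A minimal sketch of how the per-agent TTLs in the table above could be turned into an `expires_at` value. The names `AGENT_TTLS` and `expiry_for` are illustrative assumptions, not identifiers from the repository; only the agent IDs and durations come from the table.

```python
from datetime import datetime, timedelta, timezone

# TTLs copied from the table above; the dict name itself is hypothetical.
AGENT_TTLS = {
    "overdue-task": timedelta(hours=1),
    "momentum": timedelta(hours=6),
    "time-of-day": timedelta(minutes=15),
    "recent-patterns": timedelta(hours=24),
    "focus-area": timedelta(hours=12),
}


def expiry_for(agent_id: str, computed_at: datetime) -> str:
    """Return expires_at as an ISO 8601 string: computed_at plus the agent's TTL."""
    return (computed_at + AGENT_TTLS[agent_id]).isoformat()


computed = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
print(expiry_for("time-of-day", computed))  # 2026-05-01T12:15:00+00:00
```

Keeping the TTLs in one mapping means that changing a freshness window is a one-line edit, although the actual agents may well declare their TTL elsewhere.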
+
+### Orchestrator agent (real-time)
+
+When a user requests a tip, the TypeScript recommender:
+1. Fetches all non-expired `agent_outputs` rows for the user.
+2. Calls `POST /recommend` on `ml/serving` with the snippet list.
+3. `ml/serving` assembles a single orchestrator prompt (template `v4-orchestrator`)
+   that concatenates all snippets, then calls LiteLLM via the existing `tip-generator`
+   alias to produce one tip.
+
+No bandit scoring. No reward delivery to an ML model. The LLM receives full context and
+generates the tip in one call.
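
The concatenation in step 3 might look roughly like the sketch below. The header wording, the `Snippet` type, and `build_orchestrator_prompt` are assumptions made for illustration; the real `v4-orchestrator` template is not shown in this ADR, and the LiteLLM call is omitted.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    agent_id: str     # e.g. "overdue-task"
    prompt_text: str  # natural-language paragraph produced by the agent


def build_orchestrator_prompt(snippets: list[Snippet]) -> str:
    """Concatenate all agent snippets under a single instruction header."""
    # Hypothetical header text, not the actual v4-orchestrator template.
    header = "Using the observations below, write exactly one tip for the user."
    body = "\n".join(f"[{s.agent_id}] {s.prompt_text}" for s in snippets)
    return f"{header}\n\n{body}"


prompt = build_orchestrator_prompt([
    Snippet("overdue-task", "3 tasks are overdue."),
    Snippet("time-of-day", "The user's peak hour is approaching."),
])
print(prompt)
```

Tagging each snippet with its agent ID keeps the prompt traceable back to the agents that contributed to it.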
+
+### Feedback
+
+`tipFeedback` rows are still written on every user reaction. `inferReward()` still runs
+and `rewardMilli` is logged for observability and potential future supervised learning.
+Reactions are not delivered to an ML endpoint.
+
+## New data model
+
+```sql
+CREATE TABLE agent_outputs (
+  id TEXT PRIMARY KEY,
+  user_id TEXT NOT NULL REFERENCES users(id),
+  agent_id TEXT NOT NULL,          -- e.g. 'overdue-task'
+  prompt_text TEXT NOT NULL,       -- snippet produced by the agent
+  signals_snapshot TEXT,           -- JSON: inputs the agent consumed
+  computed_at TEXT NOT NULL,       -- ISO 8601
+  expires_at TEXT NOT NULL,        -- ISO 8601; computed_at + TTL
+  agent_version TEXT NOT NULL      -- bump to invalidate cached outputs on logic changes
+);
+CREATE INDEX idx_agent_outputs_user_agent_exp
+  ON agent_outputs(user_id, agent_id, expires_at DESC);
+```
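
As a rough illustration of fetching the non-expired rows for a user, the sketch below runs against an in-memory SQLite database. It assumes SQLite, omits the `users` foreign key for brevity, and assumes all timestamps share one UTC ISO 8601 format so that string comparison matches chronological order. The helper name `fresh_snippets` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns as the table above, minus the users(id) reference.
conn.execute("""
    CREATE TABLE agent_outputs (
      id TEXT PRIMARY KEY, user_id TEXT NOT NULL, agent_id TEXT NOT NULL,
      prompt_text TEXT NOT NULL, signals_snapshot TEXT,
      computed_at TEXT NOT NULL, expires_at TEXT NOT NULL,
      agent_version TEXT NOT NULL
    )""")
rows = [
    ("a1", "u1", "overdue-task", "3 tasks are overdue.", None,
     "2026-05-01T11:30:00Z", "2026-05-01T12:30:00Z", "1"),
    ("a2", "u1", "time-of-day", "Peak hour is approaching.", None,
     "2026-05-01T11:00:00Z", "2026-05-01T11:15:00Z", "1"),
]
conn.executemany("INSERT INTO agent_outputs VALUES (?,?,?,?,?,?,?,?)", rows)


def fresh_snippets(conn, user_id, now_iso):
    """Return (agent_id, prompt_text) for rows that have not yet expired."""
    cur = conn.execute(
        "SELECT agent_id, prompt_text FROM agent_outputs "
        "WHERE user_id = ? AND expires_at > ? ORDER BY agent_id",
        (user_id, now_iso),
    )
    return cur.fetchall()


print(fresh_snippets(conn, "u1", "2026-05-01T12:00:00Z"))
# [('overdue-task', '3 tasks are overdue.')]
```

The sketch does not deduplicate multiple unexpired rows for the same agent; a real query would likely keep only the most recent row per `agent_id`.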
+
+## Consequences
+
+### Positive
+- Tips are explainable: `featuresJson` in `tipScores` records which agents contributed.
+- Cold-start is eliminated: the orchestrator reasons from signals immediately, no warm-up.
+- Adding or removing an agent is a self-contained change in `ml/agents/`.
+- Swapping LLM models remains a config change (LiteLLM alias unchanged).
+
+### Negative / risks
+- **No automatic exploration.** The bandit would discover that a user prefers certain tip
+  types without being told. The orchestrator only knows what the agents tell it.
+  Mitigation: agents can evolve to encode richer signals; offline evaluation via the
+  existing bench scripts remains available.
+- **Scheduler dependency.** If the pre-compute job falls behind, agent outputs go
+  stale. Mitigation: the orchestrator falls back to a raw-signal prompt when no outputs
+  exist; `TimeOfDayAgent` recomputes every 15 min to stay fresh.
+- **Higher per-request token cost.** The orchestrator prompt is longer than the old bandit
+  prompt. Mitigation: the `tip-generator` alias points to a small local model; token cost
+  is negligible at current scale.
+
+## Migration sequence
+
+See the plan document in conversation context. 10 steps; each independently deployable and
+reversible. Cutover is Step 6 (single TypeScript PR). Bandit endpoints removed in
+Step 7 after 48h of clean traffic.