oO/docs/adr/0013-multi-agent-recommendation.md
alvis f8d66aa01f chore: remove Airflow completely from the stack
Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:38:46 +00:00


ADR-0013 — Multi-agent recommendation: pre-computed agent snippets + orchestrator LLM

Status: Accepted
Date: 2026-05-01
Supersedes: ADR-0007, ADR-0012

Context

The ε-greedy bandit (ADR-0007, promoted to v2 in ADR-0012) was the first recommendation policy. It served adequately during early M1 testing but carries structural problems that become more acute as the user base grows:

  • Training signal sparsity. The median user generates fewer than 5 reward signals per week. Ridge regression on a 12-dimensional feature vector needs far more signal than that to converge to a meaningful θ before the user loses interest.
  • Cold-start cost. Every new user starts with an uninformed identity matrix. Early tips are essentially random for the first weeks of use — precisely when first impressions matter most.
  • Opacity. The bandit cannot explain why it chose a tip. An orchestrator that reasons explicitly over named agent outputs ("3 overdue tasks + peak hour approaching") is interpretable by design.
  • Separation of generation and selection. The current pipeline has the LLM generate candidate tips and then has the bandit score them, so the selection step never sees the LLM's reasoning. Giving the LLM the full pre-computed context directly is a simpler and more capable design.
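The first two bullets can be made concrete with a little algebra. This assumes the bandit kept the usual per-user ridge statistics with regularizer λ = 1, which is what initializing every user with the identity matrix implies. The exact update rule is defined in ADR-0007 and may differ. The setup uses d = 12 features x_i and n observed rewards r_i:

```latex
\begin{aligned}
A_n &= I_{12} + \sum_{i=1}^{n} x_i x_i^{\top}, \qquad
b_n = \sum_{i=1}^{n} r_i\, x_i, \qquad
\hat{\theta}_n = A_n^{-1} b_n, \\
\hat{\theta}_n &\in \operatorname{span}\{x_1, \dots, x_n\}, \qquad
\dim\big(\operatorname{span}\{x_1, \dots, x_n\}^{\perp}\big) \ge 12 - n .
\end{aligned}
```

**Cold start (n = 0).** The estimate is θ̂_0 = 0, so every candidate scores θ̂ᵀx = 0 and the choice between candidates is arbitrary.

**Sparsity (small n).** The second line holds for two reasons:
- A_n acts as the identity on every direction orthogonal to all observed x_i.
- b_n has no component along those directions.

So after a week in which the median user supplies at most 4 rewards, θ̂ is exactly zero along at least 8 of the 12 feature directions.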

Decision

Replace the RL bandit with a multi-agent pipeline:

Sub-agents (async, pre-computed)

Multiple domain-specialized Python agents each analyze user state from one angle and produce a prompt snippet: a short natural-language paragraph describing what they found. They do not produce tips. They run periodically (every 15 minutes) and store their results in the new agent_outputs table, where each row expires according to its agent's TTL.
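Below is a minimal Python sketch of the sub-agent contract described above. Only three things come from this ADR:
- the agent ID `overdue-task`;
- the rule that an agent returns a snippet rather than a tip;
- the prompt_text / signals_snapshot split.

Everything else is hypothetical, including `AgentOutput`, `SubAgent`, `analyze`, and the `user_state` shape.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class AgentOutput:
    """What one sub-agent hands back: a prompt snippet, never a tip."""
    agent_id: str
    prompt_text: str                      # maps to agent_outputs.prompt_text
    signals_snapshot: dict[str, Any] = field(default_factory=dict)  # stored as JSON


class SubAgent(Protocol):
    """Hypothetical interface every agent in ml/agents/ would satisfy."""
    agent_id: str
    agent_version: str

    def analyze(self, user_state: dict[str, Any]) -> AgentOutput: ...


class OverdueTaskAgent:
    """Hypothetical sketch: summarizes overdue tasks as one short paragraph."""
    agent_id = "overdue-task"
    agent_version = "1"

    def analyze(self, user_state: dict[str, Any]) -> AgentOutput:
        # Assumed input shape: user_state["tasks"] is a list of dicts with
        # "title" and "overdue" keys.
        overdue = [t["title"] for t in user_state.get("tasks", []) if t.get("overdue")]
        if not overdue:
            text = "The user has no overdue tasks."
        else:
            shown = ", ".join(overdue[:3])  # keep the snippet short
            text = f"The user has {len(overdue)} overdue task(s), including: {shown}."
        return AgentOutput(
            agent_id=self.agent_id,
            prompt_text=text,
            signals_snapshot={"overdue_count": len(overdue)},
        )
```

Each agent stays a pure function of user state. That keeps it easy to test in isolation, and it matches the claim that adding or removing an agent is a self-contained change.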

Initial agent set:

Agent                ID               TTL
OverdueTaskAgent     overdue-task     1h
MomentumAgent        momentum         6h
TimeOfDayAgent       time-of-day      15m
RecentPatternsAgent  recent-patterns  24h
FocusAreaAgent       focus-area       12h
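The job runs every 15 minutes, yet some TTLs reach 24 hours. That suggests each run only recomputes an agent whose stored output is missing, expired, or from an older agent_version. That refresh policy is an assumption; this ADR does not state it. The sketch below encodes the TTL table above and that assumed policy. The names `AGENT_TTLS`, `iso`, `expires_at`, and `needs_refresh` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTLs copied from the initial agent set above.
AGENT_TTLS: dict[str, timedelta] = {
    "overdue-task": timedelta(hours=1),
    "momentum": timedelta(hours=6),
    "time-of-day": timedelta(minutes=15),
    "recent-patterns": timedelta(hours=24),
    "focus-area": timedelta(hours=12),
}


def iso(ts: datetime) -> str:
    """Render a timezone-aware timestamp as fixed-format UTC ISO 8601,
    so that plain string comparison matches chronological order."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def expires_at(agent_id: str, computed_at: datetime) -> str:
    """expires_at = computed_at + TTL, as the schema comment specifies."""
    return iso(computed_at + AGENT_TTLS[agent_id])


def needs_refresh(latest: Optional[dict], current_version: str, now: datetime) -> bool:
    """Assumed policy: recompute if there is no stored output, if the agent's
    logic version was bumped, or if the stored output has expired."""
    if latest is None:
        return True
    if latest["agent_version"] != current_version:
        return True
    return latest["expires_at"] <= iso(now)
```

For example, a TimeOfDayAgent output computed at 2026-05-01T09:00:00Z gets an expires_at of 2026-05-01T09:15:00Z. It is therefore recomputed on the very next 15-minute run.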

Orchestrator agent (real-time)

When a user requests a tip, the TypeScript recommender:

  1. Fetches all non-expired agent_outputs rows for the user.
  2. Calls POST /recommend on ml/serving with the snippet list.
  3. ml/serving assembles a single orchestrator prompt (template v4-orchestrator) that concatenates all snippets, then calls LiteLLM via the existing tip-generator alias to produce one tip.

No bandit scoring. No reward delivery to an ML model. The LLM receives full context and generates the tip in one call.
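Below is a minimal sketch of the prompt assembly in step 3, written in Python for illustration. It also covers the raw-signal fallback described under the scheduler-dependency risk.

The real v4-orchestrator template is not part of this ADR. `ORCHESTRATOR_TEMPLATE`, `Snippet`, and `build_orchestrator_prompt` are hypothetical stand-ins. The LiteLLM call through the tip-generator alias is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    agent_id: str
    prompt_text: str


# Hypothetical placeholder for the v4-orchestrator template.
ORCHESTRATOR_TEMPLATE = (
    "Using the findings below, write exactly one tip for the user.\n\n"
    "{findings}"
)


def build_orchestrator_prompt(snippets: list[Snippet], raw_signals: str) -> tuple[str, list[str]]:
    """Concatenate all snippets into one prompt and report which agents contributed.

    With no non-expired snippets, fall back to a raw-signal prompt.
    """
    if not snippets:
        return ORCHESTRATOR_TEMPLATE.format(findings=f"Raw signals:\n{raw_signals}"), []
    ordered = sorted(snippets, key=lambda s: s.agent_id)  # deterministic order across requests
    findings = "\n".join(f"[{s.agent_id}] {s.prompt_text}" for s in ordered)
    return ORCHESTRATOR_TEMPLATE.format(findings=findings), [s.agent_id for s in ordered]
```

The function returns the list of contributing agent IDs alongside the prompt. That list is the kind of data featuresJson in tipScores would record for explainability.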

Feedback

tipFeedback rows are still written on every user reaction. inferReward() still runs and rewardMilli is logged for observability and potential future supervised learning. Reactions are not delivered to an ML endpoint.

New data model

CREATE TABLE agent_outputs (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL REFERENCES users(id),
  agent_id TEXT NOT NULL,          -- e.g. 'overdue-task'
  prompt_text TEXT NOT NULL,       -- snippet produced by the agent
  signals_snapshot TEXT,           -- JSON: inputs the agent consumed
  computed_at TEXT NOT NULL,       -- ISO 8601
  expires_at TEXT NOT NULL,        -- ISO 8601 = computed_at + TTL
  agent_version TEXT NOT NULL      -- bump to invalidate cached outputs on logic changes
);
CREATE INDEX idx_agent_outputs_user_agent_exp
  ON agent_outputs(user_id, agent_id, expires_at DESC);
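The sketch below exercises the schema above against an in-memory database. It runs step 1 of the request path: fetch the non-expired rows for one user, keeping the newest row per agent. Several assumptions apply:
- **Database.** It assumes SQLite. The TEXT timestamps suggest it, but the ADR does not say.
- **Language.** The real fetch happens in the TypeScript recommender; this Python version only shows the query shape.
- **Users table.** The `users` table here is a minimal hypothetical stub so the foreign key has a target.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY);  -- hypothetical stub
CREATE TABLE agent_outputs (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL REFERENCES users(id),
  agent_id TEXT NOT NULL,
  prompt_text TEXT NOT NULL,
  signals_snapshot TEXT,
  computed_at TEXT NOT NULL,
  expires_at TEXT NOT NULL,
  agent_version TEXT NOT NULL
);
CREATE INDEX idx_agent_outputs_user_agent_exp
  ON agent_outputs(user_id, agent_id, expires_at DESC);
"""

# Non-expired outputs for one user, newest first within each agent.
FETCH_LIVE = """
SELECT agent_id, prompt_text, computed_at
FROM agent_outputs
WHERE user_id = ? AND expires_at > ?
ORDER BY agent_id, computed_at DESC
"""


def live_snippets(conn: sqlite3.Connection, user_id: str, now_iso: str) -> dict[str, str]:
    """Return the newest non-expired snippet per agent.

    Comparing ISO 8601 strings is only correct if every timestamp is written
    in one fixed UTC format, e.g. 2026-05-01T09:00:00Z.
    """
    latest: dict[str, str] = {}
    for agent_id, prompt_text, _computed_at in conn.execute(FETCH_LIVE, (user_id, now_iso)):
        latest.setdefault(agent_id, prompt_text)  # first row per agent is the newest
    return latest
```

The newest-row-per-agent rule is done in Python here to keep the SQL portable. A window function would also work.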

Consequences

Positive

  • Tips are explainable: featuresJson in tipScores records which agents contributed.
  • Cold-start is eliminated: the orchestrator reasons from current signals starting with a user's first request, with no warm-up period.
  • Adding or removing an agent is a self-contained change in ml/agents/.
  • Swapping LLM models remains a config change (LiteLLM alias unchanged).

Negative / risks

  • No automatic exploration. The bandit could discover that a user prefers certain tip types without being told; the orchestrator knows only what the agents tell it. Mitigation: agents can evolve to encode richer signals, and offline evaluation via the existing bench scripts remains available.
  • Scheduler dependency. If the pre-compute job falls behind, agent outputs go stale. Mitigation: the orchestrator falls back to a raw-signal prompt when no non-expired outputs exist, and TimeOfDayAgent recomputes every 15 min to stay fresh.
  • Higher per-request token cost. The orchestrator prompt is longer than the old bandit prompt. Mitigation: the tip-generator alias points to a small local model; token cost is negligible at current scale.

Migration sequence

See the plan document in the conversation context. The migration has 10 steps, each independently deployable and reversible. Cutover is Step 6 (a single TypeScript PR). The bandit endpoints are removed in Step 7, after 48 hours of clean traffic.