feat(api): orchestrator cutover — replace bandit with multi-agent pipeline (ADR-0013 step 6)

POST /recommend now calls ml/serving /recommend with pre-computed agent snippets + task context instead of /generate + /score/egreedy/v2. Falls back to a random signal candidate when ml/serving is unavailable.

Removes: remotePolicy, fetchLlmCandidates, sendRewardWithRetry, candidateCache, pickPromptVersion.

Feedback handler keeps inferReward + tipFeedback writes for observability; reward delivery to the bandit is gone.

tipScores.policy is now 'orchestrator'; promptVersion is 'v4-orchestrator'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:37:15 +00:00
parent 7e958a779d
commit c65bedcf68
4 changed files with 117 additions and 402 deletions
--- a/services/api/src/test/db.ts
+++ b/services/api/src/test/db.ts
@@ -131,6 +131,17 @@ export function makeTestDb(): DrizzleDb & { rawSqlite: BetterSqlite3Database } {
      finished_at TEXT
    );

+    CREATE TABLE IF NOT EXISTS agent_outputs (
+      id TEXT PRIMARY KEY,
+      user_id TEXT NOT NULL REFERENCES users(id),
+      agent_id TEXT NOT NULL,
+      prompt_text TEXT NOT NULL,
+      signals_snapshot TEXT,
+      computed_at TEXT NOT NULL,
+      expires_at TEXT NOT NULL,
+      agent_version TEXT NOT NULL
+    );
+
    CREATE TABLE IF NOT EXISTS sim_events (
      id TEXT PRIMARY KEY,
      run_id TEXT NOT NULL REFERENCES sim_runs(id),