feat(ml): prompt registry + per-request variant selection

Replaces the hardcoded "v1" label with a real prompt registry: ml/serving/prompts.py — keyed by version: v1 (baseline), v2-mentor (calm/specific persona), v3-few-shot (v1 persona + curated examples) ml/serving/main.py — POST /generate accepts optional prompt_version, 422 on unknown, echoes the version actually used back in the response services/api/src/config.ts — TIP_PROMPT_VERSION: empty / single / comma-list (uniform random per request) services/api/src/routes/recommender.ts — pickPromptVersion() drives selection; the response's prompt_version (not a stale TS constant) is what lands in tip_scores so the #92 reward-analytics dashboard shows real per-variant reaction rates Closes #84. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 15:44:04 +00:00
parent aa4bdd8f09
commit 430804e9a5
9 changed files with 294 additions and 44 deletions
--- a/ml/README.md
+++ b/ml/README.md
@@ -4,7 +4,7 @@ Python. Owns models, features, training, online scoring.

 | Dir | Role | Phase |
 |---|---|---|
-| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway, called by `recommender` | 1–2 |
+| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway + prompt registry (`prompts.py`), called by `recommender` | 1–2 |
 | `features/` | context assembler (`context.py`): signals → `PromptContext`; Feast adapter later | 2 |
 | `pipelines/` | batch feature + training DAGs (Prefect/Airflow) | 4 |
 | `registry/` | MLflow-backed model registry integration | 4 |
@@ -17,3 +17,7 @@ Python. Owns models, features, training, online scoring.
 - Online inference must be stateless and < 50ms p99.
 - Training reads from the offline feature store; serving reads from the online feature store; definitions are shared (no train/serve skew).
 - Shadow deploys before any policy change that affects real users.
+
+## Prompt registry
+
+`serving/prompts.py` keys tip-generation prompts by stable version string. Adding a new variant means adding an entry — no caller changes. Selection precedence: `POST /generate` body's `prompt_version` field → env `DEFAULT_PROMPT_VERSION` → `"v1"`. The TypeScript recommender drives selection via `TIP_PROMPT_VERSION` (single value or comma-separated rotation); the version actually used flows back in the response and is persisted to `tip_scores.prompt_version` so the admin reward-analytics dashboard can bucket reactions per variant.