Replaces the hardcoded "v1" label with a real prompt registry:
ml/serving/prompts.py — keyed by version: v1 (baseline),
v2-mentor (calm/specific persona),
v3-few-shot (v1 persona + curated examples)
ml/serving/main.py — POST /generate accepts optional prompt_version,
422 on unknown, echoes the version actually used
back in the response
services/api/src/config.ts — TIP_PROMPT_VERSION: empty / single / comma-list
(uniform random per request)
services/api/src/routes/recommender.ts
— pickPromptVersion() drives selection; the
response's prompt_version (not a stale TS
constant) is what lands in tip_scores so the
#92 reward-analytics dashboard shows real
per-variant reaction rates
Closes #84.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ml/
Python. Owns models, features, training, online scoring.
| Dir | Role | Phase |
|---|---|---|
serving/ |
FastAPI online scorer (/score, /generate) + LiteLLM gateway + prompt registry (prompts.py), called by recommender |
1–2 |
features/ |
context assembler (context.py): signals → PromptContext; Feast adapter later |
2 |
pipelines/ |
batch feature + training DAGs (Prefect/Airflow) | 4 |
registry/ |
MLflow-backed model registry integration | 4 |
experiments/ |
A/B assignment + multi-armed bandit policies | 4 |
notebooks/ |
research; never imported by production code | — |
Principles
- Every model has a model card in
registry/describing inputs, offline metrics, fairness checks, and rollout history. - Online inference must be stateless and < 50ms p99.
- Training reads from the offline feature store; serving reads from the online feature store; definitions are shared (no train/serve skew).
- Shadow deploys before any policy change that affects real users.
Prompt registry
serving/prompts.py keys tip-generation prompts by stable version string. Adding a new variant means adding an entry — no caller changes. Selection precedence: POST /generate body's prompt_version field → env DEFAULT_PROMPT_VERSION → "v1". The TypeScript recommender drives selection via TIP_PROMPT_VERSION (single value or comma-separated rotation); the version actually used flows back in the response and is persisted to tip_scores.prompt_version so the admin reward-analytics dashboard can bucket reactions per variant.