# ml/

Python. Owns models, features, training, online scoring.

| Dir | Role | Phase |
|---|---|---|
| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway + prompt registry (`prompts.py`), called by `recommender` | 1–2 |
| `features/` | context assembler (`context.py`): signals → `PromptContext`; Feast adapter later | 2 |
| `pipelines/` | batch feature + training DAGs (Prefect/Airflow) | 4 |
| `registry/` | MLflow-backed model registry integration | 4 |
| `experiments/` | A/B assignment + multi-armed bandit policies | 4 |
| `notebooks/` | research; never imported by production code | n/a |

## Principles

- Every model has a **model card** in `registry/` describing inputs, offline metrics, fairness checks, and rollout history.
- Online inference must be stateless and < 50 ms p99.
- Training reads from the offline feature store; serving reads from the online feature store; definitions are shared (no train/serve skew).
- Shadow deploys before any policy change that affects real users.

## Prompt registry

`serving/prompts.py` keys tip-generation prompts by stable version string. Adding a new variant means adding an entry; no caller changes are needed.

Selection precedence: `POST /generate` body's `prompt_version` field → env `DEFAULT_PROMPT_VERSION` → `"v1"`.

The TypeScript recommender drives selection via `TIP_PROMPT_VERSION` (single value or comma-separated rotation); the version actually used flows back in the response and is persisted to `tip_scores.prompt_version` so the admin reward-analytics dashboard can bucket reactions per variant.
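
The prompt selection precedence for `POST /generate` could be sketched roughly as follows. This is a minimal illustration, not the contents of `serving/prompts.py`: the `PROMPTS` dict, its template strings, and the helper names `resolve_prompt_version` and `get_prompt` are hypothetical, and raising `KeyError` on an unknown version is an assumed behavior.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical registry entries; the real ones live in serving/prompts.py.
PROMPTS = {
    "v1": "Write a short tip for the user given: {context}",
    "v2": "In one friendly sentence, suggest a tip based on: {context}",
}

FALLBACK_VERSION = "v1"


def resolve_prompt_version(
    requested: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a version: request body field, then env default, then "v1"."""
    env = os.environ if env is None else env
    for candidate in (requested, env.get("DEFAULT_PROMPT_VERSION")):
        if candidate:  # skip None and empty strings
            return candidate
    return FALLBACK_VERSION


def get_prompt(
    requested: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (version, template) so the caller can echo the version used.

    Assumption: an unknown version raises KeyError instead of silently
    falling back, so a misconfigured variant is noticed early.
    """
    version = resolve_prompt_version(requested, env)
    return version, PROMPTS[version]
```

Returning the resolved version alongside the template is what would let the handler report the version actually used in its response, which the recommender then persists to `tip_scores.prompt_version`.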