# ml/

Python. Owns models, features, training, online scoring.

| Dir | Role | Phase |
|---|---|---|
| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway + prompt registry (`prompts.py`) + JetStream consumers for `signals.>` / `feedback.>`, called by `recommender` | 1–2 |
| `features/` | context assembler (`context.py`): signals → `PromptContext`; profile-feature schema mirror (`profile_schema.py`); Feast adapter later | 2 |
| `pipelines/` | batch feature + training scripts | 4 |
| `registry/` | MLflow-backed model registry integration | 4 |
| `experiments/` | A/B assignment + multi-armed bandit policies | 4 |
| `notebooks/` | research; never imported by production code | — |

## Principles

- Every model has a **model card** in `registry/` describing inputs, offline metrics, fairness checks, and rollout history.
- Online inference must be stateless and < 50 ms p99.
- Training reads from the offline feature store; serving reads from the online feature store; definitions are shared (no train/serve skew).
- Shadow deploys before any policy change that affects real users.

## Feature contract

### Profile features (batched)

User-level features (completion rate, preferred hour, tip volume…) are computed
by the TypeScript recommender and shipped to `ml/serving` on every `/score` and
`/generate` call as `profile_features: dict | None`. The Python mirror in
`features/profile_schema.py` documents each feature's name, dtype, TTL, source,
and null fallback. Keep it in sync with `services/api/src/profile/registry.ts`
(a CI-style test asserts that names and `ttlSec` values match). See ADR-0011.
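
As a rough illustration of what such a mirror and its parity check could look like, here is a minimal sketch. The `ProfileFeature` dataclass, the two entries, their TTLs and fallbacks, and both helper functions are hypothetical and are not taken from `profile_schema.py`; only the documented fields (name, dtype, TTL, source, null fallback) come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ProfileFeature:
    name: str
    dtype: str
    ttl_sec: int
    source: str
    null_fallback: Any


# Hypothetical entries; real names, TTLs, and fallbacks live in profile_schema.py.
PROFILE_FEATURES: Dict[str, ProfileFeature] = {
    f.name: f
    for f in (
        ProfileFeature("completion_rate", "float", 3600, "recommender", 0.0),
        ProfileFeature("preferred_hour", "int", 86400, "recommender", None),
    )
}


def registry_mismatches(ts_registry: Dict[str, int]) -> List[str]:
    """Compare names and TTLs against a {name: ttlSec} mapping from the TypeScript side."""
    problems = []
    for name in sorted(set(PROFILE_FEATURES) | set(ts_registry)):
        if name not in PROFILE_FEATURES:
            problems.append(f"missing in Python mirror: {name}")
        elif name not in ts_registry:
            problems.append(f"missing in TypeScript registry: {name}")
        elif PROFILE_FEATURES[name].ttl_sec != ts_registry[name]:
            problems.append(f"ttl mismatch for {name}")
    return problems


def apply_fallbacks(profile_features: Optional[dict]) -> dict:
    """Fill missing or null features with their declared fallback."""
    given = profile_features or {}
    return {
        name: given[name] if given.get(name) is not None else feature.null_fallback
        for name, feature in PROFILE_FEATURES.items()
    }
```

A test in this style would fail whenever `registry_mismatches(...)` returns a non-empty list, which is one way to catch drift between the two registries before it reaches serving.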

### Context features (JIT)

Request-time signals assembled by `features/context.py` (`hour_of_day`,
`day_of_week`, task list). These are never cached: they are derived from the
system clock and the live Todoist feed at the moment of the score call.
`CONTEXT_FEATURES` in `context.py` declares freshness, source, and fallback for
each field (issue #61).
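
The request-time assembly described above might be sketched as follows. This is an assumption-laden illustration, not the contents of `context.py`: the shape of `CONTEXT_FEATURES`, the `assemble_context` function, the plain-dict return value (the real assembler produces a `PromptContext`), the `tasks` key, and the choice of UTC are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Hypothetical declaration shape; the real one lives in context.py.
CONTEXT_FEATURES: Dict[str, Dict[str, Any]] = {
    "hour_of_day": {"freshness": "request-time", "source": "system clock", "fallback": None},
    "day_of_week": {"freshness": "request-time", "source": "system clock", "fallback": None},
    "tasks": {"freshness": "request-time", "source": "Todoist feed", "fallback": []},
}


def assemble_context(
    tasks: Optional[List[str]], now: Optional[datetime] = None
) -> Dict[str, Any]:
    """Derive context features at call time; nothing is cached between calls."""
    # Assumption: UTC is used here; a real assembler may prefer the user's local time.
    now = now or datetime.now(timezone.utc)
    fallback_tasks = CONTEXT_FEATURES["tasks"]["fallback"]
    return {
        "hour_of_day": now.hour,
        "day_of_week": now.weekday(),  # Monday == 0
        # Copy the list so callers never mutate the shared fallback.
        "tasks": list(tasks) if tasks is not None else list(fallback_tasks),
    }
```

Passing `now` explicitly keeps the function testable without freezing the clock, while production callers would simply omit it.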

## Prompt registry

`serving/prompts.py` keys tip-generation prompts by stable version string. Adding a new variant means adding an entry; callers do not change. Selection precedence: `POST /generate` body's `prompt_version` field → env `DEFAULT_PROMPT_VERSION` → `"v1"`. The TypeScript recommender drives selection via `TIP_PROMPT_VERSION` (single value or comma-separated rotation); the version actually used flows back in the response and is persisted to `tip_scores.prompt_version` so the admin reward-analytics dashboard can bucket reactions per variant.
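
The selection precedence can be sketched in a few lines. The `PROMPTS` dict, its template strings, and `resolve_prompt_version` are hypothetical stand-ins for whatever `serving/prompts.py` actually defines, and raising `KeyError` on an unknown version is an assumption about error handling; only the order (request field, then `DEFAULT_PROMPT_VERSION`, then `"v1"`) comes from the description above.

```python
import os
from typing import Dict, Optional

# Hypothetical registry contents; the real templates live in serving/prompts.py.
PROMPTS: Dict[str, str] = {
    "v1": "Suggest one short productivity tip for: {tasks}",
    "v2": "It is {hour_of_day}:00. Suggest one short tip for: {tasks}",
}


def resolve_prompt_version(requested: Optional[str] = None) -> str:
    """Pick a version: request body field, then DEFAULT_PROMPT_VERSION, then "v1"."""
    version = requested or os.environ.get("DEFAULT_PROMPT_VERSION") or "v1"
    if version not in PROMPTS:
        # Assumption: unknown versions are rejected rather than silently replaced.
        raise KeyError(f"unknown prompt version: {version}")
    return version
```

Returning the resolved version string, rather than only the template, makes it straightforward to echo it in the response so the caller can persist it alongside the score.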