docs(ml): serving README + update ml/README and CLAUDE.md for #98

- ml/serving/README.md: new — contract, JetStream consumer docs, config,
  health story, extraction criteria, state file reference
- ml/README.md: note JetStream consumers in serving/ row
- CLAUDE.md: update active work to reflect #98 shipped, #99 still pending

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:21:40 +00:00
parent 4652e4b582
commit f48b5a7646
3 changed files with 96 additions and 2 deletions


@@ -4,7 +4,7 @@ Python. Owns models, features, training, online scoring.
| Dir | Role | Phase |
|---|---|---|
| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway + prompt registry (`prompts.py`), called by `recommender` | 12 |
| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway + prompt registry (`prompts.py`) + JetStream consumers for `signals.>` / `feedback.>`, called by `recommender` | 12 |
| `features/` | context assembler (`context.py`): signals → `PromptContext`; Feast adapter later | 2 |
| `pipelines/` | batch feature + training DAGs (Prefect/Airflow) | 4 |
| `registry/` | MLflow-backed model registry integration | 4 |

ml/serving/README.md Normal file

@@ -0,0 +1,94 @@
# ml/serving
FastAPI online scorer, tip generator, and JetStream consumer.
## Contract
| Endpoint | Description |
|----------|-------------|
| `POST /score` | LinUCB d=5 (baseline, shadow-eligible) |
| `POST /score/egreedy` | ε-greedy v1, d=7 (active policy — ADR-0007) |
| `POST /score/egreedy/v2` | ε-greedy v2, d=12 + profile features (shadow — ADR-0012) |
| `POST /reward` / `/reward/egreedy` / `/reward/egreedy/v2` | Online reward update per policy |
| `POST /generate` | LLM tip candidates via LiteLLM `tip-generator` alias |
| `GET /stats/{user_id}` / `/stats/egreedy/{user_id}` / `/stats/egreedy/v2/{user_id}` | Per-user policy stats |
| `GET /features/{user_id}` | Last 100 scored feature vectors (ring buffer) |
| `POST /reset/{user_id}` | Clear all per-user bandit state (admin) |
| `GET /health` | `{ ok, nats: { enabled, consumers: { signals, feedback } } }` |
Called by `services/api/src/recommender/` over HTTP. Contract is stable across policy swaps.
## Feature dimensions
| Policy | d | Extra dims vs previous |
|--------|---|------------------------|
| LinUCB v1 | 5 | hour_sin/cos, is_overdue, task_age, priority |
| ε-greedy v1 | 7 | + dow_sin/cos |
| ε-greedy v2 | 12 | + 5 profile features (ADR-0012) |
Profile features are computed by the TypeScript API and shipped on each `/score` call as `profile_features`. See `ml/README.md` and ADR-0011.
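A minimal sketch of how the d=5 and d=7 vectors in the table could be assembled. The feature order, the scaling of `task_age` and `priority`, and the helper names `cyclical` and `build_features` are assumptions made for illustration; they are not taken from the serving code.

```python
import math
from datetime import datetime


def cyclical(value: float, period: float) -> tuple[float, float]:
    """Encode a periodic value as (sin, cos) so both ends of the period sit next to each other."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def build_features(now: datetime, is_overdue: bool, task_age: float,
                   priority: float, d: int = 7) -> list[float]:
    """Hypothetical assembler: d=5 is the LinUCB layout, d=7 appends day-of-week."""
    if d not in (5, 7):
        raise ValueError("this sketch only covers d=5 and d=7")
    hour_sin, hour_cos = cyclical(now.hour + now.minute / 60.0, 24.0)
    features = [hour_sin, hour_cos, float(is_overdue), task_age, priority]
    if d == 7:
        dow_sin, dow_cos = cyclical(float(now.weekday()), 7.0)
        features += [dow_sin, dow_cos]
    return features
```

The sin/cos pair keeps 23:00 and 00:00 (or Sunday and Monday) close together in feature space, which a raw integer hour or weekday would not.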
## JetStream consumers
On startup, `nats_consumer.py` registers two durable push consumers against NATS JetStream:
| Consumer | Stream | Subjects | Durable name |
|----------|--------|----------|--------------|
| signals | `signals` | `signals.>` | `feature-pipeline-signals` |
| feedback | `feedback` | `feedback.>` | `feature-pipeline-feedback` |
**Handled subjects:**
- `signals.task.synced` — writes `{last_sync_ts, task_count}` to `{STATE_DIR}/{user}_sync.json`
- `signals.tip.feedback` — logged for observability; reward update happens via the HTTP path in the recommender
**Ack semantics:** explicit ack on success; nak for redelivery on error; dead-lettered after `NATS_MAX_DELIVER` attempts.
**Disabled** when `NATS_URL` is unset (the default in local dev without NATS). `nats-py` is not imported in that case.
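The handling and ack rules above can be sketched as a single message handler. This is an illustration, not the contents of `nats_consumer.py`: the payload field names (`user_id`, `last_sync_ts`, `task_count`) and the function names are assumptions, and the wiring to a `nats-py` subscription is left out. The handler only relies on the message exposing `subject`, `data`, and awaitable `ack()` / `nak()`, which is what `nats-py` messages provide.

```python
import json
import os
from pathlib import Path

STATE_DIR = Path(os.environ.get("STATE_DIR", "/tmp/oo-bandit-state"))


def write_sync_state(user: str, last_sync_ts: str, task_count: int,
                     state_dir: Path = STATE_DIR) -> Path:
    """Write {user}_sync.json via a temp file so readers never see a partial write."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{user}_sync.json"
    tmp = state_dir / f"{path.name}.tmp"
    tmp.write_text(json.dumps({"last_sync_ts": last_sync_ts, "task_count": task_count}))
    tmp.replace(path)
    return path


async def handle_message(msg, state_dir: Path = STATE_DIR) -> bool:
    """Ack on success, nak on any error so JetStream redelivers the message."""
    try:
        if msg.subject == "signals.task.synced":
            payload = json.loads(msg.data)  # field names below are assumed
            write_sync_state(payload["user_id"], payload["last_sync_ts"],
                             payload["task_count"], state_dir)
        # Other subjects (e.g. signals.tip.feedback) would only be logged here.
        await msg.ack()
        return True
    except Exception:
        await msg.nak()
        return False
```

Acking only after the state file is written means a crash mid-handler leads to a redelivery rather than a lost sync record.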
## Config
| Env var | Default | Description |
|---------|---------|-------------|
| `STATE_DIR` | `/tmp/oo-bandit-state` | Directory for per-user bandit state JSON files |
| `LITELLM_URL` | `http://localhost:4000` | LiteLLM gateway |
| `LITELLM_MASTER_KEY` | `sk-oo-dev` | LiteLLM auth key |
| `NATS_URL` | `` | NATS broker URL; empty = consumers disabled |
| `NATS_DURABLE_PREFIX` | `feature-pipeline` | Prefix for durable consumer names |
| `NATS_MAX_DELIVER` | `5` | Max delivery attempts per message before it is dead-lettered |
| `DEFAULT_PROMPT_VERSION` | `v1` | Fallback prompt version for `/generate` |
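One way the table above might be read into a typed object, shown as a sketch. `ServingConfig`, `load_config`, and `durable_name` are hypothetical names; joining the prefix and consumer name with `-` is inferred from the durable names listed in the JetStream section rather than confirmed by the code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServingConfig:
    state_dir: str
    litellm_url: str
    litellm_master_key: str
    nats_url: str
    nats_durable_prefix: str
    nats_max_deliver: int
    default_prompt_version: str

    @property
    def nats_enabled(self) -> bool:
        # Empty NATS_URL means the consumers stay disabled.
        return bool(self.nats_url)

    def durable_name(self, consumer: str) -> str:
        # Assumed naming rule: "<prefix>-<consumer>".
        return f"{self.nats_durable_prefix}-{consumer}"


def load_config(env: Optional[Mapping[str, str]] = None) -> ServingConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServingConfig(
        state_dir=env.get("STATE_DIR", "/tmp/oo-bandit-state"),
        litellm_url=env.get("LITELLM_URL", "http://localhost:4000"),
        litellm_master_key=env.get("LITELLM_MASTER_KEY", "sk-oo-dev"),
        nats_url=env.get("NATS_URL", ""),
        nats_durable_prefix=env.get("NATS_DURABLE_PREFIX", "feature-pipeline"),
        nats_max_deliver=int(env.get("NATS_MAX_DELIVER", "5")),
        default_prompt_version=env.get("DEFAULT_PROMPT_VERSION", "v1"),
    )
```

Passing a plain dict as `env` makes the defaults easy to check in tests without touching the real process environment.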
## Health story
`GET /health` returns `{ ok: true }` plus NATS consumer state:
```json
{
  "ok": true,
  "nats": {
    "enabled": true,
    "consumers": {
      "signals": { "last_msg_ts": "2026-04-25T10:00:00Z", "processed": 42, "errors": 0 },
      "feedback": { "last_msg_ts": null, "processed": 0, "errors": 0 }
    }
  }
}
```
`last_msg_ts` is `null` until the first message arrives. The docker-compose healthcheck polls this endpoint.
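A sketch of how per-consumer counters could be tracked and rendered into the payload shown above. `ConsumerStats` and `health_payload` are hypothetical names; only the JSON field names come from the example response.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ConsumerStats:
    last_msg_ts: Optional[str] = None  # stays None until the first message
    processed: int = 0
    errors: int = 0

    def record_success(self, ts: str) -> None:
        self.last_msg_ts = ts
        self.processed += 1

    def record_error(self, ts: str) -> None:
        self.last_msg_ts = ts
        self.errors += 1


def health_payload(enabled: bool, signals: ConsumerStats,
                   feedback: ConsumerStats) -> dict:
    """Build a dict with the same shape as the documented /health response."""
    return {
        "ok": True,
        "nats": {
            "enabled": enabled,
            "consumers": {"signals": asdict(signals), "feedback": asdict(feedback)},
        },
    }
```

Whether an error also updates `last_msg_ts` is a choice made for this sketch; the source does not specify it.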
## Extraction criteria
The service already runs as its own process. Move it to a dedicated host or GPU node when:
- p99 scoring latency exceeds 50 ms under load, **or**
- model weights are too large to share memory with the Python process on the current host.
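The latency criterion can be checked from a batch of measured request times. The sketch below uses the nearest-rank definition of p99; the function names are hypothetical, and how the samples are collected (load test, access logs) is left open.

```python
def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # ceil(0.99 * n) computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def exceeds_latency_budget(samples_ms: list[float], budget_ms: float = 50.0) -> bool:
    """True when p99 scoring latency is above the 50 ms extraction threshold."""
    return p99(samples_ms) > budget_ms
```

With 100 samples, a single slow outlier does not trip the check, but two do, since the 99th-ranked sample is then one of the slow ones.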
## State
Per-user bandit state is stored as JSON files in `STATE_DIR`:
| File pattern | Policy |
|---|---|
| `{user}.json` | LinUCB v1 |
| `{user}_egreedy.json` | ε-greedy v1 |
| `{user}_egreedy_v2.json` | ε-greedy v2 |
| `{user}_sync.json` | Last task sync metadata (written by JetStream consumer) |
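The file patterns above map to paths as in this small sketch. The `kind` keys and the `state_path` helper are hypothetical; only the filename suffixes come from the table.

```python
from pathlib import Path

# Suffix appended to the user id for each kind of state file.
_SUFFIXES = {
    "linucb": "",
    "egreedy": "_egreedy",
    "egreedy_v2": "_egreedy_v2",
    "sync": "_sync",
}


def state_path(state_dir: str, user: str, kind: str) -> Path:
    """Return the JSON state file for a user and state kind."""
    if kind not in _SUFFIXES:
        raise ValueError(f"unknown state kind: {kind}")
    return Path(state_dir) / f"{user}{_SUFFIXES[kind]}.json"
```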