diff --git a/CLAUDE.md b/CLAUDE.md
index f3ac970..081c9f3 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -100,7 +100,7 @@ Ollama and LiteLLM are **shared Agap services**, not oO services — they live i
 
 **M1 shipped. M2 (AI tips) in progress.** See `README.md` for the phase roadmap and `docs/architecture/` for diagrams. Work is tracked as Gitea milestones + issues on `alvis/oO`.
 
-Active work: AI tip generation pipeline — issues #86–#93 in M2 milestone.
+Active work: bandit promotion (#99 — offline sim + ADR-0012 pending) and M2 issues (#54 schema registry, #61 freshness SLAs, #78 signal abstraction, #93 model benchmark).
 
 ## What NOT to do
 
diff --git a/ml/README.md b/ml/README.md
index a45f15f..e34d2aa 100644
--- a/ml/README.md
+++ b/ml/README.md
@@ -4,7 +4,7 @@ Python. Owns models, features, training, online scoring.
 
 | Dir | Role | Phase |
 |---|---|---|
-| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway + prompt registry (`prompts.py`), called by `recommender` | 1–2 |
+| `serving/` | FastAPI online scorer (`/score`, `/generate`) + LiteLLM gateway + prompt registry (`prompts.py`) + JetStream consumers for `signals.>` / `feedback.>`, called by `recommender` | 1–2 |
 | `features/` | context assembler (`context.py`): signals → `PromptContext`; Feast adapter later | 2 |
 | `pipelines/` | batch feature + training DAGs (Prefect/Airflow) | 4 |
 | `registry/` | MLflow-backed model registry integration | 4 |
diff --git a/ml/serving/README.md b/ml/serving/README.md
new file mode 100644
index 0000000..729ff3d
--- /dev/null
+++ b/ml/serving/README.md
@@ -0,0 +1,94 @@
+# ml/serving
+
+FastAPI online scorer, tip generator, and JetStream consumer.
+
+## Contract
+
+| Endpoint | Description |
+|----------|-------------|
+| `POST /score` | LinUCB d=5 (baseline, shadow-eligible) |
+| `POST /score/egreedy` | ε-greedy v1, d=7 (active policy — ADR-0007) |
+| `POST /score/egreedy/v2` | ε-greedy v2, d=12 + profile features (shadow — ADR-0012) |
+| `POST /reward` / `/reward/egreedy` / `/reward/egreedy/v2` | Online reward update per policy |
+| `POST /generate` | LLM tip candidates via LiteLLM `tip-generator` alias |
+| `GET /stats/{user_id}` / `/stats/egreedy/{user_id}` / `/stats/egreedy/v2/{user_id}` | Per-user policy stats |
+| `GET /features/{user_id}` | Last 100 scored feature vectors (ring buffer) |
+| `POST /reset/{user_id}` | Clear all per-user bandit state (admin) |
+| `GET /health` | `{ ok, nats: { enabled, consumers: { signals, feedback } } }` |
+
+Called by `services/api/src/recommender/` over HTTP. Contract is stable across policy swaps.
+
+## Feature dimensions
+
+| Policy | d | Extra dims vs previous |
+|--------|---|------------------------|
+| LinUCB v1 | 5 | hour_sin/cos, is_overdue, task_age, priority |
+| ε-greedy v1 | 7 | + dow_sin/cos |
+| ε-greedy v2 | 12 | + 5 profile features (ADR-0012) |
+
+Profile features are computed by the TypeScript API and shipped on each `/score` call as `profile_features`. See `ml/README.md` and ADR-0011.
+
+## JetStream consumers
+
+On startup, `nats_consumer.py` registers two durable push consumers against NATS JetStream:
+
+| Consumer | Stream | Subjects | Durable name |
+|----------|--------|----------|--------------|
+| signals | `signals` | `signals.>` | `feature-pipeline-signals` |
+| feedback | `feedback` | `feedback.>` | `feature-pipeline-feedback` |
+
+**Handled subjects:**
+- `signals.task.synced` — writes `{last_sync_ts, task_count}` to `{STATE_DIR}/{user}_sync.json`
+- `signals.tip.feedback` — logged for observability; reward update happens via the HTTP path in the recommender
+
+**Ack semantics:** explicit ack on success; nak for redelivery on error; dead-lettered after `NATS_MAX_DELIVER` attempts.
+
+**Disabled** when `NATS_URL` is unset (default in local dev without NATS). No import of `nats-py` occurs in that case.
+
+## Config
+
+| Env var | Default | Description |
+|---------|---------|-------------|
+| `STATE_DIR` | `/tmp/oo-bandit-state` | Directory for per-user bandit state JSON files |
+| `LITELLM_URL` | `http://localhost:4000` | LiteLLM gateway |
+| `LITELLM_MASTER_KEY` | `sk-oo-dev` | LiteLLM auth key |
+| `NATS_URL` | `` | NATS broker URL; empty = consumers disabled |
+| `NATS_DURABLE_PREFIX` | `feature-pipeline` | Prefix for durable consumer names |
+| `NATS_MAX_DELIVER` | `5` | Max redelivery attempts before dropping |
+| `DEFAULT_PROMPT_VERSION` | `v1` | Fallback prompt version for `/generate` |
+
+## Health story
+
+`GET /health` returns `{ ok: true }` plus NATS consumer state:
+
+```json
+{
+  "ok": true,
+  "nats": {
+    "enabled": true,
+    "consumers": {
+      "signals": { "last_msg_ts": "2026-04-25T10:00:00Z", "processed": 42, "errors": 0 },
+      "feedback": { "last_msg_ts": null, "processed": 0, "errors": 0 }
+    }
+  }
+}
+```
+
+`last_msg_ts` is `null` until the first message arrives. Used by docker-compose healthcheck.
+
+## Extraction criteria
+
+Extract to its own process (already is one). Extract to a dedicated host / GPU node when:
+- p99 scoring latency exceeds 50 ms under load, **or**
+- model weights are too large to share memory with the Python process on the current host.
+
+## State
+
+Per-user bandit state is stored as JSON files in `STATE_DIR`:
+
+| File pattern | Policy |
+|---|---|
+| `{user}.json` | LinUCB v1 |
+| `{user}_egreedy.json` | ε-greedy v1 |
+| `{user}_egreedy_v2.json` | ε-greedy v2 |
+| `{user}_sync.json` | Last task sync metadata (written by JetStream consumer) |
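
Reviewer note on the `## State` table in the new `ml/serving/README.md`: the file patterns can be read as a simple suffix scheme. The following is a minimal Python sketch of that scheme, not code from the patch. The names `STATE_SUFFIXES`, `state_path`, `write_sync_state`, and the `kind` keys are hypothetical, and the temp-file-then-rename write is an assumption about how `{user}_sync.json` could be written safely, not a description of what `nats_consumer.py` does.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical mapping from a state kind to the file suffix listed in the
# "State" table. The keys are illustrative names, not identifiers from the patch.
STATE_SUFFIXES = {
    "linucb": "",               # {user}.json
    "egreedy": "_egreedy",      # {user}_egreedy.json
    "egreedy_v2": "_egreedy_v2",  # {user}_egreedy_v2.json
    "sync": "_sync",            # {user}_sync.json
}


def state_path(state_dir: str, user: str, kind: str) -> Path:
    """Return the JSON state file for one user and one kind of state."""
    return Path(state_dir) / f"{user}{STATE_SUFFIXES[kind]}.json"


def write_sync_state(state_dir: str, user: str, last_sync_ts: str, task_count: int) -> Path:
    """Write {last_sync_ts, task_count} via a temp file and an atomic rename.

    Assumption: atomic replacement is desirable so a concurrent reader never
    sees a half-written file. The real consumer may do this differently.
    """
    target = state_path(state_dir, user, "sync")
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump({"last_sync_ts": last_sync_ts, "task_count": task_count}, fh)
    os.replace(tmp_name, target)  # atomic when source and target share a filesystem
    return target
```

Under these assumptions, `state_path("/tmp/oo-bandit-state", "u1", "egreedy_v2")` resolves to `/tmp/oo-bandit-state/u1_egreedy_v2.json`, which matches the third row of the table. A single suffix map keeps the four patterns in one place, which may make it easier for `POST /reset/{user_id}` to enumerate every file that belongs to a user.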