docs: mark M2 complete in README

All M2 items shipped: ADR-0014 (unified profile + inference framework),
per-agent auto-inference, tip generator, TipCandidate schema, prompt
versioning, model benchmark, task clustering, UX refinements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-05-06 08:02:44 +00:00
parent c67f2b14c4
commit 488a764519

View File

@@ -194,7 +194,7 @@ oO is ML-heavy. Without a cockpit, every model change ships blind. This console
15. [x] **Token-based admin auth**`POST /api/auth/token` for Playwright/CI; `ADMIN_TOKEN` env var (#105)
16. [x] **Docs pages** — admin documentation and runbooks inline
### Phase 2 — AI tips + multi-source signals *(M2)* in progress
### Phase 2 — AI tips + multi-source signals *(M2)* ✓ shipped
Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.
**Architectural shift (mid-M2):** the bandit-ranks-LLM-candidates design from earlier in M2 was replaced with a multi-agent pipeline (ADR-0013): pre-compute agents emit prompt snippets, an orchestrator LLM produces the tip directly. ADR-0014 layers a unified Profile + agent registry + auto-inference framework on top so the system generalizes cleanly to N agents.
@@ -206,26 +206,26 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
- [x] Orchestrator cutover — recommender calls `ml/serving` with snippet list, no bandit scoring
- [x] Bandit endpoints + shadow policy machinery removed
**Unified Profile + agent registry (ADR-0014, in progress):**
- [ ] Unified Profile model: prefs, contexts, consents + manifest plumbing + orchestrator cutover (#30)
- [ ] Shared context-inference framework (#111)
- [ ] Per-agent auto-inference: `time-of-day` (#112), `focus-area` (#113), `momentum` (#114), `overdue-task` (#115), `recent-patterns` (#116)
**Unified Profile + agent registry (ADR-0014, shipped):**
- [x] Unified Profile model: prefs, contexts, consents + manifest plumbing + orchestrator cutover (#30)
- [x] Shared context-inference framework (#111)
- [x] Per-agent auto-inference: `time-of-day` (#112), `focus-area` (#113), `momentum` (#114), `overdue-task` (#115), `recent-patterns` (#116)
**AI infrastructure (unblock everything else):**
- [ ] `ai` compose profile — Ollama + LiteLLM for local dev; env vars `OLLAMA_URL` / `LITELLM_URL` (#86)
- [ ] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
- [x] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
**AI tip generation pipeline:**
- [x] Context assembler — user signals + feature store → structured prompt context (`ml/features/context.py`); skeleton implemented
- [ ] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
- [ ] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
- [x] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
- [x] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
- [ ] LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
- [ ] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
- [x] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
- [x] LLM tip quality dashboard — reaction breakdown by model / prompt_version in `/admin/reward-analytics` (#92)
**Evaluation & model selection:**
- [ ] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
- [ ] LLM prompt research — persona design, context injection strategies, few-shot examples (#84)
- [x] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
- [x] LLM prompt research — persona design, context injection strategies, few-shot examples (#84, #95)
**Pipeline architecture:**
- [x] Signal source abstraction — `SignalSource` interface for Todoist + extensible design (#78)
@@ -241,7 +241,8 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
- [x] NATS JetStream replacing in-process bus (#21) — adapter ships in `services/api/src/events/nats.ts`; in-proc bus is the producer, JetStream is the durable mirror
- [x] Todoist sync via events (#22) — background scheduler in `services/api/src/signals/scheduler.ts` emits `signals.task.synced` every `TODOIST_SYNC_INTERVAL_MS`; on-demand fetch remains as freshness fallback
- [x] Event schema registry + protobuf CI gate (#54) — buf lint/breaking checks on every PR
- [x] Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; CONTEXT_FEATURES in ml/features/context.py
- [x] Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; `invalidated_by` mirrored into `ProfileFeature`; CONTEXT_FEATURES in ml/features/context.py
- [x] Embedding-based task clustering — `nomic-embed-text` for semantic dedup + focus-area features (#97)
- [x] Observability (#18) — structured logs via pino, W3C trace IDs, Sentry hooks, trace correlation end-to-end
- [ ] CI skeleton (#3), E2E tests (#20)
@@ -251,7 +252,7 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
- [x] Reward fire-and-forget (#75) — retry logic + logging
- [x] Data retention purge (#76) — daily purge of 30-day-old tip_scores/tip_feedback
- [x] Port mismatch (#77) — fixed in docker-compose + env var config
- [ ] UX refinements (#100102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button
- [x] UX refinements (#100102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button
### Phase 3 — Native mobile *(M3)*
- [ ] iOS app (SwiftUI) with APNs push