From 488a7645190ec5cf36c015b07a5c5840d6c31457 Mon Sep 17 00:00:00 2001
From: alvis
Date: Wed, 6 May 2026 08:02:44 +0000
Subject: [PATCH] docs: mark M2 complete in README

All M2 items shipped: ADR-0014 (unified profile + inference framework),
per-agent auto-inference, tip generator, TipCandidate schema, prompt
versioning, model benchmark, task clustering, UX refinements.

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 6562b52..ba21ec2 100644
--- a/README.md
+++ b/README.md
@@ -194,7 +194,7 @@ oO is ML-heavy. Without a cockpit, every model change ships blind. This console
 15. [x] **Token-based admin auth** — `POST /api/auth/token` for Playwright/CI; `ADMIN_TOKEN` env var (#105)
 16. [x] **Docs pages** — admin documentation and runbooks inline
 
-### Phase 2 — AI tips + multi-source signals *(M2)* in progress
+### Phase 2 — AI tips + multi-source signals *(M2)* ✓ shipped
 Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.
 
 **Architectural shift (mid-M2):** the bandit-ranks-LLM-candidates design from earlier in M2 was replaced with a multi-agent pipeline (ADR-0013): pre-compute agents emit prompt snippets, an orchestrator LLM produces the tip directly. ADR-0014 layers a unified Profile + agent registry + auto-inference framework on top so the system generalizes cleanly to N agents.
@@ -206,26 +206,26 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
 - [x] Orchestrator cutover — recommender calls `ml/serving` with snippet list, no bandit scoring
 - [x] Bandit endpoints + shadow policy machinery removed
 
-**Unified Profile + agent registry (ADR-0014, in progress):**
-- [ ] Unified Profile model: prefs, contexts, consents + manifest plumbing + orchestrator cutover (#30)
-- [ ] Shared context-inference framework (#111)
-- [ ] Per-agent auto-inference: `time-of-day` (#112), `focus-area` (#113), `momentum` (#114), `overdue-task` (#115), `recent-patterns` (#116)
+**Unified Profile + agent registry (ADR-0014, shipped):**
+- [x] Unified Profile model: prefs, contexts, consents + manifest plumbing + orchestrator cutover (#30)
+- [x] Shared context-inference framework (#111)
+- [x] Per-agent auto-inference: `time-of-day` (#112), `focus-area` (#113), `momentum` (#114), `overdue-task` (#115), `recent-patterns` (#116)
 
 **AI infrastructure (unblock everything else):**
 - [ ] `ai` compose profile — Ollama + LiteLLM for local dev; env vars `OLLAMA_URL` / `LITELLM_URL` (#86)
-- [ ] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
+- [x] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
 
 **AI tip generation pipeline:**
 - [x] Context assembler — user signals + feature store → structured prompt context (`ml/features/context.py`); skeleton implemented
-- [ ] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
-- [ ] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
+- [x] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
+- [x] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
 - [ ] LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
-- [ ] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
+- [x] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
 - [x] LLM tip quality dashboard — reaction breakdown by model / prompt_version in `/admin/reward-analytics` (#92)
 
 **Evaluation & model selection:**
-- [ ] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
-- [ ] LLM prompt research — persona design, context injection strategies, few-shot examples (#84)
+- [x] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
+- [x] LLM prompt research — persona design, context injection strategies, few-shot examples (#84, #95)
 
 **Pipeline architecture:**
 - [x] Signal source abstraction — `SignalSource` interface for Todoist + extensible design (#78)
@@ -241,7 +241,8 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
 - [x] NATS JetStream replacing in-process bus (#21) — adapter ships in `services/api/src/events/nats.ts`; in-proc bus is the producer, JetStream is the durable mirror
 - [x] Todoist sync via events (#22) — background scheduler in `services/api/src/signals/scheduler.ts` emits `signals.task.synced` every `TODOIST_SYNC_INTERVAL_MS`; on-demand fetch remains as freshness fallback
 - [x] Event schema registry + protobuf CI gate (#54) — buf lint/breaking checks on every PR
-- [x] Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; CONTEXT_FEATURES in ml/features/context.py
+- [x] Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; `invalidated_by` mirrored into `ProfileFeature`; CONTEXT_FEATURES in ml/features/context.py
+- [x] Embedding-based task clustering — `nomic-embed-text` for semantic dedup + focus-area features (#97)
 - [x] Observability (#18) — structured logs via pino, W3C trace IDs, Sentry hooks, trace correlation end-to-end
 - [ ] CI skeleton (#3), E2E tests (#20)
 
@@ -251,7 +252,7 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
 - [x] Reward fire-and-forget (#75) — retry logic + logging
 - [x] Data retention purge (#76) — daily purge of 30-day-old tip_scores/tip_feedback
 - [x] Port mismatch (#77) — fixed in docker-compose + env var config
-- [ ] UX refinements (#100–102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button
+- [x] UX refinements (#100–102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button
 
 ### Phase 3 — Native mobile *(M3)*
 - [ ] iOS app (SwiftUI) with APNs push