From 488a7645190ec5cf36c015b07a5c5840d6c31457 Mon Sep 17 00:00:00 2001
From: alvis <allogn@gmail.com>
Date: Wed, 6 May 2026 08:02:44 +0000
Subject: [PATCH] docs: mark M2 complete in README

Check off the M2 items that have shipped: ADR-0014 (unified profile +
inference framework), per-agent auto-inference, AI gateway, tip
generator, TipCandidate schema, prompt versioning, model benchmark,
prompt research, task clustering, UX refinements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 README.md | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 6562b52..ba21ec2 100644
--- a/README.md
+++ b/README.md
@@ -194,7 +194,7 @@ oO is ML-heavy. Without a cockpit, every model change ships blind. This console
 15. [x] **Token-based admin auth** — `POST /api/auth/token` for Playwright/CI; `ADMIN_TOKEN` env var (#105)
 16. [x] **Docs pages** — admin documentation and runbooks inline
 
-### Phase 2 — AI tips + multi-source signals  *(M2)* in progress
+### Phase 2 — AI tips + multi-source signals  *(M2)* ✓ shipped
 Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.
 
 **Architectural shift (mid-M2):** the bandit-ranks-LLM-candidates design from earlier in M2 was replaced with a multi-agent pipeline (ADR-0013): pre-compute agents emit prompt snippets, an orchestrator LLM produces the tip directly. ADR-0014 layers a unified Profile + agent registry + auto-inference framework on top so the system generalizes cleanly to N agents.
@@ -206,26 +206,26 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
 - [x] Orchestrator cutover — recommender calls `ml/serving` with snippet list, no bandit scoring
 - [x] Bandit endpoints + shadow policy machinery removed
 
-**Unified Profile + agent registry (ADR-0014, in progress):**
-- [ ] Unified Profile model: prefs, contexts, consents + manifest plumbing + orchestrator cutover (#30)
-- [ ] Shared context-inference framework (#111)
-- [ ] Per-agent auto-inference: `time-of-day` (#112), `focus-area` (#113), `momentum` (#114), `overdue-task` (#115), `recent-patterns` (#116)
+**Unified Profile + agent registry (ADR-0014, shipped):**
+- [x] Unified Profile model: prefs, contexts, consents + manifest plumbing + orchestrator cutover (#30)
+- [x] Shared context-inference framework (#111)
+- [x] Per-agent auto-inference: `time-of-day` (#112), `focus-area` (#113), `momentum` (#114), `overdue-task` (#115), `recent-patterns` (#116)
 
 **AI infrastructure (unblock everything else):**
 - [ ] `ai` compose profile — Ollama + LiteLLM for local dev; env vars `OLLAMA_URL` / `LITELLM_URL` (#86)
-- [ ] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
+- [x] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
 
 **AI tip generation pipeline:**
 - [x] Context assembler — user signals + feature store → structured prompt context (`ml/features/context.py`); skeleton implemented
-- [ ] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
-- [ ] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
+- [x] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
+- [x] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
 - [ ] LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
-- [ ] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
+- [x] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
 - [x] LLM tip quality dashboard — reaction breakdown by model / prompt_version in `/admin/reward-analytics` (#92)
 
 **Evaluation & model selection:**
-- [ ] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
-- [ ] LLM prompt research — persona design, context injection strategies, few-shot examples (#84)
+- [x] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
+- [x] LLM prompt research — persona design, context injection strategies, few-shot examples (#84, #95)
 
 **Pipeline architecture:**
 - [x] Signal source abstraction — `SignalSource` interface for Todoist + extensible design (#78)
@@ -241,7 +241,8 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
 - [x] NATS JetStream replacing in-process bus (#21) — adapter ships in `services/api/src/events/nats.ts`; in-proc bus is the producer, JetStream is the durable mirror
 - [x] Todoist sync via events (#22) — background scheduler in `services/api/src/signals/scheduler.ts` emits `signals.task.synced` every `TODOIST_SYNC_INTERVAL_MS`; on-demand fetch remains as freshness fallback
 - [x] Event schema registry + protobuf CI gate (#54) — buf lint/breaking checks on every PR
-- [x] Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; CONTEXT_FEATURES in ml/features/context.py
+- [x] Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; `invalidated_by` mirrored into `ProfileFeature`; `CONTEXT_FEATURES` in `ml/features/context.py`
+- [x] Embedding-based task clustering — `nomic-embed-text` for semantic dedup + focus-area features (#97)
 - [x] Observability (#18) — structured logs via pino, W3C trace IDs, Sentry hooks, trace correlation end-to-end
 - [ ] CI skeleton (#3), E2E tests (#20)
 
@@ -251,7 +252,7 @@ Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multi
 - [x] Reward fire-and-forget (#75) — retry logic + logging
 - [x] Data retention purge (#76) — daily purge of 30-day-old tip_scores/tip_feedback
 - [x] Port mismatch (#77) — fixed in docker-compose + env var config
-- [ ] UX refinements (#100–102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button
+- [x] UX refinements (#100–102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button
 
 ### Phase 3 — Native mobile  *(M3)*
 - [ ] iOS app (SwiftUI) with APNs push