docs(readme): replace inline issue checklists with Gitea milestone links

Roadmap phase sections now show shipped summaries only; open work lives in Gitea milestones. Eliminates duplicate source-of-truth between README and issue tracker.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 15:34:45 +00:00
parent 8fd08379d7
commit b1bd3d465f
1 changed file with 9 additions and 151 deletions
--- a/README.md
+++ b/README.md
@@ -121,173 +121,31 @@ All model calls route through **LiteLLM** at `llm.alogins.net` (or `LITELLM_URL`

 ## Roadmap

+Issues and open work are tracked in [Gitea milestones](http://localhost:3000/alvis/oO/milestones). Pick an issue, check its milestone (= phase), read the service's `README.md`, ship.
+
 ### Phase 0 — Walking skeleton  *(M0)* ✓ shipped
-Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.
- [x] Monorepo scaffold, docker-compose dev env
- [x] `auth` — Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guard
- [x] `integrations/todoist` — OAuth2 flow, token stored in DB, disconnect supported
- [x] `recommender` with `RandomPolicy`; stable `POST /recommend` contract; 30s task cache
- [x] `apps/web` — sign-in, connect, tip pages; PWA manifest + icons
- [x] Feedback: `done / snooze / dismiss`; reward inferred from dwell-time (`inferReward`); marks task complete in Todoist
- [x] Deploy modular monolith to Agap VM via Caddy at `o.alogins.net`
- [x] ToS + Privacy Policy pages (`/legal/terms`, `/legal/privacy`); implicit consent on sign-in
- [x] Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
- [x] Metrics baseline: `tip_views` table (tip served) + `tip_feedback` (reactions) — activation + reaction rate queryable
+A single user signs in with Google, connects Todoist, and sees one random task on a black page. Account deletion works. Shipped: auth, integrations, recommender stub, PWA, feedback loop, ToS/privacy, metrics baseline.

 ### Phase 1 — Real signal + in-the-moment delivery  *(M1)* ✓ shipped
-Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.
- [x] Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
- [x] Todoist sync emits `signals.task.synced`; tip served/feedback emit `signals.tip.*`
- [x] Features extracted per task: `is_overdue`, `task_age_days`, `priority`; context: `hour_of_day`, `day_of_week`
- [x] **ε-greedy v1** (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk
- [x] **ε-greedy v2** (d=12, profile features: completion rate, dismiss rate, dwell, preferred hour, tip volume) in shadow; promoted to active policy (ADR-0012)
- [x] `RemotePolicy` in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to `tip_scores`
- [x] Feedback loop: dwell-time inferred reward (`inferReward`) → online model update; `done` in 15 s–2 min = +1.0 (magic zone)
- [x] Offline simulation framework (`ml/experiments/sim`): rule/LLM/claude-code judges, two-policy comparison, results persisted to `sim_runs` + `sim_events`
- [x] **Web Push** (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
- [x] Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
- [x] NATS JetStream bridge — durable `signals.>` and `feedback.>` streams; in-process bus stays the source of truth, every publish bridges out (#21, shipped)
- [x] Per-user profile features (completion rate, dismiss rate, dwell, preferred hour, tip volume) — event-driven, JIT invalidation (#81)
- [ ] Quiet-hours + dedupe for push delivery
- [ ] Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist)
- [ ] Apple OAuth (deferred to M3)
-
-#### M1 add-on — Admin & ML Ops Console  *(fully shipped)*
-
-oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to *act* on them (revoke a token, replay an event, promote a model, reset a bandit).
-
-**Framework pick — `apps/admin` on Next.js 15 + Tremor + shadcn/ui.**  Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses `packages/shared-types`, `sdk-js`, and the Auth.js session. Specialized ML tooling (MLflow) runs as a **separate external service** linked from the admin shell; Grafana panels are embedded.
-
-| Layer | Tool | Why |
-|-------|------|-----|
-| App shell | **Next.js 15** (new `apps/admin`) | Same stack as `apps/web`; reuses auth, types, SDK |
-| Dashboards / charts | **[Tremor](https://tremor.so)** | Analytics-first React + Tailwind — KPI cards, time-series, categorical, heatmaps |
-| CRUD primitives | **[shadcn/ui](https://ui.shadcn.com)** | Copy-paste Radix components; forms, dialogs, command palette |
-| Heavy grids | **[TanStack Table v8](https://tanstack.com/table)** | Sortable / paginated / virtualized tables (events, users, tips) |
-| Extra charts | **[Recharts](https://recharts.org)** / **[visx](https://airbnb.io/visx)** | Fallbacks where Tremor falls short (e.g. force graphs, Sankey) |
-| Model registry / experiments | **[MLflow](https://mlflow.org)** *(external — `o.alogins.net/mlflow`)* | Experiment tracking, artifact browser, model registry; own basic-auth |
-| Infra metrics | **[Grafana](https://grafana.com)** *(embedded panels)* | One ops source of truth |
-| Ad-hoc analysis | **[Marimo](https://marimo.io)** reactive notebooks | Python-native for the ML side; launch-out link |
-| AuthZ | `profile.role='admin'` + Next.js middleware | Reuses existing session; no new auth surface |
-
-**Rejected alternatives (so we don't re-litigate):**
- *Retool / AppSmith* — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product
- *Streamlit / Gradio / Dash* — Python-first; thin RBAC and routing; splits our frontend stack in two
- *React-admin / Refine.dev* — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves
- *Superset / Metabase as the admin surface* — excellent for BI, poor for operational **writes** (revoke, replay, promote). Plan: **adopt Superset in M4** for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now
-
-**Build sequence:**
-1. [x] **ADR-0006** — record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana
-2. [x] **Scaffold** — `apps/admin` with Next.js 15, Tailwind, Tremor; deploy behind Caddy at `admin.o.alogins.net`
-3. [x] **RBAC** — `role` column on `users`; admin-only Next.js middleware; seed first admin via `ADMIN_SEED_EMAIL` env; `admin_actions` audit-log table
-4. [x] **Overview dashboard** — DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel
-5. [x] **User explorer** — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit + rebuild-profile actions
-6. [x] **Event stream viewer** — live tail of `signals.*` with filters by subject/user/time; same UI when the bus swaps to NATS
-7. [x] **Features page** — features sent to `ml/serving` per scoring call; per-user profile features with freshness; diff across time
-8. [x] **Tips page** — tips served, scored, feedback reactions with policy/model breakdown
-9. [x] **Reward analytics** — reaction distribution over time; per-policy / per-model / per-prompt-version compare; slice by `hour_of_day`, `priority`, cohort
-10. [x] **Data quality widget** — missing-feature rate, stale-token rate, daily completeness heatmap; per-feature freshness SLA status
-11. [x] **Ops actions** — revoke token (Users page), rebuild profile, reset bandit, enable/disable shadow policies; every action audit-logged
-12. [x] **Health rollup** — `/admin/health` surfaces api, ml/serving, SQLite, event-bus, MLflow; auto-refreshes every 15s
-13. [x] **Read-only SQL runner** — SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4)
-14. [x] **Offline simulation runner** — launch `ml/experiments/sim` from admin UI; track sim runs, judge, policy comparison
-15. [x] **Token-based admin auth** — `POST /api/auth/token` for Playwright/CI; `ADMIN_TOKEN` env var (#105)
-16. [x] **Docs pages** — admin documentation and runbooks inline
+Tips are picked, not drawn from a hat. Event bus, Todoist sync, task features, ε-greedy policy (v1 + v2), web push, NATS JetStream bridge, shadow-policy registry, offline sim framework, per-user profile features, admin + ML ops console (`apps/admin`).

 ### Phase 2 — AI tips + multi-source signals  *(M2)* ✓ shipped
-Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.
-
-**Architectural shift (mid-M2):** the bandit-ranks-LLM-candidates design from earlier in M2 was replaced with a multi-agent pipeline (ADR-0013): pre-compute agents emit prompt snippets, an orchestrator LLM produces the tip directly. ADR-0014 layers a unified Profile + agent registry + auto-inference framework on top so the system generalizes cleanly to N agents.
-
-**Multi-agent recommendation (ADR-0013, shipped):**
- [x] `agent_outputs` table + per-agent TTL caching
- [x] Five initial agents: `overdue-task`, `momentum`, `time-of-day`, `recent-patterns`, `focus-area`
- [x] Agent pre-compute scheduler
- [x] Orchestrator cutover — recommender calls `ml/serving` with snippet list, no bandit scoring
- [x] Bandit endpoints + shadow policy machinery removed
-
-**Unified Profile + agent registry (ADR-0014, shipped):**
- [x] Unified Profile model: prefs, contexts, consents + manifest plumbing + orchestrator cutover (#30)
- [x] Shared context-inference framework (#111)
- [x] Per-agent auto-inference: `time-of-day` (#112), `focus-area` (#113), `momentum` (#114), `overdue-task` (#115), `recent-patterns` (#116)
-
-**AI infrastructure (unblock everything else):**
- [x] `ai` compose profile — Ollama + LiteLLM for local dev; env vars `OLLAMA_URL` / `LITELLM_URL` (#86)
- [x] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
-
-**AI tip generation pipeline:**
- [x] Context assembler — user signals + feature store → structured prompt context (`ml/features/context.py`); skeleton implemented
- [x] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
- [x] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
- [x] LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to hardcoded tips on AI failure (#90)
- [x] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
- [x] LLM tip quality dashboard — reaction breakdown by model / prompt_version in `/admin/reward-analytics` (#92)
-
-**Evaluation & model selection:**
- [x] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
- [x] LLM prompt research — persona design, context injection strategies, few-shot examples (#84, #95)
-
-**Pipeline architecture:**
- [x] Signal source abstraction — `SignalSource` interface for Todoist + extensible design (#78)
- [x] Generalized recommendation pipeline — superseded by ADR-0013; multi-agent orchestrator is the pipeline (#80)
- [x] Feature registry + user profile builder — centralized features, persistent profiles, event-driven invalidation (#81)
- [ ] Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82)
-
-**Policy research:**
- [ ] Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83)
-
-**Integrations & infra (carried from M1):**
- [ ] Apple OAuth (#7)
- [x] NATS JetStream replacing in-process bus (#21) — adapter ships in `services/api/src/events/nats.ts`; in-proc bus is the producer, JetStream is the durable mirror
- [x] Todoist sync via events (#22) — background scheduler in `services/api/src/signals/scheduler.ts` emits `signals.task.synced` every `TODOIST_SYNC_INTERVAL_MS`; on-demand fetch remains as freshness fallback
- [x] Event schema registry + protobuf CI gate (#54) — buf lint/breaking checks on every PR
- [x] Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; `invalidated_by` mirrored into `ProfileFeature`; CONTEXT_FEATURES in ml/features/context.py
- [x] Embedding-based task clustering — `nomic-embed-text` for semantic dedup + focus-area features (#97)
- [x] Observability (#18) — structured logs via pino, W3C trace IDs, Sentry hooks, trace correlation end-to-end
- [ ] CI skeleton (#3), E2E tests (#20)
-
-**Bugs & UX (fix before new features):**
- [x] TipFeedback type mismatch (#73)
- [x] Todoist token refresh (#74) — OAuth token auto-refresh on 401
- [x] Reward fire-and-forget (#75) — retry logic + logging
- [x] Data retention purge (#76) — daily purge of 30-day-old tip_scores/tip_feedback
- [x] Port mismatch (#77) — fixed in docker-compose + env var config
- [x] UX refinements (#100–102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button
+Tips are AI-generated from user context. Multi-agent pipeline (ADR-0013): five pre-compute agents (`overdue-task`, `momentum`, `time-of-day`, `recent-patterns`, `focus-area`) emit prompt snippets; orchestrator LLM produces one tip. Unified Profile + agent registry + auto-inference framework (ADR-0014). LLM output validation + fallback. LiteLLM gateway, model benchmarking, prompt research, MLflow tracing.

 ### Phase 3 — Native mobile  *(M3)*
- [ ] iOS app (SwiftUI) with APNs push
- [ ] Android app (Compose) with FCM push
- [ ] `notifier` gains APNs + FCM channels, per-device rate limits
- [ ] Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
- [ ] Consolidate MLflow behind shared OIDC (SSO for all internal services)
- [ ] Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
+Planned: iOS (SwiftUI + APNs) and Android (Compose + FCM) apps. `notifier` service gains APNs + FCM channels. Auth migrates from Auth.js to a dedicated OIDC provider. Decide-and-deliver scheduler. See [M3 milestone](http://localhost:3000/alvis/oO/milestone/3).

 ### Phase 4 — MLOps at scale  *(M4)*
- [x] MLflow deployed as external service (`mlops` compose profile); own auth; health check integrated
- [ ] Write first retraining pipeline + first MLflow experiment logging from `ml/serving` + JetStream consumers (#98)
- [ ] Feature-to-prompt pipeline — nightly batch job materializes context for LLM; cuts inline latency (#94)
- [ ] Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95)
- [ ] LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96)
- [ ] Embedding-based task clustering — `nomic-embed-text` for dedup + user pattern features (#97)
- [ ] Modular-monolith packaging + import-boundary lint (#47)
- [ ] Consolidate MLflow auth into shared OIDC provider (tracked as M3 issue #85)
- [ ] Shadow → A/B → launch pipeline as first-class in MLflow
- [ ] Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
- [ ] Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
- [ ] Drift monitoring (feature + prediction + reward drift); model cards per LLM version
+Retraining pipeline, feature-to-prompt batch jobs, prompt optimization loop, LLM fine-tuning on reaction signals, modular-monolith import-boundary lint, online experiments framework, drift monitoring. See [M4 milestone](http://localhost:3000/alvis/oO/milestone/4).

 ### Phase 5 — Production hardening  *(M5)*
- [ ] Audit logging, rotation of provider tokens + internal signing keys
- [ ] **k3s** on existing VM, then k8s + HPA once multi-node justified (no cliff)
- [ ] Multi-region failover, Postgres PITR, event-bus mirroring
- [ ] Public integration SDK; sandbox tenancy for third-party connectors
- [ ] Billing + subscription tiers
+Audit logging, key rotation, k3s → k8s, multi-region, public integration SDK, billing. See [M5 milestone](http://localhost:3000/alvis/oO/milestone/5).

 ---

 ## Contributing

-This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's `README.md`, ship.
+This repo is split into independent modules; most tickets belong to exactly one. Pick an issue from [Gitea](http://localhost:3000/alvis/oO/issues), read the service's `README.md`, ship.

 Conventions and per-service guidance live in [`CLAUDE.md`](CLAUDE.md).
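
The Phase 2 checklist removed in this diff describes LLM output validation as a JSON schema gate, a clarification retry (2×), and a fallback to hardcoded tips on AI failure. Since that detail no longer appears in the README, here is a minimal sketch of how such a gate might look. The field names come from the `TipCandidate` line in the diff; the field types, `call_llm`, `generate_tips`, `validate`, and the contents of `FALLBACK_TIPS` are all assumptions made for illustration, not the project's actual code.

```python
import json
from typing import Callable

# Field names are taken from the `TipCandidate` line in the diff; types are assumed.
REQUIRED_FIELDS = {
    "content": str,
    "kind": str,
    "source": str,
    "model": str,
    "prompt_version": str,
    "confidence": (int, float),
}

# Hypothetical hardcoded fallback; the real fallback tips are not shown in the diff.
FALLBACK_TIPS = [{
    "content": "Pick one small task and finish it.",
    "kind": "advice",
    "source": "fallback",
    "model": "none",
    "prompt_version": "fallback",
    "confidence": 0.0,
}]


def validate(raw: str) -> list[dict]:
    """Parse raw LLM output and check every candidate; raise ValueError if invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of candidates")
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each candidate must be a JSON object")
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(item.get(field), expected):
                raise ValueError(f"field {field!r} is missing or has the wrong type")
    return data


def generate_tips(call_llm: Callable[[str], str], prompt: str,
                  max_retries: int = 2) -> list[dict]:
    """One initial attempt plus up to `max_retries` clarification retries, then fallback."""
    current_prompt = prompt
    for _ in range(1 + max_retries):
        try:
            return validate(call_llm(current_prompt))
        except ValueError as err:
            # Clarification retry: tell the model why its last answer was rejected.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected ({err}). "
                "Reply with a JSON array only."
            )
        except Exception:
            break  # transport or provider failure: stop retrying
    return FALLBACK_TIPS
```

In this sketch a malformed reply triggers a re-prompt that includes the rejection reason, while any other exception from `call_llm` skips straight to the fallback, so the caller always receives a list of candidates with the same shape.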