Surfaces phase A's profile features in /admin/users/:id so we can verify they're actually computing useful values before investing in bandit consumption. The detail GET now includes profile rows joined with registry metadata (name, value, age, fresh badge, ttlSec, description). Read does NOT trigger compute — staleness must be visible. A new POST .../profile/rebuild button force-recomputes and is audit-logged like reset-bandit. Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
oO
One tip. Right now. Feels like magic.
oO learns who you are from the apps you already use and surfaces one perfectly-timed suggestion — an advice or a todo — on a black page. No feed. No dashboard. One tip.
Why
Everyone has too many tasks, too many apps, too much noise. What people actually need is a single, well-chosen nudge at the right moment. oO is that nudge, powered by a recommendation engine that gets smarter the more of your life it sees.
Product principles
- One thing at a time. The UI is a black page with one tip. That's the product.
- We don't own your data, we understand it. Connect your apps; we read what we need, when we need it.
- Magic requires craft. Precision, timing, and restraint matter more than features.
- Private by default. Tokens are encrypted, models are per-user, deletion is one click.
Prototype scope (Phase 0)
Three pages. That's it.
| Page | What it does |
|---|---|
| Sign in | Google / Apple OAuth. No passwords. |
| Connect | A list of integrations. Tap "Todoist" → OAuth flow → token stored. |
| Tip | Black page. One tip. Tap to dismiss / done / snooze. |
Under the hood the "pick a tip" call already routes through a recommender service with a pluggable policy — so v0 is literally "random Todoist task" but every other version slots into the same contract.
Architecture at a glance
┌──────────┐ OAuth ┌────────────┐
│ Web / │──────────▶│ auth │
│ Mobile │ └─────┬──────┘
│ client │ │ JWT
│ │ REST/GraphQL ▼
│ │────────▶┌───────────────┐
└──────────┘ │ gateway │──┬──▶ profile
└───────┬───────┘ ├──▶ integrations ──▶ Todoist / Google / ...
│ └──▶ recommender ──▶ ml/serving (Python)
▼
┌───────────────┐
│ events │ ◀── integrations emit normalized events
│ (Kafka/NATS) │ ──▶ ml/pipelines (features, training)
└───────────────┘
More detail in docs/architecture/ and decisions in docs/adr/.
Monorepo layout
See CLAUDE.md for the full tree and conventions.
apps/ web, ios, android
services/ gateway, auth, profile, integrations, recommender, events, notifier
packages/ shared-types, sdk-js, ui
ml/ pipelines, features, registry, experiments, serving
infra/ docker, k8s, terraform, ci
docs/ architecture, adr, api
AI stack
oO is AI-native: the recommender's job is to rank, not to write. An LLM generates candidate tips from the user's context; the bandit picks the best one.
Three-tier layout
| Tier | Service | Purpose | Where |
|---|---|---|---|
| Inference | Ollama | Local LLM + embedding; no data leaves the host | localhost:11434 |
| Routing | LiteLLM | Unified OpenAI-compatible API; model aliases; cloud fallback | llm.alogins.net (Agap shared) |
| Testing | OpenWebUI | Prompt iteration, model comparison, manual evals | ai.alogins.net (Agap shared) |
Tip generation pipeline (Phase 2 target)
User signals ──▶ Context assembler ──▶ LiteLLM ──▶ Ollama (local)
(tasks, calendar, (ml/features/) (routing) or cloud fallback
patterns, time)
▼
N typed TipCandidates
{content, kind, model,
prompt_version, confidence}
▼
Bandit policy (ml/serving)
scores + ranks candidates
▼
Best tip shown
▼
User reaction (done / snooze / dismiss + dwell)
▼
Online bandit update + prompt_version tracking
Why LiteLLM as gateway: All LLM calls use a single LITELLM_URL env var. Swapping from qwen2.5 to llama3.2, or routing a fraction to Claude for A/B, is a config change in LiteLLM — zero code change in oO. The model name in tip_scores tells you exactly which model produced each tip.
Why Ollama first: Tips contain personal context. Local inference means no user data leaves the host for the inference path. Cloud models (Anthropic, OpenAI) are opt-in fallbacks for evaluation and simulation only, gated behind ANTHROPIC_API_KEY.
Models (planned)
| Alias | Model | Task |
|---|---|---|
tip-generator |
qwen2.5:7b (default) | Generate typed tip candidates from user context |
embedder |
nomic-embed-text | Task clustering, semantic similarity for dedup |
judge |
claude-haiku-4-5 (cloud, eval-only) | Offline sim judge; rates tip quality for A/B |
Roadmap
Phase 0 — Walking skeleton (M0) ✓ shipped
Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.
- Monorepo scaffold, docker-compose dev env
auth— Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guardintegrations/todoist— OAuth2 flow, token stored in DB, disconnect supportedrecommenderwithRandomPolicy; stablePOST /recommendcontract; 30s task cacheapps/web— sign-in, connect, tip pages; PWA manifest + icons- Feedback:
done / snooze / dismiss; reward inferred from dwell-time (inferReward); marks task complete in Todoist - Deploy modular monolith to Agap VM via Caddy at
o.alogins.net - ToS + Privacy Policy pages (
/legal/terms,/legal/privacy); implicit consent on sign-in - Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
- Metrics baseline:
tip_viewstable (tip served) +tip_feedback(reactions) — activation + reaction rate queryable
Phase 1 — Real signal + in-the-moment delivery (M1) ✓ shipped
Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.
- Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
- Todoist sync emits
signals.task.synced; tip served/feedback emitsignals.tip.* - Features extracted per task:
is_overdue,task_age_days,priority; context:hour_of_day,day_of_week ml/servingLinUCB (d=5) + ε-greedy v1 (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to diskRemotePolicyin recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability totip_scores- Feedback loop: dwell-time inferred reward (
inferReward) → online model update;donein 15 s–2 min = +1.0 (magic zone) - Offline simulation framework (
ml/experiments/sim): rule/LLM/claude-code judges, two-policy comparison, results persisted tosim_runs+sim_events - ε-greedy v1 promoted to active policy (ADR-0007) — +10.7% mean reward vs LinUCB in offline sim
- Web Push (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
- Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
- Quiet-hours + dedupe for push delivery
- Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist)
- NATS JetStream bridge — durable
signals.>andfeedback.>streams; in-process bus stays the source of truth, every publish bridges out (#21, shipped)
M1 add-on — Admin & ML Ops Console (fully shipped)
oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to act on them (revoke a token, replay an event, promote a model, reset a bandit).
Framework pick — apps/admin on Next.js 15 + Tremor + shadcn/ui. Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses packages/shared-types, sdk-js, and the Auth.js session. Specialized ML tooling (MLflow, Airflow) runs as separate external services linked from the admin shell; Grafana panels are embedded.
| Layer | Tool | Why |
|---|---|---|
| App shell | Next.js 15 (new apps/admin) |
Same stack as apps/web; reuses auth, types, SDK |
| Dashboards / charts | Tremor | Analytics-first React + Tailwind — KPI cards, time-series, categorical, heatmaps |
| CRUD primitives | shadcn/ui | Copy-paste Radix components; forms, dialogs, command palette |
| Heavy grids | TanStack Table v8 | Sortable / paginated / virtualized tables (events, users, tips) |
| Extra charts | Recharts / visx | Fallbacks where Tremor falls short (e.g. force graphs, Sankey) |
| Model registry / experiments | MLflow (external — o.alogins.net/mlflow) |
Experiment tracking, artifact browser, model registry; own basic-auth |
| Pipeline orchestration | Airflow (external — o.alogins.net/airflow) |
Batch feature + retraining DAGs; own web-auth |
| Infra metrics | Grafana (embedded panels) | One ops source of truth |
| Ad-hoc analysis | Marimo reactive notebooks | Python-native for the ML side; launch-out link |
| AuthZ | profile.role='admin' + Next.js middleware |
Reuses existing session; no new auth surface |
Rejected alternatives (so we don't re-litigate):
- Retool / AppSmith — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product
- Streamlit / Gradio / Dash — Python-first; thin RBAC and routing; splits our frontend stack in two
- React-admin / Refine.dev — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves
- Superset / Metabase as the admin surface — excellent for BI, poor for operational writes (revoke, replay, promote). Plan: adopt Superset in M4 for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now
Build sequence (plan, not code):
- ADR-0006 — record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana
- Scaffold —
apps/adminwith Next.js 15, Tailwind, Tremor; deploy behind Caddy atadmin.o.alogins.net - RBAC —
rolecolumn onusers; admin-only Next.js middleware; seed first admin viaADMIN_SEED_EMAILenv;admin_actionsaudit-log table - Overview dashboard — DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel
- User explorer — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit actions
- Event stream viewer — live tail of
signals.*with filters by subject/user/time; same UI when the bus swaps to NATS - Feature store browser — features sent to
ml/servingper scoring call; diff across time for a user - Model registry panel —
/admin/modelslinks out to MLflow (mlflow.o.alogins.net); experiment tracking and dataset management in MLflow + Airflow - MLOps hub —
/admin/experimentslinks to MLflow experiments/models and Airflow DAGs/datasets; bandit reset on Users page - Recommendation log (explainability) — per served tip:
(user, features, policy, score, feedback, latency);tip_scorestable, 30-day retention - Reward analytics — reaction distribution over time; per-policy compare; slice by
hour_of_day,priority, cohort - Data quality widget — missing-feature rate, stale-token rate, daily completeness heatmap
- Ops actions — revoke token (Users page), replay signal, disable/promote shadow policy; every action audit-logged
- Read-only SQL runner — SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4)
- Health rollup —
/admin/healthsurfaces api, ml/serving, SQLite, event-bus; auto-refreshes every 15s - Docs —
apps/admin/README.md, runbook for common ops actions, ADR-0006 merged
- Apple OAuth (deferred to M2)
Phase 2 — AI tips + multi-source signals (M2)
Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.
AI infrastructure (unblock everything else):
aicompose profile — Ollama + LiteLLM for local dev; env varsOLLAMA_URL/LITELLM_URL(#86)- AI gateway — wire
ml/servingto LiteLLM; model aliasestip-generator+embedder(#87)
AI tip generation pipeline:
- Context assembler — user signals + feature store → structured prompt context (
ml/features/context.py) (#88) - Tip generator endpoint —
POST /generateinml/serving; LLM → N typedTipCandidateobjects (#79) TipCandidateshared schema —{content, kind, source, model, prompt_version, confidence}; update recommender pipeline (#89)- LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
- Prompt versioning —
prompt_version+modelcolumns intip_scores; content-hash invalidation (#91) - LLM tip quality dashboard — reaction breakdown by model / prompt_version in
/admin/reward-analytics(#92)
Evaluation & model selection:
- Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
- LLM prompt research — persona design, context injection strategies, few-shot examples (#84)
Pipeline architecture:
- Signal source abstraction —
SignalSourceinterface generalizing beyond Todoist (#78) - Generalized recommendation pipeline — candidate → rank → render stages (#80)
- Feature registry + user profile builder — centralized features, persistent profiles (#81)
- Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82)
Policy research:
- Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83)
Integrations & infra (carried from M1):
- Apple OAuth (#7)
- NATS JetStream replacing in-process bus (#21) — adapter ships in
services/api/src/events/nats.ts; in-proc bus is the producer, JetStream is the durable mirror - Todoist sync via events (#22) — background scheduler in
services/api/src/signals/scheduler.tsemitssignals.task.syncedeveryTODOIST_SYNC_INTERVAL_MS; on-demand fetch remains as freshness fallback - Event schema registry + protobuf CI gate (#54)
- Per-user freshness SLAs for features (#61)
- CI skeleton (#3), observability (#18), E2E tests (#20)
Bugs (fix before new features):
- TipFeedback type mismatch (#73)
- Todoist token refresh (#74)
- Reward fire-and-forget (#75)
- Data retention purge (#76)
- Port mismatch (#77)
Phase 3 — Native mobile (M3)
- iOS app (SwiftUI) with APNs push
- Android app (Compose) with FCM push
notifiergains APNs + FCM channels, per-device rate limits- Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
- Consolidate MLflow + Airflow behind shared OIDC (SSO for all internal services)
- Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
Phase 4 — MLOps at scale (M4)
- Airflow + MLflow deployed as external services (
mlopscompose profile); each with own auth - Write first retraining DAG (Airflow) + first MLflow experiment logging from
ml/serving - Feature-to-prompt pipeline — nightly Airflow DAG materializes context for LLM; cuts inline latency (#94)
- Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95)
- LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96)
- Embedding-based task clustering —
nomic-embed-textfor dedup + user pattern features (#97) - Consolidate MLflow + Airflow auth into shared OIDC provider (tracked as M3 issue #85)
- Shadow → A/B → launch pipeline as first-class in MLflow
- Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
- Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
- Drift monitoring (feature + prediction + reward drift); model cards per LLM version
Phase 5 — Production hardening (M5)
- Audit logging, rotation of provider tokens + internal signing keys
- k3s on existing VM, then k8s + HPA once multi-node justified (no cliff)
- Multi-region failover, Postgres PITR, event-bus mirroring
- Public integration SDK; sandbox tenancy for third-party connectors
- Billing + subscription tiers
Contributing
This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's README.md, ship.
Conventions and per-service guidance live in CLAUDE.md.
License
All rights reserved — 2026. Contact the owner for licensing inquiries. (We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)