oO
One tip. Right now. Feels like magic.
oO learns who you are from the apps you already use and surfaces one perfectly timed suggestion (a piece of advice or a to-do) on a black page. No feed. No dashboard. One tip.
Why
Everyone has too many tasks, too many apps, too much noise. What people actually need is a single, well-chosen nudge at the right moment. oO is that nudge, powered by a recommendation engine that gets smarter the more of your life it sees.
Product principles
- One thing at a time. The UI is a black page with one tip. That's the product.
- We don't own your data, we understand it. Connect your apps; we read what we need, when we need it.
- Magic requires craft. Precision, timing, and restraint matter more than features.
- Private by default. Tokens are encrypted, models are per-user, deletion is one click.
Prototype scope (Phase 0)
Three pages. That's it.
| Page | What it does |
|---|---|
| Sign in | Google / Apple OAuth. No passwords. |
| Connect | A list of integrations. Tap "Todoist" → OAuth flow → token stored. |
| Tip | Black page. One tip. Tap to dismiss / done / snooze. |
Under the hood the "pick a tip" call already routes through a recommender service with a pluggable policy — so v0 is literally "random Todoist task" but every other version slots into the same contract.
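A minimal TypeScript sketch of such a pluggable contract follows. The Candidate and Policy shapes, the pick method, and the recommend helper are hypothetical; this README names only RandomPolicy and the POST /recommend contract. The handler depends only on the interface, so a bandit policy can replace RandomPolicy without changing the response shape.

```typescript
// Hypothetical shapes; the real recommender contract may differ.
interface Candidate {
  id: string;
  content: string;
  features: Record<string, number | boolean>;
}

interface Policy {
  readonly name: string;
  // Returns the chosen candidate, or null when there is nothing to show.
  pick(userId: string, candidates: Candidate[]): Candidate | null;
}

// v0: uniform random choice that ignores features entirely.
class RandomPolicy implements Policy {
  readonly name = "random";
  private readonly rand: () => number;

  constructor(rand: () => number = Math.random) {
    this.rand = rand;
  }

  pick(_userId: string, candidates: Candidate[]): Candidate | null {
    if (candidates.length === 0) return null;
    const i = Math.floor(this.rand() * candidates.length);
    // Clamp in case an injected rand() ever returns exactly 1.
    return candidates[Math.min(i, candidates.length - 1)];
  }
}

// The request handler only sees the Policy interface, so swapping policies
// leaves the response shape untouched.
function recommend(policy: Policy, userId: string, candidates: Candidate[]) {
  const chosen = policy.pick(userId, candidates);
  return { tip: chosen, policy: policy.name };
}
```

Injecting rand keeps the random policy testable; later policies would implement the same pick signature.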
Architecture at a glance
```
┌──────────┐   OAuth   ┌────────────┐
│  Web /   │──────────▶│    auth    │
│  Mobile  │           └─────┬──────┘
│  client  │                 │ JWT
│          │  REST/GraphQL   ▼
│          │────────▶┌───────────────┐
└──────────┘         │    gateway    │──┬──▶ profile
                     └───────┬───────┘  ├──▶ integrations ──▶ Todoist / Google / ...
                             │          └──▶ recommender  ──▶ ml/serving (Python)
                             ▼
                     ┌───────────────┐
                     │    events     │ ◀── integrations emit normalized events
                     │  (Kafka/NATS) │ ──▶ ml/pipelines (features, training)
                     └───────────────┘
```
More detail in docs/architecture/ and decisions in docs/adr/.
Monorepo layout
See CLAUDE.md for the full tree and conventions.
```
apps/       web, ios, android
services/   gateway, auth, profile, integrations, recommender, events, notifier
packages/   shared-types, sdk-js, ui
ml/         pipelines, features, registry, experiments, serving
infra/      docker, k8s, terraform, ci
docs/       architecture, adr, api
```
AI stack
oO is AI-native: the recommender's job is to rank, not to write. An LLM generates candidate tips from the user's context; the bandit picks the best one.
Three-tier layout
| Tier | Service | Purpose | Where |
|---|---|---|---|
| Inference | Ollama | Local LLM + embedding; no data leaves the host | localhost:11434 |
| Routing | LiteLLM | Unified OpenAI-compatible API; model aliases; cloud fallback | llm.alogins.net (Agap shared) |
| Testing | OpenWebUI | Prompt iteration, model comparison, manual evals | ai.alogins.net (Agap shared) |
Tip generation pipeline (Phase 2 target)
```
User signals ──▶ Context assembler ──▶ LiteLLM ──▶ Ollama (local)
(tasks, calendar, (ml/features/)      (routing)    or cloud fallback
 patterns, time)
                                                     ▼
                                           N typed TipCandidates
                                           {content, kind, model,
                                            prompt_version, confidence}
                                                     ▼
                                        Bandit policy (ml/serving)
                                        scores + ranks candidates
                                                     ▼
                                              Best tip shown
                                                     ▼
                              User reaction (done / snooze / dismiss + dwell)
                                                     ▼
                              Online bandit update + prompt_version tracking
```
Why LiteLLM as gateway: All LLM calls use a single LITELLM_URL env var. Swapping from qwen2.5 to llama3.2, or routing a fraction to Claude for A/B, is a config change in LiteLLM — zero code change in oO. The model name in tip_scores tells you exactly which model produced each tip.
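Because LiteLLM exposes an OpenAI-compatible API, a caller needs only a base URL and a model alias. The sketch below assumes the proxy is reachable at the value of LITELLM_URL and serves /v1/chat/completions (the LiteLLM proxy does); buildChatRequest is a hypothetical helper, and tip-generator is the planned alias from the models table.

```typescript
// Sketch: build an OpenAI-compatible chat request for the LiteLLM proxy.
// baseUrl would come from LITELLM_URL; the model is an alias, not a raw model name.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Only send a bearer token when the proxy is configured to require one.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    // Strip trailing slashes so "http://host:4000/" and "http://host:4000" behave the same.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}
```

Passing req.url and req.init to fetch sends the call, and the reply's choices[0].message.content holds the model output. Remapping tip-generator from qwen2.5 to llama3.2 happens in LiteLLM's config; this code does not change.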
Why Ollama first: Tips contain personal context. Local inference means no user data leaves the host for the inference path. Cloud models (Anthropic, OpenAI) are opt-in fallbacks for evaluation and simulation only, gated behind ANTHROPIC_API_KEY.
Models (planned)
| Alias | Model | Task |
|---|---|---|
| tip-generator | qwen2.5:7b (default) | Generate typed tip candidates from user context |
| embedder | nomic-embed-text | Task clustering, semantic similarity for dedup |
| judge | claude-haiku-4-5 (cloud, eval-only) | Offline sim judge; rates tip quality for A/B |
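The embedder row implies a dedup step. One simple approach, sketched here under assumptions: compute an embedding per candidate (for example through the embedder alias), then greedily drop any candidate whose cosine similarity to an already-kept one reaches a threshold. The 0.9 cut-off and the helper names are hypothetical.

```typescript
// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy near-duplicate filter: walk items in order (e.g. best-ranked first)
// and keep an item only if it is not too similar to anything already kept.
function dedupeBySimilarity<T>(
  items: T[],
  embed: (item: T) => number[],
  threshold = 0.9, // hypothetical cut-off; would need tuning on real data
): T[] {
  const kept: { item: T; vec: number[] }[] = [];
  for (const item of items) {
    const vec = embed(item);
    if (kept.every((k) => cosine(k.vec, vec) < threshold)) kept.push({ item, vec });
  }
  return kept.map((k) => k.item);
}
```

Ordering the input by rank first means the surviving copy of a duplicate cluster is the best-scored one.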
Roadmap
Phase 0 — Walking skeleton (M0) ✓ shipped
Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.
- Monorepo scaffold, docker-compose dev env
- auth — Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guard
- integrations/todoist — OAuth2 flow, token stored in DB, disconnect supported
- recommender with RandomPolicy; stable POST /recommend contract; 30s task cache
- apps/web — sign-in, connect, tip pages; PWA manifest + icons
- Feedback: done / snooze / dismiss; reward inferred from dwell-time (inferReward); marks task complete in Todoist
- Deploy modular monolith to Agap VM via Caddy at o.alogins.net
- ToS + Privacy Policy pages (/legal/terms, /legal/privacy); implicit consent on sign-in
- Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
- Metrics baseline: tip_views table (tip served) + tip_feedback (reactions); activation + reaction rate queryable
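inferReward turns a reaction plus dwell time into a scalar reward. A hedged sketch: only the rule that done within 15 s to 2 min earns +1.0 comes from this README (see Phase 1); the other reward values, the inclusive bounds, and the signature are hypothetical placeholders.

```typescript
type Reaction = "done" | "snooze" | "dismiss";

// Only "done within 15 s to 2 min => +1.0" is documented in the README.
// Every other number below is a hypothetical placeholder.
function inferReward(reaction: Reaction, dwellMs: number): number {
  const dwellS = dwellMs / 1000;
  if (reaction === "done") {
    // Magic zone (bounds assumed inclusive); done outside it still counts, less.
    return dwellS >= 15 && dwellS <= 120 ? 1.0 : 0.5;
  }
  if (reaction === "snooze") return 0.0; // neutral: maybe right tip, wrong moment
  // Dismiss: an instant dismiss is treated as a stronger negative signal.
  return dwellS < 3 ? -1.0 : -0.5;
}
```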
Phase 1 — Real signal + in-the-moment delivery (M1) ✓ shipped
Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.
- Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
- Todoist sync emits signals.task.synced; tip served/feedback emit signals.tip.*
- Features extracted per task: is_overdue, task_age_days, priority; context: hour_of_day, day_of_week
- ml/serving: LinUCB (d=5) + ε-greedy v1 (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk
- RemotePolicy in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to tip_scores
- Feedback loop: dwell-time inferred reward (inferReward) → online model update; done in 15 s–2 min = +1.0 (magic zone)
- Offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-policy comparison, results persisted to sim_runs + sim_events
- ε-greedy v1 promoted to active policy (ADR-0007) — +10.7% mean reward vs LinUCB in offline sim
- Web Push (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
- Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
- Quiet-hours + dedupe for push delivery
- Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist)
- NATS JetStream replacing in-process bus (when multi-process pressure arrives)
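To make the ε-greedy v1 idea concrete, here is a TypeScript sketch (ml/serving itself is Python). It assumes a linear reward model updated by one least-mean-squares step per reaction, and a hypothetical 7-feature layout: bias, the three task features, hour of day, and sin/cos of day of week. The README confirms only d=7, ε=0.10, and the sin/cos encoding; the learning rate, scaling, and names are assumptions.

```typescript
interface TaskFeatures {
  isOverdue: boolean;
  taskAgeDays: number;
  priority: number; // Todoist priorities run 1 (normal) to 4 (urgent)
}

// Hypothetical 7-dim layout; day of week is encoded on a circle with sin/cos.
function featurize(t: TaskFeatures, now: Date): number[] {
  const angle = (2 * Math.PI * now.getDay()) / 7; // getDay(): 0 = Sunday
  return [
    1, // bias
    t.isOverdue ? 1 : 0,
    Math.min(t.taskAgeDays, 30) / 30, // cap and scale to [0, 1]
    t.priority / 4,
    now.getHours() / 23,
    Math.sin(angle),
    Math.cos(angle),
  ];
}

class EpsilonGreedy {
  readonly w: number[];
  private readonly epsilon: number;
  private readonly lr: number;
  private readonly rand: () => number;

  constructor(d: number, epsilon = 0.1, lr = 0.05, rand: () => number = Math.random) {
    this.w = new Array<number>(d).fill(0);
    this.epsilon = epsilon;
    this.lr = lr;
    this.rand = rand;
  }

  score(x: number[]): number {
    return x.reduce((s, xi, i) => s + xi * this.w[i], 0);
  }

  // With probability epsilon explore uniformly; otherwise exploit the best score.
  choose(xs: number[][]): number {
    if (xs.length === 0) throw new Error("no candidates");
    if (this.rand() < this.epsilon) {
      return Math.min(Math.floor(this.rand() * xs.length), xs.length - 1);
    }
    let best = 0;
    for (let i = 1; i < xs.length; i++) {
      if (this.score(xs[i]) > this.score(xs[best])) best = i;
    }
    return best;
  }

  // One least-mean-squares step: move w toward the observed reward.
  update(x: number[], reward: number): void {
    const err = reward - this.score(x);
    for (let i = 0; i < this.w.length; i++) this.w[i] += this.lr * err * x[i];
  }
}
```

The sin/cos pair places Saturday and Sunday next to each other on a circle, which a raw 0 to 6 integer would not.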
M1 add-on — Admin & ML Ops Console (fully shipped)
oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to act on them (revoke a token, replay an event, promote a model, reset a bandit).
Framework pick — apps/admin on Next.js 15 + Tremor + shadcn/ui. Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses packages/shared-types, sdk-js, and the Auth.js session. Specialized ML tooling (MLflow, Airflow) runs as separate external services linked from the admin shell; Grafana panels are embedded.
| Layer | Tool | Why |
|---|---|---|
| App shell | Next.js 15 (new apps/admin) | Same stack as apps/web; reuses auth, types, SDK |
| Dashboards / charts | Tremor | Analytics-first React + Tailwind — KPI cards, time-series, categorical, heatmaps |
| CRUD primitives | shadcn/ui | Copy-paste Radix components; forms, dialogs, command palette |
| Heavy grids | TanStack Table v8 | Sortable / paginated / virtualized tables (events, users, tips) |
| Extra charts | Recharts / visx | Fallbacks where Tremor falls short (e.g. force graphs, Sankey) |
| Model registry / experiments | MLflow (external — o.alogins.net/mlflow) | Experiment tracking, artifact browser, model registry; own basic-auth |
| Pipeline orchestration | Airflow (external — o.alogins.net/airflow) | Batch feature + retraining DAGs; own web-auth |
| Infra metrics | Grafana (embedded panels) | One ops source of truth |
| Ad-hoc analysis | Marimo reactive notebooks | Python-native for the ML side; launch-out link |
| AuthZ | profile.role='admin' + Next.js middleware | Reuses existing session; no new auth surface |
Rejected alternatives (so we don't re-litigate):
- Retool / AppSmith — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product
- Streamlit / Gradio / Dash — Python-first; thin RBAC and routing; splits our frontend stack in two
- React-admin / Refine.dev — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves
- Superset / Metabase as the admin surface — excellent for BI, poor for operational writes (revoke, replay, promote). Plan: adopt Superset in M4 for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now
Build sequence (plan, not code):
- ADR-0006 — record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana
- Scaffold — apps/admin with Next.js 15, Tailwind, Tremor; deploy behind Caddy at admin.o.alogins.net
- RBAC — role column on users; admin-only Next.js middleware; seed first admin via ADMIN_SEED_EMAIL env; admin_actions audit-log table
- Overview dashboard — DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel
- User explorer — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit actions
- Event stream viewer — live tail of signals.* with filters by subject/user/time; same UI when the bus swaps to NATS
- Feature store browser — features sent to ml/serving per scoring call; diff across time for a user
- Model registry panel — /admin/models links out to MLflow (mlflow.o.alogins.net); experiment tracking and dataset management in MLflow + Airflow
- MLOps hub — /admin/experiments links to MLflow experiments/models and Airflow DAGs/datasets; bandit reset on Users page
- Recommendation log (explainability) — per served tip: (user, features, policy, score, feedback, latency); tip_scores table, 30-day retention
- Reward analytics — reaction distribution over time; per-policy compare; slice by hour_of_day, priority, cohort
- Data quality widget — missing-feature rate, stale-token rate, daily completeness heatmap
- Ops actions — revoke token (Users page), replay signal, disable/promote shadow policy; every action audit-logged
- Read-only SQL runner — SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4)
- Health rollup — /admin/health surfaces api, ml/serving, SQLite, event-bus; auto-refreshes every 15s
- Docs — apps/admin/README.md, runbook for common ops actions, ADR-0006 merged
- Apple OAuth (deferred to M2)
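The event stream viewer needs a bounded history plus subject filtering. A sketch, assuming the in-process bus keeps events in a fixed-size ring buffer (500 events per Phase 1) and follows NATS subject semantics: * matches exactly one token and > matches one or more trailing tokens. Under those semantics signals.* matches signals.task but not signals.task.synced, so a viewer showing every signal would filter on signals.>. BusEvent, RingBuffer, and tail are hypothetical names.

```typescript
interface BusEvent {
  subject: string;
  ts: number;
  payload: unknown;
}

// NATS-style matching: "*" = exactly one token, ">" = one or more trailing tokens.
function subjectMatches(pattern: string, subject: string): boolean {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    if (p[i] === ">") return i === p.length - 1 && s.length > i;
    if (i >= s.length) return false;
    if (p[i] !== "*" && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

// Fixed-capacity buffer that overwrites the oldest entry once full.
class RingBuffer<T> {
  private readonly buf: (T | undefined)[];
  private next = 0;
  private size = 0;

  constructor(capacity: number) {
    this.buf = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    this.buf[this.next] = item;
    this.next = (this.next + 1) % this.buf.length;
    this.size = Math.min(this.size + 1, this.buf.length);
  }

  // Snapshot ordered oldest to newest.
  toArray(): T[] {
    const out: T[] = [];
    const start = (this.next - this.size + this.buf.length) % this.buf.length;
    for (let i = 0; i < this.size; i++) out.push(this.buf[(start + i) % this.buf.length] as T);
    return out;
  }
}

// Last `limit` events whose subject matches the pattern.
function tail(history: RingBuffer<BusEvent>, pattern: string, limit = 50): BusEvent[] {
  return history.toArray().filter((e) => subjectMatches(pattern, e.subject)).slice(-limit);
}
```

Keeping NATS matching rules in the in-process bus is one way to make the later JetStream swap mechanical: filters written against the viewer keep their meaning.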
Phase 2 — AI tips + multi-source signals (M2)
Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.
AI infrastructure (unblock everything else):
- ai compose profile — Ollama + LiteLLM for local dev; env vars OLLAMA_URL / LITELLM_URL (#86)
- AI gateway — wire ml/serving to LiteLLM; model aliases tip-generator + embedder (#87)
AI tip generation pipeline:
- Context assembler — user signals + feature store → structured prompt context (ml/features/context.py) (#88)
- Tip generator endpoint — POST /generate in ml/serving; LLM → N typed TipCandidate objects (#79)
- TipCandidate shared schema — {content, kind, source, model, prompt_version, confidence}; update recommender pipeline (#89)
- LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
- Prompt versioning — prompt_version + model columns in tip_scores; content-hash invalidation (#91)
- LLM tip quality dashboard — reaction breakdown by model / prompt_version in /admin/reward-analytics (#92)
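The validation-and-retry step can be sketched as below. Assumptions, all hypothetical: the model returns a JSON array, confidence must lie in [0, 1], kind is one of the four kinds planned in #82, "2× retry" means two retries after the first attempt, and generate is a synchronous stand-in for the LLM call (real code would await it).

```typescript
type TipKind = "task" | "advice" | "insight" | "reminder";
const KINDS: readonly string[] = ["task", "advice", "insight", "reminder"];

interface TipCandidate {
  content: string;
  kind: TipKind;
  source: string;
  model: string;
  prompt_version: string;
  confidence: number;
}

// Schema gate: returns typed candidates, or null if anything is off.
function parseCandidates(raw: string): TipCandidate[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!Array.isArray(data) || data.length === 0) return null;
  const ok = data.every(
    (c) =>
      typeof c === "object" && c !== null &&
      typeof c.content === "string" && c.content.trim() !== "" &&
      KINDS.includes(c.kind) &&
      typeof c.source === "string" &&
      typeof c.model === "string" &&
      typeof c.prompt_version === "string" &&
      typeof c.confidence === "number" && c.confidence >= 0 && c.confidence <= 1,
  );
  return ok ? (data as TipCandidate[]) : null;
}

// One initial attempt plus up to maxRetries clarification retries, then fallback.
function generateWithRetry(
  generate: (attempt: number, feedback?: string) => string,
  fallback: () => TipCandidate[],
  maxRetries = 2,
): { candidates: TipCandidate[]; usedFallback: boolean } {
  let feedback: string | undefined;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const parsed = parseCandidates(generate(attempt, feedback));
    if (parsed) return { candidates: parsed, usedFallback: false };
    // Hypothetical clarification message fed back into the next prompt.
    feedback = "Previous output was not a valid JSON array of TipCandidate objects.";
  }
  return { candidates: fallback(), usedFallback: true };
}
```

The fallback would return task-based candidates, so a misbehaving model degrades to the Phase 1 behaviour instead of an empty page.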
Evaluation & model selection:
- Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
- LLM prompt research — persona design, context injection strategies, few-shot examples (#84)
Pipeline architecture:
- Signal source abstraction — SignalSource interface generalizing beyond Todoist (#78)
- Generalized recommendation pipeline — candidate → rank → render stages (#80)
- Feature registry + user profile builder — centralized features, persistent profiles (#81)
- Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82)
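A sketch of the SignalSource abstraction with a fan-out aggregator that isolates per-source failures. The Signal and SignalSource shapes and the aggregate helper are assumptions (features typed as Record<string, number | boolean>). Promise.allSettled lets one failing source, say one with an expired token, drop out while the others still contribute.

```typescript
// Hypothetical shapes; the real interfaces may differ.
interface Signal {
  id: string;
  source: string;
  title: string;
  features: Record<string, number | boolean>;
}

interface SignalSource {
  readonly name: string;
  fetch(userId: string): Promise<Signal[]>;
}

// Query every source in parallel; a failing source is reported, not fatal.
async function aggregate(
  sources: SignalSource[],
  userId: string,
): Promise<{ signals: Signal[]; failed: string[] }> {
  // Promise.resolve().then(...) also captures synchronous throws from fetch().
  const results = await Promise.allSettled(
    sources.map((s) => Promise.resolve().then(() => s.fetch(userId))),
  );
  const signals: Signal[] = [];
  const failed: string[] = [];
  results.forEach((r, i) => {
    if (r.status === "fulfilled") signals.push(...r.value);
    else failed.push(sources[i].name);
  });
  return { signals, failed };
}
```

Returning the failed source names, rather than swallowing them, gives the admin console something to surface next to stale-token rates.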
Policy research:
- Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83)
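Thompson sampling is the simplest of the listed next-gen policies. This sketch uses the Beta-Bernoulli variant, which assumes rewards are binarized (for example done = success, anything else = failure); oO's rewards are continuous, so a real policy might use a Gaussian model instead. Beta draws with integer parameters come from order statistics of uniform draws, which is exact but costs O(n log n) per draw: fine for a sketch, not for large counts. All names are hypothetical.

```typescript
// Beta(a, b) for positive integers: the a-th smallest of (a + b - 1)
// independent Uniform(0, 1) draws is exactly Beta(a, b) distributed.
function sampleBetaInt(a: number, b: number, rand: () => number): number {
  const u = Array.from({ length: a + b - 1 }, () => rand()).sort((x, y) => x - y);
  return u[a - 1];
}

interface ArmStats {
  successes: number;
  failures: number;
}

// Bernoulli Thompson sampling with a uniform Beta(1, 1) prior per arm:
// draw a plausible success rate for each arm and play the largest draw.
// Returns -1 when there are no arms.
function thompsonPick(arms: ArmStats[], rand: () => number = Math.random): number {
  let best = -1;
  let bestDraw = -Infinity;
  arms.forEach((arm, i) => {
    const draw = sampleBetaInt(arm.successes + 1, arm.failures + 1, rand);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = i;
    }
  });
  return best;
}
```

Unlike ε-greedy, exploration here shrinks on its own: as an arm's counts grow, its posterior narrows and it is picked mostly when it really is best.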
Integrations & infra (carried from M1):
- Apple OAuth (#7)
- NATS JetStream replacing in-process bus (#21)
- Todoist sync via events (#22)
- Event schema registry + protobuf CI gate (#54)
- Per-user freshness SLAs for features (#61)
- CI skeleton (#3), observability (#18), E2E tests (#20)
Bugs (fix before new features):
- TipFeedback type mismatch (#73)
- Todoist token refresh (#74)
- Reward fire-and-forget (#75)
- Data retention purge (#76)
- Port mismatch (#77)
Phase 3 — Native mobile (M3)
- iOS app (SwiftUI) with APNs push
- Android app (Compose) with FCM push
- notifier gains APNs + FCM channels, per-device rate limits
- Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
- Consolidate MLflow + Airflow behind shared OIDC (SSO for all internal services)
- Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
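A sketch of the decide-and-deliver gate, combining the per-user threshold with the quiet-hours and dedupe rules from Phase 1. The field names, the inclusive threshold, and the rule order are hypothetical; the only subtle part is a quiet window that wraps past midnight, such as 22:00 to 07:00.

```typescript
interface DeliveryPrefs {
  quietStartHour: number; // e.g. 22
  quietEndHour: number; // e.g. 7; the window may wrap past midnight
  threshold: number; // per-user bar a tip's score must clear
}

// True if `hour` (0-23) falls in [start, end), handling windows that wrap midnight.
function inQuietHours(hour: number, start: number, end: number): boolean {
  if (start === end) return false; // empty window
  return start < end ? hour >= start && hour < end : hour >= start || hour < end;
}

// Hypothetical rule: interrupt only outside quiet hours, only if this tip was
// not pushed recently (dedupe), and only if its score clears the user's bar.
function shouldInterrupt(
  score: number,
  hour: number,
  prefs: DeliveryPrefs,
  recentlyPushed: boolean,
): boolean {
  if (inQuietHours(hour, prefs.quietStartHour, prefs.quietEndHour)) return false;
  if (recentlyPushed) return false;
  return score >= prefs.threshold;
}
```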
Phase 4 — MLOps at scale (M4)
- Airflow + MLflow deployed as external services (mlops compose profile); each with own auth
- Write first retraining DAG (Airflow) + first MLflow experiment logging from ml/serving
- Feature-to-prompt pipeline — nightly Airflow DAG materializes context for LLM; cuts inline latency (#94)
- Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95)
- LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96)
- Embedding-based task clustering — nomic-embed-text for dedup + user pattern features (#97)
- Consolidate MLflow + Airflow auth into shared OIDC provider (tracked as M3 issue #85)
- Shadow → A/B → launch pipeline as first-class in MLflow
- Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
- Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
- Drift monitoring (feature + prediction + reward drift); model cards per LLM version
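For the online experiments framework, deterministic assignment can be as simple as hashing the experiment and user ids into [0, 1) and mapping that point onto weighted variants. The sketch uses 32-bit FNV-1a purely as an example hash; assignVariant and its weight format are hypothetical.

```typescript
// 32-bit FNV-1a over UTF-16 code units (equivalent to bytes for ASCII ids).
function fnv1a(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, multiplied modulo 2^32
  }
  return h >>> 0; // reinterpret as unsigned
}

// Stable, stateless assignment: the same (experiment, user) pair always lands
// in the same variant, with traffic split in proportion to the weights.
function assignVariant(
  experimentId: string,
  userId: string,
  variants: { name: string; weight: number }[],
): string {
  const total = variants.reduce((s, v) => s + v.weight, 0);
  if (variants.length === 0 || total <= 0) throw new Error("no variants");
  const point = (fnv1a(`${experimentId}:${userId}`) / 2 ** 32) * total;
  let acc = 0;
  for (const v of variants) {
    acc += v.weight;
    if (point < acc) return v.name;
  }
  return variants[variants.length - 1].name; // guard against float rounding
}
```

Because the experiment id is part of the hash key, a user's bucket in one experiment says nothing about their bucket in another.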
Phase 5 — Production hardening (M5)
- Audit logging, rotation of provider tokens + internal signing keys
- k3s on existing VM, then k8s + HPA once multi-node justified (no cliff)
- Multi-region failover, Postgres PITR, event-bus mirroring
- Public integration SDK; sandbox tenancy for third-party connectors
- Billing + subscription tiers
Contributing
This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's README.md, ship.
Conventions and per-service guidance live in CLAUDE.md.
License
All rights reserved — 2026. Contact the owner for licensing inquiries. (We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)