# oO > One tip. Right now. Feels like magic. oO learns who you are from the apps you already use and surfaces **one** perfectly-timed suggestion — an advice or a todo — on a black page. No feed. No dashboard. One tip. --- ## Why Everyone has too many tasks, too many apps, too much noise. What people actually need is a single, well-chosen nudge at the right moment. oO is that nudge, powered by a recommendation engine that gets smarter the more of your life it sees. ## Product principles 1. **One thing at a time.** The UI is a black page with one tip. That's the product. 2. **We don't own your data, we understand it.** Connect your apps; we read what we need, when we need it. 3. **Magic requires craft.** Precision, timing, and restraint matter more than features. 4. **Private by default.** Tokens are encrypted, models are per-user, deletion is one click. ## Prototype scope (Phase 0) Three pages. That's it. | Page | What it does | |------|--------------| | **Sign in** | Google / Apple OAuth. No passwords. | | **Connect** | A list of integrations. Tap "Todoist" → OAuth flow → token stored. | | **Tip** | Black page. One tip. Tap to dismiss / done / snooze. | Under the hood the "pick a tip" call already routes through a `recommender` service with a pluggable policy — so v0 is literally "random Todoist task" but every other version slots into the same contract. --- ## Architecture at a glance ``` ┌──────────┐ OAuth ┌────────────┐ │ Web / │──────────▶│ auth │ │ Mobile │ └─────┬──────┘ │ client │ │ JWT │ │ REST/GraphQL ▼ │ │────────▶┌───────────────┐ └──────────┘ │ gateway │──┬──▶ profile └───────┬───────┘ ├──▶ integrations ──▶ Todoist / Google / ... │ └──▶ recommender ──▶ ml/serving (Python) ▼ ┌───────────────┐ │ events │ ◀── integrations emit normalized events │ (Kafka/NATS) │ ──▶ ml/pipelines (features, training) └───────────────┘ ``` More detail in [`docs/architecture/`](docs/architecture/) and decisions in [`docs/adr/`](docs/adr/). ## Monorepo layout See [`CLAUDE.md`](CLAUDE.md) for the full tree and conventions. ``` apps/ web, ios, android services/ gateway, auth, profile, integrations, recommender, events, notifier packages/ shared-types, sdk-js, ui ml/ pipelines, features, registry, experiments, serving infra/ docker, k8s, terraform, ci docs/ architecture, adr, api ``` --- ## AI stack oO is AI-native: the recommender's job is to **rank**, not to write. An LLM generates candidate tips from the user's context; the bandit picks the best one. ### Three-tier layout | Tier | Service | Purpose | Where | |------|---------|---------|-------| | Inference | **Ollama** | Local LLM + embedding; no data leaves the host | `localhost:11434` | | Routing | **LiteLLM** | Unified OpenAI-compatible API; model aliases; cloud fallback | `llm.alogins.net` (Agap shared) | | Testing | **OpenWebUI** | Prompt iteration, model comparison, manual evals | `ai.alogins.net` (Agap shared) | ### Tip generation pipeline (Phase 2 target) ``` User signals ──▶ Context assembler ──▶ LiteLLM ──▶ Ollama (local) (tasks, calendar, (ml/features/) (routing) or cloud fallback patterns, time) ▼ N typed TipCandidates {content, kind, model, prompt_version, confidence} ▼ Bandit policy (ml/serving) scores + ranks candidates ▼ Best tip shown ▼ User reaction (done / snooze / dismiss + dwell) ▼ Online bandit update + prompt_version tracking ``` **Why LiteLLM as gateway:** All LLM calls use a single `LITELLM_URL` env var. Swapping from qwen2.5 to llama3.2, or routing a fraction to Claude for A/B, is a config change in LiteLLM — zero code change in oO. The model name in `tip_scores` tells you exactly which model produced each tip. **Why Ollama first:** Tips contain personal context. Local inference means no user data leaves the host for the inference path. Cloud models (Anthropic, OpenAI) are opt-in fallbacks for evaluation and simulation only, gated behind `ANTHROPIC_API_KEY`. ### Models (planned) | Alias | Model | Task | |-------|-------|------| | `tip-generator` | qwen2.5:7b (default) | Generate typed tip candidates from user context | | `embedder` | nomic-embed-text | Task clustering, semantic similarity for dedup | | `judge` | claude-haiku-4-5 (cloud, eval-only) | Offline sim judge; rates tip quality for A/B | --- ## Roadmap ### Phase 0 — Walking skeleton *(M0)* ✓ shipped Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works. - [x] Monorepo scaffold, docker-compose dev env - [x] `auth` — Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guard - [x] `integrations/todoist` — OAuth2 flow, token stored in DB, disconnect supported - [x] `recommender` with `RandomPolicy`; stable `POST /recommend` contract; 30s task cache - [x] `apps/web` — sign-in, connect, tip pages; PWA manifest + icons - [x] Feedback: `done / snooze / dismiss`; reward inferred from dwell-time (`inferReward`); marks task complete in Todoist - [x] Deploy modular monolith to Agap VM via Caddy at `o.alogins.net` - [x] ToS + Privacy Policy pages (`/legal/terms`, `/legal/privacy`); implicit consent on sign-in - [x] Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect - [x] Metrics baseline: `tip_views` table (tip served) + `tip_feedback` (reactions) — activation + reaction rate queryable ### Phase 1 — Real signal + in-the-moment delivery *(M1)* ✓ shipped Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web. - [x] Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical - [x] Todoist sync emits `signals.task.synced`; tip served/feedback emit `signals.tip.*` - [x] Features extracted per task: `is_overdue`, `task_age_days`, `priority`; context: `hour_of_day`, `day_of_week` - [x] `ml/serving` LinUCB (d=5) + **ε-greedy v1** (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk - [x] `RemotePolicy` in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to `tip_scores` - [x] Feedback loop: dwell-time inferred reward (`inferReward`) → online model update; `done` in 15 s–2 min = +1.0 (magic zone) - [x] Offline simulation framework (`ml/experiments/sim`): rule/LLM/claude-code judges, two-policy comparison, results persisted to `sim_runs` + `sim_events` - [x] **ε-greedy v1 promoted to active policy** (ADR-0007) — +10.7% mean reward vs LinUCB in offline sim - [x] **Web Push** (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page - [x] Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56) - [ ] Quiet-hours + dedupe for push delivery - [ ] Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist) - [ ] NATS JetStream replacing in-process bus (when multi-process pressure arrives) #### M1 add-on — Admin & ML Ops Console *(fully shipped)* oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to *act* on them (revoke a token, replay an event, promote a model, reset a bandit). **Framework pick — `apps/admin` on Next.js 15 + Tremor + shadcn/ui.** Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses `packages/shared-types`, `sdk-js`, and the Auth.js session. Specialized ML tooling (MLflow, Airflow) runs as **separate external services** linked from the admin shell; Grafana panels are embedded. | Layer | Tool | Why | |-------|------|-----| | App shell | **Next.js 15** (new `apps/admin`) | Same stack as `apps/web`; reuses auth, types, SDK | | Dashboards / charts | **[Tremor](https://tremor.so)** | Analytics-first React + Tailwind — KPI cards, time-series, categorical, heatmaps | | CRUD primitives | **[shadcn/ui](https://ui.shadcn.com)** | Copy-paste Radix components; forms, dialogs, command palette | | Heavy grids | **[TanStack Table v8](https://tanstack.com/table)** | Sortable / paginated / virtualized tables (events, users, tips) | | Extra charts | **[Recharts](https://recharts.org)** / **[visx](https://airbnb.io/visx)** | Fallbacks where Tremor falls short (e.g. force graphs, Sankey) | | Model registry / experiments | **[MLflow](https://mlflow.org)** *(external — `o.alogins.net/mlflow`)* | Experiment tracking, artifact browser, model registry; own basic-auth | | Pipeline orchestration | **[Airflow](https://airflow.apache.org)** *(external — `o.alogins.net/airflow`)* | Batch feature + retraining DAGs; own web-auth | | Infra metrics | **[Grafana](https://grafana.com)** *(embedded panels)* | One ops source of truth | | Ad-hoc analysis | **[Marimo](https://marimo.io)** reactive notebooks | Python-native for the ML side; launch-out link | | AuthZ | `profile.role='admin'` + Next.js middleware | Reuses existing session; no new auth surface | **Rejected alternatives (so we don't re-litigate):** - *Retool / AppSmith* — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product - *Streamlit / Gradio / Dash* — Python-first; thin RBAC and routing; splits our frontend stack in two - *React-admin / Refine.dev* — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves - *Superset / Metabase as the admin surface* — excellent for BI, poor for operational **writes** (revoke, replay, promote). Plan: **adopt Superset in M4** for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now **Build sequence (plan, not code):** 1. [x] **ADR-0006** — record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana 2. [x] **Scaffold** — `apps/admin` with Next.js 15, Tailwind, Tremor; deploy behind Caddy at `admin.o.alogins.net` 3. [x] **RBAC** — `role` column on `users`; admin-only Next.js middleware; seed first admin via `ADMIN_SEED_EMAIL` env; `admin_actions` audit-log table 4. [x] **Overview dashboard** — DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel 5. [x] **User explorer** — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit actions 6. [x] **Event stream viewer** — live tail of `signals.*` with filters by subject/user/time; same UI when the bus swaps to NATS 7. [x] **Feature store browser** — features sent to `ml/serving` per scoring call; diff across time for a user 8. [x] **Model registry panel** — `/admin/models` links out to MLflow (`mlflow.o.alogins.net`); experiment tracking and dataset management in MLflow + Airflow 9. [x] **MLOps hub** — `/admin/experiments` links to MLflow experiments/models and Airflow DAGs/datasets; bandit reset on Users page 10. [x] **Recommendation log (explainability)** — per served tip: `(user, features, policy, score, feedback, latency)`; `tip_scores` table, 30-day retention 11. [x] **Reward analytics** — reaction distribution over time; per-policy compare; slice by `hour_of_day`, `priority`, cohort 12. [x] **Data quality widget** — missing-feature rate, stale-token rate, daily completeness heatmap 13. [x] **Ops actions** — revoke token (Users page), replay signal, disable/promote shadow policy; every action audit-logged 14. [x] **Read-only SQL runner** — SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4) 15. [x] **Health rollup** — `/admin/health` surfaces api, ml/serving, SQLite, event-bus; auto-refreshes every 15s 16. [ ] **Docs** — `apps/admin/README.md`, runbook for common ops actions, ADR-0006 merged - [ ] Apple OAuth (deferred to M2) ### Phase 2 — AI tips + multi-source signals *(M2)* Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone. **AI infrastructure (unblock everything else):** - [ ] `ai` compose profile — Ollama + LiteLLM for local dev; env vars `OLLAMA_URL` / `LITELLM_URL` (#86) - [ ] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87) **AI tip generation pipeline:** - [ ] Context assembler — user signals + feature store → structured prompt context (`ml/features/context.py`) (#88) - [ ] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79) - [ ] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89) - [ ] LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90) - [ ] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91) - [ ] LLM tip quality dashboard — reaction breakdown by model / prompt_version in `/admin/reward-analytics` (#92) **Evaluation & model selection:** - [ ] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93) - [ ] LLM prompt research — persona design, context injection strategies, few-shot examples (#84) **Pipeline architecture:** - [ ] Signal source abstraction — `SignalSource` interface generalizing beyond Todoist (#78) - [ ] Generalized recommendation pipeline — candidate → rank → render stages (#80) - [ ] Feature registry + user profile builder — centralized features, persistent profiles (#81) - [ ] Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82) **Policy research:** - [ ] Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83) **Integrations & infra (carried from M1):** - [ ] Apple OAuth (#7) - [ ] NATS JetStream replacing in-process bus (#21) - [ ] Todoist sync via events (#22) - [ ] Event schema registry + protobuf CI gate (#54) - [ ] Per-user freshness SLAs for features (#61) - [ ] CI skeleton (#3), observability (#18), E2E tests (#20) **Bugs (fix before new features):** - [ ] TipFeedback type mismatch (#73) - [ ] Todoist token refresh (#74) - [ ] Reward fire-and-forget (#75) - [ ] Data retention purge (#76) - [ ] Port mismatch (#77) ### Phase 3 — Native mobile *(M3)* - [ ] iOS app (SwiftUI) with APNs push - [ ] Android app (Compose) with FCM push - [ ] `notifier` gains APNs + FCM channels, per-device rate limits - [ ] Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004) - [ ] Consolidate MLflow + Airflow behind shared OIDC (SSO for all internal services) - [ ] Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold ### Phase 4 — MLOps at scale *(M4)* - [x] Airflow + MLflow deployed as external services (`mlops` compose profile); each with own auth - [ ] Write first retraining DAG (Airflow) + first MLflow experiment logging from `ml/serving` - [ ] Feature-to-prompt pipeline — nightly Airflow DAG materializes context for LLM; cuts inline latency (#94) - [ ] Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95) - [ ] LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96) - [ ] Embedding-based task clustering — `nomic-embed-text` for dedup + user pattern features (#97) - [ ] Consolidate MLflow + Airflow auth into shared OIDC provider (tracked as M3 issue #85) - [ ] Shadow → A/B → launch pipeline as first-class in MLflow - [ ] Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B - [ ] Cross-user collaborative features (opt-in only); cohort slicing; fairness checks - [ ] Drift monitoring (feature + prediction + reward drift); model cards per LLM version ### Phase 5 — Production hardening *(M5)* - [ ] Audit logging, rotation of provider tokens + internal signing keys - [ ] **k3s** on existing VM, then k8s + HPA once multi-node justified (no cliff) - [ ] Multi-region failover, Postgres PITR, event-bus mirroring - [ ] Public integration SDK; sandbox tenancy for third-party connectors - [ ] Billing + subscription tiers --- ## Contributing This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's `README.md`, ship. Conventions and per-service guidance live in [`CLAUDE.md`](CLAUDE.md). ## License All rights reserved — 2026. Contact the owner for licensing inquiries. (We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)