alvis faf44c18fc feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework
- Promote egreedy-v1 to active serving policy (ADR-0007): /score/egreedy + /reward/egreedy
  replaces linucb-v1 endpoints after offline sim shows +10.7% mean reward (−0.548 vs −0.606)
- Replace explicit helpful/not_helpful feedback with dwell-time inferred reward (inferReward):
  dismiss=−1.0, snooze=+0.1, done<15s=−0.3, done 15s–2min=+1.0, done 2–10min=+0.6, done>10min=+0.3
- Add ml/serving ε-greedy endpoints: /score/egreedy, /reward/egreedy, /stats/egreedy/{user_id}
  with d=7 feature vector (base 5 + sin/cos day-of-week encoding)
- Add offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges,
  two-phase score+reward, synthetic personas, task generator; results stored in sim_runs/sim_events
- Add /admin/simulations page: start runs, live-poll status, reward curve SVG, action/persona tables
- Fix egreedy day_of_week training skew: reward endpoint now uses actual dow instead of hardcoded 0
- Fix runner.py proxy bypass: httpx.Client(trust_env=False) for localhost ML calls
- Add dwellMs to TipFeedbackEvent contract and bus.test.ts fixture
- Schema: sim_runs, sim_events tables; tip_feedback gains dwell_ms, reward_milli columns
- ADR-0006: admin console framework; ADR-0007: egreedy-v1 policy selection rationale

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 07:44:37 +00:00

oO

One tip. Right now. Feels like magic.

oO learns who you are from the apps you already use and surfaces one perfectly-timed suggestion, either a piece of advice or a todo, on a black page. No feed. No dashboard. One tip.


Why

Everyone has too many tasks, too many apps, too much noise. What people actually need is a single, well-chosen nudge at the right moment. oO is that nudge, powered by a recommendation engine that gets smarter the more of your life it sees.

Product principles

  1. One thing at a time. The UI is a black page with one tip. That's the product.
  2. We don't own your data, we understand it. Connect your apps; we read what we need, when we need it.
  3. Magic requires craft. Precision, timing, and restraint matter more than features.
  4. Private by default. Tokens are encrypted, models are per-user, deletion is one click.

Prototype scope (Phase 0)

Three pages. That's it.

Page      What it does
Sign in   Google / Apple OAuth. No passwords.
Connect   A list of integrations. Tap "Todoist" → OAuth flow → token stored.
Tip       Black page. One tip. Tap to dismiss / done / snooze.

Under the hood the "pick a tip" call already routes through a recommender service with a pluggable policy — so v0 is literally "random Todoist task" but every other version slots into the same contract.
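
A minimal sketch of what a pluggable policy contract could look like, written in Python for brevity. The names Task, Policy, pick, and recommend are illustrative assumptions, not the recommender's actual types; only RandomPolicy and the idea of a stable contract come from this README.

```python
import random
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass(frozen=True)
class Task:
    """Hypothetical candidate shape; the real payload is not shown in this README."""
    id: str
    title: str


class Policy(Protocol):
    """Assumed contract: every policy picks one task from the candidates."""
    name: str

    def pick(self, user_id: str, candidates: Sequence[Task]) -> Task: ...


class RandomPolicy:
    """v0 behaviour: a uniformly random task."""
    name = "random-v0"  # hypothetical identifier

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()

    def pick(self, user_id: str, candidates: Sequence[Task]) -> Task:
        return self._rng.choice(list(candidates))


def recommend(policy: Policy, user_id: str, candidates: Sequence[Task]) -> dict:
    """Callers depend only on this response shape, never on the policy class."""
    if not candidates:
        return {"tip": None, "policy": policy.name}
    task = policy.pick(user_id, candidates)
    return {"tip": {"id": task.id, "title": task.title}, "policy": policy.name}
```

Swapping in a smarter policy then means passing a different object to recommend; the caller and the response shape stay the same.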


Architecture at a glance

 ┌──────────┐   OAuth   ┌────────────┐
 │  Web /   │──────────▶│   auth     │
 │  Mobile  │           └─────┬──────┘
 │  client  │                 │ JWT
 │          │   REST/GraphQL  ▼
 │          │────────▶┌───────────────┐
 └──────────┘         │   gateway     │──┬──▶ profile
                      └───────┬───────┘  ├──▶ integrations ──▶ Todoist / Google / ...
                              │          └──▶ recommender ──▶ ml/serving (Python)
                              ▼
                      ┌───────────────┐
                      │    events     │ ◀── integrations emit normalized events
                      │  (Kafka/NATS) │ ──▶ ml/pipelines (features, training)
                      └───────────────┘

More detail in docs/architecture/ and decisions in docs/adr/.

Monorepo layout

See CLAUDE.md for the full tree and conventions.

apps/        web, ios, android
services/    gateway, auth, profile, integrations, recommender, events, notifier
packages/    shared-types, sdk-js, ui
ml/          pipelines, features, registry, experiments, serving
infra/       docker, k8s, terraform, ci
docs/        architecture, adr, api

Roadmap

Phase 0 — Walking skeleton (M0) ✓ shipped

Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.

  • Monorepo scaffold, docker-compose dev env
  • auth — Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guard
  • integrations/todoist — OAuth2 flow, token stored in DB, disconnect supported
  • recommender with RandomPolicy; stable POST /recommend contract; 30s task cache
  • apps/web — sign-in, connect, tip pages; PWA manifest + icons
  • Feedback: done / snooze / dismiss; reward inferred from dwell-time (inferReward); marks task complete in Todoist
  • Deploy modular monolith to Agap VM via Caddy at o.alogins.net
  • ToS + Privacy Policy pages (/legal/terms, /legal/privacy); implicit consent on sign-in
  • Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
  • Metrics baseline: tip_views table (tip served) + tip_feedback (reactions) — activation + reaction rate queryable
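
The dwell-time reward mapping can be sketched as below. The reward values are the ones listed in the commit message at the top of this page; the function is a Python illustration rather than the actual inferReward, and the exact handling of the 2-minute boundary is an assumption. to_reward_milli is a hypothetical helper showing one way a reward_milli integer column could be filled.

```python
def infer_reward(action: str, dwell_ms: int) -> float:
    """Map a tip reaction plus dwell time (milliseconds) to a scalar reward."""
    if action == "dismiss":
        return -1.0
    if action == "snooze":
        return 0.1
    if action != "done":
        raise ValueError(f"unknown action: {action!r}")

    seconds = dwell_ms / 1000
    if seconds < 15:
        return -0.3  # done almost instantly: probably not a real completion
    if seconds < 120:
        return 1.0   # the "magic zone" (2 min treated as exclusive: assumption)
    if seconds <= 600:
        return 0.6
    return 0.3       # done after more than 10 minutes


def to_reward_milli(reward: float) -> int:
    """Hypothetical helper: store the reward as an integer in thousandths."""
    return round(reward * 1000)
```

For example, a task marked done after 40 seconds would score +1.0 and be stored as 1000 under this scheme.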

Phase 1 — Real signal + in-the-moment delivery (M1) ✓ shipped

Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.

  • Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
  • Todoist sync emits signals.task.synced; tip served/feedback emit signals.tip.*
  • Features extracted per task: is_overdue, task_age_days, priority; context: hour_of_day, day_of_week
  • ml/serving LinUCB (d=5) + ε-greedy v1 (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk
  • RemotePolicy in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to tip_scores
  • Feedback loop: dwell-time inferred reward (inferReward) → online model update; done in 15 s–2 min = +1.0 (magic zone)
  • Offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-policy comparison, results persisted to sim_runs + sim_events
  • ε-greedy v1 promoted to active policy (ADR-0007) — +10.7% mean reward vs LinUCB in offline sim
  • Web Push (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
  • Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
  • Quiet-hours + dedupe for push delivery
  • Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist)
  • NATS JetStream replacing in-process bus (when multi-process pressure arrives)
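
To make the d=7 feature vector and the ε-greedy choice concrete, here is a small standard-library sketch. The five base features and ε=0.10 come from the list above; the feature order, the scaling to roughly [0, 1], the linear reward model, and the SGD update are all assumptions for illustration and may differ from what ml/serving actually does.

```python
import math
import random
from typing import List, Optional, Sequence

EPSILON = 0.10  # exploration rate stated in the roadmap
D = 7           # base 5 features + sin/cos of day-of-week


def features(is_overdue: bool, task_age_days: float, priority: int,
             hour_of_day: int, day_of_week: int) -> List[float]:
    """Build the 7-dim vector. Scaling constants are assumptions."""
    angle = 2 * math.pi * day_of_week / 7
    return [
        1.0 if is_overdue else 0.0,
        min(task_age_days, 30) / 30,  # cap age at 30 days (assumption)
        (priority - 1) / 3,           # assumes priority in 1..4
        hour_of_day / 23,
        day_of_week / 6,
        math.sin(angle),              # cyclic encoding: the two ends of
        math.cos(angle),              # the week land next to each other
    ]


class EpsilonGreedy:
    """Explore with probability epsilon, otherwise pick the best predicted reward."""

    def __init__(self, epsilon: float = EPSILON, lr: float = 0.05,
                 rng: Optional[random.Random] = None) -> None:
        self.epsilon = epsilon
        self.lr = lr
        self.weights = [0.0] * D
        self.rng = rng or random.Random()

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))

    def select(self, candidates: Sequence[Sequence[float]]) -> int:
        """Return the index of the candidate to serve."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(candidates))
        scores = [self.predict(x) for x in candidates]
        return max(range(len(candidates)), key=scores.__getitem__)

    def update(self, x: Sequence[float], reward: float) -> None:
        """One SGD step on squared error (assumed learning rule)."""
        error = reward - self.predict(x)
        self.weights = [w + self.lr * error * xi
                        for w, xi in zip(self.weights, x)]
```

The sin/cos pair is the usual trick for cyclic inputs: the raw day_of_week puts the first and last day of the week at opposite ends of the scale, while on the unit circle they are neighbours.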

M1 add-on — Admin & ML Ops Console (fully shipped)

oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to act on them (revoke a token, replay an event, promote a model, reset a bandit).

Framework pick — apps/admin on Next.js 15 + Tremor + shadcn/ui. Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses packages/shared-types, sdk-js, and the Auth.js session. Specialized ML tooling (MLflow, Grafana, Marimo) is embedded via authenticated reverse-proxy, not re-implemented.

Layer                Tool                                         Why
App shell            Next.js 15 (new apps/admin)                  Same stack as apps/web; reuses auth, types, SDK
Dashboards / charts  Tremor                                       Analytics-first React + Tailwind: KPI cards, time-series, categorical, heatmaps
CRUD primitives      shadcn/ui                                    Copy-paste Radix components; forms, dialogs, command palette
Heavy grids          TanStack Table v8                            Sortable / paginated / virtualized tables (events, users, tips)
Extra charts         Recharts / visx                              Fallbacks where Tremor falls short (e.g. force graphs, Sankey)
Model registry       MLflow UI (embedded)                         Artifact + run browser; don't re-build
Infra metrics        Grafana (embedded panels)                    One ops source of truth
Ad-hoc analysis      Marimo reactive notebooks                    Python-native for the ML side; launch-out link
AuthZ                profile.role='admin' + Next.js middleware    Reuses existing session; no new auth surface

Rejected alternatives (so we don't re-litigate):

  • Retool / AppSmith — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product
  • Streamlit / Gradio / Dash — Python-first; thin RBAC and routing; splits our frontend stack in two
  • React-admin / Refine.dev — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves
  • Superset / Metabase as the admin surface — excellent for BI, poor for operational writes (revoke, replay, promote). Plan: adopt Superset in M4 for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now

Build sequence (plan, not code):

  1. ADR-0006 — record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana
  2. Scaffold: apps/admin with Next.js 15, Tailwind, Tremor; deploy behind Caddy at admin.o.alogins.net
  3. RBAC: role column on users; admin-only Next.js middleware; seed first admin via ADMIN_SEED_EMAIL env; admin_actions audit-log table
  4. Overview dashboard — DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel
  5. User explorer — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit actions
  6. Event stream viewer — live tail of signals.* with filters by subject/user/time; same UI when the bus swaps to NATS
  7. Feature store browser — features sent to ml/serving per scoring call; diff across time for a user
  8. Model registry panel — embed MLflow UI at /admin/models; promote / archive via admin context menu (writes audit-logged)
  9. Experiment dashboard — LinUCB per-arm stats (pulls, reward mean, α), cohort compare, bandit reset control
  10. Recommendation log (explainability) — per served tip: (user, features, policy, score, feedback, latency); tip_scores table, 30-day retention
  11. Reward analytics — reaction distribution over time; per-policy compare; slice by hour_of_day, priority, cohort
  12. Data quality widget — missing-feature rate, stale-token rate, daily completeness heatmap
  13. Ops actions — revoke token (Users page), replay signal, disable/promote shadow policy; every action audit-logged
  14. Read-only SQL runner — SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4)
  15. Health rollup: /admin/health surfaces api, ml/serving, SQLite, event-bus; auto-refreshes every 15s
  16. Docs: apps/admin/README.md, runbook for common ops actions, ADR-0006 merged
  • Apple OAuth (deferred to M2)
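
As an illustration of the SELECT-only runner in step 14, here is one way such a guard could be written against SQLite. This is a sketch, not the admin app's code: run_readonly and its limit parameter are hypothetical names. It combines a cheap statement check with a read-only connection (SQLite's mode=ro URI parameter), so a write that slips past the check still fails at the database layer.

```python
import sqlite3
from typing import Any, List, Tuple


def run_readonly(db_path: str, sql: str,
                 limit: int = 1000) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Run a single SELECT and return (column names, rows)."""
    statement = sql.strip().rstrip(";").strip()

    # Conservative check: this also rejects a ';' inside a string literal.
    if ";" in statement:
        raise ValueError("only a single statement is allowed")
    if not statement.lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are allowed")

    # mode=ro opens the file read-only, so e.g. "WITH ... INSERT" still fails.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cursor = conn.execute(statement)
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchmany(limit)
    finally:
        conn.close()
```

The row cap keeps an accidental full-table scan from flooding the admin UI; a real runner would probably also want a query timeout.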

Phase 2 — Multi-source profile & trust (M2)

Goal: oO knows more than tasks, and users can see/control what we know.

  • Integrations: Google Calendar, Apple Health (web import), generic webhook ingress
  • Unified Profile model (identity, preferences, contexts, consents)
  • Timing signals (Page Visibility, Idle Detection, coarse location) — opt-in, transparent
  • Advice library + mixing policy (todo vs advice vs ambient)
  • User-facing data dashboard: what's stored, what's computed, export, delete-by-category
  • Cost/usage observability

Phase 3 — Native mobile (M3)

  • iOS app (SwiftUI) with APNs push
  • Android app (Compose) with FCM push
  • notifier gains APNs + FCM channels, per-device rate limits
  • Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
  • Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold

Phase 4 — MLOps at scale (M4)

  • Prefect/Airflow for batch feature materialization + retraining
  • MLflow registry; shadow → A/B → launch pipeline as first-class
  • Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
  • Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
  • Drift monitoring (feature drift, prediction drift, reward drift); model cards per version
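
One common way to get deterministic assignment is to hash the user and experiment identifiers into a stable bucket. The sketch below shows that general technique only; it is not a committed design for M4, and assign_variant, the variant names, and the weights are all hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def assign_variant(user_id: str, experiment: str,
                   variants: Sequence[str] = ("control", "treatment"),
                   weights: Optional[Sequence[float]] = None) -> str:
    """Same (user, experiment) always maps to the same variant; no state stored."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # First 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64

    if weights is None:
        weights = [1 / len(variants)] * len(variants)

    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if bucket < cumulative:
            return variant
    return variants[-1]  # guards against floating-point rounding in the weights
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in "treatment" for one test is not systematically in "treatment" for the next.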

Phase 5 — Production hardening (M5)

  • Audit logging, rotation of provider tokens + internal signing keys
  • k3s on existing VM, then k8s + HPA once multi-node justified (no cliff)
  • Multi-region failover, Postgres PITR, event-bus mirroring
  • Public integration SDK; sandbox tenancy for third-party connectors
  • Billing + subscription tiers

Contributing

This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's README.md, ship.

Conventions and per-service guidance live in CLAUDE.md.

License

All rights reserved — 2026. Contact the owner for licensing inquiries. (We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)
