alvis 9e96540bcc feat(admin): per-user profile view + rebuild action (#81 phase B.1)
Surfaces phase A's profile features in /admin/users/:id so we can verify
they're actually computing useful values before investing in bandit
consumption. The detail GET now includes profile rows joined with registry
metadata (name, value, age, fresh badge, ttlSec, description). Read does
NOT trigger compute — staleness must be visible. A new POST
.../profile/rebuild button force-recomputes and is audit-logged like
reset-bandit.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:27:08 +00:00

oO

One tip. Right now. Feels like magic.

oO learns who you are from the apps you already use and surfaces one perfectly-timed suggestion — an advice or a todo — on a black page. No feed. No dashboard. One tip.


Why

Everyone has too many tasks, too many apps, too much noise. What people actually need is a single, well-chosen nudge at the right moment. oO is that nudge, powered by a recommendation engine that gets smarter the more of your life it sees.

Product principles

  1. One thing at a time. The UI is a black page with one tip. That's the product.
  2. We don't own your data, we understand it. Connect your apps; we read what we need, when we need it.
  3. Magic requires craft. Precision, timing, and restraint matter more than features.
  4. Private by default. Tokens are encrypted, models are per-user, deletion is one click.

Prototype scope (Phase 0)

Three pages. That's it.

Page      What it does
Sign in   Google / Apple OAuth. No passwords.
Connect   A list of integrations. Tap "Todoist" → OAuth flow → token stored.
Tip       Black page. One tip. Tap to dismiss / done / snooze.

Under the hood the "pick a tip" call already routes through a recommender service with a pluggable policy — so v0 is literally "random Todoist task" but every other version slots into the same contract.
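The policy contract described above can be sketched in Python. This is an illustration, not the recommender's code: the recommender appears to live in the TypeScript services, and Task, Policy, FallbackPolicy and recommend are hypothetical names. RandomPolicy mirrors the v0 behaviour, and the fallback wrapper mirrors what the Phase 1 notes say RemotePolicy does on timeout or error.

```python
import random
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass(frozen=True)
class Task:
    """A candidate tip; in v0 simply a Todoist task (hypothetical shape)."""
    id: str
    content: str


class Policy(Protocol):
    """The pluggable contract: candidates in, exactly one pick out."""

    def pick(self, user_id: str, candidates: Sequence[Task]) -> Task: ...


class RandomPolicy:
    """v0: a uniform random choice among the user's tasks."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()

    def pick(self, user_id: str, candidates: Sequence[Task]) -> Task:
        if not candidates:
            raise ValueError("no candidates to recommend from")
        return self._rng.choice(list(candidates))


class FallbackPolicy:
    """Try the primary policy (e.g. a remote model); on any error, use the fallback."""

    def __init__(self, primary: Policy, fallback: Policy) -> None:
        self.primary = primary
        self.fallback = fallback

    def pick(self, user_id: str, candidates: Sequence[Task]) -> Task:
        try:
            return self.primary.pick(user_id, candidates)
        except Exception:
            return self.fallback.pick(user_id, candidates)


def recommend(policy: Policy, user_id: str, candidates: Sequence[Task]) -> Task:
    """Stand-in for the POST /recommend handler: the caller never changes, the policy does."""
    return policy.pick(user_id, candidates)
```

Swapping v0 for a model-backed policy means passing a different object to recommend; the call site and the request contract stay the same.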


Architecture at a glance

 ┌──────────┐   OAuth   ┌────────────┐
 │  Web /   │──────────▶│   auth     │
 │  Mobile  │           └─────┬──────┘
 │  client  │                 │ JWT
 │          │   REST/GraphQL  ▼
 │          │────────▶┌───────────────┐
 └──────────┘         │   gateway     │──┬──▶ profile
                      └───────┬───────┘  ├──▶ integrations ──▶ Todoist / Google / ...
                              │          └──▶ recommender ──▶ ml/serving (Python)
                              ▼
                      ┌───────────────┐
                      │    events     │ ◀── integrations emit normalized events
                      │  (Kafka/NATS) │ ──▶ ml/pipelines (features, training)
                      └───────────────┘

More detail in docs/architecture/ and decisions in docs/adr/.
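The events box corresponds to the Phase 1 in-process bus: a typed emitter with a 500-event ring buffer whose subjects follow NATS naming, so subscriptions such as signals.> and feedback.> carry over to JetStream unchanged. A minimal Python sketch of that shape, assuming standard NATS wildcard rules ('*' matches exactly one token, a trailing '>' matches one or more). EventBus and subject_matches are hypothetical names; the real bus is a TypeScript EventEmitter.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Handler = Callable[[str, Dict[str, Any]], None]


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' matches one token, a trailing '>' matches one or more."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


class EventBus:
    """In-process pub/sub that keeps the last `capacity` events for inspection or replay."""

    def __init__(self, capacity: int = 500) -> None:
        self._recent: Deque[Tuple[str, Dict[str, Any]]] = deque(maxlen=capacity)
        self._subs: List[Tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subs.append((pattern, handler))

    def publish(self, subject: str, payload: Dict[str, Any]) -> None:
        self._recent.append((subject, payload))  # the oldest entry drops once full
        for pattern, handler in self._subs:
            if subject_matches(pattern, subject):
                handler(subject, payload)

    def recent(self, pattern: str = ">") -> List[Tuple[str, Dict[str, Any]]]:
        return [(s, p) for s, p in self._recent if subject_matches(pattern, s)]
```

Because the ring buffer is bounded, it serves a live tail (as in the admin event viewer) rather than durable history; durability is what the JetStream mirror adds.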

Monorepo layout

See CLAUDE.md for the full tree and conventions.

apps/        web, ios, android
services/    gateway, auth, profile, integrations, recommender, events, notifier
packages/    shared-types, sdk-js, ui
ml/          pipelines, features, registry, experiments, serving
infra/       docker, k8s, terraform, ci
docs/        architecture, adr, api

AI stack

oO is AI-native: the recommender's job is to rank, not to write. An LLM generates candidate tips from the user's context; the bandit picks the best one.

Three-tier layout

Tier       Service    Purpose                                                       Where
Inference  Ollama     Local LLM + embedding; no data leaves the host                localhost:11434
Routing    LiteLLM    Unified OpenAI-compatible API; model aliases; cloud fallback  llm.alogins.net (Agap shared)
Testing    OpenWebUI  Prompt iteration, model comparison, manual evals              ai.alogins.net (Agap shared)

Tip generation pipeline (Phase 2 target)

User signals  ──▶  Context assembler  ──▶  LiteLLM  ──▶  Ollama (local)
(tasks, calendar,    (ml/features/)         (routing)     or cloud fallback
 patterns, time)
                                                ▼
                                     N typed TipCandidates
                                     {content, kind, model,
                                      prompt_version, confidence}
                                                ▼
                                    Bandit policy (ml/serving)
                                    scores + ranks candidates
                                                ▼
                                         Best tip shown
                                                ▼
                              User reaction (done / snooze / dismiss + dwell)
                                                ▼
                              Online bandit update + prompt_version tracking

Why LiteLLM as gateway: All LLM calls use a single LITELLM_URL env var. Swapping from qwen2.5 to llama3.2, or routing a fraction to Claude for A/B, is a config change in LiteLLM — zero code change in oO. The model name in tip_scores tells you exactly which model produced each tip.

Why Ollama first: Tips contain personal context. Local inference means no user data leaves the host for the inference path. Cloud models (Anthropic, OpenAI) are opt-in fallbacks for evaluation and simulation only, gated behind ANTHROPIC_API_KEY.
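A minimal sketch of what "a single LITELLM_URL" looks like from the caller's side, assuming the LiteLLM proxy's OpenAI-compatible /v1/chat/completions route and an assumed local default of http://localhost:4000. Only the alias tip-generator is sent; LiteLLM's config decides which model serves it. build_tip_request and cloud_fallback_allowed are hypothetical helpers; the second encodes the rule that cloud models are opt-in for evaluation and simulation and require ANTHROPIC_API_KEY.

```python
import json
import os
import urllib.request
from typing import Dict, List


def build_tip_request(messages: List[Dict[str, str]], model: str = "tip-generator") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the LiteLLM proxy.

    Only LITELLM_URL comes from the environment; which concrete model answers
    the "tip-generator" alias is decided by LiteLLM's config, not by this code.
    """
    base = os.environ.get("LITELLM_URL", "http://localhost:4000").rstrip("/")  # default is an assumption
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cloud_fallback_allowed(purpose: str) -> bool:
    """Cloud models are opt-in: evaluation/simulation only, and only with a key set."""
    return purpose in {"evaluation", "simulation"} and bool(os.environ.get("ANTHROPIC_API_KEY"))
```

Sending it is urllib.request.urlopen(req); the response follows the OpenAI chat-completion shape, so the generated text sits at choices[0].message.content.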

Models (planned)

Alias          Model                                Task
tip-generator  qwen2.5:7b (default)                 Generate typed tip candidates from user context
embedder       nomic-embed-text                     Task clustering, semantic similarity for dedup
judge          claude-haiku-4-5 (cloud, eval-only)  Offline sim judge; rates tip quality for A/B

Roadmap

Phase 0 — Walking skeleton (M0) ✓ shipped

Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.

  • Monorepo scaffold, docker-compose dev env
  • auth — Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guard
  • integrations/todoist — OAuth2 flow, token stored in DB, disconnect supported
  • recommender with RandomPolicy; stable POST /recommend contract; 30s task cache
  • apps/web — sign-in, connect, tip pages; PWA manifest + icons
  • Feedback: done / snooze / dismiss; reward inferred from dwell-time (inferReward); marks task complete in Todoist
  • Deploy modular monolith to Agap VM via Caddy at o.alogins.net
  • ToS + Privacy Policy pages (/legal/terms, /legal/privacy); implicit consent on sign-in
  • Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
  • Metrics baseline: tip_views table (tip served) + tip_feedback (reactions) — activation + reaction rate queryable
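The dwell-time reward (inferReward) can be sketched as below. Only the "magic zone" comes from this roadmap (per the Phase 1 notes, a done within 15 s to 2 min scores +1.0); the inclusive bounds and every other reward value are hypothetical placeholders, and the real inferReward is TypeScript and may weigh dwell differently.

```python
from typing import Dict, Optional

# From the roadmap: "done" within 15 s to 2 min scores +1.0 (bounds taken as inclusive).
MAGIC_ZONE_S = (15.0, 120.0)

# Hypothetical placeholder rewards for every other case; tune against real data.
DEFAULT_REWARDS: Dict[str, float] = {
    "done_outside_zone": 0.5,
    "snooze": 0.0,
    "dismiss": -0.5,
}


def infer_reward(reaction: str, dwell_s: float, rewards: Optional[Dict[str, float]] = None) -> float:
    """Map a reaction plus dwell time (seconds the tip was on screen) to a scalar reward."""
    table = rewards or DEFAULT_REWARDS
    if dwell_s < 0:
        raise ValueError("dwell time cannot be negative")
    if reaction == "done":
        lo, hi = MAGIC_ZONE_S
        return 1.0 if lo <= dwell_s <= hi else table["done_outside_zone"]
    if reaction in ("snooze", "dismiss"):
        return table[reaction]
    raise ValueError(f"unknown reaction: {reaction!r}")
```

Keeping the placeholder values in one table makes them easy to replace once reaction data from tip_feedback is available.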

Phase 1 — Real signal + in-the-moment delivery (M1) ✓ shipped

Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.

  • Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
  • Todoist sync emits signals.task.synced; tip served/feedback emit signals.tip.*
  • Features extracted per task: is_overdue, task_age_days, priority; context: hour_of_day, day_of_week
  • ml/serving LinUCB (d=5) + ε-greedy v1 (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk
  • RemotePolicy in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to tip_scores
  • Feedback loop: dwell-time inferred reward (inferReward) → online model update; done within 15 s to 2 min = +1.0 (magic zone)
  • Offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-policy comparison, results persisted to sim_runs + sim_events
  • ε-greedy v1 promoted to active policy (ADR-0007) — +10.7% mean reward vs LinUCB in offline sim
  • Web Push (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
  • Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
  • Quiet-hours + dedupe for push delivery
  • Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist)
  • NATS JetStream bridge — durable signals.> and feedback.> streams; in-process bus stays the source of truth, every publish bridges out (#21, shipped)
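A compact sketch of an ε-greedy policy over a linear reward model, in the spirit of ε-greedy v1 (d=7, ε=0.10, day-of-week sin/cos). The 7-feature layout (bias, is_overdue, task_age_days, priority, scaled hour, sin/cos of day-of-week) is one guess that adds up to d=7, and the ridge regression with Sherman-Morrison updates is an assumed model; ADR-0007 documents the actual v1. State is kept in memory here, whereas ml/serving persists it per user to disk.

```python
import math
import random
from typing import Dict, List, Sequence


def features(is_overdue: bool, task_age_days: float, priority: int,
             hour_of_day: int, day_of_week: int) -> List[float]:
    """Hypothetical d=7 layout: bias, task features, scaled hour, day-of-week on a circle."""
    angle = 2 * math.pi * day_of_week / 7  # sin/cos put the last and first weekday side by side
    return [1.0, float(is_overdue), float(task_age_days), float(priority),
            hour_of_day / 23.0, math.sin(angle), math.cos(angle)]


class LinearEpsGreedy:
    """ε-greedy over a ridge-regression reward model shared by all candidates."""

    def __init__(self, d: int = 7, epsilon: float = 0.10, ridge: float = 1.0, seed: int = 0) -> None:
        self.d, self.epsilon = d, epsilon
        # Keep A^-1 directly; A starts as ridge * I, so A^-1 starts as I / ridge.
        self.A_inv = [[1.0 / ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d  # running sum of reward * x
        self.rng = random.Random(seed)

    def _theta(self) -> List[float]:
        return [sum(self.A_inv[i][j] * self.b[j] for j in range(self.d)) for i in range(self.d)]

    def pick(self, xs: Sequence[Sequence[float]]) -> int:
        """Return the index of the chosen candidate."""
        if not xs:
            raise ValueError("no candidates")
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(xs))  # explore
        theta = self._theta()
        scores = [sum(t * v for t, v in zip(theta, x)) for x in xs]
        return max(range(len(xs)), key=scores.__getitem__)  # exploit

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison for symmetric A: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        ax = [sum(self.A_inv[i][j] * x[j] for j in range(self.d)) for i in range(self.d)]
        denom = 1.0 + sum(x[i] * ax[i] for i in range(self.d))
        for i in range(self.d):
            for j in range(self.d):
                self.A_inv[i][j] -= ax[i] * ax[j] / denom
            self.b[i] += reward * x[i]


_models: Dict[str, LinearEpsGreedy] = {}


def model_for(user_id: str) -> LinearEpsGreedy:
    """Per-user state, kept in memory here (ml/serving persists it to disk)."""
    if user_id not in _models:
        _models[user_id] = LinearEpsGreedy()
    return _models[user_id]
```

Per request, the recommender would build one feature vector per candidate task, call pick, and later feed the inferred reward for the served task back through update.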

M1 add-on — Admin & ML Ops Console (fully shipped)

oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to act on them (revoke a token, replay an event, promote a model, reset a bandit).

Framework pick — apps/admin on Next.js 15 + Tremor + shadcn/ui. Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses packages/shared-types, sdk-js, and the Auth.js session. Specialized ML tooling (MLflow, Airflow) runs as separate external services linked from the admin shell; Grafana panels are embedded.

Layer                         Tool                                       Why
App shell                     Next.js 15 (new apps/admin)                Same stack as apps/web; reuses auth, types, SDK
Dashboards / charts           Tremor                                     Analytics-first React + Tailwind: KPI cards, time-series, categorical, heatmaps
CRUD primitives               shadcn/ui                                  Copy-paste Radix components; forms, dialogs, command palette
Heavy grids                   TanStack Table v8                          Sortable / paginated / virtualized tables (events, users, tips)
Extra charts                  Recharts / visx                            Fallbacks where Tremor falls short (e.g. force graphs, Sankey)
Model registry / experiments  MLflow (external, o.alogins.net/mlflow)    Experiment tracking, artifact browser, model registry; own basic-auth
Pipeline orchestration        Airflow (external, o.alogins.net/airflow)  Batch feature + retraining DAGs; own web-auth
Infra metrics                 Grafana (embedded panels)                  One ops source of truth
Ad-hoc analysis               Marimo reactive notebooks                  Python-native for the ML side; launch-out link
AuthZ                         profile.role='admin' + Next.js middleware  Reuses existing session; no new auth surface

Rejected alternatives (so we don't re-litigate):

  • Retool / AppSmith — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product
  • Streamlit / Gradio / Dash — Python-first; thin RBAC and routing; splits our frontend stack in two
  • React-admin / Refine.dev — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves
  • Superset / Metabase as the admin surface — excellent for BI, poor for operational writes (revoke, replay, promote). Plan: adopt Superset in M4 for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now

Build sequence (plan, not code):

  1. ADR-0006: record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana
  2. Scaffold: apps/admin with Next.js 15, Tailwind, Tremor; deploy behind Caddy at admin.o.alogins.net
  3. RBAC: role column on users; admin-only Next.js middleware; seed first admin via ADMIN_SEED_EMAIL env; admin_actions audit-log table
  4. Overview dashboard: DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel
  5. User explorer: list + detail page (identity, consents, integrations, last tip, reward history); revoke-integration + reset-bandit actions
  6. Event stream viewer: live tail of signals.* with filters by subject/user/time; same UI when the bus swaps to NATS
  7. Feature store browser: features sent to ml/serving per scoring call; diff across time for a user
  8. Model registry panel: /admin/models links out to MLflow (mlflow.o.alogins.net); experiment tracking and dataset management in MLflow + Airflow
  9. MLOps hub: /admin/experiments links to MLflow experiments/models and Airflow DAGs/datasets; bandit reset on Users page
  10. Recommendation log (explainability): per served tip (user, features, policy, score, feedback, latency); tip_scores table, 30-day retention
  11. Reward analytics: reaction distribution over time; per-policy compare; slice by hour_of_day, priority, cohort
  12. Data quality widget: missing-feature rate, stale-token rate, daily completeness heatmap
  13. Ops actions: revoke token (Users page), replay signal, disable/promote shadow policy; every action audit-logged
  14. Read-only SQL runner: SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4)
  15. Health rollup: /admin/health surfaces api, ml/serving, SQLite, event-bus; auto-refreshes every 15s
  16. Docs: apps/admin/README.md, runbook for common ops actions, ADR-0006 merged
  • Apple OAuth (deferred to M2)
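The overview dashboard and the read-only SQL runner both rest on the Phase 0 metrics tables. A sketch of a SELECT-only runner against SQLite with one saved query (reaction rate, i.e. the share of served tips that received any reaction), assuming hypothetical schemas tip_views(id, user_id) and tip_feedback(view_id, reaction). The keyword check is only a first filter; SQLite's PRAGMA query_only is what makes the engine itself refuse writes such as WITH ... DELETE. In production, a dedicated connection opened with a mode=ro URI would be simpler than toggling the pragma.

```python
import sqlite3
from typing import Any, List, Tuple

# Hypothetical saved query; table and column names are assumptions.
SAVED_QUERIES = {
    "reaction_rate": """
        SELECT CAST(COUNT(DISTINCT f.view_id) AS REAL) / COUNT(DISTINCT v.id)
        FROM tip_views AS v
        LEFT JOIN tip_feedback AS f ON f.view_id = v.id
    """,
}


def run_readonly(conn: sqlite3.Connection, sql: str) -> List[Tuple[Any, ...]]:
    """Run one SELECT / WITH statement; SQLite's query_only flag blocks any write."""
    words = sql.split()
    if not words or words[0].upper() not in ("SELECT", "WITH"):
        raise PermissionError("only SELECT queries are allowed")
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchall()  # execute() accepts a single statement only
    finally:
        conn.execute("PRAGMA query_only = OFF")
```

When tip_views is empty the division yields NULL rather than an error, which the dashboard can render as "no data".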

Phase 2 — AI tips + multi-source signals (M2)

Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.

AI infrastructure (unblock everything else):

  • ai compose profile — Ollama + LiteLLM for local dev; env vars OLLAMA_URL / LITELLM_URL (#86)
  • AI gateway — wire ml/serving to LiteLLM; model aliases tip-generator + embedder (#87)

AI tip generation pipeline:

  • Context assembler — user signals + feature store → structured prompt context (ml/features/context.py) (#88)
  • Tip generator endpoint — POST /generate in ml/serving; LLM → N typed TipCandidate objects (#79)
  • TipCandidate shared schema — {content, kind, source, model, prompt_version, confidence}; update recommender pipeline (#89)
  • LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
  • Prompt versioning — prompt_version + model columns in tip_scores; content-hash invalidation (#91)
  • LLM tip quality dashboard — reaction breakdown by model / prompt_version in /admin/reward-analytics (#92)
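The schema, validation and versioning items (#89 to #91) can be sketched together. TipCandidate follows the shared schema listed above; setting source to "llm" for generated tips, using a 12-character SHA-256 prefix as prompt_version, and the exact clarification wording are assumptions. generate_with_retry makes one call plus up to two clarification retries (the "2×" above) and then falls back to task-based candidates supplied by the caller.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, List

KINDS = {"task", "advice", "insight", "reminder"}


@dataclass(frozen=True)
class TipCandidate:
    content: str
    kind: str
    source: str
    model: str
    prompt_version: str
    confidence: float


def prompt_version(template: str) -> str:
    """Content-hash version: editing the prompt text yields a new version id."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def parse_candidates(raw: str, model: str, version: str) -> List[TipCandidate]:
    """Schema gate: raise ValueError unless raw is a non-empty JSON array of valid tips."""
    data = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array")
    out: List[TipCandidate] = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each candidate must be a JSON object")
        content, kind, conf = item.get("content"), item.get("kind"), item.get("confidence")
        if not isinstance(content, str) or not content.strip():
            raise ValueError("content must be a non-empty string")
        if not isinstance(kind, str) or kind not in KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be a number in [0, 1]")
        out.append(TipCandidate(content.strip(), kind, "llm", model, version, float(conf)))
    return out


def generate_with_retry(call_llm: Callable[[str], str], prompt: str, model: str,
                        fallback: Callable[[], List[TipCandidate]],
                        retries: int = 2) -> List[TipCandidate]:
    """One attempt plus up to `retries` clarification retries, then the task-based fallback."""
    version = prompt_version(prompt)  # versioned by the base prompt, not the retry text
    current = prompt
    for _ in range(retries + 1):
        try:
            return parse_candidates(call_llm(current), model, version)
        except ValueError as err:
            current = (f"{prompt}\n\nYour previous answer was invalid ({err}). "
                       "Reply with a JSON array of tips only.")
    return fallback()
```

Because every candidate carries model and prompt_version, the reaction breakdown in the quality dashboard can group on those two columns directly.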

Evaluation & model selection:

  • Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
  • LLM prompt research — persona design, context injection strategies, few-shot examples (#84)

Pipeline architecture:

  • Signal source abstraction — SignalSource interface generalizing beyond Todoist (#78)
  • Generalized recommendation pipeline — candidate → rank → render stages (#80)
  • Feature registry + user profile builder — centralized features, persistent profiles (#81)
  • Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82)
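A sketch of the SignalSource abstraction (#78) feeding a candidate → rank → render pipeline (#80). Signal, Candidate and run_pipeline are hypothetical names, and each stage is a plain function so a new source, ranker or renderer can be swapped in without touching the others; the real interfaces will live in the TypeScript services and may look different.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Sequence


@dataclass(frozen=True)
class Signal:
    source: str
    kind: str
    payload: Dict[str, object]


class SignalSource(Protocol):
    """Anything that can yield normalized signals for a user (Todoist, calendar, ...)."""
    name: str

    def fetch(self, user_id: str) -> List[Signal]: ...


@dataclass(frozen=True)
class Candidate:
    text: str
    kind: str
    source: str


def run_pipeline(user_id: str,
                 sources: Sequence[SignalSource],
                 to_candidates: Callable[[Signal], List[Candidate]],
                 score: Callable[[Candidate], float],
                 render: Callable[[Candidate], str]) -> str:
    """candidate -> rank -> render, with every stage passed in as a plug-in."""
    candidates = [c for src in sources for sig in src.fetch(user_id) for c in to_candidates(sig)]
    if not candidates:
        raise LookupError("no candidates for user")
    best = max(candidates, key=score)  # rank: highest score wins
    return render(best)
```

In this shape the policy from the recommender plugs in as the score stage, and kind-aware UI becomes a concern of the render stage.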

Policy research:

  • Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83)
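Of the next-gen options in #83, Thompson sampling is the simplest to sketch. This is the textbook Beta-Bernoulli variant: one Beta(1, 1) prior per arm, draw a success rate from each posterior, play the highest draw. Using tip kinds as arms and counting any positive reward as a success are assumptions for illustration; the continuous rewards from inferReward would call for a different likelihood, for example a Gaussian model.

```python
import random
from typing import Dict, Iterable


class BetaThompson:
    """Thompson sampling with a Beta(1, 1) prior per arm and binary rewards."""

    def __init__(self, arms: Iterable[str], seed: int = 0) -> None:
        self.alpha: Dict[str, float] = {a: 1.0 for a in arms}
        self.beta: Dict[str, float] = {a: 1.0 for a in arms}
        self.rng = random.Random(seed)

    def pick(self) -> str:
        # Draw one plausible success rate per arm from its posterior; play the best draw.
        draws = {a: self.rng.betavariate(self.alpha[a], self.beta[a]) for a in self.alpha}
        return max(draws, key=draws.__getitem__)

    def update(self, arm: str, reward: float) -> None:
        # Assumption: a positive reward is a success, anything else a failure.
        if reward > 0:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

    def mean(self, arm: str) -> float:
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])
```

Unlike ε-greedy, exploration here shrinks on its own as posteriors sharpen, which is the main reason it is worth comparing in the offline sim.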

Integrations & infra (carried from M1):

  • Apple OAuth (#7)
  • NATS JetStream replacing in-process bus (#21) — adapter ships in services/api/src/events/nats.ts; in-proc bus is the producer, JetStream is the durable mirror
  • Todoist sync via events (#22) — background scheduler in services/api/src/signals/scheduler.ts emits signals.task.synced every TODOIST_SYNC_INTERVAL_MS; on-demand fetch remains as freshness fallback
  • Event schema registry + protobuf CI gate (#54)
  • Per-user freshness SLAs for features (#61)
  • CI skeleton (#3), observability (#18), E2E tests (#20)
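The freshness rules described in this document (profile rows with an age, a fresh badge and ttlSec; reads that never trigger compute; a forced rebuild; the on-demand fetch that backs up the Todoist scheduler) reduce to three access paths. A sketch with hypothetical names (ProfileFeature, peek, rebuild, get_or_refresh); how the feature registry actually stores rows is not specified here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProfileFeature:
    name: str
    value: float
    computed_at: float  # unix seconds
    ttl_sec: float

    def is_fresh(self, now: float) -> bool:
        return now - self.computed_at <= self.ttl_sec


Store = Dict[str, ProfileFeature]


def peek(store: Store, name: str) -> Optional[ProfileFeature]:
    """Admin-style read: return the stored row as-is so staleness stays visible."""
    return store.get(name)


def rebuild(store: Store, name: str, compute: Callable[[], float],
            ttl_sec: float, now: Optional[float] = None) -> ProfileFeature:
    """Force-recompute regardless of freshness (the explicit rebuild action)."""
    ts = time.time() if now is None else now
    store[name] = ProfileFeature(name, compute(), ts, ttl_sec)
    return store[name]


def get_or_refresh(store: Store, name: str, compute: Callable[[], float],
                   ttl_sec: float, now: Optional[float] = None) -> ProfileFeature:
    """Serving path: recompute on demand only when the row is missing or stale."""
    ts = time.time() if now is None else now
    row = store.get(name)
    if row is None or not row.is_fresh(ts):
        row = rebuild(store, name, compute, ttl_sec, ts)
    return row
```

A per-user freshness SLA then becomes a question of which ttl_sec each feature gets, and how often peek reports stale rows.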

Bugs (fix before new features):

  • TipFeedback type mismatch (#73)
  • Todoist token refresh (#74)
  • Reward fire-and-forget (#75)
  • Data retention purge (#76)
  • Port mismatch (#77)

Phase 3 — Native mobile (M3)

  • iOS app (SwiftUI) with APNs push
  • Android app (Compose) with FCM push
  • notifier gains APNs + FCM channels, per-device rate limits
  • Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
  • Consolidate MLflow + Airflow behind shared OIDC (SSO for all internal services)
  • Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
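A sketch of the decide-and-deliver gate, combining the Phase 1 quiet-hours and dedupe rules with a per-user threshold. DeliveryPrefs and should_interrupt are hypothetical, the 0..1 score scale is an assumption, and times are treated as naive local times in the user's zone.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Set


@dataclass(frozen=True)
class DeliveryPrefs:
    threshold: float   # hypothetical per-user bar on a 0..1 score
    quiet_start: time  # e.g. time(22, 0)
    quiet_end: time    # e.g. time(7, 0)


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True inside [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_interrupt(score: float, now: datetime, prefs: DeliveryPrefs,
                     tip_id: str, recently_sent: Set[str]) -> bool:
    """Push only outside quiet hours, never twice for the same tip, and only above threshold."""
    if in_quiet_hours(now.time(), prefs.quiet_start, prefs.quiet_end):
        return False
    if tip_id in recently_sent:  # dedupe
        return False
    return score >= prefs.threshold
```

The hard rules (quiet hours, dedupe) run before the learned threshold, so a miscalibrated score can never wake someone at night.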

Phase 4 — MLOps at scale (M4)

  • Airflow + MLflow deployed as external services (mlops compose profile); each with own auth
  • Write first retraining DAG (Airflow) + first MLflow experiment logging from ml/serving
  • Feature-to-prompt pipeline — nightly Airflow DAG materializes context for LLM; cuts inline latency (#94)
  • Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95)
  • LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96)
  • Embedding-based task clustering — nomic-embed-text for dedup + user pattern features (#97)
  • Consolidate MLflow + Airflow auth into shared OIDC provider (tracked as M3 issue #85)
  • Shadow → A/B → launch pipeline as first-class in MLflow
  • Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
  • Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
  • Drift monitoring (feature + prediction + reward drift); model cards per LLM version
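Deterministic assignment usually means hashing a stable user id, salted with the experiment name, into a fixed bucket space, so the same user always lands in the same arm and different experiments stay independent. A sketch under that assumption (SHA-256, 10,000 buckets, weighted arms); bucket and assign are hypothetical names.

```python
import hashlib
from typing import Sequence, Tuple

BUCKETS = 10_000


def bucket(user_id: str, experiment: str) -> int:
    """Stable hash of (experiment, user) into [0, BUCKETS); the salt decorrelates experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def assign(user_id: str, experiment: str, arms: Sequence[Tuple[str, float]]) -> str:
    """Pick an arm by cumulative weight; identical inputs always return the same arm."""
    total = sum(w for _, w in arms)
    if not arms or total <= 0:
        raise ValueError("need at least one arm with positive total weight")
    point = bucket(user_id, experiment) / BUCKETS * total
    cumulative = 0.0
    for name, weight in arms:
        cumulative += weight
        if point < cumulative:
            return name
    return arms[-1][0]  # guards against float rounding at the top edge
```

Since assignment needs no stored state, any service can recompute a user's arm, and the same function works for a fixed 50/50 split or for routing a small slice to a shadow-promoted policy.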

Phase 5 — Production hardening (M5)

  • Audit logging, rotation of provider tokens + internal signing keys
  • k3s on existing VM, then k8s + HPA once multi-node is justified (no cliff)
  • Multi-region failover, Postgres PITR, event-bus mirroring
  • Public integration SDK; sandbox tenancy for third-party connectors
  • Billing + subscription tiers

Contributing

This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's README.md, ship.

Conventions and per-service guidance live in CLAUDE.md.

License

All rights reserved — 2026. Contact the owner for licensing inquiries. (We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)
