alvis/oO


alvis 9e96540bcc feat(admin): per-user profile view + rebuild action (#81 phase B.1)

Surfaces phase A's profile features in /admin/users/:id so we can verify
they're actually computing useful values before investing in bandit
consumption. The detail GET now includes profile rows joined with registry
metadata (name, value, age, fresh badge, ttlSec, description). Read does
NOT trigger compute — staleness must be visible. A new POST
.../profile/rebuild button force-recomputes and is audit-logged like
reset-bandit.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-25 00:27:08 +00:00

README.md

oO

One tip. Right now. Feels like magic.

oO learns who you are from the apps you already use and surfaces one perfectly timed suggestion, either a piece of advice or a to-do, on a black page. No feed. No dashboard. One tip.

Why

Everyone has too many tasks, too many apps, too much noise. What people actually need is a single, well-chosen nudge at the right moment. oO is that nudge, powered by a recommendation engine that gets smarter the more of your life it sees.

Product principles

One thing at a time. The UI is a black page with one tip. That's the product.
We don't own your data, we understand it. Connect your apps; we read what we need, when we need it.
Magic requires craft. Precision, timing, and restraint matter more than features.
Private by default. Tokens are encrypted, models are per-user, deletion is one click.

Prototype scope (Phase 0)

Three pages. That's it.

Page	What it does
Sign in	Google / Apple OAuth. No passwords.
Connect	A list of integrations. Tap "Todoist" → OAuth flow → token stored.
Tip	Black page. One tip. Tap to dismiss / done / snooze.

Under the hood the "pick a tip" call already routes through a recommender service with a pluggable policy — so v0 is literally "random Todoist task" but every other version slots into the same contract.
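A minimal sketch of what a pluggable policy contract could look like. The README only confirms that a stable POST /recommend contract and a RandomPolicy exist; the field names, the `Policy` interface, and the `recommend` helper below are hypothetical and chosen for illustration.

```typescript
// Hypothetical shapes: the real request/response fields are not given in the README.
interface Task {
  id: string;
  content: string;
}

interface RecommendRequest {
  userId: string;
  candidates: Task[];
}

interface RecommendResponse {
  tip: Task | null;
  policy: string;
}

interface Policy {
  readonly name: string;
  pick(req: RecommendRequest): Task | null;
}

// v0 behaviour: a uniformly random candidate. The random source is injectable
// so the policy can be exercised deterministically in tests.
class RandomPolicy implements Policy {
  readonly name = "random";
  private readonly rng: () => number;

  constructor(rng: () => number = Math.random) {
    this.rng = rng;
  }

  pick(req: RecommendRequest): Task | null {
    if (req.candidates.length === 0) return null;
    const index = Math.floor(this.rng() * req.candidates.length);
    return req.candidates[index] ?? null;
  }
}

// The caller depends only on the Policy interface, so a later policy
// (bandit, remote call) can replace RandomPolicy without touching callers.
function recommend(policy: Policy, req: RecommendRequest): RecommendResponse {
  return { tip: policy.pick(req), policy: policy.name };
}
```

Under these assumptions, swapping the policy is a one-argument change at the call site: `recommend(new RandomPolicy(), req)` today, some other `Policy` implementation later.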

Architecture at a glance

 ┌──────────┐   OAuth   ┌────────────┐
 │  Web /   │──────────▶│   auth     │
 │  Mobile  │           └─────┬──────┘
 │  client  │                 │ JWT
 │          │   REST/GraphQL  ▼
 │          │────────▶┌───────────────┐
 └──────────┘         │   gateway     │──┬──▶ profile
                      └───────┬───────┘  ├──▶ integrations ──▶ Todoist / Google / ...
                              │          └──▶ recommender ──▶ ml/serving (Python)
                              ▼
                      ┌───────────────┐
                      │    events     │ ◀── integrations emit normalized events
                      │  (Kafka/NATS) │ ──▶ ml/pipelines (features, training)
                      └───────────────┘

More detail in docs/architecture/ and decisions in docs/adr/.

Monorepo layout

See CLAUDE.md for the full tree and conventions.

apps/        web, ios, android
services/    gateway, auth, profile, integrations, recommender, events, notifier
packages/    shared-types, sdk-js, ui
ml/          pipelines, features, registry, experiments, serving
infra/       docker, k8s, terraform, ci
docs/        architecture, adr, api

AI stack

oO is AI-native: the recommender's job is to rank, not to write. An LLM generates candidate tips from the user's context; the bandit picks the best one.

Three-tier layout

Tier	Service	Purpose	Where
Inference	Ollama	Local LLM + embedding; no data leaves the host	`localhost:11434`
Routing	LiteLLM	Unified OpenAI-compatible API; model aliases; cloud fallback	`llm.alogins.net` (Agap shared)
Testing	OpenWebUI	Prompt iteration, model comparison, manual evals	`ai.alogins.net` (Agap shared)

Tip generation pipeline (Phase 2 target)

User signals  ──▶  Context assembler  ──▶  LiteLLM  ──▶  Ollama (local)
(tasks, calendar,    (ml/features/)         (routing)     or cloud fallback
 patterns, time)
                                                ▼
                                     N typed TipCandidates
                                     {content, kind, model,
                                      prompt_version, confidence}
                                                ▼
                                    Bandit policy (ml/serving)
                                    scores + ranks candidates
                                                ▼
                                         Best tip shown
                                                ▼
                              User reaction (done / snooze / dismiss + dwell)
                                                ▼
                              Online bandit update + prompt_version tracking

Why LiteLLM as gateway: All LLM calls use a single LITELLM_URL env var. Swapping from qwen2.5 to llama3.2, or routing a fraction to Claude for A/B, is a config change in LiteLLM — zero code change in oO. The model name in tip_scores tells you exactly which model produced each tip.

Why Ollama first: Tips contain personal context. Local inference means no user data leaves the host for the inference path. Cloud models (Anthropic, OpenAI) are opt-in fallbacks for evaluation and simulation only, gated behind ANTHROPIC_API_KEY.

Models (planned)

Alias	Model	Task
`tip-generator`	qwen2.5:7b (default)	Generate typed tip candidates from user context
`embedder`	nomic-embed-text	Task clustering, semantic similarity for dedup
`judge`	claude-haiku-4-5 (cloud, eval-only)	Offline sim judge; rates tip quality for A/B
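To illustrate why an alias makes model swaps a config change, here is a hedged sketch of building a request for the `tip-generator` alias. It assumes the LiteLLM proxy is reached through its OpenAI-compatible chat completions route; `buildTipRequest`, `TipContext`, and the prompt wording are hypothetical and not taken from the repo.

```typescript
// Illustrative context fields; the real context assembler output is not described here.
interface TipContext {
  hourOfDay: number;
  openTasks: string[];
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the HTTP request without sending it, so the caller decides how to
// perform the call (and tests need no network).
function buildTipRequest(
  litellmUrl: string,
  ctx: TipContext,
  alias: string = "tip-generator",
): ChatRequest {
  const base = litellmUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/v1/chat/completions`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: alias, // the alias is resolved to a concrete model by LiteLLM config
      messages: [
        { role: "system", content: "Suggest one short tip as JSON." },
        { role: "user", content: JSON.stringify(ctx) },
      ],
    }),
  };
}
```

The calling code only ever names the alias, so pointing `tip-generator` at a different model would not change this function. A caller could pass the result to `fetch(req.url, req)`.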

Roadmap

Phase 0 — Walking skeleton (M0) ✓ shipped

Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.

Monorepo scaffold, docker-compose dev env
auth — Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guard
integrations/todoist — OAuth2 flow, token stored in DB, disconnect supported
recommender with RandomPolicy; stable POST /recommend contract; 30s task cache
apps/web — sign-in, connect, tip pages; PWA manifest + icons
Feedback: done / snooze / dismiss; reward inferred from dwell-time (inferReward); marks task complete in Todoist
Deploy modular monolith to Agap VM via Caddy at o.alogins.net
ToS + Privacy Policy pages (/legal/terms, /legal/privacy); implicit consent on sign-in
Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
Metrics baseline: tip_views table (tip served) + tip_feedback (reactions) — activation + reaction rate queryable
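The dwell-time reward idea can be sketched as follows. Only one data point comes from this README: the Phase 1 notes say a tip marked done within 15 s to 2 min earns +1.0. Every other number in `inferRewardSketch` is a placeholder assumption, and the function name is deliberately different from the real `inferReward`.

```typescript
type Reaction = "done" | "snooze" | "dismiss";

// Maps a reaction plus dwell time to a scalar reward.
// Only the +1.0 band for "done" between 15 s and 2 min is from the roadmap;
// the remaining values are illustrative placeholders.
function inferRewardSketch(reaction: Reaction, dwellMs: number): number {
  const dwellSec = dwellMs / 1000;
  if (reaction === "done") {
    const inMagicZone = dwellSec >= 15 && dwellSec <= 120;
    return inMagicZone ? 1.0 : 0.5; // 0.5 is an assumed value outside the band
  }
  if (reaction === "snooze") return 0.0; // assumed neutral
  return -0.5; // assumed penalty for dismiss
}
```

The point of the shape is that the same reaction can carry different signal depending on how long the user looked at the tip before acting.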

Phase 1 — Real signal + in-the-moment delivery (M1) ✓ shipped

Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.

Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
Todoist sync emits signals.task.synced; tip served/feedback emit signals.tip.*
Features extracted per task: is_overdue, task_age_days, priority; context: hour_of_day, day_of_week
ml/serving LinUCB (d=5) + ε-greedy v1 (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk
RemotePolicy in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to tip_scores
Feedback loop: dwell-time inferred reward (inferReward) → online model update; done in 15 s–2 min = +1.0 (magic zone)
Offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-policy comparison, results persisted to sim_runs + sim_events
ε-greedy v1 promoted to active policy (ADR-0007) — +10.7% mean reward vs LinUCB in offline sim
Web Push (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
Quiet-hours + dedupe for push delivery
Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist)
NATS JetStream bridge — durable signals.> and feedback.> streams; in-process bus stays the source of truth, every publish bridges out (#21, shipped)
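As a rough illustration of the ε-greedy v1 setup above, the sketch below encodes a task into a 7-dimensional vector with sin/cos day-of-week features and picks a candidate. The README names the signals and d=7 but not the feature order, scaling, or whether a bias term is used, so the layout in `encode` is an assumption, as are the function names.

```typescript
interface TaskFeatures {
  isOverdue: boolean;
  taskAgeDays: number;
  priority: number; // assumed to be on a 1..4 scale
}

// Assumed layout: bias, overdue flag, capped age, priority, hour, sin/cos of weekday.
function encode(task: TaskFeatures, hourOfDay: number, dayOfWeek: number): number[] {
  const angle = (2 * Math.PI * dayOfWeek) / 7; // cyclic encoding keeps day 6 close to day 0
  return [
    1,
    task.isOverdue ? 1 : 0,
    Math.min(task.taskAgeDays, 30) / 30,
    task.priority / 4,
    hourOfDay / 23,
    Math.sin(angle),
    Math.cos(angle),
  ];
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

// Returns the index of the chosen candidate: a uniformly random one with
// probability epsilon, otherwise the one with the highest linear score.
function pickEpsilonGreedy(
  candidates: number[][],
  weights: number[],
  epsilon: number = 0.1,
  rng: () => number = Math.random,
): number {
  if (candidates.length === 0) return -1;
  if (rng() < epsilon) return Math.floor(rng() * candidates.length);
  let best = 0;
  let bestScore = -Infinity;
  for (let i = 0; i < candidates.length; i++) {
    const score = dot(candidates[i] ?? [], weights);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  }
  return best;
}
```

With ε=0.10, roughly one request in ten explores a random candidate; the rest exploit the current weights. The online weight update after feedback is omitted here.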

M1 add-on — Admin & ML Ops Console (fully shipped)

oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to act on them (revoke a token, replay an event, promote a model, reset a bandit).

Framework pick — apps/admin on Next.js 15 + Tremor + shadcn/ui. Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses packages/shared-types, sdk-js, and the Auth.js session. Specialized ML tooling (MLflow, Airflow) runs as separate external services linked from the admin shell; Grafana panels are embedded.

Layer	Tool	Why
App shell	Next.js 15 (new `apps/admin`)	Same stack as `apps/web`; reuses auth, types, SDK
Dashboards / charts	Tremor	Analytics-first React + Tailwind — KPI cards, time-series, categorical, heatmaps
CRUD primitives	shadcn/ui	Copy-paste Radix components; forms, dialogs, command palette
Heavy grids	TanStack Table v8	Sortable / paginated / virtualized tables (events, users, tips)
Extra charts	Recharts / visx	Fallbacks where Tremor falls short (e.g. force graphs, Sankey)
Model registry / experiments	MLflow (external — `o.alogins.net/mlflow`)	Experiment tracking, artifact browser, model registry; own basic-auth
Pipeline orchestration	Airflow (external — `o.alogins.net/airflow`)	Batch feature + retraining DAGs; own web-auth
Infra metrics	Grafana (embedded panels)	One ops source of truth
Ad-hoc analysis	Marimo reactive notebooks	Python-native for the ML side; launch-out link
AuthZ	`profile.role='admin'` + Next.js middleware	Reuses existing session; no new auth surface

Rejected alternatives (so we don't re-litigate):

Retool / AppSmith — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product
Streamlit / Gradio / Dash — Python-first; thin RBAC and routing; splits our frontend stack in two
React-admin / Refine.dev — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves
Superset / Metabase as the admin surface — excellent for BI, poor for operational writes (revoke, replay, promote). Plan: adopt Superset in M4 for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now

Build sequence (plan, not code):

ADR-0006 — record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana
Scaffold — apps/admin with Next.js 15, Tailwind, Tremor; deploy behind Caddy at admin.o.alogins.net
RBAC — role column on users; admin-only Next.js middleware; seed first admin via ADMIN_SEED_EMAIL env; admin_actions audit-log table
Overview dashboard — DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel
User explorer — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit actions
Event stream viewer — live tail of signals.* with filters by subject/user/time; same UI when the bus swaps to NATS
Feature store browser — features sent to ml/serving per scoring call; diff across time for a user
Model registry panel — /admin/models links out to MLflow (mlflow.o.alogins.net); experiment tracking and dataset management in MLflow + Airflow
MLOps hub — /admin/experiments links to MLflow experiments/models and Airflow DAGs/datasets; bandit reset on Users page
Recommendation log (explainability) — per served tip: (user, features, policy, score, feedback, latency); tip_scores table, 30-day retention
Reward analytics — reaction distribution over time; per-policy compare; slice by hour_of_day, priority, cohort
Data quality widget — missing-feature rate, stale-token rate, daily completeness heatmap
Ops actions — revoke token (Users page), replay signal, disable/promote shadow policy; every action audit-logged
Read-only SQL runner — SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4)
Health rollup — /admin/health surfaces api, ml/serving, SQLite, event-bus; auto-refreshes every 15s
Docs — apps/admin/README.md, runbook for common ops actions, ADR-0006 merged
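For the read-only SQL runner step, a conservative statement check might look like the sketch below. This is an assumption about how such a gate could work, not the shipped implementation; `isSelectOnly` is a hypothetical helper, and it errs on the side of rejecting (for example, it refuses any query that uses the `replace` function or contains a semicolon inside a string literal).

```typescript
// Naive allow-list check: one statement, starting with SELECT or WITH,
// containing no write or schema keywords. Intended as defence in depth only.
function isSelectOnly(sql: string): boolean {
  const stripped = sql
    .replace(/--.*$/gm, "") // drop line comments
    .replace(/\/\*[\s\S]*?\*\//g, "") // drop block comments
    .trim()
    .replace(/;\s*$/, ""); // allow a single trailing semicolon

  if (stripped.includes(";")) return false; // reject multiple statements

  const startsAsRead = /^(select|with)\b/i.test(stripped);
  const hasWriteKeyword =
    /\b(insert|update|delete|drop|alter|create|replace|attach|detach|pragma|vacuum)\b/i.test(stripped);
  return startsAsRead && !hasWriteKeyword;
}
```

A string check like this is easy to get wrong, so it would be reasonable to pair it with a database connection opened in read-only mode, which enforces the same rule at the storage layer.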

Apple OAuth (deferred to M2)

Phase 2 — AI tips + multi-source signals (M2)

Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.

AI infrastructure (unblock everything else):

ai compose profile — Ollama + LiteLLM for local dev; env vars OLLAMA_URL / LITELLM_URL (#86)
AI gateway — wire ml/serving to LiteLLM; model aliases tip-generator + embedder (#87)

AI tip generation pipeline:

Context assembler — user signals + feature store → structured prompt context (ml/features/context.py) (#88)
Tip generator endpoint — POST /generate in ml/serving; LLM → N typed TipCandidate objects (#79)
TipCandidate shared schema — {content, kind, source, model, prompt_version, confidence}; update recommender pipeline (#89)
LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
Prompt versioning — prompt_version + model columns in tip_scores; content-hash invalidation (#91)
LLM tip quality dashboard — reaction breakdown by model / prompt_version in /admin/reward-analytics (#92)
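The validation and retry item above could be sketched like this. The `TipCandidate` fields are taken from the roadmap entry; the field types, the 0 to 1 range for `confidence`, and the helper names `parseCandidates` and `generateWithRetry` are assumptions made for illustration.

```typescript
interface TipCandidate {
  content: string;
  kind: string;
  source: string;
  model: string;
  prompt_version: string;
  confidence: number;
}

const STRING_FIELDS = ["content", "kind", "source", "model", "prompt_version"] as const;

function isTipCandidate(value: unknown): value is TipCandidate {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const confidence = record["confidence"];
  return (
    STRING_FIELDS.every((key) => typeof record[key] === "string") &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 // assumed range
  );
}

// Returns the parsed candidates, or null if the output is not a non-empty
// JSON array of well-formed candidates.
function parseCandidates(raw: string): TipCandidate[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!Array.isArray(data) || data.length === 0) return null;
  return data.every(isTipCandidate) ? data : null;
}

// One initial attempt plus up to maxRetries clarification retries, then the
// task-based fallback. callLlm receives the attempt number so it can adjust
// the prompt on retries.
async function generateWithRetry(
  callLlm: (attempt: number) => Promise<string>,
  fallback: () => TipCandidate[],
  maxRetries: number = 2,
): Promise<TipCandidate[]> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const parsed = parseCandidates(await callLlm(attempt));
    if (parsed) return parsed;
  }
  return fallback();
}
```

Keeping the gate as a pure function makes it cheap to unit-test against recorded model outputs, independent of which model produced them.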

Evaluation & model selection:

Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
LLM prompt research — persona design, context injection strategies, few-shot examples (#84)

Pipeline architecture:

Signal source abstraction — SignalSource interface generalizing beyond Todoist (#78)
Generalized recommendation pipeline — candidate → rank → render stages (#80)
Feature registry + user profile builder — centralized features, persistent profiles (#81)
Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82)

Policy research:

Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83)

Integrations & infra (carried from M1):

Apple OAuth (#7)
NATS JetStream replacing in-process bus (#21) — adapter ships in services/api/src/events/nats.ts; in-proc bus is the producer, JetStream is the durable mirror
Todoist sync via events (#22) — background scheduler in services/api/src/signals/scheduler.ts emits signals.task.synced every TODOIST_SYNC_INTERVAL_MS; on-demand fetch remains as freshness fallback
Event schema registry + protobuf CI gate (#54)
Per-user freshness SLAs for features (#61)
CI skeleton (#3), observability (#18), E2E tests (#20)
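The "in-process bus is the producer, JetStream is the durable mirror" arrangement can be sketched as a small bus with a bounded history and an optional bridge callback. `RingBufferBus` and its methods are hypothetical names; prefix matching stands in for NATS subject wildcards such as `signals.>`, and the bridge is a plain function rather than a real NATS client.

```typescript
interface BusEvent {
  subject: string;
  payload: unknown;
  ts: number;
}

type Handler = (event: BusEvent) => void;

class RingBufferBus {
  private readonly buffer: BusEvent[] = [];
  private readonly handlers: Array<{ prefix: string; handler: Handler }> = [];
  private readonly capacity: number;
  private readonly bridge: Handler | undefined;

  constructor(capacity: number = 500, bridge?: Handler) {
    this.capacity = capacity;
    this.bridge = bridge;
  }

  subscribe(prefix: string, handler: Handler): void {
    this.handlers.push({ prefix, handler });
  }

  publish(subject: string, payload: unknown): void {
    const event: BusEvent = { subject, payload, ts: Date.now() };

    // Keep only the most recent `capacity` events.
    this.buffer.push(event);
    if (this.buffer.length > this.capacity) this.buffer.shift();

    // Local delivery first: the in-process bus stays the source of truth.
    for (const entry of this.handlers) {
      if (subject.startsWith(entry.prefix)) entry.handler(event);
    }

    // Mirror out afterwards; a bridge failure must not break local delivery.
    try {
      this.bridge?.(event);
    } catch {
      // a real bridge would log and retry here
    }
  }

  recent(): readonly BusEvent[] {
    return this.buffer;
  }
}
```

In this shape, a scheduler would call `bus.publish("signals.task.synced", ...)` on each interval, and the same publish would reach both local subscribers and the durable mirror.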

Bugs (fix before new features):

TipFeedback type mismatch (#73)
Todoist token refresh (#74)
Reward fire-and-forget (#75)
Data retention purge (#76)
Port mismatch (#77)

Phase 3 — Native mobile (M3)

iOS app (SwiftUI) with APNs push
Android app (Compose) with FCM push
notifier gains APNs + FCM channels, per-device rate limits
Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
Consolidate MLflow + Airflow behind shared OIDC (SSO for all internal services)
Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold

Phase 4 — MLOps at scale (M4)

Airflow + MLflow deployed as external services (mlops compose profile); each with own auth
Write first retraining DAG (Airflow) + first MLflow experiment logging from ml/serving
Feature-to-prompt pipeline — nightly Airflow DAG materializes context for LLM; cuts inline latency (#94)
Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95)
LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96)
Embedding-based task clustering — nomic-embed-text for dedup + user pattern features (#97)
Consolidate MLflow + Airflow auth into shared OIDC provider (tracked as M3 issue #85)
Shadow → A/B → launch pipeline as first-class in MLflow
Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
Drift monitoring (feature + prediction + reward drift); model cards per LLM version
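Deterministic assignment for online experiments is commonly done by hashing the user and experiment identifiers into a bucket. The sketch below uses a 32-bit FNV-1a hash over UTF-16 code units; the choice of hash, the key format, and the names `fnv1a` and `assignVariant` are assumptions, not something the roadmap specifies.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash >>> 0;
}

// The same (experiment, user) pair always maps to the same variant, and
// including the experiment name decorrelates assignments across experiments.
function assignVariant(userId: string, experiment: string, variants: string[]): string {
  if (variants.length === 0) throw new Error("at least one variant is required");
  const bucket = fnv1a(`${experiment}:${userId}`) % variants.length;
  return variants[bucket] as string;
}
```

Because the assignment is a pure function of its inputs, no assignment table needs to be stored, and the split stays stable across restarts as long as the variant list does not change.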

Phase 5 — Production hardening (M5)

Audit logging, rotation of provider tokens + internal signing keys
k3s on existing VM, then k8s + HPA once multi-node justified (no cliff)
Multi-region failover, Postgres PITR, event-bus mirroring
Public integration SDK; sandbox tenancy for third-party connectors
Billing + subscription tiers

Contributing

This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's README.md, ship.

Conventions and per-service guidance live in CLAUDE.md.

License

