alvis/oO

Go to file

alvis b3cf588f2f feat(ml): multi-agent context framework + v4 orchestrator prompt

Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum,
time_of_day, recent_patterns, focus_area) each producing a prompt snippet
from user signals. A registry wires them up; the orchestrator prompt in
ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM.

Also wires /api/agents route in the API and updates the Dockerfile to copy
the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-04 10:20:05 +00:00

.gitea/workflows

feat(schema): protobuf event registry + buf CI gate (#54 )

2026-04-25 16:48:24 +00:00

.playwright-mcp

feat: complete M0 — legal pages, consent, tip_views metrics, account deletion UI

2026-04-15 09:09:08 +00:00

apps

chore: remove Airflow completely from the stack

2026-05-03 16:38:46 +00:00

docs

chore: remove Airflow completely from the stack

2026-05-03 16:38:46 +00:00

infra

feat(ml): multi-agent context framework + v4 orchestrator prompt

2026-05-04 10:20:05 +00:00

feat(ml): multi-agent context framework + v4 orchestrator prompt

2026-05-04 10:20:05 +00:00

packages

feat(simulate): MLflow tracking, Airflow DAG integration, health checks for mlflow/airflow

2026-04-26 12:08:36 +00:00

scripts

feat(schema): protobuf event registry + buf CI gate (#54 )

2026-04-25 16:48:24 +00:00

services

feat(ml): multi-agent context framework + v4 orchestrator prompt

2026-05-04 10:20:05 +00:00

.dockerignore

chore(infra): wire MLflow/Airflow env vars, fix healthcheck, add .dockerignore

2026-04-26 12:08:43 +00:00

.env.example

chore: remove Airflow completely from the stack

2026-05-03 16:38:46 +00:00

.gitignore

feat: M1 — LinUCB bandit, RemotePolicy, Web Push, event bus

2026-04-15 14:08:00 +00:00

CLAUDE.md

chore: remove Airflow completely from the stack

2026-05-03 16:38:46 +00:00

package.json

feat: Phase 0 walking skeleton — monorepo, API, web, ML stub

2026-04-14 12:41:24 +00:00

PLAN.md

refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0

2026-04-13 14:36:11 +00:00

pnpm-lock.yaml

feat(observability): structured logs, W3C trace IDs, Sentry hooks (#18 )

2026-04-26 03:37:28 +00:00

pnpm-workspace.yaml

feat: Phase 0 walking skeleton — monorepo, API, web, ML stub

2026-04-14 12:41:24 +00:00

README.md

chore: remove Airflow completely from the stack

2026-05-03 16:38:46 +00:00

tsconfig.base.json

feat: Phase 0 walking skeleton — monorepo, API, web, ML stub

2026-04-14 12:41:24 +00:00

turbo.json

feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework

2026-04-16 07:44:37 +00:00

README.md

oO

One tip. Right now. Feels like magic.

oO learns who you are from the apps you already use and surfaces one perfectly-timed suggestion — an advice or a todo — on a black page. No feed. No dashboard. One tip.

Why

Everyone has too many tasks, too many apps, too much noise. What people actually need is a single, well-chosen nudge at the right moment. oO is that nudge, powered by a recommendation engine that gets smarter the more of your life it sees.

Product principles

One thing at a time. The UI is a black page with one tip. That's the product.
We don't own your data, we understand it. Connect your apps; we read what we need, when we need it.
Magic requires craft. Precision, timing, and restraint matter more than features.
Private by default. Tokens are encrypted, models are per-user, deletion is one click.

Prototype scope (Phase 0)

Three pages. That's it.

Page	What it does
Sign in	Google / Apple OAuth. No passwords.
Connect	A list of integrations. Tap "Todoist" → OAuth flow → token stored.
Tip	Black page. One tip. Tap to dismiss / done / snooze.

Under the hood the "pick a tip" call already routes through a recommender service with a pluggable policy — so v0 is literally "random Todoist task" but every other version slots into the same contract.

Architecture at a glance

 ┌──────────┐   OAuth   ┌────────────┐
 │  Web /   │──────────▶│   auth     │
 │  Mobile  │           └─────┬──────┘
 │  client  │                 │ JWT
 │          │   REST/GraphQL  ▼
 │          │────────▶┌───────────────┐
 └──────────┘         │   gateway     │──┬──▶ profile
                      └───────┬───────┘  ├──▶ integrations ──▶ Todoist / Google / ...
                              │          └──▶ recommender ──▶ ml/serving (Python)
                              ▼
                      ┌───────────────┐
                      │    events     │ ◀── integrations emit normalized events
                      │  (Kafka/NATS) │ ──▶ ml/pipelines (features, training)
                      └───────────────┘

More detail in docs/architecture/ and decisions in docs/adr/.

Monorepo layout

See CLAUDE.md for the full tree and conventions.

apps/        web, ios, android
services/    gateway, auth, profile, integrations, recommender, events, notifier
packages/    shared-types, sdk-js, ui
ml/          pipelines, features, registry, experiments, serving
infra/       docker, k8s, terraform, ci
docs/        architecture, adr, api

AI stack

oO is AI-native: the recommender's job is to rank, not to write. An LLM generates candidate tips from the user's context; the bandit picks the best one.

Three-tier layout

Tier	Service	Purpose	Where
Inference	Ollama	Local LLM + embedding; no data leaves the host	`localhost:11434`
Routing	LiteLLM	Unified OpenAI-compatible API; model aliases; cloud fallback	`llm.alogins.net` (Agap shared)
Testing	OpenWebUI	Prompt iteration, model comparison, manual evals	`ai.alogins.net` (Agap shared)

Tip generation pipeline (Phase 2 target)

User signals  ──▶  Context assembler  ──▶  LiteLLM  ──▶  Ollama (local)
(tasks, calendar,    (ml/features/)         (routing)     or cloud fallback
 patterns, time)
                                                ▼
                                     N typed TipCandidates
                                     {content, kind, model,
                                      prompt_version, confidence}
                                                ▼
                                    Bandit policy (ml/serving)
                                    scores + ranks candidates
                                                ▼
                                         Best tip shown
                                                ▼
                              User reaction (done / snooze / dismiss + dwell)
                                                ▼
                              Online bandit update + prompt_version tracking

Why LiteLLM as gateway: All LLM calls use a single LITELLM_URL env var. Swapping from qwen2.5 to llama3.2, or routing a fraction to Claude for A/B, is a config change in LiteLLM — zero code change in oO. The model name in tip_scores tells you exactly which model produced each tip.

Why Ollama first: Tips contain personal context. Local inference means no user data leaves the host for the inference path. Cloud models (Anthropic, OpenAI) are opt-in fallbacks for evaluation and simulation only, gated behind ANTHROPIC_API_KEY.

Models (planned; routes through LiteLLM)

Alias	Model	Task
`tip-generator`	qwen2.5:1.5b (default)	Generate typed tip candidates from user context; local-first via Ollama
`embedder`	nomic-embed-text	Task clustering, semantic similarity for dedup; local via Ollama
`judge`	claude-haiku-4-5 (cloud, eval-only)	Offline sim judge; rates tip quality for A/B (requires `ANTHROPIC_API_KEY`)

All model calls route through LiteLLM at llm.alogins.net (or LITELLM_URL env var) using model aliases. This decouples tip generation from model selection — swap the backend model in LiteLLM config without code changes. See ADR-0008.

Roadmap

Phase 0 — Walking skeleton (M0) ✓ shipped

Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.

Monorepo scaffold, docker-compose dev env
auth — Google OAuth2/PKCE via openid-client v6; session cookie; Next.js middleware guard
integrations/todoist — OAuth2 flow, token stored in DB, disconnect supported
recommender with RandomPolicy; stable POST /recommend contract; 30s task cache
apps/web — sign-in, connect, tip pages; PWA manifest + icons
Feedback: done / snooze / dismiss; reward inferred from dwell-time (inferReward); marks task complete in Todoist
Deploy modular monolith to Agap VM via Caddy at o.alogins.net
ToS + Privacy Policy pages (/legal/terms, /legal/privacy); implicit consent on sign-in
Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
Metrics baseline: tip_views table (tip served) + tip_feedback (reactions) — activation + reaction rate queryable

Phase 1 — Real signal + in-the-moment delivery (M1) ✓ shipped

Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.

Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
Todoist sync emits signals.task.synced; tip served/feedback emit signals.tip.*
Features extracted per task: is_overdue, task_age_days, priority; context: hour_of_day, day_of_week
ε-greedy v1 (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk
ε-greedy v2 (d=12, profile features: completion rate, dismiss rate, dwell, preferred hour, tip volume) in shadow; promoted to active policy (ADR-0012)
RemotePolicy in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to tip_scores
Feedback loop: dwell-time inferred reward (inferReward) → online model update; done in 15 s–2 min = +1.0 (magic zone)
Offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-policy comparison, results persisted to sim_runs + sim_events
Web Push (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
NATS JetStream bridge — durable signals.> and feedback.> streams; in-process bus stays the source of truth, every publish bridges out (#21, shipped)
Per-user profile features (completion rate, dismiss rate, dwell, preferred hour, tip volume) — event-driven, JIT invalidation (#81)
Quiet-hours + dedupe for push delivery
Delayed rewards: tasks completed directly in Todoist (requires webhook from Todoist)
Apple OAuth (deferred to M3)

M1 add-on — Admin & ML Ops Console (fully shipped)

oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to act on them (revoke a token, replay an event, promote a model, reset a bandit).

Framework pick — apps/admin on Next.js 15 + Tremor + shadcn/ui. Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses packages/shared-types, sdk-js, and the Auth.js session. Specialized ML tooling (MLflow) runs as a separate external service linked from the admin shell; Grafana panels are embedded.

Layer	Tool	Why
App shell	Next.js 15 (new `apps/admin`)	Same stack as `apps/web`; reuses auth, types, SDK
Dashboards / charts	Tremor	Analytics-first React + Tailwind — KPI cards, time-series, categorical, heatmaps
CRUD primitives	shadcn/ui	Copy-paste Radix components; forms, dialogs, command palette
Heavy grids	TanStack Table v8	Sortable / paginated / virtualized tables (events, users, tips)
Extra charts	Recharts / visx	Fallbacks where Tremor falls short (e.g. force graphs, Sankey)
Model registry / experiments	MLflow (external — `o.alogins.net/mlflow`)	Experiment tracking, artifact browser, model registry; own basic-auth
Infra metrics	Grafana (embedded panels)	One ops source of truth
Ad-hoc analysis	Marimo reactive notebooks	Python-native for the ML side; launch-out link
AuthZ	`profile.role='admin'` + Next.js middleware	Reuses existing session; no new auth surface

Rejected alternatives (so we don't re-litigate):

Retool / AppSmith — low-code speed, but admin logic leaves our repo; weak analytics affordances for an analytics product
Streamlit / Gradio / Dash — Python-first; thin RBAC and routing; splits our frontend stack in two
React-admin / Refine.dev — strong CRUD scaffolding, but analytics/ML views feel bolted on; we'd rebuild Tremor-style dashboards ourselves
Superset / Metabase as the admin surface — excellent for BI, poor for operational writes (revoke, replay, promote). Plan: adopt Superset in M4 for BI alongside batch pipelines; ship a read-only SQL widget inside admin for now

Build sequence:

ADR-0006 — record the framework choice + "embed, don't rebuild" rule for MLflow/Grafana
Scaffold — apps/admin with Next.js 15, Tailwind, Tremor; deploy behind Caddy at admin.o.alogins.net
RBAC — role column on users; admin-only Next.js middleware; seed first admin via ADMIN_SEED_EMAIL env; admin_actions audit-log table
Overview dashboard — DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel
User explorer — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit + rebuild-profile actions
Event stream viewer — live tail of signals.* with filters by subject/user/time; same UI when the bus swaps to NATS
Features page — features sent to ml/serving per scoring call; per-user profile features with freshness; diff across time
Tips page — tips served, scored, feedback reactions with policy/model breakdown
Reward analytics — reaction distribution over time; per-policy / per-model / per-prompt-version compare; slice by hour_of_day, priority, cohort
Data quality widget — missing-feature rate, stale-token rate, daily completeness heatmap; per-feature freshness SLA status
Ops actions — revoke token (Users page), rebuild profile, reset bandit, enable/disable shadow policies; every action audit-logged
Health rollup — /admin/health surfaces api, ml/serving, SQLite, event-bus, MLflow; auto-refreshes every 15s
Read-only SQL runner — SELECT-only runner against SQLite + saved queries (sunsets to Superset in M4)
Offline simulation runner — launch ml/experiments/sim from admin UI; track sim runs, judge, policy comparison
Token-based admin auth — POST /api/auth/token for Playwright/CI; ADMIN_TOKEN env var (#105)
Docs pages — admin documentation and runbooks inline

Phase 2 — AI tips + multi-source signals (M2) in progress

Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.

AI infrastructure (unblock everything else):

ai compose profile — Ollama + LiteLLM for local dev; env vars OLLAMA_URL / LITELLM_URL (#86)
AI gateway — wire ml/serving to LiteLLM; model aliases tip-generator + embedder (#87)

AI tip generation pipeline:

Context assembler — user signals + feature store → structured prompt context (ml/features/context.py); skeleton implemented
Tip generator endpoint — POST /generate in ml/serving; LLM → N typed TipCandidate objects (#79)
TipCandidate shared schema — {content, kind, source, model, prompt_version, confidence}; update recommender pipeline (#89)
LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
Prompt versioning — prompt_version + model columns in tip_scores; content-hash invalidation (#91)
LLM tip quality dashboard — reaction breakdown by model / prompt_version in /admin/reward-analytics (#92)

Evaluation & model selection:

Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
LLM prompt research — persona design, context injection strategies, few-shot examples (#84)

Pipeline architecture:

Signal source abstraction — SignalSource interface for Todoist + extensible design (#78)
Generalized recommendation pipeline — candidate → rank → render stages (#80)
Feature registry + user profile builder — centralized features, persistent profiles, event-driven invalidation (#81)
Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82)

Policy research:

Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83)

Integrations & infra (carried from M1):

Apple OAuth (#7)
NATS JetStream replacing in-process bus (#21) — adapter ships in services/api/src/events/nats.ts; in-proc bus is the producer, JetStream is the durable mirror
Todoist sync via events (#22) — background scheduler in services/api/src/signals/scheduler.ts emits signals.task.synced every TODOIST_SYNC_INTERVAL_MS; on-demand fetch remains as freshness fallback
Event schema registry + protobuf CI gate (#54) — buf lint/breaking checks on every PR
Per-user freshness SLAs for features (#61) — context-feature (JIT) vs profile-feature (batched) spec in ADR-0011; CONTEXT_FEATURES in ml/features/context.py
Observability (#18) — structured logs via pino, W3C trace IDs, Sentry hooks, trace correlation end-to-end
CI skeleton (#3), E2E tests (#20)

Bugs & UX (fix before new features):

TipFeedback type mismatch (#73)
Todoist token refresh (#74) — OAuth token auto-refresh on 401
Reward fire-and-forget (#75) — retry logic + logging
Data retention purge (#76) — daily purge of 30-day-old tip_scores/tip_feedback
Port mismatch (#77) — fixed in docker-compose + env var config
UX refinements (#100–102) — "done/snooze/dismiss" feedback only, config page UI, settings gear button

Phase 3 — Native mobile (M3)

iOS app (SwiftUI) with APNs push
Android app (Compose) with FCM push
notifier gains APNs + FCM channels, per-device rate limits
Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
Consolidate MLflow behind shared OIDC (SSO for all internal services)
Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold

Phase 4 — MLOps at scale (M4)

MLflow deployed as external service (mlops compose profile); own auth; health check integrated
Write first retraining pipeline + first MLflow experiment logging from ml/serving + JetStream consumers (#98)
Feature-to-prompt pipeline — nightly batch job materializes context for LLM; cuts inline latency (#94)
Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95)
LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96)
Embedding-based task clustering — nomic-embed-text for dedup + user pattern features (#97)
Modular-monolith packaging + import-boundary lint (#47)
Consolidate MLflow auth into shared OIDC provider (tracked as M3 issue #85)
Shadow → A/B → launch pipeline as first-class in MLflow
Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
Drift monitoring (feature + prediction + reward drift); model cards per LLM version

Phase 5 — Production hardening (M5)

Audit logging, rotation of provider tokens + internal signing keys
k3s on existing VM, then k8s + HPA once multi-node justified (no cliff)
Multi-region failover, Postgres PITR, event-bus mirroring
Public integration SDK; sandbox tenancy for third-party connectors
Billing + subscription tiers

Contributing

This repo is split into independent modules; most tickets belong to exactly one. Pick an issue, check its milestone (= phase), read the service's README.md, ship.

Conventions and per-service guidance live in CLAUDE.md.

License

Languages

TypeScript 56.9%

Python 42.4%

CSS 0.5%

JavaScript 0.1%

Shell 0.1%

README.md Unescape Escape

oO

Why

Product principles

Prototype scope (Phase 0)

Architecture at a glance

Monorepo layout

AI stack

Three-tier layout

Tip generation pipeline (Phase 2 target)

Models (planned; routes through LiteLLM)

Roadmap

Phase 0 — Walking skeleton (M0) ✓ shipped

Phase 1 — Real signal + in-the-moment delivery (M1) ✓ shipped

M1 add-on — Admin & ML Ops Console (fully shipped)

Phase 2 — AI tips + multi-source signals (M2) in progress

Phase 3 — Native mobile (M3)

Phase 4 — MLOps at scale (M4)

Phase 5 — Production hardening (M5)

Contributing

License

README.md