oO — Project Instructions
What this is
oO is a recommendation system for personal tips. It collects signals across a user's life (tasks, habits, calendar, mood, context) to build a rich profile and deliver one perfectly timed tip (a piece of advice or a todo) that feels like magic.
The magic is the product. Precision + timing + minimalism. The UI shows a single black page with one tip. The complexity lives behind it.
Prime directives
- Modular by package, deployable by stage. Contracts live at package boundaries from day one so extraction to a service is cheap. Deploy topology evolves with real pressure (team size, scaling hotspots, language boundaries), not with wishful architecture. Phase 0 = modular monolith + Python ML sidecar. See ADR-0003.
- Recommendation engine is the core. Every other module feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
- Python owns ML. Training, features, online scoring are Python (FastAPI + PyTorch/scikit + MLflow/Feast). Application code is TypeScript (Node, Next.js) unless there's a reason.
- OAuth-first for identity and integrations. Never ask users for passwords or raw API keys when a delegated-auth flow exists. Store provider tokens encrypted, refresh transparently.
- Privacy is a feature, not a phase. Consent capture, token revocation, and account deletion exist from the first real user. Data minimization: store the token + derivatives we need, not the raw feed.
- Feel-of-magic over feature count. When in doubt, ship fewer things, polished. The tip page is a watch face.
Architecture (high level)
The tree below is logical module structure. Directory layout is stable; how many processes you deploy is a stage decision (ADR-0003).
apps/                 user-facing clients
  web/                Next.js PWA; the first shipped client
  mobile-ios/         Swift/SwiftUI (Phase 3)
  mobile-android/     Kotlin/Compose (Phase 3)
services/             backend modules; each owns a contract, may share a deployable
  gateway/            BFF for clients; auth check; fan-out
  auth/               OAuth (Google, Apple, ...), sessions, JWT issuance
  profile/            user profile, preferences, consents
  integrations/       third-party connectors + token vault (Todoist first)
  recommender/        orchestration: candidates → policy → tip; feedback sink
  events/             event bus ingress + durable signal store
  notifier/           push/email/web delivery (web push from Phase 1)
packages/             shared libraries (importable across services + apps)
  shared-types/       HTTP types via OpenAPI; event types via protobuf (ADR-0005)
  sdk-js/             client SDK used by web + mobile webviews
  ui/                 shared React components + design tokens
ml/                   Python; separate deployable from day one
  serving/            online scorer (FastAPI), called by recommender
  features/           feature definitions + store adapter
  pipelines/          batch feature + training scripts
  registry/           MLflow model registry integration
  experiments/        assignment + A/B + bandit policies
  notebooks/          research only; never imported by production code
infra/                docker-compose (Phase 0), k3s/k8s (later), terraform, CI
docs/                 architecture notes, ADRs, API specs
Phase 0 deployables: one Node process (services/* bundled via modular monolith) + one Python process (ml/serving, stubbed until M1) + Postgres + NATS. Services extract to their own process when a real reason appears: language boundary, scaling hotspot, team ownership, or SLA divergence. See ADR-0003.
Contracts between modules
- HTTP (OpenAPI, in packages/shared-types/http/): synchronous request/response. In-process today; over the network once extracted. Signatures are identical.
- Events (Protocol Buffers, in packages/shared-types/events/): durable signals + feedback. Today: an in-process Bus with an onPublish bridge to NATS JetStream when NATS_URL is set (ADR-0010). The in-proc bus stays the source of truth; JetStream is the durable mirror that cross-process consumers (ml/serving, future feature pipelines) tail. Proto schemas (ADR-0005) live in packages/shared-types/events/oo/events/v1/; buf lint + buf breaking run in CI on every PR touching those files (.gitea/workflows/buf-check.yaml).
- Do not redefine types per module. Regenerate from shared-types.
Conventions
- Each module ships a README.md describing its contract, its /health story, and its extraction criteria (when it should become its own process).
- One PR = one concern. Use conventional-commit prefixes (feat:, fix:, chore:, docs:, refactor:).
- ADRs go in docs/adr/NNNN-title.md for any decision that constrains future work.
- No secrets in repo. Local dev via .env.local (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
- Compose profiles: core (api + web + admin), full (adds ml-serving + nats), mlops (adds MLflow), ai (adds Ollama + LiteLLM). Mix as needed. Always pass --profile <name> to build/up; without a profile, no services are selected and builds silently do nothing.
- Docker rebuild: use --force-recreate on up when only env vars changed (no image rebuild needed); new env vars in .env.local are not picked up by a running container until it is recreated.
- Run Python agent tests: python3 -m pytest ml/agents/tests/ -x -q (the tests add the repo root to sys.path themselves).
- Run Python feature tests: python3 -m pytest ml/features/ -x -q. The ml/features/ files are Python mirrors of TS registries; TS is the source of truth. Tests parse registry.ts with regex to detect drift; follow the same pattern whenever a new field is added to ProfileFeature.
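The drift check can be sketched roughly as below. This is a minimal sketch that assumes ProfileFeature is a TS interface with one name: type field per line and that the Python mirror is a dataclass. The REGISTRY_TS excerpt, PyProfileFeature, ts_fields, and drift are hypothetical illustrations, not the project's actual tests.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, fields

# Hypothetical excerpt of registry.ts; the real TS file is the source of truth.
REGISTRY_TS = """
export interface ProfileFeature {
  key: string;
  scope: 'user' | 'agent';
  ttl_seconds?: number;
  invalidated_by: string[];
}
"""


@dataclass
class PyProfileFeature:
    """Hypothetical Python mirror; its field names must match the TS interface."""
    key: str
    scope: str
    invalidated_by: list[str]
    ttl_seconds: int | None = None


def ts_fields(source: str, interface: str) -> set[str]:
    """Extract field names from a TS interface body with regexes."""
    body = re.search(rf"interface\s+{interface}\s*\{{(.*?)\}}", source, re.S)
    if body is None:
        raise ValueError(f"interface {interface} not found")
    # Match `name:` or `name?:` at the start of a line inside the body.
    return set(re.findall(r"^\s*(\w+)\??\s*:", body.group(1), re.M))


def drift(source: str) -> set[str]:
    """Fields present on only one side; an empty set means the mirror is in sync."""
    mirror = {f.name for f in fields(PyProfileFeature)}
    return ts_fields(source, "ProfileFeature") ^ mirror
```

In a real test, the source string would be read from registry.ts rather than inlined. Adding a field on only one side makes drift() non-empty, and the test fails.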
Definition of done (per feature)
- Code + tests merged.
- Module's README.md updated.
- If it changes a contract → shared-types regenerated + consumers updated.
- If it changes architecture → ADR added.
- Deployable via docker compose --profile <name> up locally.
- If it touches user data → a deletion path exists and is tested.
AI stack
oO generates tips through a multi-agent pipeline (ADR-0013): pre-compute agents emit prompt snippets, an orchestrator LLM assembles them into one tip. All LLM calls route through LiteLLM at llm.alogins.net using model aliases — swapping models is a config change, not a code change.
| Alias | Model | Used by |
|---|---|---|
| tip-generator | qwen2.5:1.5b (default) | ml/serving tip generation |
| embedder | nomic-embed-text | task clustering, dedup |
| judge | claude-haiku-4-5 (cloud, eval only) | offline sim |
Env vars: LITELLM_URL (prod https://llm.alogins.net), OLLAMA_URL (Agap host, http://host.docker.internal:11434 from containers).
Ollama and LiteLLM are shared Agap services, not oO services — they live in agap_git/openai/docker-compose.yml along with langfuse (observability). oO never starts them; ml-serving just calls the alias.
All httpx calls in ml/ must use trust_env=False to bypass the system proxy — same rule as bw and curl. Pattern: httpx.Client(trust_env=False, timeout=N).
MLflow container-to-container calls: always pass host_header="localhost" to MLflowClient — MLflow's --allowed-hosts rejects Host: mlflow (the container DNS name) with 403. Auth credential is MLFLOW_ADMIN_PASSWORD. MLflow REST API lives at the origin root (/api/2.0/mlflow), not under the /mlflow UI prefix.
Multi-agent tip generation pipeline (ADR-0013):
- Pre-compute agents (ml/agents/<id>/) run on a schedule, each emitting a snippet into agent_outputs with a per-agent TTL.
- On request, recommender (TS) loads the eligible agent set (registry-driven, ADR-0014) and pulls the freshest non-expired snippets.
- POST /recommend in ml/serving assembles the orchestrator prompt (v4-orchestrator) and calls LiteLLM via the tip-generator alias.
- The returned tip is logged in tip_scores with the contributing agent set; the user's reaction is logged for observability (no bandit reward loop).
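The snippet-selection step can be pictured as follows. The real step runs in the TypeScript recommender; this Python sketch only illustrates the rule: keep eligible agents only, drop expired rows, and let the newest snippet per agent win. The AgentOutput shape (created_at, ttl_s) and freshest_snippets are assumptions about agent_outputs, not its actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AgentOutput:
    """Assumed shape of one agent_outputs row."""
    agent_id: str
    snippet: str
    created_at: datetime
    ttl_s: int  # per-agent TTL in seconds


def freshest_snippets(
    rows: list[AgentOutput], eligible: set[str], now: datetime
) -> dict[str, str]:
    """Return {agent_id: snippet} from the newest non-expired row per eligible agent."""
    best: dict[str, AgentOutput] = {}
    for row in rows:
        if row.agent_id not in eligible:
            continue  # registry-driven eligibility (ADR-0014)
        if row.created_at + timedelta(seconds=row.ttl_s) <= now:
            continue  # expired: never feed stale context to the orchestrator
        current = best.get(row.agent_id)
        if current is None or row.created_at > current.created_at:
            best[row.agent_id] = row
    return {agent_id: row.snippet for agent_id, row in best.items()}
```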
Current phase
M1 shipped (core + admin). M2 (AI tips) in progress. See README.md for the phase roadmap and docs/architecture/ for diagrams. Work is tracked as Gitea milestones + issues on alvis/oO.
Recent completions:
- ADR-0013: multi-agent recommendation with pre-computed agent snippets + orchestrator LLM, replacing the ε-greedy bandit (2026-05-01)
- LLM context assembler + tip generation scaffold (#79, #88)
- Model benchmarking for tip generation (#93, #95)
- Admin UX refinements: feedback consolidation, settings placement (#100–102)
- ADR-0012: ε-greedy v2 (D=12) (2026-04-26; now superseded by ADR-0013)
- ADR-0014 complete (2026-05-05): unified Profile schema + backfill, manifest plumbing, /api/profile read-through, registry-driven eligibility filter, inference framework + per-agent inference, legacy consent column drop
- Rich per-agent inference for all four active agents (#112, #114, #115, #116; 2026-05-06): quiet/peak hours (time-of-day), z-score baseline (momentum), p50 lateness + project realness (overdue-task), adaptive lookback + weekly/daily cycles (recent-patterns)
- Semantic task clustering via nomic-embed-text + focus-area preferred_areas inference (#97, #113; 2026-05-06): ml/agents/clustering.py, focus-area v2.0.0
- Per-user feature freshness SLAs (#61; 2026-05-06): invalidated_by mirrored into ProfileFeature; drift-detection test added
- MLflow tracing added to ml/serving for all agent calls (2026-05-06): ml/serving/mlflow_client.py; activated by MLFLOW_TRACKING_URI=http://mlflow:5000 (default in the compose full profile); requires --profile mlops for the MLflow container. Issue #118 (M4) tracks its removal from the production critical path.
Active work (M2): none; all M2 items are complete (see README.md for M3 planning).
ADR-0014 endpoint map (as of step 6)
| Endpoint | Purpose |
|---|---|
| GET /api/profile | Read-through: user globals + prefs (by scope) + consents + contexts |
| PATCH /api/profile/prefs/:scope | Upsert user_preferences rows (source='user') |
| PATCH /api/profile/consents | Grant / revoke consent keys |
| PATCH /api/profile/contexts | Create / activate / deactivate named contexts |
| GET /api/agents/registry | Manifest list (proxy to ml/serving; 60 s cache) |
| POST /api/agents/:agentId/compute | Internal: run agent compute for (user, agent) |
| POST /agents/{agent_id}/infer (ml/serving) | Run inference framework → {inferred_prefs} |
Inference framework (ADR-0014 §3)
Lives in ml/agents/inference/. run_inference(manifest, history) evaluates all InferredParam entries in the manifest and returns {key: value}. Rules:
- Below min_history → emit cold_start_default
- infer() error → emit cold_start_default (never crashes)
- Results are written to user_preferences with source='inferred'; keys with source='user' are never overwritten
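A minimal sketch of these three rules follows. It assumes an InferredParam carries a key, min_history, cold_start_default, and an infer callable, and it treats the manifest as a plain list of them. The key and infer fields, the dict-based UserHistory, and merge_inferred are hypothetical simplifications, not the code in ml/agents/inference/.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class UserHistory:
    """Simplified: the real fields hold FeedbackEvent / TaskCompletion objects."""
    events: list[dict] = field(default_factory=list)
    task_completions: list[dict] = field(default_factory=list)


@dataclass
class InferredParam:
    key: str
    min_history: int
    cold_start_default: Any
    infer: Callable[[UserHistory], Any]


def run_inference(manifest: list[InferredParam], history: UserHistory) -> dict[str, Any]:
    """Evaluate every InferredParam; never raise, fall back to cold_start_default."""
    out: dict[str, Any] = {}
    for param in manifest:
        if len(history.events) < param.min_history:
            out[param.key] = param.cold_start_default
            continue
        try:
            out[param.key] = param.infer(history)
        except Exception:
            out[param.key] = param.cold_start_default  # an infer() error never crashes the run
    return out


def merge_inferred(
    existing: dict[str, tuple[Any, str]], inferred: dict[str, Any]
) -> dict[str, tuple[Any, str]]:
    """Store values as (value, 'inferred') without touching keys whose source is 'user'."""
    merged = dict(existing)
    for key, value in inferred.items():
        if merged.get(key, (None, ""))[1] == "user":
            continue
        merged[key] = (value, "inferred")
    return merged
```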
All five agents are at v1.2.0. Per-agent inferred params (all live in ml/agents/<name>.py):
| Agent | Inferred params | Notes |
|---|---|---|
| time-of-day | preferred_hour, quiet_start, quiet_end, peak_hours, tz | Quiet window = longest below-baseline hour run; peak = top-quartile done hours; tz cold-start only (from auth provider) |
| momentum | engagement_trend, baseline_completions_per_day, stdev | Baseline = 28d rolling mean done/day; snippet uses z-score language |
| overdue-task | lateness_tolerance_days, project_realness | Tolerance = p50 lateness from TaskCompletion history; realness = project median vs global median |
| recent-patterns | lookback_days, weekly_cycle, daily_cycle | Lookback sized to ≥30 done events; cycles use peak-to-mean ratio; snippet hints when strength > 0.5 |
| focus-area | preferred_areas | Top-2 project IDs by task completion count; semantic clustering via ml/agents/clustering.py in compute() |
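The time-of-day rules in the first row can be made concrete as below. This is an illustrative sketch with several assumptions: the input is 24 hourly done-event counts, "baseline" means the mean hourly count, a quiet run may wrap past midnight, and "top quartile" means the 6 busiest of 24 hours. quiet_window and peak_hours are hypothetical helpers, and the agent's actual thresholds and tie-breaking may differ.

```python
from __future__ import annotations


def quiet_window(counts: list[int]) -> tuple[int, int] | None:
    """Longest run of below-mean hours, allowed to wrap midnight; end hour is exclusive."""
    if len(counts) != 24:
        raise ValueError("expected 24 hourly counts")
    baseline = sum(counts) / 24
    below = [c < baseline for c in counts]
    if not any(below):
        return None  # flat profile: no hour is below the mean
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk the day twice so a run can wrap past midnight. At least one hour is
    # always at or above the mean, so a run never exceeds 23 hours.
    for i in range(48):
        if below[i % 24]:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start % 24, run_len
        else:
            run_len = 0
    return best_start, (best_start + best_len) % 24


def peak_hours(counts: list[int]) -> list[int]:
    """Top-quartile hours, read here as the 6 busiest of 24 (ties go to the earlier hour)."""
    ranked = sorted(range(24), key=lambda h: (-counts[h], h))
    return sorted(ranked[:6])
```

For example, with no completions from 22:00 to 07:00 and steady activity otherwise, quiet_window returns (22, 7), a window that wraps midnight.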
UserHistory carries both events: list[FeedbackEvent] and task_completions: list[TaskCompletion]. AgentInferRequest (ml/serving) accepts task_completions: list[dict] alongside feedback_history.
min_history is checked against len(history.events) (feedback events), not task_completions. Agents that infer from completions should set min_history=0 and guard inside infer().
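For an agent that infers from completions, the pattern in the paragraph above looks roughly like this hedged sketch of the overdue-task tolerance. The due and completed_at keys on each task_completions entry, the 3-completion guard, and the 1-day cold-start value are assumptions. The real agent may compute lateness differently.

```python
from __future__ import annotations

import statistics
from datetime import date

MIN_HISTORY = 0                  # manifest setting: the feedback-event gate is disabled
MIN_COMPLETIONS = 3              # assumed local guard on the data this agent actually uses
COLD_START_TOLERANCE_DAYS = 1.0  # assumed cold_start_default


def infer_lateness_tolerance(task_completions: list[dict]) -> float:
    """p50 of (completed_at - due) in days, over completions that had a due date."""
    lateness = [
        (c["completed_at"] - c["due"]).days
        for c in task_completions
        if c.get("due") is not None
    ]
    if len(lateness) < MIN_COMPLETIONS:
        return COLD_START_TOLERANCE_DAYS  # guard inside infer(), as min_history=0 requires
    return float(statistics.median(lateness))
```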
What NOT to do
- Don't copy Todoist's data into our DB. Store the OAuth token + computed features/derivatives we need, fetch raw on demand.
- Don't implement auth by hand. Auth.js behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
- Don't hardwire a recommender. The contract is POST /recommend → {tip}. Swap internals (multi-agent orchestrator today, future LLM/hybrid variants), but keep the contract.
- Don't hardcode the agent list. The orchestrator is registry-driven (ADR-0014); adding or removing an agent is a manifest change in ml/agents/<id>/, never a recommender edit.
- Don't replace a policy in one step. New policies deploy shadow-first and are promoted only after offline + online agreement with the incumbent (ADR-0002).
- Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
- Don't call LLMs directly from application code. All LLM calls go through ml/serving (Python) via LITELLM_URL. The TS recommender never holds a model name.
- Don't embed MLflow/OpenWebUI in the admin panel. They are external services; link out to them. The admin shell links to o.alogins.net/mlflow and ai.alogins.net.
- Don't call nats.publish() directly from feature code. All publishes go through the in-process Bus (services/api/src/events/bus.ts); the NATS adapter (events/nats.ts) bridges every publish to JetStream when NATS_URL is set. This keeps subscribers, the ring-buffer tail used by the admin event viewer, and JetStream in lockstep.
Admin app
apps/admin rewrites /api/* → $NEXT_PUBLIC_API_URL/api/* via next.config.ts. So apiFetch('/admin/stats') in apps/admin/src/lib/api.ts hits the Express backend, not a Next.js route.
Running tsc --noEmit -p apps/admin/tsconfig.json always reports Cannot find module 'next' errors — expected outside the Next.js build context; use next build for real type errors.
Auth / session pattern
Sessions use an sid cookie. Admin routes stack requireAuth (sets req.userId), then requireAdmin (checks role = 'admin' in the DB). Token-based admin auth: POST /api/auth/token with a { token } body matching the ADMIN_TOKEN env var sets the sid cookie; Playwright and CI use this.