Files
oO/CLAUDE.md
alvis ad6747c242 feat(profile): /api/profile + eligibility filter + inference framework (ADR-0014 steps 4-6)
Step 4 — /api/profile read-through API:
  GET  /api/profile          → { user, prefs, consents, contexts }
  PATCH /api/profile/prefs/:scope  upsert user_preferences (source='user')
  PATCH /api/profile/consents      grant / revoke consent keys
  PATCH /api/profile/contexts      create / activate / deactivate contexts
  Legacy consentGiven bit folded in as data:core fallback.

Step 5 — registry-driven eligibility filter:
  fetchRegistry() exported from agent-registry.ts.
  profile/eligibility.ts: getEligibleAgentIds(userId) — filters by required
  consents, silenced_in_contexts, and user_preferences[enabled=false].
  fetchOrchestratorTip filters agent_outputs to eligible set before calling
  ml/serving /recommend. Fail-closed: registry unavailable → empty set.

Step 6 — shared context-inference framework (#111) + time-of-day proof (#112):
  ml/agents/inference/: UserHistory, FeedbackEvent, run_inference().
  Framework: cold-start, min_history gating, error fallback, structured logs.
  TimeOfDayAgent v1.1.0: inferred_params=[preferred_hour]; also reads
  quiet_start/quiet_end from agent_prefs. agent_prefs injected by TS caller.
  AgentInput gains agent_prefs field.
  ml/serving: POST /agents/{agent_id}/infer endpoint.
  agent-outputs.ts computeAndStore: loads prefs before compute, calls /infer
  after, persists results (source='inferred'); user overrides never touched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:14:25 +00:00

159 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# oO — Project Instructions
## What this is
**oO** is a recommendation system for personal tips. It collects signals across a user's life (tasks, habits, calendar, mood, context) to build a rich profile and deliver **one** perfectly-timed tip — an advice or a todo — that feels like magic.
The magic is the product. Precision + timing + minimalism. The UI shows a single black page with one tip. The complexity lives behind it.
## Prime directives
1. **Modular by package, deployable by stage.** Contracts live at package boundaries from day one so extraction to a service is cheap. Deploy topology evolves with real pressure (team size, scaling hotspots, language boundaries), not with wishful architecture. Phase 0 = **modular monolith + Python ML sidecar**. See ADR-0003.
2. **Recommendation engine is the core.** Every other module feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
3. **Python owns ML.** Training, features, online scoring are Python (FastAPI + PyTorch/scikit + MLflow/Feast). Application code is TypeScript (Node, Next.js) unless there's a reason.
4. **OAuth-first for identity and integrations.** Never ask users for passwords or raw API keys when a delegated-auth flow exists. Store provider tokens encrypted, refresh transparently.
5. **Privacy is a feature, not a phase.** Consent capture, token revocation, and account deletion exist from the first real user. Data minimization: store the token + derivatives we need, not the raw feed.
6. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished. The tip page is a watch face.
## Architecture (high level)
The tree below is **logical module structure**. Directory layout is stable; how many processes you deploy is a stage decision (ADR-0003).
```
apps/ user-facing clients
web/ Next.js PWA — the first shipped client
mobile-ios/ Swift/SwiftUI (Phase 3)
mobile-android/ Kotlin/Compose (Phase 3)
services/ backend modules — each owns a contract; may share a deployable
gateway/ BFF for clients; auth check; fan-out
auth/ OAuth (Google, Apple, ...), sessions, JWT issuance
profile/ user profile, preferences, consents
integrations/ third-party connectors + token vault (Todoist first)
recommender/ orchestration: candidates → policy → tip; feedback sink
events/ event bus ingress + durable signal store
notifier/ push/email/web delivery (web push from Phase 1)
packages/ shared libraries (importable across services + apps)
shared-types/ HTTP types via OpenAPI; event types via protobuf (ADR-0005)
sdk-js/ client SDK used by web + mobile webviews
ui/ shared React components + design tokens
ml/ Python — separate deployable from day one
serving/ online scorer (FastAPI), called by recommender
features/ feature definitions + store adapter
pipelines/ batch feature + training scripts
registry/ MLflow model registry integration
experiments/ assignment + A/B + bandit policies
notebooks/ research only; never imported by production code
infra/ docker-compose (Phase 0), k3s/k8s (later), terraform, CI
docs/ architecture notes, ADRs, API specs
```
**Phase 0 deployables:** one Node process (`services/*` bundled via modular monolith) + one Python process (`ml/serving`, stubbed until M1) + Postgres + NATS. Services **extract to their own process** when a real reason appears: language boundary, scaling hotspot, team ownership, or SLA divergence. See ADR-0003.
## Contracts between modules
- **HTTP** (OpenAPI, in `packages/shared-types/http/`) — synchronous request/response. In-process today; over the network once extracted. Signatures are identical.
- **Events** (Protocol Buffers, in `packages/shared-types/events/`) — durable signals + feedback. Today: in-process `Bus` with a `onPublish` bridge to NATS JetStream when `NATS_URL` is set (ADR-0010). The in-proc bus stays the source of truth — JetStream is the durable mirror that cross-process consumers (`ml/serving`, future feature pipelines) tail. Proto schemas (ADR-0005) live in `packages/shared-types/events/oo/events/v1/`; `buf lint` + `buf breaking` run in CI on every PR touching those files (`.gitea/workflows/buf-check.yaml`).
- Do not redefine types per module. Regenerate from `shared-types`.
## Conventions
- Each module ships a `README.md` describing its contract, its `/health` story, and its extraction criteria (when it should become its own process).
- One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
- ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
- No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
- Compose profiles: `core` (api + web + admin), `full` (adds ml-serving), `mlops` (adds MLflow), `ai` (adds Ollama + LiteLLM). Mix as needed.
## Definition of done (per feature)
1. Code + tests merged.
2. Module's `README.md` updated.
3. If it changes a contract → `shared-types` regenerated + consumers updated.
4. If it changes architecture → ADR added.
5. Deployable via `docker compose up` locally.
6. If it touches user data → a deletion path exists and is tested.
## AI stack
oO generates tips through a multi-agent pipeline (ADR-0013): pre-compute agents emit prompt snippets, an orchestrator LLM assembles them into one tip. All LLM calls route through **LiteLLM** at `llm.alogins.net` using model aliases — swapping models is a config change, not a code change.
| Alias | Model | Used by |
|-------|-------|---------|
| `tip-generator` | qwen2.5:1.5b (default) | `ml/serving` tip generation |
| `embedder` | nomic-embed-text | task clustering, dedup |
| `judge` | claude-haiku-4-5 (cloud, eval only) | offline sim |
Env vars: `LITELLM_URL` (prod `https://llm.alogins.net`), `OLLAMA_URL` (Agap host, `http://host.docker.internal:11434` from containers).
Ollama and LiteLLM are **shared Agap services**, not oO services — they live in `agap_git/openai/docker-compose.yml` along with langfuse (observability). oO never starts them; ml-serving just calls the alias.
**Multi-agent tip generation pipeline (ADR-0013):**
1. Pre-compute agents (`ml/agents/<id>/`) run on a schedule, each emitting a snippet into `agent_outputs` with a per-agent TTL
2. On request, `recommender` (TS) loads the eligible agent set (registry-driven, ADR-0014) and pulls the freshest non-expired snippets
3. `POST /recommend` in `ml/serving` assembles the orchestrator prompt (`v4-orchestrator`) and calls LiteLLM via the `tip-generator` alias
4. Returned tip is logged in `tip_scores` with the contributing agent set; reaction is logged for observability (no bandit reward loop)
## Current phase
**M1 shipped (core + admin). M2 (AI tips) in progress.** See `README.md` for the phase roadmap and `docs/architecture/` for diagrams. Work is tracked as Gitea milestones + issues on `alvis/oO`.
Recent completions:
- ADR-0013 — multi-agent recommendation: pre-computed agent snippets + orchestrator LLM (replaces ε-greedy bandit) — 2026-05-01
- LLM context assembler + tip generation scaffold (#79, #88)
- Model benchmarking for tip generation (#93, #95)
- Admin UX refinements: feedback consolidation, settings placement (#100102)
- ADR-0012 — ε-greedy v2 (D=12) — 2026-04-26 (now superseded by ADR-0013)
- ADR-0014 steps 16: unified Profile schema + backfill, manifest plumbing, `/api/profile` read-through, registry-driven eligibility filter, inference framework + time-of-day migration — 2026-05-05
Active work (M2):
- ADR-0014 step 7 — per-agent inference: focus-area (#113), momentum (#114), overdue-task (#115), recent-patterns (#116)
- ADR-0014 step 8 — drop `users.consentGiven` column
- Signal abstraction for multi-source support (#78)
- Per-user feature freshness SLAs (#61, ADR-0011 phase B)
## ADR-0014 endpoint map (as of step 6)
| Endpoint | Purpose |
|----------|---------|
| `GET /api/profile` | Read-through: user globals + prefs (by scope) + consents + contexts |
| `PATCH /api/profile/prefs/:scope` | Upsert user_preferences rows (source='user') |
| `PATCH /api/profile/consents` | Grant / revoke consent keys |
| `PATCH /api/profile/contexts` | Create / activate / deactivate named contexts |
| `GET /api/agents/registry` | Manifest list (proxy to ml/serving; 60 s cache) |
| `POST /api/agents/:agentId/compute` | Internal: run agent compute for (user, agent) |
| `POST /agents/{agent_id}/infer` *(ml/serving)* | Run inference framework → `{inferred_prefs}` |
## Inference framework (ADR-0014 §3)
Lives in `ml/agents/inference/`. `run_inference(manifest, history)` evaluates all `InferredParam` entries in the manifest and returns `{key: value}`. Rules:
- Below `min_history` → emit `cold_start_default`
- `infer()` error → emit `cold_start_default` (never crashes)
- Results written to `user_preferences` with `source='inferred'`; keys with `source='user'` are never overwritten
Time-of-day agent (`1.1.0`) is the proof agent (#112): infers `preferred_hour` (mode done-hour) and reads `quiet_start`/`quiet_end` from prefs.
## What NOT to do
- Don't copy Todoist's data into our DB. Store the OAuth token + computed features/derivatives we need, fetch raw on demand.
- Don't implement auth by hand. Auth.js behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
- Don't hardwire a recommender. The contract is `POST /recommend → {tip}`. Swap internals (multi-agent orchestrator today, future LLM/hybrid variants), keep contract.
- Don't hardcode the agent list. The orchestrator is registry-driven (ADR-0014); adding/removing an agent is a manifest change in `ml/agents/<id>/`, never a recommender edit.
- Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
- Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
- Don't call LLMs directly from application code. All LLM calls go through `ml/serving` (Python) via `LITELLM_URL`. The TS recommender never holds a model name.
- Don't embed MLflow/OpenWebUI in the admin panel. They are external services; link out to them. The admin shell links to `o.alogins.net/mlflow`, `ai.alogins.net`.
- Don't `nats.publish()` directly from feature code. All publishes go through the in-process `Bus` (`services/api/src/events/bus.ts`); the NATS adapter (`events/nats.ts`) bridges every publish to JetStream when `NATS_URL` is set. This keeps subscribers, the ring-buffer tail used by the admin event viewer, and JetStream all in lockstep.
## Admin app
`apps/admin` rewrites `/api/*``$NEXT_PUBLIC_API_URL/api/*` via `next.config.ts`. So `apiFetch('/admin/stats')` in `apps/admin/src/lib/api.ts` hits the Express backend, not a Next.js route.
Running `tsc --noEmit -p apps/admin/tsconfig.json` always reports `Cannot find module 'next'` errors — expected outside the Next.js build context; use `next build` for real type errors.
## Auth / session pattern
Sessions use an `sid` cookie. Admin routes stack `requireAuth` (sets `req.userId`) then `requireAdmin` (checks `role = 'admin'` in DB). Token-based admin auth: `POST /api/auth/token` with `{ token }` matching `ADMIN_TOKEN` env var sets the `sid` cookie — used by Playwright and CI.