oO/CLAUDE.md

# oO — Project Instructions

## What this is

**oO** is a recommendation system for personal tips. It collects signals across a user's life (tasks, habits, calendar, mood, context) to build a rich profile and deliver **one** perfectly-timed tip — an advice or a todo — that feels like magic.

The magic is the product. Precision + timing + minimalism. The UI shows a single black page with one tip. The complexity lives behind it.

## Prime directives

1. **Modular by package, deployable by stage.** Contracts live at package boundaries from day one so extraction to a service is cheap. Deploy topology evolves with real pressure (team size, scaling hotspots, language boundaries), not with wishful architecture. Phase 0 = **modular monolith + Python ML sidecar**. See ADR-0003.
2. **Recommendation engine is the core.** Every other module feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
3. **Python owns ML.** Training, features, online scoring are Python (FastAPI + PyTorch/scikit + MLflow/Feast). Application code is TypeScript (Node, Next.js) unless there's a reason.
4. **OAuth-first for identity and integrations.** Never ask users for passwords or raw API keys when a delegated-auth flow exists. Store provider tokens encrypted, refresh transparently.
5. **Privacy is a feature, not a phase.** Consent capture, token revocation, and account deletion exist from the first real user. Data minimization: store the token + derivatives we need, not the raw feed.
6. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished. The tip page is a watch face.

## Architecture (high level)

The tree below is **logical module structure**. Directory layout is stable; how many processes you deploy is a stage decision (ADR-0003).

```
apps/              user-facing clients
  web/             Next.js PWA — the first shipped client
  mobile-ios/      Swift/SwiftUI (Phase 3)
  mobile-android/  Kotlin/Compose (Phase 3)

services/          backend modules — each owns a contract; may share a deployable
  gateway/         BFF for clients; auth check; fan-out
  auth/            OAuth (Google, Apple, ...), sessions, JWT issuance
  profile/         user profile, preferences, consents
  integrations/    third-party connectors + token vault (Todoist first)
  recommender/     orchestration: candidates → policy → tip; feedback sink
  events/          event bus ingress + durable signal store
  notifier/        push/email/web delivery (web push from Phase 1)

packages/          shared libraries (importable across services + apps)
  shared-types/    HTTP types via OpenAPI; event types via protobuf (ADR-0005)
  sdk-js/          client SDK used by web + mobile webviews
  ui/              shared React components + design tokens

ml/                Python — separate deployable from day one
  serving/         online scorer (FastAPI), called by recommender
  features/        feature definitions + store adapter
  pipelines/       batch feature + training scripts
  registry/        MLflow model registry integration
  experiments/     assignment + A/B + bandit policies
  notebooks/       research only; never imported by production code

infra/             docker-compose (Phase 0), k3s/k8s (later), terraform, CI
docs/              architecture notes, ADRs, API specs
```

**Phase 0 deployables:** one Node process (`services/*` bundled via modular monolith) + one Python process (`ml/serving`, stubbed until M1) + Postgres + NATS. Services **extract to their own process** when a real reason appears: language boundary, scaling hotspot, team ownership, or SLA divergence. See ADR-0003.

## Contracts between modules

- **HTTP** (OpenAPI, in `packages/shared-types/http/`) — synchronous request/response. In-process today; over the network once extracted. Signatures are identical.
- **Events** (Protocol Buffers, in `packages/shared-types/events/`) — durable signals + feedback. Today: in-process `Bus` with a `onPublish` bridge to NATS JetStream when `NATS_URL` is set (ADR-0010). The in-proc bus stays the source of truth — JetStream is the durable mirror that cross-process consumers (`ml/serving`, future feature pipelines) tail. Proto schemas (ADR-0005) live in `packages/shared-types/events/oo/events/v1/`; `buf lint` + `buf breaking` run in CI on every PR touching those files (`.gitea/workflows/buf-check.yaml`).
- Do not redefine types per module. Regenerate from `shared-types`.

## Conventions

- Each module ships a `README.md` describing its contract, its `/health` story, and its extraction criteria (when it should become its own process).
- One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
- ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
- No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
- Compose profiles: `core` (api + web + admin), `full` (adds ml-serving + nats), `mlops` (adds MLflow), `ai` (adds Ollama + LiteLLM). Mix as needed. Always pass `--profile <name>` to `build`/`up` — without a profile, no services are selected and builds silently do nothing.
- Docker rebuild: use `--force-recreate` on `up` when only env vars changed (no image rebuild needed); new env vars in `.env.local` are not picked up by a running container until it is recreated.
- Docker rebuild gotchas:
  - **Never run two `docker compose up --build` at once** — both grab the same `--mount=type=cache,id=pnpm` and deadlock on the API's `pnpm --prod deploy` step. Symptom: build sits silent for hours on `[api builder 8/8]`. Before starting any build, check `ps aux | grep "docker compose"` and kill any prior `up --build` (`kill -9 <pid>` — the wrapper bash and the docker compose binary are separate PIDs; kill the docker compose one).
  - **Don't add `--offline` to `pnpm --prod deploy`** — pnpm's metadata cache (`/root/.cache/pnpm/`) is not in the `/pnpm/store` cache mount, so `--offline` fails with `ERR_PNPM_NO_OFFLINE_META` for transitive devDeps (e.g. vite via vitest). Leave the deploy step network-on; it works.
  - **All TS Dockerfiles need `python3 make g++`** in the base stage — `better-sqlite3` rebuilds natively on install. Missing from `Dockerfile.admin` historically caused `gyp ERR! find Python` failures.
  - **A clean build of `--profile core` takes ~3 min total** when the buildx cache is warm. If it's been silent for >10 min, check for the parallel-build deadlock above before assuming "still going".
- Run Python agent tests: `python3 -m pytest ml/agents/tests/ -x -q` (tests add repo root to `sys.path` themselves).
- Run Python feature tests: `python3 -m pytest ml/features/ -x -q`
- `ml/features/` files are Python mirrors of TS registries — TS is source of truth. Tests parse `registry.ts` with regex to detect drift; follow the same pattern whenever a new field is added to `ProfileFeature`.

## Definition of done (per feature)

1. Code + tests merged.
2. Module's `README.md` updated.
3. If it changes a contract → `shared-types` regenerated + consumers updated.
4. If it changes architecture → ADR added.
5. Deployable via `docker compose up` locally.
6. If it touches user data → a deletion path exists and is tested.

## AI stack

oO generates tips through a multi-agent pipeline (ADR-0013): pre-compute agents emit prompt snippets, an orchestrator LLM assembles them into one tip. All LLM calls route through **LiteLLM** at `llm.alogins.net` using model aliases — swapping models is a config change, not a code change.

| Alias | Model | Used by |
|-------|-------|---------|
| `tip-generator` | qwen2.5:1.5b (default) | `ml/serving` tip generation |
| `embedder` | nomic-embed-text | task clustering, dedup |
| `judge` | claude-haiku-4-5 (cloud, eval only) | offline sim |

Env vars: `LITELLM_URL` (prod `https://llm.alogins.net`), `OLLAMA_URL` (Agap host, `http://host.docker.internal:11434` from containers).

Ollama and LiteLLM are **shared Agap services**, not oO services — they live in `agap_git/openai/docker-compose.yml` along with langfuse (observability). oO never starts them; ml-serving just calls the alias.

All `httpx` calls in `ml/` must use `trust_env=False` to bypass the system proxy — same rule as `bw` and curl. Pattern: `httpx.Client(trust_env=False, timeout=N)`.

MLflow container-to-container calls: always pass `host_header="localhost"` to `MLflowClient` — MLflow's `--allowed-hosts` rejects `Host: mlflow` (the container DNS name) with 403. Auth credential is `MLFLOW_ADMIN_PASSWORD`. MLflow REST API lives at the origin root, not under the `/mlflow` UI prefix.

### MLflow API versions — runs vs traces

MLflow uses **two API versions** — use the right one or you'll get 405:

| What | API prefix | Example |
|------|-----------|---------|
| Runs, experiments, metrics | `/api/2.0/mlflow/` | `runs/search`, `experiments/list` |
| Traces (LLM observability) | `/api/3.0/mlflow/traces/` | `traces/{trace_id}` |

**Experiment IDs:** `3` = oO/serving. Artifacts stored as run tags prefixed `artifact:<path>`.

### Querying from the host shell

Always strip the proxy and pass `Host: localhost` (no port — `localhost:5000` fails the DNS-rebinding check).

```bash
# Search recent runs (experiment 3)
env -u HTTPS_PROXY -u HTTP_PROXY -u ALL_PROXY -u https_proxy -u http_proxy -u all_proxy \
  curl -s -H "Host: localhost" -u "admin:${MLFLOW_ADMIN_PASSWORD}" \
  -X POST http://localhost:5000/api/2.0/mlflow/runs/search \
  -H "Content-Type: application/json" \
  -d '{"experiment_ids":["3"],"max_results":5,"order_by":["start_time DESC"]}'

# Get a trace by ID (note: /api/3.0/, not /api/2.0/)
env -u HTTPS_PROXY -u HTTP_PROXY -u ALL_PROXY -u https_proxy -u http_proxy -u all_proxy \
  curl -s -H "Host: localhost" -u "admin:${MLFLOW_ADMIN_PASSWORD}" \
  http://localhost:5000/api/3.0/mlflow/traces/tr-<trace_id> | python3 -m json.tool
```

The trace response includes `trace_metadata.mlflow.traceInputs/Outputs`, `trace_metadata.mlflow.trace.sizeStats` (num_spans), and `tags.mlflow.traceName`.

### Getting spans (Python client from inside the container)

The REST API has **no endpoint for spans** — `/api/3.0/mlflow/traces/{id}/spans` returns 404. Use the Python client inside `oo-ml-serving-1`:

```bash
docker exec oo-ml-serving-1 python3 -c "
import mlflow, json, os
mlflow.set_tracking_uri('http://mlflow:5000')
os.environ['MLFLOW_TRACKING_USERNAME'] = 'admin'
os.environ['MLFLOW_TRACKING_PASSWORD'] = os.environ.get('MLFLOW_ADMIN_PASSWORD', '')

client = mlflow.tracking.MlflowClient()
trace = client.get_trace('tr-<trace_id>')
for span in trace.data.spans:
    print(span.name, '| parent:', span.parent_id, '| status:', span.status)
    print('  inputs:', json.dumps(span.inputs)[:200])
    print('  outputs:', json.dumps(span.outputs)[:200])
    print('  attrs:', span.attributes)
"
```

### Span structure for a tip generation trace

A healthy `recommend` trace has 3 spans:

| Span | Type | Parent | Key attributes |
|------|------|--------|---------------|
| `recommend` | CHAIN | (root) | `agent_count`, `latency_ms`; inputs include `agent_ids` list |
| `build_context` | TOOL | recommend | `agent_count`, `task_count`, `science_destiny` |
| `llm_orchestrator` | LLM | recommend | `prompt_tokens`, `completion_tokens`, `model`, `attempts` |

### Diagnosing "no agents in trace"

If the trace shows `agent_ids: []` and `agent_count: 0` in the root span, and the orchestrator prompt says *"No pre-computed agent context available"*, it means the recommender found zero eligible snippets at request time. Causes:

1. **Agent compute hasn't run** — no `agent_outputs` rows for this user yet
2. **Snippets expired** — TTL elapsed since last compute
3. **Eligibility filter dropped all agents** — none passed the manifest-driven check

Diagnose with:
```bash
docker exec oo-api-1 psql "$DATABASE_URL" -c \
  "SELECT agent_id, computed_at, expires_at FROM agent_outputs WHERE user_id='<uid>' ORDER BY computed_at DESC LIMIT 10;"
```

**Multi-agent tip generation pipeline (ADR-0013):**
1. Pre-compute agents (`ml/agents/<id>/`) run on a schedule, each emitting a snippet into `agent_outputs` with a per-agent TTL
2. On request, `recommender` (TS) loads the eligible agent set (registry-driven, ADR-0014) and pulls the freshest non-expired snippets
3. `POST /recommend` in `ml/serving` assembles the orchestrator prompt (`v4-orchestrator`) and calls LiteLLM via the `tip-generator` alias
4. Returned tip is logged in `tip_scores` with the contributing agent set; reaction is logged for observability (no bandit reward loop)

## Current phase

**M1 shipped (core + admin). M2 (AI tips) in progress.** See `README.md` for the phase roadmap and `docs/architecture/` for diagrams. Work is tracked as Gitea milestones + issues on `alvis/oO`.

Recent completions:
- ADR-0013 — multi-agent recommendation: pre-computed agent snippets + orchestrator LLM (replaces ε-greedy bandit) — 2026-05-01
- LLM context assembler + tip generation scaffold (#79, #88)
- Model benchmarking for tip generation (#93, #95)
- Admin UX refinements: feedback consolidation, settings placement (#100–102)
- ADR-0012 — ε-greedy v2 (D=12) — 2026-04-26 (now superseded by ADR-0013)
- ADR-0014 complete: unified Profile schema + backfill, manifest plumbing, `/api/profile` read-through, registry-driven eligibility filter, inference framework + per-agent inference, legacy consent column drop — 2026-05-05
- Rich per-agent inference for all four active agents (#112, #114, #115, #116) — 2026-05-06: quiet/peak hours (time-of-day), z-score baseline (momentum), p50 lateness + project realness (overdue-task), adaptive lookback + weekly/daily cycles (recent-patterns)
- Semantic task clustering via nomic-embed-text + focus-area preferred_areas inference (#97, #113) — 2026-05-06: `ml/agents/clustering.py`, focus-area v2.0.0

- Per-user feature freshness SLAs (#61) — 2026-05-06: `invalidated_by` mirrored into `ProfileFeature`; drift-detection test added
- MLflow tracing added to `ml/serving` for all agent calls — 2026-05-06: `ml/serving/mlflow_client.py`; activated by `MLFLOW_TRACKING_URI=http://mlflow:5000` (default in compose `full` profile); requires `--profile mlops` for the MLflow container. Issue #118 (M4) tracks removal from production critical path.

Active work (M2): *(all M2 items complete — see README for M3 planning)*

## ADR-0014 endpoint map (as of step 6)

| Endpoint | Purpose |
|----------|---------|
| `GET /api/profile` | Read-through: user globals + prefs (by scope) + consents + contexts |
| `PATCH /api/profile/prefs/:scope` | Upsert user_preferences rows (source='user') |
| `PATCH /api/profile/consents` | Grant / revoke consent keys |
| `PATCH /api/profile/contexts` | Create / activate / deactivate named contexts |
| `GET /api/agents/registry` | Manifest list (proxy to ml/serving; 60 s cache) |
| `POST /api/agents/:agentId/compute` | Internal: run agent compute for (user, agent) |
| `POST /agents/{agent_id}/infer` *(ml/serving)* | Run inference framework → `{inferred_prefs}` |

## Inference framework (ADR-0014 §3)

Lives in `ml/agents/inference/`. `run_inference(manifest, history)` evaluates all `InferredParam` entries in the manifest and returns `{key: value}`. Rules:
- Below `min_history` → emit `cold_start_default`
- `infer()` error → emit `cold_start_default` (never crashes)
- Results written to `user_preferences` with `source='inferred'`; keys with `source='user'` are never overwritten

All five agents are at v1.2.0. Per-agent inferred params (all live in `ml/agents/<name>.py`):

| Agent | Inferred params | Notes |
|-------|----------------|-------|
| `time-of-day` | `preferred_hour`, `quiet_start`, `quiet_end`, `peak_hours`, `tz` | Quiet window = longest below-baseline hour run; peak = top-quartile done hours; tz cold-start only (from auth provider) |
| `momentum` | `engagement_trend`, `baseline_completions_per_day`, `stdev` | Baseline = 28d rolling mean done/day; snippet uses z-score language |
| `overdue-task` | `lateness_tolerance_days`, `project_realness` | Tolerance = p50 lateness from TaskCompletion history; realness = project median vs global median |
| `recent-patterns` | `lookback_days`, `weekly_cycle`, `daily_cycle` | Lookback sized to ≥30 done events; cycles use peak-to-mean ratio; snippet hints when strength > 0.5 |
| `focus-area` | `preferred_areas` | Top-2 project IDs by task completion count; semantic clustering via `ml/agents/clustering.py` in compute() |

`UserHistory` carries both `events: list[FeedbackEvent]` and `task_completions: list[TaskCompletion]`. `AgentInferRequest` (ml/serving) accepts `task_completions: list[dict]` alongside `feedback_history`.

`min_history` is checked against `len(history.events)` (feedback events), **not** `task_completions`. Agents that infer from completions should set `min_history=0` and guard inside `infer()`.

## What NOT to do

- Don't copy Todoist's data into our DB. Store the OAuth token + computed features/derivatives we need, fetch raw on demand.
- Don't implement auth by hand. Auth.js behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
- Don't hardwire a recommender. The contract is `POST /recommend → {tip}`. Swap internals (multi-agent orchestrator today, future LLM/hybrid variants), keep contract.
- Don't hardcode the agent list. The orchestrator is registry-driven (ADR-0014); adding/removing an agent is a manifest change in `ml/agents/<id>/`, never a recommender edit.
- Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
- Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
- Don't call LLMs directly from application code. All LLM calls go through `ml/serving` (Python) via `LITELLM_URL`. The TS recommender never holds a model name.
- Don't embed MLflow/OpenWebUI in the admin panel. They are external services; link out to them. The admin shell links to `o.alogins.net/mlflow`, `ai.alogins.net`.
- Don't `nats.publish()` directly from feature code. All publishes go through the in-process `Bus` (`services/api/src/events/bus.ts`); the NATS adapter (`events/nats.ts`) bridges every publish to JetStream when `NATS_URL` is set. This keeps subscribers, the ring-buffer tail used by the admin event viewer, and JetStream all in lockstep.

## Admin app

`apps/admin` rewrites `/api/*` → `$NEXT_PUBLIC_API_URL/api/*` via `next.config.ts`. So `apiFetch('/admin/stats')` in `apps/admin/src/lib/api.ts` hits the Express backend, not a Next.js route.

Running `tsc --noEmit -p apps/admin/tsconfig.json` always reports `Cannot find module 'next'` errors — expected outside the Next.js build context; use `next build` for real type errors.

## Auth / session pattern

Sessions use an `sid` cookie. Admin routes stack `requireAuth` (sets `req.userId`) then `requireAdmin` (checks `role = 'admin'` in DB). Token-based admin auth: `POST /api/auth/token` with `{ token }` matching `ADMIN_TOKEN` env var sets the `sid` cookie — used by Playwright and CI.