New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks:

1. `collect.py`: generates candidates and logs to MLflow
2. `export_for_judge`: exports pending runs for Claude Code scoring
3. `compare`: generates a leaderboard by (model, prompt) cell

Config passed via `dag_run.conf` supports all `collect.py` options (models, prompts, `n_tips`, `n_scenarios`, temperature, experiment name, `max_model_b`).

New admin API endpoints (`services/api/src/routes/bench.ts`):

- `GET /api/bench/experiments`: list `tip-bench-*` experiments
- `POST /api/bench/run`: trigger the DAG with a custom config
- `GET /api/bench/runs/:experiment`: list runs in an experiment
- `GET /api/bench/leaderboard/:experiment`: leaderboard by (model, prompt)

All endpoints require admin auth. Human judge (Claude Code) scores are applied manually after export; a future enhancement is to add a webhook to the DAG.

The Admin UI can now trigger and monitor benchmarks from a dashboard panel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
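The `compare` task and `GET /api/bench/leaderboard/:experiment` both rank results per (model, prompt) cell. The following is a minimal TypeScript sketch of that grouping. It makes three assumptions: each run exposes `model`, `prompt`, and an optional judge `score` (these field names are hypothetical, because the MLflow run schema is not given here); cells are ranked by mean score (also an assumption); and runs still waiting for manual judging are skipped.

```typescript
// Hypothetical shape of one benchmark run; field names are assumptions.
interface BenchRun {
  model: string;
  prompt: string;
  score?: number; // judge score; undefined until the run has been judged
}

interface LeaderboardRow {
  model: string;
  prompt: string;
  n: number; // number of judged runs in the cell
  meanScore: number;
}

// Group judged runs by (model, prompt) and rank cells by mean score, highest first.
function buildLeaderboard(runs: BenchRun[]): LeaderboardRow[] {
  const cells = new Map<string, { model: string; prompt: string; sum: number; n: number }>();
  for (const run of runs) {
    if (run.score === undefined) continue; // still pending a judge score
    // JSON-encode the pair so names containing separators cannot collide.
    const key = JSON.stringify([run.model, run.prompt]);
    const cell = cells.get(key) ?? { model: run.model, prompt: run.prompt, sum: 0, n: 0 };
    cell.sum += run.score;
    cell.n += 1;
    cells.set(key, cell);
  }
  return Array.from(cells.values())
    .map(({ model, prompt, sum, n }) => ({ model, prompt, n, meanScore: sum / n }))
    .sort((a, b) => b.meanScore - a.meanScore);
}
```

A mean hides variance. With only a few runs per cell, returning `n` alongside the mean lets a reader judge how much to trust a ranking.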
services/api
Express BFF that serves all client-facing routes, manages sessions, runs background signal sync, and proxies admin calls to ml/serving.
Contract
- `GET /health` → `{ ok: true }`
- `POST /api/auth/login` → redirect to Google OAuth
- `GET /api/auth/callback`: OAuth return URL
- `POST /api/auth/logout`
- `GET /api/auth/session` → `{ user? }`
- `POST /api/auth/token` `{ token }` → sets `sid` cookie (`ADMIN_TOKEN` auth)
- `GET /api/integrations`: list connected integrations
- `POST /api/integrations/todoist/connect`: start Todoist OAuth
- `GET /api/integrations/todoist/callback`
- `DELETE /api/integrations/:provider`: disconnect
- `POST /api/recommend` → `{ tip }`
- `POST /api/tip/:id/feedback` `{ action }` → `{ ok }`
- `GET /api/user/profile`
- `DELETE /api/user`: account deletion
- `POST /api/push/subscribe`
- `DELETE /api/push/subscribe`
- `GET /api/admin/stats`: DAU/WAU, feedback breakdown
- `GET /api/admin/users`
- `GET /api/admin/events`: recent event stream (ring buffer)
- `GET /api/admin/sim/runs`: offline sim run list
- `POST /api/admin/sim/run`: launch offline sim
- `GET /api/admin/sim/runs/:id/output`: tail sim stdout
- ...
- `GET /api/ml/*`: admin-only proxy to ml/serving
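`/api/ml/*` is an admin-only pass-through to ml/serving. The following is a minimal sketch of the decision that route has to make. It assumes the `/api/ml` prefix is stripped before forwarding, that an earlier layer has already resolved an `isAdmin` flag, and that a non-admin caller gets `403`; none of this is confirmed, because the route code is not shown. The incoming `traceparent` is copied onto the outbound headers, consistent with the tracing behavior described under Observability. The name `planMlProxy` is hypothetical.

```typescript
// Hypothetical input: the parts of an Express request the proxy needs.
interface ProxyRequest {
  originalUrl: string; // full path + query, e.g. "/api/ml/models?limit=5"
  isAdmin: boolean; // assumed to be resolved from the session upstream
  traceparent?: string; // set by tracingMiddleware
}

type ProxyPlan =
  | { kind: "forbidden"; status: 403 }
  | { kind: "forward"; url: string; headers: Record<string, string> };

const ML_PREFIX = "/api/ml";

// Decide whether to forward, and if so build the target URL and headers.
function planMlProxy(req: ProxyRequest, mlServingUrl: string): ProxyPlan {
  if (!req.isAdmin) return { kind: "forbidden", status: 403 };
  if (!req.originalUrl.startsWith(ML_PREFIX + "/")) {
    throw new Error(`not an ${ML_PREFIX}/* path: ${req.originalUrl}`);
  }
  // Keep path and query after the prefix. Plain concatenation preserves any
  // path on the base URL (e.g. ".../v1"); new URL("/x", base) would drop it.
  const rest = req.originalUrl.slice(ML_PREFIX.length);
  const base = mlServingUrl.replace(/\/+$/, "");
  const headers: Record<string, string> = {};
  if (req.traceparent) headers["traceparent"] = req.traceparent;
  return { kind: "forward", url: base + rest, headers };
}
```

The function returns a plan instead of performing the HTTP call, so the routing and authorization rules can be checked without a running ml/serving.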
Middleware stack (request order)
1. `cors`: origin limited to `WEB_BASE_URL`
2. `tracingMiddleware`: reads or generates a W3C `traceparent`; sets `req.traceId` and `req.traceparent`
3. `pinoHttp`: structured JSON request/response logs with a `traceId` field; `/health` suppressed
4. `express.json()` / `cookieParser`
5. `sessionMiddleware`: validates the `sid` cookie, attaches `req.userId`
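`tracingMiddleware` either reuses the caller's `traceparent` or starts a new trace. The following is a minimal sketch of that decision. It follows the W3C Trace Context version `00` layout (`version-traceid-parentid-flags`, where all-zero IDs are invalid). Three details are assumptions, not confirmed behavior of the service: accepting only version `00`, setting the sampled flag (`01`) on new traces, and the name `resolveTraceContext`.

```typescript
import { randomBytes } from "node:crypto";

// W3C Trace Context, version 00: 32-hex trace-id, 16-hex parent-id, 2-hex flags.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

interface TraceContext {
  traceId: string;
  traceparent: string;
}

// Random lowercase-hex id of `bytes` bytes. Retries in the (astronomically
// unlikely) all-zero case, since the spec treats all-zero ids as invalid.
function randomId(bytes: number): string {
  let id: string;
  do {
    id = randomBytes(bytes).toString("hex");
  } while (/^0+$/.test(id));
  return id;
}

// Reuse a valid incoming header; otherwise start a new trace. Node may expose
// an unknown header as string | string[], so callers should normalize first.
function resolveTraceContext(header: string | undefined): TraceContext {
  const m = header === undefined ? null : TRACEPARENT_RE.exec(header.trim());
  if (m && !/^0+$/.test(m[1]) && !/^0+$/.test(m[2])) {
    return { traceId: m[1], traceparent: m[0] };
  }
  const traceId = randomId(16);
  return { traceId, traceparent: `00-${traceId}-${randomId(8)}-01` };
}
```

The middleware would attach the result as `req.traceId` and `req.traceparent`. Because it runs before `pinoHttp`, the request logs can include `traceId`.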
Observability
Logs are structured JSON via pino. Every line includes traceId (extracted from the incoming W3C traceparent header, or generated fresh). The same traceparent is forwarded on all outbound HTTP calls to ml/serving so traces correlate end-to-end.
Sentry error capture is active when SENTRY_DSN is set.
Background tasks
- Todoist sync scheduler: runs every `TODOIST_SYNC_INTERVAL_MS` (default 15 min); starts 10 s after boot to avoid a startup surge.
- Retention purge: deletes `tipScores` and `tipFeedback` rows older than 30 days; runs on boot and daily.
- Profile TTL invalidation: listens to `signals.task.synced` and `signals.tip.feedback` on the in-process Bus; invalidates cached user-level profile features so the next `/recommend` gets fresh values.
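Profile TTL invalidation is a plain publish/subscribe hook. The following is a minimal sketch. It assumes a hypothetical in-process `Bus` with `subscribe`/`publish`, event payloads that carry a `userId`, and a `Map`-backed profile cache keyed by user; the real Bus API and cache are not shown. It also includes the 30-day cutoff that the retention purge would compare row timestamps against, assuming epoch-millisecond timestamps.

```typescript
// Payload shape assumed for both signal subjects.
interface SignalPayload {
  userId: string;
}

type Handler = (payload: SignalPayload) => void;

// Hypothetical minimal in-process bus; the service's actual Bus API may differ.
class Bus {
  private readonly handlers = new Map<string, Handler[]>();

  subscribe(subject: string, handler: Handler): void {
    const list = this.handlers.get(subject) ?? [];
    list.push(handler);
    this.handlers.set(subject, list);
  }

  publish(subject: string, payload: SignalPayload): void {
    // Handlers run synchronously, in subscription order.
    for (const handler of this.handlers.get(subject) ?? []) handler(payload);
  }
}

// Evict a user's cached profile features when tasks sync or feedback arrives,
// so the next /recommend recomputes them.
function wireProfileInvalidation(bus: Bus, profileCache: Map<string, unknown>): void {
  const invalidate: Handler = ({ userId }) => {
    profileCache.delete(userId);
  };
  bus.subscribe("signals.task.synced", invalidate);
  bus.subscribe("signals.tip.feedback", invalidate);
}

const RETENTION_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// tipScores / tipFeedback rows timestamped before this value are purge candidates.
function retentionCutoff(nowMs: number): number {
  return nowMs - RETENTION_MS;
}
```

The handler deletes the cached entry instead of refreshing it. That keeps the handler cheap and moves the recomputation cost to the next `/recommend` call.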
Config
| Env var | Default | Description |
|---|---|---|
| `PORT` | `3001` | Listen port |
| `NODE_ENV` | `development` | Environment label |
| `DATABASE_PATH` | `./data/oo.db` | SQLite file path |
| `SESSION_SECRET` | required | Cookie signing secret |
| `GOOGLE_CLIENT_ID/SECRET` | required | Google OAuth credentials |
| `TODOIST_CLIENT_ID/SECRET` | required | Todoist OAuth credentials |
| `API_BASE_URL` | `http://localhost:3001` | The API's own base URL, used to build OAuth redirect URIs |
| `WEB_BASE_URL` | `http://localhost:3000` | CORS origin and post-login redirect |
| `ML_SERVING_URL` | `http://localhost:8000` | ml/serving base URL |
| `NATS_URL` | (empty) | NATS broker; empty = in-process bus only |
| `TODOIST_SYNC_INTERVAL_MS` | `900000` (15 min) | Background sync cadence |
| `TIP_PROMPT_VERSION` | (empty) | Prompt variant(s) for `/generate` |
| `LOG_LEVEL` | `info` | pino log level |
| `SENTRY_DSN` | (empty) | Sentry DSN; empty = Sentry disabled |
| `VAPID_*` | | Web push keys |
| `ADMIN_TOKEN` | (empty) | Static token for service/Playwright admin auth; empty = disabled |
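The table above maps naturally onto a typed loader that applies defaults and fails fast when a required value is missing. The following is a minimal sketch covering a subset of the variables. The `loadConfig` name and the `ApiConfig` shape are hypothetical. The loader reads from a plain record, so it can be tested without touching `process.env`. It treats empty strings as unset, which matches the "empty = disabled" convention for `NATS_URL`, `SENTRY_DSN`, and `ADMIN_TOKEN`.

```typescript
type Env = Record<string, string | undefined>;

interface ApiConfig {
  port: number;
  nodeEnv: string;
  databasePath: string;
  sessionSecret: string;
  webBaseUrl: string;
  mlServingUrl: string;
  natsUrl?: string; // undefined => in-process bus only
  todoistSyncIntervalMs: number;
  logLevel: string;
  sentryDsn?: string; // undefined => Sentry disabled
  adminToken?: string; // undefined => token auth disabled
}

// Empty strings count as unset.
function opt(env: Env, key: string): string | undefined {
  const v = env[key];
  return v === undefined || v === "" ? undefined : v;
}

function required(env: Env, key: string): string {
  const v = opt(env, key);
  if (v === undefined) throw new Error(`Missing required env var ${key}`);
  return v;
}

function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = opt(env, key);
  if (raw === undefined) return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${key} must be a positive integer, got "${raw}"`);
  return n;
}

function loadConfig(env: Env): ApiConfig {
  return {
    port: positiveInt(env, "PORT", 3001),
    nodeEnv: opt(env, "NODE_ENV") ?? "development",
    databasePath: opt(env, "DATABASE_PATH") ?? "./data/oo.db",
    sessionSecret: required(env, "SESSION_SECRET"),
    webBaseUrl: opt(env, "WEB_BASE_URL") ?? "http://localhost:3000",
    mlServingUrl: opt(env, "ML_SERVING_URL") ?? "http://localhost:8000",
    natsUrl: opt(env, "NATS_URL"),
    todoistSyncIntervalMs: positiveInt(env, "TODOIST_SYNC_INTERVAL_MS", 900_000),
    logLevel: opt(env, "LOG_LEVEL") ?? "info",
    sentryDsn: opt(env, "SENTRY_DSN"),
    adminToken: opt(env, "ADMIN_TOKEN"),
  };
}
```

Calling `loadConfig(process.env)` once at boot means a missing `SESSION_SECRET` fails startup rather than the first login.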
Health story
`GET /health` returns `{ ok: true }` and performs no dependency checks. Upstream dependencies (ml/serving, NATS) expose their own health endpoints, which are checked separately.
Extraction criteria
Extract to its own host when:
- Auth session management needs a dedicated Redis/PG session store, or
- Background sync load (Todoist, future connectors) displaces API serving on the shared host, or
- Team boundary emerges between auth/BFF and recommender orchestration.