Files
oO/services
alvis 0474ad4deb feat(airflow): integrate bench harness into bench_collect DAG
New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks:
1. collect.py — generates candidates, logs to MLflow
2. export_for_judge — exports pending runs for Claude Code scoring
3. compare — generates leaderboard by (model, prompt) cell

Config via dag_run.conf supports all collect.py options (models, prompts,
n_tips, n_scenarios, temperature, experiment name, max_model_b).

New admin API endpoints (`services/api/src/routes/bench.ts`):
- GET /api/bench/experiments — list tip-bench-* experiments
- POST /api/bench/run — trigger DAG with custom config
- GET /api/bench/runs/:experiment — list runs in experiment
- GET /api/bench/leaderboard/:experiment — leaderboard by (model, prompt)

All endpoints require admin auth. Human judge (Claude Code) scores are
applied manually post-export; future enhancement: add webhook to DAG.

Admin UI can now trigger and monitor benchmarks from a dashboard panel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 11:54:30 +00:00
..

services/

Backend modules. Each owns a contract and ships its own README.md. In Phase 0 these are internal packages inside a single Node process (ADR-0003); they extract to their own processes as pressure justifies.

Dir Role Phase-0 shape Extracts when
gateway/ BFF for clients; auth check; fan-out in-proc router never (stays as the edge)
auth/ Google OAuth (Apple in M1), sessions, JWT Auth.js behind OIDC shape mobile native ships (M3)
profile/ user profile, preferences, consents in-proc module team ownership diverges
integrations/ connectors + encrypted token vault in-proc module credential blast-radius isolation
recommender/ POST /recommend — policy-driven tip selection in-proc; calls ml/serving from M1 scaling hotspot
events/ event bus + signal log in-proc emitter; bridges to NATS JetStream when NATS_URL set (ADR-0010) always a library + broker, not a service
notifier/ push/email delivery + quiet hours in-proc; web push in M1 SLA divergence or mobile push scale

Contracts that cross module lines (HTTP or events) come from packages/shared-types/. In-module imports across modules are forbidden by import lint.