chore: remove Airflow completely from the stack

Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-05-03 16:38:46 +00:00
parent ce1c8bde57
commit f8d66aa01f
27 changed files with 663 additions and 719 deletions

View File

@@ -22,11 +22,19 @@ Two ways to sign in:
| Route | Description |
|-------|-------------|
| `/` | Overview: DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel |
| `/users` | User list (paginated) |
| `/users/:id` | User detail: identity, consents, integrations, profile features (#81 phase B), tip stats, reward history; revoke-integration + reset-bandit + rebuild-profile actions |
| `/audit` | Admin action audit log |
| `/events` | Event stream viewer (stub — pending API history endpoint) |
| `/reward-analytics` | Reaction distribution + per-policy / per-model / per-prompt-version / per-tip-kind breakdowns with avg reward |
| `/users` | User list (paginated, searchable) |
| `/users/:id` | User detail: identity, consents, integrations, profile features (completion rate, dismiss rate, dwell, preferred hour, tip volume), tip stats, reward history; revoke-integration + reset-bandit + rebuild-profile actions |
| `/audit` | Admin action audit log with timestamps and descriptions |
| `/events` | Live event stream viewer with filters by subject/user/time; tail of `signals.*` from ring buffer or NATS JetStream |
| `/features` | Feature store browser: features sent to `ml/serving` per scoring call; freshness status; per-feature SLA tracking |
| `/tips` | Served tips explorer: tip content, score, policy, model, feedback reactions; per-user timeline |
| `/reward-analytics` | Reaction distribution + per-policy / per-model / per-prompt-version breakdowns with avg reward; time-series and cohort slicing |
| `/data-quality` | Missing-feature rate heatmap, stale-token rate, daily completeness, per-feature freshness SLA status |
| `/health` | System health rollup: api, ml/serving, SQLite, event-bus, MLflow with 15s auto-refresh |
| `/sql` | Read-only SQL runner against SQLite; saved queries support; sunsets to Superset in M4 |
| `/simulate` | Offline simulation runner: launch `ml/experiments/sim`, track runs, judge selection, policy comparison |
| `/docs` | Admin documentation and ops runbooks inline |
| `/ops` | Operational dashboard (deprecation candidate; pending UX refinement #107) |
## Dev
@@ -40,8 +48,9 @@ pnpm --filter @oo/admin dev # starts on :3080
Stays as a Next.js app in the monorepo permanently — it's not a candidate for extraction.
It gets richer (more pages, embedded MLflow/Grafana) but not split.
## Known issues
## Known issues & pending improvements
- `@tremor/react 3.x` declares a peer dep on React 18; the workspace uses React 19.
Works in practice. Will resolve naturally when Tremor ships React 19 support or when
we switch to Tremor v4 (which targets React 18+).
- UX refinements pending (#100102): feedback options consolidation, config page UI migration, settings UI placement