chore: remove Airflow completely from the stack

Drop all four Airflow containers (db, init, webserver, scheduler) from the mlops compose profile, leaving MLflow as the sole mlops service.

Remove AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav links and DAG-run links in the admin UI, the two Airflow DAG files (bench_dag.py, sim_dag.py), and all related docs/ADR references.

Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:38:46 +00:00
parent ce1c8bde57
commit f8d66aa01f
27 changed files with 663 additions and 719 deletions
--- a/apps/admin/README.md
+++ b/apps/admin/README.md
@@ -22,11 +22,19 @@ Two ways to sign in:
 | Route | Description |
 |-------|-------------|
 | `/` | Overview: DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel |
-| `/users` | User list (paginated) |
-| `/users/:id` | User detail: identity, consents, integrations, profile features (#81 phase B), tip stats, reward history; revoke-integration + reset-bandit + rebuild-profile actions |
-| `/audit` | Admin action audit log |
-| `/events` | Event stream viewer (stub — pending API history endpoint) |
-| `/reward-analytics` | Reaction distribution + per-policy / per-model / per-prompt-version / per-tip-kind breakdowns with avg reward |
+| `/users` | User list (paginated, searchable) |
+| `/users/:id` | User detail: identity, consents, integrations, profile features (completion rate, dismiss rate, dwell, preferred hour, tip volume), tip stats, reward history; revoke-integration + reset-bandit + rebuild-profile actions |
+| `/audit` | Admin action audit log with timestamps and descriptions |
+| `/events` | Live event stream viewer with filters by subject/user/time; tail of `signals.*` from ring buffer or NATS JetStream |
+| `/features` | Feature store browser: features sent to `ml/serving` per scoring call; freshness status; per-feature SLA tracking |
+| `/tips` | Served tips explorer: tip content, score, policy, model, feedback reactions; per-user timeline |
+| `/reward-analytics` | Reaction distribution + per-policy / per-model / per-prompt-version breakdowns with avg reward; time-series and cohort slicing |
+| `/data-quality` | Missing-feature rate heatmap, stale-token rate, daily completeness, per-feature freshness SLA status |
+| `/health` | System health rollup: api, ml/serving, SQLite, event-bus, MLflow with 15s auto-refresh |
+| `/sql` | Read-only SQL runner against SQLite; saved queries support; sunsets to Superset in M4 |
+| `/simulate` | Offline simulation runner: launch `ml/experiments/sim`, track runs, judge selection, policy comparison |
+| `/docs` | Admin documentation and ops runbooks inline |
+| `/ops` | Operational dashboard (deprecation candidate; pending UX refinement #107) |

 ## Dev

@@ -40,8 +48,9 @@ pnpm --filter @oo/admin dev   # starts on :3080
 Stays as a Next.js app in the monorepo permanently — it's not a candidate for extraction.
 It gets richer (more pages, embedded MLflow/Grafana) but not split.

-## Known issues
+## Known issues & pending improvements

 - `@tremor/react 3.x` declares a peer dep on React 18; the workspace uses React 19.
  Works in practice. Will resolve naturally when Tremor ships React 19 support or when
  we switch to Tremor v4 (which targets React 18+).
+- UX refinements pending (#100–102): feedback options consolidation, config page UI migration, settings UI placement