chore: remove Airflow completely from the stack

Drop all four Airflow containers (db, init, webserver, scheduler) from the mlops compose profile, leaving MLflow as the sole mlops service. Remove AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav links and DAG-run links in the admin UI, the two Airflow DAG files (bench_dag.py, sim_dag.py), and all related docs/ADR references. Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
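For readers unfamiliar with what a "subprocess path" typically looks like, here is a minimal Python sketch. It is not taken from this repository: `run_simulation`, `SimResult`, and the timeout default are hypothetical names and values chosen for illustration, and the real entry point and arguments of the simulation are not shown in this commit.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class SimResult:
    """Outcome of one simulation run (hypothetical container type)."""
    returncode: int
    stdout: str
    stderr: str


def run_simulation(cmd: list[str], timeout_s: float = 600.0) -> SimResult:
    """Run a simulation command as a child process and capture its output.

    A non-zero exit code is returned to the caller instead of raising,
    so the caller decides how to record a failed run.
    """
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output as str
            timeout=timeout_s,
            check=False,          # do not raise on non-zero exit
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return SimResult(-1, "", f"timed out after {timeout_s}s")
    return SimResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Placeholder command; the actual simulation script is an assumption.
    result = run_simulation([sys.executable, "-c", "print('sim ok')"])
    print(result.returncode, result.stdout.strip())
```

Compared with a scheduler-triggered DAG run, a direct child process like this has no external run ID to store, which is consistent with dropping the `airflow_dag_run_id` column.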
20
CLAUDE.md
@@ -42,7 +42,7 @@ packages/ shared libraries (importable across services + apps)
 ml/ Python — separate deployable from day one
 serving/ online scorer (FastAPI), called by recommender
 features/ feature definitions + store adapter
-pipelines/ batch feature + training DAGs (Prefect/Airflow)
+pipelines/ batch feature + training scripts
 registry/ MLflow model registry integration
 experiments/ assignment + A/B + bandit policies
 notebooks/ research only; never imported by production code
@@ -65,7 +65,7 @@ docs/ architecture notes, ADRs, API specs
 - One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
 - ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
 - No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
-- Compose profiles: `core` (api + web + admin), `full` (adds ml-serving), `mlops` (adds MLflow + Airflow), `ai` (adds Ollama + LiteLLM). Mix as needed.
+- Compose profiles: `core` (api + web + admin), `full` (adds ml-serving), `mlops` (adds MLflow), `ai` (adds Ollama + LiteLLM). Mix as needed.
 
 ## Definition of done (per feature)
 
@@ -98,9 +98,19 @@ Ollama and LiteLLM are **shared Agap services**, not oO services — they live i
 
 ## Current phase
 
-**M1 shipped. M2 (AI tips) in progress.** See `README.md` for the phase roadmap and `docs/architecture/` for diagrams. Work is tracked as Gitea milestones + issues on `alvis/oO`.
+**M1 shipped (core + admin). M2 (AI tips) in progress.** See `README.md` for the phase roadmap and `docs/architecture/` for diagrams. Work is tracked as Gitea milestones + issues on `alvis/oO`.
 
-Active work: bandit promotion (#99 — offline sim + ADR-0012 pending) and M2 issues (#61 freshness SLAs, #78 signal abstraction, #93 model benchmark).
+Recent completions (M1 add-on):
+- ADR-0012 — ε-greedy v2 promotion (profile features, D=12) — 2026-04-26
+- Offline sim framework + MLflow integration — shipped in M1 add-on
+- Token-based admin auth for Playwright/CI — secured auth boundary
+
+Active work (M2):
+- Signal abstraction for multi-source support (#78)
+- Per-user feature freshness SLAs (#61, ADR-0011 phase B)
+- LLM context assembler + tip generation scaffold (#79, #88)
+- Model benchmarking for tip generation (#93)
+- Admin UX refinements: feedback consolidation, settings placement (#100–102)
 
 ## What NOT to do
 
@@ -110,7 +120,7 @@ Active work: bandit promotion (#99 — offline sim + ADR-0012 pending) and M2 is
 - Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
 - Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
 - Don't call LLMs directly from application code. All LLM calls go through `ml/serving` (Python) via `LITELLM_URL`. The TS recommender never holds a model name.
-- Don't embed MLflow/Airflow/OpenWebUI in the admin panel. They are external services; link out to them. The admin shell links to `o.alogins.net/mlflow`, `/airflow`, `ai.alogins.net`.
+- Don't embed MLflow/OpenWebUI in the admin panel. They are external services; link out to them. The admin shell links to `o.alogins.net/mlflow`, `ai.alogins.net`.
 - Don't `nats.publish()` directly from feature code. All publishes go through the in-process `Bus` (`services/api/src/events/bus.ts`); the NATS adapter (`events/nats.ts`) bridges every publish to JetStream when `NATS_URL` is set. This keeps subscribers, the ring-buffer tail used by the admin event viewer, and JetStream all in lockstep.
 
 ## Admin app
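The `Bus` rule quoted in the last hunk describes a single publish path that fans out to subscribers, a ring-buffer tail, and an optional bridge. The sketch below illustrates that pattern only; the project's actual `Bus` is TypeScript in `services/api/src/events/bus.ts` and is not shown in this commit. The class shape, `tail_size`, and the `bridge` callback are assumptions, and Python is used here purely for illustration.

```python
from collections import deque
from typing import Callable, Optional

# A handler receives the subject and the event payload.
Handler = Callable[[str, dict], None]


class Bus:
    """Illustrative in-process bus: one publish() feeds every consumer."""

    def __init__(self, tail_size: int = 100, bridge: Optional[Handler] = None):
        self._subscribers: dict[str, list[Handler]] = {}
        # Ring buffer: the oldest events fall off once tail_size is reached.
        self._tail: deque = deque(maxlen=tail_size)
        # Optional bridge, e.g. a function that forwards to a message broker.
        self._bridge = bridge

    def subscribe(self, subject: str, handler: Handler) -> None:
        self._subscribers.setdefault(subject, []).append(handler)

    def publish(self, subject: str, payload: dict) -> None:
        # Every publish takes the same three steps, so the tail,
        # the subscribers, and the bridge cannot drift apart.
        self._tail.append((subject, payload))
        for handler in self._subscribers.get(subject, []):
            handler(subject, payload)
        if self._bridge is not None:
            self._bridge(subject, payload)

    def tail(self) -> list:
        """Return the most recent events, oldest first."""
        return list(self._tail)
```

Feature code that published to the broker directly would skip the first two steps, which is the drift the rule is meant to prevent.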