Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- simulate/page.tsx: remove launch form — simulations are triggered via
Airflow DAG, not the admin UI. Page now shows run history + links to
Airflow and MLflow only (#109)
- docs.ts: use DOCS_ROOT env var (fallback: ../../docs for local dev) so
the path works in Docker standalone where CWD is /app (#110)
- Dockerfile.admin: copy docs/ into the runner image at /app/docs and set
DOCS_ROOT=/app/docs so listAllDocs() finds the files at runtime (#110)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose: pass ML_SERVING_URL, MLFLOW_URL, AIRFLOW_URL + creds to api service
- docker-compose: pass NEXT_PUBLIC_MLFLOW_URL/AIRFLOW_URL to admin service
- docker-compose: replace wget healthcheck with node fetch (wget not in node image)
- docker-compose: enable Airflow basic_auth API backend; add MLflow pip dep for DAGs
- Dockerfiles: tighten layer caching, add .dockerignore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a NATS JetStream consumer to ml/serving so the feature pipeline
can react to events without the API triggering every read.
- nats_consumer.py: durable push consumers for signals.> and feedback.>
streams; acks on success, naks for redeliver, up to NATS_MAX_DELIVER
attempts; per-consumer health state (last_msg_ts, processed, errors)
- main.py: FastAPI lifespan wires start/stop; /health exposes nats state
- requirements.txt: adds nats-py>=2.9.0
- Dockerfile.ml: copy all *.py from ml/serving (was missing prompts.py)
Handled subjects:
signals.task.synced → writes per-user sync metadata to STATE_DIR
signals.tip.feedback → logged for observability (reward via HTTP path)
Config: NATS_URL (empty = disabled), NATS_DURABLE_PREFIX, NATS_MAX_DELIVER
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Inside the container, llm.alogins.net times out (public-DNS route, not the
loopback path Caddy listens on). host.docker.internal:4000 reaches the Agap
LiteLLM directly and is equivalent for dev. Prod deploys override via env.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ollama and LiteLLM are shared Agap services (agap_git/openai/docker-compose.yml);
oO never starts them. Removes the ai profile, the litellm config, and the
--profile ai runbook; points ml-serving at https://llm.alogins.net by default
and adds host.docker.internal host-gateway so the container can hit Agap ollama
on the host.
Also updates the tip-generator model alias to qwen2.5:1.5b to match the model
actually pulled on Agap ollama (7b is ~4.7 GB and would blow VRAM budget).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Corrects mlflow image tag (2.14.3 → v2.14.3); the former tag does not exist
on ghcr.io/mlflow/mlflow and caused a manifest-unknown error on pull.
- Replaces wget/curl healthchecks with inline python urllib calls — the
python:3.12-slim (ml-serving) and ghcr.io/mlflow/mlflow images ship
neither wget nor curl, so both containers reported unhealthy despite
/health returning 200.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Issue 21 — event infrastructure:
- NormalizedEvent<T> + payload types in packages/shared-types/src/events/
- Bus.onPublish() hook for side-effect bridges
- NATS JetStream adapter (services/api/src/events/nats.ts): connects when
NATS_URL is set, creates signals.> and feedback.> streams, bridges all
in-process bus publishes to JetStream — no-ops gracefully when NATS is absent
- NATS service added to docker-compose (profile: events|full, port 4222/8222)
Issue 22 — Todoist background sync:
- services/api/src/signals/scheduler.ts: queries all active-token users every
15 min (TODOIST_SYNC_INTERVAL_MS), fan-out via todoistSource.fetchSignals()
which emits signals.task.synced; on-demand fetch remains as freshness fallback
- NATS_URL + TODOIST_SYNC_INTERVAL_MS added to config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>