docs: update CLAUDE.md with session learnings (#118 tracing, compose gotchas)

- Clarify compose profile requirement for build/up (silent no-op without --profile) - Add --force-recreate pattern for env-var-only changes - Document MLflow host_header and auth gotchas for container-to-container calls - Record MLflow tracing addition and #118 M4 tracking issue Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 10:41:57 +00:00
parent 95e1b342b4
commit c124ff4d24
1 changed files with 5 additions and 1 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -65,7 +65,8 @@ docs/              architecture notes, ADRs, API specs
 - One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
 - ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
 - No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
- Compose profiles: `core` (api + web + admin), `full` (adds ml-serving), `mlops` (adds MLflow), `ai` (adds Ollama + LiteLLM). Mix as needed.
+- Compose profiles: `core` (api + web + admin), `full` (adds ml-serving + nats), `mlops` (adds MLflow), `ai` (adds Ollama + LiteLLM). Mix as needed. Always pass `--profile <name>` to `build`/`up` — without a profile, no services are selected and builds silently do nothing.
+- Docker rebuild: use `--force-recreate` on `up` when only env vars changed (no image rebuild needed); new env vars in `.env.local` are not picked up by a running container until it is recreated.
 - Run Python agent tests: `python3 -m pytest ml/agents/tests/ -x -q` (tests add repo root to `sys.path` themselves).
 - Run Python feature tests: `python3 -m pytest ml/features/ -x -q`
 - `ml/features/` files are Python mirrors of TS registries — TS is source of truth. Tests parse `registry.ts` with regex to detect drift; follow the same pattern whenever a new field is added to `ProfileFeature`.
@@ -95,6 +96,8 @@ Ollama and LiteLLM are **shared Agap services**, not oO services — they live i

 All `httpx` calls in `ml/` must use `trust_env=False` to bypass the system proxy — same rule as `bw` and curl. Pattern: `httpx.Client(trust_env=False, timeout=N)`.

+MLflow container-to-container calls: always pass `host_header="localhost"` to `MLflowClient` — MLflow's `--allowed-hosts` rejects `Host: mlflow` (the container DNS name) with 403. Auth credential is `MLFLOW_ADMIN_PASSWORD`. MLflow REST API lives at the origin root (`/api/2.0/mlflow`), not under the `/mlflow` UI prefix.
+
 **Multi-agent tip generation pipeline (ADR-0013):**
 1. Pre-compute agents (`ml/agents/<id>/`) run on a schedule, each emitting a snippet into `agent_outputs` with a per-agent TTL
 2. On request, `recommender` (TS) loads the eligible agent set (registry-driven, ADR-0014) and pulls the freshest non-expired snippets
@@ -116,6 +119,7 @@ Recent completions:
 - Semantic task clustering via nomic-embed-text + focus-area preferred_areas inference (#97, #113) — 2026-05-06: `ml/agents/clustering.py`, focus-area v2.0.0

 - Per-user feature freshness SLAs (#61) — 2026-05-06: `invalidated_by` mirrored into `ProfileFeature`; drift-detection test added
+- MLflow tracing added to `ml/serving` for all agent calls — 2026-05-06: `ml/serving/mlflow_client.py`; activated by `MLFLOW_TRACKING_URI=http://mlflow:5000` (default in compose `full` profile); requires `--profile mlops` for the MLflow container. Issue #118 (M4) tracks removal from production critical path.

 Active work (M2): *(all M2 items complete — see README for M3 planning)*