feat: MLOps external services, AI stack planning, admin MLOps hub
Infrastructure: - Add `mlops` compose profile: MLflow (basic-auth, /mlflow path) + Airflow (LocalExecutor, /airflow path) + airflow-db - infra/mlflow/basic_auth.ini for MLflow auth config - Caddy routes /mlflow* and /airflow* inside existing o.alogins.net block (see agap_git) - Dockerfile.admin: NEXT_PUBLIC_MLFLOW_URL / NEXT_PUBLIC_AIRFLOW_URL build args (default /mlflow, /airflow) Admin panel: - /admin/models: replace MLflow iframe with external link cards - /admin/experiments: replace LinUCB stats with MLOps hub (links to MLflow experiments/models + Airflow DAGs/datasets) - AdminShell: external nav links for MLflow ↗ and Airflow ↗ under MLOps section Docs & planning: - README: new AI stack section (Ollama/LiteLLM/OpenWebUI three-tier, tip generation pipeline, model aliases) - README: Phase 2 expanded with AI infra issues (#86-#93) and granular pipeline breakdown - README: Phase 4 expanded with LLM MLOps items (#94-#97) - CLAUDE.md: AI stack section, updated current phase (M1 shipped / M2 in progress), compose profiles, updated What NOT to do - docs/architecture/overview.md: AI stack section, updated decision flow diagram for Phase 2 LLM pipeline - ADR-0006: updated to reflect external services (path-based, not embedded) - Gitea issues #86-#97 created (M2: AI infra + pipeline; M4: LLM MLOps) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -28,15 +28,16 @@ Same stack as `apps/web`. Reuses `packages/shared-types`, the Auth.js session co
|
||||
| Heavy grids | **TanStack Table v8** | Sortable / paginated / virtualized tables for events, users, tips. |
|
||||
| Extra charts | **Recharts** | Fallback where Tremor falls short (histograms, distributions). |
|
||||
|
||||
### Embed, don't rebuild
|
||||
### Link out, don't embed
|
||||
|
||||
Specialized tooling is **reverse-proxied into the admin shell**, not reimplemented:
|
||||
Specialized MLOps tooling runs as **separate external services** with their own auth, linked from the admin shell — not embedded or reimplemented:
|
||||
|
||||
- **MLflow UI** → `/admin/models` (Caddy sub-path proxy)
|
||||
- **Grafana panels** → `/admin/infra` (iframed or embedded panels)
|
||||
- **MLflow** → `https://o.alogins.net/mlflow` — experiment tracking, model registry, artifact browser; own basic-auth for now; see M3 for SSO consolidation
|
||||
- **Airflow** → `https://o.alogins.net/airflow` — batch pipeline orchestration, dataset management; own web-auth for now
|
||||
- **Grafana panels** → `/admin/infra` (iframed panels) — infra metrics
|
||||
- **Marimo notebooks** → launch-out link from admin
|
||||
|
||||
This prevents reimplementing artifact browsers or graph renderers we'd never do as well.
|
||||
The admin shell links to these services; clicking them opens a new tab. The `/experiments` and `/models` admin pages are hub pages with direct links to the relevant MLflow/Airflow views.
|
||||
|
||||
### AuthZ
|
||||
|
||||
@@ -55,5 +56,7 @@ This prevents reimplementing artifact browsers or graph renderers we'd never do
|
||||
|
||||
- One more Next.js app in the monorepo. Build/dev added to Turborepo.
|
||||
- Tremor + shadcn/ui are added as dependencies. shadcn components are copied into `apps/admin/src/components/ui/` — no runtime version coupling.
|
||||
- MLflow and Grafana must be reachable from the Caddy reverse proxy; they are not embedded in the JS bundle.
|
||||
- MLflow (`o.alogins.net/mlflow*` → port 5000) and Airflow (`o.alogins.net/airflow*` → port 8080) are path-based routes in the existing `o.alogins.net` Caddy block, started via `docker compose --profile mlops up`.
|
||||
- Each service manages its own auth (MLflow: built-in basic-auth; Airflow: built-in web UI auth). M3 will consolidate both behind the shared OIDC provider.
|
||||
- The `NEXT_PUBLIC_MLFLOW_URL` and `NEXT_PUBLIC_AIRFLOW_URL` build args in `Dockerfile.admin` default to the production URLs; override for dev builds.
|
||||
- `admin_actions` audit log grows unboundedly — needs a retention policy before M4.
|
||||
|
||||
@@ -46,21 +46,42 @@ User reactions (done / snooze / dismiss) are events too. They close the loop as
|
||||
- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
|
||||
- **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
|
||||
- **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
|
||||
- **MLflow** for model registry; artifacts in MinIO/S3.
|
||||
- **MLflow** for model registry and experiment tracking; deployed at `o.alogins.net/mlflow`.
|
||||
- **Airflow** for batch pipelines; deployed at `o.alogins.net/airflow`.
|
||||
- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
|
||||
- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.
|
||||
|
||||
## Decision flow for a new tip
|
||||
## AI stack
|
||||
|
||||
All LLM inference routes through **LiteLLM** (`llm.alogins.net`) backed by **Ollama** (local, `localhost:11434`). This means:
|
||||
- Model aliases (`tip-generator`, `embedder`, `judge`) decouple code from model names.
|
||||
- Swapping qwen2.5 → llama3.2 = one-line config change in LiteLLM, zero code change in oO.
|
||||
- Cloud fallback (Anthropic) is opt-in and gated behind `ANTHROPIC_API_KEY` — used only in offline simulation.
|
||||
|
||||
**OpenWebUI** (`ai.alogins.net`) is the human-facing interface for prompt iteration and model testing during development.
|
||||
|
||||
## Decision flow for a new tip (Phase 2 target)
|
||||
|
||||
```
|
||||
client ─► gateway ─► recommender
|
||||
│
|
||||
├─► candidates: integrations.fetchCandidates(user) + advice.library
|
||||
├─► context: FeatureAssembler(user, request)
|
||||
├─► policy: PolicyRegistry.get(policyName).pick(candidates, context)
|
||||
├─► shadows: run shadow policies in parallel, log their picks
|
||||
└─► persist: TipInstance{context_snapshot, policy, tip}
|
||||
◄─ tip
|
||||
client ─► gateway ─► recommender (TS)
|
||||
│
|
||||
▼
|
||||
ml/serving (Python)
|
||||
│
|
||||
├─► context: ml/features/context.py
|
||||
│ (tasks + reactions + time patterns → prompt)
|
||||
│
|
||||
├─► generate: LiteLLM → Ollama
|
||||
│ → N TipCandidates {content, kind, model, prompt_version}
|
||||
│
|
||||
├─► score: bandit policy scores each candidate
|
||||
│
|
||||
├─► shadows: shadow policies log picks without serving
|
||||
│
|
||||
└─► persist: tip_scores {candidate, policy, features, latency}
|
||||
◄─ best TipCandidate
|
||||
```
|
||||
|
||||
Feedback travels back the same path: `POST /feedback → events.emit(feedback.reaction)` → pipelines consume → bandit/model updated on next retrain.
|
||||
**Phase 1 (current):** candidates come from Todoist task list, no LLM. The bandit scores tasks directly.
|
||||
|
||||
Feedback: `POST /feedback → events.emit(reaction)` → online bandit update + `prompt_version` tracked for A/B analysis.
|
||||
|
||||
Reference in New Issue
Block a user