Infrastructure: - Add `mlops` compose profile: MLflow (basic-auth, /mlflow path) + Airflow (LocalExecutor, /airflow path) + airflow-db - infra/mlflow/basic_auth.ini for MLflow auth config - Caddy routes /mlflow* and /airflow* inside existing o.alogins.net block (see agap_git) - Dockerfile.admin: NEXT_PUBLIC_MLFLOW_URL / NEXT_PUBLIC_AIRFLOW_URL build args (default /mlflow, /airflow) Admin panel: - /admin/models: replace MLflow iframe with external link cards - /admin/experiments: replace LinUCB stats with MLOps hub (links to MLflow experiments/models + Airflow DAGs/datasets) - AdminShell: external nav links for MLflow ↗ and Airflow ↗ under MLOps section Docs & planning: - README: new AI stack section (Ollama/LiteLLM/OpenWebUI three-tier, tip generation pipeline, model aliases) - README: Phase 2 expanded with AI infra issues (#86-#93) and granular pipeline breakdown - README: Phase 4 expanded with LLM MLOps items (#94-#97) - CLAUDE.md: AI stack section, updated current phase (M1 shipped / M2 in progress), compose profiles, updated What NOT to do - docs/architecture/overview.md: AI stack section, updated decision flow diagram for Phase 2 LLM pipeline - ADR-0006: updated to reflect external services (path-based, not embedded) - Gitea issues #86-#97 created (M2: AI infra + pipeline; M4: LLM MLOps) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
88 lines
5.1 KiB
Markdown
88 lines
5.1 KiB
Markdown
# Architecture overview
|
|
|
|
## Guiding constraints
|
|
|
|
- The **recommendation decision** is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
|
|
- Modularity lives in **code boundaries**. Deploy topology follows pressure, not anticipation (ADR-0003).
|
|
- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
|
|
- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see `privacy.md`).
|
|
|
|
## Modules
|
|
|
|
| Module | Language | Responsibility | Owns data | Phase-0 process |
|
|
|---|---|---|---|---|
|
|
| `gateway` | TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith |
|
|
| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
|
|
| `profile` | TS | user profile, preferences, consents | profiles | Node monolith |
|
|
| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
|
|
| `events` | TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) |
|
|
| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
|
|
| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
|
|
| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** |
|
|
| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | separate (from M4) |
|
|
|
|
Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). `ml/serving` is pre-extracted on language grounds.
|
|
|
|
## Data boundaries
|
|
|
|
Each service owns its schema; no cross-service DB access. When `recommender` needs profile data, it calls `profile` (read model), not its DB.
|
|
|
|
## Event flow
|
|
|
|
```
|
|
connector (integrations) ──emit──▶ events ──▶ feature pipelines (ml)
|
|
│
|
|
└──▶ recommender (context assembly)
|
|
```
|
|
|
|
User reactions (done / snooze / dismiss) are events too. They close the loop as rewards for bandit/RL policies.
|
|
|
|
## Why these choices
|
|
|
|
- **Modular monolith + Python ML** in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
|
|
- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
|
|
- **Postgres** for OLTP; per-module schemas in dev; separate databases once modules extract.
|
|
- **FastAPI + Pydantic** for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
|
|
- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
|
|
- **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
|
|
- **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
|
|
- **MLflow** for model registry and experiment tracking; deployed at `o.alogins.net/mlflow`.
|
|
- **Airflow** for batch pipelines; deployed at `o.alogins.net/airflow`.
|
|
- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
|
|
- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.
|
|
|
|
## AI stack
|
|
|
|
All LLM inference routes through **LiteLLM** (`llm.alogins.net`) backed by **Ollama** (local, `localhost:11434`). This means:
|
|
- Model aliases (`tip-generator`, `embedder`, `judge`) decouple code from model names.
|
|
- Swapping qwen2.5 → llama3.2 = one-line config change in LiteLLM, zero code change in oO.
|
|
- Cloud fallback (Anthropic) is opt-in and gated behind `ANTHROPIC_API_KEY` — used only in offline simulation.
|
|
|
|
**OpenWebUI** (`ai.alogins.net`) is the human-facing interface for prompt iteration and model testing during development.
|
|
|
|
## Decision flow for a new tip (Phase 2 target)
|
|
|
|
```
|
|
client ─► gateway ─► recommender (TS)
|
|
│
|
|
▼
|
|
ml/serving (Python)
|
|
│
|
|
├─► context: ml/features/context.py
|
|
│ (tasks + reactions + time patterns → prompt)
|
|
│
|
|
├─► generate: LiteLLM → Ollama
|
|
│ → N TipCandidates {content, kind, model, prompt_version}
|
|
│
|
|
├─► score: bandit policy scores each candidate
|
|
│
|
|
├─► shadows: shadow policies log picks without serving
|
|
│
|
|
└─► persist: tip_scores {candidate, policy, features, latency}
|
|
◄─ best TipCandidate
|
|
```
|
|
|
|
**Phase 1 (current):** candidates come from Todoist task list, no LLM. The bandit scores tasks directly.
|
|
|
|
Feedback: `POST /feedback → events.emit(reaction)` → online bandit update + `prompt_version` tracked for A/B analysis.
|