Files
oO/docs/architecture/overview.md
alvis 85367aeaa0 feat: MLOps external services, AI stack planning, admin MLOps hub
Infrastructure:
- Add `mlops` compose profile: MLflow (basic-auth, /mlflow path) + Airflow (LocalExecutor, /airflow path) + airflow-db
- infra/mlflow/basic_auth.ini for MLflow auth config
- Caddy routes /mlflow* and /airflow* inside existing o.alogins.net block (see agap_git)
- Dockerfile.admin: NEXT_PUBLIC_MLFLOW_URL / NEXT_PUBLIC_AIRFLOW_URL build args (default /mlflow, /airflow)

Admin panel:
- /admin/models: replace MLflow iframe with external link cards
- /admin/experiments: replace LinUCB stats with MLOps hub (links to MLflow experiments/models + Airflow DAGs/datasets)
- AdminShell: external nav links for MLflow ↗ and Airflow ↗ under MLOps section

Docs & planning:
- README: new AI stack section (Ollama/LiteLLM/OpenWebUI three-tier, tip generation pipeline, model aliases)
- README: Phase 2 expanded with AI infra issues (#86-#93) and granular pipeline breakdown
- README: Phase 4 expanded with LLM MLOps items (#94-#97)
- CLAUDE.md: AI stack section, updated current phase (M1 shipped / M2 in progress), compose profiles, updated What NOT to do
- docs/architecture/overview.md: AI stack section, updated decision flow diagram for Phase 2 LLM pipeline
- ADR-0006: updated to reflect external services (path-based, not embedded)
- Gitea issues #86-#97 created (M2: AI infra + pipeline; M4: LLM MLOps)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:20:44 +00:00

5.1 KiB

Architecture overview

Guiding constraints

  • The recommendation decision is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
  • Modularity lives in code boundaries. Deploy topology follows pressure, not anticipation (ADR-0003).
  • Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
  • Privacy is a Phase-0 feature, not a Phase-5 compliance project (see privacy.md).

Modules

Module Language Responsibility Owns data Phase-0 process
gateway TS BFF for web/mobile; auth-check; fan-out Node monolith
auth TS OAuth (Google; Apple in M1), sessions, JWT identities, sessions Node monolith
profile TS user profile, preferences, consents profiles Node monolith
integrations TS third-party connectors, token vault, signal fetch credentials, cursors Node monolith
events TS event-bus abstraction + durable log (M1) signal store Node monolith (in-proc emitter)
recommender TS orchestration: candidates → policy → tip; feedback sink tip history Node monolith
notifier TS push/email delivery, quiet hours, dedupe delivery log Node monolith (web push in M1)
ml/serving Python online scoring for policies/models — (stateless) separate process
ml/pipelines Python batch feature + training pipelines feature store, models separate (from M4)

Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). ml/serving is pre-extracted on language grounds.

Data boundaries

Each service owns its schema; no cross-service DB access. When recommender needs profile data, it calls profile (read model), not its DB.

Event flow

connector (integrations) ──emit──▶ events ──▶ feature pipelines (ml)
                                     │
                                     └──▶ recommender (context assembly)

User reactions (done / snooze / dismiss) are events too. They close the loop as rewards for bandit/RL policies.

Why these choices

  • Modular monolith + Python ML in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
  • NATS JetStream over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
  • Postgres for OLTP; per-module schemas in dev; separate databases once modules extract.
  • FastAPI + Pydantic for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
  • Protobuf for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
  • OpenAPI for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
  • Feast for feature store when we get there; homegrown adapter until then (Phase 1 seam).
  • MLflow for model registry and experiment tracking; deployed at o.alogins.net/mlflow.
  • Airflow for batch pipelines; deployed at o.alogins.net/airflow.
  • Auth.js embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
  • k3s as the first step beyond docker-compose — no "compose → full k8s" cliff.

AI stack

All LLM inference routes through LiteLLM (llm.alogins.net) backed by Ollama (local, localhost:11434). This means:

  • Model aliases (tip-generator, embedder, judge) decouple code from model names.
  • Swapping qwen2.5 → llama3.2 = one-line config change in LiteLLM, zero code change in oO.
  • Cloud fallback (Anthropic) is opt-in and gated behind ANTHROPIC_API_KEY — used only in offline simulation.

OpenWebUI (ai.alogins.net) is the human-facing interface for prompt iteration and model testing during development.

Decision flow for a new tip (Phase 2 target)

client ─► gateway ─► recommender (TS)
                          │
                          ▼
                     ml/serving (Python)
                          │
                          ├─► context:    ml/features/context.py
                          │               (tasks + reactions + time patterns → prompt)
                          │
                          ├─► generate:   LiteLLM → Ollama
                          │               → N TipCandidates {content, kind, model, prompt_version}
                          │
                          ├─► score:      bandit policy scores each candidate
                          │
                          ├─► shadows:    shadow policies log picks without serving
                          │
                          └─► persist:    tip_scores {candidate, policy, features, latency}
                          ◄─  best TipCandidate

Phase 1 (current): candidates come from Todoist task list, no LLM. The bandit scores tasks directly.

Feedback: POST /feedback → events.emit(reaction) → online bandit update + prompt_version tracked for A/B analysis.