# Architecture overview

## Guiding constraints

- The **recommendation decision** is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
- Modularity lives in **code boundaries**. Deploy topology follows pressure, not anticipation (ADR-0003).
- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see `privacy.md`).

## Modules

| Module | Language | Responsibility | Owns data | Phase-0 process |
|---|---|---|---|---|
| `gateway` | TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith |
| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
| `profile` | TS | user profile, preferences, consents | profiles | Node monolith |
| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
| `events` | TS | event-bus abstraction + durable log | signal store | Node monolith (in-proc emitter, bridges to NATS JetStream when `NATS_URL` set) |
| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** |
| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | separate (from M4) |

Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). `ml/serving` is pre-extracted on language grounds.

## Data boundaries

Each service owns its schema; no cross-service DB access. When `recommender` needs profile data, it calls `profile` (read model), not its DB.

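
The data-boundary rule above can be sketched as a dependency on an interface rather than on storage. This is a minimal illustration only: `ProfileView`, `ProfileReader`, `createInMemoryProfileReader`, and `hasConsent` are hypothetical names, the field layout is assumed, and the call is kept synchronous for brevity (a real read model would likely be asynchronous).

```typescript
// Hypothetical read model published by the `profile` module.
// Field names are illustrative, not the project's actual schema.
interface ProfileView {
  userId: string;
  prefs: Record<string, string>;
  consents: string[];
}

// The only surface `recommender` is allowed to depend on.
interface ProfileReader {
  getProfile(userId: string): ProfileView | undefined;
}

// Stand-in for the profile module. In the monolith this would wrap
// profile's own storage, which stays invisible to callers.
function createInMemoryProfileReader(rows: ProfileView[]): ProfileReader {
  const byId: Record<string, ProfileView> = {};
  for (const row of rows) {
    byId[row.userId] = row;
  }
  return {
    getProfile: (userId: string): ProfileView | undefined => byId[userId],
  };
}

// recommender-side helper: it receives a reader, never a DB handle.
function hasConsent(
  profiles: ProfileReader,
  userId: string,
  consentKey: string
): boolean {
  const profile = profiles.getProfile(userId);
  return profile !== undefined && profile.consents.indexOf(consentKey) !== -1;
}

// Example wiring with assumed sample data.
const reader = createInMemoryProfileReader([
  { userId: "u1", prefs: { tone: "brief" }, consents: ["todoist"] },
]);
const todoistAllowed = hasConsent(reader, "u1", "todoist");
```

Because `hasConsent` only sees `ProfileReader`, swapping the in-memory stand-in for an HTTP client once `profile` is extracted would not change the caller.
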
## Event flow

```
connector (integrations) ──emit──▶ events ──▶ feature pipelines (ml)
                                     │
                                     └──▶ recommender (context assembly)
```

User reactions (done / snooze / dismiss) are events too. They close the loop as rewards for bandit/RL policies.

## Why these choices

- **Modular monolith + Python ML** in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
- **Postgres** for OLTP; per-module schemas in dev; separate databases once modules extract.
- **FastAPI + Pydantic** for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
- **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
- **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
- **MLflow** for model registry and experiment tracking; deployed at `o.alogins.net/mlflow`.
- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
- **Multi-agent recommendation** (ADR-0013) — pre-compute agents emit prompt snippets, an orchestrator LLM produces the tip. Replaced the ε-greedy bandit (ADR-0007/0012) for explainability, cold-start, and decoupling generation from selection.
- **Registry-driven agents + unified Profile** (ADR-0014) — agents are plugins with declared manifests; per-user prefs, contexts, and per-key consents live in shared tables; auto-inferred parameters share a common framework. Adding an agent is a manifest change.
- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.

## AI stack

All LLM inference routes through **LiteLLM** (`llm.alogins.net`) backed by **Ollama** (local, `localhost:11434`).
This means:

- Model aliases (`tip-generator`, `embedder`, `judge`) decouple code from model names.
- Swapping qwen2.5 → llama3.2 = one-line config change in LiteLLM, zero code change in oO.
- Cloud fallback (Anthropic) is opt-in and gated behind `ANTHROPIC_API_KEY` — used only in offline simulation.

**OpenWebUI** (`ai.alogins.net`) is the human-facing interface for prompt iteration and model testing during development.

## Decision flow for a new tip (M2, ADR-0013 + ADR-0014)

```
┌──────────────────────────────────────────────────┐
│ Pre-compute (every 15 min, per registered agent) │
│ ml/agents/ → prompt snippet → agent_outputs      │
│ TTL per manifest; agent_version invalidates      │
└──────────────────────────────────────────────────┘

client ─► gateway ─► recommender (TS)
                       │
                       ├─► profile:  GET /api/profile
                       │             (user, prefs, active context, consents)
                       │
                       ├─► registry: GET /api/agents/registry
                       │             (manifests; eligibility filter inputs)
                       │
                       ├─► outputs:  pull freshest non-expired agent_outputs
                       │             for eligible agents (consents granted,
                       │             not silenced by active context, enabled)
                       │
                       ▼
                     ml/serving (Python)
                       │
                       ├─► assemble: v4-orchestrator prompt
                       │             = global prefs + active context + snippets
                       │
                       ├─► generate: LiteLLM → Ollama → one tip
                       │
                       └─► persist:  tip_scores {tip, contributing agents,
                                     prompt_version, llm_model, latency}
       ◄─ tip
```

**Evolution:**

- **Phase 1 (M1):** candidates from Todoist; ε-greedy bandit scored tasks directly (ADR-0007, ADR-0012). Superseded.
- **Phase 2 early (M2):** LLM-generated candidates ranked by bandit. Superseded mid-milestone.
- **Phase 2 current (M2):** multi-agent pipeline (ADR-0013), registry-driven and registry-extensible (ADR-0014). No bandit; the orchestrator LLM reasons over named agent snippets.

Feedback: `POST /feedback → events.emit(reaction)`. No online ML reward loop (ADR-0013 §Consequences); reactions are logged in `tip_feedback` for observability and potential future supervised learning.
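
The `outputs` step in the decision flow above (eligibility filter, then the freshest non-expired output per agent) could look roughly like the sketch below. It is an assumption-laden illustration: the `AgentManifest`, `AgentOutput`, and `UserState` shapes, their field names, and the `isEligible` / `selectSnippets` helpers are all hypothetical and not taken from the actual registry or `agent_outputs` schema.

```typescript
// Hypothetical manifest shape; the real registry fields may differ.
interface AgentManifest {
  id: string;
  enabled: boolean;
  requiredConsents: string[];
  silencedInContexts: string[];
}

// Hypothetical row from agent_outputs; timestamps are epoch milliseconds.
interface AgentOutput {
  agentId: string;
  snippet: string;
  createdAt: number;
  expiresAt: number;
}

interface UserState {
  consents: string[];
  activeContext: string | null;
}

// Eligible = enabled, all required consents granted,
// and not silenced by the user's active context.
function isEligible(manifest: AgentManifest, user: UserState): boolean {
  if (!manifest.enabled) return false;
  const consented = manifest.requiredConsents.every(
    (key) => user.consents.indexOf(key) !== -1
  );
  const silenced =
    user.activeContext !== null &&
    manifest.silencedInContexts.indexOf(user.activeContext) !== -1;
  return consented && !silenced;
}

// For each eligible agent, keep its most recent output that has not expired.
function selectSnippets(
  manifests: AgentManifest[],
  outputs: AgentOutput[],
  user: UserState,
  now: number
): AgentOutput[] {
  const picked: AgentOutput[] = [];
  for (const manifest of manifests) {
    if (!isEligible(manifest, user)) continue;
    let freshest: AgentOutput | undefined = undefined;
    for (const output of outputs) {
      if (output.agentId !== manifest.id || output.expiresAt <= now) continue;
      if (freshest === undefined || output.createdAt > freshest.createdAt) {
        freshest = output;
      }
    }
    if (freshest !== undefined) picked.push(freshest);
  }
  return picked;
}

// Assumed sample data for illustration.
const manifests: AgentManifest[] = [
  { id: "todoist", enabled: true, requiredConsents: ["todoist"], silencedInContexts: ["vacation"] },
  { id: "calendar", enabled: true, requiredConsents: ["calendar"], silencedInContexts: [] },
  { id: "weather", enabled: false, requiredConsents: [], silencedInContexts: [] },
];
const outputs: AgentOutput[] = [
  { agentId: "todoist", snippet: "old", createdAt: 100, expiresAt: 2000 },
  { agentId: "todoist", snippet: "new", createdAt: 500, expiresAt: 2000 },
  { agentId: "todoist", snippet: "expired", createdAt: 900, expiresAt: 950 },
  { agentId: "calendar", snippet: "meeting", createdAt: 800, expiresAt: 2000 },
  { agentId: "weather", snippet: "rain", createdAt: 800, expiresAt: 2000 },
];
const atWork: UserState = { consents: ["todoist"], activeContext: "work" };
const selected = selectSnippets(manifests, outputs, atWork, 1000);
```

With this sample data only the `todoist` agent passes the filter (no `calendar` consent, `weather` disabled), and its newest output that is still valid at `now = 1000` is the one carrying the `"new"` snippet.
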