Issues closed: #86, #87, #88, #89, #90, #91, #79, #80, #82 infra: - docker-compose `ai` profile: Ollama + LiteLLM services - infra/litellm/litellm_config.yaml: tip-generator / embedder / judge aliases - .env.example: LITELLM_URL, LITELLM_MASTER_KEY, OLLAMA_URL ml/serving: - POST /generate: calls LiteLLM tip-generator alias, returns TipCandidate[] - JSON retry loop (2 retries with correction prompt on malformed response) - _parse_llm_json strips markdown fences ml/features: - context.py: build_context() assembles user signals → PromptContext (sorts overdue/high-priority tasks first for LLM prompt quality) shared-types: - TipKind, TipSource, TipCandidate types - Tip gains kind + rationale fields services/api: - recommender: 3-stage pipeline (assemble → score → serve) Stage 1: Todoist tasks + LLM candidates fetched in parallel Stage 2: egreedy bandit scores merged candidate pool Stage 3: serve + log with prompt_version, llm_model, tip_kind - tip_scores: prompt_version, llm_model, tip_kind columns + migrations - config: LITELLM_URL added - integrations: surface token_status in /integrations response tests: - ml/serving/tests/test_generate.py: 13 tests (retry, 502/503, fence variants) - ml/features/test_context.py: 9 tests (sorting, edge cases) - services/api recommender.unit.test.ts: 16 pure-function tests (inferReward, dueAgeDays) - services/api recommender.test.ts: 4 integration tests (tip_scores columns, LLM fallback) - shared-types: TipCandidate, rationale, full TipFeedback action set docs: - ADR-0008: LiteLLM AI gateway decision - overview.md: M2 pipeline description updated - ml/README.md: serving + features roles updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5.4 KiB
Architecture overview
Guiding constraints
- The recommendation decision is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
- Modularity lives in code boundaries. Deploy topology follows pressure, not anticipation (ADR-0003).
- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see
privacy.md).
Modules
| Module | Language | Responsibility | Owns data | Phase-0 process |
|---|---|---|---|---|
gateway |
TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith |
auth |
TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
profile |
TS | user profile, preferences, consents | profiles | Node monolith |
integrations |
TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
events |
TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) |
recommender |
TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
notifier |
TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
ml/serving |
Python | online scoring for policies/models | — (stateless) | separate process |
ml/pipelines |
Python | batch feature + training pipelines | feature store, models | separate (from M4) |
Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). ml/serving is pre-extracted on language grounds.
Data boundaries
Each service owns its schema; no cross-service DB access. When recommender needs profile data, it calls profile (read model), not its DB.
Event flow
connector (integrations) ──emit──▶ events ──▶ feature pipelines (ml)
│
└──▶ recommender (context assembly)
User reactions (done / snooze / dismiss) are events too. They close the loop as rewards for bandit/RL policies.
Why these choices
- Modular monolith + Python ML in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
- NATS JetStream over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
- Postgres for OLTP; per-module schemas in dev; separate databases once modules extract.
- FastAPI + Pydantic for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
- Protobuf for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
- OpenAPI for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
- Feast for feature store when we get there; homegrown adapter until then (Phase 1 seam).
- MLflow for model registry and experiment tracking; deployed at
o.alogins.net/mlflow. - Airflow for batch pipelines; deployed at
o.alogins.net/airflow. - Auth.js embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
- k3s as the first step beyond docker-compose — no "compose → full k8s" cliff.
AI stack
All LLM inference routes through LiteLLM (llm.alogins.net) backed by Ollama (local, localhost:11434). This means:
- Model aliases (
tip-generator,embedder,judge) decouple code from model names. - Swapping qwen2.5 → llama3.2 = one-line config change in LiteLLM, zero code change in oO.
- Cloud fallback (Anthropic) is opt-in and gated behind
ANTHROPIC_API_KEY— used only in offline simulation.
OpenWebUI (ai.alogins.net) is the human-facing interface for prompt iteration and model testing during development.
Decision flow for a new tip (Phase 2 target)
client ─► gateway ─► recommender (TS)
│
▼
ml/serving (Python)
│
├─► context: ml/features/context.py
│ (tasks + reactions + time patterns → prompt)
│
├─► generate: LiteLLM → Ollama
│ → N TipCandidates {content, kind, model, prompt_version}
│
├─► score: bandit policy scores each candidate
│
├─► shadows: shadow policies log picks without serving
│
└─► persist: tip_scores {candidate, policy, features, latency}
◄─ best TipCandidate
Phase 1 (shipped M1): candidates come from Todoist task list, no LLM. The bandit scores tasks directly.
Phase 2 (shipped M2): LLM candidates are generated in parallel with Todoist fetch. Both pools are merged, scored by the bandit, and the winner served. tip_scores tracks prompt_version, llm_model, and tip_kind for every row.
Feedback: POST /feedback → events.emit(reaction) → online bandit update + prompt_version tracked for A/B analysis.