refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0

- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
2026-04-13 14:36:11 +00:00
parent cf4c7a0eb4
commit 7f173f88d3
13 changed files with 449 additions and 133 deletions
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -3,22 +3,25 @@
 ## Guiding constraints

 - The **recommendation decision** is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
-- Services are small and independently deployable, but we do **not** multiply services for its own sake. Split by team-of-ownership and by data lifecycle.
-- Python for ML, TypeScript for applications, shared contracts regenerated from a single source of truth.
+- Modularity lives in **code boundaries**. Deploy topology follows pressure, not anticipation (ADR-0003).
+- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
+- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see `privacy.md`).

-## Services
+## Modules

-| Service | Language | Responsibility | Owns data |
-|---|---|---|---|
-| `gateway` | TS (Node) | BFF for web/mobile; auth-checking; request fan-out | — |
-| `auth` | TS | OAuth (Google, Apple), sessions, token issuance | identities, sessions |
-| `profile` | TS | user profile, preferences, consents | profiles |
-| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors |
-| `events` | TS | event-bus ingress, normalization, durable log | signal store |
-| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history |
-| `ml/serving` | Python | online scoring for policies/models | — (stateless) |
-| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models |
-| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log |
+| Module | Language | Responsibility | Owns data | Phase-0 process |
+|---|---|---|---|---|
+| `gateway` | TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith |
+| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
+| `profile` | TS | user profile, preferences, consents | profiles | Node monolith |
+| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
+| `events` | TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) |
+| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
+| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
+| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** |
+| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | separate (from M4) |
+
+Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). `ml/serving` is pre-extracted on language grounds.

 ## Data boundaries

@@ -36,9 +39,28 @@ User reactions (done / snooze / dismiss) are events too. They close the loop as

 ## Why these choices

-- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4.
-- **Postgres** everywhere for OLTP. Per-service schemas, not per-service instances in dev.
+- **Modular monolith + Python ML** in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
+- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
+- **Postgres** for OLTP; per-module schemas in dev; separate databases once modules extract.
 - **FastAPI + Pydantic** for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
+- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
+- **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
 - **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
 - **MLflow** for model registry; artifacts in MinIO/S3.
-- **Auth.js or Ory** for identity — we will not write crypto.
+- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
+- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.
+
+## Decision flow for a new tip
+
+```
+client ─► gateway ─► recommender
+                       │
+                       ├─► candidates:   integrations.fetchCandidates(user)  + advice.library
+                       ├─► context:      FeatureAssembler(user, request)
+                       ├─► policy:       PolicyRegistry.get(policyName).pick(candidates, context)
+                       ├─► shadows:      run shadow policies in parallel, log their picks
+                       └─► persist:      TipInstance{context_snapshot, policy, tip}
+                       ◄─  tip
+```
+
+Feedback travels back the same path: `POST /feedback → events.emit(feedback.reaction)` → pipelines consume → bandit/model updated on next retrain.
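
Review note: the decision flow added in this hunk could be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code. `PolicyRegistry`, `pick`, and `TipInstance` are named in the diagram, but every type shape, the `decideTip` function, the `shadowPicks` field, and the `persist` callback are hypothetical. The sketch assumes that a failing shadow policy should never affect the tip that is served.

```typescript
// Hypothetical shapes: only the names PolicyRegistry, pick and TipInstance
// come from the diagram; all fields below are assumptions for illustration.
interface Candidate { id: string; kind: string }
interface Context { userId: string; features: Record<string, number> }

interface Policy {
  name: string;
  pick(candidates: Candidate[], context: Context): Promise<Candidate>;
}

interface TipInstance {
  contextSnapshot: Context;
  policy: string;
  tip: Candidate;
  shadowPicks: Record<string, string>; // shadow policy name -> picked candidate id
}

class PolicyRegistry {
  private policies = new Map<string, Policy>();

  register(policy: Policy): void {
    this.policies.set(policy.name, policy);
  }

  get(name: string): Policy {
    const policy = this.policies.get(name);
    if (!policy) throw new Error(`unknown policy: ${name}`);
    return policy;
  }
}

// Assumed orchestration step: candidates and context are already assembled.
async function decideTip(
  registry: PolicyRegistry,
  policyName: string,
  shadowNames: string[],
  candidates: Candidate[],
  context: Context,
  persist: (tip: TipInstance) => Promise<void>,
): Promise<Candidate> {
  if (candidates.length === 0) throw new Error("no candidates");

  // The live policy decides what the user sees.
  const live = registry.get(policyName).pick(candidates, context);

  // Shadow policies run concurrently; lookup or pick failures are swallowed
  // so they cannot break the live decision.
  const shadowRuns = shadowNames.map(
    async (name): Promise<[string, string] | null> => {
      try {
        const pick = await registry.get(name).pick(candidates, context);
        return [name, pick.id];
      } catch {
        return null;
      }
    },
  );

  const [tip, shadowResults] = await Promise.all([live, Promise.all(shadowRuns)]);

  const shadowPicks: Record<string, string> = {};
  for (const result of shadowResults) {
    if (result) shadowPicks[result[0]] = result[1];
  }

  // Copy the context so later mutation cannot change the stored snapshot.
  const contextSnapshot: Context = JSON.parse(JSON.stringify(context));
  await persist({ contextSnapshot, policy: policyName, tip, shadowPicks });
  return tip;
}
```

Recording the shadow picks next to the served tip is one way to compare policies offline against the same context snapshot; whether they belong on `TipInstance` or in a separate log is a design choice the diff does not settle.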