refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0

- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (all rights reserved for now, with an OSS plan for Phase 5)
2026-04-13 14:36:11 +00:00
parent cf4c7a0eb4
commit 7f173f88d3
13 changed files with 449 additions and 133 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,66 +8,73 @@ The magic is the product. Precision + timing + minimalism. The UI shows a single

 ## Prime directives

-1. **Modular, service-oriented from day one.** Even the prototype. We will scale to mobile (iOS/Android), many integrations, multi-tenant ML. Shortcuts that bake in a monolith are not acceptable.
-2. **Recommendation engine is the core.** Every other service feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
-3. **Python owns ML.** Everything training, features, serving for models is Python (FastAPI + PyTorch/scikit + MLflow/feast). Application services are TypeScript (Node, Next.js) unless there's a reason.
+1. **Modular by package, deployable by stage.** Contracts live at package boundaries from day one so extraction to a service is cheap. Deploy topology evolves with real pressure (team size, scaling hotspots, language boundaries), not with wishful architecture. Phase 0 = **modular monolith + Python ML sidecar**. See ADR-0003.
+2. **Recommendation engine is the core.** Every other module feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
+3. **Python owns ML.** Training, features, online scoring are Python (FastAPI + PyTorch/scikit + MLflow/Feast). Application code is TypeScript (Node, Next.js) unless there's a reason.
 4. **OAuth-first for identity and integrations.** Never ask users for passwords or raw API keys when a delegated-auth flow exists. Store provider tokens encrypted, refresh transparently.
-5. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished.
+5. **Privacy is a feature, not a phase.** Consent capture, token revocation, and account deletion exist from the first real user. Data minimization: store the token + derivatives we need, not the raw feed.
+6. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished. The tip page is a watch face.

 ## Architecture (high level)

+The tree below is **logical module structure**. Directory layout is stable; how many processes you deploy is a stage decision (ADR-0003).
+
 ```
 apps/              user-facing clients
  web/             Next.js PWA — the first shipped client
  mobile-ios/      Swift/SwiftUI (Phase 3)
  mobile-android/  Kotlin/Compose (Phase 3)

-services/          backend microservices (each independently deployable)
-  gateway/         API gateway + BFF (GraphQL or tRPC)
+services/          backend modules — each owns a contract; may share a deployable
+  gateway/         BFF for clients; auth check; fan-out
  auth/            OAuth (Google, Apple, ...), sessions, JWT issuance
  profile/         user profile, preferences, consents
-  integrations/    third-party connectors (Todoist first); token vault
-  recommender/     Python; serves the "one best tip" decision
-  events/          event bus ingress (Kafka/NATS) + signal store
-  notifier/        push/email/web delivery of tips
+  integrations/    third-party connectors + token vault (Todoist first)
+  recommender/     orchestration: candidates → policy → tip; feedback sink
+  events/          event bus ingress + durable signal store
+  notifier/        push/email/web delivery (web push from Phase 1)

-packages/          shared libraries
-  shared-types/    OpenAPI/proto-generated types
+packages/          shared libraries (importable across services + apps)
+  shared-types/    HTTP types via OpenAPI; event types via protobuf (ADR-0005)
  sdk-js/          client SDK used by web + mobile webviews
  ui/              shared React components + design tokens

-ml/                Python MLOps
-  pipelines/       training / batch feature pipelines (Airflow/Prefect)
-  features/        feature definitions (Feast-style)
-  registry/        model registry (MLflow) integration
-  experiments/     A/B testing framework + bandit policies
-  serving/         online inference service (FastAPI)
-  notebooks/       research only — not production
+ml/                Python — separate deployable from day one
+  serving/         online scorer (FastAPI), called by recommender
+  features/        feature definitions + store adapter
+  pipelines/       batch feature + training DAGs (Prefect/Airflow)
+  registry/        MLflow model registry integration
+  experiments/     assignment + A/B + bandit policies
+  notebooks/       research only; never imported by production code

-infra/             docker-compose, k8s manifests, terraform, CI
+infra/             docker-compose (Phase 0), k3s/k8s (later), terraform, CI
 docs/              architecture notes, ADRs, API specs
 ```

-## Contracts between services
+**Phase 0 deployables:** one Node process (`services/*` bundled as a modular monolith) + one Python process (`ml/serving`, stubbed until M1) + Postgres + NATS. Services **extract to their own process** when a real reason appears: language boundary, scaling hotspot, team ownership, or SLA divergence. See ADR-0003.

-- **Events** (Kafka/NATS) — source of truth for user signals. All integrations emit normalized events; the recommender reads them.
-- **HTTP/gRPC** — synchronous request/response (gateway → services).
-- **Shared schemas** live in `packages/shared-types`; generated from a single OpenAPI / proto source. Do not redefine types per service.
+## Contracts between modules
+
+- **HTTP** (OpenAPI, in `packages/shared-types/http/`) — synchronous request/response. In-process today; over the network once extracted. Signatures are identical.
+- **Events** (Protocol Buffers, in `packages/shared-types/events/`) — durable signals + feedback. Today: in-process event emitter. Tomorrow: NATS JetStream. Schema registry enforced in CI (ADR-0005).
+- Do not redefine types per module. Regenerate from `shared-types`.

 ## Conventions

-- Every service ships a `README.md`, a `Dockerfile`, and a `/health` endpoint.
-- One PR = one concern. Commits follow conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
+- Each module ships a `README.md` describing its contract, its `/health` story, and its extraction criteria (when it should become its own process).
+- One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
 - ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
 - No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
+- Compose profiles (`core`, `full`) so devs can run a subset without 16 GB of RAM.

 ## Definition of done (per feature)

 1. Code + tests merged.
-2. Service's `README.md` updated.
+2. Module's `README.md` updated.
 3. If it changes a contract → `shared-types` regenerated + consumers updated.
 4. If it changes architecture → ADR added.
 5. Deployable via `docker compose up` locally.
+6. If it touches user data → a deletion path exists and is tested.

 ## Current phase

@@ -75,7 +82,9 @@ docs/              architecture notes, ADRs, API specs

 ## What NOT to do

-- Don't copy Todoist's data into our DB. Store the OAuth token; fetch on demand.
-- Don't implement auth by hand. Use a library (NextAuth / Auth.js, Ory, or Clerk-compatible). We will self-host.
+- Don't copy Todoist's data into our DB. Store the OAuth token plus the computed features/derivatives we need; fetch raw data on demand.
+- Don't implement auth by hand. Phase 0 uses **Auth.js** behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
 - Don't hardwire a recommender. The "random todo" v0 must live behind the same interface the real ML model will implement (`POST /recommend` → `{tip}`). Swap internals, keep contract.
+- Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
 - Don't build an admin UI before the user-facing black page is polished.
+- Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
--- a/PLAN.md
+++ b/PLAN.md
@@ -1,71 +1,85 @@
 # Implementation plan

-Step-by-step build order for Phase 0 (prototype) and the seams that make Phases 1–5 cheap.
+Step-by-step build order for Phase 0 (walking skeleton) and the seams that make Phases 1–5 cheap.

-The principle: **build the contracts first, stub the internals.** Every service should exist with a `/health` endpoint and a minimal real implementation of its interface before any service is "finished". This gives us an end-to-end walking skeleton from week one.
+The principle: **build the contracts first, stub the internals.** Every module exposes its contract and a `/health` story before any module is "finished". End-to-end walking skeleton in the first week.
+
+**Packaging reminder (ADR-0003):** Phase 0 is a modular monolith — one Node process bundles `services/*` behind their HTTP contracts, plus `ml/serving` as a separate Python process. Contracts are identical whether the call is in-process or over the wire.

 ---

 ## Stage 0 — Foundations (days 1–3)

-1. **Monorepo tooling.** pnpm workspaces for JS/TS; uv or poetry for Python; turbo or nx for build graph; pre-commit (lint, typecheck, format).
-2. **Docker Compose dev env.** Postgres, NATS, MinIO (S3), Mailhog, all services wired with hot-reload.
-3. **CI skeleton** (Gitea Actions): lint → typecheck → unit test → build → publish images.
-4. **Secrets convention.** `.env.example` per service; prod secrets injected by orchestrator.
-5. **Shared types package.** OpenAPI source → generated TS + Python clients.
+1. **Monorepo tooling.** pnpm workspaces for TS; uv for Python; turbo for build graph; pre-commit (eslint, prettier, ruff, mypy, typecheck).
+2. **Docker Compose dev env** with profiles:
+   - `core` — Node monolith + `ml/serving` stub + Postgres.
+   - `full` — adds NATS, MinIO, MailHog. Needed from Stage 4 onward.
+3. **CI skeleton** (Gitea Actions): lint → typecheck → unit → build → publish images. Schema-registry check for protobuf events: enforced from Phase 1, but stub the pipeline step now.
+4. **Secrets convention.** `.env.example` per module; prod injected by orchestrator.
+5. **Shared types.** OpenAPI for HTTP, protobuf for events (ADR-0005). Generate TS for both and Python for events; hand-write Python pydantic models for HTTP initially (few consumers).
+6. **Import-boundary lint.** `eslint-plugin-boundaries` (or equivalent) prevents `services/integrations` from importing `services/recommender` internals. Contracts-only.
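The policy behind item 6 can be stated as a small predicate, independent of which lint plugin ends up enforcing it. A minimal sketch, assuming (hypothetically) that each module's public surface is `services/<name>/contract.ts` and that import paths are already resolved to repo-relative form; in practice the same rule would live in the plugin's own configuration.

```typescript
import * as path from "node:path";

// Hypothetical convention: a module's only importable surface is services/<name>/contract.ts.
const PUBLIC_ENTRY = "contract.ts";

// Module name for a repo-relative path under services/, or null for anything else.
function serviceOf(file: string): string | null {
  const parts = path.posix.normalize(file).split("/");
  return parts[0] === "services" && parts.length > 2 ? parts[1] : null;
}

// True if `importer` may import `target` (both repo-relative, already resolved).
function isAllowedImport(importer: string, target: string): boolean {
  const from = serviceOf(importer);
  const to = serviceOf(target);
  if (to === null || from === to) return true; // not a service module, or same module
  return path.posix.normalize(target) === `services/${to}/${PUBLIC_ENTRY}`;
}
```

Apps are held to the same rule: `apps/web` may import `services/recommender/contract.ts` but nothing under `services/recommender/src/`.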

-Deliverable: `docker compose up` brings a green dashboard of `/health` endpoints.
+Exit: `docker compose --profile core up` brings a green dashboard of `/health` endpoints.

 ## Stage 1 — Identity & session (days 4–7)

-1. `services/auth`: Google OAuth2 (PKCE), session cookies, short-lived JWTs, refresh rotation. Library-backed (Auth.js or Ory Kratos + Hydra) — we do not roll our own.
-2. `services/profile`: minimal `User` record; created on first sign-in.
-3. `apps/web` sign-in page; gateway verifies JWT.
+1. `services/auth` module: Auth.js embedded in the Node monolith, Google provider only (Apple deferred). OIDC-shaped surface (ADR-0004): `/me`, `/logout`, JWKS, stub `/.well-known/openid-configuration`.
+2. `services/profile` module: `User` row created on first sign-in; consent record captured with ToS/PP version hash.
+3. `apps/web` sign-in page. Gateway (also in-process) verifies JWT.
+4. **Deletion endpoint** (yes, already): `DELETE /me` — revokes sessions, flips `deleted_at`, emits `user.deletion_requested`.
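A minimal sketch of the consent record in item 2, assuming "version hash" means the SHA-256 of the exact ToS and Privacy Policy text shown at sign-up. The `ConsentRecord` shape and the helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the profile module owns the real schema.
interface ConsentRecord {
  userId: string;
  tosHash: string; // SHA-256 of the exact ToS text the user accepted
  privacyHash: string; // SHA-256 of the exact Privacy Policy text
  acceptedAt: Date;
}

// Any edit to a document, however small, yields a new version hash.
function versionHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function recordConsent(userId: string, tosText: string, privacyText: string, now = new Date()): ConsentRecord {
  return { userId, tosHash: versionHash(tosText), privacyHash: versionHash(privacyText), acceptedAt: now };
}

// True when the currently published documents differ from what the user accepted.
function needsReconsent(rec: ConsentRecord, tosText: string, privacyText: string): boolean {
  return rec.tosHash !== versionHash(tosText) || rec.privacyHash !== versionHash(privacyText);
}
```

Comparing hashes on each sign-in gives a cheap "re-consent needed" signal whenever either document changes.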

-Exit check: a user can sign in and fetch their own profile.
+Exit: a user can sign in, see their profile, and delete their account; deletion is observable end-to-end even though there's no data to erase yet.

 ## Stage 2 — Integrations framework (days 8–12)

-1. `services/integrations` with a **Connector** interface:
-   - `begin_oauth(user) → redirect_url`
-   - `finish_oauth(code, state) → StoredCredential`
-   - `fetch_signals(user, since) → Event[]`
-2. **Token vault**: column-level encryption (libsodium), key from env or KMS.
-3. **Todoist connector** as the first concrete implementation.
-4. Web "Connect" page: list of connectors, button per connector, callback handling.
+1. `services/integrations` module with a **Connector** interface:
+   - `beginOAuth(user) → {redirectUrl, state}`
+   - `finishOAuth(code, state) → StoredCredential`
+   - `fetchSignals(user, since?) → AsyncIterable<NormalizedEvent>`
+   - `act?(user, action) → void`
+   - `revoke(user) → void` — first-class; no revocation means no disconnect.
+2. **Token vault**: libsodium sealed box, key from env/KMS. One row per `(user, provider)` with provider-specific `meta` (e.g. Todoist `sync_token`).
+3. **Todoist connector**: OAuth2, Sync API incremental reads via `sync_token`, `act` to complete a task, `revoke` calls Todoist's token-revocation endpoint.
+4. Web `/connect`: list of connectors, per-connector consent screen (scopes + retention), connect/disconnect.
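A sketch of the Connector interface with a fake provider, to show how the `sync_token` cursor in `meta` drives incremental reads. Making every method async, the `RemoteSync` stand-in, and the in-memory maps (in place of the sealed-box vault) are assumptions of this sketch; the `"*"` initial token mirrors the Todoist Sync API's full-sync convention.

```typescript
import { randomUUID } from "node:crypto";

interface User { id: string }
interface StoredCredential { userId: string; provider: string; meta: { syncToken?: string } }
interface NormalizedEvent { kind: string; userId: string; occurredAt: Date; payload: unknown }
type Action = { type: "complete"; externalId: string };

interface Connector {
  beginOAuth(user: User): Promise<{ redirectUrl: string; state: string }>;
  finishOAuth(code: string, state: string): Promise<StoredCredential>;
  fetchSignals(user: User, since?: Date): AsyncIterable<NormalizedEvent>;
  act?(user: User, action: Action): Promise<void>;
  revoke(user: User): Promise<void>;
}

// Hypothetical stand-in for a provider's incremental API: changes after `token`, plus a new token.
type RemoteSync = (token: string) => Promise<{ items: { id: string; content: string }[]; nextToken: string }>;

class FakeIncrementalConnector implements Connector {
  private pending = new Map<string, string>(); // OAuth state -> user id
  private creds = new Map<string, StoredCredential>(); // stands in for the token vault

  constructor(private remote: RemoteSync) {}

  async beginOAuth(user: User) {
    const state = randomUUID(); // unguessable, bound to the user server-side
    this.pending.set(state, user.id);
    return { redirectUrl: `https://provider.example/authorize?state=${state}`, state };
  }

  async finishOAuth(_code: string, state: string) {
    const userId = this.pending.get(state);
    if (!userId) throw new Error("unknown or replayed OAuth state");
    this.pending.delete(state);
    // A real connector exchanges `_code` for tokens here and seals them into the vault.
    const cred: StoredCredential = { userId, provider: "fake", meta: {} };
    this.creds.set(userId, cred);
    return cred;
  }

  async *fetchSignals(user: User): AsyncIterable<NormalizedEvent> {
    const cred = this.creds.get(user.id);
    if (!cred) throw new Error("not connected");
    // "*" requests a full sync the first time; afterwards only deltas come back.
    const { items, nextToken } = await this.remote(cred.meta.syncToken ?? "*");
    for (const item of items) {
      yield { kind: "signals.task.updated", userId: user.id, occurredAt: new Date(), payload: item };
    }
    cred.meta.syncToken = nextToken; // advance the cursor only once the whole batch was consumed
  }

  async revoke(user: User) {
    // A real connector calls the provider's revocation endpoint before erasing local state.
    this.creds.delete(user.id);
  }
}
```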

-Exit check: a user taps "Connect Todoist", completes the OAuth dance, and the integrations service can fetch their tasks on demand.
+Exit: a user can connect and disconnect Todoist; disconnect revokes at Todoist and wipes local credentials.

 ## Stage 3 — Recommender contract (days 13–16)

-1. `services/recommender` exposes `POST /recommend {user_id, context} → {tip}`.
-2. Policy interface (`Policy.pick(user, candidates, context) → tip`).
-3. **`RandomPolicy` v0** — fetches candidates from `integrations` (Todoist tasks), returns one uniformly at random.
-4. Tip shape is provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
-5. `apps/web` tip page: full black, one tip centered, tap = mark done → callback fires to integrations (complete Todoist task) + emits a feedback event.
+1. `services/recommender` module exposes `POST /recommend` and `POST /feedback`.
+2. **Policy registry** keyed by name. **Candidate sources** registered independently; v0 source = `integrations.todoist.tasks`.
+3. **`RandomPolicy` v0** — draws uniformly.
+4. **Tip shape** provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
+5. **`TipInstance` persisted** with `context_snapshot` — the features-seen-at-decision-time blob that makes offline replay possible later.
+6. `apps/web` tip page:
+   - `kind=todo` → tap = done (calls `integrations.todoist.act(complete)`).
+   - `kind=advice` → tap = acknowledge; long-press = save.
+   - Snooze / dismiss via long-press menu regardless of kind.
+   - Every reaction emits a feedback event even though it's in-process today.
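The registry, `RandomPolicy` v0, the persisted `context_snapshot`, and the shadow-policy hook from the seams list fit in a few lines. A sketch under the Tip shape in item 4; `Policy`, `register`, `recommend`, and the camelCase `TipInstance` fields are hypothetical, and persistence is left to the caller.

```typescript
// Tip shape from item 4; everything else here is a hypothetical sketch.
interface Tip {
  id: string;
  kind: "todo" | "advice";
  title: string;
  body: string;
  source: string;
  deep_link: string;
  meta: Record<string, unknown>;
}
type Context = Record<string, unknown>;

interface Policy {
  readonly name: string;
  pick(userId: string, candidates: Tip[], context: Context): Tip | null;
}

// RandomPolicy v0: uniform draw; `rand` is injectable so tests are deterministic.
class RandomPolicy implements Policy {
  readonly name = "random";
  constructor(private rand: () => number = Math.random) {}
  pick(_userId: string, candidates: Tip[]): Tip | null {
    return candidates.length ? candidates[Math.floor(this.rand() * candidates.length)] : null;
  }
}

const registry = new Map<string, Policy>();
const register = (p: Policy) => { registry.set(p.name, p); };

interface TipInstance { userId: string; policyName: string; contextSnapshot: Context; tip: Tip; createdAt: Date }

function recommend(userId: string, candidates: Tip[], context: Context, live: string, shadows: string[] = []) {
  const policy = registry.get(live);
  if (!policy) throw new Error(`unknown policy: ${live}`);
  const tip = policy.pick(userId, candidates, context);
  // Shadow policies see identical inputs; their picks are logged for comparison, never shown.
  const shadowLog = shadows.map((name) => ({
    policyName: name,
    wouldPick: registry.get(name)?.pick(userId, candidates, context)?.id ?? null,
  }));
  // Copy the context as seen at decision time so the decision can be replayed offline.
  const instance: TipInstance | null = tip
    ? { userId, policyName: live, contextSnapshot: JSON.parse(JSON.stringify(context)), tip, createdAt: new Date() }
    : null;
  return { instance, shadowLog };
}
```

Because the gateway only ever names a policy, promoting a shadow to live is a configuration change rather than a code change.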

-Exit check: three-page prototype works end-to-end for one user.
+Exit: three-page prototype works end-to-end.

-## Stage 4 — Hardening the prototype (days 17–20)
+## Stage 4 — Hardening (days 17–20)

-1. Error surfaces (Sentry), structured logs (pino / structlog), trace IDs across services.
-2. Rate limits + retries on outbound API calls.
-3. Integration tests: Playwright for the web flow, pact-style contract tests between services.
-4. Deploy to a single VM via docker-compose + Caddy.
+1. Observability: pino + structlog, Sentry per module, W3C traceparent across the monolith boundary and into `ml/serving`.
+2. Rate limits, retries with jitter, and circuit breakers on outbound (Todoist, Google).
+3. Integration tests: Playwright for the web flow (sign-in → connect → tip → delete). Contract tests between modules so the extractions later are safe.
+4. **Metrics baseline wired** (`docs/architecture/metrics.md`): activation, first-tip reaction, dwell, snooze:dismiss ratio, D1 retention.
+5. Deploy to a single VM via docker-compose + Caddy; Caddy auto-TLS; healthchecks wired to Caddy.

-Exit check: Phase 0 milestone closed.
+Exit: Phase 0 milestone closed; real users can be onboarded.

 ---

-## Seams prepared for later phases (do not implement yet, but do not foreclose)
+## Seams prepared for later phases (designed now, implemented later)

-- **Event bus.** From day one, `integrations` and `recommender` speak through an async fn that today is an in-process call but will be NATS tomorrow. Keep the signature `(event: NormalizedEvent) → void`.
-- **Feature store.** The recommender accepts a `context` blob; later, a feature service fills it. Do not inline feature lookups inside the policy.
-- **Policy registry.** `PolicyFactory.get(name)` so A/B and bandit policies slot in without code changes to the gateway.
-- **Python boundary.** Recommender is TS today, but its scoring function is isolated — moving to FastAPI in Phase 1 is a file move, not a refactor.
+- **Event bus abstraction.** `emit(event)` / `subscribe(topic, handler)` today is in-process; the production implementation in Phase 1 is NATS JetStream. Callsites never change.
+- **Feature assembler.** Recommender accepts a `context` blob from a `FeatureAssembler`; in Phase 0 it returns a hard-coded minimum; in Phase 1 it calls the feature store.
+- **Shadow-policy hook.** The recommender already supports running N policies in shadow per request; v0 runs zero shadows but the hook exists.
+- **Extraction-ready modules.** Every `services/*/` has a `serve.ts` that can be mounted in the monolith or booted standalone. Dockerfile targets both.
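The `emit` / `subscribe` seam can be sketched as an interface plus an in-process implementation. Assumptions: topics match exactly (NATS subject wildcards are not modeled), and `emit` resolves once handlers are scheduled rather than when they finish, roughly like a broker's publish acknowledgement. `InProcessBus` and its `flush()` test helper are hypothetical.

```typescript
// Hypothetical minimal event shape; the real envelope comes from the protobuf schemas.
interface NormalizedEvent { kind: string; userId: string; payload: unknown }
type Handler = (event: NormalizedEvent) => Promise<void> | void;

// Call sites depend only on this interface; a NATS JetStream class would implement it later.
interface EventBus {
  emit(event: NormalizedEvent): Promise<void>;
  subscribe(topic: string, handler: Handler): () => void; // returns an unsubscribe function
}

class InProcessBus implements EventBus {
  private handlers = new Map<string, Set<Handler>>();
  private inFlight = new Set<Promise<void>>();

  subscribe(topic: string, handler: Handler): () => void {
    const set = this.handlers.get(topic) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(topic, set);
    return () => { set.delete(handler); };
  }

  async emit(event: NormalizedEvent): Promise<void> {
    for (const handler of this.handlers.get(event.kind) ?? []) {
      const p: Promise<void> = Promise.resolve()
        .then(() => handler(event))
        .catch((err) => console.error(`handler for ${event.kind} failed`, err)) // producer never sees it
        .finally(() => { this.inFlight.delete(p); });
      this.inFlight.add(p);
    }
  }

  // Test helper: wait until every scheduled handler (including ones they trigger) has settled.
  async flush(): Promise<void> {
    while (this.inFlight.size > 0) await Promise.allSettled([...this.inFlight]);
  }
}
```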

 ---

 ## Staffing assumption

-Work is parallelizable across ~3 streams: **infra/platform**, **backend services**, **web app**. Each Gitea issue notes which stream and which phase (milestone) it belongs to.
+Three parallel streams: **platform** (infra, CI, shared-types), **backend** (auth, profile, integrations, recommender), **web** (sign-in, connect, tip, PWA). `ml` joins in Phase 1. Each Gitea issue carries its stream label and milestone.
--- a/README.md
+++ b/README.md
@@ -69,48 +69,59 @@ docs/        architecture, adr, api

 ## Roadmap

-### Phase 0 — Prototype  *(M0)*
-Goal: a single user can sign in, connect Todoist, and see one random Todoist task on a black page.
-- [ ] Monorepo scaffold, CI skeleton, docker-compose dev env
-- [ ] `auth` service with Google OAuth
-- [ ] `integrations/todoist` OAuth2 flow + encrypted token vault
-- [ ] `recommender` service with `RandomPolicy` (v0)
-- [ ] `apps/web` — three pages (sign-in, connect, tip)
-- [ ] Deploy to a single VM via docker-compose
+### Phase 0 — Walking skeleton  *(M0)*
+Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.
+- [ ] Monorepo scaffold, CI skeleton, docker-compose dev env with `core`/`full` profiles
+- [ ] `auth` on Auth.js with Google provider; OIDC-shaped boundary (ADR-0004)
+- [ ] `integrations/todoist` OAuth2 flow + encrypted token vault + provider-side revocation
+- [ ] `recommender` with `RandomPolicy`; stable `POST /recommend` contract
+- [ ] `apps/web` — three pages (sign-in, connect, tip); PWA manifest; offline reaction queue
+- [ ] ToS + Privacy Policy + consent capture on first sign-in
+- [ ] Account-deletion endpoint: revokes providers, purges credentials, soft-deletes profile
+- [ ] Metrics baseline: activation, first-tip reaction rate, dwell, retention (see `docs/architecture/metrics.md`)
+- [ ] Deploy modular monolith + `ml/serving` stub to a single VM via docker-compose + Caddy

-### Phase 1 — Real signal  *(M1)*
-Goal: the tip is picked, not drawn from a hat. Still Todoist-only.
-- [ ] Event bus (NATS) + ingestion from Todoist sync API
-- [ ] Feature store skeleton (Feast or homegrown) and the first five features (time-of-day, overdue count, task age, priority, project)
-- [ ] `ml/serving` FastAPI scoring endpoint; `recommender` calls it
-- [ ] `ContextualBanditPolicy` v1 (LinUCB) replacing `RandomPolicy`
-- [ ] Tip feedback loop: user reactions (done / snooze / dismiss) become rewards
+### Phase 1 — Real signal + in-the-moment delivery  *(M1)*
+Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.
+- [ ] Event bus (NATS JetStream) with protobuf schemas (ADR-0005) + schema-registry CI gate
+- [ ] Todoist event-driven sync (emit `signals.task.*`)
+- [ ] Feature store skeleton + first five features (hour-of-day, overdue count, task age, priority, project)
+- [ ] `ml/serving` FastAPI scorer; `RemotePolicy` wrapper in recommender
+- [ ] **Global-then-personalize bandit**: pooled LinUCB over shared features, per-user residual when data allows
+- [ ] Shadow-deploy infra: every new policy logs what it *would* have picked; promotion requires reward-parity
+- [ ] Feedback loop: reactions → rewards; delayed rewards for tasks completed in Todoist directly
+- [ ] **Web Push notifications** (VAPID) so the "magic" shows up without opening the app
+- [ ] `notifier` (lite): web-push delivery, quiet-hours honoured, dedupe
+- [ ] Apple OAuth added (deferred from M0)
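For reference, the standard shared-parameter LinUCB rule that a pooled bandit over tip features would start from. Here $x_\tau \in \mathbb{R}^d$ is the feature vector of the tip shown at decision $\tau$, $r_\tau$ its reward, $x_c$ the features of candidate $c$, and $\alpha$ the exploration weight; how reactions map to $r_\tau$ and how the per-user residual is layered on top are not specified here.

$$
A = I_d + \sum_{\tau} x_\tau x_\tau^\top, \qquad
b = \sum_{\tau} r_\tau\, x_\tau, \qquad
\hat\theta = A^{-1} b
$$

$$
\mathrm{score}(c) = \hat\theta^\top x_c + \alpha \sqrt{x_c^\top A^{-1} x_c}, \qquad
\mathrm{tip} = \arg\max_{c \in \mathrm{candidates}} \mathrm{score}(c)
$$

Because $A$ and $b$ are pooled across users, a new user gets informed picks from the first request, and the exploration bonus for a candidate shrinks as reactions on similar feature directions accumulate.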

-### Phase 2 — Multi-source user profile  *(M2)*
-Goal: oO knows more than tasks.
-- [ ] Integrations: Google Calendar, Apple Health (web import), generic webhook
+### Phase 2 — Multi-source profile & trust  *(M2)*
+Goal: oO knows more than tasks, and users can see/control what we know.
+- [ ] Integrations: Google Calendar, Apple Health (web import), generic webhook ingress
 - [ ] Unified `Profile` model (identity, preferences, contexts, consents)
-- [ ] Timing signals (location, idle, focus windows) via client-side probes
-- [ ] Advice library (curated tips, not only todos) + mixing policy
+- [ ] Timing signals (Page Visibility, Idle Detection, coarse location) — opt-in, transparent
+- [ ] Advice library + mixing policy (todo vs advice vs ambient)
+- [ ] User-facing data dashboard: what's stored, what's computed, export, delete-by-category
+- [ ] Cost/usage observability

-### Phase 3 — Mobile & notifications  *(M3)*
+### Phase 3 — Native mobile  *(M3)*
 - [ ] iOS app (SwiftUI) with APNs push
 - [ ] Android app (Compose) with FCM push
-- [ ] `notifier` service with quiet-hours + per-channel rate limits
-- [ ] Rich notifications that deep-link to the tip page
+- [ ] `notifier` gains APNs + FCM channels, per-device rate limits
+- [ ] Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
+- [ ] Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
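Quiet-hours checks are easy to get subtly wrong around midnight and across time zones. A minimal sketch against the `Profile.quiet_hours` and `timezone` fields in the data model, assuming each window is stored as local `"HH:MM"` bounds with `days` naming the weekday the window starts on (0 = Sunday); the function names are hypothetical.

```typescript
// Hypothetical encoding of one Profile.quiet_hours entry.
interface QuietWindow { start: string; end: string; days: number[] }

const WEEKDAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];

// Local weekday (0-6) and minute-of-day for `at` in an IANA time zone.
function localClock(at: Date, timeZone: string): { day: number; minute: number } {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone, weekday: "short", hour: "2-digit", minute: "2-digit", hourCycle: "h23",
  }).formatToParts(at);
  const get = (type: string) => parts.find((p) => p.type === type)!.value;
  return { day: WEEKDAYS.indexOf(get("weekday")), minute: Number(get("hour")) * 60 + Number(get("minute")) };
}

const toMinutes = (hhmm: string) => {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
};

// True if `at` falls inside any window; windows may cross midnight (e.g. 22:00-07:00).
function inQuietHours(windows: QuietWindow[], timeZone: string, at: Date): boolean {
  const { day, minute } = localClock(at, timeZone);
  const yesterday = (day + 6) % 7;
  return windows.some(({ start, end, days }) => {
    const s = toMinutes(start);
    const e = toMinutes(end);
    if (s <= e) return days.includes(day) && minute >= s && minute < e;
    // Overnight: the late part belongs to today's window, the early part to yesterday's.
    return (days.includes(day) && minute >= s) || (days.includes(yesterday) && minute < e);
  });
}
```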

 ### Phase 4 — MLOps at scale  *(M4)*
-- [ ] Airflow/Prefect orchestrator for batch retrains
-- [ ] MLflow model registry + shadow deploys
-- [ ] Online `experiments` framework: A/B + multi-armed bandits as first-class
-- [ ] Cohort analysis + cross-user collaborative features (opt-in)
-- [ ] Model cards, fairness checks, drift monitoring
+- [ ] Prefect/Airflow for batch feature materialization + retraining
+- [ ] MLflow registry; shadow → A/B → launch pipeline as first-class
+- [ ] Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
+- [ ] Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
+- [ ] Drift monitoring (feature drift, prediction drift, reward drift); model cards per version

 ### Phase 5 — Production hardening  *(M5)*
-- [ ] SOC2-style controls, audit logging, token rotation
-- [ ] k8s deploy + horizontal autoscaling
-- [ ] Multi-region failover, PITR backups
-- [ ] Public integration SDK so third parties can add sources
+- [ ] Audit logging, rotation of provider tokens + internal signing keys
+- [ ] **k3s** on existing VM, then k8s + HPA once multi-node justified (no cliff)
+- [ ] Multi-region failover, Postgres PITR, event-bus mirroring
+- [ ] Public integration SDK; sandbox tenancy for third-party connectors
 - [ ] Billing + subscription tiers

 ---
@@ -123,4 +134,5 @@ Conventions and per-service guidance live in [`CLAUDE.md`](CLAUDE.md).

 ## License

-TBD.
+All rights reserved — 2026. Contact the owner for licensing inquiries.
+(We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)
--- a/docs/adr/0003-modular-monolith-phase0.md
+++ b/docs/adr/0003-modular-monolith-phase0.md
@@ -0,0 +1,31 @@
+# ADR-0003: Modular monolith for Phase 0, extract when justified
+
+## Status
+Accepted — 2026-04-13
+
+## Context
+The initial architecture called for seven independently-deployable services on day one (gateway, auth, profile, integrations, recommender, events, notifier). With ~3 work streams and zero users, this is premature. Each service adds CI, deploy, DB, observability, and release-coordination overhead. It also slows the walking skeleton, which is the most important thing to ship.
+
+Modularity — the thing we actually need — is a **code-boundary** property, not a **process-boundary** property. Well-bounded packages extract to services cheaply; poorly-bounded services rarely merge back.
+
+## Decision
+- **Phase 0:** one Node process bundles `services/*` as internal packages behind their HTTP contracts. `ml/serving` is a separate Python process (language boundary). Postgres + NATS complete the stack.
+- **Directory layout** under `services/` is unchanged. Each module is a self-contained package with its own README, schema migrations, and public interface.
+- **Communication** between modules goes through the same HTTP or event contracts it will use post-extraction. In Phase 0 these are resolved in-process via a thin dispatcher; swapping to HTTP/NATS is a transport change, not an API change.
+- **Extraction criteria** (trigger a service split when any apply):
+  1. Language boundary (already true for `ml/serving`).
+  2. Scaling hotspot: the module's load curve diverges materially from the rest.
+  3. SLA divergence: the module needs stricter availability or latency than the monolith.
+  4. Team ownership: a dedicated team takes the module and wants independent releases.
+  5. Regulatory isolation: credentials/PII need tighter blast-radius control.
+- **`events/` is special:** even inside the monolith we use an event-emitter abstraction whose production implementation is NATS JetStream. The async boundary matters for ML correctness; the process boundary doesn't.
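One way the "thin dispatcher" could look, assuming modules address each other by the `METHOD /path` strings from their OpenAPI contracts (e.g. `POST /recommend`). `Transport`, `InProcessTransport`, `HttpTransport`, and `FetchLike` are hypothetical names for illustration.

```typescript
type RouteHandler = (body: unknown) => Promise<unknown>;

interface Transport {
  call<T>(route: string, body: unknown): Promise<T>;
}

// JSON round-trip: in-process calls see exactly the data the wire would carry.
const viaJson = (v: unknown): unknown => JSON.parse(JSON.stringify(v ?? null));

// Phase 0: the route resolves to a handler registered by another module in the same process.
class InProcessTransport implements Transport {
  private routes = new Map<string, RouteHandler>();

  register(route: string, handler: RouteHandler): void {
    if (this.routes.has(route)) throw new Error(`duplicate route ${route}`);
    this.routes.set(route, handler);
  }

  async call<T>(route: string, body: unknown): Promise<T> {
    const handler = this.routes.get(route);
    if (!handler) throw new Error(`no handler for ${route}`);
    return viaJson(await handler(viaJson(body))) as T;
  }
}

// Minimal fetch shape so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// After extraction: identical call sites; the route becomes a real HTTP request.
class HttpTransport implements Transport {
  constructor(private baseUrl: string, private fetchFn: FetchLike) {}

  async call<T>(route: string, body: unknown): Promise<T> {
    const [method, path] = route.split(" ");
    const res = await this.fetchFn(this.baseUrl + path, {
      method,
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${route} failed with HTTP ${res.status}`);
    return (await res.json()) as T;
  }
}
```

The JSON round-trip in the in-process path is deliberate: Phase 0 then behaves like the wire (no shared mutable objects, no `Date` or `Map` values slipping through), so extraction does not surface new bugs.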
+
+## Consequences
+- Faster Phase 0: one CI pipeline, one deploy, one observability config.
+- Cheap extraction: contracts are already HTTP/event-shaped.
+- Discipline required: no cross-module DB access, no reaching into another module's internals, even though it's physically possible. Enforced by lint/import rules.
+- Deploy story: docker-compose with two application containers (Node monolith + Python serving) until extraction begins. Compose profiles let devs bring up subsets.
+
+## Non-consequences
+- We are **not** monolith-forever. We fully expect `integrations/` and `recommender/` to extract once Phase 2+ traffic patterns justify it.
+- Frontend / mobile unaffected.
--- a/docs/adr/0004-auth-authjs-with-oidc-boundary.md
+++ b/docs/adr/0004-auth-authjs-with-oidc-boundary.md
@@ -0,0 +1,23 @@
+# ADR-0004: Auth.js for Phase 0, dedicated OIDC provider when mobile ships
+
+## Status
+Accepted — 2026-04-13
+
+## Context
+We need Google (and later Apple) sign-in, session management, and JWTs other services can verify. Options considered:
+- **Auth.js (NextAuth):** a library embedded in the Next.js web app. Fastest to ship. Tight coupling to the web runtime; awkward when a native mobile client also needs tokens.
+- **Ory Kratos + Hydra:** a standalone, self-hosted identity + OIDC provider. Much more powerful. Operationally heavy for a prototype.
+- **Roll our own:** not considered.
+
+Mobile apps are Phase 3+. Phase 0 needs the cheapest credible option that does not box us in.
+
+## Decision
+- **Phase 0:** use **Auth.js** inside the web app. Google provider only (Apple deferred — paid dev account + extra domain setup).
+- **Boundary:** from day one, the `auth` module exposes an **OIDC-shaped** HTTP surface (`/me`, `/logout`, JWT verification via public JWKS, `/.well-known/openid-configuration` stub). Other services verify JWTs against that surface, not against Auth.js internals. This means the day we replace the engine, only one module changes.
+- **JWT strategy:** short-lived (10 min) access JWT, rotating refresh token in an HttpOnly cookie. JWT contains `sub`, `email`, `scope`, `sid`.
+- **Trigger to migrate to Ory (or equivalent):** any of — (a) native mobile shipping, (b) a second client type that can't piggyback on Next.js sessions, (c) multi-tenant requirement.
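A minimal sketch of what "verify JWTs against the public JWKS" means for a consuming module, using only `node:crypto`. The ADR does not fix the signing algorithm, so RS256 with a `kid` header is an assumption here, as are the function and type names; a maintained JOSE library (which also checks `iss`/`aud`) would normally replace this hand-rolled check.

```typescript
import { createPublicKey, verify } from "node:crypto";
import type { JsonWebKey } from "node:crypto";

// Claims named in this ADR, plus the standard `exp`.
interface AccessClaims { sub: string; email: string; scope: string; sid: string; exp: number }

const b64url = (s: string) => Buffer.from(s, "base64url");

function verifyAccessToken(token: string, jwks: JsonWebKey[], nowSec = Math.floor(Date.now() / 1000)): AccessClaims {
  const [h, p, s] = token.split(".");
  if (!h || !p || !s) throw new Error("malformed token");
  const header = JSON.parse(b64url(h).toString("utf8"));
  if (header.alg !== "RS256") throw new Error("unexpected alg"); // never let the token pick its algorithm
  const jwk = jwks.find((k) => k.kid === header.kid);
  if (!jwk) throw new Error("unknown kid");
  // RS256 = RSASSA-PKCS1-v1_5 over SHA-256 of "<header>.<payload>".
  const ok = verify("sha256", Buffer.from(`${h}.${p}`), createPublicKey({ key: jwk, format: "jwk" }), b64url(s));
  if (!ok) throw new Error("bad signature");
  const claims = JSON.parse(b64url(p).toString("utf8")) as AccessClaims;
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) throw new Error("expired");
  for (const k of ["sub", "sid"] as const) if (!claims[k]) throw new Error(`missing ${k}`);
  return claims;
}
```

Since every module verifies against the published keys rather than calling into Auth.js, swapping the engine later only changes who signs the tokens.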
+
+## Consequences
+- Ships in days, not weeks.
+- The OIDC-shaped boundary means the migration is scoped, not scary.
+- Slight duplication early: we maintain OIDC-surface code that Auth.js mostly handles internally. Worth it.
--- a/docs/adr/0005-event-schemas-protobuf.md
+++ b/docs/adr/0005-event-schemas-protobuf.md
@@ -0,0 +1,28 @@
+# ADR-0005: Protocol Buffers for event schemas, OpenAPI for HTTP
+
+## Status
+Accepted — 2026-04-13
+
+## Context
+Two contract surfaces exist:
+1. **HTTP** — synchronous, client ↔ server, human-readable debugging matters. OpenAPI is the default and generates decent TS clients.
+2. **Events** — durable, fan-out to ML consumers, schema evolution critical. Feature pipelines trained on old schemas will silently misbehave when producers change a field.
+
+Using OpenAPI for both means:
+- Python pydantic generation is awkward and hand-maintained in practice.
+- No wire-format discipline (JSON is loose).
+- No central schema registry, so schema drift is undetected until a model regresses.
+
+## Decision
+- **HTTP** contracts: OpenAPI 3.1 in `packages/shared-types/http/`. Generate TS clients; hand-write Python pydantic models for ML consumers (few, and they're shallow).
+- **Event** contracts: Protocol Buffers in `packages/shared-types/events/`. Generate TS and Python. All events carry an envelope: `{event_id, occurred_at, schema_version, producer, payload}`.
+- **Schema registry:** lightweight self-hosted (buf.build Schema Registry OSS or a tiny registry in `events/`). CI check blocks breaking changes without a version bump.
+- **Evolution rules:** additive only within a major version; `reserved` for removed fields; new `schema_version` for breaking changes; consumers advertise the versions they accept.
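The evolution rules can be illustrated with a toy compatibility check. This is not how the CI gate would be built (a real pipeline would run the open-source `buf` CLI's `buf breaking` against the last released schemas); `MessageSchema`, `breakingChanges`, and `accepts` are hypothetical, and treating `schema_version` as a `"major.minor"` string is an assumption.

```typescript
// Toy model of one protobuf message: field number -> name/type, plus reserved numbers.
type Field = { name: string; type: string };
interface MessageSchema {
  major: number;
  fields: Map<number, Field>;
  reserved: Set<number>;
}

// Violations of "additive only within a major version"; empty means the change is safe.
function breakingChanges(prev: MessageSchema, next: MessageSchema): string[] {
  if (next.major > prev.major) return []; // a new major version is allowed to break
  const problems: string[] = [];
  for (const [num, field] of prev.fields) {
    const now = next.fields.get(num);
    if (!now && !next.reserved.has(num)) {
      problems.push(`field ${num} (${field.name}) removed without being reserved`);
    } else if (now && now.type !== field.type) {
      problems.push(`field ${num} (${field.name}) changed type ${field.type} -> ${now.type}`);
    }
  }
  for (const num of next.fields.keys()) {
    if (prev.reserved.has(num)) problems.push(`field ${num} reuses a reserved number`);
  }
  return problems;
}

// Consumers advertise the major versions they accept.
function accepts(acceptedMajors: number[], schemaVersion: string): boolean {
  return acceptedMajors.includes(Number(schemaVersion.split(".")[0]));
}
```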
+
+## Consequences
+- One extra build step in `shared-types` (buf or protoc).
+- Breaking event changes cost something — good; they should.
+- ML pipelines can replay old events against new code with confidence.
+
+## Non-consequences
+- No gRPC. HTTP stays HTTP/JSON. Protobuf is only the wire format on the event bus.
--- a/docs/architecture/data-model.md
+++ b/docs/architecture/data-model.md
@@ -0,0 +1,87 @@
+# Data model
+
+Durable entities across modules. Per-module databases/schemas own these; cross-module access is only via the module's API.
+
+## Core entities
+
+```
+User                 auth + profile
+  id (uuid)
+  created_at
+  email                        (from IdP)
+  preferred_name?
+  deleted_at?                  soft-delete for 30-day recovery; hard-delete after
+
+IdentityLink         auth
+  user_id
+  provider                     "google" | "apple"
+  provider_sub                 subject from IdP
+  created_at
+
+Session              auth
+  user_id
+  sid (uuid)                   in JWT
+  issued_at
+  expires_at
+  revoked_at?
+
+Profile              profile
+  user_id (pk)
+  timezone
+  quiet_hours                  jsonb: [{start,end,days}]
+  contexts                     jsonb: [{name,predicate}]      introduced in Phase 2
+  consents                     jsonb: {integration: {read,write,retain_days}}
+
+Credential           integrations
+  user_id
+  provider                     "todoist" | "google_calendar" | ...
+  ciphertext                   sealed-box over {access, refresh, scopes, expires_at}
+  meta                         provider-specific (sync_token cursor for Todoist)
+  created_at
+  last_refreshed_at
+  revoked_at?
+
+Event                events
+  event_id (ulid)
+  user_id
+  schema_version
+  kind                         e.g. "signals.task.updated"
+  occurred_at
+  ingested_at
+  payload                      protobuf bytes
+
+TipInstance          recommender
+  tip_id (ulid)
+  user_id
+  policy_name                  "random" | "bandit.linucb" | "remote:v3"
+  policy_version
+  candidate_source             "todoist" | "advice.library" | ...
+  context_snapshot             jsonb: features seen at decision time
+  tip                          jsonb: {kind,title,body,source,deep_link,meta}
+  created_at
+  shown_at?                    set when the client reports render
+  reaction?                    "done" | "snooze" | "dismiss" | null
+  reacted_at?
+  delivery_id?                 soft ref to notifier Delivery if surfaced via push (no cross-module FK)
+
+Delivery             notifier
+  delivery_id
+  user_id
+  tip_id
+  channel                      "webpush" | "apns" | "fcm" | "email"
+  dispatched_at
+  delivered_at?
+  failure_reason?
+```
+
+## Foreign-key discipline
+
+There are no cross-module FKs. Each module owns its tables. References by id are soft; consistency is maintained by events (user-deleted → every module cascades its own cleanup).
+
+## Deletion
+
+`User.deleted_at` set → a `user.deletion_requested` event goes out → each module soft-deletes its rows → after 30 days a scheduled job hard-deletes. Credentials are **revoked at the provider** (not just erased locally) on soft-delete. See `privacy.md`.
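Here is a minimal in-process sketch of the event-driven cascade, assuming the Phase 0 in-proc emitter.

- `InProcBus`, `ModuleStore`, and the event payload shape are hypothetical.
- Real modules would write to their own Postgres schema rather than an array.
- Credential revocation at the provider is not shown.

```typescript
// Minimal in-process bus (Phase 0 shape). Names are hypothetical.
type Handler<T> = (payload: T) => void;

class InProcBus {
  private handlers = new Map<string, Handler<unknown>[]>();

  on<T>(kind: string, handler: Handler<T>): void {
    const list = this.handlers.get(kind) ?? [];
    list.push(handler as Handler<unknown>);
    this.handlers.set(kind, list);
  }

  emit<T>(kind: string, payload: T): void {
    for (const h of this.handlers.get(kind) ?? []) h(payload);
  }
}

interface Row {
  userId: string;
  deletedAt?: Date;
}

// Each module owns its rows and its own cleanup; nothing reaches across modules.
class ModuleStore {
  name: string;
  rows: Row[] = [];

  constructor(name: string, bus: InProcBus) {
    this.name = name;
    bus.on<{ userId: string; requestedAt: Date }>("user.deletion_requested", ({ userId, requestedAt }) => {
      // Soft-delete only this module's rows for the user.
      for (const r of this.rows) if (r.userId === userId && !r.deletedAt) r.deletedAt = requestedAt;
    });
  }

  // Scheduled job: hard-delete rows soft-deleted more than `windowDays` ago.
  purgeExpired(now: Date, windowDays = 30): number {
    const cutoff = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => !r.deletedAt || r.deletedAt.getTime() > cutoff);
    return before - this.rows.length;
  }
}
```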
+
+## Replay and reproducibility
+
+`TipInstance.context_snapshot` captures the exact features that produced the decision. This is what lets offline replay re-score historical tips against a new policy without touching the feature store.
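The snapshot matters because it enables the replay method from the contextual-bandit literature: keep only the logged decisions where the new policy would have picked the same tip, then average their rewards. The estimate is unbiased when the logging policy picked uniformly at random, which Phase 0's `random` policy does.

The sketch rests on two assumptions:

- The snapshot also records the candidate ids that were offered. The data model above does not list them.
- "done" counts as reward 1 and anything else as 0.

`LoggedTip`, `PickFn`, and `replayEstimate` are hypothetical names.

```typescript
// Hypothetical log row. Assumes context_snapshot also stores the candidate ids
// that were offered; the data model above lists only "features seen".
interface LoggedTip {
  tipId: string;
  chosenCandidate: string;
  contextSnapshot: { candidates: string[]; features: Record<string, number> };
  reaction: "done" | "snooze" | "dismiss" | null;
}

type PickFn = (candidates: string[], features: Record<string, number>) => string;

// Assumed reward: 1 for "done", 0 for anything else (including no reaction).
const reward = (r: LoggedTip["reaction"]): number => (r === "done" ? 1 : 0);

// Replay estimator: keep decisions where the new policy agrees with what was
// actually shown, then average their rewards. Unbiased only if the logging
// policy chose uniformly at random among the offered candidates.
function replayEstimate(log: LoggedTip[], pick: PickFn): { estimate: number; matched: number } {
  let total = 0;
  let matched = 0;
  for (const t of log) {
    const choice = pick(t.contextSnapshot.candidates, t.contextSnapshot.features);
    if (choice !== t.chosenCandidate) continue;
    matched += 1;
    total += reward(t.reaction);
  }
  return { estimate: matched === 0 ? NaN : total / matched, matched };
}
```

With K candidates per decision, only about 1/K of logged decisions survive the filter on average. Replay therefore needs much more history than the match count suggests.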
--- a/docs/architecture/metrics.md
+++ b/docs/architecture/metrics.md
@@ -0,0 +1,43 @@
+# Metrics: measuring "magic"
+
+We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against.
+
+## North star
+
+**Week-2 tip-reaction rate** — of users who saw a tip in week 1, what fraction reacted to *any* tip in week 2? Captures "did this become part of your life."
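Here is a sketch of the north-star computation, with two assumptions:

- "Week 1" and "week 2" are counted from each user's sign-up: days 0 to 6 and days 7 to 13.
- Rows are TipInstance records joined with `User.created_at`.

The `TipRow` shape and the function names are hypothetical.

```typescript
// Hypothetical row: one TipInstance joined with its user's sign-up time.
interface TipRow {
  userId: string;
  signedUpAt: Date;
  shownAt?: Date;
  reactedAt?: Date;
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// 0 = days 0-6 after sign-up ("week 1"), 1 = days 7-13 ("week 2").
function weekIndex(signedUpAt: Date, at: Date): number {
  return Math.floor((at.getTime() - signedUpAt.getTime()) / WEEK_MS);
}

function week2ReactionRate(rows: TipRow[]): number {
  const sawInWeek1 = new Set<string>();
  const reactedInWeek2 = new Set<string>();
  for (const r of rows) {
    if (r.shownAt && weekIndex(r.signedUpAt, r.shownAt) === 0) sawInWeek1.add(r.userId);
    // A reaction to *any* tip in week 2 counts, including tips first shown that week.
    if (r.reactedAt && weekIndex(r.signedUpAt, r.reactedAt) === 1) reactedInWeek2.add(r.userId);
  }
  let returned = 0;
  sawInWeek1.forEach((u) => {
    if (reactedInWeek2.has(u)) returned += 1;
  });
  return sawInWeek1.size === 0 ? NaN : returned / sawInWeek1.size;
}
```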
+
+## Activation (single-session)
+
+- **Time-to-first-tip** — sign-in → tip rendered. Target: ≤ 60 s on the happy path.
+- **First-tip reaction rate** — fraction of users who react (done/snooze/dismiss) to their very first tip. Target: > 50%.
+
+## Engagement
+
+- **Dwell-before-action** — seconds between tip render and first reaction. Too short = glance-away; too long = confused.
+- **Done rate = Done / (Done + Snooze + Dismiss)** — the quality proxy. Rising = tips feel on-target.
+- **Snooze:Dismiss ratio** — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes.
+- **Return cadence** — median inter-session gap. Stable-and-short > spiky.
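The engagement ratios above reduce to simple counts over reacted tips. This sketch assumes epoch-millisecond timestamps and the three reactions in the data model. `Reacted` and `engagement` are hypothetical names. A ratio with a zero denominator comes back as `NaN` rather than a misleading 0.

```typescript
type Reaction = "done" | "snooze" | "dismiss";

// Hypothetical flattened TipInstance row; timestamps are epoch milliseconds.
interface Reacted {
  reaction: Reaction | null;
  shownAt: number;
  reactedAt?: number;
}

interface Engagement {
  doneRate: number;        // done / (done + snooze + dismiss)
  snoozeToDismiss: number; // high = timing problem, low = relevance problem
  medianDwellSec: number;  // render -> first reaction
}

function engagement(tips: Reacted[]): Engagement {
  const counts: Record<Reaction, number> = { done: 0, snooze: 0, dismiss: 0 };
  const dwells: number[] = [];
  for (const t of tips) {
    if (t.reaction === null) continue; // no reaction: excluded from every ratio
    counts[t.reaction] += 1;
    if (t.reactedAt !== undefined) dwells.push((t.reactedAt - t.shownAt) / 1000);
  }
  const reacted = counts.done + counts.snooze + counts.dismiss;

  dwells.sort((a, b) => a - b);
  const mid = Math.floor(dwells.length / 2);
  let medianDwellSec = NaN;
  if (dwells.length % 2 === 1) medianDwellSec = dwells[mid];
  else if (dwells.length > 0) medianDwellSec = (dwells[mid - 1] + dwells[mid]) / 2;

  return {
    doneRate: reacted === 0 ? NaN : counts.done / reacted,
    snoozeToDismiss: counts.dismiss === 0 ? NaN : counts.snooze / counts.dismiss,
    medianDwellSec,
  };
}
```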
+
+## Retention
+
+- D1, D7, D28 retention. Cohort-sliced by connected integrations.
+- Churn signal: 7 days without a session.
+
+## ML health (from M1)
+
+- Policy latency p50/p95/p99 at the recommender boundary.
+- Feature null-rate per feature, per user.
+- Online/offline reward disagreement for shadowed policies.
+- Bandit regret proxy: observed reward vs an oracle's best-possible on the same candidates.
+
+## Privacy & trust
+
+- Account-deletion completion time (target: < 24 h).
+- Provider-revocation success rate on disconnect.
+- Number of active credentials per user (low = healthy).
+
+## How metrics become decisions
+
+- **Per-change.** Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge).
+- **Shadow → A/B → launch.** Policy changes ship in shadow first (log what the policy *would* have recommended), then run as an A/B test on live traffic, then launch once the online reward estimate beats the incumbent's with the whole confidence interval of the difference above zero.
+- **Dashboards before features.** If we cannot measure a feature's impact on the north-star metric, we defer the feature.
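The launch criterion in the shadow/A/B bullet above can be made concrete with a two-proportion confidence interval. The sketch works as follows:

- Rewards are assumed binary, for example "done" = 1.
- The interval uses the normal approximation with z = 1.96, roughly 95%.
- Launch happens only when the interval's lower bound clears `minLift`.

`ArmStats`, `diffCI`, and `shouldLaunch` are hypothetical. Small samples or many simultaneous comparisons would call for a more careful test.

```typescript
interface ArmStats {
  successes: number;
  trials: number;
}

// Normal-approximation CI for (challenger rate - incumbent rate).
function diffCI(challenger: ArmStats, incumbent: ArmStats, z = 1.96): [number, number] {
  const p1 = challenger.successes / challenger.trials;
  const p0 = incumbent.successes / incumbent.trials;
  const se = Math.sqrt((p1 * (1 - p1)) / challenger.trials + (p0 * (1 - p0)) / incumbent.trials);
  const d = p1 - p0;
  return [d - z * se, d + z * se];
}

// Launch only if the whole interval sits above the required lift (default: strictly better).
function shouldLaunch(challenger: ArmStats, incumbent: ArmStats, minLift = 0): boolean {
  if (challenger.trials === 0 || incumbent.trials === 0) return false;
  const [lower] = diffCI(challenger, incumbent);
  return lower > minLift;
}
```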
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -3,22 +3,25 @@
 ## Guiding constraints

 - The **recommendation decision** is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
- Services are small and independently deployable, but we do **not** multiply services for its own sake. Split by team-of-ownership and by data lifecycle.
- Python for ML, TypeScript for applications, shared contracts regenerated from a single source of truth.
+- Modularity lives in **code boundaries**. Deploy topology follows pressure, not anticipation (ADR-0003).
+- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
+- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see `privacy.md`).

-## Services
+## Modules

-| Service | Language | Responsibility | Owns data |
-|---|---|---|---|
-| `gateway` | TS (Node) | BFF for web/mobile; auth-checking; request fan-out | — |
-| `auth` | TS | OAuth (Google, Apple), sessions, token issuance | identities, sessions |
-| `profile` | TS | user profile, preferences, consents | profiles |
-| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors |
-| `events` | TS | event-bus ingress, normalization, durable log | signal store |
-| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history |
-| `ml/serving` | Python | online scoring for policies/models | — (stateless) |
-| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models |
-| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log |
+| Module | Language | Responsibility | Owns data | Phase-0 process |
+|---|---|---|---|---|
+| `gateway` | TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith |
+| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
+| `profile` | TS | user profile, preferences, consents | profiles | Node monolith |
+| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
+| `events` | TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) |
+| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
+| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
+| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** |
+| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | separate (from M4) |
+
+Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). `ml/serving` is pre-extracted on language grounds.

 ## Data boundaries

@@ -36,9 +39,28 @@ User reactions (done / snooze / dismiss) are events too. They close the loop as

 ## Why these choices

- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4.
- **Postgres** everywhere for OLTP. Per-service schemas, not per-service instances in dev.
+- **Modular monolith + Python ML** in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
+- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
+- **Postgres** for OLTP; per-module schemas in dev; separate databases once modules extract.
 - **FastAPI + Pydantic** for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
+- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
+- **OpenAPI** for HTTP; TS client auto-generated; Python Pydantic models hand-written while consumers are few.
 - **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
 - **MLflow** for model registry; artifacts in MinIO/S3.
- **Auth.js or Ory** for identity — we will not write crypto.
+- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
+- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.
+
+## Decision flow for a new tip
+
+```
+client ─► gateway ─► recommender
+                       │
+                       ├─► candidates:   integrations.fetchCandidates(user)  + advice.library
+                       ├─► context:      FeatureAssembler(user, request)
+                       ├─► policy:       PolicyRegistry.get(policyName).pick(candidates, context)
+                       ├─► shadows:      run shadow policies in parallel, log their picks
+                       └─► persist:      TipInstance{context_snapshot, policy, tip}
+                       ◄─  tip
+```
+
+Feedback travels back the same path: `POST /feedback → events.emit(feedback.reaction)` → pipelines consume → bandit/model updated on next retrain.
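Here is the decision flow sketched as one TypeScript function. Dependencies are injected so the sketch runs standalone. `Deps`, `TipInstanceRecord`, and the method names are hypothetical stand-ins for the module APIs named in the diagram. Shadow policies are fired without being awaited and their failures are only logged, so they cannot change the live response.

```typescript
// Hypothetical stand-ins for shared-types contracts.
interface Candidate { id: string; source: string; title: string }
type Context = Record<string, number>;
interface Policy {
  name: string;
  version: string;
  pick(candidates: Candidate[], ctx: Context): Promise<Candidate>;
}
interface TipInstanceRecord { policyName: string; policyVersion: string; contextSnapshot: Context; tip: Candidate }

// Hypothetical injected module APIs (integrations, feature assembler, registry, persistence).
interface Deps {
  fetchCandidates(userId: string): Promise<Candidate[]>;
  assembleFeatures(userId: string): Promise<Context>;
  registry: Map<string, Policy>;
  shadows: Policy[];
  persist(rec: TipInstanceRecord): Promise<void>;
  logShadow(policyName: string, result: Candidate | Error): void;
}

async function recommend(userId: string, policyName: string, deps: Deps): Promise<Candidate> {
  const policy = deps.registry.get(policyName);
  if (!policy) throw new Error(`unknown policy: ${policyName}`);

  // Candidates and context are independent, so fetch them concurrently.
  const [candidates, ctx] = await Promise.all([deps.fetchCandidates(userId), deps.assembleFeatures(userId)]);

  // Shadows: started alongside the live pick, never awaited, failures only logged.
  for (const shadow of deps.shadows) {
    void Promise.resolve()
      .then(() => shadow.pick(candidates, ctx))
      .then(
        (pick) => deps.logShadow(shadow.name, pick),
        (err: unknown) => deps.logShadow(shadow.name, err instanceof Error ? err : new Error(String(err))),
      );
  }

  const tip = await policy.pick(candidates, ctx);
  // Persist a copy of the context so later mutation cannot rewrite history.
  await deps.persist({ policyName: policy.name, policyVersion: policy.version, contextSnapshot: { ...ctx }, tip });
  return tip;
}
```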
--- a/docs/architecture/privacy.md
+++ b/docs/architecture/privacy.md
@@ -0,0 +1,40 @@
+# Privacy architecture
+
+Privacy is a Phase 0 feature, not a Phase 5 compliance project. This doc is the minimum.
+
+## Principles
+
+1. **Data minimization.** Store only what we need for the tip. Raw task titles stay at Todoist; we store references + computed features. If a feature doesn't lift a metric, its input data doesn't get stored.
+2. **User-visible controls.** Every connection shows exactly which scopes we hold and what we've computed. One tap disconnects and revokes.
+3. **Deletion is real.** Deleting an account revokes provider tokens, purges credentials immediately, and soft-deletes user data for a 30-day recovery window, then hard-deletes.
+4. **No surprise sharing.** Cross-user / collaborative features are opt-in, per category, per integration.
+5. **Encryption in transit and at rest.** TLS everywhere; column-level encryption for credentials; disk-level for backups.
+
+## Flows
+
+### Connect
+User taps "Connect Todoist" → consent screen lists: scopes requested, what we store, what we compute, retention, revocation instructions → OAuth → stored credential is immediately testable and shows in `/connect`.
+
+### Disconnect
+User taps disconnect → `Credential.revoked_at` set → provider-side revocation attempted (Todoist: token revocation endpoint) → credential erased on success → `credential.revoked` event → downstream modules drop associated cursors, caches, derived features for that `(user, provider)` pair.
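The disconnect sequence can be sketched as follows. `revoke()` comes from the `Connector` interface in the integrations README. The rest is hypothetical:

- `Credential`'s in-memory shape.
- The `emit` callback.
- The choice to keep the ciphertext and skip the event when provider revocation fails, so a retry can try again.

```typescript
// Hypothetical in-memory view of a Credential row.
interface Credential {
  userId: string;
  provider: string;
  ciphertext: string | null;
  revokedAt?: Date;
}

// The one method taken from the Connector interface in the integrations README.
interface RevokingConnector {
  id: string;
  revoke(userId: string): Promise<void>;
}

type Emit = (kind: string, payload: { userId: string; provider: string }) => void;

async function disconnect(
  cred: Credential,
  connector: RevokingConnector,
  emit: Emit,
  now: () => Date = () => new Date(),
): Promise<"revoked" | "revocation_failed"> {
  cred.revokedAt = now(); // stop using the credential immediately
  try {
    await connector.revoke(cred.userId); // provider-side revocation
  } catch {
    // Assumption: keep the sealed secret so a (hypothetical) retry job can revoke later.
    return "revocation_failed";
  }
  cred.ciphertext = null; // erase locally only after the provider confirms
  emit("credential.revoked", { userId: cred.userId, provider: cred.provider });
  return "revoked";
}
```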
+
+### Delete account
+User taps "Delete account" in settings → hard confirm → `User.deleted_at` set, all sessions revoked, `user.deletion_requested` event fanned out → every module processes its portion (credentials revoked + purged; profile scrubbed; tip history anonymized to aggregate stats only or purged, per retention policy; events purged on schedule) → within 24 hours every module has completed its portion (the deletion-completion target in `metrics.md`); within 30 days all rows are hard-deleted.
+
+### Export (Phase 2)
+`GET /me/export` returns a JSON bundle of everything we hold for the user: profile, consents, credentials-metadata (not secrets), events, tip history.
+
+## Scope boundaries
+
+Each integration declares the scopes it requests and the features it derives. The `Profile.consents` column is the source of truth; a scope removed from consent short-circuits derived-feature computation at the feature store.
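One way the short-circuit could look, assuming the `Profile.consents` shape from the data model (`{integration: {read, write, retain_days}}`). `FeatureDef` is hypothetical: it names the integration and permission each feature needs. Withdrawn consent skips computation entirely rather than filtering results afterwards.

```typescript
// Mirrors Profile.consents: {integration: {read, write, retain_days}}.
type Consents = Record<string, { read: boolean; write: boolean; retain_days: number }>;

// Hypothetical feature definition: which integration and permission it derives from.
interface FeatureDef {
  name: string;
  integration: string;
  needs: "read" | "write";
  compute: () => number;
}

// Features without consent are never computed, so their inputs are never touched.
function computeFeatures(defs: FeatureDef[], consents: Consents): Record<string, number> {
  const out: Record<string, number> = {};
  for (const d of defs) {
    const c = consents[d.integration];
    if (!c || !c[d.needs]) continue; // missing or withdrawn consent: skip entirely
    out[d.name] = d.compute();
  }
  return out;
}
```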
+
+## Audit
+
+- Privileged actions (admin-initiated deletions, credential decryption outside the normal refresh path) go to an append-only audit log from Phase 0.
+- Per-user access log available via `GET /me/access-log` (Phase 2).
+
+## Legal surface (Phase 0 minimum)
+
+- Terms of Service + Privacy Policy documents shipped alongside the sign-in page.
+- Consent capture on first sign-in, with a versioned ToS/PP hash stored per user.
+- Data-subject request inbox (email) wired up before onboarding the first external user.
--- a/services/README.md
+++ b/services/README.md
@@ -1,13 +1,15 @@
 # services/

-Backend microservices. Each directory is independently deployable, ships a `Dockerfile`, a `/health` endpoint, and its own `README.md` describing its contract.
+Backend modules. Each owns a contract and ships its own `README.md`. In **Phase 0** these are internal packages inside a single Node process (ADR-0003); they extract to their own processes as pressure justifies.

-| Dir | Role | Phase introduced |
-|---|---|---|
-| `gateway/` | BFF for clients; auth check; fan-out to services | 0 |
-| `auth/` | OAuth (Google/Apple), sessions, JWT | 0 |
-| `profile/` | user profile, preferences, consents | 0 |
-| `integrations/` | third-party connectors + encrypted token vault (Todoist first) | 0 |
-| `recommender/` | `POST /recommend` — policy-driven tip selection | 0 |
-| `events/` | event bus ingress + durable signal store | 1 |
-| `notifier/` | push/email/web delivery with quiet-hours | 3 |
+| Dir | Role | Phase-0 shape | Extracts when |
+|---|---|---|---|
+| `gateway/` | BFF for clients; auth check; fan-out | in-proc router | never (stays as the edge) |
+| `auth/` | Google OAuth (Apple in M1), sessions, JWT | Auth.js behind OIDC shape | mobile native ships (M3) |
+| `profile/` | user profile, preferences, consents | in-proc module | team ownership diverges |
+| `integrations/` | connectors + encrypted token vault | in-proc module | credential blast-radius isolation |
+| `recommender/` | `POST /recommend` — policy-driven tip selection | in-proc; calls `ml/serving` from M1 | scaling hotspot |
+| `events/` | event bus + signal log | in-proc emitter (Phase 0); NATS (M1) | always a library + broker, not a service |
+| `notifier/` | push/email delivery + quiet hours | in-proc; **web push in M1** | SLA divergence or mobile push scale |
+
+Contracts that cross module lines (HTTP or events) come from `packages/shared-types/`. Direct imports from one module into another are forbidden by an import-boundary lint.
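The lint rule itself would live in the linter config, but the check it encodes is small enough to sketch directly. This assumes repo-relative, already-resolved paths under `services/<module>/`. Any import of another module's files is forbidden, while anything outside `services/` (such as `packages/shared-types/`) is allowed. `isForbiddenImport` is a hypothetical name.

```typescript
// Hypothetical check behind the import-boundary lint. Paths are assumed to be
// repo-relative, already resolved, and to use forward slashes.
const MODULE_DIR = /^services\/([^/]+)\//;

function moduleOf(path: string): string | undefined {
  const m = MODULE_DIR.exec(path);
  return m ? m[1] : undefined;
}

// Forbidden: any import whose target sits inside a different services/<module>/.
// Allowed: same-module imports and anything outside services/ (e.g. packages/shared-types/).
function isForbiddenImport(importerPath: string, resolvedTarget: string): boolean {
  const target = moduleOf(resolvedTarget);
  if (target === undefined) return false;
  return moduleOf(importerPath) !== target;
}
```

ESLint's built-in `no-restricted-imports` rule with per-module `patterns` is one off-the-shelf way to enforce the same boundary without custom code.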
--- a/services/integrations/README.md
+++ b/services/integrations/README.md
@@ -7,11 +7,14 @@ Third-party connectors and the token vault.
 ```ts
 interface Connector {
  id: string                                // e.g. "todoist"
+  scopes: string[]                          // human-readable list shown in consent UI
  beginOAuth(user): Promise<{ redirectUrl, state }>
  finishOAuth(code, state): Promise<StoredCredential>
  fetchSignals(user, since?): AsyncIterable<NormalizedEvent>
-  // optional write-back, e.g. mark task done
-  act?(user, action): Promise<void>
+  // Incremental-sync cursor for fetchSignals (Todoist sync_token, webhook timestamps, etc.)
+  // lives in Credential.meta; each connector owns its shape.
+  act?(user, action): Promise<void>          // optional write-back (complete task, etc.)
+  revoke(user): Promise<void>                // REQUIRED: provider-side token revocation on disconnect
 }
 ```

--- a/services/recommender/README.md
+++ b/services/recommender/README.md
@@ -16,12 +16,14 @@ POST /feedback
 ## Internals (stable seams)

 - **Candidate sources** — pluggable async generators. v0: Todoist tasks via `integrations`. Later: advice library, calendar nudges, health prompts.
- **Context assembler** — merges request context with features (inline now, feature-store later).
- **Policy** — `Policy.pick(candidates, context) → tip`. Registered by name:
+- **Feature assembler** — fills the `context` blob (inline in Phase 0; calls feature store from M1). Never inlined into policy code.
+- **Policy registry** — `Policy.pick(candidates, context) → tip`. Named entries:
  - `random` — v0 (Phase 0).
-  - `bandit.linucb` — v1 (Phase 1).
+  - `bandit.linucb.pooled` — v1 (Phase 1). **Global-then-personalize**: pooled features shared across users; per-user residual once data allows.
  - `remote` — delegates to `ml/serving` FastAPI scorer (Phase 1+).
+- **Shadow hook** — every request optionally runs N shadow policies in parallel and logs their picks + estimated rewards. Promotion from shadow → A/B → launch is a separate, deliberate step (ADR-0002).
+- **TipInstance persistence** — every decision writes `context_snapshot` (features seen at decision time). This is what makes offline replay honest.

 ## Phase 0 goal

-`RandomPolicy` only. The service, contract, and seams exist; the brain does not yet.
+`RandomPolicy` only. The service, contract, registry, shadow hook, and tip-instance persistence all exist; no ML yet.