From 7f173f88d326e9db56fdacb84f60f7ae7c4b6f26 Mon Sep 17 00:00:00 2001
From: alvis
Date: Mon, 13 Apr 2026 14:36:11 +0000
Subject: [PATCH] =?UTF-8?q?refactor:=20architecture=20revision=20=E2=80=94?=
 =?UTF-8?q?=20modular=20monolith,=20auth-commit,=20event=20protobuf,=20pri?=
 =?UTF-8?q?vacy-from-day-0?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
---
 CLAUDE.md                                  | 67 ++++++++------
 PLAN.md                                    | 90 +++++++++++--------
 README.md                                  | 78 +++++++++-------
 docs/adr/0003-modular-monolith-phase0.md   | 31 +++++++
 .../0004-auth-authjs-with-oidc-boundary.md | 23 +++++
 docs/adr/0005-event-schemas-protobuf.md    | 28 ++++++
 docs/architecture/data-model.md            | 87 ++++++++++++++++++
 docs/architecture/metrics.md               | 43 +++++++++
 docs/architecture/overview.md              | 56 ++++++++----
 docs/architecture/privacy.md               | 40 +++++++++
 services/README.md                         | 22 ++---
 services/integrations/README.md            |  7 +-
 services/recommender/README.md             | 10 ++-
 13 files changed, 449 insertions(+), 133 deletions(-)
 create mode 100644 docs/adr/0003-modular-monolith-phase0.md
 create mode 100644 docs/adr/0004-auth-authjs-with-oidc-boundary.md
 create mode 100644 docs/adr/0005-event-schemas-protobuf.md
 create mode 100644 docs/architecture/data-model.md
 create mode 100644 docs/architecture/metrics.md
 create mode 100644 docs/architecture/privacy.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 0b8de92..b5d092e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,66 +8,73 @@ The magic is the product. Precision + timing + minimalism. The UI shows a single
 
 ## Prime directives
 
-1. **Modular, service-oriented from day one.** Even the prototype. We will scale to mobile (iOS/Android), many integrations, multi-tenant ML. Shortcuts that bake in a monolith are not acceptable.
-2. **Recommendation engine is the core.** Every other service feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
-3. **Python owns ML.** Everything training, features, serving for models is Python (FastAPI + PyTorch/scikit + MLflow/feast). Application services are TypeScript (Node, Next.js) unless there's a reason.
+1. **Modular by package, deployable by stage.** Contracts live at package boundaries from day one so extraction to a service is cheap. Deploy topology evolves with real pressure (team size, scaling hotspots, language boundaries), not with wishful architecture. Phase 0 = **modular monolith + Python ML sidecar**. See ADR-0003.
+2. **Recommendation engine is the core.** Every other module feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
+3. **Python owns ML.** Training, features, online scoring are Python (FastAPI + PyTorch/scikit + MLflow/Feast). Application code is TypeScript (Node, Next.js) unless there's a reason.
 4. **OAuth-first for identity and integrations.** Never ask users for passwords or raw API keys when a delegated-auth flow exists. Store provider tokens encrypted, refresh transparently.
-5. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished.
+5. **Privacy is a feature, not a phase.** Consent capture, token revocation, and account deletion exist from the first real user. Data minimization: store the token + derivatives we need, not the raw feed.
+6. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished. The tip page is a watch face.
 
 ## Architecture (high level)
 
+The tree below is **logical module structure**. Directory layout is stable; how many processes you deploy is a stage decision (ADR-0003).
+
 ```
 apps/               user-facing clients
   web/              Next.js PWA — the first shipped client
   mobile-ios/       Swift/SwiftUI (Phase 3)
   mobile-android/   Kotlin/Compose (Phase 3)
 
-services/           backend microservices (each independently deployable)
-  gateway/          API gateway + BFF (GraphQL or tRPC)
+services/           backend modules — each owns a contract; may share a deployable
+  gateway/          BFF for clients; auth check; fan-out
   auth/             OAuth (Google, Apple, ...), sessions, JWT issuance
   profile/          user profile, preferences, consents
-  integrations/     third-party connectors (Todoist first); token vault
-  recommender/      Python; serves the "one best tip" decision
-  events/           event bus ingress (Kafka/NATS) + signal store
-  notifier/         push/email/web delivery of tips
+  integrations/     third-party connectors + token vault (Todoist first)
+  recommender/      orchestration: candidates → policy → tip; feedback sink
+  events/           event bus ingress + durable signal store
+  notifier/         push/email/web delivery (web push from Phase 1)
 
-packages/           shared libraries
-  shared-types/     OpenAPI/proto-generated types
+packages/           shared libraries (importable across services + apps)
+  shared-types/     HTTP types via OpenAPI; event types via protobuf (ADR-0005)
   sdk-js/           client SDK used by web + mobile webviews
   ui/               shared React components + design tokens
 
-ml/                 Python MLOps
-  pipelines/        training / batch feature pipelines (Airflow/Prefect)
-  features/         feature definitions (Feast-style)
-  registry/         model registry (MLflow) integration
-  experiments/      A/B testing framework + bandit policies
-  serving/          online inference service (FastAPI)
-  notebooks/        research only — not production
+ml/                 Python — separate deployable from day one
+  serving/          online scorer (FastAPI), called by recommender
+  features/         feature definitions + store adapter
+  pipelines/        batch feature + training DAGs (Prefect/Airflow)
+  registry/         MLflow model registry integration
+  experiments/      assignment + A/B + bandit policies
+  notebooks/        research only; never imported by production code
 
-infra/              docker-compose, k8s manifests, terraform, CI
+infra/              docker-compose (Phase 0), k3s/k8s (later), terraform, CI
 docs/               architecture notes, ADRs, API specs
 ```
 
-## Contracts between services
+**Phase 0 deployables:** one Node process (`services/*` bundled via modular monolith) + one Python process (`ml/serving`, stubbed until M1) + Postgres + NATS. Services **extract to their own process** when a real reason appears: language boundary, scaling hotspot, team ownership, or SLA divergence. See ADR-0003.
 
-- **Events** (Kafka/NATS) — source of truth for user signals. All integrations emit normalized events; the recommender reads them.
-- **HTTP/gRPC** — synchronous request/response (gateway → services).
-- **Shared schemas** live in `packages/shared-types`; generated from a single OpenAPI / proto source. Do not redefine types per service.
+## Contracts between modules
+
+- **HTTP** (OpenAPI, in `packages/shared-types/http/`) — synchronous request/response. In-process today; over the network once extracted. Signatures are identical.
+- **Events** (Protocol Buffers, in `packages/shared-types/events/`) — durable signals + feedback. Today: in-process event emitter. Tomorrow: NATS JetStream. Schema registry enforced in CI (ADR-0005).
+- Do not redefine types per module. Regenerate from `shared-types`.
 
 ## Conventions
 
-- Every service ships a `README.md`, a `Dockerfile`, and a `/health` endpoint.
-- One PR = one concern. Commits follow conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
+- Each module ships a `README.md` describing its contract, its `/health` story, and its extraction criteria (when it should become its own process).
+- One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
 - ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
 - No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
+- Compose profiles (`core`, `full`) so devs can run a subset without 16 GB of RAM.
 
 ## Definition of done (per feature)
 
 1. Code + tests merged.
-2. Service's `README.md` updated.
+2. Module's `README.md` updated.
 3. If it changes a contract → `shared-types` regenerated + consumers updated.
 4. If it changes architecture → ADR added.
 5. Deployable via `docker compose up` locally.
+6. If it touches user data → a deletion path exists and is tested.
 
 ## Current phase
 
@@ -75,7 +82,9 @@ docs/ architecture notes, ADRs, API specs
 
 ## What NOT to do
 
-- Don't copy Todoist's data into our DB. Store the OAuth token; fetch on demand.
-- Don't implement auth by hand. Use a library (NextAuth / Auth.js, Ory, or Clerk-compatible). We will self-host.
+- Don't copy Todoist's data into our DB. Store the OAuth token + computed features/derivatives we need, fetch raw on demand.
+- Don't implement auth by hand. Phase 0 uses **Auth.js** behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
 - Don't hardwire a recommender. The "random todo" v0 must live behind the same interface the real ML model will implement (`POST /recommend` → `{tip}`). Swap internals, keep contract.
+- Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
 - Don't build an admin UI before the user-facing black page is polished.
+- Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
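
Illustrative note on the CLAUDE.md rule that the "random todo" v0 must live behind the same `POST /recommend` → `{tip}` contract as the eventual ML model: the idea might be sketched in TypeScript roughly as below. This is a minimal sketch under assumptions, not code from this patch. The names `Tip`, `RecommendContext`, `Policy`, `PolicyRegistry`, and `recommend` are hypothetical; the tip fields follow the shape listed in PLAN.md Stage 3, and the injectable random source is only there to make the draw testable.

```typescript
// Hypothetical sketch: every name here is an illustrative assumption.

// Provider-agnostic tip, mirroring the shape listed in PLAN.md Stage 3.
interface Tip {
  id: string;
  kind: "todo" | "advice";
  title: string;
  body: string;
  source: string;
  deep_link: string;
  meta: Record<string, unknown>;
}

// Assumed minimal decision context; the real blob is not specified in the patch.
interface RecommendContext {
  userId: string;
  now: Date;
}

// The stable seam: any policy (random today, ML-backed later) implements this.
interface Policy {
  readonly name: string;
  pick(candidates: readonly Tip[], context: RecommendContext): Tip | null;
}

// v0 policy: draws uniformly from the candidates.
class RandomPolicy implements Policy {
  readonly name = "random";
  private readonly rng: () => number;

  constructor(rng: () => number = Math.random) {
    this.rng = rng;
  }

  pick(candidates: readonly Tip[]): Tip | null {
    if (candidates.length === 0) return null;
    const index = Math.floor(this.rng() * candidates.length);
    return candidates[index] ?? null;
  }
}

// Policies are looked up by name so callers never depend on a concrete class.
class PolicyRegistry {
  private readonly policies = new Map<string, Policy>();

  register(policy: Policy): void {
    this.policies.set(policy.name, policy);
  }

  get(name: string): Policy {
    const policy = this.policies.get(name);
    if (!policy) throw new Error(`unknown policy: ${name}`);
    return policy;
  }
}

// Body of a `POST /recommend` handler: the response shape stays `{tip}`
// no matter which policy produced it.
function recommend(
  registry: PolicyRegistry,
  policyName: string,
  candidates: readonly Tip[],
  context: RecommendContext,
): { tip: Tip | null } {
  return { tip: registry.get(policyName).pick(candidates, context) };
}
```

Under these assumptions, swapping `RandomPolicy` for a bandit or a remote scorer is a registry change; the handler and its `{tip}` response stay untouched, which is the property the rule asks for.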
diff --git a/PLAN.md b/PLAN.md
index 38f6962..91f21d0 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -1,71 +1,85 @@
 # Implementation plan
 
-Step-by-step build order for Phase 0 (prototype) and the seams that make Phases 1–5 cheap.
+Step-by-step build order for Phase 0 (walking skeleton) and the seams that make Phases 1–5 cheap.
 
-The principle: **build the contracts first, stub the internals.** Every service should exist with a `/health` endpoint and a minimal real implementation of its interface before any service is "finished". This gives us an end-to-end walking skeleton from week one.
+The principle: **build the contracts first, stub the internals.** Every module exposes its contract and a `/health` story before any module is "finished". End-to-end walking skeleton in the first week.
+
+**Packaging reminder (ADR-0003):** Phase 0 is a modular monolith — one Node process bundles `services/*` behind their HTTP contracts, plus `ml/serving` as a separate Python process. Contracts are identical whether the call is in-process or over the wire.
 
 ---
 
 ## Stage 0 — Foundations (days 1–3)
 
-1. **Monorepo tooling.** pnpm workspaces for JS/TS; uv or poetry for Python; turbo or nx for build graph; pre-commit (lint, typecheck, format).
-2. **Docker Compose dev env.** Postgres, NATS, MinIO (S3), Mailhog, all services wired with hot-reload.
-3. **CI skeleton** (Gitea Actions): lint → typecheck → unit test → build → publish images.
-4. **Secrets convention.** `.env.example` per service; prod secrets injected by orchestrator.
-5. **Shared types package.** OpenAPI source → generated TS + Python clients.
+1. **Monorepo tooling.** pnpm workspaces for TS; uv for Python; turbo for build graph; pre-commit (eslint, prettier, ruff, mypy, typecheck).
+2. **Docker Compose dev env** with profiles:
+   - `core` — Node monolith + `ml/serving` stub + Postgres.
+   - `full` — adds NATS, MinIO, MailHog. Needed from Stage 4 onward.
+3. **CI skeleton** (Gitea Actions): lint → typecheck → unit → build → publish images. Schema-registry check for protobuf events (added in Phase 1, but pipeline stub now).
+4. **Secrets convention.** `.env.example` per module; prod injected by orchestrator.
+5. **Shared types.** OpenAPI for HTTP, protobuf for events (ADR-0005). Generate TS; Python pydantic models hand-written initially (few consumers).
+6. **Import-boundary lint.** `eslint-plugin-boundaries` (or equivalent) prevents `services/integrations` from importing `services/recommender` internals. Contracts-only.
 
-Deliverable: `docker compose up` brings a green dashboard of `/health` endpoints.
+Exit: `docker compose --profile core up` brings a green dashboard of `/health` endpoints.
 
 ## Stage 1 — Identity & session (days 4–7)
 
-1. `services/auth`: Google OAuth2 (PKCE), session cookies, short-lived JWTs, refresh rotation. Library-backed (Auth.js or Ory Kratos + Hydra) — we do not roll our own.
-2. `services/profile`: minimal `User` record; created on first sign-in.
-3. `apps/web` sign-in page; gateway verifies JWT.
+1. `services/auth` module: Auth.js embedded in the Node monolith, Google provider only (Apple deferred). OIDC-shaped surface (ADR-0004): `/me`, `/logout`, JWKS, stub `/.well-known/openid-configuration`.
+2. `services/profile` module: `User` row created on first sign-in; consent record captured with ToS/PP version hash.
+3. `apps/web` sign-in page. Gateway (also in-process) verifies JWT.
+4. **Deletion endpoint** (yes, already): `DELETE /me` — revokes sessions, flips `deleted_at`, emits `user.deletion_requested`.
 
-Exit check: a user can sign in and fetch their own profile.
+Exit: a user can sign in, see their profile, and delete their account; deletion is observable end-to-end even though there's no data to erase yet.
 
 ## Stage 2 — Integrations framework (days 8–12)
 
-1. `services/integrations` with a **Connector** interface:
-   - `begin_oauth(user) → redirect_url`
-   - `finish_oauth(code, state) → StoredCredential`
-   - `fetch_signals(user, since) → Event[]`
-2. **Token vault**: column-level encryption (libsodium), key from env or KMS.
-3. **Todoist connector** as the first concrete implementation.
-4. Web "Connect" page: list of connectors, button per connector, callback handling.
+1. `services/integrations` module with a **Connector** interface:
+   - `beginOAuth(user) → {redirectUrl, state}`
+   - `finishOAuth(code, state) → StoredCredential`
+   - `fetchSignals(user, since?) → AsyncIterable`
+   - `act?(user, action) → void`
+   - `revoke(user) → void` — first-class; no revocation means no disconnect.
+2. **Token vault**: libsodium sealed box, key from env/KMS. One row per `(user, provider)` with provider-specific `meta` (e.g. Todoist `sync_token`).
+3. **Todoist connector**: OAuth2, Sync API incremental reads via `sync_token`, `act` to complete a task, `revoke` calls Todoist's token-revocation endpoint.
+4. Web `/connect`: list of connectors, per-connector consent screen (scopes + retention), connect/disconnect.
 
-Exit check: a user taps "Connect Todoist", completes the OAuth dance, and the integrations service can fetch their tasks on demand.
+Exit: a user can connect and disconnect Todoist; disconnect revokes at Todoist and wipes local credentials.
 
 ## Stage 3 — Recommender contract (days 13–16)
 
-1. `services/recommender` exposes `POST /recommend {user_id, context} → {tip}`.
-2. Policy interface (`Policy.pick(user, candidates, context) → tip`).
-3. **`RandomPolicy` v0** — fetches candidates from `integrations` (Todoist tasks), returns one uniformly at random.
-4. Tip shape is provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
-5. `apps/web` tip page: full black, one tip centered, tap = mark done → callback fires to integrations (complete Todoist task) + emits a feedback event.
+1. `services/recommender` module exposes `POST /recommend` and `POST /feedback`.
+2. **Policy registry** keyed by name. **Candidate sources** registered independently; v0 source = `integrations.todoist.tasks`.
+3. **`RandomPolicy` v0** — draws uniformly.
+4. **Tip shape** provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
+5. **`TipInstance` persisted** with `context_snapshot` — the features-seen-at-decision-time blob that makes offline replay possible later.
+6. `apps/web` tip page:
+   - `kind=todo` → tap = done (calls `integrations.todoist.act(complete)`).
+   - `kind=advice` → tap = acknowledge; long-press = save.
+   - Snooze / dismiss via long-press menu regardless of kind.
+   - Every reaction emits a feedback event even though it's in-process today.
 
-Exit check: three-page prototype works end-to-end for one user.
+Exit: three-page prototype works end-to-end.
 
-## Stage 4 — Hardening the prototype (days 17–20)
+## Stage 4 — Hardening (days 17–20)
 
-1. Error surfaces (Sentry), structured logs (pino / structlog), trace IDs across services.
-2. Rate limits + retries on outbound API calls.
-3. Integration tests: Playwright for the web flow, pact-style contract tests between services.
-4. Deploy to a single VM via docker-compose + Caddy.
+1. Observability: pino + structlog, Sentry per module, W3C traceparent across the monolith boundary and into `ml/serving`.
+2. Rate limits, retries with jitter, and circuit breakers on outbound (Todoist, Google).
+3. Integration tests: Playwright for the web flow (sign-in → connect → tip → delete). Contract tests between modules so the extractions later are safe.
+4. **Metrics baseline wired** (`docs/architecture/metrics.md`): activation, first-tip reaction, dwell, snooze:dismiss ratio, D1 retention.
+5. Deploy to a single VM via docker-compose + Caddy; Caddy auto-TLS; healthchecks wired to Caddy.
 
-Exit check: Phase 0 milestone closed.
+Exit: Phase 0 milestone closed; real users can be onboarded.
 
 ---
 
-## Seams prepared for later phases (do not implement yet, but do not foreclose)
+## Seams prepared for later phases (designed now, implemented later)
 
-- **Event bus.** From day one, `integrations` and `recommender` speak through an async fn that today is an in-process call but will be NATS tomorrow. Keep the signature `(event: NormalizedEvent) → void`.
-- **Feature store.** The recommender accepts a `context` blob; later, a feature service fills it. Do not inline feature lookups inside the policy.
-- **Policy registry.** `PolicyFactory.get(name)` so A/B and bandit policies slot in without code changes to the gateway.
-- **Python boundary.** Recommender is TS today, but its scoring function is isolated — moving to FastAPI in Phase 1 is a file move, not a refactor.
+- **Event bus abstraction.** `emit(event)` / `subscribe(topic, handler)` today is in-process; the production implementation in Phase 1 is NATS JetStream. Callsites never change.
+- **Feature assembler.** Recommender accepts a `context` blob from a `FeatureAssembler`; in Phase 0 it returns a hard-coded minimum; in Phase 1 it calls the feature store.
+- **Shadow-policy hook.** The recommender already supports running N policies in shadow per request; v0 runs zero shadows but the hook exists.
+- **Extraction-ready modules.** Every `services/*/` has a `serve.ts` that can be mounted in the monolith or booted standalone. Dockerfile targets both.
 
 ---
 
 ## Staffing assumption
 
-Work is parallelizable across ~3 streams: **infra/platform**, **backend services**, **web app**. Each Gitea issue notes which stream and which phase (milestone) it belongs to.
+Three parallel streams: **platform** (infra, CI, shared-types), **backend** (auth, profile, integrations, recommender), **web** (sign-in, connect, tip, PWA). `ml` joins in Phase 1. Each Gitea issue carries its stream label and milestone.
\ No newline at end of file
diff --git a/README.md b/README.md
index 9886f63..474f5a1 100644
--- a/README.md
+++ b/README.md
@@ -69,48 +69,59 @@ docs/ architecture, adr, api
 
 ## Roadmap
 
-### Phase 0 — Prototype *(M0)*
-Goal: a single user can sign in, connect Todoist, and see one random Todoist task on a black page.
-- [ ] Monorepo scaffold, CI skeleton, docker-compose dev env
-- [ ] `auth` service with Google OAuth
-- [ ] `integrations/todoist` OAuth2 flow + encrypted token vault
-- [ ] `recommender` service with `RandomPolicy` (v0)
-- [ ] `apps/web` — three pages (sign-in, connect, tip)
-- [ ] Deploy to a single VM via docker-compose
+### Phase 0 — Walking skeleton *(M0)*
+Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.
+- [ ] Monorepo scaffold, CI skeleton, docker-compose dev env with `core`/`full` profiles
+- [ ] `auth` on Auth.js with Google provider; OIDC-shaped boundary (ADR-0004)
+- [ ] `integrations/todoist` OAuth2 flow + encrypted token vault + provider-side revocation
+- [ ] `recommender` with `RandomPolicy`; stable `POST /recommend` contract
+- [ ] `apps/web` — three pages (sign-in, connect, tip); PWA manifest; offline reaction queue
+- [ ] ToS + Privacy Policy + consent capture on first sign-in
+- [ ] Account-deletion endpoint: revokes providers, purges credentials, soft-deletes profile
+- [ ] Metrics baseline: activation, first-tip reaction rate, dwell, retention (see `docs/architecture/metrics.md`)
+- [ ] Deploy modular monolith + `ml/serving` stub to a single VM via docker-compose + Caddy
 
-### Phase 1 — Real signal *(M1)*
-Goal: the tip is picked, not drawn from a hat. Still Todoist-only.
-- [ ] Event bus (NATS) + ingestion from Todoist sync API
-- [ ] Feature store skeleton (Feast or homegrown) and the first five features (time-of-day, overdue count, task age, priority, project)
-- [ ] `ml/serving` FastAPI scoring endpoint; `recommender` calls it
-- [ ] `ContextualBanditPolicy` v1 (LinUCB) replacing `RandomPolicy`
-- [ ] Tip feedback loop: user reactions (done / snooze / dismiss) become rewards
+### Phase 1 — Real signal + in-the-moment delivery *(M1)*
+Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.
+- [ ] Event bus (NATS JetStream) with protobuf schemas (ADR-0005) + schema-registry CI gate
+- [ ] Todoist event-driven sync (emit `signals.task.*`)
+- [ ] Feature store skeleton + first five features (hour-of-day, overdue count, task age, priority, project)
+- [ ] `ml/serving` FastAPI scorer; `RemotePolicy` wrapper in recommender
+- [ ] **Global-then-personalize bandit**: pooled LinUCB over shared features, per-user residual when data allows
+- [ ] Shadow-deploy infra: every new policy logs what it *would* have picked; promotion requires reward-parity
+- [ ] Feedback loop: reactions → rewards; delayed rewards for tasks completed in Todoist directly
+- [ ] **Web Push notifications** (VAPID) so the "magic" shows up without opening the app
+- [ ] `notifier` (lite): web-push delivery, quiet-hours honoured, dedupe
+- [ ] Apple OAuth added (deferred from M0)
 
-### Phase 2 — Multi-source user profile *(M2)*
-Goal: oO knows more than tasks.
-- [ ] Integrations: Google Calendar, Apple Health (web import), generic webhook
+### Phase 2 — Multi-source profile & trust *(M2)*
+Goal: oO knows more than tasks, and users can see/control what we know.
+- [ ] Integrations: Google Calendar, Apple Health (web import), generic webhook ingress
 - [ ] Unified `Profile` model (identity, preferences, contexts, consents)
-- [ ] Timing signals (location, idle, focus windows) via client-side probes
-- [ ] Advice library (curated tips, not only todos) + mixing policy
+- [ ] Timing signals (Page Visibility, Idle Detection, coarse location) — opt-in, transparent
+- [ ] Advice library + mixing policy (todo vs advice vs ambient)
+- [ ] User-facing data dashboard: what's stored, what's computed, export, delete-by-category
+- [ ] Cost/usage observability
 
-### Phase 3 — Mobile & notifications *(M3)*
+### Phase 3 — Native mobile *(M3)*
 - [ ] iOS app (SwiftUI) with APNs push
 - [ ] Android app (Compose) with FCM push
-- [ ] `notifier` service with quiet-hours + per-channel rate limits
-- [ ] Rich notifications that deep-link to the tip page
+- [ ] `notifier` gains APNs + FCM channels, per-device rate limits
+- [ ] Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
+- [ ] Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
 
 ### Phase 4 — MLOps at scale *(M4)*
-- [ ] Airflow/Prefect orchestrator for batch retrains
-- [ ] MLflow model registry + shadow deploys
-- [ ] Online `experiments` framework: A/B + multi-armed bandits as first-class
-- [ ] Cohort analysis + cross-user collaborative features (opt-in)
-- [ ] Model cards, fairness checks, drift monitoring
+- [ ] Prefect/Airflow for batch feature materialization + retraining
+- [ ] MLflow registry; shadow → A/B → launch pipeline as first-class
+- [ ] Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
+- [ ] Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
+- [ ] Drift monitoring (feature drift, prediction drift, reward drift); model cards per version
 
 ### Phase 5 — Production hardening *(M5)*
-- [ ] SOC2-style controls, audit logging, token rotation
-- [ ] k8s deploy + horizontal autoscaling
-- [ ] Multi-region failover, PITR backups
-- [ ] Public integration SDK so third parties can add sources
+- [ ] Audit logging, rotation of provider tokens + internal signing keys
+- [ ] **k3s** on existing VM, then k8s + HPA once multi-node justified (no cliff)
+- [ ] Multi-region failover, Postgres PITR, event-bus mirroring
+- [ ] Public integration SDK; sandbox tenancy for third-party connectors
 - [ ] Billing + subscription tiers
 
 ---
@@ -123,4 +134,5 @@ Conventions and per-service guidance live in [`CLAUDE.md`](CLAUDE.md).
 
 ## License
 
-TBD.
+All rights reserved — 2026. Contact the owner for licensing inquiries.
+(We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)
diff --git a/docs/adr/0003-modular-monolith-phase0.md b/docs/adr/0003-modular-monolith-phase0.md
new file mode 100644
index 0000000..2de0148
--- /dev/null
+++ b/docs/adr/0003-modular-monolith-phase0.md
@@ -0,0 +1,31 @@
+# ADR-0003: Modular monolith for Phase 0, extract when justified
+
+## Status
+Accepted — 2026-04-13
+
+## Context
+The initial architecture called for seven independently-deployable services on day one (gateway, auth, profile, integrations, recommender, events, notifier). For a team of ~3 streams with zero users, this is premature. Each service adds CI, deploy, DB, observability, and release-coordination overhead. It also slows the walking skeleton, which is the most important thing to ship.
+
+Modularity — the thing we actually need — is a **code-boundary** property, not a **process-boundary** property. Well-bounded packages extract to services cheaply; poorly-bounded services rarely merge back.
+
+## Decision
+- **Phase 0:** one Node process bundles `services/*` as internal packages behind their HTTP contracts. `ml/serving` is a separate Python process (language boundary). Postgres + NATS complete the stack.
+- **Directory layout** under `services/` is unchanged. Each module is a self-contained package with its own README, schema migrations, and public interface.
+- **Communication** between modules goes through the same HTTP or event contracts it will use post-extraction. In Phase 0 these are resolved in-process via a thin dispatcher; swapping to HTTP/NATS is a transport change, not an API change.
+- **Extraction criteria** (trigger a service split when any apply):
+  1. Language boundary (already true for `ml/serving`).
+  2. Scaling hotspot: the module's load curve diverges materially from the rest.
+  3. SLA divergence: the module needs stricter availability or latency than the monolith.
+  4. Team ownership: a dedicated team takes the module and wants independent releases.
+  5. Regulatory isolation: credentials/PII need tighter blast-radius control.
+- **`events/` is special:** even inside the monolith we use an event-emitter abstraction whose production implementation is NATS JetStream. The async boundary matters for ML correctness; the process boundary doesn't.
+
+## Consequences
+- Faster Phase 0: one CI pipeline, one deploy, one observability config.
+- Cheap extraction: contracts are already HTTP/event-shaped.
+- Discipline required: no cross-module DB access, no reaching into another module's internals, even though it's physically possible. Enforced by lint/import rules.
+- Deploy story: docker-compose with two application containers (Node monolith + Python serving) until extraction begins. Compose profiles let devs bring up subsets.
+
+## Non-consequences
+- We are **not** monolith-forever. We fully expect `integrations/` and `recommender/` to extract once Phase 2+ traffic patterns justify it.
+- Frontend / mobile unaffected.
\ No newline at end of file
diff --git a/docs/adr/0004-auth-authjs-with-oidc-boundary.md b/docs/adr/0004-auth-authjs-with-oidc-boundary.md
new file mode 100644
index 0000000..c31483a
--- /dev/null
+++ b/docs/adr/0004-auth-authjs-with-oidc-boundary.md
@@ -0,0 +1,23 @@
+# ADR-0004: Auth.js for Phase 0, dedicated OIDC provider when mobile ships
+
+## Status
+Accepted — 2026-04-13
+
+## Context
+We need Google (and later Apple) sign-in, session management, and JWTs other services can verify. Options considered:
+- **Auth.js (NextAuth):** a library embedded in the Next.js web app. Fastest to ship. Tight coupling to the web runtime; awkward when a native mobile client also needs tokens.
+- **Ory Kratos + Hydra:** a standalone, self-hosted identity + OIDC provider. Much more powerful. Operationally heavy for a prototype.
+- **Roll our own:** not considered.
+
+Mobile apps are Phase 3+. Phase 0 needs the cheapest credible option that does not box us in.
+
+## Decision
+- **Phase 0:** use **Auth.js** inside the web app. Google provider only (Apple deferred — paid dev account + extra domain setup).
+- **Boundary:** from day one, the `auth` module exposes an **OIDC-shaped** HTTP surface (`/me`, `/logout`, JWT verification via public JWKS, `/.well-known/openid-configuration` stub). Other services verify JWTs against that surface, not against Auth.js internals. This means the day we replace the engine, only one module changes.
+- **JWT strategy:** short-lived (10 min) access JWT, rotating refresh token in an HttpOnly cookie. JWT contains `sub`, `email`, `scope`, `sid`.
+- **Trigger to migrate to Ory (or equivalent):** any of — (a) native mobile shipping, (b) a second client type that can't piggyback on Next.js sessions, (c) multi-tenant requirement.
+
+## Consequences
+- Ships in days, not weeks.
+- The OIDC-shaped boundary means the migration is scoped, not scary.
+- Slight duplication early: we maintain OIDC-surface code that Auth.js mostly handles internally. Worth it.
\ No newline at end of file
diff --git a/docs/adr/0005-event-schemas-protobuf.md b/docs/adr/0005-event-schemas-protobuf.md
new file mode 100644
index 0000000..ea51fd8
--- /dev/null
+++ b/docs/adr/0005-event-schemas-protobuf.md
@@ -0,0 +1,28 @@
+# ADR-0005: Protocol Buffers for event schemas, OpenAPI for HTTP
+
+## Status
+Accepted — 2026-04-13
+
+## Context
+Two contract surfaces exist:
+1. **HTTP** — synchronous, client ↔ server, human-readable debugging matters. OpenAPI is the default and generates decent TS clients.
+2. **Events** — durable, fan-out to ML consumers, schema evolution critical. Feature pipelines trained on old schemas will silently misbehave when producers change a field.
+
+Using OpenAPI for both means:
+- Python pydantic generation is awkward and hand-maintained in practice.
+- No wire-format discipline (JSON is loose).
+- No central schema registry, so schema drift is undetected until a model regresses.
+
+## Decision
+- **HTTP** contracts: OpenAPI 3.1 in `packages/shared-types/http/`. Generate TS clients; hand-write Python pydantic models for ML consumers (few, and they're shallow).
+- **Event** contracts: Protocol Buffers in `packages/shared-types/events/`. Generate TS and Python. All events carry an envelope: `{event_id, occurred_at, schema_version, producer, payload}`.
+- **Schema registry:** lightweight self-hosted (buf.build Schema Registry OSS or a tiny registry in `events/`). CI check blocks breaking changes without a version bump.
+- **Evolution rules:** additive only within a major version; `reserved` for removed fields; new `schema_version` for breaking changes; consumers advertise the versions they accept.
+
+## Consequences
+- One extra build step in `shared-types` (buf or protoc).
+- Breaking event changes cost something — good; they should.
+- ML pipelines can replay old events against new code with confidence.
+
+## Non-consequences
+- No gRPC. HTTP stays HTTP/JSON. Protobuf is only the wire format on the event bus.
\ No newline at end of file
diff --git a/docs/architecture/data-model.md b/docs/architecture/data-model.md
new file mode 100644
index 0000000..4b744b8
--- /dev/null
+++ b/docs/architecture/data-model.md
@@ -0,0 +1,87 @@
+# Data model
+
+Durable entities across modules. Per-module databases/schemas own these; cross-module access is only via the module's API.
+
+## Core entities
+
+```
+User                  auth + profile
+  id (uuid)
+  created_at
+  email (from IdP)
+  preferred_name?
+  deleted_at?         soft-delete for 30-day recovery; hard-delete after
+
+IdentityLink          auth
+  user_id
+  provider            "google" | "apple"
+  provider_sub        subject from IdP
+  created_at
+
+Session               auth
+  user_id
+  sid (uuid)          in JWT
+  issued_at
+  expires_at
+  revoked_at?
+ +Profile profile + user_id (pk) + timezone + quiet_hours jsonb: [{start,end,days}] + contexts jsonb: [{name,predicate}] introduced in Phase 2 + consents jsonb: {integration: {read,write,retain_days}} + +Credential integrations + user_id + provider "todoist" | "google_calendar" | ... + ciphertext sealed-box over {access, refresh, scopes, expires_at} + meta provider-specific (sync_token cursor for Todoist) + created_at + last_refreshed_at + revoked_at? + +Event events + event_id (ulid) + user_id + schema_version + kind e.g. "signals.task.updated" + occurred_at + ingested_at + payload protobuf bytes + +TipInstance recommender + tip_id (ulid) + user_id + policy_name "random" | "bandit.linucb" | "remote:v3" + policy_version + candidate_source "todoist" | "advice.library" | ... + context_snapshot jsonb: features seen at decision time + tip jsonb: {kind,title,body,source,deep_link,meta} + created_at + shown_at? set when the client reports render + reaction? "done" | "snooze" | "dismiss" | null + reacted_at? + delivery_id? fk if surfaced via notifier push + +Delivery notifier + delivery_id + user_id + tip_id + channel "webpush" | "apns" | "fcm" | "email" + dispatched_at + delivered_at? + failure_reason? +``` + +## Foreign-key discipline + +There are no cross-module FKs. Each module owns its tables. References by id are soft; consistency is maintained by events (user-deleted → every module cascades its own cleanup). + +## Deletion + +`User.deleted_at` set → a `user.deletion_requested` event goes out → each module soft-deletes its rows → after 30 days a scheduled job hard-deletes. Credentials are **revoked at the provider** (not just erased locally) on soft-delete. See `privacy.md`. + +## Replay and reproducibility + +`TipInstance.context_snapshot` captures the exact features that produced the decision. This is what lets offline replay re-score historical tips against a new policy without touching the feature store. 
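The replay idea described under "Replay and reproducibility" can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the project's implementation: `StoredTip` mirrors only a few `TipInstance` fields, and `ScoringPolicy`, `replay`, `morningPolicy`, and the `hour_of_day` feature are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes: only context_snapshot, tip, and reaction mirror TipInstance.
type Reaction = "done" | "snooze" | "dismiss" | null;

interface StoredTip {
  tip_id: string;
  context_snapshot: Record<string, number>; // features seen at decision time
  tip: { kind: string; title: string };
  reaction: Reaction;
}

// Assumed interface for a candidate policy under offline evaluation.
interface ScoringPolicy {
  name: string;
  score(tip: StoredTip["tip"], context: Record<string, number>): number;
}

interface ReplayRow {
  tip_id: string;
  score: number;
  reaction: Reaction;
}

// Re-score each historical tip using only its stored snapshot,
// so no feature-store lookup is involved.
function replay(history: StoredTip[], policy: ScoringPolicy): ReplayRow[] {
  return history.map((t) => ({
    tip_id: t.tip_id,
    score: policy.score(t.tip, t.context_snapshot),
    reaction: t.reaction,
  }));
}

// Toy policy (assumption): scores 1 when the snapshot says it was morning.
const morningPolicy: ScoringPolicy = {
  name: "toy.morning",
  score: (_tip, ctx) => ((ctx["hour_of_day"] ?? 12) < 12 ? 1 : 0),
};

const history: StoredTip[] = [
  { tip_id: "t1", context_snapshot: { hour_of_day: 9 }, tip: { kind: "task", title: "Pay rent" }, reaction: "done" },
  { tip_id: "t2", context_snapshot: { hour_of_day: 21 }, tip: { kind: "task", title: "Call bank" }, reaction: "dismiss" },
];

const rows = replay(history, morningPolicy);
```

Each row pairs the new policy's score with the reaction the user actually gave, which is the input an offline comparison would need.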
\ No newline at end of file diff --git a/docs/architecture/metrics.md b/docs/architecture/metrics.md new file mode 100644 index 0000000..2153970 --- /dev/null +++ b/docs/architecture/metrics.md @@ -0,0 +1,43 @@ +# Metrics: measuring "magic" + +We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against. + +## North star + +**Week-2 tip-reaction rate** — of users who saw a tip in week 1, what fraction reacted to *any* tip in week 2? Captures "did this become part of your life." + +## Activation (single-session) + +- **Time-to-first-tip** — sign-in → tip rendered. Target: ≤ 60 s on the happy path. +- **First-tip reaction rate** — fraction of users who interact (done/snooze/dismiss/save) with their very first tip. Target: > 50%. + +## Engagement + +- **Dwell-before-action** — seconds between tip render and first reaction. Too short = glance-away; too long = confused. +- **Done rate = Done / (Done + Snooze + Dismiss)** — the quality proxy. Rising = tips feel on-target. +- **Snooze:Dismiss ratio** — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes. +- **Return cadence** — median inter-session gap. Stable-and-short > spiky. + +## Retention + +- D1, D7, D28 retention. Cohort-sliced by connected integrations. +- Churn signal: 7 days without a session. + +## ML health (from M1) + +- Policy latency p50/p95/p99 at the recommender boundary. +- Feature null-rate per feature, per user. +- Online/offline reward disagreement for shadowed policies. +- Bandit regret proxy: observed reward vs an oracle's best-possible on the same candidates. + +## Privacy & trust + +- Account-deletion completion time (target: < 24 h). +- Provider-revocation success rate on disconnect. +- Number of active credentials per user (low = healthy). 
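As an illustration of the Engagement metrics listed earlier in this file, the done rate and the snooze:dismiss ratio can be computed from a list of reactions. This is a minimal sketch; `engagementStats` and `EngagementStats` are hypothetical names, and returning `null` for an empty denominator is an assumption rather than a documented convention.

```typescript
type Reaction = "done" | "snooze" | "dismiss";

interface EngagementStats {
  doneRate: number | null;        // done / (done + snooze + dismiss)
  snoozeToDismiss: number | null; // snooze / dismiss
}

// Count reactions once, then derive both ratios.
// A null result means the denominator was zero (assumed convention).
function engagementStats(reactions: Reaction[]): EngagementStats {
  let done = 0;
  let snooze = 0;
  let dismiss = 0;
  for (const r of reactions) {
    if (r === "done") done++;
    else if (r === "snooze") snooze++;
    else dismiss++;
  }
  const total = done + snooze + dismiss;
  return {
    doneRate: total === 0 ? null : done / total,
    snoozeToDismiss: dismiss === 0 ? null : snooze / dismiss,
  };
}

const stats = engagementStats(["done", "done", "snooze", "dismiss"]);
```

A high `snoozeToDismiss` with a flat `doneRate` would point at timing rather than relevance, matching the interpretation given for the two metrics.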
+ +## How metrics become decisions + +- **Per-change.** Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge). +- **Shadow > A/B > launch.** Policy changes ship in shadow first (log what it *would* have recommended); then A/B on live traffic; then launch once online reward estimate ≥ incumbent by a CI margin. +- **Dashboards before features.** If we cannot measure a feature's impact on the north-star metric, we defer the feature. \ No newline at end of file diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index 68d7728..164e319 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -3,22 +3,25 @@ ## Guiding constraints - The **recommendation decision** is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip. -- Services are small and independently deployable, but we do **not** multiply services for its own sake. Split by team-of-ownership and by data lifecycle. -- Python for ML, TypeScript for applications, shared contracts regenerated from a single source of truth. +- Modularity lives in **code boundaries**. Deploy topology follows pressure, not anticipation (ADR-0003). +- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005). +- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see `privacy.md`). 
-## Services +## Modules -| Service | Language | Responsibility | Owns data | -|---|---|---|---| -| `gateway` | TS (Node) | BFF for web/mobile; auth-checking; request fan-out | — | -| `auth` | TS | OAuth (Google, Apple), sessions, token issuance | identities, sessions | -| `profile` | TS | user profile, preferences, consents | profiles | -| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | -| `events` | TS | event-bus ingress, normalization, durable log | signal store | -| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | -| `ml/serving` | Python | online scoring for policies/models | — (stateless) | -| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | -| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | +| Module | Language | Responsibility | Owns data | Phase-0 process | +|---|---|---|---|---| +| `gateway` | TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith | +| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith | +| `profile` | TS | user profile, preferences, consents | profiles | Node monolith | +| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith | +| `events` | TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) | +| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith | +| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) | +| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** | +| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | separate (from M4) | + +Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team 
ownership, or regulatory isolation (ADR-0003). `ml/serving` is pre-extracted on language grounds. ## Data boundaries @@ -36,9 +39,28 @@ User reactions (done / snooze / dismiss) are events too. They close the loop as ## Why these choices -- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4. -- **Postgres** everywhere for OLTP. Per-service schemas, not per-service instances in dev. +- **Modular monolith + Python ML** in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003). +- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it. +- **Postgres** for OLTP; per-module schemas in dev; separate databases once modules extract. - **FastAPI + Pydantic** for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it. +- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this. +- **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few. - **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam). - **MLflow** for model registry; artifacts in MinIO/S3. -- **Auth.js or Ory** for identity — we will not write crypto. +- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships. +- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff. 
+ +## Decision flow for a new tip + +``` +client ─► gateway ─► recommender + │ + ├─► candidates: integrations.fetchCandidates(user) + advice.library + ├─► context: FeatureAssembler(user, request) + ├─► policy: PolicyRegistry.get(policyName).pick(candidates, context) + ├─► shadows: run shadow policies in parallel, log their picks + └─► persist: TipInstance{context_snapshot, policy, tip} + ◄─ tip +``` + +Feedback travels back the same path: `POST /feedback → events.emit(feedback.reaction)` → pipelines consume → bandit/model updated on next retrain. diff --git a/docs/architecture/privacy.md b/docs/architecture/privacy.md new file mode 100644 index 0000000..0c04835 --- /dev/null +++ b/docs/architecture/privacy.md @@ -0,0 +1,40 @@ +# Privacy architecture + +Privacy is a Phase 0 feature, not a Phase 5 compliance project. This doc is the minimum. + +## Principles + +1. **Data minimization.** Store only what we need for the tip. Raw task titles stay at Todoist; we store references + computed features. If a feature doesn't lift a metric, its input data doesn't get stored. +2. **User-visible controls.** Every connection shows exactly which scopes we hold and what we've computed. One tap disconnects and revokes. +3. **Deletion is real.** Deleting an account revokes provider tokens, purges credentials immediately, and soft-deletes user data for a 30-day recovery window, then hard-deletes. +4. **No surprise sharing.** Cross-user / collaborative features are opt-in, per category, per integration. +5. **Encryption in transit and at rest.** TLS everywhere; column-level encryption for credentials; disk-level for backups. + +## Flows + +### Connect +User taps "Connect Todoist" → consent screen lists: scopes requested, what we store, what we compute, retention, revocation instructions → OAuth → stored credential is immediately testable and shows in `/connect`. 
+ +### Disconnect +User taps disconnect → `Credential.revoked_at` set → provider-side revocation attempted (Todoist: token revocation endpoint) → credential erased on success → `credential.revoked` event → downstream modules drop associated cursors, caches, derived features for that `(user, provider)` pair. + +### Delete account +User taps "Delete account" in settings → hard confirm → `User.deleted_at` set, all sessions revoked, `user.deletion_requested` event fanned out → every module processes its portion (credentials revoked + purged; profile scrubbed; tip history anonymized to aggregate stats only or purged, per retention policy; events purged on schedule) → within 24 hours account is non-recoverable operationally; within 30 days all rows are hard-deleted. + +### Export (Phase 2) +`GET /me/export` returns a JSON bundle of everything we hold for the user: profile, consents, credentials-metadata (not secrets), events, tip history. + +## Scope boundaries + +Each integration declares the scopes it requests and the features it derives. The `Profile.consents` column is the source of truth; a scope removed from consent short-circuits derived-feature computation at the feature store. + +## Audit + +- Privileged actions (admin-initiated deletions, credential decryption outside the normal refresh path) go to an append-only audit log from Phase 0. +- Per-user access log available via `GET /me/access-log` (Phase 2). + +## Legal surface (Phase 0 minimum) + +- Terms of Service + Privacy Policy documents shipped alongside the sign-in page. +- Consent capture on first sign-in, with a versioned ToS/PP hash stored per user. +- Data-subject request inbox (email) wired up before onboarding the first external user. \ No newline at end of file diff --git a/services/README.md b/services/README.md index 5cbe83b..8e468fc 100644 --- a/services/README.md +++ b/services/README.md @@ -1,13 +1,15 @@ # services/ -Backend microservices. 
Each directory is independently deployable, ships a `Dockerfile`, a `/health` endpoint, and its own `README.md` describing its contract. +Backend modules. Each owns a contract and ships its own `README.md`. In **Phase 0** these are internal packages inside a single Node process (ADR-0003); they extract to their own processes as pressure justifies. -| Dir | Role | Phase introduced | -|---|---|---| -| `gateway/` | BFF for clients; auth check; fan-out to services | 0 | -| `auth/` | OAuth (Google/Apple), sessions, JWT | 0 | -| `profile/` | user profile, preferences, consents | 0 | -| `integrations/` | third-party connectors + encrypted token vault (Todoist first) | 0 | -| `recommender/` | `POST /recommend` — policy-driven tip selection | 0 | -| `events/` | event bus ingress + durable signal store | 1 | -| `notifier/` | push/email/web delivery with quiet-hours | 3 | +| Dir | Role | Phase-0 shape | Extracts when | +|---|---|---|---| +| `gateway/` | BFF for clients; auth check; fan-out | in-proc router | never (stays as the edge) | +| `auth/` | Google OAuth (Apple in M1), sessions, JWT | Auth.js behind OIDC shape | mobile native ships (M3) | +| `profile/` | user profile, preferences, consents | in-proc module | team ownership diverges | +| `integrations/` | connectors + encrypted token vault | in-proc module | credential blast-radius isolation | +| `recommender/` | `POST /recommend` — policy-driven tip selection | in-proc; calls `ml/serving` from M1 | scaling hotspot | +| `events/` | event bus + signal log | in-proc emitter (Phase 0); NATS (M1) | always a library + broker, not a service | +| `notifier/` | push/email delivery + quiet hours | in-proc; **web push in M1** | SLA divergence or mobile push scale | + +Contracts that cross module lines (HTTP or events) come from `packages/shared-types/`. Imports that reach from one module into another module's code are forbidden by import lint. 
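The import-boundary rule can be expressed as a small predicate. This is a sketch under assumptions, not the project's lint rule: the `services/<module>/...` path layout is inferred from this README, and `moduleOf` and `isAllowedImport` are hypothetical helpers; a real setup would more likely wire an equivalent check into the linter.

```typescript
// Extract the owning module from a repo-relative path,
// assuming a services/<module>/... layout. Returns null for anything else.
function moduleOf(path: string): string | null {
  const m = /^services\/([^/]+)\//.exec(path);
  return m ? (m[1] ?? null) : null;
}

// An import is allowed when the target is not a service module
// (for example packages/shared-types) or when both files share a module.
function isAllowedImport(fromFile: string, importedFile: string): boolean {
  const to = moduleOf(importedFile);
  if (to === null) return true;
  return moduleOf(fromFile) === to;
}

const crossModule = isAllowedImport(
  "services/recommender/src/pick.ts",
  "services/integrations/src/vault.ts",
);
const viaSharedTypes = isAllowedImport(
  "services/recommender/src/pick.ts",
  "packages/shared-types/events/index.ts",
);
const sameModule = isAllowedImport(
  "services/recommender/src/pick.ts",
  "services/recommender/src/registry.ts",
);
```

Under these assumptions only `crossModule` is rejected; the other two imports pass.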
diff --git a/services/integrations/README.md b/services/integrations/README.md index da343a8..3607e9f 100644 --- a/services/integrations/README.md +++ b/services/integrations/README.md @@ -7,11 +7,14 @@ Third-party connectors and the token vault. ```ts interface Connector { id: string // e.g. "todoist" + scopes: string[] // human-readable list shown in consent UI beginOAuth(user): Promise<{ redirectUrl, state }> finishOAuth(code, state): Promise fetchSignals(user, since?): AsyncIterable - // optional write-back, e.g. mark task done - act?(user, action): Promise + // incremental-sync cursor (Todoist sync_token, webhook timestamps, etc.) + // stored in Credential.meta; the connector owns its shape. + act?(user, action): Promise // optional write-back (complete task, etc.) + revoke(user): Promise // REQUIRED: provider-side token revocation on disconnect } ``` diff --git a/services/recommender/README.md b/services/recommender/README.md index 70be145..4f4f7c5 100644 --- a/services/recommender/README.md +++ b/services/recommender/README.md @@ -16,12 +16,14 @@ POST /feedback ## Internals (stable seams) - **Candidate sources** — pluggable async generators. v0: Todoist tasks via `integrations`. Later: advice library, calendar nudges, health prompts. -- **Context assembler** — merges request context with features (inline now, feature-store later). -- **Policy** — `Policy.pick(candidates, context) → tip`. Registered by name: +- **Feature assembler** — fills the `context` blob (inline in Phase 0; calls feature store from M1). Never inlined into policy code. +- **Policy registry** — `Policy.pick(candidates, context) → tip`. Named entries: - `random` — v0 (Phase 0). - - `bandit.linucb` — v1 (Phase 1). + - `bandit.linucb.pooled` — v1 (Phase 1). **Global-then-personalize**: pooled features shared across users; per-user residual once data allows. - `remote` — delegates to `ml/serving` FastAPI scorer (Phase 1+). 
+- **Shadow hook** — every request optionally runs N shadow policies in parallel and logs their picks + estimated rewards. Promotion from shadow → A/B → launch is a separate, deliberate step (ADR-0002). +- **TipInstance persistence** — every decision writes `context_snapshot` (features seen at decision time). This is what makes offline replay honest. ## Phase 0 goal -`RandomPolicy` only. The service, contract, and seams exist; the brain does not yet. +`RandomPolicy` only. The service, contract, registry, shadow hook, and tip-instance persistence all exist; no ML yet.
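The seams named in this README (policy registry, shadow hook, tip-instance persistence) can be sketched together. This is a minimal sketch with assumptions labeled: `Tip`, `Context`, `Decision`, `recommend`, and the `toy.last` shadow policy are hypothetical, the random policy takes an injectable `rng` purely for testability, and shadows run sequentially here for brevity although the README describes them as running in parallel.

```typescript
interface Tip { title: string; source: string; }
type Context = Record<string, number>;

interface Policy {
  name: string;
  pick(candidates: Tip[], context: Context): Tip;
}

// Registry of named policies; lookup fails loudly on unknown names.
class PolicyRegistry {
  private policies = new Map<string, Policy>();
  register(p: Policy): void { this.policies.set(p.name, p); }
  get(name: string): Policy {
    const p = this.policies.get(name);
    if (!p) throw new Error(`unknown policy: ${name}`);
    return p;
  }
}

// Uniform random pick; rng is injectable (assumption) so tests are deterministic.
function randomPolicy(rng: () => number = Math.random): Policy {
  return {
    name: "random",
    pick(candidates) {
      if (candidates.length === 0) throw new Error("no candidates");
      const i = Math.min(candidates.length - 1, Math.floor(rng() * candidates.length));
      return candidates[i] as Tip;
    },
  };
}

interface Decision {
  tip: Tip;
  policy: string;
  context_snapshot: Context;        // copy of features seen at decision time
  shadow_picks: Record<string, Tip>; // logged only, never served
}

function recommend(
  registry: PolicyRegistry, live: string, shadows: string[],
  candidates: Tip[], context: Context,
): Decision {
  const tip = registry.get(live).pick(candidates, context);
  const shadow_picks: Record<string, Tip> = {};
  for (const name of shadows) {
    shadow_picks[name] = registry.get(name).pick(candidates, context);
  }
  return { tip, policy: live, context_snapshot: { ...context }, shadow_picks };
}

const registry = new PolicyRegistry();
registry.register(randomPolicy(() => 0)); // fixed rng: always picks index 0
registry.register({ name: "toy.last", pick: (c) => c[c.length - 1] as Tip });

const candidates: Tip[] = [
  { title: "a", source: "todoist" },
  { title: "b", source: "todoist" },
  { title: "c", source: "todoist" },
];
const context: Context = { hour_of_day: 9 };
const decision = recommend(registry, "random", ["toy.last"], candidates, context);
```

Copying `context` into `context_snapshot` keeps the persisted record stable even if the caller mutates its context object afterwards.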