refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0
- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
90
PLAN.md
@@ -1,71 +1,85 @@
 # Implementation plan

-Step-by-step build order for Phase 0 (prototype) and the seams that make Phases 1–5 cheap.
+Step-by-step build order for Phase 0 (walking skeleton) and the seams that make Phases 1–5 cheap.

-The principle: **build the contracts first, stub the internals.** Every service should exist with a `/health` endpoint and a minimal real implementation of its interface before any service is "finished". This gives us an end-to-end walking skeleton from week one.
+The principle: **build the contracts first, stub the internals.** Every module exposes its contract and a `/health` story before any module is "finished". End-to-end walking skeleton in the first week.

+**Packaging reminder (ADR-0003):** Phase 0 is a modular monolith — one Node process bundles `services/*` behind their HTTP contracts, plus `ml/serving` as a separate Python process. Contracts are identical whether the call is in-process or over the wire.
+
 ---

 ## Stage 0 — Foundations (days 1–3)

-1. **Monorepo tooling.** pnpm workspaces for JS/TS; uv or poetry for Python; turbo or nx for build graph; pre-commit (lint, typecheck, format).
-2. **Docker Compose dev env.** Postgres, NATS, MinIO (S3), Mailhog, all services wired with hot-reload.
-3. **CI skeleton** (Gitea Actions): lint → typecheck → unit test → build → publish images.
-4. **Secrets convention.** `.env.example` per service; prod secrets injected by orchestrator.
-5. **Shared types package.** OpenAPI source → generated TS + Python clients.
+1. **Monorepo tooling.** pnpm workspaces for TS; uv for Python; turbo for build graph; pre-commit (eslint, prettier, ruff, mypy, typecheck).
+2. **Docker Compose dev env** with profiles:
+   - `core` — Node monolith + `ml/serving` stub + Postgres.
+   - `full` — adds NATS, MinIO, MailHog. Needed from Stage 4 onward.
+3. **CI skeleton** (Gitea Actions): lint → typecheck → unit → build → publish images. Schema-registry check for protobuf events (added in Phase 1, but pipeline stub now).
+4. **Secrets convention.** `.env.example` per module; prod injected by orchestrator.
+5. **Shared types.** OpenAPI for HTTP, protobuf for events (ADR-0005). Generate TS; Python pydantic models hand-written initially (few consumers).
+6. **Import-boundary lint.** `eslint-plugin-boundaries` (or equivalent) prevents `services/integrations` from importing `services/recommender` internals. Contracts-only.

-Deliverable: `docker compose up` brings a green dashboard of `/health` endpoints.
+Exit: `docker compose --profile core up` brings a green dashboard of `/health` endpoints.

 ## Stage 1 — Identity & session (days 4–7)

-1. `services/auth`: Google OAuth2 (PKCE), session cookies, short-lived JWTs, refresh rotation. Library-backed (Auth.js or Ory Kratos + Hydra) — we do not roll our own.
-2. `services/profile`: minimal `User` record; created on first sign-in.
-3. `apps/web` sign-in page; gateway verifies JWT.
+1. `services/auth` module: Auth.js embedded in the Node monolith, Google provider only (Apple deferred). OIDC-shaped surface (ADR-0004): `/me`, `/logout`, JWKS, stub `/.well-known/openid-configuration`.
+2. `services/profile` module: `User` row created on first sign-in; consent record captured with ToS/PP version hash.
+3. `apps/web` sign-in page. Gateway (also in-process) verifies JWT.
+4. **Deletion endpoint** (yes, already): `DELETE /me` — revokes sessions, flips `deleted_at`, emits `user.deletion_requested`.

-Exit check: a user can sign in and fetch their own profile.
+Exit: a user can sign in, see their profile, and delete their account; deletion is observable end-to-end even though there's no data to erase yet.
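
Editorial sketch (not part of the commit): the three effects the plan lists for `DELETE /me` (revoke sessions, flip `deleted_at`, emit `user.deletion_requested`) could be ordered roughly as below. `AccountStore`, `DeletionEvent`, `deleteMe`, and the in-memory maps are hypothetical stand-ins; the plan does not specify storage, HTTP framework, or event payload shape. Making a repeated call a no-op that still returns 204 is an assumption, not something the plan states.

```typescript
// Hypothetical in-memory stand-ins; the real module would sit on Postgres and Auth.js sessions.
interface User {
  id: string;
  deletedAt: Date | null;
}

interface DeletionEvent {
  type: "user.deletion_requested";
  userId: string;
  requestedAt: string; // ISO-8601 timestamp
}

class AccountStore {
  private users = new Map<string, User>();
  private sessions = new Map<string, Set<string>>(); // userId -> session ids

  addUser(id: string): void {
    this.users.set(id, { id, deletedAt: null });
  }

  addSession(userId: string, sessionId: string): void {
    const set = this.sessions.get(userId) ?? new Set<string>();
    set.add(sessionId);
    this.sessions.set(userId, set);
  }

  getUser(id: string): User | undefined {
    return this.users.get(id);
  }

  sessionCount(userId: string): number {
    return this.sessions.get(userId)?.size ?? 0;
  }

  revokeSessions(userId: string): void {
    this.sessions.delete(userId);
  }
}

// Core of a `DELETE /me` handler, returning the HTTP status it would send.
function deleteMe(
  store: AccountStore,
  userId: string,
  emit: (event: DeletionEvent) => void,
  now: Date = new Date(),
): 204 | 404 {
  const user = store.getUser(userId);
  if (user === undefined) return 404;

  // 1. Revoke sessions first so no request can race the deletion.
  store.revokeSessions(userId);

  // 2. Flip the soft-delete marker and 3. emit the event, once only (assumed idempotency).
  if (user.deletedAt === null) {
    user.deletedAt = now;
    emit({ type: "user.deletion_requested", userId, requestedAt: now.toISOString() });
  }
  return 204;
}
```

Passing `emit` in as a parameter keeps the handler unaware of whether the event goes to an in-process subscriber or a broker.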

 ## Stage 2 — Integrations framework (days 8–12)

-1. `services/integrations` with a **Connector** interface:
-   - `begin_oauth(user) → redirect_url`
-   - `finish_oauth(code, state) → StoredCredential`
-   - `fetch_signals(user, since) → Event[]`
-2. **Token vault**: column-level encryption (libsodium), key from env or KMS.
-3. **Todoist connector** as the first concrete implementation.
-4. Web "Connect" page: list of connectors, button per connector, callback handling.
+1. `services/integrations` module with a **Connector** interface:
+   - `beginOAuth(user) → {redirectUrl, state}`
+   - `finishOAuth(code, state) → StoredCredential`
+   - `fetchSignals(user, since?) → AsyncIterable<NormalizedEvent>`
+   - `act?(user, action) → void`
+   - `revoke(user) → void` — first-class; no revocation means no disconnect.
+2. **Token vault**: libsodium sealed box, key from env/KMS. One row per `(user, provider)` with provider-specific `meta` (e.g. Todoist `sync_token`).
+3. **Todoist connector**: OAuth2, Sync API incremental reads via `sync_token`, `act` to complete a task, `revoke` calls Todoist's token-revocation endpoint.
+4. Web `/connect`: list of connectors, per-connector consent screen (scopes + retention), connect/disconnect.

-Exit check: a user taps "Connect Todoist", completes the OAuth dance, and the integrations service can fetch their tasks on demand.
+Exit: a user can connect and disconnect Todoist; disconnect revokes at Todoist and wipes local credentials.
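
Editorial sketch (not part of the commit): one way the revised **Connector** contract might read in TypeScript. The five method names come from the plan; everything else is assumed for illustration, including the `Promise` return types, the field layouts of `StoredCredential`, `NormalizedEvent`, and `ConnectorAction`, and the `FakeConnector`, `CredentialVault`, and `disconnect` helpers. The vault here is a plain map with no encryption. Revoking upstream before wiping locally is also an assumption about ordering.

```typescript
interface UserRef { id: string }
interface StoredCredential {
  userId: string;
  provider: string;
  accessToken: string;
  meta: Record<string, unknown>; // provider-specific, e.g. a sync token
}
interface NormalizedEvent {
  id: string;
  provider: string;
  kind: string;
  occurredAt: string; // ISO-8601
  payload: Record<string, unknown>;
}
interface ConnectorAction { type: string; targetId: string }

interface Connector {
  readonly provider: string;
  beginOAuth(user: UserRef): Promise<{ redirectUrl: string; state: string }>;
  finishOAuth(code: string, state: string): Promise<StoredCredential>;
  fetchSignals(user: UserRef, since?: Date): AsyncIterable<NormalizedEvent>;
  act?(user: UserRef, action: ConnectorAction): Promise<void>;
  revoke(user: UserRef): Promise<void>;
}

// Test double; a real connector would call the provider's OAuth and sync endpoints.
class FakeConnector implements Connector {
  readonly provider = "fake";
  revokedUsers: string[] = [];
  private pending = new Map<string, string>(); // state -> userId
  private events: NormalizedEvent[] = [
    { id: "e1", provider: "fake", kind: "task", occurredAt: "2024-01-01T00:00:00Z", payload: {} },
    { id: "e2", provider: "fake", kind: "task", occurredAt: "2024-02-01T00:00:00Z", payload: {} },
  ];

  async beginOAuth(user: UserRef): Promise<{ redirectUrl: string; state: string }> {
    const state = `state-${user.id}`; // predictable for the sketch; use a random value in practice
    this.pending.set(state, user.id);
    return { redirectUrl: `https://provider.example/authorize?state=${state}`, state };
  }

  async finishOAuth(code: string, state: string): Promise<StoredCredential> {
    const userId = this.pending.get(state);
    if (userId === undefined) throw new Error("unknown OAuth state");
    this.pending.delete(state);
    return { userId, provider: this.provider, accessToken: `token-for-${code}`, meta: {} };
  }

  async *fetchSignals(_user: UserRef, since?: Date): AsyncIterable<NormalizedEvent> {
    for (const event of this.events) {
      if (since === undefined || new Date(event.occurredAt) > since) yield event;
    }
  }

  async revoke(user: UserRef): Promise<void> {
    this.revokedUsers.push(user.id);
  }
}

// Unencrypted stand-in for the token vault: one row per (user, provider).
class CredentialVault {
  private rows = new Map<string, StoredCredential>();
  put(cred: StoredCredential): void { this.rows.set(`${cred.userId}:${cred.provider}`, cred); }
  has(userId: string, provider: string): boolean { return this.rows.has(`${userId}:${provider}`); }
  delete(userId: string, provider: string): void { this.rows.delete(`${userId}:${provider}`); }
}

// Disconnect = revoke at the provider, then wipe local credentials.
// If revoke throws, the local row is kept so the call can be retried.
async function disconnect(connector: Connector, vault: CredentialVault, user: UserRef): Promise<void> {
  await connector.revoke(user);
  vault.delete(user.id, connector.provider);
}
```

Because `fetchSignals` returns an `AsyncIterable`, callers consume it with `for await` and can stop early without fetching every page.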

 ## Stage 3 — Recommender contract (days 13–16)

-1. `services/recommender` exposes `POST /recommend {user_id, context} → {tip}`.
-2. Policy interface (`Policy.pick(user, candidates, context) → tip`).
-3. **`RandomPolicy` v0** — fetches candidates from `integrations` (Todoist tasks), returns one uniformly at random.
-4. Tip shape is provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
-5. `apps/web` tip page: full black, one tip centered, tap = mark done → callback fires to integrations (complete Todoist task) + emits a feedback event.
+1. `services/recommender` module exposes `POST /recommend` and `POST /feedback`.
+2. **Policy registry** keyed by name. **Candidate sources** registered independently; v0 source = `integrations.todoist.tasks`.
+3. **`RandomPolicy` v0** — draws uniformly.
+4. **Tip shape** provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
+5. **`TipInstance` persisted** with `context_snapshot` — the features-seen-at-decision-time blob that makes offline replay possible later.
+6. `apps/web` tip page:
+   - `kind=todo` → tap = done (calls `integrations.todoist.act(complete)`).
+   - `kind=advice` → tap = acknowledge; long-press = save.
+   - Snooze / dismiss via long-press menu regardless of kind.
+   - Every reaction emits a feedback event even though it's in-process today.

-Exit check: three-page prototype works end-to-end for one user.
+Exit: three-page prototype works end-to-end.
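
Editorial sketch (not part of the commit): a name-keyed policy registry, a uniform `RandomPolicy`, and a `TipInstance` carrying `context_snapshot`, as the Stage 3 items describe them. The `Tip` fields follow the shape given in the plan. `PolicyRegistry`, `CandidateSource`, `recommend`, the `policy` and `decided_at` fields, and the synchronous signatures are assumptions made to keep the example small; real candidate sources would likely be asynchronous.

```typescript
interface Tip {
  id: string;
  kind: "todo" | "advice";
  title: string;
  body: string;
  source: string;
  deep_link: string;
  meta: Record<string, unknown>;
}

type Context = Record<string, unknown>;

interface Policy {
  readonly name: string;
  pick(candidates: readonly Tip[], context: Context): Tip | null;
}

class RandomPolicy implements Policy {
  readonly name = "random";
  private rng: () => number;

  // The random source is injectable so tests can make the draw deterministic.
  constructor(rng: () => number = Math.random) {
    this.rng = rng;
  }

  pick(candidates: readonly Tip[]): Tip | null {
    if (candidates.length === 0) return null;
    // Uniform index in [0, length); clamped in case an injected rng returns 1.
    const index = Math.min(Math.floor(this.rng() * candidates.length), candidates.length - 1);
    return candidates[index] ?? null;
  }
}

class PolicyRegistry {
  private policies = new Map<string, Policy>();

  register(policy: Policy): void {
    if (this.policies.has(policy.name)) throw new Error(`policy already registered: ${policy.name}`);
    this.policies.set(policy.name, policy);
  }

  get(name: string): Policy {
    const policy = this.policies.get(name);
    if (policy === undefined) throw new Error(`unknown policy: ${name}`);
    return policy;
  }
}

// A candidate source is registered independently of any policy.
type CandidateSource = (userId: string) => Tip[];

interface TipInstance {
  tip: Tip;
  policy: string;
  context_snapshot: Context; // what the policy saw at decision time
  decided_at: string;
}

function recommend(
  registry: PolicyRegistry,
  policyName: string,
  sources: Map<string, CandidateSource>,
  userId: string,
  context: Context,
): TipInstance | null {
  const candidates: Tip[] = [];
  for (const source of Array.from(sources.values())) {
    for (const tip of source(userId)) candidates.push(tip);
  }
  const tip = registry.get(policyName).pick(candidates, context);
  if (tip === null) return null;
  // Copy the context so later mutation cannot alter the recorded snapshot.
  return { tip, policy: policyName, context_snapshot: { ...context }, decided_at: new Date().toISOString() };
}
```

The snapshot copy is shallow; nested feature values would need a deep copy or immutable inputs for replay to be trustworthy.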

-## Stage 4 — Hardening the prototype (days 17–20)
+## Stage 4 — Hardening (days 17–20)

-1. Error surfaces (Sentry), structured logs (pino / structlog), trace IDs across services.
-2. Rate limits + retries on outbound API calls.
-3. Integration tests: Playwright for the web flow, pact-style contract tests between services.
-4. Deploy to a single VM via docker-compose + Caddy.
+1. Observability: pino + structlog, Sentry per module, W3C traceparent across the monolith boundary and into `ml/serving`.
+2. Rate limits, retries with jitter, and circuit breakers on outbound (Todoist, Google).
+3. Integration tests: Playwright for the web flow (sign-in → connect → tip → delete). Contract tests between modules so the extractions later are safe.
+4. **Metrics baseline wired** (`docs/architecture/metrics.md`): activation, first-tip reaction, dwell, snooze:dismiss ratio, D1 retention.
+5. Deploy to a single VM via docker-compose + Caddy; Caddy auto-TLS; healthchecks wired to Caddy.

-Exit check: Phase 0 milestone closed.
+Exit: Phase 0 milestone closed; real users can be onboarded.
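
Editorial sketch (not part of the commit): "retries with jitter, and circuit breakers" could look roughly like the following. The plan names neither an algorithm nor a library, so the "full jitter" backoff and the consecutive-failure breaker shown here are assumed choices, and `backoffDelayMs`, `withRetry`, `RetryOptions`, and `CircuitBreaker` are hypothetical names. The random source, sleep function, and clock are injectable so the behaviour can be tested without real delays.

```typescript
// Full-jitter backoff: delay drawn uniformly from [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  rng: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(rng() * ceiling);
}

interface RetryOptions {
  maxAttempts: number;
  baseMs: number;
  capMs: number;
  isRetryable: (err: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>;
  rng?: () => number;
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const lastAttempt = attempt + 1 >= opts.maxAttempts;
      // Give up on the final attempt or on errors that retrying cannot fix (e.g. a 4xx).
      if (lastAttempt || !opts.isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt, opts.baseMs, opts.capMs, opts.rng));
    }
  }
}

// Opens after `threshold` consecutive failures; allows a probe once `cooldownMs` has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  canRequest(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

A caller would check `canRequest()` before invoking `withRetry`, then report the outcome with `recordSuccess()` or `recordFailure()`, so a provider outage stops generating retry traffic after the threshold is reached.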

 ---

-## Seams prepared for later phases (do not implement yet, but do not foreclose)
+## Seams prepared for later phases (designed now, implemented later)

-- **Event bus.** From day one, `integrations` and `recommender` speak through an async fn that today is an in-process call but will be NATS tomorrow. Keep the signature `(event: NormalizedEvent) → void`.
-- **Feature store.** The recommender accepts a `context` blob; later, a feature service fills it. Do not inline feature lookups inside the policy.
-- **Policy registry.** `PolicyFactory.get(name)` so A/B and bandit policies slot in without code changes to the gateway.
-- **Python boundary.** Recommender is TS today, but its scoring function is isolated — moving to FastAPI in Phase 1 is a file move, not a refactor.
+- **Event bus abstraction.** `emit(event)` / `subscribe(topic, handler)` today is in-process; the production implementation in Phase 1 is NATS JetStream. Callsites never change.
+- **Feature assembler.** Recommender accepts a `context` blob from a `FeatureAssembler`; in Phase 0 it returns a hard-coded minimum; in Phase 1 it calls the feature store.
+- **Shadow-policy hook.** The recommender already supports running N policies in shadow per request; v0 runs zero shadows but the hook exists.
+- **Extraction-ready modules.** Every `services/*/` has a `serve.ts` that can be mounted in the monolith or booted standalone. Dockerfile targets both.
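
Editorial sketch (not part of the commit): the event-bus seam, with `emit(event)` and `subscribe(topic, handler)` as named in the plan and an in-process implementation behind them. The `BusEvent` shape (an event carrying its own `topic`), the `Promise` return from `emit`, the unsubscribe function, and the `onError` hook are assumptions. Handler failures are routed to `onError` instead of back to the emitter, on the reasoning that a broker-backed implementation would not surface subscriber errors to publishers either.

```typescript
interface BusEvent {
  topic: string;
  payload: unknown;
}

type Handler = (event: BusEvent) => void | Promise<void>;

// Callsites depend only on this interface; a broker-backed class could implement it later.
interface EventBus {
  emit(event: BusEvent): Promise<void>;
  subscribe(topic: string, handler: Handler): () => void; // returns an unsubscribe function
}

class InProcessEventBus implements EventBus {
  private handlers = new Map<string, Set<Handler>>();
  private onError: (err: unknown, event: BusEvent) => void;

  constructor(onError: (err: unknown, event: BusEvent) => void = () => {}) {
    this.onError = onError;
  }

  subscribe(topic: string, handler: Handler): () => void {
    const set = this.handlers.get(topic) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(topic, set);
    return () => {
      set.delete(handler);
    };
  }

  async emit(event: BusEvent): Promise<void> {
    const set = this.handlers.get(event.topic);
    if (set === undefined) return;
    // Iterate over a copy so handlers may unsubscribe while the event is being delivered.
    for (const handler of Array.from(set)) {
      try {
        await handler(event);
      } catch (err) {
        // A failing subscriber must not fail the emitter or starve other subscribers.
        this.onError(err, event);
      }
    }
  }
}
```

Handlers run sequentially here; a broker would deliver concurrently and possibly more than once, so subscribers written against this seam should already be idempotent.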

 ---

 ## Staffing assumption

-Work is parallelizable across ~3 streams: **infra/platform**, **backend services**, **web app**. Each Gitea issue notes which stream and which phase (milestone) it belongs to.
+Three parallel streams: **platform** (infra, CI, shared-types), **backend** (auth, profile, integrations, recommender), **web** (sign-in, connect, tip, PWA). `ml` joins in Phase 1. Each Gitea issue carries its stream label and milestone.