refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0
- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
67
CLAUDE.md
@@ -8,66 +8,73 @@ The magic is the product. Precision + timing + minimalism. The UI shows a single
## Prime directives
1. **Modular, service-oriented from day one.** Even the prototype. We will scale to mobile (iOS/Android), many integrations, multi-tenant ML. Shortcuts that bake in a monolith are not acceptable.
2. **Recommendation engine is the core.** Every other service feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
3. **Python owns ML.** Everything training, features, serving for models is Python (FastAPI + PyTorch/scikit + MLflow/feast). Application services are TypeScript (Node, Next.js) unless there's a reason.
1. **Modular by package, deployable by stage.** Contracts live at package boundaries from day one so extraction to a service is cheap. Deploy topology evolves with real pressure (team size, scaling hotspots, language boundaries), not with wishful architecture. Phase 0 = **modular monolith + Python ML sidecar**. See ADR-0003.
2. **Recommendation engine is the core.** Every other module feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
3. **Python owns ML.** Training, features, online scoring are Python (FastAPI + PyTorch/scikit + MLflow/Feast). Application code is TypeScript (Node, Next.js) unless there's a reason.
4. **OAuth-first for identity and integrations.** Never ask users for passwords or raw API keys when a delegated-auth flow exists. Store provider tokens encrypted, refresh transparently.
5. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished.
5. **Privacy is a feature, not a phase.** Consent capture, token revocation, and account deletion exist from the first real user. Data minimization: store the token + derivatives we need, not the raw feed.
6. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished. The tip page is a watch face.
## Architecture (high level)
The tree below is **logical module structure**. Directory layout is stable; how many processes you deploy is a stage decision (ADR-0003).
```
apps/ user-facing clients
web/ Next.js PWA — the first shipped client
mobile-ios/ Swift/SwiftUI (Phase 3)
mobile-android/ Kotlin/Compose (Phase 3)
services/ backend microservices (each independently deployable)
gateway/ API gateway + BFF (GraphQL or tRPC)
services/ backend modules — each owns a contract; may share a deployable
gateway/ BFF for clients; auth check; fan-out
auth/ OAuth (Google, Apple, ...), sessions, JWT issuance
profile/ user profile, preferences, consents
integrations/ third-party connectors (Todoist first); token vault
recommender/ Python; serves the "one best tip" decision
events/ event bus ingress (Kafka/NATS) + signal store
notifier/ push/email/web delivery of tips
integrations/ third-party connectors + token vault (Todoist first)
recommender/ orchestration: candidates → policy → tip; feedback sink
events/ event bus ingress + durable signal store
notifier/ push/email/web delivery (web push from Phase 1)
packages/ shared libraries
shared-types/ OpenAPI/proto-generated types
packages/ shared libraries (importable across services + apps)
shared-types/ HTTP types via OpenAPI; event types via protobuf (ADR-0005)
sdk-js/ client SDK used by web + mobile webviews
ui/ shared React components + design tokens
ml/ Python MLOps
pipelines/ training / batch feature pipelines (Airflow/Prefect)
features/ feature definitions (Feast-style)
registry/ model registry (MLflow) integration
experiments/ A/B testing framework + bandit policies
serving/ online inference service (FastAPI)
notebooks/ research only — not production
ml/ Python — separate deployable from day one
serving/ online scorer (FastAPI), called by recommender
features/ feature definitions + store adapter
pipelines/ batch feature + training DAGs (Prefect/Airflow)
registry/ MLflow model registry integration
experiments/ assignment + A/B + bandit policies
notebooks/ research only; never imported by production code
infra/ docker-compose, k8s manifests, terraform, CI
infra/ docker-compose (Phase 0), k3s/k8s (later), terraform, CI
docs/ architecture notes, ADRs, API specs
```
## Contracts between services
**Phase 0 deployables:** one Node process (`services/*` bundled via modular monolith) + one Python process (`ml/serving`, stubbed until M1) + Postgres + NATS. Services **extract to their own process** when a real reason appears: language boundary, scaling hotspot, team ownership, or SLA divergence. See ADR-0003.
- **Events** (Kafka/NATS) — source of truth for user signals. All integrations emit normalized events; the recommender reads them.
- **HTTP/gRPC** — synchronous request/response (gateway → services).
- **Shared schemas** live in `packages/shared-types`; generated from a single OpenAPI / proto source. Do not redefine types per service.
## Contracts between modules
- **HTTP** (OpenAPI, in `packages/shared-types/http/`) — synchronous request/response. In-process today; over the network once extracted. Signatures are identical.
- **Events** (Protocol Buffers, in `packages/shared-types/events/`) — durable signals + feedback. Today: in-process event emitter. Tomorrow: NATS JetStream. Schema registry enforced in CI (ADR-0005).
- Do not redefine types per module. Regenerate from `shared-types`.
## Conventions
- Every service ships a `README.md`, a `Dockerfile`, and a `/health` endpoint.
- One PR = one concern. Commits follow conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
- Each module ships a `README.md` describing its contract, its `/health` story, and its extraction criteria (when it should become its own process).
- One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
- ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
- No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
- Compose profiles (`core`, `full`) so devs can run a subset without 16 GB of RAM.
## Definition of done (per feature)
1. Code + tests merged.
2. Service's `README.md` updated.
2. Module's `README.md` updated.
3. If it changes a contract → `shared-types` regenerated + consumers updated.
4. If it changes architecture → ADR added.
5. Deployable via `docker compose up` locally.
6. If it touches user data → a deletion path exists and is tested.
## Current phase
@@ -75,7 +82,9 @@ docs/ architecture notes, ADRs, API specs
## What NOT to do
- Don't copy Todoist's data into our DB. Store the OAuth token; fetch on demand.
- Don't implement auth by hand. Use a library (NextAuth / Auth.js, Ory, or Clerk-compatible). We will self-host.
- Don't copy Todoist's data into our DB. Store the OAuth token + computed features/derivatives we need, fetch raw on demand.
- Don't implement auth by hand. Phase 0 uses **Auth.js** behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
- Don't hardwire a recommender. The "random todo" v0 must live behind the same interface the real ML model will implement (`POST /recommend` → `{tip}`). Swap internals, keep contract.
- Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
- Don't build an admin UI before the user-facing black page is polished.
- Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
90
PLAN.md
@@ -1,71 +1,85 @@
# Implementation plan
Step-by-step build order for Phase 0 (prototype) and the seams that make Phases 1–5 cheap.
Step-by-step build order for Phase 0 (walking skeleton) and the seams that make Phases 1–5 cheap.
The principle: **build the contracts first, stub the internals.** Every service should exist with a `/health` endpoint and a minimal real implementation of its interface before any service is "finished". This gives us an end-to-end walking skeleton from week one.
The principle: **build the contracts first, stub the internals.** Every module exposes its contract and a `/health` story before any module is "finished". End-to-end walking skeleton in the first week.
**Packaging reminder (ADR-0003):** Phase 0 is a modular monolith — one Node process bundles `services/*` behind their HTTP contracts, plus `ml/serving` as a separate Python process. Contracts are identical whether the call is in-process or over the wire.
---
## Stage 0 — Foundations (days 1–3)
1. **Monorepo tooling.** pnpm workspaces for JS/TS; uv or poetry for Python; turbo or nx for build graph; pre-commit (lint, typecheck, format).
2. **Docker Compose dev env.** Postgres, NATS, MinIO (S3), Mailhog, all services wired with hot-reload.
3. **CI skeleton** (Gitea Actions): lint → typecheck → unit test → build → publish images.
4. **Secrets convention.** `.env.example` per service; prod secrets injected by orchestrator.
5. **Shared types package.** OpenAPI source → generated TS + Python clients.
1. **Monorepo tooling.** pnpm workspaces for TS; uv for Python; turbo for build graph; pre-commit (eslint, prettier, ruff, mypy, typecheck).
2. **Docker Compose dev env** with profiles:
- `core` — Node monolith + `ml/serving` stub + Postgres.
- `full` — adds NATS, MinIO, MailHog. Needed from Stage 4 onward.
3. **CI skeleton** (Gitea Actions): lint → typecheck → unit → build → publish images. The schema-registry check for protobuf events lands in Phase 1; stub its pipeline step now.
4. **Secrets convention.** `.env.example` per module; prod injected by orchestrator.
5. **Shared types.** OpenAPI for HTTP, protobuf for events (ADR-0005). Generate TS for both; generate Python from the protobuf event schemas, and hand-write the Python pydantic models for HTTP initially (few consumers).
6. **Import-boundary lint.** `eslint-plugin-boundaries` (or equivalent) prevents `services/integrations` from importing `services/recommender` internals. Contracts-only.
Deliverable: `docker compose up` brings a green dashboard of `/health` endpoints.
Exit: `docker compose --profile core up` brings a green dashboard of `/health` endpoints.
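
The import-boundary rule from item 6 can be pictured as a plain check over import edges. This is a hand-rolled sketch, not the `eslint-plugin-boundaries` configuration; the `index.ts` public-entry convention and every name below are assumptions made for illustration.

```typescript
// Hypothetical sketch: a file under services/<module>/ may import another
// module only through that module's public entry point (assumed: index.ts).
type ImportEdge = { fromFile: string; toFile: string };

// "services/recommender/policy/random.ts" -> "recommender"; null outside services/.
function owningModule(path: string): string | null {
  const parts = path.split("/");
  const moduleName = parts[1];
  if (parts[0] !== "services" || parts.length < 3 || moduleName === undefined) {
    return null;
  }
  return moduleName;
}

// True when the import stays inside one module or targets a public entry point.
function isAllowedImport(edge: ImportEdge): boolean {
  const from = owningModule(edge.fromFile);
  const to = owningModule(edge.toFile);
  if (from === null || to === null || from === to) return true;
  return edge.toFile === `services/${to}/index.ts`;
}

// Returns only the edges that reach into another module's internals.
function findBoundaryViolations(edges: ImportEdge[]): ImportEdge[] {
  return edges.filter((edge) => !isAllowedImport(edge));
}
```

With this shape, `services/integrations` importing `services/recommender/policy/random.ts` is reported, while importing `services/recommender/index.ts` passes. Imports from outside `services/` are ignored by this sketch.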
## Stage 1 — Identity & session (days 4–7)
1. `services/auth`: Google OAuth2 (PKCE), session cookies, short-lived JWTs, refresh rotation. Library-backed (Auth.js or Ory Kratos + Hydra) — we do not roll our own.
2. `services/profile`: minimal `User` record; created on first sign-in.
3. `apps/web` sign-in page; gateway verifies JWT.
1. `services/auth` module: Auth.js embedded in the Node monolith, Google provider only (Apple deferred). OIDC-shaped surface (ADR-0004): `/me`, `/logout`, JWKS, stub `/.well-known/openid-configuration`.
2. `services/profile` module: `User` row created on first sign-in; consent record captured with ToS/PP version hash.
3. `apps/web` sign-in page. Gateway (also in-process) verifies JWT.
4. **Deletion endpoint** (yes, already): `DELETE /me` — revokes sessions, flips `deleted_at`, emits `user.deletion_requested`.
Exit check: a user can sign in and fetch their own profile.
Exit: a user can sign in, see their profile, and delete their account; deletion is observable end-to-end even though there's no data to erase yet.
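
The three effects of `DELETE /me` (revoke sessions, flip `deleted_at`, emit `user.deletion_requested`) might look like this against an in-memory store. The row shapes, the `outbox` array standing in for the event emitter, and the idempotency behaviour are assumptions, not the module's actual code.

```typescript
// Hypothetical in-memory sketch of `DELETE /me`; row shapes are assumptions.
type UserRow = { id: string; email: string; deletedAt: Date | null };
type SessionRow = { sid: string; userId: string; revokedAt: Date | null };
type DeletionRequested = {
  type: "user.deletion_requested";
  userId: string;
  occurredAt: Date;
};

type AccountStore = {
  users: Map<string, UserRow>;
  sessions: SessionRow[];
  outbox: DeletionRequested[]; // stands in for the event emitter
};

// Revokes live sessions, flips deleted_at, and emits the deletion event.
// Idempotent: a repeat call keeps the first timestamp and emits nothing new.
function deleteMe(store: AccountStore, userId: string, now: Date): boolean {
  const user = store.users.get(userId);
  if (user === undefined) return false; // would map to 404 at the HTTP layer
  for (const session of store.sessions) {
    if (session.userId === userId && session.revokedAt === null) {
      session.revokedAt = now;
    }
  }
  if (user.deletedAt === null) {
    user.deletedAt = now;
    store.outbox.push({ type: "user.deletion_requested", userId, occurredAt: now });
  }
  return true;
}
```

Emitting the event even when there is no data to erase yet is what makes the deletion observable end-to-end: later modules only need to subscribe.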
## Stage 2 — Integrations framework (days 8–12)
1. `services/integrations` with a **Connector** interface:
- `begin_oauth(user) → redirect_url`
- `finish_oauth(code, state) → StoredCredential`
- `fetch_signals(user, since) → Event[]`
2. **Token vault**: column-level encryption (libsodium), key from env or KMS.
3. **Todoist connector** as the first concrete implementation.
4. Web "Connect" page: list of connectors, button per connector, callback handling.
1. `services/integrations` module with a **Connector** interface:
- `beginOAuth(user) → {redirectUrl, state}`
- `finishOAuth(code, state) → StoredCredential`
- `fetchSignals(user, since?) → AsyncIterable<NormalizedEvent>`
- `act?(user, action) → void`
- `revoke(user) → void` — first-class; no revocation means no disconnect.
2. **Token vault**: libsodium sealed box, key from env/KMS. One row per `(user, provider)` with provider-specific `meta` (e.g. Todoist `sync_token`).
3. **Todoist connector**: OAuth2, Sync API incremental reads via `sync_token`, `act` to complete a task, `revoke` calls Todoist's token-revocation endpoint.
4. Web `/connect`: list of connectors, per-connector consent screen (scopes + retention), connect/disconnect.
Exit check: a user taps "Connect Todoist", completes the OAuth dance, and the integrations service can fetch their tasks on demand.
Exit: a user can connect and disconnect Todoist; disconnect revokes at Todoist and wipes local credentials.
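
Written out as a TypeScript type, the revised Connector interface could look roughly like the sketch below. The method names come from the list in this stage; every type shape, the `disconnect` helper, and the `userId:provider` vault key are assumptions for illustration.

```typescript
// Sketch of the Connector contract. Method names follow the Stage 2 list;
// all type shapes here are assumptions.
type UserRef = { id: string };
type StoredCredential = {
  userId: string;
  provider: string;
  sealedToken: string; // ciphertext only; sealing happens in the vault
  meta: Record<string, string>; // e.g. a provider sync token
};
type NormalizedEvent = { type: string; occurredAt: string; payload: Record<string, unknown> };
type ConnectorAction = { kind: "complete"; itemId: string };

interface Connector {
  beginOAuth(user: UserRef): Promise<{ redirectUrl: string; state: string }>;
  finishOAuth(code: string, state: string): Promise<StoredCredential>;
  fetchSignals(user: UserRef, since?: Date): AsyncIterable<NormalizedEvent>;
  act?(user: UserRef, action: ConnectorAction): Promise<void>;
  revoke(user: UserRef): Promise<void>;
}

// Disconnect = revoke at the provider first, then wipe the local vault row.
async function disconnect(
  connector: Connector,
  vault: Map<string, StoredCredential>,
  provider: string,
  user: UserRef,
): Promise<void> {
  await connector.revoke(user);
  vault.delete(`${user.id}:${provider}`);
}
```

Keeping `revoke` required while `act` stays optional mirrors the list: a connector that cannot act is still useful, one that cannot revoke cannot be disconnected.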
## Stage 3 — Recommender contract (days 13–16)
1. `services/recommender` exposes `POST /recommend {user_id, context} → {tip}`.
2. Policy interface (`Policy.pick(user, candidates, context) → tip`).
3. **`RandomPolicy` v0** — fetches candidates from `integrations` (Todoist tasks), returns one uniformly at random.
4. Tip shape is provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
5. `apps/web` tip page: full black, one tip centered, tap = mark done → callback fires to integrations (complete Todoist task) + emits a feedback event.
1. `services/recommender` module exposes `POST /recommend` and `POST /feedback`.
2. **Policy registry** keyed by name. **Candidate sources** registered independently; v0 source = `integrations.todoist.tasks`.
3. **`RandomPolicy` v0** — draws uniformly.
4. **Tip shape** provider-agnostic: `{id, kind: "todo"|"advice", title, body, source, deep_link, meta}`.
5. **`TipInstance` persisted** with `context_snapshot` — the features-seen-at-decision-time blob that makes offline replay possible later.
6. `apps/web` tip page:
- `kind=todo` → tap = done (calls `integrations.todoist.act(complete)`).
- `kind=advice` → tap = acknowledge; long-press = save.
- Snooze / dismiss via long-press menu regardless of kind.
- Every reaction emits a feedback event even though it's in-process today.
Exit check: three-page prototype works end-to-end for one user.
Exit: three-page prototype works end-to-end.
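
A minimal sketch of the policy registry, the provider-agnostic tip shape, and `RandomPolicy` follows. The `Tip` fields are taken from this stage; `TipPolicy`, `makeRandomPolicy`, `policyRegistry`, and the injectable random source are hypothetical names, and persistence of `TipInstance` is left out.

```typescript
// Tip fields follow the Stage 3 shape; everything else is an assumption.
type Tip = {
  id: string;
  kind: "todo" | "advice";
  title: string;
  body: string;
  source: string;
  deep_link: string;
  meta: Record<string, unknown>;
};
type RecContext = Record<string, unknown>;

interface TipPolicy {
  pick(userId: string, candidates: Tip[], context: RecContext): Tip | null;
}

// rng is injectable so tests can be deterministic; defaults to Math.random.
function makeRandomPolicy(rng: () => number = Math.random): TipPolicy {
  return {
    pick(_userId, candidates) {
      if (candidates.length === 0) return null;
      // floor(rng * n) is uniform over 0..n-1; the clamp guards an rng that returns 1.
      const index = Math.min(candidates.length - 1, Math.floor(rng() * candidates.length));
      return candidates[index] ?? null;
    },
  };
}

// Policies are looked up by name so a later policy can slot in unchanged.
const policyRegistry = new Map<string, TipPolicy>();
policyRegistry.set("random", makeRandomPolicy());

// Body of a hypothetical `POST /recommend` handler: same contract for any policy.
function recommend(
  policyName: string,
  userId: string,
  candidates: Tip[],
  context: RecContext,
): { tip: Tip | null } {
  const policy = policyRegistry.get(policyName);
  if (policy === undefined) throw new Error(`unknown policy: ${policyName}`);
  return { tip: policy.pick(userId, candidates, context) };
}
```

Because callers only see `recommend` and the registry key, replacing the random draw with a scored policy changes one registration, not the contract.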
## Stage 4 — Hardening the prototype (days 17–20)
## Stage 4 — Hardening (days 17–20)
1. Error surfaces (Sentry), structured logs (pino / structlog), trace IDs across services.
2. Rate limits + retries on outbound API calls.
3. Integration tests: Playwright for the web flow, pact-style contract tests between services.
4. Deploy to a single VM via docker-compose + Caddy.
1. Observability: pino + structlog, Sentry per module, W3C traceparent across the monolith boundary and into `ml/serving`.
2. Rate limits, retries with jitter, and circuit breakers on outbound (Todoist, Google).
3. Integration tests: Playwright for the web flow (sign-in → connect → tip → delete). Contract tests between modules so the extractions later are safe.
4. **Metrics baseline wired** (`docs/architecture/metrics.md`): activation, first-tip reaction, dwell, snooze:dismiss ratio, D1 retention.
5. Deploy to a single VM via docker-compose + Caddy; Caddy auto-TLS; healthchecks wired to Caddy.
Exit check: Phase 0 milestone closed.
Exit: Phase 0 milestone closed; real users can be onboarded.
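
For the outbound-call hardening in item 2, one plausible shape is exponential backoff with full jitter plus a small circuit breaker. The plan does not fix the algorithm or the numbers; the base delay, cap, failure threshold, and cool-down below are placeholder assumptions.

```typescript
// Exponential backoff with "full jitter": a uniform draw between 0 and the
// capped exponential ceiling. All default values are placeholder assumptions.
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs: number = 200,
  capMs: number = 10000,
  rng: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(rng() * ceiling);
}

type Breaker = {
  allow(nowMs: number): boolean;
  recordSuccess(): void;
  recordFailure(nowMs: number): void;
};

// Opens after `failureThreshold` consecutive failures; after `coolDownMs`
// it lets a probe call through (half-open). A success closes it again.
function makeBreaker(failureThreshold: number = 3, coolDownMs: number = 30000): Breaker {
  let consecutiveFailures = 0;
  let openedAtMs: number | null = null;
  return {
    allow(nowMs) {
      if (openedAtMs === null) return true;
      return nowMs - openedAtMs >= coolDownMs;
    },
    recordSuccess() {
      consecutiveFailures = 0;
      openedAtMs = null;
    },
    recordFailure(nowMs) {
      consecutiveFailures += 1;
      if (consecutiveFailures >= failureThreshold) openedAtMs = nowMs;
    },
  };
}
```

A caller would keep one breaker per outbound provider (Todoist, Google), check `allow` before each call, and sleep for `backoffDelayMs(attempt)` between retries.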
---
## Seams prepared for later phases (do not implement yet, but do not foreclose)
## Seams prepared for later phases (designed now, implemented later)
- **Event bus.** From day one, `integrations` and `recommender` speak through an async fn that today is an in-process call but will be NATS tomorrow. Keep the signature `(event: NormalizedEvent) → void`.
- **Feature store.** The recommender accepts a `context` blob; later, a feature service fills it. Do not inline feature lookups inside the policy.
- **Policy registry.** `PolicyFactory.get(name)` so A/B and bandit policies slot in without code changes to the gateway.
- **Python boundary.** Recommender is TS today, but its scoring function is isolated — moving to FastAPI in Phase 1 is a file move, not a refactor.
- **Event bus abstraction.** `emit(event)` / `subscribe(topic, handler)` today is in-process; the production implementation in Phase 1 is NATS JetStream. Callsites never change.
- **Feature assembler.** Recommender accepts a `context` blob from a `FeatureAssembler`; in Phase 0 it returns a hard-coded minimum; in Phase 1 it calls the feature store.
- **Shadow-policy hook.** The recommender already supports running N policies in shadow per request; v0 runs zero shadows but the hook exists.
- **Extraction-ready modules.** Every `services/*/` has a `serve.ts` that can be mounted in the monolith or booted standalone. Dockerfile targets both.
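
The event-bus seam can be sketched as an interface with an in-process implementation behind it. The `emit` / `subscribe` names come from the bullet above; `BusEvent`, `BusHandler`, `makeInProcessBus`, and the unsubscribe return value are assumptions. A NATS JetStream implementation would satisfy the same interface and is not shown.

```typescript
// Assumed event and handler shapes for the in-process bus.
type BusEvent = { topic: string; payload: unknown };
type BusHandler = (event: BusEvent) => void | Promise<void>;

interface EventBus {
  emit(event: BusEvent): Promise<void>;
  // Returns an unsubscribe function.
  subscribe(topic: string, handler: BusHandler): () => void;
}

function makeInProcessBus(): EventBus {
  const handlersByTopic = new Map<string, Set<BusHandler>>();
  return {
    async emit(event) {
      const handlers = handlersByTopic.get(event.topic);
      if (handlers === undefined) return;
      // Start every handler, then wait for all of them to settle.
      await Promise.all(Array.from(handlers).map((handler) => handler(event)));
    },
    subscribe(topic, handler) {
      const handlers = handlersByTopic.get(topic) ?? new Set<BusHandler>();
      handlers.add(handler);
      handlersByTopic.set(topic, handlers);
      return () => {
        handlers.delete(handler);
      };
    },
  };
}
```

`emit` returns a promise even in-process so callsites already treat delivery as asynchronous, which is the property that has to survive the move to a real broker.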
---
## Staffing assumption
Work is parallelizable across ~3 streams: **infra/platform**, **backend services**, **web app**. Each Gitea issue notes which stream and which phase (milestone) it belongs to.
Three parallel streams: **platform** (infra, CI, shared-types), **backend** (auth, profile, integrations, recommender), **web** (sign-in, connect, tip, PWA). `ml` joins in Phase 1. Each Gitea issue carries its stream label and milestone.
78
README.md
@@ -69,48 +69,59 @@ docs/ architecture, adr, api
## Roadmap
### Phase 0 — Prototype *(M0)*
Goal: a single user can sign in, connect Todoist, and see one random Todoist task on a black page.
- [ ] Monorepo scaffold, CI skeleton, docker-compose dev env
- [ ] `auth` service with Google OAuth
- [ ] `integrations/todoist` OAuth2 flow + encrypted token vault
- [ ] `recommender` service with `RandomPolicy` (v0)
- [ ] `apps/web` — three pages (sign-in, connect, tip)
- [ ] Deploy to a single VM via docker-compose
### Phase 0 — Walking skeleton *(M0)*
Goal: a single user signs in with Google, connects Todoist, and sees one random Todoist task on a black page. Deletion works.
- [ ] Monorepo scaffold, CI skeleton, docker-compose dev env with `core`/`full` profiles
- [ ] `auth` on Auth.js with Google provider; OIDC-shaped boundary (ADR-0004)
- [ ] `integrations/todoist` OAuth2 flow + encrypted token vault + provider-side revocation
- [ ] `recommender` with `RandomPolicy`; stable `POST /recommend` contract
- [ ] `apps/web` — three pages (sign-in, connect, tip); PWA manifest; offline reaction queue
- [ ] ToS + Privacy Policy + consent capture on first sign-in
- [ ] Account-deletion endpoint: revokes providers, purges credentials, soft-deletes profile
- [ ] Metrics baseline: activation, first-tip reaction rate, dwell, retention (see `docs/architecture/metrics.md`)
- [ ] Deploy modular monolith + `ml/serving` stub to a single VM via docker-compose + Caddy
### Phase 1 — Real signal *(M1)*
Goal: the tip is picked, not drawn from a hat. Still Todoist-only.
- [ ] Event bus (NATS) + ingestion from Todoist sync API
- [ ] Feature store skeleton (Feast or homegrown) and the first five features (time-of-day, overdue count, task age, priority, project)
- [ ] `ml/serving` FastAPI scoring endpoint; `recommender` calls it
- [ ] `ContextualBanditPolicy` v1 (LinUCB) replacing `RandomPolicy`
- [ ] Tip feedback loop: user reactions (done / snooze / dismiss) become rewards
### Phase 1 — Real signal + in-the-moment delivery *(M1)*
Goal: tips are picked, not drawn from a hat — and they arrive at the right moment on the web.
- [ ] Event bus (NATS JetStream) with protobuf schemas (ADR-0005) + schema-registry CI gate
- [ ] Todoist event-driven sync (emit `signals.task.*`)
- [ ] Feature store skeleton + first five features (hour-of-day, overdue count, task age, priority, project)
- [ ] `ml/serving` FastAPI scorer; `RemotePolicy` wrapper in recommender
- [ ] **Global-then-personalize bandit**: pooled LinUCB over shared features, per-user residual when data allows
- [ ] Shadow-deploy infra: every new policy logs what it *would* have picked; promotion requires reward-parity
- [ ] Feedback loop: reactions → rewards; delayed rewards for tasks completed in Todoist directly
- [ ] **Web Push notifications** (VAPID) so the "magic" shows up without opening the app
- [ ] `notifier` (lite): web-push delivery, quiet-hours honoured, dedupe
- [ ] Apple OAuth added (deferred from M0)
### Phase 2 — Multi-source user profile *(M2)*
Goal: oO knows more than tasks.
- [ ] Integrations: Google Calendar, Apple Health (web import), generic webhook
### Phase 2 — Multi-source profile & trust *(M2)*
Goal: oO knows more than tasks, and users can see/control what we know.
- [ ] Integrations: Google Calendar, Apple Health (web import), generic webhook ingress
- [ ] Unified `Profile` model (identity, preferences, contexts, consents)
- [ ] Timing signals (location, idle, focus windows) via client-side probes
- [ ] Advice library (curated tips, not only todos) + mixing policy
- [ ] Timing signals (Page Visibility, Idle Detection, coarse location) — opt-in, transparent
- [ ] Advice library + mixing policy (todo vs advice vs ambient)
- [ ] User-facing data dashboard: what's stored, what's computed, export, delete-by-category
- [ ] Cost/usage observability
### Phase 3 — Mobile & notifications *(M3)*
### Phase 3 — Native mobile *(M3)*
- [ ] iOS app (SwiftUI) with APNs push
- [ ] Android app (Compose) with FCM push
- [ ] `notifier` service with quiet-hours + per-channel rate limits
- [ ] Rich notifications that deep-link to the tip page
- [ ] `notifier` gains APNs + FCM channels, per-device rate limits
- [ ] Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
- [ ] Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
### Phase 4 — MLOps at scale *(M4)*
- [ ] Airflow/Prefect orchestrator for batch retrains
- [ ] MLflow model registry + shadow deploys
- [ ] Online `experiments` framework: A/B + multi-armed bandits as first-class
- [ ] Cohort analysis + cross-user collaborative features (opt-in)
- [ ] Model cards, fairness checks, drift monitoring
- [ ] Prefect/Airflow for batch feature materialization + retraining
- [ ] MLflow registry; shadow → A/B → launch pipeline as first-class
- [ ] Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
- [ ] Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
- [ ] Drift monitoring (feature drift, prediction drift, reward drift); model cards per version
### Phase 5 — Production hardening *(M5)*
- [ ] SOC2-style controls, audit logging, token rotation
- [ ] k8s deploy + horizontal autoscaling
- [ ] Multi-region failover, PITR backups
- [ ] Public integration SDK so third parties can add sources
- [ ] Audit logging, rotation of provider tokens + internal signing keys
- [ ] **k3s** on existing VM, then k8s + HPA once multi-node justified (no cliff)
- [ ] Multi-region failover, Postgres PITR, event-bus mirroring
- [ ] Public integration SDK; sandbox tenancy for third-party connectors
- [ ] Billing + subscription tiers
---
@@ -123,4 +134,5 @@ Conventions and per-service guidance live in [`CLAUDE.md`](CLAUDE.md).
## License
TBD.
All rights reserved — 2026. Contact the owner for licensing inquiries.
(We'll switch to an OSS license for non-sensitive packages once the public SDK lands in Phase 5.)
31
docs/adr/0003-modular-monolith-phase0.md
Normal file
@@ -0,0 +1,31 @@
# ADR-0003: Modular monolith for Phase 0, extract when justified
## Status
Accepted — 2026-04-13
## Context
The initial architecture called for seven independently-deployable services on day one (gateway, auth, profile, integrations, recommender, events, notifier). For a team of ~3 streams with zero users, this is premature. Each service adds CI, deploy, DB, observability, and release-coordination overhead. It also slows the walking skeleton, which is the most important thing to ship.
Modularity — the thing we actually need — is a **code-boundary** property, not a **process-boundary** property. Well-bounded packages extract to services cheaply; poorly-bounded services rarely merge back.
## Decision
- **Phase 0:** one Node process bundles `services/*` as internal packages behind their HTTP contracts. `ml/serving` is a separate Python process (language boundary). Postgres + NATS complete the stack.
- **Directory layout** under `services/` is unchanged. Each module is a self-contained package with its own README, schema migrations, and public interface.
- **Communication** between modules goes through the same HTTP or event contracts it will use post-extraction. In Phase 0 these are resolved in-process via a thin dispatcher; swapping to HTTP/NATS is a transport change, not an API change.
- **Extraction criteria** (trigger a service split when any apply):
1. Language boundary (already true for `ml/serving`).
2. Scaling hotspot: the module's load curve diverges materially from the rest.
3. SLA divergence: the module needs stricter availability or latency than the monolith.
4. Team ownership: a dedicated team takes the module and wants independent releases.
5. Regulatory isolation: credentials/PII need tighter blast-radius control.
- **`events/` is special:** even inside the monolith we use an event-emitter abstraction whose production implementation is NATS JetStream. The async boundary matters for ML correctness; the process boundary doesn't.
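
The "thin dispatcher" in the Communication bullet might be no more than a route table behind a transport interface. This is a sketch under that assumption; `Transport`, `RouteHandler`, and `makeInProcessTransport` are hypothetical names, and the over-the-wire transport is deliberately left out.

```typescript
// Hypothetical dispatcher: contract keys look like HTTP routes, but calls
// resolve in-process. An HTTP-backed Transport would expose the same `call`.
type RouteHandler = (body: unknown) => Promise<unknown>;
type Transport = { call(route: string, body: unknown): Promise<unknown> };

function makeInProcessTransport(routes: Map<string, RouteHandler>): Transport {
  return {
    call(route, body) {
      const handler = routes.get(route);
      if (handler === undefined) {
        return Promise.reject(new Error(`no handler for ${route}`));
      }
      return handler(body);
    },
  };
}
```

Callers depend only on `Transport`, so extracting a module means swapping the transport it is reached through rather than touching the callsites.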
## Consequences
- Faster Phase 0: one CI pipeline, one deploy, one observability config.
- Cheap extraction: contracts are already HTTP/event-shaped.
- Discipline required: no cross-module DB access, no reaching into another module's internals, even though it's physically possible. Enforced by lint/import rules.
- Deploy story: docker-compose with two application containers (Node monolith + Python serving) until extraction begins. Compose profiles let devs bring up subsets.
## Non-consequences
- We are **not** monolith-forever. We fully expect `integrations/` and `recommender/` to extract once Phase 2+ traffic patterns justify it.
- Frontend / mobile unaffected.
23
docs/adr/0004-auth-authjs-with-oidc-boundary.md
Normal file
@@ -0,0 +1,23 @@
# ADR-0004: Auth.js for Phase 0, dedicated OIDC provider when mobile ships
## Status
Accepted — 2026-04-13
## Context
We need Google (and later Apple) sign-in, session management, and JWTs other services can verify. Options considered:
- **Auth.js (NextAuth):** a library embedded in the Next.js web app. Fastest to ship. Tight coupling to the web runtime; awkward when a native mobile client also needs tokens.
- **Ory Kratos + Hydra:** a standalone, self-hosted identity + OIDC provider. Much more powerful. Operationally heavy for a prototype.
- **Roll our own:** not considered.
Mobile apps are Phase 3+. Phase 0 needs the cheapest credible option that does not box us in.
## Decision
- **Phase 0:** use **Auth.js** inside the web app. Google provider only (Apple deferred — paid dev account + extra domain setup).
- **Boundary:** from day one, the `auth` module exposes an **OIDC-shaped** HTTP surface (`/me`, `/logout`, JWT verification via public JWKS, `/.well-known/openid-configuration` stub). Other services verify JWTs against that surface, not against Auth.js internals. This means the day we replace the engine, only one module changes.
- **JWT strategy:** short-lived (10 min) access JWT, rotating refresh token in an HttpOnly cookie. JWT contains `sub`, `email`, `scope`, `sid`.
- **Trigger to migrate to Ory (or equivalent):** any of — (a) native mobile shipping, (b) a second client type that can't piggyback on Next.js sessions, (c) multi-tenant requirement.
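
The JWT strategy above implies a small claims shape and two post-signature checks. In this sketch `sub`, `email`, `scope`, and `sid` come from the decision; `iat` and `exp` are the standard registered JWT claims (seconds since epoch) added as an assumption, and the session-revocation set is hypothetical. Signature verification against the JWKS is out of scope.

```typescript
// Claims named in the ADR plus the standard iat/exp pair (assumed).
type AccessClaims = {
  sub: string;
  email: string;
  scope: string;
  sid: string;
  iat: number; // issued-at, seconds since epoch
  exp: number; // expiry, seconds since epoch
};

const ACCESS_TTL_SECONDS = 10 * 60; // "short-lived (10 min)"

function buildAccessClaims(
  user: { id: string; email: string },
  sid: string,
  scope: string,
  nowSeconds: number,
): AccessClaims {
  return {
    sub: user.id,
    email: user.email,
    scope,
    sid,
    iat: nowSeconds,
    exp: nowSeconds + ACCESS_TTL_SECONDS,
  };
}

// Checks a module would still make after the signature is known to be good:
// the token has not expired and its session has not been revoked.
function isAcceptable(
  claims: AccessClaims,
  nowSeconds: number,
  revokedSessionIds: Set<string>,
): boolean {
  return nowSeconds < claims.exp && !revokedSessionIds.has(claims.sid);
}
```

Carrying `sid` in the token is what lets logout and account deletion take effect before the 10-minute expiry, provided verifiers consult the revocation state.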
## Consequences
- Ships in days, not weeks.
- The OIDC-shaped boundary means the migration is scoped, not scary.
- Slight duplication early: we maintain OIDC-surface code that Auth.js mostly handles internally. Worth it.
28
docs/adr/0005-event-schemas-protobuf.md
Normal file
@@ -0,0 +1,28 @@
# ADR-0005: Protocol Buffers for event schemas, OpenAPI for HTTP
## Status
Accepted — 2026-04-13
## Context
Two contract surfaces exist:
1. **HTTP** — synchronous, client ↔ server, human-readable debugging matters. OpenAPI is the default and generates decent TS clients.
2. **Events** — durable, fan-out to ML consumers, schema evolution critical. Feature pipelines trained on old schemas will silently misbehave when producers change a field.
Using OpenAPI for both means:
- Python pydantic generation is awkward and hand-maintained in practice.
- No wire-format discipline (JSON is loose).
|
||||
- No central schema registry, so schema drift is undetected until a model regresses.
|
||||
|
||||
## Decision
|
||||
- **HTTP** contracts: OpenAPI 3.1 in `packages/shared-types/http/`. Generate TS clients; hand-write Python pydantic models for ML consumers (few, and they're shallow).
|
||||
- **Event** contracts: Protocol Buffers in `packages/shared-types/events/`. Generate TS and Python. All events carry an envelope: `{event_id, occurred_at, schema_version, producer, payload}`.
|
||||
- **Schema registry:** lightweight self-hosted (buf.build Schema Registry OSS or a tiny registry in `events/`). CI check blocks breaking changes without a version bump.
|
||||
- **Evolution rules:** additive only within a major version; `reserved` for removed fields; new `schema_version` for breaking changes; consumers advertise the versions they accept.
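
The envelope and the "consumers advertise the versions they accept" rule can be sketched as below. The envelope field names are taken from the decision above; the field types (an integer `schema_version`, an ISO-8601 `occurred_at`), the `Consumer` shape, and the `route` helper are assumptions for illustration, not the generated protobuf types.

```ts
// Envelope fields per the ADR; the types are assumed.
interface EventEnvelope {
  event_id: string;
  occurred_at: string;    // assumed ISO-8601 timestamp
  schema_version: number; // assumed integer major version
  producer: string;
  payload: Uint8Array;    // protobuf-encoded bytes
}

// Hypothetical consumer registration: each consumer lists accepted versions.
interface Consumer {
  name: string;
  accepts: ReadonlySet<number>;
}

function canConsume(consumer: Consumer, envelope: EventEnvelope): boolean {
  return consumer.accepts.has(envelope.schema_version);
}

// Splits consumers into those that receive the event and those that must be
// skipped (and surfaced), so a version mismatch is visible rather than silent.
function route(
  envelope: EventEnvelope,
  consumers: Consumer[],
): { deliver: string[]; skip: string[] } {
  const deliver: string[] = [];
  const skip: string[] = [];
  for (const c of consumers) {
    (canConsume(c, envelope) ? deliver : skip).push(c.name);
  }
  return { deliver, skip };
}
```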
|
||||
|
||||
## Consequences
|
||||
- One extra build step in `shared-types` (buf or protoc).
|
||||
- Breaking event changes cost something — good; they should.
|
||||
- ML pipelines can replay old events against new code with confidence.
|
||||
|
||||
## Non-consequences
|
||||
- No gRPC. HTTP stays HTTP/JSON. Protobuf is only the wire format on the event bus.
|
||||
87
docs/architecture/data-model.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# Data model
|
||||
|
||||
Durable entities across modules. Per-module databases/schemas own these; cross-module access is only via the module's API.
|
||||
|
||||
## Core entities
|
||||
|
||||
```
User                      auth + profile
  id (uuid)
  created_at
  email (from IdP)
  preferred_name?
  deleted_at?             soft-delete for 30-day recovery; hard-delete after

IdentityLink              auth
  user_id
  provider                "google" | "apple"
  provider_sub            subject from IdP
  created_at

Session                   auth
  user_id
  sid (uuid)              in JWT
  issued_at
  expires_at
  revoked_at?

Profile                   profile
  user_id (pk)
  timezone
  quiet_hours             jsonb: [{start,end,days}]
  contexts                jsonb: [{name,predicate}] introduced in Phase 2
  consents                jsonb: {integration: {read,write,retain_days}}

Credential                integrations
  user_id
  provider                "todoist" | "google_calendar" | ...
  ciphertext              sealed-box over {access, refresh, scopes, expires_at}
  meta                    provider-specific (sync_token cursor for Todoist)
  created_at
  last_refreshed_at
  revoked_at?

Event                     events
  event_id (ulid)
  user_id
  schema_version
  kind                    e.g. "signals.task.updated"
  occurred_at
  ingested_at
  payload                 protobuf bytes

TipInstance               recommender
  tip_id (ulid)
  user_id
  policy_name             "random" | "bandit.linucb" | "remote:v3"
  policy_version
  candidate_source        "todoist" | "advice.library" | ...
  context_snapshot        jsonb: features seen at decision time
  tip                     jsonb: {kind,title,body,source,deep_link,meta}
  created_at
  shown_at?               set when the client reports render
  reaction?               "done" | "snooze" | "dismiss" | null
  reacted_at?
  delivery_id?            fk if surfaced via notifier push

Delivery                  notifier
  delivery_id
  user_id
  tip_id
  channel                 "webpush" | "apns" | "fcm" | "email"
  dispatched_at
  delivered_at?
  failure_reason?
```
|
||||
|
||||
## Foreign-key discipline
|
||||
|
||||
There are no cross-module FKs. Each module owns its tables. References by id are soft; consistency is maintained by events (user-deleted → every module cascades its own cleanup).
|
||||
|
||||
## Deletion
|
||||
|
||||
`User.deleted_at` set → a `user.deletion_requested` event goes out → each module soft-deletes its rows → after 30 days a scheduled job hard-deletes. Credentials are **revoked at the provider** (not just erased locally) on soft-delete. See `privacy.md`.
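
A small sketch of the decision the scheduled hard-delete job has to make, assuming the 30-day window is measured from `User.deleted_at`. The `DeletionState` names and the `deletionState` function are hypothetical.

```ts
type DeletionState = "active" | "soft_deleted" | "hard_delete_due";

const RECOVERY_WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // 30-day recovery window

// Classifies a user row from its deleted_at timestamp.
function deletionState(deletedAt: Date | null, now: Date): DeletionState {
  if (deletedAt === null) return "active";
  const elapsedMs = now.getTime() - deletedAt.getTime();
  return elapsedMs >= RECOVERY_WINDOW_MS ? "hard_delete_due" : "soft_deleted";
}
```

Each module can apply the same classification to its own rows when it handles `user.deletion_requested`, which keeps the cascade free of cross-module foreign keys.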
|
||||
|
||||
## Replay and reproducibility
|
||||
|
||||
`TipInstance.context_snapshot` captures the exact features that produced the decision. This is what lets offline replay re-score historical tips against a new policy without touching the feature store.
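
A sketch of what such a replay loop might look like, under simplifying assumptions: the snapshot is treated as a flat map of numeric features, the new policy is reduced to a hypothetical `Scorer` function, and only the tip that was actually shown is re-scored (the data model above does not store the full candidate set).

```ts
type Reaction = "done" | "snooze" | "dismiss" | null;

// Subset of TipInstance needed for replay; the feature map type is assumed.
interface LoggedTip {
  tip_id: string;
  kind: string;
  context_snapshot: Readonly<Record<string, number>>;
  reaction: Reaction;
}

// Hypothetical stand-in for a new policy's scoring function.
type Scorer = (kind: string, features: Readonly<Record<string, number>>) => number;

interface ReplayRow {
  tip_id: string;
  score: number;
  reaction: Reaction;
}

// Re-scores logged tips using only the stored snapshot; no feature-store
// call is made, so the result depends only on what was seen at decision time.
function replay(log: LoggedTip[], scorer: Scorer): ReplayRow[] {
  return log.map((t) => ({
    tip_id: t.tip_id,
    score: scorer(t.kind, t.context_snapshot),
    reaction: t.reaction,
  }));
}
```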
|
||||
43
docs/architecture/metrics.md
Normal file
@@ -0,0 +1,43 @@
|
||||
# Metrics: measuring "magic"
|
||||
|
||||
We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against.
|
||||
|
||||
## North star
|
||||
|
||||
**Week-2 tip-reaction rate** — of users who saw a tip in week 1, what fraction reacted to *any* tip in week 2? Captures "did this become part of your life."
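
One way to compute this metric, as a sketch. It assumes weeks are anchored on a cohort start date (week 1 is the first seven days, week 2 the next seven) and that "saw" and "reacted" map to `TipInstance.shown_at` and `TipInstance.reacted_at`; the doc does not pin down either choice, and the names `TipRow` and `week2ReactionRate` are hypothetical.

```ts
interface TipRow {
  user_id: string;
  shown_at: Date | null;
  reacted_at: Date | null;
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// True when t falls in the half-open window [start, end).
function inWindow(t: Date | null, start: number, end: number): boolean {
  return t !== null && t.getTime() >= start && t.getTime() < end;
}

// Fraction of users shown a tip in week 1 who reacted to any tip in week 2.
// Returns null when nobody saw a tip in week 1.
function week2ReactionRate(rows: TipRow[], cohortStart: Date): number | null {
  const w1 = cohortStart.getTime();
  const w2 = w1 + WEEK_MS;
  const w3 = w2 + WEEK_MS;
  const sawInWeek1 = new Set<string>();
  const reactedInWeek2 = new Set<string>();
  for (const r of rows) {
    if (inWindow(r.shown_at, w1, w2)) sawInWeek1.add(r.user_id);
    if (inWindow(r.reacted_at, w2, w3)) reactedInWeek2.add(r.user_id);
  }
  if (sawInWeek1.size === 0) return null;
  let hits = 0;
  sawInWeek1.forEach((u) => {
    if (reactedInWeek2.has(u)) hits += 1;
  });
  return hits / sawInWeek1.size;
}
```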
|
||||
|
||||
## Activation (single-session)
|
||||
|
||||
- **Time-to-first-tip** — sign-in → tip rendered. Target: ≤ 60 s on the happy path.
|
||||
- **First-tip reaction rate** — fraction of users who interact (done/snooze/dismiss/save) with their very first tip. Target: > 50%.
|
||||
|
||||
## Engagement
|
||||
|
||||
- **Dwell-before-action** — seconds between tip render and first reaction. Too short = glance-away; too long = confused.
|
||||
- **Done rate** = Done / (Done + Snooze + Dismiss) — the quality proxy. Rising = tips feel on-target.
|
||||
- **Snooze:Dismiss ratio** — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes.
|
||||
- **Return cadence** — median inter-session gap. Stable-and-short > spiky.
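
The two ratio metrics above are simple enough to sketch directly. The function name `engagement` and the choice to return `null` when a denominator is zero are assumptions.

```ts
type TipReaction = "done" | "snooze" | "dismiss";

interface EngagementStats {
  doneRate: number | null;        // Done / (Done + Snooze + Dismiss)
  snoozeToDismiss: number | null; // Snooze / Dismiss
}

function engagement(reactions: TipReaction[]): EngagementStats {
  let done = 0;
  let snooze = 0;
  let dismiss = 0;
  for (const r of reactions) {
    if (r === "done") done += 1;
    else if (r === "snooze") snooze += 1;
    else dismiss += 1;
  }
  const total = done + snooze + dismiss;
  return {
    doneRate: total === 0 ? null : done / total,
    snoozeToDismiss: dismiss === 0 ? null : snooze / dismiss,
  };
}
```

Reporting the two numbers separately matters because they point at different fixes: a falling done rate with a high snooze share suggests a timing problem, while a high dismiss share suggests a relevance problem.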
|
||||
|
||||
## Retention
|
||||
|
||||
- D1, D7, D28 retention. Cohort-sliced by connected integrations.
|
||||
- Churn signal: 7 days without a session.
|
||||
|
||||
## ML health (from M1)
|
||||
|
||||
- Policy latency p50/p95/p99 at the recommender boundary.
|
||||
- Feature null-rate per feature, per user.
|
||||
- Online/offline reward disagreement for shadowed policies.
|
||||
- Bandit regret proxy: observed reward vs an oracle's best-possible on the same candidates.
|
||||
|
||||
## Privacy & trust
|
||||
|
||||
- Account-deletion completion time (target: < 24 h).
|
||||
- Provider-revocation success rate on disconnect.
|
||||
- Number of active credentials per user (low = healthy).
|
||||
|
||||
## How metrics become decisions
|
||||
|
||||
- **Per-change.** Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge).
|
||||
- **Shadow > A/B > launch.** Policy changes ship in shadow first (log what it *would* have recommended); then A/B on live traffic; then launch once online reward estimate ≥ incumbent by a CI margin.
|
||||
- **Dashboards before features.** If we cannot measure a feature's impact on the north-star metric, we defer the feature.
|
||||
@@ -3,22 +3,25 @@
|
||||
## Guiding constraints
|
||||
|
||||
- The **recommendation decision** is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
|
||||
- Services are small and independently deployable, but we do **not** multiply services for its own sake. Split by team-of-ownership and by data lifecycle.
|
||||
- Python for ML, TypeScript for applications, shared contracts regenerated from a single source of truth.
|
||||
- Modularity lives in **code boundaries**. Deploy topology follows pressure, not anticipation (ADR-0003).
|
||||
- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
|
||||
- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see `privacy.md`).
|
||||
|
||||
## Services
|
||||
## Modules
|
||||
|
||||
| Service | Language | Responsibility | Owns data |
|
||||
|---|---|---|---|
|
||||
| `gateway` | TS (Node) | BFF for web/mobile; auth-checking; request fan-out | — |
|
||||
| `auth` | TS | OAuth (Google, Apple), sessions, token issuance | identities, sessions |
|
||||
| `profile` | TS | user profile, preferences, consents | profiles |
|
||||
| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors |
|
||||
| `events` | TS | event-bus ingress, normalization, durable log | signal store |
|
||||
| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history |
|
||||
| `ml/serving` | Python | online scoring for policies/models | — (stateless) |
|
||||
| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models |
|
||||
| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log |
|
||||
| Module | Language | Responsibility | Owns data | Phase-0 process |
|
||||
|---|---|---|---|---|
|
||||
| `gateway` | TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith |
|
||||
| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
|
||||
| `profile` | TS | user profile, preferences, consents | profiles | Node monolith |
|
||||
| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
|
||||
| `events` | TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) |
|
||||
| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
|
||||
| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
|
||||
| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** |
|
||||
| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | separate (from M4) |
|
||||
|
||||
Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). `ml/serving` is pre-extracted on language grounds.
|
||||
|
||||
## Data boundaries
|
||||
|
||||
@@ -36,9 +39,28 @@ User reactions (done / snooze / dismiss) are events too. They close the loop as
|
||||
|
||||
## Why these choices
|
||||
|
||||
- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4.
|
||||
- **Postgres** everywhere for OLTP. Per-service schemas, not per-service instances in dev.
|
||||
- **Modular monolith + Python ML** in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
|
||||
- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
|
||||
- **Postgres** for OLTP; per-module schemas in dev; separate databases once modules extract.
|
||||
- **FastAPI + Pydantic** for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
|
||||
- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
|
||||
- **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
|
||||
- **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
|
||||
- **MLflow** for model registry; artifacts in MinIO/S3.
|
||||
- **Auth.js or Ory** for identity — we will not write crypto.
|
||||
- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
|
||||
- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.
|
||||
|
||||
## Decision flow for a new tip
|
||||
|
||||
```
client ─► gateway ─► recommender
                        │
                        ├─► candidates: integrations.fetchCandidates(user) + advice.library
                        ├─► context:    FeatureAssembler(user, request)
                        ├─► policy:     PolicyRegistry.get(policyName).pick(candidates, context)
                        ├─► shadows:    run shadow policies in parallel, log their picks
                        └─► persist:    TipInstance{context_snapshot, policy, tip}
       ◄─ tip
```
|
||||
|
||||
Feedback travels back the same path: `POST /feedback → events.emit(feedback.reaction)` → pipelines consume → bandit/model updated on next retrain.
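
The decision flow can be sketched in code roughly as follows. `PolicyRegistry` and `pick(candidates, context)` follow the names in the diagram; everything else (the `Tip` and `Context` shapes, the `recommend` function, returning the shadow log alongside the instance) is assumed. For brevity the sketch runs shadow policies sequentially and synchronously, whereas the diagram calls for running them in parallel.

```ts
interface Tip { kind: string; title: string; }
type Context = Readonly<Record<string, number>>;

interface Policy {
  name: string;
  pick(candidates: Tip[], context: Context): Tip;
}

class PolicyRegistry {
  private policies = new Map<string, Policy>();

  register(policy: Policy): void {
    this.policies.set(policy.name, policy);
  }

  get(name: string): Policy {
    const policy = this.policies.get(name);
    if (!policy) throw new Error(`unknown policy: ${name}`);
    return policy;
  }
}

interface Decision {
  instance: { policy_name: string; context_snapshot: Context; tip: Tip };
  shadowLog: { policy_name: string; tip: Tip }[];
}

function recommend(
  registry: PolicyRegistry,
  policyName: string,
  shadowNames: string[],
  candidates: Tip[],
  context: Context,
): Decision {
  if (candidates.length === 0) throw new Error("no candidates");
  // Copy the features so the persisted snapshot is exactly what the policy saw.
  const snapshot: Context = { ...context };
  const tip = registry.get(policyName).pick(candidates, snapshot);
  // Shadow picks are logged only; they never reach the client.
  const shadowLog = shadowNames.map((name) => ({
    policy_name: name,
    tip: registry.get(name).pick(candidates, snapshot),
  }));
  return { instance: { policy_name: policyName, context_snapshot: snapshot, tip }, shadowLog };
}
```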
|
||||
|
||||
40
docs/architecture/privacy.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# Privacy architecture
|
||||
|
||||
Privacy is a Phase 0 feature, not a Phase 5 compliance project. This doc is the minimum.
|
||||
|
||||
## Principles
|
||||
|
||||
1. **Data minimization.** Store only what we need for the tip. Raw task titles stay at Todoist; we store references + computed features. If a feature doesn't lift a metric, its input data doesn't get stored.
|
||||
2. **User-visible controls.** Every connection shows exactly which scopes we hold and what we've computed. One tap disconnects and revokes.
|
||||
3. **Deletion is real.** Deleting an account revokes provider tokens, purges credentials immediately, and soft-deletes user data for a 30-day recovery window, then hard-deletes.
|
||||
4. **No surprise sharing.** Cross-user / collaborative features are opt-in, per category, per integration.
|
||||
5. **Encryption in transit and at rest.** TLS everywhere; column-level encryption for credentials; disk-level for backups.
|
||||
|
||||
## Flows
|
||||
|
||||
### Connect
|
||||
User taps "Connect Todoist" → consent screen lists: scopes requested, what we store, what we compute, retention, revocation instructions → OAuth → stored credential is immediately testable and shows in `/connect`.
|
||||
|
||||
### Disconnect
|
||||
User taps disconnect → `Credential.revoked_at` set → provider-side revocation attempted (Todoist: token revocation endpoint) → credential erased on success → `credential.revoked` event → downstream modules drop associated cursors, caches, derived features for that `(user, provider)` pair.
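
A sketch of the disconnect sequence as a pure state transition. The provider call is injected as a synchronous callback to keep the example self-contained (a real connector's `revoke` is asynchronous). The behaviour on a failed revocation is not specified above; this sketch assumes the ciphertext is kept so the revocation can be retried and that no `credential.revoked` event is emitted until it succeeds.

```ts
interface StoredCredential {
  user_id: string;
  provider: string;
  ciphertext: Uint8Array | null; // null once erased
  revoked_at: Date | null;
}

interface DisconnectResult {
  credential: StoredCredential;
  events: string[]; // event kinds to emit
}

function disconnect(
  cred: StoredCredential,
  revokeAtProvider: (c: StoredCredential) => boolean, // hypothetical provider call
  now: Date,
): DisconnectResult {
  // Step 1: mark revoked locally so nothing uses the token from here on.
  const marked: StoredCredential = { ...cred, revoked_at: now };
  // Step 2: attempt provider-side revocation.
  if (!revokeAtProvider(marked)) {
    // Assumption: keep the ciphertext for a retry; emit nothing yet.
    return { credential: marked, events: [] };
  }
  // Step 3: erase the secret and announce, so downstream modules can drop
  // cursors, caches, and derived features for this (user, provider) pair.
  return { credential: { ...marked, ciphertext: null }, events: ["credential.revoked"] };
}
```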
|
||||
|
||||
### Delete account
|
||||
User taps "Delete account" in settings → hard confirm → `User.deleted_at` set, all sessions revoked, `user.deletion_requested` event fanned out → every module processes its portion (credentials revoked + purged; profile scrubbed; tip history anonymized to aggregate stats only or purged, per retention policy; events purged on schedule) → within 24 hours account is non-recoverable operationally; within 30 days all rows are hard-deleted.
|
||||
|
||||
### Export (Phase 2)
|
||||
`GET /me/export` returns a JSON bundle of everything we hold for the user: profile, consents, credentials-metadata (not secrets), events, tip history.
|
||||
|
||||
## Scope boundaries
|
||||
|
||||
Each integration declares the scopes it requests and the features it derives. The `Profile.consents` column is the source of truth; a scope removed from consent short-circuits derived-feature computation at the feature store.
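
A sketch of that short-circuit, assuming each derived feature declares the single integration it depends on and that the `read` flag in `Profile.consents` is what gates computation. `FeatureDef` and `computeConsented` are hypothetical names; the consent shape follows the data model (`{integration: {read,write,retain_days}}`).

```ts
interface Consent { read: boolean; write: boolean; retain_days: number; }
type Consents = Record<string, Consent | undefined>;

// Hypothetical feature declaration: which integration's data it needs.
interface FeatureDef {
  name: string;
  integration: string;
  compute: () => number;
}

// Computes only features whose integration has read consent; anything else
// is skipped before compute() runs, so unconsented data is never touched.
function computeConsented(defs: FeatureDef[], consents: Consents): Record<string, number> {
  const out: Record<string, number> = {};
  for (const def of defs) {
    const consent = consents[def.integration];
    if (!consent || !consent.read) continue;
    out[def.name] = def.compute();
  }
  return out;
}
```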
|
||||
|
||||
## Audit
|
||||
|
||||
- Privileged actions (admin-initiated deletions, credential decryption outside the normal refresh path) go to an append-only audit log from Phase 0.
|
||||
- Per-user access log available via `GET /me/access-log` (Phase 2).
|
||||
|
||||
## Legal surface (Phase 0 minimum)
|
||||
|
||||
- Terms of Service + Privacy Policy documents shipped alongside the sign-in page.
|
||||
- Consent capture on first sign-in, with a versioned ToS/PP hash stored per user.
|
||||
- Data-subject request inbox (email) wired up before onboarding the first external user.
|
||||
@@ -1,13 +1,15 @@
|
||||
# services/
|
||||
|
||||
Backend microservices. Each directory is independently deployable, ships a `Dockerfile`, a `/health` endpoint, and its own `README.md` describing its contract.
|
||||
Backend modules. Each owns a contract and ships its own `README.md`. In **Phase 0** these are internal packages inside a single Node process (ADR-0003); they extract to their own processes as pressure justifies.
|
||||
|
||||
| Dir | Role | Phase introduced |
|
||||
|---|---|---|
|
||||
| `gateway/` | BFF for clients; auth check; fan-out to services | 0 |
|
||||
| `auth/` | OAuth (Google/Apple), sessions, JWT | 0 |
|
||||
| `profile/` | user profile, preferences, consents | 0 |
|
||||
| `integrations/` | third-party connectors + encrypted token vault (Todoist first) | 0 |
|
||||
| `recommender/` | `POST /recommend` — policy-driven tip selection | 0 |
|
||||
| `events/` | event bus ingress + durable signal store | 1 |
|
||||
| `notifier/` | push/email/web delivery with quiet-hours | 3 |
|
||||
| Dir | Role | Phase-0 shape | Extracts when |
|
||||
|---|---|---|---|
|
||||
| `gateway/` | BFF for clients; auth check; fan-out | in-proc router | never (stays as the edge) |
|
||||
| `auth/` | Google OAuth (Apple in M1), sessions, JWT | Auth.js behind OIDC shape | mobile native ships (M3) |
|
||||
| `profile/` | user profile, preferences, consents | in-proc module | team ownership diverges |
|
||||
| `integrations/` | connectors + encrypted token vault | in-proc module | credential blast-radius isolation |
|
||||
| `recommender/` | `POST /recommend` — policy-driven tip selection | in-proc; calls `ml/serving` from M1 | scaling hotspot |
|
||||
| `events/` | event bus + signal log | in-proc emitter (Phase 0); NATS (M1) | always a library + broker, not a service |
|
||||
| `notifier/` | push/email delivery + quiet hours | in-proc; **web push in M1** | SLA divergence or mobile push scale |
|
||||
|
||||
Contracts that cross module lines (HTTP or events) come from `packages/shared-types/`. Imports that reach into another module's internals are forbidden by import lint.
|
||||
|
||||
@@ -7,11 +7,14 @@ Third-party connectors and the token vault.
|
||||
```ts
|
||||
interface Connector {
|
||||
id: string // e.g. "todoist"
|
||||
scopes: string[] // human-readable list shown in consent UI
|
||||
beginOAuth(user): Promise<{ redirectUrl, state }>
|
||||
finishOAuth(code, state): Promise<StoredCredential>
|
||||
fetchSignals(user, since?): AsyncIterable<NormalizedEvent>
|
||||
// optional write-back, e.g. mark task done
|
||||
act?(user, action): Promise<void>
|
||||
// incremental-sync cursor (Todoist sync_token, webhook timestamps, etc.)
|
||||
// stored in Credential.meta; the connector owns its shape.
|
||||
act?(user, action): Promise<void> // optional write-back (complete task, etc.)
|
||||
revoke(user): Promise<void> // REQUIRED: provider-side token revocation on disconnect
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
@@ -16,12 +16,14 @@ POST /feedback
|
||||
## Internals (stable seams)
|
||||
|
||||
- **Candidate sources** — pluggable async generators. v0: Todoist tasks via `integrations`. Later: advice library, calendar nudges, health prompts.
|
||||
- **Context assembler** — merges request context with features (inline now, feature-store later).
|
||||
- **Policy** — `Policy.pick(candidates, context) → tip`. Registered by name:
|
||||
- **Feature assembler** — fills the `context` blob (inline in Phase 0; calls feature store from M1). Never inlined into policy code.
|
||||
- **Policy registry** — `Policy.pick(candidates, context) → tip`. Named entries:
|
||||
- `random` — v0 (Phase 0).
|
||||
- `bandit.linucb` — v1 (Phase 1).
|
||||
- `bandit.linucb.pooled` — v1 (Phase 1). **Global-then-personalize**: pooled features shared across users; per-user residual once data allows.
|
||||
- `remote` — delegates to `ml/serving` FastAPI scorer (Phase 1+).
|
||||
- **Shadow hook** — every request optionally runs N shadow policies in parallel and logs their picks + estimated rewards. Promotion from shadow → A/B → launch is a separate, deliberate step (ADR-0002).
|
||||
- **TipInstance persistence** — every decision writes `context_snapshot` (features seen at decision time). This is what makes offline replay honest.
|
||||
|
||||
## Phase 0 goal
|
||||
|
||||
`RandomPolicy` only. The service, contract, and seams exist; the brain does not yet.
|
||||
`RandomPolicy` only. The service, contract, registry, shadow hook, and tip-instance persistence all exist; no ML yet.
|
||||
|
||||