refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0
- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
67
CLAUDE.md
@@ -8,66 +8,73 @@ The magic is the product. Precision + timing + minimalism. The UI shows a single

 ## Prime directives

-1. **Modular, service-oriented from day one.** Even the prototype. We will scale to mobile (iOS/Android), many integrations, multi-tenant ML. Shortcuts that bake in a monolith are not acceptable.
-2. **Recommendation engine is the core.** Every other service feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
-3. **Python owns ML.** Everything training, features, serving for models is Python (FastAPI + PyTorch/scikit + MLflow/feast). Application services are TypeScript (Node, Next.js) unless there's a reason.
+1. **Modular by package, deployable by stage.** Contracts live at package boundaries from day one so extraction to a service is cheap. Deploy topology evolves with real pressure (team size, scaling hotspots, language boundaries), not with wishful architecture. Phase 0 = **modular monolith + Python ML sidecar**. See ADR-0003.
+2. **Recommendation engine is the core.** Every other module feeds it or renders its output. Design schemas, event contracts, and APIs with that in mind.
+3. **Python owns ML.** Training, features, online scoring are Python (FastAPI + PyTorch/scikit + MLflow/Feast). Application code is TypeScript (Node, Next.js) unless there's a reason.
 4. **OAuth-first for identity and integrations.** Never ask users for passwords or raw API keys when a delegated-auth flow exists. Store provider tokens encrypted, refresh transparently.
-5. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished.
+5. **Privacy is a feature, not a phase.** Consent capture, token revocation, and account deletion exist from the first real user. Data minimization: store the token + derivatives we need, not the raw feed.
+6. **Feel-of-magic over feature count.** When in doubt, ship fewer things, polished. The tip page is a watch face.

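
The commit message lists an import-boundary lint alongside directive 1. The sketch below expresses such a rule as a pure function that a lint step could call once per import. It assumes a hypothetical policy that is not taken from this repo's lint config: a file may import from its own `apps/*`, `services/*`, or `packages/*` module and from anything under `packages/`, but never from another module's files. `checkImport`, `resolveSpecifier`, and `ownerOf` are illustrative names. The sketch also assumes that non-relative specifiers are either already repo-relative (for example via a path alias) or npm packages.

```typescript
// Hypothetical boundary rule (illustrative, not the repo's lint config):
// a file may import from its own module and from packages/*, nothing else in the repo.
type Verdict = { ok: true } | { ok: false; reason: string };

// Collapse "." and ".." segments without relying on Node's path module.
function normalize(path: string): string {
  const out: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return out.join("/");
}

// Relative specifiers resolve against the importer's directory; anything else is
// assumed to be repo-relative already (e.g. a path alias) or an npm package.
function resolveSpecifier(importer: string, spec: string): string {
  if (!spec.startsWith(".")) return normalize(spec);
  const dir = normalize(importer).split("/").slice(0, -1).join("/");
  return normalize(`${dir}/${spec}`);
}

// Owning module of a repo-relative path, e.g. "services/auth"; null outside apps/services/packages.
function ownerOf(path: string): string | null {
  const [top, name] = normalize(path).split("/");
  if ((top === "apps" || top === "services" || top === "packages") && name) return `${top}/${name}`;
  return null;
}

function checkImport(importer: string, spec: string): Verdict {
  const from = ownerOf(importer);
  const to = ownerOf(resolveSpecifier(importer, spec));
  if (from === null || to === null || from === to) return { ok: true };
  if (to.startsWith("packages/")) return { ok: true }; // shared libraries are importable everywhere
  return { ok: false, reason: `${from} imports ${to} internals; depend on its contract in packages/shared-types instead` };
}
```

Bare specifiers such as `react` resolve to no owner and are allowed. This keeps the rule focused on the repo's own modules.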
 ## Architecture (high level)

+The tree below is **logical module structure**. Directory layout is stable; how many processes you deploy is a stage decision (ADR-0003).
+
 ```
 apps/               user-facing clients
   web/              Next.js PWA — the first shipped client
   mobile-ios/       Swift/SwiftUI (Phase 3)
   mobile-android/   Kotlin/Compose (Phase 3)

-services/           backend microservices (each independently deployable)
-  gateway/          API gateway + BFF (GraphQL or tRPC)
+services/           backend modules — each owns a contract; may share a deployable
+  gateway/          BFF for clients; auth check; fan-out
   auth/             OAuth (Google, Apple, ...), sessions, JWT issuance
   profile/          user profile, preferences, consents
-  integrations/     third-party connectors (Todoist first); token vault
-  recommender/      Python; serves the "one best tip" decision
-  events/           event bus ingress (Kafka/NATS) + signal store
-  notifier/         push/email/web delivery of tips
+  integrations/     third-party connectors + token vault (Todoist first)
+  recommender/      orchestration: candidates → policy → tip; feedback sink
+  events/           event bus ingress + durable signal store
+  notifier/         push/email/web delivery (web push from Phase 1)

-packages/           shared libraries
-  shared-types/     OpenAPI/proto-generated types
+packages/           shared libraries (importable across services + apps)
+  shared-types/     HTTP types via OpenAPI; event types via protobuf (ADR-0005)
   sdk-js/           client SDK used by web + mobile webviews
   ui/               shared React components + design tokens

-ml/                 Python MLOps
-  pipelines/        training / batch feature pipelines (Airflow/Prefect)
-  features/         feature definitions (Feast-style)
-  registry/         model registry (MLflow) integration
-  experiments/      A/B testing framework + bandit policies
-  serving/          online inference service (FastAPI)
-  notebooks/        research only — not production
+ml/                 Python — separate deployable from day one
+  serving/          online scorer (FastAPI), called by recommender
+  features/         feature definitions + store adapter
+  pipelines/        batch feature + training DAGs (Prefect/Airflow)
+  registry/         MLflow model registry integration
+  experiments/      assignment + A/B + bandit policies
+  notebooks/        research only; never imported by production code

-infra/              docker-compose, k8s manifests, terraform, CI
+infra/              docker-compose (Phase 0), k3s/k8s (later), terraform, CI
 docs/               architecture notes, ADRs, API specs
 ```

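
The tree is logical, and the number of processes is a stage decision. That means the mapping from modules to processes can be plain data rather than directory layout. The sketch below uses hypothetical stage and process names. The `phase0` row follows the Phase 0 deployables this commit describes: one Node process for `services/*` and one Python process for `ml/serving`, with Postgres and NATS omitted. The `later` row is an invented example of a single extraction, not a planned topology.

```typescript
// Logical modules from the tree; the stage rows below are illustrative.
type ModuleName =
  | "gateway" | "auth" | "profile" | "integrations"
  | "recommender" | "events" | "notifier" | "ml-serving";

// "later" is a hypothetical stage used only to show one extraction.
type Stage = "phase0" | "later";

const phase0Hosts = {
  gateway: "node-app", auth: "node-app", profile: "node-app", integrations: "node-app",
  recommender: "node-app", events: "node-app", notifier: "node-app",
  "ml-serving": "ml-serving", // Python, its own process from day one
};

// Module -> process name, per stage. Directory layout never changes; only this table does.
const topology: Record<Stage, Record<ModuleName, string>> = {
  phase0: { ...phase0Hosts },
  later: { ...phase0Hosts, notifier: "notifier" }, // e.g. notifier extracted after a scaling hotspot
};

// Group modules by the process that hosts them: the list of deployables for a stage.
function deployables(stage: Stage): Map<string, ModuleName[]> {
  const byProcess = new Map<string, ModuleName[]>();
  for (const [mod, proc] of Object.entries(topology[stage]) as [ModuleName, string][]) {
    byProcess.set(proc, [...(byProcess.get(proc) ?? []), mod]);
  }
  return byProcess;
}
```

Under this arrangement, extracting a module is a one-line change to the table plus an ADR. The imports and contracts stay where they are.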
-## Contracts between services
+**Phase 0 deployables:** one Node process (`services/*` bundled via modular monolith) + one Python process (`ml/serving`, stubbed until M1) + Postgres + NATS. Services **extract to their own process** when a real reason appears: language boundary, scaling hotspot, team ownership, or SLA divergence. See ADR-0003.

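
The four extraction triggers named in the Phase 0 paragraph can be made explicit. A proposal to split a module then has to say which trigger applies. This is a sketch only: the `ModuleSignals` fields and the 0.5 CPU-share threshold are hypothetical, not values from ADR-0003.

```typescript
// Hypothetical evidence about one module living inside a shared process.
interface ModuleSignals {
  language: string;           // language the module is written in
  hostLanguage: string;       // language of the process currently hosting it
  cpuShareOfHost: number;     // 0..1, fraction of the host's CPU this module consumes
  owningTeam: string;         // team that owns the module
  hostTeam: string;           // team that owns the host process
  targetAvailability: number; // availability the module needs, e.g. 0.999
  hostAvailability: number;   // availability the shared process is run to
}

type Trigger = "language boundary" | "scaling hotspot" | "team ownership" | "SLA divergence";

const HOTSPOT_CPU_SHARE = 0.5; // assumed threshold, not from ADR-0003

// Returns the triggers that justify extraction; an empty list means "stay in the shared process".
function extractionTriggers(s: ModuleSignals): Trigger[] {
  const hits: Trigger[] = [];
  if (s.language !== s.hostLanguage) hits.push("language boundary");
  if (s.cpuShareOfHost > HOTSPOT_CPU_SHARE) hits.push("scaling hotspot");
  if (s.owningTeam !== s.hostTeam) hits.push("team ownership");
  if (s.targetAvailability > s.hostAvailability) hits.push("SLA divergence");
  return hits;
}
```

A Python module such as `ml/serving` trips the language-boundary trigger from day one. This is consistent with the tree listing `ml/` as a separate deployable.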
-- **Events** (Kafka/NATS) — source of truth for user signals. All integrations emit normalized events; the recommender reads them.
-- **HTTP/gRPC** — synchronous request/response (gateway → services).
-- **Shared schemas** live in `packages/shared-types`; generated from a single OpenAPI / proto source. Do not redefine types per service.
+## Contracts between modules
+
+- **HTTP** (OpenAPI, in `packages/shared-types/http/`) — synchronous request/response. In-process today; over the network once extracted. Signatures are identical.
+- **Events** (Protocol Buffers, in `packages/shared-types/events/`) — durable signals + feedback. Today: in-process event emitter. Tomorrow: NATS JetStream. Schema registry enforced in CI (ADR-0005).
+- Do not redefine types per module. Regenerate from `shared-types`.

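
The contract bullets make two promises. HTTP signatures stay identical whether a call is in-process or over the network. Events move from an in-process emitter to NATS JetStream later. The sketch below shows what that implies for callers. `Recommender`, `Tip`, and `EventBus` are hand-written stand-ins for types that would be generated into `packages/shared-types`. The `/recommend` path follows the `POST /recommend` → `{tip}` contract named in this file. `FetchLike`, the class names, and the base URL are hypothetical.

```typescript
// Hand-written stand-ins for types that would be generated into packages/shared-types.
interface RecommendRequest { userId: string }
interface Tip { text: string; source: string }

// One contract, two transports: callers only ever see this interface.
interface Recommender {
  recommend(req: RecommendRequest): Promise<{ tip: Tip }>;
}

// In-process (Phase 0): the module's function is called directly.
class InProcessRecommender implements Recommender {
  private readonly impl: (req: RecommendRequest) => { tip: Tip };
  constructor(impl: (req: RecommendRequest) => { tip: Tip }) { this.impl = impl; }
  async recommend(req: RecommendRequest): Promise<{ tip: Tip }> {
    return this.impl(req);
  }
}

// Minimal fetch-compatible type so the sketch needs no DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Extracted: same signature, now a POST /recommend over the network.
class HttpRecommender implements Recommender {
  private readonly baseUrl: string;
  private readonly fetchFn: FetchLike;
  constructor(baseUrl: string, fetchFn: FetchLike) { this.baseUrl = baseUrl; this.fetchFn = fetchFn; }
  async recommend(req: RecommendRequest): Promise<{ tip: Tip }> {
    const res = await this.fetchFn(`${this.baseUrl}/recommend`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(req),
    });
    if (!res.ok) throw new Error(`recommend failed: HTTP ${res.status}`);
    return (await res.json()) as { tip: Tip }; // a real client would validate against the schema
  }
}

// Events: an in-process bus behind an interface a NATS-backed bus could later implement.
interface EventBus {
  publish(subject: string, payload: Uint8Array): void; // payload = serialized protobuf message
  subscribe(subject: string, handler: (payload: Uint8Array) => void): void;
}

class InProcessBus implements EventBus {
  private readonly handlers = new Map<string, Array<(payload: Uint8Array) => void>>();
  publish(subject: string, payload: Uint8Array): void {
    for (const handler of this.handlers.get(subject) ?? []) handler(payload);
  }
  subscribe(subject: string, handler: (payload: Uint8Array) => void): void {
    this.handlers.set(subject, [...(this.handlers.get(subject) ?? []), handler]);
  }
}
```

Replacing `InProcessRecommender` with `HttpRecommender` (passing the global `fetch`, which should satisfy `FetchLike`) changes no call site. A NATS-backed `EventBus` would carry the same serialized protobuf bytes.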
 ## Conventions

-- Every service ships a `README.md`, a `Dockerfile`, and a `/health` endpoint.
-- One PR = one concern. Commits follow conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
+- Each module ships a `README.md` describing its contract, its `/health` story, and its extraction criteria (when it should become its own process).
+- One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
 - ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
 - No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
+- Compose profiles (`core`, `full`) so devs can run a subset without 16 GB of RAM.

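
When several modules share one process, the per-module `/health` story suggests a simple shape. Each module contributes a check, and the process-level `/health` reports all of them. The sketch below uses hypothetical names (`HealthRegistry`, `register`, `report`) and leaves out the HTTP route that would serialize the report.

```typescript
// Hypothetical health-check registry shared by all modules in one process.
type CheckResult = { ok: boolean; detail?: string };
type Check = () => Promise<CheckResult>;

class HealthRegistry {
  private readonly checks = new Map<string, Check>();

  // Each module registers exactly one check under its own name, e.g. "auth" or "events".
  register(moduleName: string, check: Check): void {
    if (this.checks.has(moduleName)) throw new Error(`duplicate health check for ${moduleName}`);
    this.checks.set(moduleName, check);
  }

  // Run all checks concurrently. A check that throws is reported as unhealthy
  // instead of failing the whole /health response.
  async report(): Promise<{ ok: boolean; modules: Record<string, CheckResult> }> {
    const results = await Promise.all(
      Array.from(this.checks, async ([moduleName, check]): Promise<[string, CheckResult]> => {
        try {
          return [moduleName, await check()];
        } catch (err) {
          return [moduleName, { ok: false, detail: String(err) }];
        }
      }),
    );
    const modules: Record<string, CheckResult> = {};
    for (const [moduleName, result] of results) modules[moduleName] = result;
    return { ok: results.every(([, result]) => result.ok), modules };
  }
}
```

A production version would likely add a per-check timeout so that one hung dependency cannot stall the endpoint.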
 ## Definition of done (per feature)

 1. Code + tests merged.
-2. Service's `README.md` updated.
+2. Module's `README.md` updated.
 3. If it changes a contract → `shared-types` regenerated + consumers updated.
 4. If it changes architecture → ADR added.
 5. Deployable via `docker compose up` locally.
+6. If it touches user data → a deletion path exists and is tested.

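
Item 6 and the privacy directive imply that account deletion must reach every module that holds user data. One way to keep that testable is a registry of per-module erasers that the Phase-0 deletion endpoint fans out to. In the sketch below, `DeletionRegistry`, `eraseUser`, and `FakeStore` are hypothetical names. The in-memory store stands in for Postgres tables.

```typescript
// Hypothetical: each module that stores user data registers an eraser.
type Eraser = (userId: string) => Promise<number>; // returns how many records were removed

class DeletionRegistry {
  private readonly erasers = new Map<string, Eraser>();

  register(moduleName: string, eraser: Eraser): void {
    this.erasers.set(moduleName, eraser);
  }

  // Run every eraser; collect failures instead of stopping at the first one,
  // so the caller can retry only the modules that failed.
  async eraseUser(userId: string): Promise<{ removed: Record<string, number>; failed: string[] }> {
    const removed: Record<string, number> = {};
    const failed: string[] = [];
    for (const [moduleName, erase] of Array.from(this.erasers)) {
      try {
        removed[moduleName] = await erase(userId);
      } catch {
        failed.push(moduleName);
      }
    }
    return { removed, failed };
  }
}

// In-memory stand-in for one module's table, keyed by user id.
class FakeStore {
  readonly rows = new Map<string, string[]>();
  add(userId: string, value: string): void {
    this.rows.set(userId, [...(this.rows.get(userId) ?? []), value]);
  }
  async deleteUser(userId: string): Promise<number> {
    const count = this.rows.get(userId)?.length ?? 0;
    this.rows.delete(userId);
    return count;
  }
}
```

A test for item 6 would then seed each module's store, call `eraseUser`, and assert two things. No records remain for that user, and other users' data is untouched.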
 ## Current phase

@@ -75,7 +82,9 @@ docs/ architecture notes, ADRs, API specs

 ## What NOT to do

-- Don't copy Todoist's data into our DB. Store the OAuth token; fetch on demand.
-- Don't implement auth by hand. Use a library (NextAuth / Auth.js, Ory, or Clerk-compatible). We will self-host.
+- Don't copy Todoist's data into our DB. Store the OAuth token + computed features/derivatives we need, fetch raw on demand.
+- Don't implement auth by hand. Phase 0 uses **Auth.js** behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
 - Don't hardwire a recommender. The "random todo" v0 must live behind the same interface the real ML model will implement (`POST /recommend` → `{tip}`). Swap internals, keep contract.
+- Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
 - Don't build an admin UI before the user-facing black page is polished.
+- Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
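
Two of these rules fit together. The random-todo v0 sits behind the same contract as a future model. A candidate policy first runs in shadow: its choice is logged but never served. The sketch below uses hypothetical names (`Policy`, `randomTodoPolicy`, `shadowed`, `ShadowLog`). The agreement metric is a plain match rate chosen for illustration, not the promotion criterion from ADR-0002.

```typescript
// Hypothetical shapes; the real ones would come from packages/shared-types.
interface Todo { id: string; content: string }
interface Tip { todoId: string; text: string; policy: string }

// The one contract every policy implements, from the random v0 to a learned model.
interface Policy {
  name: string;
  choose(candidates: Todo[]): Tip | null;
}

// v0: a uniformly random todo. `rand` is injectable so tests are deterministic.
function randomTodoPolicy(rand: () => number = Math.random): Policy {
  return {
    name: "random-v0",
    choose(candidates) {
      const pick = candidates[Math.floor(rand() * candidates.length)];
      if (pick === undefined) return null; // empty list, or rand() outside [0, 1)
      return { todoId: pick.id, text: pick.content, policy: "random-v0" };
    },
  };
}

// One entry per request: what was served and what the shadow would have served.
interface ShadowRecord { served: string | null; shadow: string | null }
type ShadowLog = ShadowRecord[];

// A failing shadow must never change what the user sees; its choice is recorded as null.
function safeChoose(policy: Policy, candidates: Todo[]): Tip | null {
  try {
    return policy.choose(candidates);
  } catch {
    return null;
  }
}

// Serve the incumbent; run the candidate on the same input and only log its choice.
function shadowed(incumbent: Policy, candidate: Policy, log: ShadowLog): Policy {
  return {
    name: incumbent.name,
    choose(candidates) {
      const served = incumbent.choose(candidates);
      const shadow = safeChoose(candidate, candidates);
      log.push({ served: served?.todoId ?? null, shadow: shadow?.todoId ?? null });
      return served;
    },
  };
}

// Online agreement: share of logged requests where both policies picked the same todo.
function agreementRate(log: ShadowLog): number {
  if (log.length === 0) return 0;
  return log.filter((r) => r.served === r.shadow).length / log.length;
}
```

In this sketch the log supplies only the online signal. The offline comparison and the promotion decision happen elsewhere.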