refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0

- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
87
docs/architecture/data-model.md
Normal file
@@ -0,0 +1,87 @@
# Data model

Durable entities across modules. Each entity lives in its owning module's database or schema; other modules reach it only through that module's API.

## Core entities

```
User                     auth + profile
  id (uuid)
  created_at
  email (from IdP)
  preferred_name?
  deleted_at?            soft-delete for 30-day recovery; hard-delete after

IdentityLink             auth
  user_id
  provider               "google" | "apple"
  provider_sub           subject from IdP
  created_at

Session                  auth
  user_id
  sid (uuid)             in JWT
  issued_at
  expires_at
  revoked_at?

Profile                  profile
  user_id (pk)
  timezone
  quiet_hours            jsonb: [{start,end,days}]
  contexts               jsonb: [{name,predicate}] introduced in Phase 2
  consents               jsonb: {integration: {read,write,retain_days}}

Credential               integrations
  user_id
  provider               "todoist" | "google_calendar" | ...
  ciphertext             sealed-box over {access, refresh, scopes, expires_at}
  meta                   provider-specific (sync_token cursor for Todoist)
  created_at
  last_refreshed_at
  revoked_at?

Event                    events
  event_id (ulid)
  user_id
  schema_version
  kind                   e.g. "signals.task.updated"
  occurred_at
  ingested_at
  payload                protobuf bytes

TipInstance              recommender
  tip_id (ulid)
  user_id
  policy_name            "random" | "bandit.linucb" | "remote:v3"
  policy_version
  candidate_source       "todoist" | "advice.library" | ...
  context_snapshot       jsonb: features seen at decision time
  tip                    jsonb: {kind,title,body,source,deep_link,meta}
  created_at
  shown_at?              set when the client reports render
  reaction?              "done" | "snooze" | "dismiss" | null
  reacted_at?
  delivery_id?           fk if surfaced via notifier push

Delivery                 notifier
  delivery_id
  user_id
  tip_id
  channel                "webpush" | "apns" | "fcm" | "email"
  dispatched_at
  delivered_at?
  failure_reason?
```
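
As an illustration of how one of these entities might be used, here is a minimal sketch of the `Session` entity and a validity check. The field names follow the listing above; the rule itself (a session is usable only if it is not revoked and `now` falls between `issued_at` and `expires_at`) and the function name `session_is_active` are assumptions, not something the listing specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Session:
    """Mirrors the Session entity; field names follow the listing."""
    user_id: str
    sid: str
    issued_at: datetime
    expires_at: datetime
    revoked_at: Optional[datetime] = None


def session_is_active(session: Session, now: datetime) -> bool:
    """Assumed rule: not revoked, and issued_at <= now < expires_at."""
    if session.revoked_at is not None:
        return False
    return session.issued_at <= now < session.expires_at


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
live = Session("u1", "s1", now - timedelta(hours=1), now + timedelta(hours=1))
expired = Session("u1", "s2", now - timedelta(hours=2), now - timedelta(hours=1))
revoked = Session("u1", "s3", now - timedelta(hours=1), now + timedelta(hours=1),
                  revoked_at=now - timedelta(minutes=5))
```

Keeping `revoked_at` as a nullable timestamp rather than a boolean matches the listing and preserves when the revocation happened.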

## Foreign-key discipline

There are no cross-module FKs. Each module owns its tables. References by id are soft; consistency is maintained by events (a `user.deletion_requested` event → every module cascades its own cleanup).
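
The event-driven cascade can be sketched with a tiny in-process bus. Everything here is illustrative: `InProcBus`, the two row dictionaries, and the handler names are hypothetical stand-ins for module-owned tables, and the only thing modules share is the `user_id` carried in the event.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[dict], None]


class InProcBus:
    """Minimal synchronous pub/sub; a stand-in for an in-process emitter."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Handler) -> None:
        self._handlers[kind].append(handler)

    def emit(self, kind: str, payload: dict) -> None:
        for handler in self._handlers[kind]:
            handler(payload)


# Each module owns its rows; user_id is a soft reference, not a foreign key.
profile_rows: Dict[str, dict] = {"u1": {"deleted": False}, "u2": {"deleted": False}}
tip_rows: Dict[str, dict] = {"u1": {"deleted": False}, "u2": {"deleted": False}}


def profile_cleanup(event: dict) -> None:
    row = profile_rows.get(event["user_id"])
    if row is not None:
        row["deleted"] = True  # soft-delete inside the profile module


def recommender_cleanup(event: dict) -> None:
    row = tip_rows.get(event["user_id"])
    if row is not None:
        row["deleted"] = True  # soft-delete inside the recommender module


bus = InProcBus()
bus.subscribe("user.deletion_requested", profile_cleanup)
bus.subscribe("user.deletion_requested", recommender_cleanup)
bus.emit("user.deletion_requested", {"user_id": "u1"})
```

No module reaches into another module's rows; each one reacts to the same event on its own data.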

## Deletion

`User.deleted_at` set → a `user.deletion_requested` event goes out → each module soft-deletes its rows → after 30 days a scheduled job hard-deletes. Credentials are **revoked at the provider** (not just erased locally) on soft-delete. See `privacy.md`.
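
The check a scheduled hard-delete job would need can be sketched as follows. The 30-day window comes from the text; the names `RECOVERY_WINDOW` and `hard_delete_due` are hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RECOVERY_WINDOW = timedelta(days=30)  # 30 days per the text; the name is illustrative


def hard_delete_due(deleted_at: Optional[datetime], now: datetime) -> bool:
    """True once a soft-deleted row has outlived the recovery window."""
    if deleted_at is None:  # row was never soft-deleted
        return False
    return now - deleted_at >= RECOVERY_WINDOW


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
```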

## Replay and reproducibility

`TipInstance.context_snapshot` captures the exact features that produced the decision. This is what lets offline replay re-score historical tips against a new policy without touching the feature store.
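
A rough sketch of what such a replay could look like, reading only the logged rows. The scorer signature, the reward definition (1.0 for `"done"`, 0.0 otherwise), and the example policy are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a candidate policy is a scoring function over (features, tip).
Scorer = Callable[[Dict[str, float], dict], float]


def replay(tip_instances: List[dict], scorer: Scorer) -> List[Tuple[str, float, float]]:
    """Re-score logged tips using only the stored context_snapshot.

    Returns (tip_id, new_score, observed_reward) triples.
    """
    results = []
    for row in tip_instances:
        score = scorer(row["context_snapshot"], row["tip"])
        reward = 1.0 if row.get("reaction") == "done" else 0.0  # assumed reward
        results.append((row["tip_id"], score, reward))
    return results


def morning_task_scorer(ctx: Dict[str, float], tip: dict) -> float:
    # Hypothetical candidate policy, for illustration only.
    return ctx["is_morning"] * (1.0 if tip["kind"] == "task" else 0.2)


log = [
    {"tip_id": "t1", "context_snapshot": {"is_morning": 1.0},
     "tip": {"kind": "task"}, "reaction": "done"},
    {"tip_id": "t2", "context_snapshot": {"is_morning": 0.0},
     "tip": {"kind": "advice"}, "reaction": "dismiss"},
]
scored = replay(log, morning_task_scorer)
```

The function never consults a feature store: everything the scorer sees comes from the row that was persisted at decision time.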
43
docs/architecture/metrics.md
Normal file
@@ -0,0 +1,43 @@
# Metrics: measuring "magic"

We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against.

## North star

**Week-2 tip-reaction rate** — of users who saw a tip in week 1, what fraction reacted to *any* tip in week 2? Captures "did this become part of your life."
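
One way to compute this, as a sketch. The doc does not say what the weeks are counted from; this example assumes days since sign-up (week 1 = days 0 to 6, week 2 = days 7 to 13) and a flat record shape of `(user_id, days_since_signup, kind)`, both of which are assumptions.

```python
from typing import Iterable, Tuple

# (user_id, days_since_signup, kind), where kind is "shown" or "reacted".
Record = Tuple[str, int, str]


def week2_reaction_rate(records: Iterable[Record]) -> float:
    saw_week1 = set()
    reacted_week2 = set()
    for user_id, day, kind in records:
        if kind == "shown" and 0 <= day <= 6:
            saw_week1.add(user_id)
        elif kind == "reacted" and 7 <= day <= 13:
            reacted_week2.add(user_id)
    if not saw_week1:
        return 0.0  # no cohort yet; avoids division by zero
    # Only users in the week-1 cohort count toward the numerator.
    return len(saw_week1 & reacted_week2) / len(saw_week1)


records = [
    ("a", 1, "shown"), ("a", 8, "reacted"),   # in cohort, reacted in week 2
    ("b", 2, "shown"), ("b", 3, "reacted"),   # in cohort, reacted only in week 1
    ("c", 9, "reacted"),                      # not in the week-1 cohort
]
rate = week2_reaction_rate(records)
```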

## Activation (single-session)

- **Time-to-first-tip** — sign-in → tip rendered. Target: ≤ 60 s on the happy path.
- **First-tip reaction rate** — fraction of users who interact (done/snooze/dismiss/save) with their very first tip. Target: > 50%.

## Engagement

- **Dwell-before-action** — seconds between tip render and first reaction. Too short = glance-away; too long = confused.
- **Done rate = Done / (Done + Snooze + Dismiss)** — the quality proxy. Rising = tips feel on-target.
- **Snooze:Dismiss ratio** — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes.
- **Return cadence** — median inter-session gap. Stable and short beats spiky.
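
The two reaction-based ratios above reduce to simple counting. A minimal sketch, assuming reactions arrive as a list of the strings `"done"`, `"snooze"`, and `"dismiss"`; returning `None` when the denominator is zero is a choice made here, not something the doc prescribes.

```python
from collections import Counter
from typing import Optional, Sequence


def done_rate(reactions: Sequence[str]) -> Optional[float]:
    """Done / (Done + Snooze + Dismiss), or None if there are no reactions."""
    counts = Counter(reactions)
    total = counts["done"] + counts["snooze"] + counts["dismiss"]
    return counts["done"] / total if total else None


def snooze_dismiss_ratio(reactions: Sequence[str]) -> Optional[float]:
    """Snooze / Dismiss, or None if nothing was dismissed."""
    counts = Counter(reactions)
    return counts["snooze"] / counts["dismiss"] if counts["dismiss"] else None


reactions = ["done", "done", "snooze", "dismiss"]
```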

## Retention

- D1, D7, D28 retention. Cohort-sliced by connected integrations.
- Churn signal: 7 days without a session.

## ML health (from M1)

- Policy latency p50/p95/p99 at the recommender boundary.
- Feature null-rate per feature, per user.
- Online/offline reward disagreement for shadowed policies.
- Bandit regret proxy: observed reward vs an oracle's best-possible on the same candidates.
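
The regret proxy in the last bullet can be sketched as the mean gap between the best candidate reward and the reward actually observed. Where the per-candidate reward estimates come from is not specified in this doc; the sketch simply assumes they are available for each decision.

```python
from typing import List, Sequence, Tuple

# One entry per decision:
# (reward estimate for every candidate, observed reward of the tip shown).
Decision = Tuple[Sequence[float], float]


def mean_regret_proxy(decisions: List[Decision]) -> float:
    """Average of (oracle best over the candidates) minus (observed reward)."""
    if not decisions:
        return 0.0
    gaps = [max(candidate_rewards) - observed
            for candidate_rewards, observed in decisions]
    return sum(gaps) / len(gaps)


decisions = [
    ([1.0, 0.0], 1.0),  # policy picked the best candidate: gap 0.0
    ([1.0, 0.5], 0.5),  # policy picked the second best: gap 0.5
]
```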

## Privacy & trust

- Account-deletion completion time (target: < 24 h).
- Provider-revocation success rate on disconnect.
- Number of active credentials per user (low = healthy).

## How metrics become decisions

- **Per-change.** Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge).
- **Shadow → A/B → launch.** Policy changes ship in shadow first (log what it *would* have recommended); then A/B on live traffic; then launch once the online reward estimate beats the incumbent's by more than the confidence-interval margin.
- **Dashboards before features.** If we cannot measure a feature's impact on the north-star metric, we defer the feature.
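
The launch condition could be checked along these lines. This is a sketch under stated assumptions: the doc does not name a statistical test, so the example uses a two-sample normal-approximation interval at 95% and launches only when the lower bound of the difference is non-negative. The function name and threshold are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def clears_launch_gate(challenger: Sequence[float], incumbent: Sequence[float]) -> bool:
    """True if the 95% CI lower bound on (challenger - incumbent) is >= 0.

    Each sequence holds per-tip rewards and needs at least two observations.
    """
    diff = mean(challenger) - mean(incumbent)
    stderr = sqrt(variance(challenger) / len(challenger)
                  + variance(incumbent) / len(incumbent))
    return diff - Z_95 * stderr >= 0.0


incumbent = [1.0] * 50 + [0.0] * 50          # 50% reward
clear_winner = [1.0] * 80 + [0.0] * 20       # 80% reward
marginal = [1.0] * 52 + [0.0] * 48           # 52% reward, within noise
```

A failed gate here would feed the human review described under "Per-change" rather than trigger anything automatically.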
@@ -3,22 +3,25 @@
 ## Guiding constraints

 - The **recommendation decision** is the hot path. Every architectural choice should shorten the distance between a new signal and a better tip.
 - Services are small and independently deployable, but we do **not** multiply services for its own sake. Split by team-of-ownership and by data lifecycle.
-- Python for ML, TypeScript for applications, shared contracts regenerated from a single source of truth.
+- Modularity lives in **code boundaries**. Deploy topology follows pressure, not anticipation (ADR-0003).
+- Python for ML, TypeScript for applications. Shared contracts regenerated from a single source of truth: OpenAPI for HTTP, protobuf for events (ADR-0005).
+- Privacy is a Phase-0 feature, not a Phase-5 compliance project (see `privacy.md`).

-## Services
+## Modules

-| Service | Language | Responsibility | Owns data |
-|---|---|---|---|
-| `gateway` | TS (Node) | BFF for web/mobile; auth-checking; request fan-out | — |
-| `auth` | TS | OAuth (Google, Apple), sessions, token issuance | identities, sessions |
-| `profile` | TS | user profile, preferences, consents | profiles |
-| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors |
-| `events` | TS | event-bus ingress, normalization, durable log | signal store |
-| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history |
-| `ml/serving` | Python | online scoring for policies/models | — (stateless) |
-| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models |
-| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log |
+| Module | Language | Responsibility | Owns data | Phase-0 process |
+|---|---|---|---|---|
+| `gateway` | TS | BFF for web/mobile; auth-check; fan-out | — | Node monolith |
+| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
+| `profile` | TS | user profile, preferences, consents | profiles | Node monolith |
+| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
+| `events` | TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) |
+| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
+| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
+| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** |
+| `ml/pipelines` | Python | batch feature + training pipelines | feature store, models | separate (from M4) |

+Extraction from the monolith is triggered by language boundary, scaling hotspot, SLA divergence, team ownership, or regulatory isolation (ADR-0003). `ml/serving` is pre-extracted on language grounds.
+
 ## Data boundaries

@@ -36,9 +39,28 @@ User reactions (done / snooze / dismiss) are events too. They close the loop as

 ## Why these choices

-- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4.
-- **Postgres** everywhere for OLTP. Per-service schemas, not per-service instances in dev.
+- **Modular monolith + Python ML** in Phase 0 to ship the walking skeleton fast without foreclosing decomposition (ADR-0003).
+- **NATS JetStream** over Kafka for Phase 1: lighter, single-binary, fits the "one VM" deployment. Swap to Kafka in Phase 4 if fan-out justifies it.
+- **Postgres** for OLTP; per-module schemas in dev; separate databases once modules extract.
 - **FastAPI + Pydantic** for ML serving — fast, typed, swappable runtime (ONNX, Triton) behind it.
+- **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
+- **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
 - **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
 - **MLflow** for model registry; artifacts in MinIO/S3.
-- **Auth.js or Ory** for identity — we will not write crypto.
+- **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
+- **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.
+
+## Decision flow for a new tip
+
+```
+client ─► gateway ─► recommender
+                      │
+                      ├─► candidates: integrations.fetchCandidates(user) + advice.library
+                      ├─► context: FeatureAssembler(user, request)
+                      ├─► policy: PolicyRegistry.get(policyName).pick(candidates, context)
+                      ├─► shadows: run shadow policies in parallel, log their picks
+                      └─► persist: TipInstance{context_snapshot, policy, tip}
+◄─ tip
+```
+
+Feedback travels back the same path: `POST /feedback → events.emit(feedback.reaction)` → pipelines consume → bandit/model updated on next retrain.
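
The decision flow for a new tip can be sketched in a few lines. This is an illustration only: the doc places the recommender in the TS monolith, so Python is used here purely for brevity, and `recommend`, the two toy policies, and the in-memory stores are hypothetical. Shadow policies run sequentially in this sketch, whereas the flow describes them running in parallel.

```python
from typing import Callable, Dict, List

Tip = Dict[str, str]
Context = Dict[str, float]
Policy = Callable[[List[Tip], Context], Tip]


def recommend(candidates: List[Tip], context: Context, policies: Dict[str, Policy],
              live_policy: str, tip_store: List[dict], shadow_log: List[dict]) -> Tip:
    # policy: the live policy picks the tip that is actually returned
    tip = policies[live_policy](candidates, context)
    # shadows: every other policy only has its pick logged
    for name, policy in policies.items():
        if name != live_policy:
            shadow_log.append({"policy_name": name, "pick": policy(candidates, context)})
    # persist: store the decision together with the features it saw
    tip_store.append({"policy_name": live_policy,
                      "context_snapshot": dict(context),
                      "tip": tip})
    return tip


candidates = [{"kind": "task", "title": "Pay invoice"},
              {"kind": "advice", "title": "Take a walk"}]
policies: Dict[str, Policy] = {
    "first": lambda cands, ctx: cands[0],   # toy policy
    "last": lambda cands, ctx: cands[-1],   # toy shadow policy
}
tip_store: List[dict] = []
shadow_log: List[dict] = []
tip = recommend(candidates, {"hour": 9.0}, policies, "first", tip_store, shadow_log)
```

Copying the context into `context_snapshot` at persist time is what later makes offline replay possible without the feature store.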
40
docs/architecture/privacy.md
Normal file
@@ -0,0 +1,40 @@
# Privacy architecture

Privacy is a Phase 0 feature, not a Phase 5 compliance project. This doc is the minimum.

## Principles

1. **Data minimization.** Store only what we need for the tip. Raw task titles stay at Todoist; we store references + computed features. If a feature doesn't lift a metric, its input data doesn't get stored.
2. **User-visible controls.** Every connection shows exactly which scopes we hold and what we've computed. One tap disconnects and revokes.
3. **Deletion is real.** Deleting an account revokes provider tokens, purges credentials immediately, and soft-deletes user data for a 30-day recovery window, then hard-deletes.
4. **No surprise sharing.** Cross-user / collaborative features are opt-in, per category, per integration.
5. **Encryption in transit and at rest.** TLS everywhere; column-level encryption for credentials; disk-level for backups.

## Flows

### Connect
User taps "Connect Todoist" → consent screen lists: scopes requested, what we store, what we compute, retention, revocation instructions → OAuth → stored credential is immediately testable and shows in `/connect`.

### Disconnect
User taps disconnect → `Credential.revoked_at` set → provider-side revocation attempted (Todoist: token revocation endpoint) → credential erased on success → `credential.revoked` event → downstream modules drop associated cursors, caches, derived features for that `(user, provider)` pair.
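
The ordering of the disconnect steps can be sketched as below. The provider call and the event emitter are injected as plain callables, so nothing here reflects a real provider API. What happens when provider-side revocation fails is not stated in the flow; keeping the row (already marked revoked) so the attempt can be retried is an assumption.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

CredentialKey = Tuple[str, str]  # (user_id, provider)


def disconnect(credentials: Dict[CredentialKey, dict], user_id: str, provider: str,
               revoke_at_provider: Callable[[dict], bool],
               emit: Callable[[str, dict], None], now: datetime) -> bool:
    key = (user_id, provider)
    cred = credentials[key]
    cred["revoked_at"] = now              # 1. mark revoked locally first
    if not revoke_at_provider(cred):      # 2. attempt provider-side revocation
        return False                      #    assumed: keep the row for a retry
    del credentials[key]                  # 3. erase the credential on success
    emit("credential.revoked",            # 4. let downstream modules clean up
         {"user_id": user_id, "provider": provider})
    return True


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
events: List[Tuple[str, dict]] = []
store = {("u1", "todoist"): {"ciphertext": b"...", "revoked_at": None},
         ("u2", "todoist"): {"ciphertext": b"...", "revoked_at": None}}

ok = disconnect(store, "u1", "todoist", lambda cred: True,
                lambda kind, payload: events.append((kind, payload)), now)
failed = disconnect(store, "u2", "todoist", lambda cred: False,
                    lambda kind, payload: events.append((kind, payload)), now)
```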

### Delete account
User taps "Delete account" in settings → hard confirm → `User.deleted_at` set, all sessions revoked, `user.deletion_requested` event fanned out → every module processes its portion (credentials revoked + purged; profile scrubbed; tip history anonymized to aggregate stats only or purged, per retention policy; events purged on schedule) → within 24 hours account is non-recoverable operationally; within 30 days all rows are hard-deleted.

### Export (Phase 2)
`GET /me/export` returns a JSON bundle of everything we hold for the user: profile, consents, credentials-metadata (not secrets), events, tip history.

## Scope boundaries

Each integration declares the scopes it requests and the features it derives. The `Profile.consents` column is the source of truth; a scope removed from consent short-circuits derived-feature computation at the feature store.
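
A sketch of that short-circuit, assuming the `consents` shape from the data model (`{integration: {read, write, retain_days}}`) and treating the `read` flag as the gate. The mapping from feature to integration and the function names are hypothetical; the point is that the compute function is never called when consent is absent.

```python
from typing import Callable, Dict, List, Tuple

Consents = Dict[str, Dict[str, object]]
# feature name -> (integration it depends on, zero-argument compute function)
FeatureSpec = Dict[str, Tuple[str, Callable[[], float]]]


def compute_features(consents: Consents, features: FeatureSpec) -> Dict[str, float]:
    out: Dict[str, float] = {}
    for name, (integration, compute) in features.items():
        # Short-circuit: without read consent the input data is never touched.
        if consents.get(integration, {}).get("read") is True:
            out[name] = compute()
    return out


calls: List[str] = []


def open_task_count() -> float:
    calls.append("todoist")
    return 3.0


def busy_fraction() -> float:
    calls.append("google_calendar")
    return 0.5


consents: Consents = {
    "todoist": {"read": True, "write": False, "retain_days": 30},
    "google_calendar": {"read": False, "write": False, "retain_days": 0},
}
features: FeatureSpec = {
    "open_task_count": ("todoist", open_task_count),
    "busy_fraction": ("google_calendar", busy_fraction),
}
result = compute_features(consents, features)
```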

## Audit

- Privileged actions (admin-initiated deletions, credential decryption outside the normal refresh path) go to an append-only audit log from Phase 0.
- Per-user access log available via `GET /me/access-log` (Phase 2).

## Legal surface (Phase 0 minimum)

- Terms of Service + Privacy Policy documents shipped alongside the sign-in page.
- Consent capture on first sign-in, with a versioned ToS/PP hash stored per user.
- Data-subject request inbox (email) wired up before onboarding the first external user.
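
One possible way to derive the per-user ToS/PP hash, as a sketch. The doc does not specify the hash function or how the two documents are combined; SHA-256 over length-prefixed UTF-8 text is an assumption, and `policy_version_hash` is a hypothetical name.

```python
import hashlib


def policy_version_hash(tos_text: str, privacy_policy_text: str) -> str:
    """Hex digest identifying one exact pair of ToS and Privacy Policy texts."""
    h = hashlib.sha256()
    for doc in (tos_text, privacy_policy_text):
        data = doc.encode("utf-8")
        # Length prefix keeps the boundary between the two documents unambiguous.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


accepted = policy_version_hash("Terms v1", "Privacy v1")
```

Storing this value at consent time makes it possible to tell later which exact wording a user agreed to, and to re-prompt when either document changes.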