- ADR-0003: modular monolith for Phase 0 with documented extraction triggers - ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships - ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate - New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature) - Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage - Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI - PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint - License decision in README (ARR with OSS plan in Phase 5)
2.2 KiB
2.2 KiB
Metrics: measuring "magic"
We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against.
North star
Week-2 tip-reaction rate — of users who saw a tip in week 1, what fraction reacted to any tip in week 2? Captures "did this become part of your life."
Activation (single-session)
- Time-to-first-tip — sign-in → tip rendered. Target: ≤ 60 s on the happy path.
- First-tip reaction rate — fraction of users who interact (done/snooze/dismiss/save) with their very first tip. Target: > 50%.
Engagement
- Dwell-before-action — seconds between tip render and first reaction. Too short = glance-away; too long = confused.
- Done rate / (Done + Snooze + Dismiss) — the quality proxy. Rising = tips feel on-target.
- Snooze:Dismiss ratio — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes.
- Return cadence — median inter-session gap. Stable-and-short > spiky.
Retention
- D1, D7, D28 retention. Cohort-sliced by connected integrations.
- Churn signal: 7 days without a session.
ML health (from M1)
- Policy latency p50/p95/p99 at the recommender boundary.
- Feature null-rate per feature, per user.
- Online/offline reward disagreement for shadowed policies.
- Bandit regret proxy: observed reward vs an oracle's best-possible on the same candidates.
Privacy & trust
- Account-deletion completion time (target: < 24 h).
- Provider-revocation success rate on disconnect.
- Number of active credentials per user (low = healthy).
How metrics become decisions
- Per-change. Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge).
- Shadow > A/B > launch. Policy changes ship in shadow first (log what it would have recommended); then A/B on live traffic; then launch once online reward estimate ≥ incumbent by a CI margin.
- Dashboards before features. If we cannot measure a feature's impact on the north-star metric, we defer the feature.