oO/docs/architecture/metrics.md
alvis 7f173f88d3 refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0
- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
2026-04-13 14:36:11 +00:00


Metrics: measuring "magic"

We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against.

North star

Week-2 tip-reaction rate — of users who saw a tip in week 1, what fraction reacted to any tip in week 2? Captures "did this become part of your life."
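As a rough sketch of how the north star could be computed from an event log: the event kinds (`tip_rendered`, `tip_reaction`), the tuple shape, and the function name below are hypothetical. The sketch also assumes "week 1" and "week 2" are counted from each user's sign-up, which the definition above does not pin down.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)

def week2_tip_reaction_rate(events, signup_at):
    """Fraction of users who saw a tip in week 1 and reacted to any tip in week 2.

    events: iterable of (user_id, kind, timestamp); kind is "tip_rendered"
            or "tip_reaction" (hypothetical event names).
    signup_at: dict mapping user_id -> sign-up datetime (assumed week anchor).
    Returns None when nobody saw a tip in week 1.
    """
    saw_week1 = set()
    reacted_week2 = set()
    for user_id, kind, ts in events:
        start = signup_at.get(user_id)
        if start is None:
            continue  # unknown user: skip rather than guess an anchor
        age = ts - start
        if kind == "tip_rendered" and timedelta(0) <= age < WEEK:
            saw_week1.add(user_id)
        elif kind == "tip_reaction" and WEEK <= age < 2 * WEEK:
            reacted_week2.add(user_id)
    if not saw_week1:
        return None
    # Denominator is week-1 viewers only; week-2 reactors outside it are ignored.
    return len(saw_week1 & reacted_week2) / len(saw_week1)
```

Users who react in week 2 without having seen a tip in week 1 do not count toward either side of the fraction, which matches the wording "of users who saw a tip in week 1".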

Activation (single-session)

  • Time-to-first-tip — sign-in → tip rendered. Target: ≤ 60 s on the happy path.
  • First-tip reaction rate — fraction of users who interact (done/snooze/dismiss/save) with their very first tip. Target: > 50%.
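The two activation metrics could be checked against their targets along these lines. This is a minimal sketch: the function names, constants, and input shapes are illustrative and not part of any existing module.

```python
from datetime import datetime

TIME_TO_FIRST_TIP_TARGET_S = 60.0  # from the text: <= 60 s on the happy path
FIRST_TIP_REACTION_TARGET = 0.5    # from the text: > 50%
REACTIONS = {"done", "snooze", "dismiss", "save"}

def time_to_first_tip_s(signed_in_at, first_tip_rendered_at):
    """Seconds from sign-in to the first rendered tip."""
    return (first_tip_rendered_at - signed_in_at).total_seconds()

def first_tip_reaction_rate(first_tip_outcomes):
    """Share of users who reacted to their very first tip.

    first_tip_outcomes: one entry per user, either a reaction name from
    REACTIONS or None when the user did nothing (hypothetical shape).
    Returns None for an empty cohort.
    """
    if not first_tip_outcomes:
        return None
    reacted = sum(1 for outcome in first_tip_outcomes if outcome in REACTIONS)
    return reacted / len(first_tip_outcomes)
```

A session then meets the latency target when `time_to_first_tip_s(...) <= TIME_TO_FIRST_TIP_TARGET_S`, and a cohort meets the reaction target when the rate is strictly above `FIRST_TIP_REACTION_TARGET`.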

Engagement

  • Dwell-before-action — seconds between tip render and first reaction. Too short = glance-away; too long = confused.
  • Done rate = Done / (Done + Snooze + Dismiss). The quality proxy: a rising rate means tips feel on-target.
  • Snooze:Dismiss ratio — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes.
  • Return cadence — median inter-session gap. Stable-and-short > spiky.
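To make the ratios above concrete, here is a small sketch that derives done rate, the snooze:dismiss ratio, and the median inter-session gap. The input shapes and the function name are assumptions; the cadence is computed for a single user, whereas a real pipeline would presumably aggregate across users.

```python
from collections import Counter
from statistics import median

def engagement_summary(reactions, session_starts):
    """Summarise engagement for one user (or one pooled cohort).

    reactions: list of reaction names ("done", "snooze", "dismiss", "save").
    session_starts: session start times in seconds, sorted ascending.
    Metrics with an empty denominator are reported as None.
    """
    counts = Counter(reactions)
    # "save" is deliberately excluded, per the done-rate definition.
    judged = counts["done"] + counts["snooze"] + counts["dismiss"]
    done_rate = counts["done"] / judged if judged else None
    snooze_to_dismiss = (
        counts["snooze"] / counts["dismiss"] if counts["dismiss"] else None
    )
    gaps = [later - earlier for earlier, later in zip(session_starts, session_starts[1:])]
    return {
        "done_rate": done_rate,
        "snooze_to_dismiss": snooze_to_dismiss,
        "median_gap_s": median(gaps) if gaps else None,
    }
```

Reading the output: a snooze:dismiss ratio well above 1 points at timing, well below 1 at relevance, in line with the interpretation given in the list.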

Retention

  • D1, D7, D28 retention. Cohort-sliced by connected integrations.
  • Churn signal: 7 days without a session.
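A possible shape for the retention numbers, assuming the classic definition of Dn retention (the user has a session on exactly the n-th calendar day after sign-up). The doc does not say whether classic or rolling retention is meant, so treat that choice, along with the names below, as an assumption.

```python
from datetime import date, timedelta

def day_n_retention(cohort, n):
    """Classic Dn retention for one cohort.

    cohort: dict user_id -> (signup_date, set of dates with a session).
    Returns the fraction of users with a session on signup_date + n days,
    or None for an empty cohort.
    """
    if not cohort:
        return None
    retained = sum(
        1
        for signup, session_days in cohort.values()
        if signup + timedelta(days=n) in session_days
    )
    return retained / len(cohort)

def has_churn_signal(last_session_day, today, threshold_days=7):
    """True when the user has gone threshold_days or more without a session."""
    return (today - last_session_day).days >= threshold_days
```

Slicing by connected integrations then amounts to building one `cohort` dict per integration set and calling `day_n_retention` on each for n in (1, 7, 28).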

ML health (from M1)

  • Policy latency p50/p95/p99 at the recommender boundary.
  • Feature null-rate per feature, per user.
  • Online/offline reward disagreement for shadowed policies.
  • Bandit regret proxy: observed reward vs. the best reward an oracle could have obtained on the same candidates.
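Three of these health checks reduce to short computations. The sketch below uses nearest-rank percentiles (one of several percentile conventions), treats `None` as the null marker, and takes the oracle's per-candidate rewards as given. All names and input shapes are illustrative rather than an existing API.

```python
def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile; p is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]

def null_rates(rows):
    """Fraction of rows in which each feature is missing or None.

    rows: list of dicts mapping feature name -> value. Group rows by user
    before calling to get the per-user view.
    """
    names = {name for row in rows for name in row}
    return {
        name: sum(1 for row in rows if row.get(name) is None) / len(rows)
        for name in names
    }

def regret_proxy(decisions):
    """Mean gap between the oracle's best candidate reward and the observed one.

    decisions: list of (observed_reward, oracle_rewards_for_candidates).
    """
    gaps = [max(candidates) - observed for observed, candidates in decisions]
    return sum(gaps) / len(gaps)
```

For policy latency, calling `percentile_nearest_rank(latencies_ms, p)` with p = 50, 95, and 99 yields the three figures listed above.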

Privacy & trust

  • Account-deletion completion time (target: < 24 h).
  • Provider-revocation success rate on disconnect.
  • Number of active credentials per user (low = healthy).

How metrics become decisions

  • Per-change. Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge).
  • Shadow → A/B → launch. Policy changes ship in shadow first (logging what the new policy would have recommended), then run as an A/B test on live traffic, then launch once the online reward estimate is ≥ the incumbent's by a confidence-interval margin.
  • Dashboards before features. If we cannot measure a feature's impact on the north-star metric, we defer the feature.
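One way to read the "≥ incumbent by a CI margin" launch condition is that the lower bound of a confidence interval on the reward difference must not fall below zero. The sketch below takes that reading, using a normal-approximation interval on the difference in mean reward. The interval type, the 95% level (`z=1.96`), and the function name are assumptions, not something the process above specifies.

```python
from math import sqrt
from statistics import mean, variance

def launch_gate(candidate_rewards, incumbent_rewards, z=1.96):
    """Compare per-user rewards from the two A/B arms.

    Each arm needs at least two observations (sample variance).
    Returns the estimated lift, the lower CI bound, and a launch suggestion.
    The suggestion feeds a human review; it is not an automatic decision.
    """
    diff = mean(candidate_rewards) - mean(incumbent_rewards)
    std_err = sqrt(
        variance(candidate_rewards) / len(candidate_rewards)
        + variance(incumbent_rewards) / len(incumbent_rewards)
    )
    ci_lower = diff - z * std_err
    return {"diff": diff, "ci_lower": ci_lower, "launch": ci_lower >= 0}
```

With small samples or heavy-tailed rewards a bootstrap or sequential test may be more appropriate; the gate only illustrates the shape of the decision.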