# Metrics: measuring "magic" We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against. ## North star **Week-2 tip-reaction rate** — of users who saw a tip in week 1, what fraction reacted to *any* tip in week 2? Captures "did this become part of your life." ## Activation (single-session) - **Time-to-first-tip** — sign-in → tip rendered. Target: ≤ 60 s on the happy path. - **First-tip reaction rate** — fraction of users who interact (done/snooze/dismiss/save) with their very first tip. Target: > 50%. ## Engagement - **Dwell-before-action** — seconds between tip render and first reaction. Too short = glance-away; too long = confused. - **Done rate / (Done + Snooze + Dismiss)** — the quality proxy. Rising = tips feel on-target. - **Snooze:Dismiss ratio** — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes. - **Return cadence** — median inter-session gap. Stable-and-short > spiky. ## Retention - D1, D7, D28 retention. Cohort-sliced by connected integrations. - Churn signal: 7 days without a session. ## ML health (from M1) - Policy latency p50/p95/p99 at the recommender boundary. - Feature null-rate per feature, per user. - Online/offline reward disagreement for shadowed policies. - Bandit regret proxy: observed reward vs an oracle's best-possible on the same candidates. ## Privacy & trust - Account-deletion completion time (target: < 24 h). - Provider-revocation success rate on disconnect. - Number of active credentials per user (low = healthy). ## How metrics become decisions - **Per-change.** Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge). - **Shadow > A/B > launch.** Policy changes ship in shadow first (log what it *would* have recommended); then A/B on live traffic; then launch once online reward estimate ≥ incumbent by a CI margin. - **Dashboards before features.** If we cannot measure a feature's impact on the north-star metric, we defer the feature.