# Metrics: measuring "magic"

We cannot build a product whose core promise is "feels like magic" without proxies for it. These are the metrics every change is measured against.

## North star

**Week-2 tip-reaction rate** — of users who saw a tip in week 1, what fraction reacted to *any* tip in week 2? Captures "did this become part of your life."

## Activation (single-session)

- **Time-to-first-tip** — sign-in → tip rendered. Target: ≤ 60 s on the happy path.
- **First-tip reaction rate** — fraction of users who interact (done/snooze/dismiss/save) with their very first tip. Target: > 50%.

## Engagement

- **Dwell-before-action** — seconds between tip render and first reaction. Too short = glance-away; too long = confused.
- **Done rate / (Done + Snooze + Dismiss)** — the quality proxy. Rising = tips feel on-target.
- **Snooze:Dismiss ratio** — high snooze = "good tip, wrong moment" (timing problem). High dismiss = "wrong tip entirely" (relevance problem). These point at different fixes.
- **Return cadence** — median inter-session gap. Stable-and-short > spiky.

## Retention

- D1, D7, D28 retention. Cohort-sliced by connected integrations.
- Churn signal: 7 days without a session.

## ML health (from M1)

- Policy latency p50/p95/p99 at the recommender boundary.
- Feature null-rate per feature, per user.
- Online/offline reward disagreement for shadowed policies.
- Bandit regret proxy: observed reward vs an oracle's best-possible on the same candidates.

## Privacy & trust

- Account-deletion completion time (target: < 24 h).
- Provider-revocation success rate on disconnect.
- Number of active credentials per user (low = healthy).

## How metrics become decisions

- **Per-change.** Any policy or UX change declares which metric it expects to move and by how much. Missing the target triggers a review, not an automatic rollback (humans judge).
- **Shadow > A/B > launch.** Policy changes ship in shadow first (log what it *would* have recommended); then A/B on live traffic; then launch once online reward estimate ≥ incumbent by a CI margin.
- **Dashboards before features.** If we cannot measure a feature's impact on the north-star metric, we defer the feature.