oO/docs/adr/0012-egreedy-v2-profile-features.md
alvis 37aec4fee1 chore: ADR-0007/0012 superseded status + admin users ID column
ADR-0007 and ADR-0012 both superseded by ADR-0013 as of 2026-05-01.
UsersTable gains a truncated ID column for quick user identification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:20:44 +00:00


ADR-0012 — ε-greedy v2: profile features in the bandit (D=7→12)

Status: Superseded by ADR-0013 (2026-05-01)
Date: 2026-04-25 (accepted) / 2026-04-26 (promoted)
Issue: #99

Context

ADR-0011 shipped a 5-feature user-profile registry (completion rate, dismiss rate, mean dwell, preferred hour, tip volume). POST /score and POST /score/egreedy already receive a profile_features dict on every call but ignore it — the comment in ml/serving/main.py explains why: extending the feature vector changes D, which resets every user's learned A/b matrices and discards accumulated signal. That loss requires a deliberate shadow-first rollout per ADR-0002, not an in-place update.

This ADR authorises egreedy-v2, which extends the active egreedy-v1 (D=7) with the 5 profile features (D=12) and defines how it ships safely.

Decision

New policy: egreedy-v2 (D=12)

Feature vector layout:

| idx | name | encoding |
|---|---|---|
| 0–1 | hour_sin, hour_cos | cyclical, current hour |
| 2 | is_overdue | 0/1 |
| 3 | task_age_norm | age_days / 30, clipped to 0–1 |
| 4 | priority_norm | (p − 1) / 3 |
| 5–6 | dow_sin, dow_cos | cyclical, day of week |
| 7 | completion_rate_30d | raw (already 0–1); null → 0 |
| 8 | dismiss_rate_30d | raw (already 0–1); null → 0 |
| 9 | mean_dwell_norm | dwell_ms / 600_000, clipped to 0–1; null → 0 |
| 10 | preferred_hour_alignment | (cos(2π(pref − now)/24) + 1) / 2; null → 0.5 (neutral) |
| 11 | tip_volume_norm | log1p(n) / log1p(100), clipped to 0–1; null → 0 |
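As a rough illustration of the candidate-level part of the layout (indices 0–6), the sketch below builds those seven values in Python. The function name and its parameters are hypothetical, and two details are assumptions not stated in this ADR: the cyclical features are taken as sin/cos of 2π·hour/24 and 2π·dow/7, and the day of week starts at Monday = 0. The (p − 1) / 3 formula suggests priorities run from 1 to 4, which is also assumed here.

```python
import math
from datetime import datetime


def candidate_features(now: datetime, is_overdue: bool,
                       age_days: float, priority: int) -> list[float]:
    """Hypothetical helper: indices 0-6 of the egreedy-v2 feature vector."""
    # Assumption: cyclical encoding is sin/cos of the angle on a 24 h / 7 day circle.
    hour_angle = 2 * math.pi * now.hour / 24
    dow_angle = 2 * math.pi * now.weekday() / 7  # Monday = 0 (assumption)
    return [
        math.sin(hour_angle),                # 0: hour_sin
        math.cos(hour_angle),                # 1: hour_cos
        1.0 if is_overdue else 0.0,          # 2: is_overdue
        min(max(age_days / 30, 0.0), 1.0),   # 3: task_age_norm, clipped to 0-1
        (priority - 1) / 3,                  # 4: priority_norm, assumes p in 1..4
        math.sin(dow_angle),                 # 5: dow_sin
        math.cos(dow_angle),                 # 6: dow_cos
    ]
```

Appending the five profile features (indices 7–11) to this list yields the D=12 vector.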

Normalization rationale:

  • Rates are already in [0, 1]; no transform needed.
  • Dwell clips at 10 min — anything beyond that carries diminishing signal.
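  • preferred_hour needs circular continuity (hour 23 is adjacent to hour 0), so it is encoded as a one-dimensional approximation: the cosine alignment between the preferred hour and the current hour, rescaled to [0, 1]. When the value is null (no established peak) we use 0.5, the neutral midpoint, rather than 0, which would misleadingly read as "polar-opposite hour".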
  • tip_volume uses log-scale because engagement counts are heavy-tailed.
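The normalization rules above can be sketched as a single function. This is a minimal illustration rather than the code in ml/serving/main.py: the function name and parameter names are hypothetical (the actual keys of the profile_features dict are not given here), and `None` stands in for null.

```python
import math
from typing import Optional


def clip01(x: float) -> float:
    """Clamp a value into [0, 1]."""
    return min(max(x, 0.0), 1.0)


def profile_features(completion_rate: Optional[float],
                     dismiss_rate: Optional[float],
                     mean_dwell_ms: Optional[float],
                     preferred_hour: Optional[float],
                     tip_count: Optional[int],
                     now_hour: float) -> list[float]:
    """Hypothetical helper: indices 7-11 of the egreedy-v2 feature vector."""
    # Rates are already in [0, 1]; null falls back to 0.
    completion = 0.0 if completion_rate is None else completion_rate
    dismiss = 0.0 if dismiss_rate is None else dismiss_rate
    # Dwell saturates at 10 minutes (600_000 ms).
    dwell = 0.0 if mean_dwell_ms is None else clip01(mean_dwell_ms / 600_000)
    # Cosine alignment: 1.0 at the preferred hour, 0.0 twelve hours away,
    # 0.5 (neutral) when no preferred hour is established.
    if preferred_hour is None:
        alignment = 0.5
    else:
        angle = 2 * math.pi * (preferred_hour - now_hour) / 24
        alignment = (math.cos(angle) + 1) / 2
    # Log scale for heavy-tailed counts; saturates at 100 tips.
    volume = 0.0 if tip_count is None else clip01(
        math.log1p(tip_count) / math.log1p(100))
    return [completion, dismiss, dwell, alignment, volume]
```

A user with no profile yet maps to `[0.0, 0.0, 0.0, 0.5, 0.0]`, so only the alignment slot carries a non-zero value for brand-new users.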

Rollout sequence (per ADR-0002)

  1. Shadow (this ADR) — egreedy-v2-shadow registered in the recommender's shadow-policy map (disabled by default). Admin enables via /admin/policies.

    • Calls /score/egreedy/v2 fire-and-forget alongside the active egreedy-v1 call.
    • Publishes signals.tip.served with policy: shadow:egreedy-v2-shadow for logging.
    • No reward delivery to shadow — live shadow collects decision-agreement exposure only; reward measurement uses offline simulation.
    • State files: {user}_egreedy_v2.json — isolated from v1's {user}_egreedy.json.
  2. Offline sim — run runner.py --policies egreedy-v1 egreedy-v2 --n-rounds 20 using the rule judge and persona-level profile features (synthetic values in personas.py). Gate: v2 mean reward ≥ v1 mean reward.

  3. Promote — if sim gate passes, change the remotePolicy() call in recommender.ts from /score/egreedy to /score/egreedy/v2 and change reward delivery to /reward/egreedy/v2. No DB migration; old per-user v1 state files are left on disk (available for rollback; clean up after 30 days).
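Two mechanical pieces of the sequence above are small enough to sketch: the isolated state-file naming from step 1 and the mean-reward gate from step 2. Both helper names are hypothetical, and the per-pull reward lists are assumed inputs; the actual output format of runner.py is not described in this ADR.

```python
from statistics import fmean


def state_file(user: str, version: int) -> str:
    """Hypothetical helper: v1 and v2 state live in separate files."""
    return f"{user}_egreedy.json" if version == 1 else f"{user}_egreedy_v{version}.json"


def passes_gate(rewards_by_policy: dict[str, list[float]],
                candidate: str = "egreedy-v2",
                baseline: str = "egreedy-v1") -> bool:
    """Gate from step 2: candidate mean reward >= baseline mean reward."""
    return fmean(rewards_by_policy[candidate]) >= fmean(rewards_by_policy[baseline])
```

For instance, `passes_gate({"egreedy-v1": [0.5, 0.7], "egreedy-v2": [0.6, 0.7]})` returns `True`, since the candidate mean (0.65) is not below the baseline mean (0.60). The reward values in that call are made up for illustration.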

State-file migration

No migration of A/b matrices from v1 → v2. A D×D→D'×D' transform would require assumptions about the new dimensions that we cannot justify without data. v2 starts from the identity prior and learns from scratch in shadow/sim. The reward penalty from cold-start is the correct price for the dimension extension.
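To make the cold-start cost concrete, here is a minimal sketch of a fresh D=12 state. It assumes the usual ridge-regression form for a linear bandit, where A accumulates x·xᵀ, b accumulates reward·x, and theta solves A·theta = b. The ADR only mentions A/b matrices, theta, and the identity prior, so the update rule and all function names below are assumptions, not the service's code.

```python
def fresh_state(d: int = 12) -> dict:
    """Identity prior: A = I (d x d), b = 0 (d)."""
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    return {"A": A, "b": [0.0] * d}


def update(state: dict, x: list[float], reward: float) -> None:
    """Assumed update rule: A += x x^T, b += reward * x."""
    for i, xi in enumerate(x):
        state["b"][i] += reward * xi
        for j, xj in enumerate(x):
            state["A"][i][j] += xi * xj


def theta(state: dict) -> list[float]:
    """Solve A theta = b by Gaussian elimination with partial pivoting."""
    n = len(state["b"])
    M = [row[:] + [state["b"][i]] for i, row in enumerate(state["A"])]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    out = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * out[j] for j in range(i + 1, n))
        out[i] = (M[i][n] - tail) / M[i][i]
    return out


def score(state: dict, x: list[float]) -> float:
    """Predicted reward for a candidate: theta . x."""
    return sum(t * xi for t, xi in zip(theta(state), x))
```

With a fresh state, theta is all zeros, so every candidate scores 0.0 and the greedy choice is an arbitrary tie-break until rewards arrive. Under these assumptions, that tie is what the cold-start penalty looks like in practice.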

Admin control

GET /api/admin/policies surfaces egreedy-v2-shadow with active: false. Toggle via POST /api/admin/policies/egreedy-v2-shadow/toggle.

Consequences

Good:

  • Profile features (preferred hour, completion/dismiss rates, volume) allow the bandit to personalise timing recommendations beyond what the candidate-level features encode.
  • Normalization is deterministic, bounded [0, 1], and numerically stable; no scaling artefacts as the population grows.
  • Shadow-first rollout protects real users from a cold-start regression.

Trade-offs:

  • Cold-start: v2 state files begin from the identity prior. During shadow, v2 makes random-ish decisions for early users. This is expected and intentional.
  • Synthetic persona profiles in personas.py approximate real user distributions; the offline sim is evidence, not proof. The promotion gate requires that the final sim run only after v2 has accumulated enough behavioral data; we suggest at least 100 shadow calls per policy per user first.
  • The one-dim preferred-hour encoding loses some circular information compared to two-dim sin/cos. If preferred-hour alignment becomes a dominant signal, revisit with D=13 in a follow-up ADR.

Alternatives considered

Warm-start via projection — project v1's 7-dim theta into D=12 by padding with zeros. Rejected: zero initialization for the profile dims is equivalent, and projecting theta without the corresponding A matrix cannot be done correctly.

D=13 with two preferred-hour dims — cleaner circular encoding, but contradicts the D=12 target in the issue spec and complicates the sim comparison. Deferred.

In-place v1 promotion without shadow — violates ADR-0002.

Promotion record (2026-04-26)

Offline sim (runner.py --policies egreedy-v1 egreedy-v2 --judge rule --n-users 5 --n-rounds 20 --seed 42):

| policy | total reward | mean reward | pulls |
|---|---|---|---|
| egreedy-v1 | 64.20 | 0.6420 | 100 |
| egreedy-v2 | 62.90 | 0.6290 | 100 |

Gate passed (v2 mean ≥ v1 mean). Per-persona: v2 wins deadline-driven, evening-relaxed, low-priority-first; v1 wins consistent-responder, overdue-ignorer.

Changes applied:

  • recommender.ts remotePolicy(): /score/egreedy → /score/egreedy/v2
  • recommender.ts sendRewardWithRetry(): /reward/egreedy → /reward/egreedy/v2, added profile_features to payload
  • Shadow entry egreedy-v2-shadow left in registry (active: false) for rollback.