feat: bandit consumes profile features (egreedy-v2, D=12) #99

Closed
opened 2026-04-25 00:40:45 +00:00 by alvis · 1 comment
Owner

Split out from #81 phase B.3.

Goal

Extend the bandit feature vector to include the user-profile features shipped in #81 phase A (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d). Today the bandit ignores profile_features even though the recommender ships them on every /score call.

Why this is its own issue

Changing D=7 → D=12 resets every user's learned A/b matrices. Per ADR-0002 a new policy must ship as a shadow first and only promote after offline + online agreement with the incumbent. That's a multi-step process, not an incremental fix.
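To illustrate why the dimension change invalidates stored state, here is a minimal sketch. It assumes the per-user state is the usual ridge-regression pair for a linear bandit (a D×D matrix `A` and a length-D vector `b`). The names `new_state` and `update` are hypothetical and are not taken from `ml/serving`.

```python
def new_state(d):
    """Fresh per-user state: A = identity (d x d), b = zeros (d)."""
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    return A, b


def update(A, b, x, reward):
    """Rank-one update: A += x x^T, b += reward * x."""
    d = len(b)
    if len(x) != d:
        raise ValueError(f"feature vector has {len(x)} dims, state expects {d}")
    for i in range(d):
        for j in range(d):
            A[i][j] += x[i] * x[j]
        b[i] += reward * x[i]


A_v1, b_v1 = new_state(7)      # state shaped for egreedy-v1 (D=7)
x_v2 = [0.0] * 12              # context once the 5 profile features are appended
try:
    update(A_v1, b_v1, x_v2, reward=1.0)
    compatible = True
except ValueError:
    compatible = False         # a D=7 state cannot absorb a 12-dim context
```

Under this assumption there is no natural way to carry a 7×7 `A` into a 12×12 one without inventing values for the new rows and columns, which is why v2 starts from fresh state in its own file.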

Tasks

  • Add egreedy-v2 policy in ml/serving with D=12 and a separate state-file path so it can run alongside egreedy-v1
  • Decide profile-feature normalization (rates already 0–1; preferred_hour cyclical; tip_volume needs log/clip)
  • Wire egreedy-v2 as a shadow policy via the existing shadow-policy registry (#56 / recommender.ts)
  • Run offline sim comparing v1 vs v2 (per ADR-0007's pattern)
  • If v2 wins, promote per ADR-0002 (shadow → active policy switch)
  • ADR-0012 recording the dimension change + migration approach
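The normalization options named in the task list could look roughly like the sketch below. The helper names, the null-to-zero handling, and the `cap=100.0` default are illustrative assumptions; the decision actually taken belongs in ADR-0012 and is not reproduced here.

```python
import math


def clip_rate(value):
    """Rates are already in 0-1; map missing values to 0 and clamp strays."""
    if value is None:
        return 0.0
    return min(1.0, max(0.0, float(value)))


def encode_hour(hour):
    """Cyclical encoding so hour 23 and hour 0 end up close together."""
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    return math.sin(angle), math.cos(angle)


def scale_volume(count, cap=100.0):
    """log1p compresses heavy tails; clipping at `cap` bounds the result to 0-1."""
    clipped = min(max(float(count or 0), 0.0), cap)
    return math.log1p(clipped) / math.log1p(cap)
```

Note that a sine/cosine pair takes two slots in the feature vector, so how `preferred_hour` fits into a 12-dimensional layout is a design choice this sketch does not settle.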

Pre-requisites

  • Enough behavioral data for profile features to carry signal (currently most stored values are zero/null; with the events shipped in #81 phase B.2, fresh users will accumulate data quickly).
  • Offline simulation framework operational (already shipped — see ADR-0007).

Out of scope

Adding new profile features beyond the existing five. Those go in the registry without a policy change.

Author
Owner

Scaffolding shipped in 2d7cf21.

Done:

  • egreedy-v2 endpoints in ml/serving (D=12) with normalization helpers
  • ADR-0012
  • egreedy-v2-shadow registered in recommender.ts (disabled by default)
  • Sim runner + personas carry synthetic profile_features
  • 18 new Python tests; all 56 Python + 170 TS green

Remaining (needs shadow data first):

  • Run offline sim egreedy-v1 vs egreedy-v2
  • Promote if v2 wins the sim, per ADR-0002
alvis closed this issue 2026-04-26 03:09:33 +00:00

Reference: alvis/oO#99