feat: bandit consumes profile features (egreedy-v2, D=12) #99

alvis · 2026-04-25T00:40:45Z

alvis commented

2026-04-25 00:40:45 +00:00

Split out from #81 phase B.3.

Goal

Extend the bandit feature vector to include the user-profile features shipped in #81 phase A (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d). Today the bandit ignores profile_features even though the recommender ships them on every /score call.

Why this is its own issue

Changing D=7 → D=12 resets every user's learned A/b matrices. Per ADR-0002 a new policy must ship as a shadow first and only promote after offline + online agreement with the incumbent. That's a multi-step process, not an incremental fix.
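The reset follows from the shape of the per-user state: `A` is D×D and `b` has length D, so state learned at D=7 cannot absorb a 12-dimensional feature vector. A minimal sketch, assuming a ridge-style linear model with `A` initialised to the identity and `b` to zeros (the class name `LinearState` and its methods are hypothetical, not the code in `ml/serving`):

```python
from dataclasses import dataclass, field


@dataclass
class LinearState:
    """Hypothetical per-user state: A is d x d, b has length d."""

    d: int
    A: list = field(default_factory=list)
    b: list = field(default_factory=list)

    def __post_init__(self):
        # Assumption: fresh state starts as A = I, b = 0.
        if not self.A:
            self.A = [
                [1.0 if i == j else 0.0 for j in range(self.d)]
                for i in range(self.d)
            ]
        if not self.b:
            self.b = [0.0] * self.d

    def update(self, x, reward):
        # A feature vector of the wrong length cannot update this state.
        if len(x) != self.d:
            raise ValueError(f"expected {self.d} features, got {len(x)}")
        for i in range(self.d):
            self.b[i] += reward * x[i]
            for j in range(self.d):
                self.A[i][j] += x[i] * x[j]
```

Under these assumptions, a v2 policy has to start from a fresh `LinearState(12)`, which is why a separate state file and a shadow period are needed rather than an in-place change.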

Tasks

Add egreedy-v2 policy in ml/serving with D=12 and a separate state-file path so it can run alongside egreedy-v1
Decide profile-feature normalization (rates are already 0–1; preferred_hour is cyclical; tip_volume_30d needs log/clip)
Wire egreedy-v2 as a shadow policy via the existing shadow-policy registry (#56 / recommender.ts)
Run offline sim comparing v1 vs v2 (per ADR-0007's pattern)
If v2 wins, promote per ADR-0002 (shadow → active policy switch)
Write ADR-0012 recording the dimension change + migration approach
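One possible shape for the normalization task, as a sketch only. It assumes one output per profile feature, so the five features account for the step from D=7 to D=12. The caps `DWELL_CAP_MS` and `TIP_VOLUME_CAP`, the single-cosine encoding of `preferred_hour`, and the neutral 0.5 for a missing hour are all assumptions for illustration, not decisions recorded in this issue:

```python
import math

DWELL_CAP_MS = 60_000.0  # hypothetical cap for mean_dwell_ms_30d
TIP_VOLUME_CAP = 200.0   # hypothetical cap for tip_volume_30d


def _clip01(value):
    """Clamp a number into [0, 1]."""
    return max(0.0, min(1.0, value))


def normalize_profile(profile):
    """Map raw profile features to five values in [0, 1].

    Missing or null values fall back to 0.0, except preferred_hour,
    which falls back to 0.5 (an assumed neutral value).
    """
    completion = _clip01(float(profile.get("completion_rate_30d") or 0.0))
    dismiss = _clip01(float(profile.get("dismiss_rate_30d") or 0.0))
    dwell = _clip01(float(profile.get("mean_dwell_ms_30d") or 0.0) / DWELL_CAP_MS)

    hour = profile.get("preferred_hour")
    if hour is None:
        hour_feat = 0.5
    else:
        # Cyclical: hour 23 lands next to hour 0 instead of far from it.
        hour_feat = 0.5 * (1.0 + math.cos(2.0 * math.pi * (hour % 24) / 24.0))

    tips = max(0.0, float(profile.get("tip_volume_30d") or 0.0))
    # Clip first, then log-scale so heavy users do not dominate.
    tip_feat = math.log1p(min(tips, TIP_VOLUME_CAP)) / math.log1p(TIP_VOLUME_CAP)

    return [completion, dismiss, dwell, hour_feat, tip_feat]
```

A single cosine component cannot tell hour 6 from hour 18. A sin/cos pair would fix that but uses two dimensions, so the choice between them is part of what this task has to decide.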

Pre-requisites

Enough behavioral data for profile features to carry signal (currently most stored values are zero/null; with the events shipped in #81 phase B.2, fresh users will accumulate data quickly).
Offline simulation framework operational (already shipped — see ADR-0007).
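To judge the first pre-requisite, a quick coverage check over stored profiles could look like the sketch below. The function name `profile_coverage` and the input shape (a list of dicts keyed by feature name) are assumptions; what fraction counts as "enough" is not defined in this issue.

```python
PROFILE_KEYS = (
    "completion_rate_30d",
    "dismiss_rate_30d",
    "mean_dwell_ms_30d",
    "preferred_hour",
    "tip_volume_30d",
)


def profile_coverage(profiles):
    """Return, per feature, the fraction of profiles with a usable value.

    A value counts as missing when it is None, or when it is zero.
    preferred_hour is the exception: 0 is a valid hour (midnight),
    so only None counts as missing there.
    """
    total = len(profiles)
    coverage = {}
    for key in PROFILE_KEYS:
        usable = 0
        for profile in profiles:
            value = profile.get(key)
            if value is None:
                continue
            if value == 0 and key != "preferred_hour":
                continue
            usable += 1
        coverage[key] = usable / total if total else 0.0
    return coverage
```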

Out of scope

Adding new profile features beyond the existing 5; that goes in the registry without a policy change.

alvis referenced this issue

2026-04-25 00:41:09 +00:00

feat: feature registry + user profile builder #81

alvis referenced this issue from a commit

2026-04-25 00:41:21 +00:00

docs(adr-0011): point B.3 at new issue #99

alvis referenced this issue from a commit

2026-04-25 10:00:43 +00:00

feat(ml): egreedy-v2 shadow policy — D=12 with profile features (#99)

alvis commented

2026-04-25 10:01:23 +00:00

Scaffolding shipped in 2d7cf21.

Done:

egreedy-v2 endpoints in ml/serving (D=12) with normalization helpers
ADR-0012
egreedy-v2-shadow registered in recommender.ts (disabled by default)
Sim runner + personas carry synthetic profile_features
18 new Python tests; all 56 Python + 170 TS green

Remaining (needs shadow data first):

Run offline sim egreedy-v1 vs egreedy-v2
Promote if v2 wins the sim, per ADR-0002

alvis closed this issue

2026-04-26 03:09:33 +00:00

alvis referenced this issue from a commit

2026-04-26 12:08:52 +00:00

docs(ml): serving README + update ml/README and CLAUDE.md for #98

alvis referenced this issue from a commit

2026-04-26 12:08:52 +00:00

feat(bandit): promote egreedy-v2 (D=12, profile features) as active policy (#99)

alvis referenced this issue from a commit

2026-04-26 12:08:52 +00:00

docs(observability): add services/api README; update ml/serving + recommender docs (#18)
