ContextualBanditPolicy (LinUCB) replacing Random #25

Closed
opened 2026-04-13 14:23:20 +00:00 by alvis · 1 comment

Implement LinUCB over the v1 features. Per-user arms. Persist bandit state. Build an offline replay harness before enabling for real users.
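A minimal sketch of the plan above. It assumes the disjoint LinUCB variant (an independent ridge model per arm), v1 features arriving as NumPy vectors, JSON as the persistence format, and logs of `(features, arm, reward)` tuples. It reads "per-user arms" as one independent model per user. `LinUCB`, `model_for` and `replay` are hypothetical names, not existing code in this repo.

```python
import json

import numpy as np


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression model (A, b) per arm."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha  # width of the exploration bonus
        self.A = [np.eye(dim) for _ in range(n_arms)]    # I + sum of x x^T per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # sum of reward * x per arm

    def select(self, x: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge estimate of the arm's reward weights
            # Predicted reward plus an upper-confidence bonus that shrinks with data.
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))  # ties go to the lowest arm index

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

    # Persistence: alpha plus the statistics (A, b) are the entire bandit state.
    def to_json(self) -> str:
        return json.dumps({"alpha": self.alpha,
                           "A": [a.tolist() for a in self.A],
                           "b": [v.tolist() for v in self.b]})

    @classmethod
    def from_json(cls, s: str) -> "LinUCB":
        d = json.loads(s)
        model = cls(len(d["A"]), len(d["b"][0]), d["alpha"])
        model.A = [np.array(a) for a in d["A"]]
        model.b = [np.array(v) for v in d["b"]]
        return model


def model_for(store: dict, user_id: str, n_arms: int, dim: int) -> LinUCB:
    # Hypothetical per-user store: one independent LinUCB per user.
    if user_id not in store:
        store[user_id] = LinUCB(n_arms, dim)
    return store[user_id]


def replay(policy: LinUCB, logs) -> float:
    """Offline replay over (x, logged_arm, reward) tuples.

    Only events where the policy picks the logged arm are counted and learned
    from; the estimate is unbiased when the logging policy chose arms uniformly
    at random.
    """
    total, matched = 0.0, 0
    for x, logged_arm, reward in logs:
        if policy.select(x) == logged_arm:
            policy.update(logged_arm, x, reward)
            total += reward
            matched += 1
    return total / matched if matched else 0.0
```

Because the policy being replaced is Random, its existing logs are the uniformly random data this replay estimator needs. A per-user model only learns from that user's own events, so users with few rewards keep near-zero estimates for a long time.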

alvis added this to the M1 — Real signal milestone 2026-04-13 14:23:20 +00:00
alvis added the ml label 2026-04-13 14:23:20 +00:00

Superseded by #51 (global-then-personalize LinUCB) which addresses the per-user-reward-starvation problem, and #50 which adds shadow-deploy infra so we do not replace Random in one step. Closing.
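The design in #51 is not spelled out here, so the following is only one hypothetical reading of "global-then-personalize". Each user's per-arm ridge estimate is shrunk toward a global estimate pooled across all users: `theta_u = (lam*I + X_u^T X_u)^-1 (X_u^T y_u + lam * theta_g)`. A user with no rewards of their own therefore starts from the global model instead of from zero. The class name and the `lam` prior-strength parameter are assumptions.

```python
import numpy as np


class GlobalThenPersonalizeLinUCB:
    """Hypothetical sketch: per-user LinUCB with a ridge prior centred on a global model."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0, lam: float = 1.0):
        self.n_arms, self.dim, self.alpha, self.lam = n_arms, dim, alpha, lam
        # Global statistics pooled over all users (one ridge model per arm).
        self.A_g = [np.eye(dim) for _ in range(n_arms)]
        self.b_g = [np.zeros(dim) for _ in range(n_arms)]
        # Per-user statistics: user_id -> (X^T X per arm, X^T y per arm).
        self.users: dict = {}

    def _user(self, user_id: str):
        if user_id not in self.users:
            self.users[user_id] = (
                [np.zeros((self.dim, self.dim)) for _ in range(self.n_arms)],
                [np.zeros(self.dim) for _ in range(self.n_arms)],
            )
        return self.users[user_id]

    def select(self, user_id: str, x: np.ndarray) -> int:
        XtX, Xty = self._user(user_id)
        scores = []
        for a in range(self.n_arms):
            theta_g = np.linalg.solve(self.A_g[a], self.b_g[a])  # global estimate
            A_u_inv = np.linalg.inv(self.lam * np.eye(self.dim) + XtX[a])
            # Ridge regression on the user's data, shrunk toward theta_g
            # instead of toward zero; equals theta_g when the user has no data.
            theta_u = A_u_inv @ (Xty[a] + self.lam * theta_g)
            scores.append(theta_u @ x + self.alpha * np.sqrt(x @ A_u_inv @ x))
        return int(np.argmax(scores))

    def update(self, user_id: str, arm: int, x: np.ndarray, reward: float) -> None:
        XtX, Xty = self._user(user_id)
        outer = np.outer(x, x)
        XtX[arm] += outer
        Xty[arm] += reward * x
        # Simplification: the user's events also feed the global prior.
        self.A_g[arm] += outer
        self.b_g[arm] += reward * x
```

With this shape a new user immediately benefits from everyone else's rewards, while a user with enough history is dominated by their own data. Counting a user's events in both the prior and their own statistics is a simplification that a real implementation might avoid.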

alvis closed this issue 2026-04-13 14:35:55 +00:00

Reference: alvis/oO#25