ContextualBanditPolicy (LinUCB) replacing Random #25

alvis · 2026-04-13T14:23:20Z

alvis commented

2026-04-13 14:23:20 +00:00

Implement LinUCB over the v1 features. Per-user arms. Persist bandit state. Offline replay harness before enabling for real users.
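For reference, a minimal sketch of what the pieces above could look like, assuming the standard disjoint LinUCB formulation (one ridge-regression model per arm). Everything named here is hypothetical and not taken from the codebase: the `LinUCB` class, the `alpha` exploration parameter, the JSON state format, and the `replay_evaluate` helper. How arms map to users ("per-user arms") and what the v1 feature vector contains are left open; the sketch only takes a feature vector `x` of fixed length. The replay estimator assumes the logged events were collected by a uniformly random policy.

```python
import json
import numpy as np


class LinUCB:
    """Disjoint LinUCB sketch: one ridge-regression model per arm."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed tunable)
        # Per arm: A = I + sum(x x^T), b = sum(reward * x)
        self.A = np.stack([np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def scores(self, x):
        """Upper confidence bound of every arm for context x."""
        x = np.asarray(x, dtype=float)
        out = np.empty(len(self.A))
        for a in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            out[a] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return out

    def select(self, x):
        return int(np.argmax(self.scores(x)))

    def update(self, arm, x, reward):
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

    def to_json(self):
        """Serialize bandit state so it can be persisted between sessions."""
        return json.dumps(
            {"alpha": self.alpha, "A": self.A.tolist(), "b": self.b.tolist()}
        )

    @classmethod
    def from_json(cls, payload):
        state = json.loads(payload)
        A = np.array(state["A"])
        policy = cls(A.shape[0], A.shape[1], state["alpha"])
        policy.A = A
        policy.b = np.array(state["b"])
        return policy


def replay_evaluate(policy, logged_events):
    """Offline replay: use only events where the policy agrees with the log.

    logged_events is an iterable of (x, logged_arm, reward) tuples.
    Returns (mean reward over matched events, number of matched events).
    """
    total, matched = 0.0, 0
    for x, logged_arm, reward in logged_events:
        if policy.select(x) == logged_arm:
            policy.update(logged_arm, x, reward)
            total += reward
            matched += 1
    return (total / matched if matched else 0.0), matched
```

In this sketch, replay discards every logged event whose arm differs from the policy's choice, so the matched count should be checked before trusting the mean reward.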

alvis added this to the M1 — Real signal milestone 2026-04-13 14:23:20 +00:00

alvis added the ml label 2026-04-13 14:23:20 +00:00

alvis commented

2026-04-13 14:35:55 +00:00

Superseded by #51 (global-then-personalize LinUCB) which addresses the per-user-reward-starvation problem, and #50 which adds shadow-deploy infra so we do not replace Random in one step. Closing.

alvis closed this issue

2026-04-13 14:35:55 +00:00

Reference: alvis/oO#25