Bandit v1: global-then-personalize LinUCB #57

Closed
opened 2026-04-13 14:35:37 +00:00 by alvis · 1 comment

Replaces the single issue 'LinUCB replacing Random'. Implement pooled LinUCB over the five v1 features first (global), then add per-user residual arms as each user accumulates ≥N reactions. Persist bandit state in Postgres (not in memory). Run an offline replay harness against Phase-0 `TipInstance` history before any online rollout.

Closes the starvation problem we would hit with a per-user-from-day-one bandit.
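
A minimal sketch of the global-then-personalize idea, for discussion only. The issue does not define "residual arms", so this takes one possible reading: a pooled LinUCB model scores every candidate, and a per-user model fitted to the residual (observed reward minus the pooled prediction) adds a correction once that user has at least `min_reactions` updates. The names `LinUCB`, `GlobalThenPersonal`, `min_reactions`, `alpha` and `ridge` are hypothetical. Postgres persistence and the replay harness are left out; the state that would need persisting is `A_inv`, `b` and the per-user counts.

```python
import math


class LinUCB:
    """Shared-parameter LinUCB over a fixed-length feature vector (pure Python)."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight (assumed hyperparameter)
        # A starts as ridge * I, so its inverse is (1 / ridge) * I.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.A_inv]

    def mean(self, x):
        # theta = A^-1 b; predicted reward is theta . x
        theta = self._matvec(self.b)
        return sum(t * xi for t, xi in zip(theta, x))

    def ucb(self, x):
        a_x = self._matvec(x)
        var = sum(xi * a for xi, a in zip(x, a_x))
        return self.mean(x) + self.alpha * math.sqrt(max(var, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 for A += x x^T.
        a_x = self._matvec(x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, a_x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= a_x[i] * a_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class GlobalThenPersonal:
    """Pooled model for everyone; per-user residual correction after N reactions."""

    def __init__(self, dim, min_reactions, alpha=1.0):
        self.dim = dim
        self.alpha = alpha
        self.min_reactions = min_reactions
        self.global_model = LinUCB(dim, alpha)
        self.user_models = {}   # user_id -> LinUCB fitted on residuals
        self.user_counts = {}   # user_id -> number of reactions seen

    def score(self, user_id, x):
        s = self.global_model.ucb(x)
        if self.user_counts.get(user_id, 0) >= self.min_reactions:
            # Only the residual mean is added; the exploration bonus stays global.
            s += self.user_models[user_id].mean(x)
        return s

    def select(self, user_id, candidates):
        # candidates: dict mapping an arm id to its feature vector
        return max(candidates, key=lambda arm: self.score(user_id, candidates[arm]))

    def update(self, user_id, x, reward):
        # Residual is measured against the pooled prediction before it is updated.
        residual = reward - self.global_model.mean(x)
        self.global_model.update(x, reward)
        model = self.user_models.setdefault(user_id, LinUCB(self.dim, self.alpha))
        model.update(x, residual)
        self.user_counts[user_id] = self.user_counts.get(user_id, 0) + 1
```

Under this reading, a new user is served purely from the pooled model, which is what avoids starving a per-user bandit of data; the residual model trains silently from the first reaction so it is already warm when the threshold is crossed.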
alvis added this to the M1 — Real signal milestone 2026-04-13 14:35:37 +00:00
alvis added the ml label 2026-04-13 14:35:37 +00:00

Superseded by ε-greedy v1 (ADR-0007). LinUCB remains available as fallback but ε-greedy won offline sim (+10.7% reward). Closing in favor of new research issue for next-gen policies.

alvis closed this issue 2026-04-16 15:23:03 +00:00
Reference: alvis/oO#57