Bandit v1: global-then-personalize LinUCB #57

alvis · 2026-04-13T14:35:37Z

Replaces the single issue 'LinUCB replacing Random'. Implement pooled (global) LinUCB over the five v1 features first, then add per-user residual arms once each user accumulates ≥N reactions. Persist bandit state in Postgres, not in memory. Build an offline replay harness against Phase-0 `TipInstance` history and run it before any online rollout.

This avoids the starvation problem we would hit with a per-user-from-day-one bandit.
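To make the global-then-personalize idea concrete, here is a minimal sketch in plain Python. Everything in it beyond what the issue states is an assumption: the class names `LinUCBArm` and `GlobalThenPersonalPolicy`, the placeholder threshold `min_reactions=20` standing in for N, the exploration weight `alpha`, and the choice to fit each per-user arm to the residual left by the global arm. Postgres persistence is left out; state lives in dicts here purely for illustration.

```python
import math

D = 5  # the five v1 features (the issue does not name them)


class LinUCBArm:
    """One ridge-regression arm. Keeps A^-1 and b; A^-1 is updated via Sherman-Morrison."""

    def __init__(self, d=D, ridge=1.0):
        self.d = d
        # A starts as ridge * I, so A^-1 starts as I / ridge.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(self.d)) for row in self.A_inv]

    def mean(self, x):
        theta = self._A_inv_times(self.b)  # theta = A^-1 b
        return sum(t * xi for t, xi in zip(theta, x))

    def width(self, x):
        # Confidence width sqrt(x^T A^-1 x), before scaling by alpha.
        v = self._A_inv_times(x)
        return math.sqrt(max(0.0, sum(xi * vi for xi, vi in zip(x, v))))

    def update(self, x, reward):
        v = self._A_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(self.d):
            for j in range(self.d):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        for i in range(self.d):
            self.b[i] += reward * x[i]


class GlobalThenPersonalPolicy:
    """Pooled LinUCB per arm, plus per-user residual arms gated on reaction count."""

    def __init__(self, arms, alpha=1.0, min_reactions=20):
        self.alpha = alpha
        self.min_reactions = min_reactions  # stands in for the issue's N (placeholder value)
        self.global_arms = {a: LinUCBArm() for a in arms}
        self.user_arms = {}   # (user_id, arm) -> LinUCBArm fit to residual rewards
        self.reactions = {}   # user_id -> number of reactions seen

    def is_personalized(self, user_id):
        return self.reactions.get(user_id, 0) >= self.min_reactions

    def score(self, user_id, arm, x):
        g = self.global_arms[arm]
        s = g.mean(x) + self.alpha * g.width(x)
        if self.is_personalized(user_id) and (user_id, arm) in self.user_arms:
            # Assumption: the residual arm only shifts the mean, it adds no extra bonus.
            s += self.user_arms[(user_id, arm)].mean(x)
        return s

    def select(self, user_id, contexts):
        # contexts maps arm -> feature vector of length D.
        return max(contexts, key=lambda a: self.score(user_id, a, contexts[a]))

    def update(self, user_id, arm, x, reward):
        g = self.global_arms[arm]
        residual = reward - g.mean(x)  # measured before the global arm sees this reward
        g.update(x, reward)
        self.user_arms.setdefault((user_id, arm), LinUCBArm()).update(x, residual)
        self.reactions[user_id] = self.reactions.get(user_id, 0) + 1
```

In this sketch the residual arms learn from a user's first reaction but only influence scoring once the user crosses the threshold, so new users are served by the pooled model alone.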

alvis added this to the M1 — Real signal milestone 2026-04-13 14:35:37 +00:00

alvis added the ml label 2026-04-13 14:35:37 +00:00

alvis commented

2026-04-16 15:23:03 +00:00

Superseded by ε-greedy v1 (ADR-0007). LinUCB remains available as a fallback, but ε-greedy won the offline sim (+10.7% reward). Closing in favor of a new research issue for next-gen policies.
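For readers unfamiliar with the comparison, the following is a rough sketch of a non-contextual ε-greedy policy evaluated by replay over logged events. It is not the ADR-0007 design, which is not reproduced in this thread. The names `EpsilonGreedy` and `replay`, the `(arm_shown, reward)` log shape, and the default `epsilon=0.1` are all hypothetical. The replay estimate is only unbiased if the logging policy chose arms uniformly at random, which is assumed here for the Phase-0 history.

```python
import random


class EpsilonGreedy:
    """Non-contextual epsilon-greedy over a fixed set of arms."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)                # explore
        return max(self.arms, key=lambda a: self.means[a])   # exploit (ties go to the first arm)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental running mean of observed rewards for this arm.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def replay(policy, log):
    """Average reward over logged events where the policy agrees with the logged arm.

    log is an iterable of (arm_shown, reward) pairs (hypothetical shape).
    Events where the policy would have shown a different arm are skipped,
    because the reward for the arm it wanted was never observed.
    """
    total, matched = 0.0, 0
    for arm_shown, reward in log:
        if policy.select() != arm_shown:
            continue
        policy.update(arm_shown, reward)
        total += reward
        matched += 1
    return total / matched if matched else 0.0
```

Running two policies through the same `replay` function over the same log gives comparable average-reward figures, which is the kind of number an offline comparison would report.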

alvis closed this issue

2026-04-16 15:23:03 +00:00

Reference: alvis/oO#57