feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework

- Promote egreedy-v1 to active serving policy (ADR-0007): /score/egreedy + /reward/egreedy
  replace the linucb-v1 endpoints after offline sim shows +10.7% mean reward (−0.548 vs −0.606)
- Replace explicit helpful/not_helpful feedback with dwell-time inferred reward (inferReward):
  dismiss=−1.0, snooze=+0.1, done<15s=−0.3, done 15s–2min=+1.0, done 2–10min=+0.6, done>10min=+0.3
- Add ml/serving ε-greedy endpoints: /score/egreedy, /reward/egreedy, /stats/egreedy/{user_id}
  with d=7 feature vector (base 5 + sin/cos day-of-week encoding)
- Add offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges,
  two-phase score+reward, synthetic personas, task generator; results stored in sim_runs/sim_events
- Add /admin/simulations page: start runs, live-poll status, reward curve SVG, action/persona tables
- Fix egreedy day_of_week training skew: reward endpoint now uses actual dow instead of hardcoded 0
- Fix runner.py proxy bypass: httpx.Client(trust_env=False) for localhost ML calls
- Add dwellMs to TipFeedbackEvent contract and bus.test.ts fixture
- Schema: sim_runs, sim_events tables; tip_feedback gains dwell_ms, reward_milli columns
- ADR-0006: admin console framework; ADR-0007: egreedy-v1 policy selection rationale
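
Two of the bullets above (dwell-time reward inference and the sin/cos day-of-week
features) can be sketched in Python. This is only an illustration under stated
assumptions: the names infer_reward and encode_day_of_week are hypothetical (the
commit names only inferReward), the treatment of the exact 15 s, 2 min and 10 min
boundaries is a guess, and the 2*pi*dow/7 angle with dow in 0..6 is the usual
cyclic encoding rather than something the commit message spells out.

```python
from __future__ import annotations

import math


def infer_reward(action: str, dwell_ms: int | None = None) -> float:
    """Map a tip action plus dwell time to a scalar reward (hypothetical helper)."""
    if action == "dismiss":
        return -1.0
    if action == "snooze":
        return 0.1
    if action != "done":
        raise ValueError(f"unknown action: {action!r}")
    if dwell_ms is None:
        raise ValueError("a 'done' action needs a dwell time")

    seconds = dwell_ms / 1000.0
    # Assumption: 15 s and 2 min fall into the later bucket, 10 min into the earlier one.
    if seconds < 15:
        return -0.3  # completed almost instantly
    if seconds < 120:
        return 1.0   # 15 s to 2 min
    if seconds <= 600:
        return 0.6   # 2 to 10 min
    return 0.3       # more than 10 min


def encode_day_of_week(dow: int) -> tuple[float, float]:
    """Return (sin, cos) of the weekday angle so day 6 sits next to day 0."""
    if not 0 <= dow <= 6:
        raise ValueError(f"dow must be in 0..6, got {dow}")
    angle = 2.0 * math.pi * dow / 7.0
    return math.sin(angle), math.cos(angle)
```

Under this encoding a day of week hardcoded to 0 always yields (sin, cos) = (0.0, 1.0),
so reward updates would carry no weekday signal, which is presumably the skew the
day_of_week fix addresses.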

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 07:44:37 +00:00
parent c5ea18ec6e
commit faf44c18fc
48 changed files with 6151 additions and 40 deletions

@@ -0,0 +1,62 @@
"""Generate synthetic task pools for simulation."""
from __future__ import annotations

import random

_TEMPLATES = [
    "Send weekly report to team",
    "Review pull request #{n}",
    "Schedule meeting with {name}",
    "Update project documentation",
    "Fix bug in authentication module",
    "Prepare presentation for stakeholders",
    "Call back {name}",
    "Submit expense report",
    "Review quarterly goals",
    "Clean up inbox",
    "Follow up on proposal to {name}",
    "Complete onboarding checklist",
    "Write tests for feature #{n}",
    "Deploy hotfix to production",
    "Respond to support ticket #{n}",
    "Draft release notes",
    "Update dependencies",
    "Review design mockups",
    "Archive old tickets",
    "Check in with {name}",
]

_NAMES = ["Alice", "Bob", "Carol", "David", "Eve", "Frank", "Grace"]


def generate_task_pool(n: int = 10, seed: int | None = None) -> list[dict]:
    """Return n synthetic tasks with randomly sampled features."""
    # A private RNG keeps a pool reproducible for a given seed.
    rng = random.Random(seed)
    tasks = []
    for i in range(n):
        priority = rng.choices([1, 2, 3, 4], weights=[0.3, 0.3, 0.25, 0.15])[0]

        # age_days: most tasks fresh, a few stale, plus up to half a day of jitter
        age_days = rng.choices(
            [0.0, 0.5, 1.0, 3.0, 7.0, 14.0],
            weights=[0.35, 0.20, 0.20, 0.12, 0.08, 0.05],
        )[0] + rng.random() * 0.5

        # is_overdue is only possible once the task is more than half a day old
        is_overdue = age_days > 0.5 and rng.random() < 0.65

        template = rng.choice(_TEMPLATES)
        content = template.format(n=rng.randint(100, 999), name=rng.choice(_NAMES))
        tasks.append({
            "id": f"sim:{i}",
            "content": content,
            "source": "sim",
            "features": {
                "is_overdue": is_overdue,
                # age is reported only for overdue tasks, 0.0 otherwise
                "task_age_days": age_days if is_overdue else 0.0,
                "priority": priority,
            },
        })
    return tasks
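
The generator draws from a private `random.Random(seed)` instead of the module-level RNG, which should make a pool reproducible for a given seed even when other code reseeds or consumes the global generator. A minimal sketch of that property follows; `sample_priorities` is a hypothetical stand-in that reuses only the priority weights from the function above, not the full task generator.

```python
from __future__ import annotations

import random


def sample_priorities(n: int, seed: int | None = None) -> list[int]:
    """Draw n priorities from a private RNG (hypothetical stand-in helper)."""
    rng = random.Random(seed)  # independent of the module-level generator
    return [
        rng.choices([1, 2, 3, 4], weights=[0.3, 0.3, 0.25, 0.15])[0]
        for _ in range(n)
    ]


first = sample_priorities(5, seed=42)

# Disturb the global RNG; the private instance is unaffected.
random.seed(0)
random.random()

second = sample_priorities(5, seed=42)
same_pool = first == second  # True: same seed, same draws
```

Passing `seed=None` seeds the private generator from system entropy, so unseeded runs will generally differ from one another.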