feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework

- Promote egreedy-v1 to active serving policy (ADR-0007): /score/egreedy + /reward/egreedy replaces linucb-v1 endpoints after offline sim shows +10.7% mean reward (−0.548 vs −0.606)
- Replace explicit helpful/not_helpful feedback with dwell-time inferred reward (inferReward): dismiss=−1.0, snooze=+0.1, done<15s=−0.3, done 15s–2min=+1.0, done 2–10min=+0.6, done>10min=+0.3
- Add ml/serving ε-greedy endpoints: /score/egreedy, /reward/egreedy, /stats/egreedy/{user_id} with d=7 feature vector (base 5 + sin/cos day-of-week encoding)
- Add offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-phase score+reward, synthetic personas, task generator; results stored in sim_runs/sim_events
- Add /admin/simulations page: start runs, live-poll status, reward curve SVG, action/persona tables
- Fix egreedy day_of_week training skew: reward endpoint now uses actual dow instead of hardcoded 0
- Fix runner.py proxy bypass: httpx.Client(trust_env=False) for localhost ML calls
- Add dwellMs to TipFeedbackEvent contract and bus.test.ts fixture
- Schema: sim_runs, sim_events tables; tip_feedback gains dwell_ms, reward_milli columns
- ADR-0006: admin console framework; ADR-0007: egreedy-v1 policy selection rationale

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 07:44:37 +00:00
parent c5ea18ec6e
commit faf44c18fc
48 changed files with 6151 additions and 40 deletions
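
The dwell-time reward mapping listed in the commit message can be sketched roughly as below. This is an illustrative Python rendering, not the actual `inferReward` code from this commit: the function name `infer_reward`, its signature, and the handling of the interval edges (exactly 15s, 2min, 10min) are assumptions. Only the action names and reward values come from the message.

```python
from __future__ import annotations


def infer_reward(action: str, dwell_ms: int) -> float:
    """Map a feedback action plus dwell time to a scalar reward.

    Reward values are taken from the commit message. Which bucket the
    exact boundary values fall into is an assumption of this sketch.
    """
    if action == "dismiss":
        return -1.0
    if action == "snooze":
        return 0.1
    if action != "done":
        raise ValueError(f"unknown action: {action!r}")

    seconds = dwell_ms / 1000
    if seconds < 15:
        return -0.3   # done<15s
    if seconds <= 120:
        return 1.0    # done 15s-2min
    if seconds <= 600:
        return 0.6    # done 2-10min
    return 0.3        # done>10min


print(infer_reward("done", 45_000))
```

Unknown actions raise rather than defaulting to zero, so a contract change shows up as an error instead of silently flattening the reward signal.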
--- a/ml/experiments/sim/task_generator.py
+++ b/ml/experiments/sim/task_generator.py
@@ -0,0 +1,62 @@
+"""Generate synthetic task pools for simulation."""
+
+from __future__ import annotations
+
+import random
+
+_TEMPLATES = [
+    "Send weekly report to team",
+    "Review pull request #{n}",
+    "Schedule meeting with {name}",
+    "Update project documentation",
+    "Fix bug in authentication module",
+    "Prepare presentation for stakeholders",
+    "Call back {name}",
+    "Submit expense report",
+    "Review quarterly goals",
+    "Clean up inbox",
+    "Follow up on proposal to {name}",
+    "Complete onboarding checklist",
+    "Write tests for feature #{n}",
+    "Deploy hotfix to production",
+    "Respond to support ticket #{n}",
+    "Draft release notes",
+    "Update dependencies",
+    "Review design mockups",
+    "Archive old tickets",
+    "Check in with {name}",
+]
+
+_NAMES = ["Alice", "Bob", "Carol", "David", "Eve", "Frank", "Grace"]
+
+
+def generate_task_pool(n: int = 10, seed: int | None = None) -> list[dict]:
+    """Return n synthetic tasks with randomly sampled features."""
+    rng = random.Random(seed)
+
+    tasks = []
+    for i in range(n):
+        priority = rng.choices([1, 2, 3, 4], weights=[0.3, 0.3, 0.25, 0.15])[0]
+        # age_days: most tasks fresh, a few stale
+        age_days = rng.choices(
+            [0.0, 0.5, 1.0, 3.0, 7.0, 14.0],
+            weights=[0.35, 0.20, 0.20, 0.12, 0.08, 0.05],
+        )[0] + rng.random() * 0.5
+        # only tasks older than half a day can be overdue; 65% of those are
+        is_overdue = age_days > 0.5 and rng.random() < 0.65
+
+        template = rng.choice(_TEMPLATES)
+        content = template.format(n=rng.randint(100, 999), name=rng.choice(_NAMES))
+
+        tasks.append({
+            "id": f"sim:{i}",
+            "content": content,
+            "source": "sim",
+            "features": {
+                "is_overdue": is_overdue,
+                "task_age_days": age_days if is_overdue else 0.0,
+                "priority": priority,
+            },
+        })
+
+    return tasks
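
The generator above implies a few invariants: priorities fall in 1 to 4, overdue tasks are older than half a day, non-overdue tasks report an age of 0.0, and every template placeholder is filled. A minimal sketch of a checker for those invariants follows. `check_task_invariants` is a hypothetical helper that is not part of this commit, and the sample tasks are hand-written in the shape `generate_task_pool` emits rather than real output.

```python
from __future__ import annotations


def check_task_invariants(tasks: list[dict]) -> list[str]:
    """Return human-readable violations; an empty list means the pool looks sane.

    Hypothetical helper for illustration, not part of this commit.
    """
    problems: list[str] = []
    seen_ids: set[str] = set()
    for task in tasks:
        tid = task["id"]
        feats = task["features"]
        if tid in seen_ids:
            problems.append(f"{tid}: duplicate id")
        seen_ids.add(tid)
        if feats["priority"] not in (1, 2, 3, 4):
            problems.append(f"{tid}: priority {feats['priority']} outside 1-4")
        # is_overdue is only ever set when the sampled age exceeds 0.5 days
        if feats["is_overdue"] and feats["task_age_days"] <= 0.5:
            problems.append(f"{tid}: overdue but age <= 0.5 days")
        # the generator zeroes task_age_days for tasks that are not overdue
        if not feats["is_overdue"] and feats["task_age_days"] != 0.0:
            problems.append(f"{tid}: not overdue but age is non-zero")
        if "{" in task["content"]:
            problems.append(f"{tid}: unfilled template placeholder")
    return problems


# Hand-written sample in the shape generate_task_pool() emits.
sample = [
    {"id": "sim:0", "content": "Call back Alice", "source": "sim",
     "features": {"is_overdue": True, "task_age_days": 3.2, "priority": 2}},
    {"id": "sim:1", "content": "Draft release notes", "source": "sim",
     "features": {"is_overdue": False, "task_age_days": 0.0, "priority": 4}},
]
print(check_task_invariants(sample))
```

In a real test this would presumably run against `generate_task_pool(n, seed=...)` for several seeds; a fixed seed keeps any failure reproducible.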