feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework
- Promote egreedy-v1 to active serving policy (ADR-0007): /score/egreedy + /reward/egreedy
replaces linucb-v1 endpoints after offline sim shows +10.7% mean reward (−0.548 vs −0.606)
- Replace explicit helpful/not_helpful feedback with dwell-time inferred reward (inferReward):
dismiss=−1.0, snooze=+0.1, done<15s=−0.3, done 15s–2min=+1.0, done 2–10min=+0.6, done>10min=+0.3
- Add ml/serving ε-greedy endpoints: /score/egreedy, /reward/egreedy, /stats/egreedy/{user_id}
with d=7 feature vector (base 5 + sin/cos day-of-week encoding)
- Add offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges,
two-phase score+reward, synthetic personas, task generator; results stored in sim_runs/sim_events
- Add /admin/simulations page: start runs, live-poll status, reward curve SVG, action/persona tables
- Fix egreedy day_of_week training skew: reward endpoint now uses actual dow instead of hardcoded 0
- Fix runner.py proxy bypass: httpx.Client(trust_env=False) for localhost ML calls
- Add dwellMs to TipFeedbackEvent contract and bus.test.ts fixture
- Schema: sim_runs, sim_events tables; tip_feedback gains dwell_ms, reward_milli columns
- ADR-0006: admin console framework; ADR-0007: egreedy-v1 policy selection rationale
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -76,7 +76,7 @@ Goal: a single user signs in with Google, connects Todoist, and sees one random
|
|||||||
- [x] `integrations/todoist` — OAuth2 flow, token stored in DB, disconnect supported
|
- [x] `integrations/todoist` — OAuth2 flow, token stored in DB, disconnect supported
|
||||||
- [x] `recommender` with `RandomPolicy`; stable `POST /recommend` contract; 30s task cache
|
- [x] `recommender` with `RandomPolicy`; stable `POST /recommend` contract; 30s task cache
|
||||||
- [x] `apps/web` — sign-in, connect, tip pages; PWA manifest + icons
|
- [x] `apps/web` — sign-in, connect, tip pages; PWA manifest + icons
|
||||||
- [x] Feedback endpoint (done/dismiss/snooze); marks task complete in Todoist
|
- [x] Feedback: `done / snooze / dismiss`; reward inferred from dwell-time (`inferReward`); marks task complete in Todoist
|
||||||
- [x] Deploy modular monolith to Agap VM via Caddy at `o.alogins.net`
|
- [x] Deploy modular monolith to Agap VM via Caddy at `o.alogins.net`
|
||||||
- [x] ToS + Privacy Policy pages (`/legal/terms`, `/legal/privacy`); implicit consent on sign-in
|
- [x] ToS + Privacy Policy pages (`/legal/terms`, `/legal/privacy`); implicit consent on sign-in
|
||||||
- [x] Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
|
- [x] Account deletion: revokes tokens, purges data, soft-deletes profile; button on /connect
|
||||||
@@ -87,10 +87,11 @@ Goal: tips are picked, not drawn from a hat — and they arrive at the right mom
|
|||||||
- [x] Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
|
- [x] Event bus scaffold: typed in-process EventEmitter with 500-event ring buffer; subjects match future NATS JetStream — swap is mechanical
|
||||||
- [x] Todoist sync emits `signals.task.synced`; tip served/feedback emit `signals.tip.*`
|
- [x] Todoist sync emits `signals.task.synced`; tip served/feedback emit `signals.tip.*`
|
||||||
- [x] Features extracted per task: `is_overdue`, `task_age_days`, `priority`; context: `hour_of_day`, `day_of_week`
|
- [x] Features extracted per task: `is_overdue`, `task_age_days`, `priority`; context: `hour_of_day`, `day_of_week`
|
||||||
- [x] `ml/serving` LinUCB bandit (d=5, alpha=1.0); per-user state persisted to disk; `/score` + `/reward` + `/reset` + `/stats` + `/features` endpoints
|
- [x] `ml/serving` LinUCB (d=5) + **ε-greedy v1** (d=7, ε=0.10, day-of-week sin/cos features); per-user state persisted to disk
|
||||||
- [x] `RemotePolicy` in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to `tip_scores`
|
- [x] `RemotePolicy` in recommender: calls ml/serving, falls back to RandomPolicy on timeout/error; logs explainability to `tip_scores`
|
||||||
- [x] Feedback loop: reactions mapped to rewards (done=+1, helpful=+0.5, snooze=0, not_helpful=−0.5, dismiss=−1) → online LinUCB update
|
- [x] Feedback loop: dwell-time inferred reward (`inferReward`) → online model update; `done` in 15 s–2 min = +1.0 (magic zone)
|
||||||
- [x] In-app **helpful / not helpful** coarse signal (#62) — long-press action sheet on tip page
|
- [x] Offline simulation framework (`ml/experiments/sim`): rule/LLM/claude-code judges, two-policy comparison, results persisted to `sim_runs` + `sim_events`
|
||||||
|
- [x] **ε-greedy v1 promoted to active policy** (ADR-0007) — +10.7% mean reward vs LinUCB in offline sim
|
||||||
- [x] **Web Push** (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
|
- [x] **Web Push** (VAPID): SW, subscribe/unsubscribe API, "notify me" button on tip page
|
||||||
- [x] Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
|
- [x] Shadow-policy registry: run N shadow policies per request, log picks without serving them (#56)
|
||||||
- [ ] Quiet-hours + dedupe for push delivery
|
- [ ] Quiet-hours + dedupe for push delivery
|
||||||
|
|||||||
37
apps/admin/README.md
Normal file
37
apps/admin/README.md
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
# apps/admin — oO Admin Console
|
||||||
|
|
||||||
|
Next.js 15 app. Deployed at `admin.o.alogins.net` (dev: `http://localhost:3080`).
|
||||||
|
|
||||||
|
## Contract
|
||||||
|
|
||||||
|
- All routes are admin-only. The Next.js middleware calls `GET /api/user/me` on every request
|
||||||
|
and checks `role === 'admin'`. First admin is seeded via `ADMIN_SEED_EMAIL` env var at API startup.
|
||||||
|
- Admin write actions are appended to the `admin_actions` audit log in the DB.
|
||||||
|
|
||||||
|
## Pages
|
||||||
|
|
||||||
|
| Route | Description |
|
||||||
|
|-------|-------------|
|
||||||
|
| `/` | Overview: DAU/WAU KPI cards, tips served, reaction breakdown, activation funnel |
|
||||||
|
| `/users` | User list (paginated) |
|
||||||
|
| `/users/:id` | User detail: identity, consents, integrations, tip stats, reward history; revoke-integration + reset-bandit actions |
|
||||||
|
| `/audit` | Admin action audit log |
|
||||||
|
| `/events` | Event stream viewer (stub — pending API history endpoint) |
|
||||||
|
|
||||||
|
## Dev
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pnpm --filter @oo/admin dev # starts on :3080
|
||||||
|
# also run the API: pnpm --filter @oo/api dev (port 3078)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Extraction criteria
|
||||||
|
|
||||||
|
Stays as a Next.js app in the monorepo permanently — it's not a candidate for extraction.
|
||||||
|
It gets richer (more pages, embedded MLflow/Grafana) but not split.
|
||||||
|
|
||||||
|
## Known issues
|
||||||
|
|
||||||
|
- `@tremor/react 3.x` declares a peer dep on React 18; the workspace uses React 19.
|
||||||
|
Works in practice. Will resolve naturally when Tremor ships React 19 support or when
|
||||||
|
we switch to Tremor v4 (which targets React 18+).
|
||||||
14
apps/admin/next.config.ts
Normal file
14
apps/admin/next.config.ts
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
import type { NextConfig } from 'next';
|
||||||
|
|
||||||
|
const nextConfig: NextConfig = {
|
||||||
|
async rewrites() {
|
||||||
|
return [
|
||||||
|
{
|
||||||
|
source: '/api/:path*',
|
||||||
|
destination: `${process.env.NEXT_PUBLIC_API_URL ?? 'http://localhost:3078'}/api/:path*`,
|
||||||
|
},
|
||||||
|
];
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
export default nextConfig;
|
||||||
32
apps/admin/package.json
Normal file
32
apps/admin/package.json
Normal file
@@ -0,0 +1,32 @@
|
|||||||
|
{
|
||||||
|
"name": "@oo/admin",
|
||||||
|
"version": "0.0.0",
|
||||||
|
"private": true,
|
||||||
|
"scripts": {
|
||||||
|
"dev": "next dev -p 3080",
|
||||||
|
"build": "next build",
|
||||||
|
"start": "next start -p 3080",
|
||||||
|
"lint": "next lint",
|
||||||
|
"type-check": "tsc --noEmit",
|
||||||
|
"clean": "rm -rf .next"
|
||||||
|
},
|
||||||
|
"dependencies": {
|
||||||
|
"@oo/shared-types": "workspace:*",
|
||||||
|
"@tremor/react": "^3.18.3",
|
||||||
|
"@tanstack/react-table": "^8.20.5",
|
||||||
|
"next": "^15.1.6",
|
||||||
|
"react": "^19.0.0",
|
||||||
|
"react-dom": "^19.0.0",
|
||||||
|
"recharts": "^2.15.3",
|
||||||
|
"marked": "^14.1.4"
|
||||||
|
},
|
||||||
|
"devDependencies": {
|
||||||
|
"@types/node": "^22.10.5",
|
||||||
|
"@types/react": "^19.0.0",
|
||||||
|
"@types/react-dom": "^19.0.0",
|
||||||
|
"tailwindcss": "^3.4.17",
|
||||||
|
"autoprefixer": "^10.4.20",
|
||||||
|
"postcss": "^8.5.1",
|
||||||
|
"typescript": "^5.7.3"
|
||||||
|
}
|
||||||
|
}
|
||||||
6
apps/admin/postcss.config.js
Normal file
6
apps/admin/postcss.config.js
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
module.exports = {
|
||||||
|
plugins: {
|
||||||
|
tailwindcss: {},
|
||||||
|
autoprefixer: {},
|
||||||
|
},
|
||||||
|
};
|
||||||
499
apps/admin/src/app/simulations/page.tsx
Normal file
499
apps/admin/src/app/simulations/page.tsx
Normal file
@@ -0,0 +1,499 @@
|
|||||||
|
'use client';
|
||||||
|
|
||||||
|
import { useEffect, useRef, useState } from 'react';
|
||||||
|
import { AdminShell } from '@/components/AdminShell';
|
||||||
|
import {
|
||||||
|
type PolicySummary,
|
||||||
|
type SimEvent,
|
||||||
|
type SimRun,
|
||||||
|
getSimRun,
|
||||||
|
getSimRuns,
|
||||||
|
startSimulation,
|
||||||
|
} from '@/lib/api';
|
||||||
|
|
||||||
|
const KNOWN_POLICIES = ['linucb-v1', 'egreedy-v1'];
|
||||||
|
const ACTIONS = ['done', 'snooze', 'dismiss'];
|
||||||
|
// Shown as reference only — actual reward is dwell-time inferred for 'done'
|
||||||
|
const ACTION_REWARDS: Record<string, number> = {
|
||||||
|
done: 1.0, snooze: 0.1, dismiss: -1.0,
|
||||||
|
};
|
||||||
|
|
||||||
|
// ── SVG reward curve ────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function RewardCurve({ summary, policies }: { summary: Record<string, PolicySummary>; policies: string[] }) {
|
||||||
|
const W = 520, H = 160, PAD = { t: 10, r: 10, b: 30, l: 40 };
|
||||||
|
const iW = W - PAD.l - PAD.r;
|
||||||
|
const iH = H - PAD.t - PAD.b;
|
||||||
|
|
||||||
|
const allVals = policies.flatMap((p) => summary[p]?.cumulative_rewards ?? []);
|
||||||
|
const minY = Math.min(0, ...allVals);
|
||||||
|
const maxY = Math.max(1, ...allVals);
|
||||||
|
const n = Math.max(...policies.map((p) => (summary[p]?.cumulative_rewards ?? []).length));
|
||||||
|
|
||||||
|
const xScale = (i: number) => PAD.l + (i / Math.max(1, n - 1)) * iW;
|
||||||
|
const yScale = (v: number) => PAD.t + iH - ((v - minY) / (maxY - minY)) * iH;
|
||||||
|
|
||||||
|
const COLORS = ['#818cf8', '#34d399', '#f87171', '#fbbf24'];
|
||||||
|
|
||||||
|
const path = (vals: number[]) =>
|
||||||
|
vals
|
||||||
|
.map((v, i) => `${i === 0 ? 'M' : 'L'}${xScale(i).toFixed(1)},${yScale(v).toFixed(1)}`)
|
||||||
|
.join(' ');
|
||||||
|
|
||||||
|
// Axis labels
|
||||||
|
const yLabels = [minY, (minY + maxY) / 2, maxY];
|
||||||
|
|
||||||
|
return (
|
||||||
|
<svg width={W} height={H} className="overflow-visible">
|
||||||
|
{/* Grid */}
|
||||||
|
{yLabels.map((v, i) => (
|
||||||
|
<g key={i}>
|
||||||
|
<line
|
||||||
|
x1={PAD.l} y1={yScale(v)} x2={W - PAD.r} y2={yScale(v)}
|
||||||
|
stroke="#374151" strokeWidth={0.5} strokeDasharray="3,3"
|
||||||
|
/>
|
||||||
|
<text x={PAD.l - 4} y={yScale(v) + 4} textAnchor="end" fontSize={10} fill="#9ca3af">
|
||||||
|
{v.toFixed(1)}
|
||||||
|
</text>
|
||||||
|
</g>
|
||||||
|
))}
|
||||||
|
{/* Zero line */}
|
||||||
|
{minY < 0 && (
|
||||||
|
<line x1={PAD.l} y1={yScale(0)} x2={W - PAD.r} y2={yScale(0)}
|
||||||
|
stroke="#6b7280" strokeWidth={1} />
|
||||||
|
)}
|
||||||
|
{/* Curves */}
|
||||||
|
{policies.map((p, pi) => {
|
||||||
|
const vals = summary[p]?.cumulative_rewards ?? [];
|
||||||
|
if (!vals.length) return null;
|
||||||
|
return (
|
||||||
|
<g key={p}>
|
||||||
|
<path d={path(vals)} fill="none" stroke={COLORS[pi % COLORS.length]} strokeWidth={2} />
|
||||||
|
<circle
|
||||||
|
cx={xScale(vals.length - 1)} cy={yScale(vals[vals.length - 1])}
|
||||||
|
r={3} fill={COLORS[pi % COLORS.length]}
|
||||||
|
/>
|
||||||
|
</g>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
{/* X axis */}
|
||||||
|
<line x1={PAD.l} y1={H - PAD.b} x2={W - PAD.r} y2={H - PAD.b} stroke="#4b5563" />
|
||||||
|
<text x={W / 2} y={H - 2} textAnchor="middle" fontSize={10} fill="#6b7280">Round</text>
|
||||||
|
{/* Legend */}
|
||||||
|
{policies.map((p, pi) => (
|
||||||
|
<g key={p} transform={`translate(${PAD.l + pi * 130},${H - PAD.b + 14})`}>
|
||||||
|
<rect width={12} height={3} y={3} fill={COLORS[pi % COLORS.length]} />
|
||||||
|
<text x={16} y={8} fontSize={10} fill="#d1d5db">{p}</text>
|
||||||
|
</g>
|
||||||
|
))}
|
||||||
|
</svg>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Action distribution table ───────────────────────────────────────────────
|
||||||
|
|
||||||
|
function ActionTable({
|
||||||
|
summary,
|
||||||
|
policies,
|
||||||
|
}: {
|
||||||
|
summary: Record<string, PolicySummary>;
|
||||||
|
policies: string[];
|
||||||
|
}) {
|
||||||
|
return (
|
||||||
|
<table className="text-sm w-full">
|
||||||
|
<thead>
|
||||||
|
<tr className="text-left text-gray-500 border-b border-gray-800">
|
||||||
|
<th className="py-1 pr-4 font-medium">Action</th>
|
||||||
|
{policies.map((p) => (
|
||||||
|
<th key={p} className="py-1 pr-4 font-medium">{p}</th>
|
||||||
|
))}
|
||||||
|
<th className="py-1 font-medium text-gray-400">Reward</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
{ACTIONS.map((action) => (
|
||||||
|
<tr key={action} className="border-b border-gray-900">
|
||||||
|
<td className="py-1.5 pr-4 text-gray-300">{action}</td>
|
||||||
|
{policies.map((p) => {
|
||||||
|
const n = summary[p]?.action_counts?.[action] ?? 0;
|
||||||
|
const total = Object.values(summary[p]?.action_counts ?? {}).reduce(
|
||||||
|
(a, b) => a + b, 0
|
||||||
|
);
|
||||||
|
const pct = total > 0 ? ((n / total) * 100).toFixed(1) : '—';
|
||||||
|
return (
|
||||||
|
<td key={p} className="py-1.5 pr-4 text-gray-200">
|
||||||
|
{n} <span className="text-gray-500 text-xs">({pct}%)</span>
|
||||||
|
</td>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
<td className={`py-1.5 text-xs font-mono ${ACTION_REWARDS[action] > 0 ? 'text-green-400' : ACTION_REWARDS[action] < 0 ? 'text-red-400' : 'text-gray-500'}`}>
|
||||||
|
{ACTION_REWARDS[action] >= 0 ? '+' : ''}{ACTION_REWARDS[action]}
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
))}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Per-persona breakdown ───────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function PersonaTable({
|
||||||
|
breakdown,
|
||||||
|
policies,
|
||||||
|
}: {
|
||||||
|
breakdown: Record<string, Record<string, { reward: number; n: number }>>;
|
||||||
|
policies: string[];
|
||||||
|
}) {
|
||||||
|
const personas = Object.keys(breakdown);
|
||||||
|
return (
|
||||||
|
<table className="text-sm w-full">
|
||||||
|
<thead>
|
||||||
|
<tr className="text-left text-gray-500 border-b border-gray-800">
|
||||||
|
<th className="py-1 pr-6 font-medium">Persona</th>
|
||||||
|
{policies.map((p) => (
|
||||||
|
<th key={p} className="py-1 pr-6 font-medium">{p}<br /><span className="font-normal text-xs">mean reward</span></th>
|
||||||
|
))}
|
||||||
|
<th className="py-1 font-medium">Winner</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
{personas.map((persona) => {
|
||||||
|
const pdata = breakdown[persona];
|
||||||
|
const best = policies.reduce((a, b) =>
|
||||||
|
(pdata[a]?.reward ?? -Infinity) >= (pdata[b]?.reward ?? -Infinity) ? a : b
|
||||||
|
);
|
||||||
|
return (
|
||||||
|
<tr key={persona} className="border-b border-gray-900">
|
||||||
|
<td className="py-1.5 pr-6 text-gray-300">{persona}</td>
|
||||||
|
{policies.map((p) => {
|
||||||
|
const d = pdata[p];
|
||||||
|
const mean = d && d.n > 0 ? (d.reward / d.n).toFixed(3) : '—';
|
||||||
|
return (
|
||||||
|
<td key={p} className={`py-1.5 pr-6 font-mono text-xs ${p === best ? 'text-green-400' : 'text-gray-400'}`}>
|
||||||
|
{mean}
|
||||||
|
</td>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
<td className="py-1.5 text-xs text-indigo-400">{best}</td>
|
||||||
|
</tr>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Run detail panel ────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function RunDetail({ runId, onClose }: { runId: string; onClose: () => void }) {
|
||||||
|
const [data, setData] = useState<{ run: SimRun; events: SimEvent[] } | null>(null);
|
||||||
|
const [error, setError] = useState('');
|
||||||
|
const pollRef = useRef<ReturnType<typeof setInterval> | null>(null);
|
||||||
|
|
||||||
|
const load = async () => {
|
||||||
|
try {
|
||||||
|
const d = await getSimRun(runId);
|
||||||
|
setData(d);
|
||||||
|
if (d.run.status !== 'running' && d.run.status !== 'pending') {
|
||||||
|
if (pollRef.current) clearInterval(pollRef.current);
|
||||||
|
}
|
||||||
|
} catch (e: unknown) {
|
||||||
|
setError(e instanceof Error ? e.message : 'Failed to load');
|
||||||
|
if (pollRef.current) clearInterval(pollRef.current);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
load();
|
||||||
|
pollRef.current = setInterval(load, 3000);
|
||||||
|
return () => { if (pollRef.current) clearInterval(pollRef.current); };
|
||||||
|
}, [runId]);
|
||||||
|
|
||||||
|
const run = data?.run;
|
||||||
|
const summary: Record<string, PolicySummary> | null = run?.summaryJson
|
||||||
|
? JSON.parse(run.summaryJson)
|
||||||
|
: null;
|
||||||
|
const breakdown: Record<string, Record<string, { reward: number; n: number }>> | null =
|
||||||
|
run?.personaBreakdownJson ? JSON.parse(run.personaBreakdownJson) : null;
|
||||||
|
const policies = run ? [run.policyA, run.policyB] : [];
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="fixed inset-0 bg-black/70 z-50 flex items-start justify-center pt-16 px-4 overflow-auto">
|
||||||
|
<div className="bg-gray-950 border border-gray-800 rounded-lg w-full max-w-3xl p-6 space-y-6">
|
||||||
|
<div className="flex items-center justify-between">
|
||||||
|
<div>
|
||||||
|
<h2 className="text-lg font-semibold">Simulation {runId}</h2>
|
||||||
|
{run && (
|
||||||
|
<p className="text-xs text-gray-500 mt-0.5">
|
||||||
|
{run.nUsers} users × {run.nRounds} rounds × {run.tasksPerRound} tasks
|
||||||
|
{' · '}{run.useLlm ? 'LLM judge' : 'Rule judge'}
|
||||||
|
</p>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
<button onClick={onClose} className="text-gray-500 hover:text-white text-sm">✕ Close</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{error && <p className="text-red-400 text-sm">{error}</p>}
|
||||||
|
|
||||||
|
{run && (
|
||||||
|
<div className="flex items-center gap-3">
|
||||||
|
<StatusBadge status={run.status} />
|
||||||
|
{run.winner && run.status === 'done' && (
|
||||||
|
<span className="px-2 py-0.5 bg-indigo-900/60 border border-indigo-700 rounded text-indigo-300 text-xs font-medium">
|
||||||
|
Winner: {run.winner}
|
||||||
|
</span>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{summary && (
|
||||||
|
<>
|
||||||
|
{/* Metric cards */}
|
||||||
|
<div className="grid grid-cols-2 gap-3">
|
||||||
|
{policies.map((p) => (
|
||||||
|
<div key={p} className="bg-gray-900 border border-gray-800 rounded p-4 space-y-2">
|
||||||
|
<div className="text-xs font-medium text-gray-400 truncate">{p}</div>
|
||||||
|
<div className="flex gap-4">
|
||||||
|
<Metric label="Total reward" value={summary[p]?.total_reward.toFixed(2)} />
|
||||||
|
<Metric label="Mean/pull" value={summary[p]?.mean_reward.toFixed(3)} />
|
||||||
|
<Metric label="Pulls" value={String(summary[p]?.n_pulls)} />
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Cumulative reward chart */}
|
||||||
|
<div className="space-y-2">
|
||||||
|
<h3 className="text-sm font-medium text-gray-400">Cumulative reward over rounds</h3>
|
||||||
|
<div className="bg-gray-900 border border-gray-800 rounded p-3">
|
||||||
|
<RewardCurve summary={summary} policies={policies} />
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Action distribution */}
|
||||||
|
<div className="space-y-2">
|
||||||
|
<h3 className="text-sm font-medium text-gray-400">Action distribution</h3>
|
||||||
|
<ActionTable summary={summary} policies={policies} />
|
||||||
|
</div>
|
||||||
|
</>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{breakdown && (
|
||||||
|
<div className="space-y-2">
|
||||||
|
<h3 className="text-sm font-medium text-gray-400">Per-persona mean reward</h3>
|
||||||
|
<PersonaTable breakdown={breakdown} policies={policies} />
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{run?.status === 'running' && (
|
||||||
|
<p className="text-yellow-400 text-xs animate-pulse">Simulation running — auto-refreshing every 3s…</p>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Status badge ────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function StatusBadge({ status }: { status: string }) {
|
||||||
|
const styles: Record<string, string> = {
|
||||||
|
pending: 'bg-gray-800 text-gray-400',
|
||||||
|
running: 'bg-yellow-900/60 text-yellow-300 border border-yellow-700',
|
||||||
|
done: 'bg-green-900/60 text-green-300 border border-green-700',
|
||||||
|
failed: 'bg-red-900/60 text-red-300 border border-red-700',
|
||||||
|
};
|
||||||
|
return (
|
||||||
|
<span className={`px-2 py-0.5 rounded text-xs font-medium ${styles[status] ?? styles.pending}`}>
|
||||||
|
{status}
|
||||||
|
</span>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
function Metric({ label, value }: { label: string; value: string | undefined }) {
|
||||||
|
return (
|
||||||
|
<div>
|
||||||
|
<div className="text-[10px] text-gray-500 mb-0.5">{label}</div>
|
||||||
|
<div className="text-sm font-mono text-white">{value ?? '—'}</div>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Main page ───────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
export default function SimulationsPage() {
|
||||||
|
const [runs, setRuns] = useState<SimRun[]>([]);
|
||||||
|
const [loading, setLoading] = useState(false);
|
||||||
|
const [error, setError] = useState('');
|
||||||
|
const [selectedId, setSelectedId] = useState<string | null>(null);
|
||||||
|
|
||||||
|
// Form state
|
||||||
|
const [nUsers, setNUsers] = useState(5);
|
||||||
|
const [nRounds, setNRounds] = useState(20);
|
||||||
|
const [tasksPerRound, setTasksPerRound] = useState(8);
|
||||||
|
const [useLlm, setUseLlm] = useState(false);
|
||||||
|
const [policyA, setPolicyA] = useState('linucb-v1');
|
||||||
|
const [policyB, setPolicyB] = useState('egreedy-v1');
|
||||||
|
const [launching, setLaunching] = useState(false);
|
||||||
|
const [launchError, setLaunchError] = useState('');
|
||||||
|
|
||||||
|
const loadRuns = async () => {
|
||||||
|
setLoading(true);
|
||||||
|
try {
|
||||||
|
const { runs: r } = await getSimRuns();
|
||||||
|
setRuns(r);
|
||||||
|
} catch (e: unknown) {
|
||||||
|
setError(e instanceof Error ? e.message : 'Failed to load');
|
||||||
|
} finally {
|
||||||
|
setLoading(false);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
useEffect(() => { loadRuns(); }, []);
|
||||||
|
|
||||||
|
const handleStart = async () => {
|
||||||
|
if (policyA === policyB) {
|
||||||
|
setLaunchError('Policies must be different');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
setLaunching(true);
|
||||||
|
setLaunchError('');
|
||||||
|
try {
|
||||||
|
const { id } = await startSimulation({
|
||||||
|
nUsers,
|
||||||
|
nRounds,
|
||||||
|
tasksPerRound,
|
||||||
|
useLlm,
|
||||||
|
policies: [policyA, policyB],
|
||||||
|
});
|
||||||
|
await loadRuns();
|
||||||
|
setSelectedId(id);
|
||||||
|
} catch (e: unknown) {
|
||||||
|
setLaunchError(e instanceof Error ? e.message : 'Failed to start');
|
||||||
|
} finally {
|
||||||
|
setLaunching(false);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
return (
|
||||||
|
<AdminShell>
|
||||||
|
<div className="space-y-6 max-w-4xl">
|
||||||
|
<div>
|
||||||
|
<h1 className="text-xl font-semibold">Simulations</h1>
|
||||||
|
<p className="text-sm text-gray-500 mt-1">
|
||||||
|
Compare recommendation policies offline using synthetic users and LLM-judged reactions.
|
||||||
|
ml/serving must be running.
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Launch form */}
|
||||||
|
<div className="bg-gray-900 border border-gray-800 rounded-lg p-5 space-y-4">
|
||||||
|
<h2 className="text-sm font-semibold text-gray-300">New simulation</h2>
|
||||||
|
<div className="grid grid-cols-2 gap-4 sm:grid-cols-3">
|
||||||
|
<Field label="Policy A">
|
||||||
|
<select value={policyA} onChange={(e) => setPolicyA(e.target.value)} className={selectCls}>
|
||||||
|
{KNOWN_POLICIES.map((p) => <option key={p}>{p}</option>)}
|
||||||
|
</select>
|
||||||
|
</Field>
|
||||||
|
<Field label="Policy B">
|
||||||
|
<select value={policyB} onChange={(e) => setPolicyB(e.target.value)} className={selectCls}>
|
||||||
|
{KNOWN_POLICIES.map((p) => <option key={p}>{p}</option>)}
|
||||||
|
</select>
|
||||||
|
</Field>
|
||||||
|
<Field label="Users">
|
||||||
|
<input type="number" min={1} max={20} value={nUsers}
|
||||||
|
onChange={(e) => setNUsers(Number(e.target.value))} className={inputCls} />
|
||||||
|
</Field>
|
||||||
|
<Field label="Rounds">
|
||||||
|
<input type="number" min={5} max={100} value={nRounds}
|
||||||
|
onChange={(e) => setNRounds(Number(e.target.value))} className={inputCls} />
|
||||||
|
</Field>
|
||||||
|
<Field label="Tasks/round">
|
||||||
|
<input type="number" min={3} max={20} value={tasksPerRound}
|
||||||
|
onChange={(e) => setTasksPerRound(Number(e.target.value))} className={inputCls} />
|
||||||
|
</Field>
|
||||||
|
<Field label="Judge">
|
||||||
|
<label className="flex items-center gap-2 cursor-pointer mt-1">
|
||||||
|
<input type="checkbox" checked={useLlm} onChange={(e) => setUseLlm(e.target.checked)}
|
||||||
|
className="accent-indigo-500" />
|
||||||
|
<span className="text-sm text-gray-300">Claude Haiku</span>
|
||||||
|
</label>
|
||||||
|
{!useLlm && <p className="text-[10px] text-gray-500 mt-0.5">Deterministic rule judge</p>}
|
||||||
|
{useLlm && <p className="text-[10px] text-yellow-500 mt-0.5">Requires ANTHROPIC_API_KEY</p>}
|
||||||
|
</Field>
|
||||||
|
</div>
|
||||||
|
{launchError && <p className="text-red-400 text-xs">{launchError}</p>}
|
||||||
|
<button
|
||||||
|
onClick={handleStart}
|
||||||
|
disabled={launching}
|
||||||
|
className="bg-indigo-600 hover:bg-indigo-500 disabled:opacity-50 text-white rounded px-4 py-1.5 text-sm"
|
||||||
|
>
|
||||||
|
{launching ? 'Starting…' : 'Run simulation'}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Runs list */}
|
||||||
|
<div className="space-y-2">
|
||||||
|
<div className="flex items-center justify-between">
|
||||||
|
<h2 className="text-sm font-semibold text-gray-300">Past runs</h2>
|
||||||
|
<button onClick={loadRuns} className="text-xs text-gray-500 hover:text-white">Refresh</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{loading && <p className="text-gray-500 text-sm">Loading…</p>}
|
||||||
|
{error && <p className="text-red-400 text-sm">{error}</p>}
|
||||||
|
|
||||||
|
{runs.length === 0 && !loading && (
|
||||||
|
<p className="text-gray-600 text-sm">No simulation runs yet.</p>
|
||||||
|
)}
|
||||||
|
|
||||||
|
<div className="space-y-1">
|
||||||
|
{runs.map((run) => (
|
||||||
|
<button
|
||||||
|
key={run.id}
|
||||||
|
onClick={() => setSelectedId(run.id)}
|
||||||
|
className="w-full text-left bg-gray-900 hover:bg-gray-800 border border-gray-800 rounded px-4 py-3 flex items-center justify-between gap-4"
|
||||||
|
>
|
||||||
|
<div className="flex items-center gap-3 min-w-0">
|
||||||
|
<StatusBadge status={run.status} />
|
||||||
|
<span className="text-sm text-gray-300 font-mono truncate">{run.id}</span>
|
||||||
|
<span className="text-xs text-gray-500 hidden sm:inline">
|
||||||
|
{run.policyA} vs {run.policyB}
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
<div className="flex items-center gap-4 flex-shrink-0">
|
||||||
|
{run.winner && (
|
||||||
|
<span className="text-xs text-indigo-400">→ {run.winner}</span>
|
||||||
|
)}
|
||||||
|
<span className="text-xs text-gray-600">{run.nUsers}u × {run.nRounds}r</span>
|
||||||
|
<span className="text-xs text-gray-600">
|
||||||
|
{new Date(run.createdAt).toLocaleString()}
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
</button>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{selectedId && (
|
||||||
|
<RunDetail runId={selectedId} onClose={() => setSelectedId(null)} />
|
||||||
|
)}
|
||||||
|
</AdminShell>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Small helpers ───────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
const inputCls =
|
||||||
|
'w-full bg-gray-800 border border-gray-700 rounded px-2.5 py-1.5 text-sm text-gray-200 focus:outline-none focus:border-indigo-500';
|
||||||
|
const selectCls =
|
||||||
|
'w-full bg-gray-800 border border-gray-700 rounded px-2.5 py-1.5 text-sm text-gray-200 focus:outline-none focus:border-indigo-500';
|
||||||
|
|
||||||
|
function Field({ label, children }: { label: string; children: React.ReactNode }) {
|
||||||
|
return (
|
||||||
|
<div>
|
||||||
|
<label className="block text-xs text-gray-500 mb-1">{label}</label>
|
||||||
|
{children}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
@@ -11,6 +11,7 @@ const NAV = [
|
|||||||
{ href: '/tips', label: 'Rec log' },
|
{ href: '/tips', label: 'Rec log' },
|
||||||
{ href: '/reward-analytics', label: 'Rewards' },
|
{ href: '/reward-analytics', label: 'Rewards' },
|
||||||
{ href: '/experiments', label: 'Experiments' },
|
{ href: '/experiments', label: 'Experiments' },
|
||||||
|
{ href: '/simulations', label: 'Simulations' },
|
||||||
{ href: '/models', label: 'Models' },
|
{ href: '/models', label: 'Models' },
|
||||||
{ href: '/data-quality', label: 'Data quality' },
|
{ href: '/data-quality', label: 'Data quality' },
|
||||||
{ href: '/ops', label: 'Ops' },
|
{ href: '/ops', label: 'Ops' },
|
||||||
|
|||||||
@@ -220,3 +220,67 @@ export function saveQuery(name: string, querySql: string) {
|
|||||||
export function deleteSavedQuery(id: string) {
|
export function deleteSavedQuery(id: string) {
|
||||||
return apiFetch<{ ok: boolean }>(`/admin/saved-queries/${id}`, { method: 'DELETE' });
|
return apiFetch<{ ok: boolean }>(`/admin/saved-queries/${id}`, { method: 'DELETE' });
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ── Simulation ─────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
export interface PolicySummary {
|
||||||
|
total_reward: number;
|
||||||
|
mean_reward: number;
|
||||||
|
n_pulls: number;
|
||||||
|
cumulative_rewards: number[];
|
||||||
|
action_counts: Record<string, number>;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface SimRun {
|
||||||
|
id: string;
|
||||||
|
policyA: string;
|
||||||
|
policyB: string;
|
||||||
|
nUsers: number;
|
||||||
|
nRounds: number;
|
||||||
|
tasksPerRound: number;
|
||||||
|
useLlm: boolean;
|
||||||
|
status: 'pending' | 'running' | 'done' | 'failed';
|
||||||
|
summaryJson: string | null;
|
||||||
|
winner: string | null;
|
||||||
|
personaBreakdownJson: string | null;
|
||||||
|
createdAt: string;
|
||||||
|
finishedAt: string | null;
|
||||||
|
isRunning?: boolean;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface SimEvent {
|
||||||
|
id: string;
|
||||||
|
runId: string;
|
||||||
|
round: number;
|
||||||
|
userId: string;
|
||||||
|
persona: string;
|
||||||
|
policy: string;
|
||||||
|
tipContent: string;
|
||||||
|
priority: number;
|
||||||
|
isOverdue: boolean;
|
||||||
|
action: string;
|
||||||
|
rewardMilli: number;
|
||||||
|
hour: number;
|
||||||
|
dayOfWeek: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function startSimulation(params: {
|
||||||
|
nUsers: number;
|
||||||
|
nRounds: number;
|
||||||
|
tasksPerRound: number;
|
||||||
|
useLlm: boolean;
|
||||||
|
policies: string[];
|
||||||
|
}) {
|
||||||
|
return apiFetch<{ id: string; status: string }>('/admin/simulate/start', {
|
||||||
|
method: 'POST',
|
||||||
|
body: JSON.stringify(params),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
export function getSimRuns() {
|
||||||
|
return apiFetch<{ runs: SimRun[] }>('/admin/simulate/runs');
|
||||||
|
}
|
||||||
|
|
||||||
|
export function getSimRun(id: string) {
|
||||||
|
return apiFetch<{ run: SimRun; events: SimEvent[] }>(`/admin/simulate/${id}`);
|
||||||
|
}
|
||||||
|
|||||||
12
apps/admin/tailwind.config.ts
Normal file
12
apps/admin/tailwind.config.ts
Normal file
@@ -0,0 +1,12 @@
|
|||||||
|
import type { Config } from 'tailwindcss';
|
||||||
|
|
||||||
|
const config: Config = {
|
||||||
|
content: [
|
||||||
|
'./src/**/*.{ts,tsx}',
|
||||||
|
'./node_modules/@tremor/**/*.{js,jsx,ts,tsx}',
|
||||||
|
],
|
||||||
|
theme: { extend: {} },
|
||||||
|
plugins: [],
|
||||||
|
};
|
||||||
|
|
||||||
|
export default config;
|
||||||
23
apps/admin/tsconfig.json
Normal file
23
apps/admin/tsconfig.json
Normal file
@@ -0,0 +1,23 @@
|
|||||||
|
{
|
||||||
|
"compilerOptions": {
|
||||||
|
"target": "ES2017",
|
||||||
|
"lib": ["dom", "dom.iterable", "esnext"],
|
||||||
|
"allowJs": true,
|
||||||
|
"skipLibCheck": true,
|
||||||
|
"strict": true,
|
||||||
|
"noEmit": true,
|
||||||
|
"esModuleInterop": true,
|
||||||
|
"module": "esnext",
|
||||||
|
"moduleResolution": "bundler",
|
||||||
|
"resolveJsonModule": true,
|
||||||
|
"isolatedModules": true,
|
||||||
|
"jsx": "preserve",
|
||||||
|
"incremental": true,
|
||||||
|
"plugins": [{ "name": "next" }],
|
||||||
|
"paths": {
|
||||||
|
"@/*": ["./src/*"]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
|
||||||
|
"exclude": ["node_modules"]
|
||||||
|
}
|
||||||
1
apps/admin/tsconfig.tsbuildinfo
Normal file
1
apps/admin/tsconfig.tsbuildinfo
Normal file
File diff suppressed because one or more lines are too long
11
apps/web/e2e/sign-in.spec.ts
Normal file
11
apps/web/e2e/sign-in.spec.ts
Normal file
@@ -0,0 +1,11 @@
|
|||||||
|
import { test, expect } from '@playwright/test';
|
||||||
|
|
||||||
|
test('sign-in page loads and shows Google button', async ({ page }) => {
|
||||||
|
await page.goto('/sign-in');
|
||||||
|
await expect(page.getByRole('link', { name: /google/i })).toBeVisible();
|
||||||
|
});
|
||||||
|
|
||||||
|
test('unauthenticated root redirects to sign-in', async ({ page }) => {
|
||||||
|
await page.goto('/');
|
||||||
|
await expect(page).toHaveURL(/sign-in/);
|
||||||
|
});
|
||||||
@@ -7,6 +7,10 @@
|
|||||||
"build": "next build",
|
"build": "next build",
|
||||||
"start": "next start -p 3079",
|
"start": "next start -p 3079",
|
||||||
"lint": "next lint",
|
"lint": "next lint",
|
||||||
|
"test": "vitest run",
|
||||||
|
"test:watch": "vitest",
|
||||||
|
"test:e2e": "playwright test",
|
||||||
|
"test:e2e:ui": "playwright test --ui",
|
||||||
"type-check": "tsc --noEmit",
|
"type-check": "tsc --noEmit",
|
||||||
"clean": "rm -rf .next"
|
"clean": "rm -rf .next"
|
||||||
},
|
},
|
||||||
@@ -17,9 +21,17 @@
|
|||||||
"react-dom": "^19.0.0"
|
"react-dom": "^19.0.0"
|
||||||
},
|
},
|
||||||
"devDependencies": {
|
"devDependencies": {
|
||||||
|
"@playwright/test": "^1.59.1",
|
||||||
|
"@testing-library/jest-dom": "^6.9.1",
|
||||||
|
"@testing-library/react": "^16.3.2",
|
||||||
|
"@testing-library/user-event": "^14.6.1",
|
||||||
|
"@types/node": "^22.10.5",
|
||||||
"@types/react": "^19.0.0",
|
"@types/react": "^19.0.0",
|
||||||
"@types/react-dom": "^19.0.0",
|
"@types/react-dom": "^19.0.0",
|
||||||
"@types/node": "^22.10.5",
|
"@vitejs/plugin-react": "^6.0.1",
|
||||||
"typescript": "^5.7.3"
|
"@vitest/coverage-v8": "^4.1.4",
|
||||||
|
"jsdom": "^29.0.2",
|
||||||
|
"typescript": "^5.7.3",
|
||||||
|
"vitest": "^4.1.4"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
24
apps/web/playwright.config.ts
Normal file
24
apps/web/playwright.config.ts
Normal file
@@ -0,0 +1,24 @@
|
|||||||
|
import { defineConfig, devices } from '@playwright/test';
|
||||||
|
|
||||||
|
export default defineConfig({
|
||||||
|
testDir: './e2e',
|
||||||
|
fullyParallel: true,
|
||||||
|
forbidOnly: !!process.env.CI,
|
||||||
|
retries: process.env.CI ? 2 : 0,
|
||||||
|
reporter: 'html',
|
||||||
|
use: {
|
||||||
|
baseURL: process.env.BASE_URL ?? 'http://localhost:3079',
|
||||||
|
trace: 'on-first-retry',
|
||||||
|
},
|
||||||
|
projects: [
|
||||||
|
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } },
|
||||||
|
],
|
||||||
|
// Start dev server automatically in CI; locally, run `pnpm dev` first
|
||||||
|
webServer: process.env.CI
|
||||||
|
? {
|
||||||
|
command: 'pnpm build && pnpm start',
|
||||||
|
url: 'http://localhost:3079',
|
||||||
|
reuseExistingServer: false,
|
||||||
|
}
|
||||||
|
: undefined,
|
||||||
|
});
|
||||||
131
apps/web/src/components/__tests__/TipPage.test.tsx
Normal file
131
apps/web/src/components/__tests__/TipPage.test.tsx
Normal file
@@ -0,0 +1,131 @@
|
|||||||
|
import { describe, it, expect, vi, beforeEach } from 'vitest';
|
||||||
|
import { render, screen, waitFor, act, fireEvent } from '@testing-library/react';
|
||||||
|
import userEvent from '@testing-library/user-event';
|
||||||
|
|
||||||
|
// Mock the API module — we test UI behaviour, not network calls
|
||||||
|
vi.mock('@/lib/api', () => ({
|
||||||
|
getRecommendation: vi.fn(),
|
||||||
|
sendFeedback: vi.fn().mockResolvedValue(undefined),
|
||||||
|
getVapidPublicKey: vi.fn(),
|
||||||
|
subscribePush: vi.fn(),
|
||||||
|
}));
|
||||||
|
|
||||||
|
import { getRecommendation, sendFeedback } from '@/lib/api';
|
||||||
|
import TipPage from '@/app/tip/page';
|
||||||
|
|
||||||
|
const mockGetRec = getRecommendation as ReturnType<typeof vi.fn>;
|
||||||
|
const mockSendFeedback = sendFeedback as ReturnType<typeof vi.fn>;
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
vi.clearAllMocks();
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('TipPage — empty / error states', () => {
|
||||||
|
it('shows "All clear." when no tip is returned', async () => {
|
||||||
|
mockGetRec.mockResolvedValue(null);
|
||||||
|
render(<TipPage />);
|
||||||
|
await waitFor(() => expect(screen.getByText('All clear.')).toBeInTheDocument());
|
||||||
|
});
|
||||||
|
|
||||||
|
it('shows "All clear." when getRecommendation throws', async () => {
|
||||||
|
mockGetRec.mockRejectedValue(Object.assign(new Error('Network error'), { status: 503 }));
|
||||||
|
render(<TipPage />);
|
||||||
|
await waitFor(() => expect(screen.getByText('All clear.')).toBeInTheDocument());
|
||||||
|
});
|
||||||
|
|
||||||
|
it('"Check again" button re-calls getRecommendation', async () => {
|
||||||
|
mockGetRec.mockResolvedValue(null);
|
||||||
|
render(<TipPage />);
|
||||||
|
await waitFor(() => screen.getByText('Check again'));
|
||||||
|
|
||||||
|
mockGetRec.mockResolvedValue({
|
||||||
|
tip: { id: 'todoist:2', content: 'New tip', source: 'todoist', createdAt: '' },
|
||||||
|
});
|
||||||
|
fireEvent.click(screen.getByText('Check again'));
|
||||||
|
await waitFor(() => expect(mockGetRec).toHaveBeenCalledTimes(2));
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('TipPage — tip display', () => {
|
||||||
|
it('renders tip content after loading', async () => {
|
||||||
|
mockGetRec.mockResolvedValue({
|
||||||
|
tip: { id: 'todoist:1', content: 'Write the test', source: 'todoist', createdAt: '' },
|
||||||
|
});
|
||||||
|
render(<TipPage />);
|
||||||
|
await waitFor(() => expect(screen.getByText('Write the test')).toBeInTheDocument());
|
||||||
|
});
|
||||||
|
|
||||||
|
it('shows "hold to act" hint when tip is displayed', async () => {
|
||||||
|
mockGetRec.mockResolvedValue({
|
||||||
|
tip: { id: 'todoist:3', content: 'Do the thing', source: 'todoist', createdAt: '' },
|
||||||
|
});
|
||||||
|
render(<TipPage />);
|
||||||
|
await waitFor(() => expect(screen.getByText(/hold to act/i)).toBeInTheDocument());
|
||||||
|
});
|
||||||
|
|
||||||
|
it('shows "reading you…" while loading', async () => {
|
||||||
|
// Never resolves during this assertion
|
||||||
|
mockGetRec.mockReturnValue(new Promise(() => {}));
|
||||||
|
render(<TipPage />);
|
||||||
|
expect(screen.getByText(/reading you/i)).toBeInTheDocument();
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('TipPage — action sheet', () => {
|
||||||
|
// Render with real timers, THEN switch to fake for hold simulation
|
||||||
|
async function renderTipAndHold(id: string, content: string) {
|
||||||
|
mockGetRec.mockResolvedValue({ tip: { id, content, source: 'todoist', createdAt: '' } });
|
||||||
|
render(<TipPage />);
|
||||||
|
// Wait for tip to appear (real timers — no deadlock)
|
||||||
|
await screen.findByText(content);
|
||||||
|
const main = screen.getByRole('main');
|
||||||
|
|
||||||
|
// Switch to fake timers now that the component is fully loaded
|
||||||
|
vi.useFakeTimers();
|
||||||
|
act(() => { main.dispatchEvent(new PointerEvent('pointerdown', { bubbles: true })); });
|
||||||
|
act(() => { vi.advanceTimersByTime(650); });
|
||||||
|
vi.useRealTimers();
|
||||||
|
|
||||||
|
// Wait for action sheet
|
||||||
|
await screen.findByText('Done ✓');
|
||||||
|
return main;
|
||||||
|
}
|
||||||
|
|
||||||
|
it('action sheet appears after a long press (600 ms)', async () => {
|
||||||
|
await renderTipAndHold('tip:lp', 'Hold me');
|
||||||
|
expect(screen.getByText('Done ✓')).toBeInTheDocument();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('action sheet does not appear on short press (<600 ms)', async () => {
|
||||||
|
mockGetRec.mockResolvedValue({ tip: { id: 'tip:sp', content: 'Short press', source: 'todoist', createdAt: '' } });
|
||||||
|
render(<TipPage />);
|
||||||
|
await screen.findByText('Short press');
|
||||||
|
const main = screen.getByRole('main');
|
||||||
|
|
||||||
|
vi.useFakeTimers();
|
||||||
|
act(() => { main.dispatchEvent(new PointerEvent('pointerdown', { bubbles: true })); });
|
||||||
|
act(() => { vi.advanceTimersByTime(200); });
|
||||||
|
act(() => { main.dispatchEvent(new PointerEvent('pointerup', { bubbles: true })); });
|
||||||
|
vi.useRealTimers();
|
||||||
|
|
||||||
|
expect(screen.queryByText('Done ✓')).not.toBeInTheDocument();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('clicking "Done ✓" calls sendFeedback with action=done', async () => {
|
||||||
|
await renderTipAndHold('tip:d', 'Do it');
|
||||||
|
await act(async () => { fireEvent.click(screen.getByText('Done ✓')); });
|
||||||
|
expect(mockSendFeedback).toHaveBeenCalledWith('tip:d', { action: 'done' });
|
||||||
|
});
|
||||||
|
|
||||||
|
it('clicking "Dismiss" calls sendFeedback with action=dismiss', async () => {
|
||||||
|
await renderTipAndHold('tip:dis', 'Dismiss me');
|
||||||
|
await act(async () => { fireEvent.click(screen.getByText('Dismiss')); });
|
||||||
|
expect(mockSendFeedback).toHaveBeenCalledWith('tip:dis', { action: 'dismiss' });
|
||||||
|
});
|
||||||
|
|
||||||
|
it('clicking "Helpful" calls sendFeedback with action=helpful (non-navigating)', async () => {
|
||||||
|
await renderTipAndHold('tip:help', 'Helpful tip');
|
||||||
|
await act(async () => { fireEvent.click(screen.getByText('Helpful')); });
|
||||||
|
expect(mockSendFeedback).toHaveBeenCalledWith('tip:help', { action: 'helpful' });
|
||||||
|
});
|
||||||
|
});
|
||||||
1
apps/web/src/test/setup.ts
Normal file
1
apps/web/src/test/setup.ts
Normal file
@@ -0,0 +1 @@
|
|||||||
|
import '@testing-library/jest-dom';
|
||||||
23
apps/web/vitest.config.ts
Normal file
23
apps/web/vitest.config.ts
Normal file
@@ -0,0 +1,23 @@
|
|||||||
|
import { defineConfig } from 'vitest/config';
|
||||||
|
import react from '@vitejs/plugin-react';
|
||||||
|
import { resolve } from 'path';
|
||||||
|
|
||||||
|
export default defineConfig({
|
||||||
|
plugins: [react()],
|
||||||
|
test: {
|
||||||
|
globals: true,
|
||||||
|
environment: 'jsdom',
|
||||||
|
setupFiles: ['./src/test/setup.ts'],
|
||||||
|
exclude: ['e2e/**', 'node_modules/**'],
|
||||||
|
coverage: {
|
||||||
|
provider: 'v8',
|
||||||
|
reporter: ['text', 'lcov'],
|
||||||
|
include: ['src/**'],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
resolve: {
|
||||||
|
alias: {
|
||||||
|
'@': resolve(__dirname, 'src'),
|
||||||
|
},
|
||||||
|
},
|
||||||
|
});
|
||||||
59
docs/adr/0006-admin-console-framework.md
Normal file
59
docs/adr/0006-admin-console-framework.md
Normal file
@@ -0,0 +1,59 @@
|
|||||||
|
# ADR-0006: Admin console framework — Next.js 15 + Tremor + shadcn/ui + embed specialist tools
|
||||||
|
|
||||||
|
## Status
|
||||||
|
Accepted — 2026-04-15
|
||||||
|
|
||||||
|
## Context
|
||||||
|
M1 ships a bandit-driven recommender, an event bus, and a live feedback loop. Without a cockpit to observe these systems, every model change ships blind. An admin console is needed to:
|
||||||
|
|
||||||
|
1. **Observe** — DAU/WAU, tip outcomes, reaction rates, LinUCB arm stats, feature distributions
|
||||||
|
2. **Inspect** — per-user identity, consents, integrations, reward history
|
||||||
|
3. **Act** — revoke tokens, replay signals, reset a per-user bandit, promote a policy
|
||||||
|
4. **Audit** — every operator action is logged
|
||||||
|
|
||||||
|
The team is two people. The stack is TypeScript/React/Tailwind. Any framework that forks the stack creates a context-switch tax and a second deployment surface.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
### App shell — `apps/admin`, Next.js 15, App Router
|
||||||
|
|
||||||
|
Same stack as `apps/web`. Reuses `packages/shared-types`, the Auth.js session cookie, and the API rewrite convention. Deployed at `admin.o.alogins.net` behind Caddy, port 3080 in dev.
|
||||||
|
|
||||||
|
### UI libraries
|
||||||
|
|
||||||
|
| Layer | Library | Reason |
|
||||||
|
|-------|---------|--------|
|
||||||
|
| Charts / KPI | **Tremor** | Analytics-first React + Tailwind components (KPI cards, time-series, bar lists). Designed for dashboards, not bolted on. |
|
||||||
|
| CRUD primitives | **shadcn/ui** | Copy-paste Radix components; forms, dialogs, command palette. No version lock-in — code lives in-repo. |
|
||||||
|
| Heavy grids | **TanStack Table v8** | Sortable / paginated / virtualized tables for events, users, tips. |
|
||||||
|
| Extra charts | **Recharts** | Fallback where Tremor falls short (histograms, distributions). |
|
||||||
|
|
||||||
|
### Embed, don't rebuild
|
||||||
|
|
||||||
|
Specialized tooling is **reverse-proxied into the admin shell**, not reimplemented:
|
||||||
|
|
||||||
|
- **MLflow UI** → `/admin/models` (Caddy sub-path proxy)
|
||||||
|
- **Grafana panels** → `/admin/infra` (iframed or embedded panels)
|
||||||
|
- **Marimo notebooks** → launch-out link from admin
|
||||||
|
|
||||||
|
This prevents reimplementing artifact browsers or graph renderers we'd never do as well.
|
||||||
|
|
||||||
|
### AuthZ
|
||||||
|
|
||||||
|
`profile.role` column on the `users` table (values: `'user'` | `'admin'`). First admin seeded via `ADMIN_SEED_EMAIL` env var at startup. Admin-only gate in Next.js middleware checks the session and the role returned by `GET /api/user/me`. Every write action through the admin API is appended to an `admin_actions` audit log.
|
||||||
|
|
||||||
|
### Rejected alternatives
|
||||||
|
|
||||||
|
| Option | Rejected because |
|
||||||
|
|--------|-----------------|
|
||||||
|
| Retool / AppSmith | Admin logic leaves the repo; weak analytics affordances |
|
||||||
|
| Streamlit / Gradio | Python-first; splits the frontend stack; thin RBAC |
|
||||||
|
| React-admin / Refine.dev | Strong CRUD scaffolding, analytics views feel bolted on |
|
||||||
|
| Superset / Metabase as the admin surface | Excellent BI, poor operational writes; plan: adopt Superset in M4 for BI alongside batch pipelines |
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
- One more Next.js app in the monorepo. Build/dev added to Turborepo.
|
||||||
|
- Tremor + shadcn/ui are added as dependencies. shadcn components are copied into `apps/admin/src/components/ui/` — no runtime version coupling.
|
||||||
|
- MLflow and Grafana must be reachable from the Caddy reverse proxy; they are not embedded in the JS bundle.
|
||||||
|
- `admin_actions` audit log grows unboundedly — needs a retention policy before M4.
|
||||||
47
docs/adr/0007-egreedy-v1-active-policy.md
Normal file
47
docs/adr/0007-egreedy-v1-active-policy.md
Normal file
@@ -0,0 +1,47 @@
|
|||||||
|
# ADR-0007: ε-greedy v1 as the active recommendation policy
|
||||||
|
|
||||||
|
## Status
|
||||||
|
Accepted — 2026-04-16
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
M1 shipped LinUCB (d=5, α=1.0) as the first learned policy via `ml/serving /score`. After the M1 admin console landed, we ran an offline simulation to compare LinUCB against a new ε-greedy ridge-regression policy before deciding which to keep live.
|
||||||
|
|
||||||
|
**ε-greedy v1 design:**
|
||||||
|
- Ridge regression estimator, θ updated online (equivalent to LinUCB without the UCB bonus).
|
||||||
|
- d=7 feature vector: base 5 (is\_overdue, task\_age\_days, priority, hour\_of\_day, bias) + sin/cos encoding of day\_of\_week.
|
||||||
|
- ε=0.10 random exploration; 90% argmax(θ·x).
|
||||||
|
- Separate per-user state files (`{user}_egreedy.json`), independent of LinUCB state.
|
||||||
|
|
||||||
|
**Simulation setup (rule judge, seed=42):**
|
||||||
|
- 5 synthetic personas × 20 rounds × 8 tasks/round = 100 judgments per policy.
|
||||||
|
- Reward inferred from dwell-time (same `inferReward` logic as production): dismiss=−1, snooze=+0.1, done<15 s=−0.3, done 15 s–2 min=+1.0, done 2–10 min=+0.6, done>10 min=+0.3.
|
||||||
|
- Both policies started from blank state (no warm-up).
|
||||||
|
|
||||||
|
**Results:**
|
||||||
|
|
||||||
|
| Policy | Total reward | Mean reward/pull | Pulls |
|
||||||
|
|--------|-------------|-----------------|-------|
|
||||||
|
| egreedy-v1 | −54.80 | −0.548 | 100 |
|
||||||
|
| linucb-v1 | −60.60 | −0.606 | 100 |
|
||||||
|
|
||||||
|
Winner: **egreedy-v1** (+10.7% mean reward).
|
||||||
|
|
||||||
|
Both policies produce negative mean rewards under the dwell-time model — expected: most simulated users don't act in the 15s–2min magic zone on cold models. The gap widens from round 8 onward, consistent with LinUCB's UCB exploration bonus over-favouring high-uncertainty dimensions (is\_overdue, task\_age\_days) regardless of persona fit.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Promote **egreedy-v1** to the active serving policy:
|
||||||
|
- `POST /recommend` calls `/score/egreedy` instead of `/score`.
|
||||||
|
- Feedback loop calls `/reward/egreedy`.
|
||||||
|
- LinUCB (`/score`, `/reward`) remains deployed in `ml/serving` as a shadow-eligible fallback.
|
||||||
|
|
||||||
|
The simulation does not replace online A/B testing; it is evidence that egreedy-v1 is worth promoting before collecting real-user signal. A future milestone will run live A/B once we have enough daily active users for statistical power.
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
- Recommendation calls and reward updates now hit the egreedy endpoints only.
|
||||||
|
- LinUCB state is preserved on disk; re-activation is a one-line change.
|
||||||
|
- `tip_scores.policy` will log `egreedy-v1` for new serves; historical rows remain `linucb-v1` or `random`.
|
||||||
|
- The dwell-time reward model (`inferReward`) is now the canonical feedback signal for both online updates and simulation. Explicit helpful/not\_helpful signals are removed.
|
||||||
|
- Next evaluation gate: once ≥500 real tips served with egreedy-v1, compare reward distribution to the LinUCB historical baseline in the admin Reward Analytics page before deciding on next policy iteration.
|
||||||
@@ -23,6 +23,22 @@ jobs:
|
|||||||
- run: pnpm build --filter=@oo/shared-types
|
- run: pnpm build --filter=@oo/shared-types
|
||||||
- run: pnpm type-check
|
- run: pnpm type-check
|
||||||
|
|
||||||
|
test:
|
||||||
|
name: Unit tests
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@v4
|
||||||
|
- uses: pnpm/action-setup@v4
|
||||||
|
with:
|
||||||
|
version: 10
|
||||||
|
- uses: actions/setup-node@v4
|
||||||
|
with:
|
||||||
|
node-version: 22
|
||||||
|
cache: pnpm
|
||||||
|
- run: pnpm install --frozen-lockfile
|
||||||
|
- run: pnpm build --filter=@oo/shared-types
|
||||||
|
- run: pnpm test
|
||||||
|
|
||||||
ml-lint:
|
ml-lint:
|
||||||
name: Python lint
|
name: Python lint
|
||||||
runs-on: ubuntu-latest
|
runs-on: ubuntu-latest
|
||||||
@@ -33,3 +49,15 @@ jobs:
|
|||||||
python-version: '3.12'
|
python-version: '3.12'
|
||||||
- run: pip install ruff
|
- run: pip install ruff
|
||||||
- run: ruff check ml/serving/
|
- run: ruff check ml/serving/
|
||||||
|
|
||||||
|
ml-test:
|
||||||
|
name: Python tests
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@v4
|
||||||
|
- uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.12'
|
||||||
|
- run: python -m venv ml/serving/.venv
|
||||||
|
- run: ml/serving/.venv/bin/pip install -r ml/serving/requirements-dev.txt
|
||||||
|
- run: ml/serving/.venv/bin/python -m pytest ml/serving/tests/ -v
|
||||||
|
|||||||
204
ml/experiments/sim/llm_judge.py
Normal file
204
ml/experiments/sim/llm_judge.py
Normal file
@@ -0,0 +1,204 @@
|
|||||||
|
"""
|
||||||
|
LLM-based user reaction judge.
|
||||||
|
|
||||||
|
Uses Claude Haiku when ANTHROPIC_API_KEY is set; falls back to a
|
||||||
|
deterministic persona-based rule when it is not.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import random
|
||||||
|
|
||||||
|
from personas import Persona
|
||||||
|
|
||||||
|
ACTIONS = ["done", "snooze", "dismiss"]
|
||||||
|
|
||||||
|
# Reward is NOT a fixed map anymore — it depends on action + simulated dwell time.
|
||||||
|
# Use infer_reward() to compute the final reward after simulating dwell.
|
||||||
|
_BASE_REWARDS: dict[str, float] = {
|
||||||
|
"done": 1.0, # placeholder; real reward computed from dwell
|
||||||
|
"snooze": 0.1,
|
||||||
|
"dismiss": -1.0,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def infer_reward(action: str, dwell_ms: int) -> float:
|
||||||
|
"""Mirror of production inferReward() in recommender.ts."""
|
||||||
|
if action == "dismiss":
|
||||||
|
return -1.0
|
||||||
|
if action == "snooze":
|
||||||
|
return 0.1
|
||||||
|
# done — dwell-based
|
||||||
|
if dwell_ms < 15_000:
|
||||||
|
return -0.3 # stale / reflex done
|
||||||
|
if dwell_ms < 120_000:
|
||||||
|
return 1.0 # magic zone
|
||||||
|
if dwell_ms < 600_000:
|
||||||
|
return 0.6 # good
|
||||||
|
return 0.3 # eventually done
|
||||||
|
|
||||||
|
_HOUR_PERIODS = {
|
||||||
|
(5, 10): "morning",
|
||||||
|
(10, 14): "midday",
|
||||||
|
(14, 18): "afternoon",
|
||||||
|
(18, 22): "evening",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _period(hour: int) -> str:
|
||||||
|
for (lo, hi), name in _HOUR_PERIODS.items():
|
||||||
|
if lo <= hour < hi:
|
||||||
|
return name
|
||||||
|
return "night"
|
||||||
|
|
||||||
|
|
||||||
|
# ── Deterministic judge ────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def _engagement_score(persona: Persona, tip: dict, hour: int) -> float:
|
||||||
|
"""0–1 score of how well this tip fits this persona right now."""
|
||||||
|
features = tip.get("features", {})
|
||||||
|
priority = features.get("priority", 1)
|
||||||
|
is_overdue = features.get("is_overdue", False)
|
||||||
|
|
||||||
|
p = 0.35
|
||||||
|
priority_norm = (priority - 1) / 3.0
|
||||||
|
p += (priority_norm - 0.5) * persona.prefers_high_priority * 0.4
|
||||||
|
if is_overdue:
|
||||||
|
p += (persona.prefers_overdue - 0.5) * 0.3
|
||||||
|
|
||||||
|
is_morning = 5 <= hour < 10
|
||||||
|
is_evening = 18 <= hour < 22
|
||||||
|
if persona.morning_active and is_morning:
|
||||||
|
p += 0.15
|
||||||
|
elif persona.evening_active and is_evening:
|
||||||
|
p += 0.15
|
||||||
|
elif persona.morning_active and not is_morning and not is_evening:
|
||||||
|
p -= 0.10
|
||||||
|
elif persona.evening_active and not is_evening and not is_morning:
|
||||||
|
p -= 0.10
|
||||||
|
|
||||||
|
return max(0.05, min(0.90, p))
|
||||||
|
|
||||||
|
|
||||||
|
def _simulate_dwell_ms(engagement: float, rng: random.Random) -> int:
|
||||||
|
"""
|
||||||
|
Simulate how many milliseconds the user takes to act on a tip.
|
||||||
|
|
||||||
|
High engagement → quick action (magic zone, 15s–2min).
|
||||||
|
Medium engagement → slower (2–10min).
|
||||||
|
Low engagement → very slow (>10min) — tip helped eventually but not 'magic'.
|
||||||
|
For snooze/dismiss the dwell doesn't affect reward; return a short value.
|
||||||
|
"""
|
||||||
|
if engagement >= 0.70:
|
||||||
|
# Strong match — magic zone: 15s–90s
|
||||||
|
return rng.randint(15_000, 90_000)
|
||||||
|
elif engagement >= 0.50:
|
||||||
|
# Moderate match — good zone: 2–8min
|
||||||
|
return rng.randint(120_000, 480_000)
|
||||||
|
else:
|
||||||
|
# Weak match but still done — eventually: 10–30min
|
||||||
|
return rng.randint(600_000, 1_800_000)
|
||||||
|
|
||||||
|
|
||||||
|
def _rule_judge(persona: Persona, tip: dict, hour: int, rng: random.Random) -> tuple[str, int]:
|
||||||
|
"""Return (action, dwell_ms) based on persona preferences and task features."""
|
||||||
|
engagement = _engagement_score(persona, tip, hour)
|
||||||
|
|
||||||
|
r = rng.random()
|
||||||
|
if r < engagement * 0.55:
|
||||||
|
# done — dwell depends on engagement
|
||||||
|
dwell = _simulate_dwell_ms(engagement, rng)
|
||||||
|
return "done", dwell
|
||||||
|
elif r < engagement:
|
||||||
|
return "snooze", rng.randint(3_000, 20_000)
|
||||||
|
else:
|
||||||
|
return "dismiss", rng.randint(1_000, 5_000)
|
||||||
|
|
||||||
|
|
||||||
|
# ── LLM judge ─────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
_anthropic_client = None
|
||||||
|
|
||||||
|
def _get_client():
|
||||||
|
global _anthropic_client
|
||||||
|
if _anthropic_client is None:
|
||||||
|
try:
|
||||||
|
import anthropic # type: ignore
|
||||||
|
key = os.environ.get("ANTHROPIC_API_KEY", "")
|
||||||
|
if key:
|
||||||
|
_anthropic_client = anthropic.Anthropic(api_key=key)
|
||||||
|
except ImportError:
|
||||||
|
pass
|
||||||
|
return _anthropic_client
|
||||||
|
|
||||||
|
|
||||||
|
def _llm_judge(
|
||||||
|
persona: Persona, tip: dict, hour: int, day_of_week: int, rng: random.Random,
|
||||||
|
) -> tuple[str, int]:
|
||||||
|
client = _get_client()
|
||||||
|
if client is None:
|
||||||
|
return _rule_judge(persona, tip, hour, rng)
|
||||||
|
|
||||||
|
features = tip.get("features", {})
|
||||||
|
priority = features.get("priority", 1)
|
||||||
|
is_overdue = features.get("is_overdue", False)
|
||||||
|
age_days = features.get("task_age_days", 0)
|
||||||
|
|
||||||
|
priority_label = {1: "low", 2: "normal", 3: "high", 4: "urgent"}.get(priority, "normal")
|
||||||
|
overdue_str = f", overdue by {age_days:.0f} day(s)" if is_overdue else ""
|
||||||
|
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
|
||||||
|
day_str = days[day_of_week % 7]
|
||||||
|
|
||||||
|
prompt = (
|
||||||
|
f"You are simulating how a specific user reacts to a task recommendation app.\n\n"
|
||||||
|
f"User persona: {persona.name}\n"
|
||||||
|
f"Persona: {persona.description}\n\n"
|
||||||
|
f'Recommended task: "{tip.get("content", "Unknown task")}"\n'
|
||||||
|
f"Task: priority={priority_label}{overdue_str}\n"
|
||||||
|
f"Current time: {_period(hour)} ({hour}:00, {day_str})\n\n"
|
||||||
|
f"How does this user react? Reply with exactly one word: done | snooze | dismiss\n\n"
|
||||||
|
f"- done: acts on this tip (marks task complete)\n"
|
||||||
|
f"- snooze: acknowledges but not now\n"
|
||||||
|
f"- dismiss: ignores or rejects it"
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
message = client.messages.create(
|
||||||
|
model="claude-haiku-4-5-20251001",
|
||||||
|
max_tokens=10,
|
||||||
|
messages=[{"role": "user", "content": prompt}],
|
||||||
|
)
|
||||||
|
raw = message.content[0].text.strip().lower().split()[0]
|
||||||
|
action = raw if raw in ACTIONS else _rule_judge(persona, tip, hour, rng)[0]
|
||||||
|
except Exception:
|
||||||
|
action, _ = _rule_judge(persona, tip, hour, rng)
|
||||||
|
|
||||||
|
# Simulate dwell based on engagement level
|
||||||
|
engagement = _engagement_score(persona, tip, hour)
|
||||||
|
dwell = _simulate_dwell_ms(engagement, rng) if action == "done" else rng.randint(2_000, 15_000)
|
||||||
|
return action, dwell
|
||||||
|
|
||||||
|
|
||||||
|
# ── Public API ─────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def judge(
|
||||||
|
persona: Persona,
|
||||||
|
tip: dict,
|
||||||
|
hour: int,
|
||||||
|
day_of_week: int,
|
||||||
|
rng: random.Random,
|
||||||
|
use_llm: bool = True,
|
||||||
|
) -> tuple[str, int, float]:
|
||||||
|
"""Return (action, dwell_ms, reward).
|
||||||
|
|
||||||
|
action — 'done' | 'snooze' | 'dismiss'
|
||||||
|
dwell_ms — simulated milliseconds between tip appearance and user action
|
||||||
|
reward — inferred from action + dwell_ms via infer_reward()
|
||||||
|
"""
|
||||||
|
if use_llm and os.environ.get("ANTHROPIC_API_KEY"):
|
||||||
|
action, dwell_ms = _llm_judge(persona, tip, hour, day_of_week, rng)
|
||||||
|
else:
|
||||||
|
action, dwell_ms = _rule_judge(persona, tip, hour, rng)
|
||||||
|
|
||||||
|
return action, dwell_ms, infer_reward(action, dwell_ms)
|
||||||
79
ml/experiments/sim/personas.py
Normal file
79
ml/experiments/sim/personas.py
Normal file
@@ -0,0 +1,79 @@
|
|||||||
|
"""Synthetic user personas for simulation."""
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Persona:
|
||||||
|
name: str
|
||||||
|
description: str
|
||||||
|
# Feature preference weights — used by deterministic judge
|
||||||
|
prefers_high_priority: float # 0–1: scales response to priority
|
||||||
|
prefers_overdue: float # 0–1: scales response to overdue tasks
|
||||||
|
morning_active: bool # higher engagement hours 6–10
|
||||||
|
evening_active: bool # higher engagement hours 18–22
|
||||||
|
recency_bias: float # 0–1: prefers recently-due tasks
|
||||||
|
|
||||||
|
|
||||||
|
PERSONAS: list[Persona] = [
|
||||||
|
Persona(
|
||||||
|
name="deadline-driven",
|
||||||
|
description=(
|
||||||
|
"Responds urgently to overdue and high-priority tasks. "
|
||||||
|
"Most active in the morning. Dismisses low-priority tips."
|
||||||
|
),
|
||||||
|
prefers_high_priority=0.9,
|
||||||
|
prefers_overdue=0.85,
|
||||||
|
morning_active=True,
|
||||||
|
evening_active=False,
|
||||||
|
recency_bias=0.3,
|
||||||
|
),
|
||||||
|
Persona(
|
||||||
|
name="evening-relaxed",
|
||||||
|
description=(
|
||||||
|
"Reviews tasks in the evenings. Neutral on priority. "
|
||||||
|
"Snoozes morning recommendations."
|
||||||
|
),
|
||||||
|
prefers_high_priority=0.5,
|
||||||
|
prefers_overdue=0.4,
|
||||||
|
morning_active=False,
|
||||||
|
evening_active=True,
|
||||||
|
recency_bias=0.5,
|
||||||
|
),
|
||||||
|
Persona(
|
||||||
|
name="low-priority-first",
|
||||||
|
description=(
|
||||||
|
"Clears small tasks first. Snoozes urgent items until deadline. "
|
||||||
|
"Morning person."
|
||||||
|
),
|
||||||
|
prefers_high_priority=0.2,
|
||||||
|
prefers_overdue=0.6,
|
||||||
|
morning_active=True,
|
||||||
|
evening_active=False,
|
||||||
|
recency_bias=0.7,
|
||||||
|
),
|
||||||
|
Persona(
|
||||||
|
name="consistent-responder",
|
||||||
|
description=(
|
||||||
|
"Engages consistently across hours and days. "
|
||||||
|
"Acts on helpful tips regardless of priority."
|
||||||
|
),
|
||||||
|
prefers_high_priority=0.6,
|
||||||
|
prefers_overdue=0.6,
|
||||||
|
morning_active=True,
|
||||||
|
evening_active=True,
|
||||||
|
recency_bias=0.5,
|
||||||
|
),
|
||||||
|
Persona(
|
||||||
|
name="overdue-ignorer",
|
||||||
|
description=(
|
||||||
|
"Avoids overdue tasks (stress avoidance). "
|
||||||
|
"Focuses on future-due, high-priority items. Evening person."
|
||||||
|
),
|
||||||
|
prefers_high_priority=0.8,
|
||||||
|
prefers_overdue=0.1,
|
||||||
|
morning_active=False,
|
||||||
|
evening_active=True,
|
||||||
|
recency_bias=0.2,
|
||||||
|
),
|
||||||
|
]
|
||||||
527
ml/experiments/sim/runner.py
Normal file
527
ml/experiments/sim/runner.py
Normal file
@@ -0,0 +1,527 @@
|
|||||||
|
"""
|
||||||
|
oO simulation runner — compares two recommendation policies.
|
||||||
|
|
||||||
|
Judge modes:
|
||||||
|
rule Deterministic persona-based rules (default, no external deps)
|
||||||
|
llm Claude Haiku via Anthropic API (requires ANTHROPIC_API_KEY)
|
||||||
|
claude-code Two-phase: Claude Code acts as the judge (you are the judge)
|
||||||
|
|
||||||
|
Usage — rule/llm (single pass):
|
||||||
|
python runner.py --n-users 5 --n-rounds 10 --no-llm
|
||||||
|
python runner.py --n-users 5 --n-rounds 10
|
||||||
|
|
||||||
|
Usage — claude-code judge (two phases):
|
||||||
|
# Phase 1: score candidates, write judgment requests
|
||||||
|
python runner.py --judge claude-code --phase score \\
|
||||||
|
--n-users 5 --n-rounds 10 --out /tmp/oo-cc-sim.json
|
||||||
|
|
||||||
|
# (Claude Code reads /tmp/oo-cc-sim-requests.json and writes /tmp/oo-cc-sim-responses.json)
|
||||||
|
|
||||||
|
# Phase 2: apply responses, run rewards, produce results
|
||||||
|
python runner.py --judge claude-code --phase reward --plan /tmp/oo-cc-sim-plan.json \\
|
||||||
|
--out /tmp/oo-cc-sim.json
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import random
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
import uuid
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent))
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
from llm_judge import ACTIONS, infer_reward, judge
|
||||||
|
from personas import PERSONAS, Persona
|
||||||
|
from task_generator import generate_task_pool
|
||||||
|
|
||||||
|
POLICY_SCORE_ENDPOINTS: dict[str, str] = {
|
||||||
|
"linucb-v1": "/score",
|
||||||
|
"egreedy-v1": "/score/egreedy",
|
||||||
|
}
|
||||||
|
POLICY_REWARD_ENDPOINTS: dict[str, str] = {
|
||||||
|
"linucb-v1": "/reward",
|
||||||
|
"egreedy-v1": "/reward/egreedy",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _call_score(
|
||||||
|
client: httpx.Client, ml_url: str, policy: str,
|
||||||
|
user_id: str, tasks: list[dict], hour: int, dow: int,
|
||||||
|
) -> dict | None:
|
||||||
|
endpoint = POLICY_SCORE_ENDPOINTS.get(policy, "/score")
|
||||||
|
body = {
|
||||||
|
"user_id": user_id,
|
||||||
|
"candidates": [
|
||||||
|
{
|
||||||
|
"id": t["id"], "content": t["content"], "source": t["source"],
|
||||||
|
"source_id": None,
|
||||||
|
"features": {
|
||||||
|
"hour_of_day": hour,
|
||||||
|
"is_overdue": t["features"]["is_overdue"],
|
||||||
|
"task_age_days": t["features"]["task_age_days"],
|
||||||
|
"priority": t["features"]["priority"],
|
||||||
|
},
|
||||||
|
}
|
||||||
|
for t in tasks
|
||||||
|
],
|
||||||
|
"context": {"hour_of_day": hour, "day_of_week": dow},
|
||||||
|
}
|
||||||
|
try:
|
||||||
|
r = client.post(f"{ml_url}{endpoint}", json=body, timeout=5.0)
|
||||||
|
r.raise_for_status()
|
||||||
|
return r.json()
|
||||||
|
except Exception as e:
|
||||||
|
print(f" [warn] score {policy}: {e}", file=sys.stderr)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _call_reward(
|
||||||
|
client: httpx.Client, ml_url: str, policy: str,
|
||||||
|
user_id: str, tip_id: str, reward: float, features: dict,
|
||||||
|
day_of_week: int = 0,
|
||||||
|
) -> None:
|
||||||
|
endpoint = POLICY_REWARD_ENDPOINTS.get(policy, "/reward")
|
||||||
|
try:
|
||||||
|
client.post(
|
||||||
|
f"{ml_url}{endpoint}",
|
||||||
|
json={"user_id": user_id, "tip_id": tip_id, "reward": reward,
|
||||||
|
"features": features, "day_of_week": day_of_week},
|
||||||
|
timeout=5.0,
|
||||||
|
)
|
||||||
|
except Exception as e:
|
||||||
|
print(f" [warn] reward {policy}: {e}", file=sys.stderr)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Standard single-pass runner (rule / llm modes) ─────────────────────────
|
||||||
|
|
||||||
|
def run_simulation(
|
||||||
|
n_users: int, n_rounds: int, tasks_per_round: int,
|
||||||
|
ml_url: str, policies: list[str], use_llm: bool, seed: int,
|
||||||
|
) -> dict:
|
||||||
|
rng = random.Random(seed)
|
||||||
|
run_id = str(uuid.uuid4())[:8]
|
||||||
|
started_at = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
||||||
|
|
||||||
|
user_personas = [
|
||||||
|
(f"sim-{run_id}-u{i}", PERSONAS[i % len(PERSONAS)])
|
||||||
|
for i in range(n_users)
|
||||||
|
]
|
||||||
|
|
||||||
|
acc: dict[str, dict] = {
|
||||||
|
p: {
|
||||||
|
"total_reward": 0.0, "n_pulls": 0,
|
||||||
|
"cumulative_rewards": [],
|
||||||
|
"action_counts": {a: 0 for a in ACTIONS},
|
||||||
|
}
|
||||||
|
for p in policies
|
||||||
|
}
|
||||||
|
events: list[dict] = []
|
||||||
|
|
||||||
|
with httpx.Client(trust_env=False) as client:
|
||||||
|
for rnd in range(n_rounds):
|
||||||
|
hour = rng.randint(6, 22)
|
||||||
|
dow = rng.randint(0, 6)
|
||||||
|
round_rewards = {p: 0.0 for p in policies}
|
||||||
|
|
||||||
|
for user_id, persona in user_personas:
|
||||||
|
seed_tasks = rnd * 997 + abs(hash(user_id)) % 997
|
||||||
|
tasks = generate_task_pool(n=tasks_per_round, seed=seed_tasks)
|
||||||
|
|
||||||
|
for policy in policies:
|
||||||
|
p_user = f"{user_id}-{policy}"
|
||||||
|
scored = _call_score(client, ml_url, policy, p_user, tasks, hour, dow)
|
||||||
|
if not scored:
|
||||||
|
continue
|
||||||
|
tip_id = scored.get("tip_id")
|
||||||
|
tip = next((t for t in tasks if t["id"] == tip_id), None)
|
||||||
|
if not tip:
|
||||||
|
continue
|
||||||
|
|
||||||
|
action, dwell_ms, reward = judge(persona, tip, hour, dow, rng, use_llm=use_llm)
|
||||||
|
_call_reward(client, ml_url, policy, p_user, tip_id, reward, {
|
||||||
|
"hour_of_day": hour,
|
||||||
|
"is_overdue": tip["features"]["is_overdue"],
|
||||||
|
"task_age_days": tip["features"]["task_age_days"],
|
||||||
|
"priority": tip["features"]["priority"],
|
||||||
|
}, day_of_week=dow)
|
||||||
|
|
||||||
|
acc[policy]["total_reward"] += reward
|
||||||
|
acc[policy]["n_pulls"] += 1
|
||||||
|
acc[policy]["action_counts"][action] += 1
|
||||||
|
round_rewards[policy] += reward
|
||||||
|
events.append({
|
||||||
|
"round": rnd, "user_id": user_id, "persona": persona.name,
|
||||||
|
"policy": policy, "tip_content": tip["content"],
|
||||||
|
"priority": tip["features"]["priority"],
|
||||||
|
"is_overdue": tip["features"]["is_overdue"],
|
||||||
|
"action": action, "dwell_ms": dwell_ms, "reward": reward,
|
||||||
|
"hour": hour, "day_of_week": dow,
|
||||||
|
})
|
||||||
|
|
||||||
|
for p in policies:
|
||||||
|
prev = acc[p]["cumulative_rewards"][-1] if acc[p]["cumulative_rewards"] else 0.0
|
||||||
|
acc[p]["cumulative_rewards"].append(prev + round_rewards[p])
|
||||||
|
|
||||||
|
mode = "llm" if use_llm else "rule"
|
||||||
|
print(f" Round {rnd+1:>3}/{n_rounds} [{mode}] " + " ".join(
|
||||||
|
f"{p}={acc[p]['cumulative_rewards'][-1]:+.2f}" for p in policies
|
||||||
|
))
|
||||||
|
|
||||||
|
return _build_result(run_id, started_at, policies, acc, events,
|
||||||
|
n_users, n_rounds, tasks_per_round, use_llm, seed)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Claude Code judge — phase 1: score ─────────────────────────────────────
|
||||||
|
|
||||||
|
def run_score_phase(
|
||||||
|
n_users: int, n_rounds: int, tasks_per_round: int,
|
||||||
|
ml_url: str, policies: list[str], seed: int, out_path: str,
|
||||||
|
) -> None:
|
||||||
|
"""Score all candidates and write judgment requests for Claude Code."""
|
||||||
|
rng = random.Random(seed)
|
||||||
|
run_id = str(uuid.uuid4())[:8]
|
||||||
|
started_at = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
||||||
|
|
||||||
|
user_personas = [
|
||||||
|
(f"sim-{run_id}-u{i}", PERSONAS[i % len(PERSONAS)])
|
||||||
|
for i in range(n_users)
|
||||||
|
]
|
||||||
|
|
||||||
|
plan_rounds: list[dict] = []
|
||||||
|
judgment_requests: list[dict] = []
|
||||||
|
|
||||||
|
print(f"[Phase 1] Scoring {n_rounds} rounds × {n_users} users × {len(policies)} policies…")
|
||||||
|
|
||||||
|
with httpx.Client(trust_env=False) as client:
|
||||||
|
for rnd in range(n_rounds):
|
||||||
|
hour = rng.randint(6, 22)
|
||||||
|
dow = rng.randint(0, 6)
|
||||||
|
round_sessions: list[dict] = []
|
||||||
|
|
||||||
|
for user_id, persona in user_personas:
|
||||||
|
seed_tasks = rnd * 997 + abs(hash(user_id)) % 997
|
||||||
|
tasks = generate_task_pool(n=tasks_per_round, seed=seed_tasks)
|
||||||
|
|
||||||
|
for policy in policies:
|
||||||
|
p_user = f"{user_id}-{policy}"
|
||||||
|
scored = _call_score(client, ml_url, policy, p_user, tasks, hour, dow)
|
||||||
|
if not scored:
|
||||||
|
continue
|
||||||
|
tip_id = scored.get("tip_id")
|
||||||
|
tip = next((t for t in tasks if t["id"] == tip_id), None)
|
||||||
|
if not tip:
|
||||||
|
continue
|
||||||
|
|
||||||
|
req_id = f"r{rnd}_{user_id.split('-')[-1]}_{policy}"
|
||||||
|
round_sessions.append({
|
||||||
|
"req_id": req_id,
|
||||||
|
"p_user": p_user,
|
||||||
|
"policy": policy,
|
||||||
|
"user_id": user_id,
|
||||||
|
"persona_name": persona.name,
|
||||||
|
"tip_id": tip_id,
|
||||||
|
"tip_features": tip["features"],
|
||||||
|
"tip_content": tip["content"],
|
||||||
|
"ml_score": scored.get("score"),
|
||||||
|
})
|
||||||
|
|
||||||
|
judgment_requests.append({
|
||||||
|
"id": req_id,
|
||||||
|
"round": rnd,
|
||||||
|
"hour": hour,
|
||||||
|
"day_of_week": dow,
|
||||||
|
"policy": policy,
|
||||||
|
"persona_name": persona.name,
|
||||||
|
"persona_description": persona.description,
|
||||||
|
"tip_content": tip["content"],
|
||||||
|
"priority": tip["features"]["priority"],
|
||||||
|
"is_overdue": tip["features"]["is_overdue"],
|
||||||
|
"age_days": tip["features"]["task_age_days"],
|
||||||
|
"ml_score": scored.get("score"),
|
||||||
|
})
|
||||||
|
|
||||||
|
plan_rounds.append({
|
||||||
|
"round": rnd, "hour": hour, "dow": dow,
|
||||||
|
"sessions": round_sessions,
|
||||||
|
})
|
||||||
|
print(f" Round {rnd+1:>3}/{n_rounds}: {len(round_sessions)} sessions scored")
|
||||||
|
|
||||||
|
plan = {
|
||||||
|
"run_id": run_id,
|
||||||
|
"started_at": started_at,
|
||||||
|
"config": {
|
||||||
|
"n_users": n_users, "n_rounds": n_rounds,
|
||||||
|
"tasks_per_round": tasks_per_round, "policies": policies,
|
||||||
|
"use_llm": False, "seed": seed,
|
||||||
|
},
|
||||||
|
"user_personas": [
|
||||||
|
{"user_id": uid, "persona_name": p.name, "persona_description": p.description}
|
||||||
|
for uid, p in user_personas
|
||||||
|
],
|
||||||
|
"rounds": plan_rounds,
|
||||||
|
}
|
||||||
|
|
||||||
|
base = out_path.replace(".json", "")
|
||||||
|
plan_path = f"{base}-plan.json"
|
||||||
|
requests_path = f"{base}-requests.json"
|
||||||
|
responses_path = f"{base}-responses.json"
|
||||||
|
|
||||||
|
Path(plan_path).write_text(json.dumps(plan, indent=2))
|
||||||
|
Path(requests_path).write_text(json.dumps(judgment_requests, indent=2))
|
||||||
|
|
||||||
|
print()
|
||||||
|
print("=" * 60)
|
||||||
|
print(f"Phase 1 complete — {len(judgment_requests)} judgment requests.")
|
||||||
|
print()
|
||||||
|
print(f" Requests : {requests_path}")
|
||||||
|
print(f" Plan : {plan_path}")
|
||||||
|
print()
|
||||||
|
print('Claude Code: read the requests file, judge each tip for the persona,')
|
||||||
|
print(f'then write your responses to: {responses_path}')
|
||||||
|
print()
|
||||||
|
print('Response format: { "<id>": "<action>" | { "action": "<action>", "dwell_ms": <int> } }')
|
||||||
|
print('Valid actions: done | snooze | dismiss')
|
||||||
|
print()
|
||||||
|
print('For "done", optionally specify dwell_ms (ms between tip appearing and user acting):')
|
||||||
|
print(' { "r0_u0_linucb-v1": { "action": "done", "dwell_ms": 45000 } } # magic zone')
|
||||||
|
print(' { "r0_u0_linucb-v1": "snooze" } # plain string also ok (uses default 60s dwell for done)')
|
||||||
|
print()
|
||||||
|
print('Reward is inferred from action + dwell_ms:')
|
||||||
|
print(' dismiss → -1.0')
|
||||||
|
print(' snooze → 0.1')
|
||||||
|
print(' done < 15s → -0.3 (stale task)')
|
||||||
|
print(' done 15s–2min → 1.0 (magic!)')
|
||||||
|
print(' done 2–10min → 0.6 (good)')
|
||||||
|
print(' done > 10min → 0.3 (eventually)')
|
||||||
|
print()
|
||||||
|
print('Then run Phase 2:')
|
||||||
|
print(f' python runner.py --judge claude-code --phase reward \\')
|
||||||
|
print(f' --plan {plan_path} --out {out_path}')
|
||||||
|
|
||||||
|
|
||||||
|
# ── Claude Code judge — phase 2: reward ────────────────────────────────────
|
||||||
|
|
||||||
|
def run_reward_phase(plan_path: str, out_path: str, ml_url: str) -> dict:
|
||||||
|
"""Apply Claude Code judgments, send reward signals, compute metrics."""
|
||||||
|
plan = json.loads(Path(plan_path).read_text())
|
||||||
|
base = plan_path.replace("-plan.json", "")
|
||||||
|
responses_path = f"{base}-responses.json"
|
||||||
|
|
||||||
|
if not Path(responses_path).exists():
|
||||||
|
print(f"ERROR: responses file not found: {responses_path}", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
raw_responses = json.loads(Path(responses_path).read_text())
|
||||||
|
|
||||||
|
# Responses can be either { id: "action" } or { id: { action, dwell_ms } }
|
||||||
|
def _parse_response(v) -> tuple[str, int]:
|
||||||
|
if isinstance(v, dict):
|
||||||
|
return v["action"], int(v.get("dwell_ms", 60_000))
|
||||||
|
return str(v), 60_000 # plain string → assume 60s dwell for "done"
|
||||||
|
|
||||||
|
responses: dict[str, tuple[str, int]] = {k: _parse_response(v) for k, v in raw_responses.items()}
|
||||||
|
|
||||||
|
invalid = {k: v[0] for k, v in responses.items() if v[0] not in ACTIONS}
|
||||||
|
if invalid:
|
||||||
|
print(f"ERROR: invalid actions in responses: {invalid}", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
policies: list[str] = plan["config"]["policies"]
|
||||||
|
acc: dict[str, dict] = {
|
||||||
|
p: {
|
||||||
|
"total_reward": 0.0, "n_pulls": 0,
|
||||||
|
"cumulative_rewards": [],
|
||||||
|
"action_counts": {a: 0 for a in ACTIONS},
|
||||||
|
}
|
||||||
|
for p in policies
|
||||||
|
}
|
||||||
|
events: list[dict] = []
|
||||||
|
persona_map = {u["user_id"]: u["persona_name"] for u in plan["user_personas"]}
|
||||||
|
missing_responses = 0
|
||||||
|
|
||||||
|
print(f"[Phase 2] Applying {len(responses)} judgments → reward calls…")
|
||||||
|
|
||||||
|
with httpx.Client(trust_env=False) as client:
|
||||||
|
for rnd_data in plan["rounds"]:
|
||||||
|
rnd = rnd_data["round"]
|
||||||
|
round_rewards = {p: 0.0 for p in policies}
|
||||||
|
|
||||||
|
for session in rnd_data["sessions"]:
|
||||||
|
req_id = session["req_id"]
|
||||||
|
resp = responses.get(req_id)
|
||||||
|
if not resp:
|
||||||
|
print(f" [warn] no response for {req_id}, defaulting to snooze")
|
||||||
|
action, dwell_ms = "snooze", 10_000
|
||||||
|
missing_responses += 1
|
||||||
|
else:
|
||||||
|
action, dwell_ms = resp
|
||||||
|
|
||||||
|
reward = infer_reward(action, dwell_ms)
|
||||||
|
_call_reward(
|
||||||
|
client, ml_url, session["policy"], session["p_user"],
|
||||||
|
session["tip_id"], reward,
|
||||||
|
{"hour_of_day": rnd_data["hour"], **session["tip_features"]},
|
||||||
|
day_of_week=rnd_data["dow"],
|
||||||
|
)
|
||||||
|
|
||||||
|
p = session["policy"]
|
||||||
|
acc[p]["total_reward"] += reward
|
||||||
|
acc[p]["n_pulls"] += 1
|
||||||
|
acc[p]["action_counts"][action] += 1
|
||||||
|
round_rewards[p] += reward
|
||||||
|
|
||||||
|
events.append({
|
||||||
|
"round": rnd,
|
||||||
|
"user_id": session["user_id"],
|
||||||
|
"persona": persona_map.get(session["user_id"], "?"),
|
||||||
|
"policy": p,
|
||||||
|
"tip_content": session["tip_content"],
|
||||||
|
"priority": session["tip_features"]["priority"],
|
||||||
|
"is_overdue": session["tip_features"]["is_overdue"],
|
||||||
|
"action": action,
|
||||||
|
"dwell_ms": dwell_ms,
|
||||||
|
"reward": reward,
|
||||||
|
"hour": rnd_data["hour"],
|
||||||
|
"day_of_week": rnd_data["dow"],
|
||||||
|
})
|
||||||
|
|
||||||
|
for p in policies:
|
||||||
|
prev = acc[p]["cumulative_rewards"][-1] if acc[p]["cumulative_rewards"] else 0.0
|
||||||
|
acc[p]["cumulative_rewards"].append(prev + round_rewards[p])
|
||||||
|
|
||||||
|
print(f" Round {rnd+1:>3}/{plan['config']['n_rounds']} [cc] " + " ".join(
|
||||||
|
f"{p}={acc[p]['cumulative_rewards'][-1]:+.2f}" for p in policies
|
||||||
|
))
|
||||||
|
|
||||||
|
if missing_responses:
|
||||||
|
print(f" [warn] {missing_responses} requests had no response (defaulted to snooze)")
|
||||||
|
|
||||||
|
cfg = plan["config"]
|
||||||
|
result = _build_result(
|
||||||
|
plan["run_id"], plan["started_at"], policies, acc, events,
|
||||||
|
cfg["n_users"], cfg["n_rounds"], cfg["tasks_per_round"],
|
||||||
|
use_llm=False, seed=cfg["seed"],
|
||||||
|
)
|
||||||
|
result["judge_mode"] = "claude-code"
|
||||||
|
Path(out_path).write_text(json.dumps(result, indent=2))
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
# ── Shared result builder ───────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def _build_result(
|
||||||
|
run_id: str, started_at: str, policies: list[str],
|
||||||
|
acc: dict, events: list[dict],
|
||||||
|
n_users: int, n_rounds: int, tasks_per_round: int,
|
||||||
|
use_llm: bool, seed: int,
|
||||||
|
) -> dict:
|
||||||
|
summary = {
|
||||||
|
p: {
|
||||||
|
"total_reward": acc[p]["total_reward"],
|
||||||
|
"mean_reward": (
|
||||||
|
acc[p]["total_reward"] / acc[p]["n_pulls"]
|
||||||
|
if acc[p]["n_pulls"] > 0 else 0.0
|
||||||
|
),
|
||||||
|
"n_pulls": acc[p]["n_pulls"],
|
||||||
|
"cumulative_rewards": acc[p]["cumulative_rewards"],
|
||||||
|
"action_counts": acc[p]["action_counts"],
|
||||||
|
}
|
||||||
|
for p in policies
|
||||||
|
}
|
||||||
|
winner = max(policies, key=lambda p: summary[p]["total_reward"])
|
||||||
|
|
||||||
|
persona_breakdown: dict[str, dict] = {}
|
||||||
|
for ev in events:
|
||||||
|
pname = ev["persona"]
|
||||||
|
pol = ev["policy"]
|
||||||
|
persona_breakdown.setdefault(pname, {}).setdefault(pol, {"reward": 0.0, "n": 0})
|
||||||
|
persona_breakdown[pname][pol]["reward"] += ev["reward"]
|
||||||
|
persona_breakdown[pname][pol]["n"] += 1
|
||||||
|
|
||||||
|
return {
|
||||||
|
"run_id": run_id,
|
||||||
|
"started_at": started_at,
|
||||||
|
"finished_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|
||||||
|
"config": {
|
||||||
|
"n_users": n_users, "n_rounds": n_rounds,
|
||||||
|
"tasks_per_round": tasks_per_round, "policies": policies,
|
||||||
|
"use_llm": use_llm, "seed": seed,
|
||||||
|
},
|
||||||
|
"summary": summary,
|
||||||
|
"winner": winner,
|
||||||
|
"persona_breakdown": persona_breakdown,
|
||||||
|
"events": events,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ── CLI ─────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser(description="oO simulation runner")
|
||||||
|
parser.add_argument("--judge", choices=["rule", "llm", "claude-code"], default="rule")
|
||||||
|
parser.add_argument("--phase", choices=["score", "reward"], default=None,
|
||||||
|
help="For --judge claude-code only")
|
||||||
|
parser.add_argument("--plan", default=None,
|
||||||
|
help="Plan file path (for --judge claude-code --phase reward)")
|
||||||
|
parser.add_argument("--n-users", type=int, default=5)
|
||||||
|
parser.add_argument("--n-rounds", type=int, default=20)
|
||||||
|
parser.add_argument("--tasks-per-round", type=int, default=8)
|
||||||
|
parser.add_argument("--ml-url", default="http://localhost:5001")
|
||||||
|
parser.add_argument("--policies", nargs="+", default=["linucb-v1", "egreedy-v1"])
|
||||||
|
parser.add_argument("--no-llm", action="store_true",
|
||||||
|
help="Alias for --judge rule (backwards compat)")
|
||||||
|
parser.add_argument("--seed", type=int, default=42)
|
||||||
|
parser.add_argument("--out", default=None)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if args.no_llm:
|
||||||
|
args.judge = "rule"
|
||||||
|
|
||||||
|
out_path = args.out or f"/tmp/oo-sim-{int(time.time())}.json"
|
||||||
|
|
||||||
|
if args.judge == "claude-code":
|
||||||
|
if args.phase == "score":
|
||||||
|
run_score_phase(
|
||||||
|
n_users=args.n_users, n_rounds=args.n_rounds,
|
||||||
|
tasks_per_round=args.tasks_per_round, ml_url=args.ml_url,
|
||||||
|
policies=args.policies, seed=args.seed, out_path=out_path,
|
||||||
|
)
|
||||||
|
elif args.phase == "reward":
|
||||||
|
if not args.plan:
|
||||||
|
print("ERROR: --plan is required for --phase reward", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
result = run_reward_phase(args.plan, out_path, args.ml_url)
|
||||||
|
print()
|
||||||
|
print(f"Winner : {result['winner']}")
|
||||||
|
for p, s in result["summary"].items():
|
||||||
|
print(f" {p:20s} total={s['total_reward']:+.2f} mean={s['mean_reward']:+.4f} pulls={s['n_pulls']}")
|
||||||
|
print(f"Results: {out_path}")
|
||||||
|
else:
|
||||||
|
print("ERROR: --judge claude-code requires --phase score or --phase reward",
|
||||||
|
file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
else:
|
||||||
|
use_llm = (args.judge == "llm")
|
||||||
|
print(f"oO simulation: {args.n_users} users × {args.n_rounds} rounds")
|
||||||
|
print(f"Policies : {args.policies}")
|
||||||
|
print(f"ML URL : {args.ml_url}")
|
||||||
|
print(f"Judge : {args.judge}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
result = run_simulation(
|
||||||
|
n_users=args.n_users, n_rounds=args.n_rounds,
|
||||||
|
tasks_per_round=args.tasks_per_round, ml_url=args.ml_url,
|
||||||
|
policies=args.policies, use_llm=use_llm, seed=args.seed,
|
||||||
|
)
|
||||||
|
Path(out_path).write_text(json.dumps(result, indent=2))
|
||||||
|
print()
|
||||||
|
print(f"Winner : {result['winner']}")
|
||||||
|
for p, s in result["summary"].items():
|
||||||
|
print(f" {p:20s} total={s['total_reward']:+.2f} mean={s['mean_reward']:+.4f} pulls={s['n_pulls']}")
|
||||||
|
print(f"Results: {out_path}")
|
||||||
62
ml/experiments/sim/task_generator.py
Normal file
62
ml/experiments/sim/task_generator.py
Normal file
@@ -0,0 +1,62 @@
|
|||||||
|
"""Generate synthetic task pools for simulation."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import random
|
||||||
|
|
||||||
|
_TEMPLATES = [
|
||||||
|
"Send weekly report to team",
|
||||||
|
"Review pull request #{n}",
|
||||||
|
"Schedule meeting with {name}",
|
||||||
|
"Update project documentation",
|
||||||
|
"Fix bug in authentication module",
|
||||||
|
"Prepare presentation for stakeholders",
|
||||||
|
"Call back {name}",
|
||||||
|
"Submit expense report",
|
||||||
|
"Review quarterly goals",
|
||||||
|
"Clean up inbox",
|
||||||
|
"Follow up on proposal to {name}",
|
||||||
|
"Complete onboarding checklist",
|
||||||
|
"Write tests for feature #{n}",
|
||||||
|
"Deploy hotfix to production",
|
||||||
|
"Respond to support ticket #{n}",
|
||||||
|
"Draft release notes",
|
||||||
|
"Update dependencies",
|
||||||
|
"Review design mockups",
|
||||||
|
"Archive old tickets",
|
||||||
|
"Check in with {name}",
|
||||||
|
]
|
||||||
|
|
||||||
|
_NAMES = ["Alice", "Bob", "Carol", "David", "Eve", "Frank", "Grace"]
|
||||||
|
|
||||||
|
|
||||||
|
def generate_task_pool(n: int = 10, seed: int | None = None) -> list[dict]:
|
||||||
|
"""Return n synthetic tasks with randomly sampled features."""
|
||||||
|
rng = random.Random(seed)
|
||||||
|
|
||||||
|
tasks = []
|
||||||
|
for i in range(n):
|
||||||
|
priority = rng.choices([1, 2, 3, 4], weights=[0.3, 0.3, 0.25, 0.15])[0]
|
||||||
|
# age_days: most tasks fresh, a few stale
|
||||||
|
age_days = rng.choices(
|
||||||
|
[0.0, 0.5, 1.0, 3.0, 7.0, 14.0],
|
||||||
|
weights=[0.35, 0.20, 0.20, 0.12, 0.08, 0.05],
|
||||||
|
)[0] + rng.random() * 0.5
|
||||||
|
# is_overdue only meaningful when age > 0
|
||||||
|
is_overdue = age_days > 0.5 and rng.random() < 0.65
|
||||||
|
|
||||||
|
template = rng.choice(_TEMPLATES)
|
||||||
|
content = template.format(n=rng.randint(100, 999), name=rng.choice(_NAMES))
|
||||||
|
|
||||||
|
tasks.append({
|
||||||
|
"id": f"sim:{i}",
|
||||||
|
"content": content,
|
||||||
|
"source": "sim",
|
||||||
|
"features": {
|
||||||
|
"is_overdue": is_overdue,
|
||||||
|
"task_age_days": age_days if is_overdue else 0.0,
|
||||||
|
"priority": priority,
|
||||||
|
},
|
||||||
|
})
|
||||||
|
|
||||||
|
return tasks
|
||||||
@@ -35,8 +35,10 @@ app = FastAPI(title="oO ML Serving", version="1.0.0")
|
|||||||
STATE_DIR = Path(os.getenv("STATE_DIR", "/tmp/oo-bandit-state"))
|
STATE_DIR = Path(os.getenv("STATE_DIR", "/tmp/oo-bandit-state"))
|
||||||
STATE_DIR.mkdir(parents=True, exist_ok=True)
|
STATE_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
ALPHA = 1.0 # exploration coefficient
|
ALPHA = 1.0 # LinUCB exploration coefficient
|
||||||
D = 5 # feature dimension
|
D = 5 # LinUCB feature dimension
|
||||||
|
D7 = 7 # ε-greedy feature dimension (adds day-of-week cyclical encoding)
|
||||||
|
EPSILON = 0.1 # ε-greedy exploration rate
|
||||||
FEATURE_HISTORY_SIZE = 100 # per-user ring buffer
|
FEATURE_HISTORY_SIZE = 100 # per-user ring buffer
|
||||||
|
|
||||||
|
|
||||||
@@ -63,6 +65,8 @@ def build_feature_vector(features: dict) -> np.ndarray:
|
|||||||
|
|
||||||
# ── Per-user bandit state (disjoint LinUCB, global arm) ───────────────────
|
# ── Per-user bandit state (disjoint LinUCB, global arm) ───────────────────
|
||||||
|
|
||||||
|
# ── LinUCB state helpers ───────────────────────────────────────────────────
|
||||||
|
|
||||||
def state_path(user_id: str) -> Path:
|
def state_path(user_id: str) -> Path:
|
||||||
safe = "".join(c if c.isalnum() else "_" for c in user_id)
|
safe = "".join(c if c.isalnum() else "_" for c in user_id)
|
||||||
return STATE_DIR / f"{safe}.json"
|
return STATE_DIR / f"{safe}.json"
|
||||||
@@ -85,6 +89,37 @@ def save_state(user_id: str, A: np.ndarray, b: np.ndarray, meta: dict) -> None:
|
|||||||
p.write_text(json.dumps({"A": A.tolist(), "b": b.tolist(), "meta": meta}))
|
p.write_text(json.dumps({"A": A.tolist(), "b": b.tolist(), "meta": meta}))
|
||||||
|
|
||||||
|
|
||||||
|
# ── ε-greedy state helpers (d=7, extended features) ───────────────────────
|
||||||
|
|
||||||
|
def build_feature_vector_7(features: dict, day_of_week: int = 0) -> np.ndarray:
|
||||||
|
"""d=7: base 5 features + day-of-week cyclical encoding."""
|
||||||
|
base = build_feature_vector(features)
|
||||||
|
dow_sin = math.sin(2 * math.pi * day_of_week / 7)
|
||||||
|
dow_cos = math.cos(2 * math.pi * day_of_week / 7)
|
||||||
|
return np.append(base, [dow_sin, dow_cos])
|
||||||
|
|
||||||
|
|
||||||
|
def state7_path(user_id: str) -> Path:
|
||||||
|
safe = "".join(c if c.isalnum() else "_" for c in user_id)
|
||||||
|
return STATE_DIR / f"{safe}_egreedy.json"
|
||||||
|
|
||||||
|
|
||||||
|
def load_state7(user_id: str) -> tuple[np.ndarray, np.ndarray, dict]:
|
||||||
|
"""Returns (A, b, meta) for ε-greedy d=7 policy."""
|
||||||
|
p = state7_path(user_id)
|
||||||
|
if p.exists():
|
||||||
|
raw = json.loads(p.read_text())
|
||||||
|
A = np.array(raw["A"], dtype=np.float64)
|
||||||
|
b = np.array(raw["b"], dtype=np.float64)
|
||||||
|
return A, b, raw.get("meta", {})
|
||||||
|
return np.identity(D7, dtype=np.float64), np.zeros(D7, dtype=np.float64), {}
|
||||||
|
|
||||||
|
|
||||||
|
def save_state7(user_id: str, A: np.ndarray, b: np.ndarray, meta: dict) -> None:
|
||||||
|
p = state7_path(user_id)
|
||||||
|
p.write_text(json.dumps({"A": A.tolist(), "b": b.tolist(), "meta": meta}))
|
||||||
|
|
||||||
|
|
||||||
# ── API models ─────────────────────────────────────────────────────────────
|
# ── API models ─────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
class CandidateFeatures(BaseModel):
|
class CandidateFeatures(BaseModel):
|
||||||
@@ -124,6 +159,7 @@ class RewardRequest(BaseModel):
|
|||||||
tip_id: str
|
tip_id: str
|
||||||
reward: float # +1 done, +0.5 helpful, 0 snooze, -0.5 not_helpful, -1 dismiss
|
reward: float # +1 done, +0.5 helpful, 0 snooze, -0.5 not_helpful, -1 dismiss
|
||||||
features: CandidateFeatures
|
features: CandidateFeatures
|
||||||
|
day_of_week: int = 0 # included so egreedy can train dow features correctly
|
||||||
|
|
||||||
|
|
||||||
class RewardResponse(BaseModel):
|
class RewardResponse(BaseModel):
|
||||||
@@ -209,12 +245,131 @@ def reward(req: RewardRequest) -> RewardResponse:
|
|||||||
return RewardResponse(ok=True)
|
return RewardResponse(ok=True)
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/score/egreedy", response_model=ScoreResponse)
|
||||||
|
def score_egreedy(req: ScoreRequest) -> ScoreResponse:
|
||||||
|
"""ε-greedy policy with d=7 features (adds day-of-week encoding).
|
||||||
|
|
||||||
|
Exploration: pick uniformly at random with probability ε.
|
||||||
|
Exploitation: pick argmax of linear payoff estimate θ·x.
|
||||||
|
Differs from LinUCB in: no UCB bonus, richer feature space.
|
||||||
|
"""
|
||||||
|
if not req.candidates:
|
||||||
|
raise HTTPException(status_code=422, detail="No candidates")
|
||||||
|
|
||||||
|
A, b, meta = load_state7(req.user_id)
|
||||||
|
try:
|
||||||
|
A_inv = np.linalg.inv(A)
|
||||||
|
except np.linalg.LinAlgError:
|
||||||
|
A_inv = np.identity(D7, dtype=np.float64)
|
||||||
|
theta = A_inv @ b
|
||||||
|
|
||||||
|
dow = req.context.day_of_week
|
||||||
|
exploring = np.random.random() < EPSILON
|
||||||
|
|
||||||
|
if exploring:
|
||||||
|
chosen = req.candidates[np.random.randint(len(req.candidates))]
|
||||||
|
feat_dict = {
|
||||||
|
"hour_of_day": req.context.hour_of_day,
|
||||||
|
"is_overdue": chosen.features.is_overdue,
|
||||||
|
"task_age_days": chosen.features.task_age_days,
|
||||||
|
"priority": chosen.features.priority,
|
||||||
|
}
|
||||||
|
x = build_feature_vector_7(feat_dict, dow)
|
||||||
|
best_score = float(theta @ x)
|
||||||
|
best_id = chosen.id
|
||||||
|
else:
|
||||||
|
best_id = None
|
||||||
|
best_score = -float("inf")
|
||||||
|
feat_dict = {}
|
||||||
|
for candidate in req.candidates:
|
||||||
|
fd = {
|
||||||
|
"hour_of_day": req.context.hour_of_day,
|
||||||
|
"is_overdue": candidate.features.is_overdue,
|
||||||
|
"task_age_days": candidate.features.task_age_days,
|
||||||
|
"priority": candidate.features.priority,
|
||||||
|
}
|
||||||
|
x = build_feature_vector_7(fd, dow)
|
||||||
|
s = float(theta @ x)
|
||||||
|
if s > best_score:
|
||||||
|
best_score = s
|
||||||
|
best_id = candidate.id
|
||||||
|
feat_dict = fd
|
||||||
|
|
||||||
|
history = get_feature_history(req.user_id)
|
||||||
|
history.append({
|
||||||
|
"ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|
||||||
|
"features": {**feat_dict, "day_of_week": dow, "exploring": exploring},
|
||||||
|
"score": best_score,
|
||||||
|
"tip_id": best_id,
|
||||||
|
"policy": "egreedy-v1",
|
||||||
|
})
|
||||||
|
|
||||||
|
meta["pulls"] = meta.get("pulls", 0) + 1
|
||||||
|
meta["explore_count"] = meta.get("explore_count", 0) + int(exploring)
|
||||||
|
meta["last_updated"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
||||||
|
save_state7(req.user_id, A, b, meta)
|
||||||
|
|
||||||
|
return ScoreResponse(tip_id=best_id, score=best_score, policy="egreedy-v1")
|
||||||
|
|
||||||
|
|
||||||
|
@app.post("/reward/egreedy", response_model=RewardResponse)
|
||||||
|
def reward_egreedy(req: RewardRequest) -> RewardResponse:
|
||||||
|
"""Update ε-greedy ridge estimator with observed reward."""
|
||||||
|
A, b, meta = load_state7(req.user_id)
|
||||||
|
feat_dict = {
|
||||||
|
"hour_of_day": req.features.hour_of_day,
|
||||||
|
"is_overdue": req.features.is_overdue,
|
||||||
|
"task_age_days": req.features.task_age_days,
|
||||||
|
"priority": req.features.priority,
|
||||||
|
}
|
||||||
|
x = build_feature_vector_7(feat_dict, day_of_week=req.day_of_week)
|
||||||
|
A += np.outer(x, x)
|
||||||
|
b += req.reward * x
|
||||||
|
|
||||||
|
meta["cumulative_reward"] = meta.get("cumulative_reward", 0.0) + req.reward
|
||||||
|
meta["reward_count"] = meta.get("reward_count", 0) + 1
|
||||||
|
meta["last_updated"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
||||||
|
save_state7(req.user_id, A, b, meta)
|
||||||
|
return RewardResponse(ok=True)
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/stats/egreedy/{user_id}")
|
||||||
|
def stats_egreedy(user_id: str):
|
||||||
|
"""ε-greedy policy stats — pulls, cumulative reward, θ vector."""
|
||||||
|
A, b, meta = load_state7(user_id)
|
||||||
|
try:
|
||||||
|
theta = (np.linalg.inv(A) @ b).tolist()
|
||||||
|
except np.linalg.LinAlgError:
|
||||||
|
theta = [0.0] * D7
|
||||||
|
|
||||||
|
pulls = meta.get("pulls", 0)
|
||||||
|
cumulative_reward = meta.get("cumulative_reward", 0.0)
|
||||||
|
reward_count = meta.get("reward_count", 0)
|
||||||
|
explore_count = meta.get("explore_count", 0)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"user_id": user_id,
|
||||||
|
"policy": "egreedy-v1",
|
||||||
|
"pulls": pulls,
|
||||||
|
"reward_count": reward_count,
|
||||||
|
"cumulative_reward": cumulative_reward,
|
||||||
|
"estimated_mean_reward": cumulative_reward / reward_count if reward_count > 0 else 0.0,
|
||||||
|
"exploration_rate": explore_count / pulls if pulls > 0 else 0.0,
|
||||||
|
"theta": theta,
|
||||||
|
"feature_labels": ["hour_sin", "hour_cos", "is_overdue", "task_age", "priority", "dow_sin", "dow_cos"],
|
||||||
|
"last_updated": meta.get("last_updated"),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
@app.post("/reset/{user_id}", response_model=RewardResponse)
|
@app.post("/reset/{user_id}", response_model=RewardResponse)
|
||||||
def reset(user_id: str) -> RewardResponse:
|
def reset(user_id: str) -> RewardResponse:
|
||||||
"""Reset per-user bandit state (admin action)."""
|
"""Reset per-user bandit state (admin action)."""
|
||||||
p = state_path(user_id)
|
p = state_path(user_id)
|
||||||
if p.exists():
|
if p.exists():
|
||||||
p.unlink()
|
p.unlink()
|
||||||
|
p7 = state7_path(user_id)
|
||||||
|
if p7.exists():
|
||||||
|
p7.unlink()
|
||||||
if user_id in _feature_history:
|
if user_id in _feature_history:
|
||||||
_feature_history[user_id].clear()
|
_feature_history[user_id].clear()
|
||||||
return RewardResponse(ok=True)
|
return RewardResponse(ok=True)
|
||||||
|
|||||||
@@ -4,6 +4,7 @@
|
|||||||
"private": true,
|
"private": true,
|
||||||
"scripts": {
|
"scripts": {
|
||||||
"dev": ".venv/bin/uvicorn main:app --reload --port 8000",
|
"dev": ".venv/bin/uvicorn main:app --reload --port 8000",
|
||||||
"start": ".venv/bin/uvicorn main:app --port 8000"
|
"start": ".venv/bin/uvicorn main:app --port 8000",
|
||||||
|
"test": ".venv/bin/python -m pytest tests/ -v"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
4
ml/serving/requirements-dev.txt
Normal file
4
ml/serving/requirements-dev.txt
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
-r requirements.txt
|
||||||
|
pytest==8.3.5
|
||||||
|
pytest-asyncio==0.24.0
|
||||||
|
httpx==0.28.1
|
||||||
@@ -2,3 +2,5 @@ fastapi==0.115.6
|
|||||||
uvicorn[standard]==0.32.1
|
uvicorn[standard]==0.32.1
|
||||||
pydantic==2.10.4
|
pydantic==2.10.4
|
||||||
numpy>=1.26.0
|
numpy>=1.26.0
|
||||||
|
httpx>=0.27.0
|
||||||
|
anthropic>=0.40.0
|
||||||
|
|||||||
0
ml/serving/tests/__init__.py
Normal file
0
ml/serving/tests/__init__.py
Normal file
261
ml/serving/tests/test_score.py
Normal file
261
ml/serving/tests/test_score.py
Normal file
@@ -0,0 +1,261 @@
|
|||||||
|
"""
|
||||||
|
Unit tests for ml/serving — feature building and scoring contract.
|
||||||
|
Run with: pytest ml/serving/tests/
|
||||||
|
"""
|
||||||
|
import math
|
||||||
|
import pytest
|
||||||
|
from httpx import AsyncClient, ASGITransport
|
||||||
|
|
||||||
|
from main import app, build_feature_vector
|
||||||
|
|
||||||
|
|
||||||
|
class TestFeatureVector:
|
||||||
|
def test_shape(self):
|
||||||
|
v = build_feature_vector({"hour_of_day": 8, "is_overdue": True, "task_age_days": 3, "priority": 3})
|
||||||
|
assert v.shape == (5,)
|
||||||
|
|
||||||
|
def test_hour_encoding_noon(self):
|
||||||
|
v = build_feature_vector({"hour_of_day": 12})
|
||||||
|
# sin(2π * 12/24) = sin(π) ≈ 0
|
||||||
|
assert abs(v[0]) < 1e-10
|
||||||
|
# cos(2π * 12/24) = cos(π) = -1
|
||||||
|
assert abs(v[1] - (-1.0)) < 1e-10
|
||||||
|
|
||||||
|
def test_hour_encoding_midnight(self):
|
||||||
|
v = build_feature_vector({"hour_of_day": 0})
|
||||||
|
# sin(0) = 0
|
||||||
|
assert abs(v[0]) < 1e-10
|
||||||
|
# cos(0) = 1
|
||||||
|
assert abs(v[1] - 1.0) < 1e-10
|
||||||
|
|
||||||
|
def test_hour_encoding_6am(self):
|
||||||
|
v = build_feature_vector({"hour_of_day": 6})
|
||||||
|
# sin(2π * 6/24) = sin(π/2) = 1
|
||||||
|
assert abs(v[0] - 1.0) < 1e-10
|
||||||
|
# cos(π/2) = 0
|
||||||
|
assert abs(v[1]) < 1e-10
|
||||||
|
|
||||||
|
def test_age_clipped_at_30(self):
|
||||||
|
v_long = build_feature_vector({"task_age_days": 100})
|
||||||
|
v_cap = build_feature_vector({"task_age_days": 30})
|
||||||
|
assert v_long[3] == v_cap[3] == 1.0
|
||||||
|
|
||||||
|
def test_age_zero(self):
|
||||||
|
v = build_feature_vector({"task_age_days": 0})
|
||||||
|
assert v[3] == pytest.approx(0.0)
|
||||||
|
|
||||||
|
def test_age_15_days_normalised(self):
|
||||||
|
v = build_feature_vector({"task_age_days": 15})
|
||||||
|
assert v[3] == pytest.approx(0.5)
|
||||||
|
|
||||||
|
def test_priority_normalised(self):
|
||||||
|
v1 = build_feature_vector({"priority": 1})
|
||||||
|
v4 = build_feature_vector({"priority": 4})
|
||||||
|
assert v1[4] == pytest.approx(0.0)
|
||||||
|
assert v4[4] == pytest.approx(1.0)
|
||||||
|
|
||||||
|
def test_priority_2_and_3(self):
|
||||||
|
v2 = build_feature_vector({"priority": 2})
|
||||||
|
v3 = build_feature_vector({"priority": 3})
|
||||||
|
assert v2[4] == pytest.approx(1 / 3)
|
||||||
|
assert v3[4] == pytest.approx(2 / 3)
|
||||||
|
|
||||||
|
def test_is_overdue_true(self):
|
||||||
|
v = build_feature_vector({"is_overdue": True})
|
||||||
|
assert v[2] == 1.0
|
||||||
|
|
||||||
|
def test_is_overdue_false(self):
|
||||||
|
v = build_feature_vector({"is_overdue": False})
|
||||||
|
assert v[2] == 0.0
|
||||||
|
|
||||||
|
def test_defaults_when_no_keys(self):
|
||||||
|
v = build_feature_vector({})
|
||||||
|
# hour=12 → sin(π)≈0, cos(π)=-1
|
||||||
|
assert abs(v[0]) < 1e-10
|
||||||
|
assert abs(v[1] - (-1.0)) < 1e-10
|
||||||
|
assert v[2] == 0.0 # is_overdue=False
|
||||||
|
assert v[3] == 0.0 # task_age_days=0
|
||||||
|
assert v[4] == 0.0 # priority=1 → (1-1)/3=0
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_health():
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r = await client.get("/health")
|
||||||
|
assert r.status_code == 200
|
||||||
|
assert r.json()["ok"] is True
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_score_returns_a_candidate():
|
||||||
|
payload = {
|
||||||
|
"user_id": "test-user",
|
||||||
|
"candidates": [
|
||||||
|
{"id": "t:1", "content": "Task A", "source": "todoist", "source_id": "1",
|
||||||
|
"features": {"is_overdue": True, "task_age_days": 2, "priority": 3}},
|
||||||
|
{"id": "t:2", "content": "Task B", "source": "todoist", "source_id": "2",
|
||||||
|
"features": {"is_overdue": False, "task_age_days": 0, "priority": 1}},
|
||||||
|
],
|
||||||
|
"context": {"hour_of_day": 9, "day_of_week": 1},
|
||||||
|
}
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r = await client.post("/score", json=payload)
|
||||||
|
assert r.status_code == 200
|
||||||
|
body = r.json()
|
||||||
|
assert body["tip_id"] in {"t:1", "t:2"}
|
||||||
|
assert "policy" in body
|
||||||
|
assert body["policy"] == "linucb-v1"
|
||||||
|
assert isinstance(body["score"], float)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_score_single_candidate_always_selected():
|
||||||
|
"""With a single candidate there is no choice — it must be returned."""
|
||||||
|
payload = {
|
||||||
|
"user_id": "solo-user",
|
||||||
|
"candidates": [
|
||||||
|
{"id": "only:1", "content": "Only task", "source": "todoist",
|
||||||
|
"features": {"is_overdue": False, "task_age_days": 0, "priority": 1}},
|
||||||
|
],
|
||||||
|
"context": {"hour_of_day": 10, "day_of_week": 0},
|
||||||
|
}
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r = await client.post("/score", json=payload)
|
||||||
|
assert r.status_code == 200
|
||||||
|
assert r.json()["tip_id"] == "only:1"
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_score_empty_candidates_returns_422():
|
||||||
|
payload = {"user_id": "u", "candidates": [], "context": {"hour_of_day": 9, "day_of_week": 1}}
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r = await client.post("/score", json=payload)
|
||||||
|
assert r.status_code == 422
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_reward_accepted():
|
||||||
|
payload = {
|
||||||
|
"user_id": "reward-user",
|
||||||
|
"tip_id": "t:1",
|
||||||
|
"reward": 1.0,
|
||||||
|
"features": {"hour_of_day": 9, "is_overdue": True, "task_age_days": 2, "priority": 3},
|
||||||
|
}
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r = await client.post("/reward", json=payload)
|
||||||
|
assert r.status_code == 200
|
||||||
|
assert r.json()["ok"] is True
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_reward_updates_stats():
|
||||||
|
"""Posting a reward should increase cumulative_reward in /stats."""
|
||||||
|
user_id = "reward-stats-user"
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r0 = await client.get(f"/stats/{user_id}")
|
||||||
|
before = r0.json()["cumulative_reward"]
|
||||||
|
|
||||||
|
await client.post("/reward", json={
|
||||||
|
"user_id": user_id,
|
||||||
|
"tip_id": "tip:x",
|
||||||
|
"reward": 1.0,
|
||||||
|
"features": {"hour_of_day": 8, "is_overdue": False, "task_age_days": 0, "priority": 2},
|
||||||
|
})
|
||||||
|
r1 = await client.get(f"/stats/{user_id}")
|
||||||
|
assert r1.json()["cumulative_reward"] == pytest.approx(before + 1.0)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_score_increments_pulls():
|
||||||
|
user_id = "pull-counter-user"
|
||||||
|
payload = {
|
||||||
|
"user_id": user_id,
|
||||||
|
"candidates": [
|
||||||
|
{"id": "t:p1", "content": "Pull task", "source": "todoist",
|
||||||
|
"features": {"is_overdue": False, "task_age_days": 1, "priority": 2}},
|
||||||
|
],
|
||||||
|
"context": {"hour_of_day": 10, "day_of_week": 2},
|
||||||
|
}
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r0 = await client.get(f"/stats/{user_id}")
|
||||||
|
pulls_before = r0.json()["pulls"]
|
||||||
|
|
||||||
|
await client.post("/score", json=payload)
|
||||||
|
await client.post("/score", json=payload)
|
||||||
|
|
||||||
|
r1 = await client.get(f"/stats/{user_id}")
|
||||||
|
assert r1.json()["pulls"] == pulls_before + 2
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_reset_clears_state():
|
||||||
|
user_id = "reset-user"
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
# Score once to build state
|
||||||
|
await client.post("/score", json={
|
||||||
|
"user_id": user_id,
|
||||||
|
"candidates": [
|
||||||
|
{"id": "t:r", "content": "Reset task", "source": "todoist",
|
||||||
|
"features": {"is_overdue": True, "task_age_days": 5, "priority": 4}},
|
||||||
|
],
|
||||||
|
"context": {"hour_of_day": 14, "day_of_week": 3},
|
||||||
|
})
|
||||||
|
r_reset = await client.post(f"/reset/{user_id}")
|
||||||
|
assert r_reset.json()["ok"] is True
|
||||||
|
|
||||||
|
r_stats = await client.get(f"/stats/{user_id}")
|
||||||
|
assert r_stats.json()["pulls"] == 0
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_features_endpoint_returns_history():
|
||||||
|
user_id = "features-user"
|
||||||
|
payload = {
|
||||||
|
"user_id": user_id,
|
||||||
|
"candidates": [
|
||||||
|
{"id": "t:f1", "content": "Feature task", "source": "todoist",
|
||||||
|
"features": {"is_overdue": False, "task_age_days": 0, "priority": 1}},
|
||||||
|
],
|
||||||
|
"context": {"hour_of_day": 7, "day_of_week": 0},
|
||||||
|
}
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
await client.post("/score", json=payload)
|
||||||
|
r = await client.get(f"/features/{user_id}")
|
||||||
|
body = r.json()
|
||||||
|
assert r.status_code == 200
|
||||||
|
assert "history" in body
|
||||||
|
assert len(body["history"]) >= 1
|
||||||
|
entry = body["history"][-1]
|
||||||
|
assert "ts" in entry
|
||||||
|
assert "score" in entry
|
||||||
|
assert "tip_id" in entry
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_stats_for_fresh_user():
|
||||||
|
"""A user with no history should return zero/default stats without error."""
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r = await client.get("/stats/brand-new-user-xyz-abc")
|
||||||
|
body = r.json()
|
||||||
|
assert r.status_code == 200
|
||||||
|
assert body["pulls"] == 0
|
||||||
|
assert body["cumulative_reward"] == 0.0
|
||||||
|
assert body["estimated_mean_reward"] == 0.0
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_reward_negative_value():
|
||||||
|
"""Dismissing a tip should decrease cumulative_reward."""
|
||||||
|
user_id = "dismiss-user-neg"
|
||||||
|
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
|
||||||
|
r0 = await client.get(f"/stats/{user_id}")
|
||||||
|
before = r0.json()["cumulative_reward"]
|
||||||
|
|
||||||
|
await client.post("/reward", json={
|
||||||
|
"user_id": user_id,
|
||||||
|
"tip_id": "t:neg",
|
||||||
|
"reward": -1.0,
|
||||||
|
"features": {"hour_of_day": 20, "is_overdue": False, "task_age_days": 0, "priority": 1},
|
||||||
|
})
|
||||||
|
r1 = await client.get(f"/stats/{user_id}")
|
||||||
|
assert r1.json()["cumulative_reward"] == pytest.approx(before - 1.0)
|
||||||
@@ -12,10 +12,14 @@
|
|||||||
},
|
},
|
||||||
"scripts": {
|
"scripts": {
|
||||||
"build": "tsc",
|
"build": "tsc",
|
||||||
|
"test": "vitest run",
|
||||||
|
"test:watch": "vitest",
|
||||||
"type-check": "tsc --noEmit",
|
"type-check": "tsc --noEmit",
|
||||||
"clean": "rm -rf dist"
|
"clean": "rm -rf dist"
|
||||||
},
|
},
|
||||||
"devDependencies": {
|
"devDependencies": {
|
||||||
"typescript": "^5.7.3"
|
"@vitest/coverage-v8": "^4.1.4",
|
||||||
|
"typescript": "^5.7.3",
|
||||||
|
"vitest": "^4.1.4"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
40
packages/shared-types/src/__tests__/tip.test.ts
Normal file
40
packages/shared-types/src/__tests__/tip.test.ts
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
import { describe, it, expect } from 'vitest';
|
||||||
|
import type { Tip, TipFeedback, RecommendResponse } from '../index.js';
|
||||||
|
|
||||||
|
describe('Tip type contract', () => {
|
||||||
|
it('accepts a valid Tip object', () => {
|
||||||
|
const tip: Tip = {
|
||||||
|
id: 'todoist:123',
|
||||||
|
content: 'Finish the report',
|
||||||
|
source: 'todoist',
|
||||||
|
sourceId: '123',
|
||||||
|
createdAt: new Date().toISOString(),
|
||||||
|
};
|
||||||
|
expect(tip.source).toBe('todoist');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('accepts advice source without sourceId', () => {
|
||||||
|
const tip: Tip = {
|
||||||
|
id: 'advice:abc',
|
||||||
|
content: 'Take a break',
|
||||||
|
source: 'advice',
|
||||||
|
createdAt: new Date().toISOString(),
|
||||||
|
};
|
||||||
|
expect(tip.sourceId).toBeUndefined();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('RecommendResponse wraps a Tip', () => {
|
||||||
|
const res: RecommendResponse = {
|
||||||
|
tip: { id: 'x', content: 'Do it', source: 'todoist', createdAt: '' },
|
||||||
|
};
|
||||||
|
expect(res.tip.id).toBe('x');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('TipFeedback allows valid actions', () => {
|
||||||
|
const actions: TipFeedback['action'][] = ['done', 'dismiss', 'snooze'];
|
||||||
|
for (const action of actions) {
|
||||||
|
const fb: TipFeedback = { action };
|
||||||
|
expect(fb.action).toBe(action);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
9
packages/shared-types/vitest.config.ts
Normal file
9
packages/shared-types/vitest.config.ts
Normal file
@@ -0,0 +1,9 @@
|
|||||||
|
import { defineConfig } from 'vitest/config';
|
||||||
|
|
||||||
|
export default defineConfig({
|
||||||
|
test: {
|
||||||
|
globals: true,
|
||||||
|
environment: 'node',
|
||||||
|
exclude: ['dist/**', 'node_modules/**'],
|
||||||
|
},
|
||||||
|
});
|
||||||
2682
pnpm-lock.yaml
generated
2682
pnpm-lock.yaml
generated
File diff suppressed because it is too large
Load Diff
@@ -8,6 +8,9 @@
|
|||||||
"build": "tsc",
|
"build": "tsc",
|
||||||
"dev": "tsx watch src/index.ts",
|
"dev": "tsx watch src/index.ts",
|
||||||
"start": "node dist/index.js",
|
"start": "node dist/index.js",
|
||||||
|
"test": "vitest run",
|
||||||
|
"test:watch": "vitest",
|
||||||
|
"test:coverage": "vitest run --coverage",
|
||||||
"type-check": "tsc --noEmit",
|
"type-check": "tsc --noEmit",
|
||||||
"clean": "rm -rf dist"
|
"clean": "rm -rf dist"
|
||||||
},
|
},
|
||||||
@@ -33,8 +36,10 @@
|
|||||||
"@types/express": "^5.0.0",
|
"@types/express": "^5.0.0",
|
||||||
"@types/express-session": "^1.18.1",
|
"@types/express-session": "^1.18.1",
|
||||||
"@types/web-push": "^3.6.4",
|
"@types/web-push": "^3.6.4",
|
||||||
|
"@vitest/coverage-v8": "^4.1.4",
|
||||||
"drizzle-kit": "^0.30.4",
|
"drizzle-kit": "^0.30.4",
|
||||||
"tsx": "^4.19.2",
|
"tsx": "^4.19.2",
|
||||||
"typescript": "^5.7.3"
|
"typescript": "^5.7.3",
|
||||||
|
"vitest": "^4.1.4"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -99,6 +99,40 @@ export function runMigrations() {
|
|||||||
sql TEXT NOT NULL,
|
sql TEXT NOT NULL,
|
||||||
created_at TEXT NOT NULL
|
created_at TEXT NOT NULL
|
||||||
);
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS sim_runs (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
policy_a TEXT NOT NULL,
|
||||||
|
policy_b TEXT NOT NULL,
|
||||||
|
n_users INTEGER NOT NULL,
|
||||||
|
n_rounds INTEGER NOT NULL,
|
||||||
|
tasks_per_round INTEGER NOT NULL DEFAULT 8,
|
||||||
|
use_llm INTEGER NOT NULL DEFAULT 0,
|
||||||
|
status TEXT NOT NULL DEFAULT 'pending',
|
||||||
|
summary_json TEXT,
|
||||||
|
winner TEXT,
|
||||||
|
persona_breakdown_json TEXT,
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
finished_at TEXT
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS sim_events (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
run_id TEXT NOT NULL REFERENCES sim_runs(id),
|
||||||
|
round INTEGER NOT NULL,
|
||||||
|
user_id TEXT NOT NULL,
|
||||||
|
persona TEXT NOT NULL,
|
||||||
|
policy TEXT NOT NULL,
|
||||||
|
tip_content TEXT NOT NULL,
|
||||||
|
priority INTEGER NOT NULL,
|
||||||
|
is_overdue INTEGER NOT NULL,
|
||||||
|
action TEXT NOT NULL,
|
||||||
|
dwell_ms INTEGER,
|
||||||
|
reward_milli INTEGER NOT NULL,
|
||||||
|
hour INTEGER NOT NULL,
|
||||||
|
day_of_week INTEGER NOT NULL,
|
||||||
|
created_at TEXT NOT NULL
|
||||||
|
);
|
||||||
`);
|
`);
|
||||||
|
|
||||||
// Additive column migrations — safe to run on existing DBs.
|
// Additive column migrations — safe to run on existing DBs.
|
||||||
@@ -106,6 +140,8 @@ export function runMigrations() {
|
|||||||
for (const stmt of [
|
for (const stmt of [
|
||||||
`ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'user'`,
|
`ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'user'`,
|
||||||
`ALTER TABLE push_subscriptions ADD COLUMN created_at TEXT NOT NULL DEFAULT ''`,
|
`ALTER TABLE push_subscriptions ADD COLUMN created_at TEXT NOT NULL DEFAULT ''`,
|
||||||
|
`ALTER TABLE tip_feedback ADD COLUMN dwell_ms INTEGER`,
|
||||||
|
`ALTER TABLE tip_feedback ADD COLUMN reward_milli INTEGER`,
|
||||||
]) {
|
]) {
|
||||||
try { sqlite.exec(stmt); } catch { /* column already exists */ }
|
try { sqlite.exec(stmt); } catch { /* column already exists */ }
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -29,6 +29,8 @@ export const tipFeedback = sqliteTable('tip_feedback', {
|
|||||||
tipId: text('tip_id').notNull(),
|
tipId: text('tip_id').notNull(),
|
||||||
action: text('action').notNull(), // 'done' | 'dismiss' | 'snooze'
|
action: text('action').notNull(), // 'done' | 'dismiss' | 'snooze'
|
||||||
sourceId: text('source_id'),
|
sourceId: text('source_id'),
|
||||||
|
dwellMs: integer('dwell_ms'), // ms between servedAt and feedback; null if unknown
|
||||||
|
rewardMilli: integer('reward_milli'), // inferred reward × 1000 (e.g. 1000 = +1.0)
|
||||||
createdAt: text('created_at').notNull(),
|
createdAt: text('created_at').notNull(),
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -81,6 +83,43 @@ export const tipScores = sqliteTable('tip_scores', {
|
|||||||
servedAt: text('served_at').notNull(),
|
servedAt: text('served_at').notNull(),
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// ── Simulation runs ──────────────────────────────────────────────────────────
|
||||||
|
// One row per offline simulation run (two-policy comparison).
|
||||||
|
export const simRuns = sqliteTable('sim_runs', {
|
||||||
|
id: text('id').primaryKey(),
|
||||||
|
policyA: text('policy_a').notNull(),
|
||||||
|
policyB: text('policy_b').notNull(),
|
||||||
|
nUsers: integer('n_users').notNull(),
|
||||||
|
nRounds: integer('n_rounds').notNull(),
|
||||||
|
tasksPerRound: integer('tasks_per_round').notNull().default(8),
|
||||||
|
useLlm: integer('use_llm', { mode: 'boolean' }).notNull().default(false),
|
||||||
|
status: text('status').notNull().default('pending'), // 'pending'|'running'|'done'|'failed'
|
||||||
|
summaryJson: text('summary_json'), // JSON: { [policy]: PolicySummary }
|
||||||
|
winner: text('winner'),
|
||||||
|
personaBreakdownJson: text('persona_breakdown_json'), // JSON: { [persona]: { [policy]: {reward,n} } }
|
||||||
|
createdAt: text('created_at').notNull(),
|
||||||
|
finishedAt: text('finished_at'),
|
||||||
|
});
|
||||||
|
|
||||||
|
// One row per tip served in a simulation round.
|
||||||
|
export const simEvents = sqliteTable('sim_events', {
|
||||||
|
id: text('id').primaryKey(),
|
||||||
|
runId: text('run_id').notNull().references(() => simRuns.id),
|
||||||
|
round: integer('round').notNull(),
|
||||||
|
userId: text('user_id').notNull(),
|
||||||
|
persona: text('persona').notNull(),
|
||||||
|
policy: text('policy').notNull(),
|
||||||
|
tipContent: text('tip_content').notNull(),
|
||||||
|
priority: integer('priority').notNull(),
|
||||||
|
isOverdue: integer('is_overdue', { mode: 'boolean' }).notNull(),
|
||||||
|
action: text('action').notNull(), // 'done' | 'snooze' | 'dismiss'
|
||||||
|
dwellMs: integer('dwell_ms'), // simulated ms between tip appear and user action
|
||||||
|
rewardMilli: integer('reward_milli').notNull(), // inferred reward × 1000
|
||||||
|
hour: integer('hour').notNull(),
|
||||||
|
dayOfWeek: integer('day_of_week').notNull(),
|
||||||
|
createdAt: text('created_at').notNull(),
|
||||||
|
});
|
||||||
|
|
||||||
// Admin saved SQL queries.
|
// Admin saved SQL queries.
|
||||||
export const savedQueries = sqliteTable('saved_queries', {
|
export const savedQueries = sqliteTable('saved_queries', {
|
||||||
id: text('id').primaryKey(),
|
id: text('id').primaryKey(),
|
||||||
|
|||||||
173
services/api/src/events/__tests__/bus.test.ts
Normal file
173
services/api/src/events/__tests__/bus.test.ts
Normal file
@@ -0,0 +1,173 @@
|
|||||||
|
import { describe, it, expect, vi } from 'vitest';
|
||||||
|
import { Bus, bus } from '../bus.js';
|
||||||
|
|
||||||
|
// Use a fresh Bus instance for isolation in most tests
|
||||||
|
function makeBus() {
|
||||||
|
return new Bus();
|
||||||
|
}
|
||||||
|
|
||||||
|
describe('EventBus — delivery', () => {
|
||||||
|
it('delivers a published event to subscribers', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
const handler = vi.fn();
|
||||||
|
b.subscribe('signals.tip.served', handler);
|
||||||
|
|
||||||
|
const payload = { userId: 'u1', tipId: 'tip:1', policy: 'random', servedAt: new Date().toISOString() };
|
||||||
|
b.publish('signals.tip.served', payload);
|
||||||
|
|
||||||
|
expect(handler).toHaveBeenCalledOnce();
|
||||||
|
expect(handler).toHaveBeenCalledWith(payload);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('delivers to multiple subscribers on the same subject', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
const h1 = vi.fn();
|
||||||
|
const h2 = vi.fn();
|
||||||
|
b.subscribe('signals.tip.served', h1);
|
||||||
|
b.subscribe('signals.tip.served', h2);
|
||||||
|
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't', policy: 'p', servedAt: '' });
|
||||||
|
|
||||||
|
expect(h1).toHaveBeenCalledOnce();
|
||||||
|
expect(h2).toHaveBeenCalledOnce();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('does not deliver to handlers on a different subject', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
const feedbackHandler = vi.fn();
|
||||||
|
b.subscribe('signals.tip.feedback', feedbackHandler);
|
||||||
|
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't', policy: 'p', servedAt: '' });
|
||||||
|
|
||||||
|
expect(feedbackHandler).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('does not call a handler after bus.off()', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
const handler = vi.fn();
|
||||||
|
b.subscribe('signals.tip.served', handler);
|
||||||
|
b.off('signals.tip.served', handler);
|
||||||
|
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't', policy: 'p', servedAt: '' });
|
||||||
|
|
||||||
|
expect(handler).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('does not throw when publishing with no subscribers', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
expect(() =>
|
||||||
|
b.publish('signals.task.synced', { userId: 'u', count: 3, syncedAt: '' }),
|
||||||
|
).not.toThrow();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('reward maps correctly: done=1, dismiss=-1, snooze=0', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
const cases: Array<['done' | 'dismiss' | 'snooze', number]> = [
|
||||||
|
['done', 1.0],
|
||||||
|
['dismiss', -1.0],
|
||||||
|
['snooze', 0.0],
|
||||||
|
];
|
||||||
|
|
||||||
|
for (const [action, expected] of cases) {
|
||||||
|
const handler = vi.fn();
|
||||||
|
b.subscribe('signals.tip.feedback', handler);
|
||||||
|
|
||||||
|
const payload = {
|
||||||
|
userId: 'u1',
|
||||||
|
tipId: 'todoist:42',
|
||||||
|
action,
|
||||||
|
reward: action === 'done' ? 1.0 : action === 'dismiss' ? -1.0 : 0.0,
|
||||||
|
dwellMs: null,
|
||||||
|
createdAt: new Date().toISOString(),
|
||||||
|
};
|
||||||
|
b.publish('signals.tip.feedback', payload);
|
||||||
|
|
||||||
|
expect(handler).toHaveBeenCalledWith(expect.objectContaining({ reward: expected }));
|
||||||
|
b.off('signals.tip.feedback', handler);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('EventBus — ring buffer / tail()', () => {
|
||||||
|
it('tail() returns published events', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
b.publish('signals.tip.served', { userId: 'u1', tipId: 't1', policy: 'p', servedAt: '' });
|
||||||
|
b.publish('signals.tip.served', { userId: 'u2', tipId: 't2', policy: 'p', servedAt: '' });
|
||||||
|
|
||||||
|
const events = b.tail();
|
||||||
|
expect(events.length).toBeGreaterThanOrEqual(2);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('tail() filters by subject prefix', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't', policy: 'p', servedAt: '' });
|
||||||
|
b.publish('signals.task.synced', { userId: 'u', count: 1, syncedAt: '' });
|
||||||
|
|
||||||
|
const tipEvents = b.tail({ subject: 'signals.tip' });
|
||||||
|
expect(tipEvents.every((e) => e.subject.startsWith('signals.tip'))).toBe(true);
|
||||||
|
|
||||||
|
const taskEvents = b.tail({ subject: 'signals.task' });
|
||||||
|
expect(taskEvents.every((e) => e.subject.startsWith('signals.task'))).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('tail() filters by userId', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
b.publish('signals.tip.served', { userId: 'alice', tipId: 't1', policy: 'p', servedAt: '' });
|
||||||
|
b.publish('signals.tip.served', { userId: 'bob', tipId: 't2', policy: 'p', servedAt: '' });
|
||||||
|
|
||||||
|
const aliceEvents = b.tail({ userId: 'alice' });
|
||||||
|
expect(aliceEvents.every((e) => (e.payload as any).userId === 'alice')).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('tail() respects limit', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
for (let i = 0; i < 10; i++) {
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: `t${i}`, policy: 'p', servedAt: '' });
|
||||||
|
}
|
||||||
|
const events = b.tail({ limit: 3 });
|
||||||
|
expect(events).toHaveLength(3);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('tail() returns only events after `since` id', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't1', policy: 'p', servedAt: '' });
|
||||||
|
const snap = b.tail();
|
||||||
|
const lastId = snap[snap.length - 1].id;
|
||||||
|
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't2', policy: 'p', servedAt: '' });
|
||||||
|
|
||||||
|
const after = b.tail({ since: lastId });
|
||||||
|
expect(after).toHaveLength(1);
|
||||||
|
expect((after[0].payload as any).tipId).toBe('t2');
|
||||||
|
});
|
||||||
|
|
||||||
|
it('assigns monotonically increasing ids', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't1', policy: 'p', servedAt: '' });
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: 't2', policy: 'p', servedAt: '' });
|
||||||
|
|
||||||
|
const events = b.tail();
|
||||||
|
const ids = events.map((e) => e.id);
|
||||||
|
for (let i = 1; i < ids.length; i++) {
|
||||||
|
expect(ids[i]).toBeGreaterThan(ids[i - 1]);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('ring buffer caps at 500 entries and evicts oldest', () => {
|
||||||
|
const b = makeBus();
|
||||||
|
// Publish 502 events — the first two should be evicted
|
||||||
|
for (let i = 0; i < 502; i++) {
|
||||||
|
b.publish('signals.tip.served', { userId: 'u', tipId: `t${i}`, policy: 'p', servedAt: '' });
|
||||||
|
}
|
||||||
|
const all = b.tail({ limit: 1000 });
|
||||||
|
expect(all).toHaveLength(500);
|
||||||
|
// Oldest surviving entry should be the 3rd published (index 2)
|
||||||
|
expect((all[0].payload as any).tipId).toBe('t2');
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('EventBus — singleton bus export', () => {
|
||||||
|
it('singleton bus is a Bus instance', () => {
|
||||||
|
expect(bus).toBeInstanceOf(Bus);
|
||||||
|
});
|
||||||
|
});
|
||||||
@@ -22,8 +22,9 @@ export type TipServedEvent = {
|
|||||||
export type TipFeedbackEvent = {
|
export type TipFeedbackEvent = {
|
||||||
userId: string;
|
userId: string;
|
||||||
tipId: string;
|
tipId: string;
|
||||||
action: 'done' | 'dismiss' | 'snooze' | 'helpful' | 'not_helpful';
|
action: 'done' | 'dismiss' | 'snooze';
|
||||||
reward: number;
|
reward: number; // inferred from action + dwellMs (see inferReward in recommender.ts)
|
||||||
|
dwellMs: number | null;
|
||||||
createdAt: string;
|
createdAt: string;
|
||||||
};
|
};
|
||||||
|
|
||||||
@@ -91,4 +92,5 @@ class Bus extends EventEmitter {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
export { Bus };
|
||||||
export const bus = new Bus();
|
export const bus = new Bus();
|
||||||
|
|||||||
109
services/api/src/middleware/__tests__/admin.test.ts
Normal file
109
services/api/src/middleware/__tests__/admin.test.ts
Normal file
@@ -0,0 +1,109 @@
|
|||||||
|
import { describe, it, expect, vi, beforeEach } from 'vitest';
|
||||||
|
import type { Response, NextFunction } from 'express';
|
||||||
|
import type { AuthenticatedRequest } from '../session.js';
|
||||||
|
|
||||||
|
// Mock the db module so requireAdmin uses our test db
|
||||||
|
const mockSelect = vi.fn();
|
||||||
|
vi.mock('../../db/index.js', () => ({ db: { select: mockSelect } }));
|
||||||
|
|
||||||
|
// Import AFTER mock is set up
|
||||||
|
const { requireAdmin } = await import('../admin.js');
|
||||||
|
|
||||||
|
function makeRes() {
|
||||||
|
const json = vi.fn();
|
||||||
|
const status = vi.fn().mockReturnValue({ json });
|
||||||
|
return { status, json, _status: status, _json: json } as unknown as Response & {
|
||||||
|
_status: ReturnType<typeof vi.fn>;
|
||||||
|
_json: ReturnType<typeof vi.fn>;
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
function makeReq(userId?: string): AuthenticatedRequest {
|
||||||
|
return { userId } as AuthenticatedRequest;
|
||||||
|
}
|
||||||
|
|
||||||
|
describe('requireAdmin middleware', () => {
|
||||||
|
beforeEach(() => {
|
||||||
|
vi.clearAllMocks();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('calls next() when user has role=admin', async () => {
|
||||||
|
mockSelect.mockReturnValue({
|
||||||
|
from: vi.fn().mockReturnValue({
|
||||||
|
where: vi.fn().mockReturnValue({
|
||||||
|
limit: vi.fn().mockResolvedValue([{ role: 'admin' }]),
|
||||||
|
}),
|
||||||
|
}),
|
||||||
|
});
|
||||||
|
|
||||||
|
const next: NextFunction = vi.fn() as unknown as NextFunction;
|
||||||
|
await requireAdmin(makeReq('user-1'), makeRes(), next);
|
||||||
|
expect(next).toHaveBeenCalledOnce();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 403 when user has role=user', async () => {
|
||||||
|
mockSelect.mockReturnValue({
|
||||||
|
from: vi.fn().mockReturnValue({
|
||||||
|
where: vi.fn().mockReturnValue({
|
||||||
|
limit: vi.fn().mockResolvedValue([{ role: 'user' }]),
|
||||||
|
}),
|
||||||
|
}),
|
||||||
|
});
|
||||||
|
|
||||||
|
const res = makeRes();
|
||||||
|
const next: NextFunction = vi.fn() as unknown as NextFunction;
|
||||||
|
await requireAdmin(makeReq('user-2'), res, next);
|
||||||
|
|
||||||
|
expect(res.status).toHaveBeenCalledWith(403);
|
||||||
|
expect(next).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 403 when user is not found', async () => {
|
||||||
|
mockSelect.mockReturnValue({
|
||||||
|
from: vi.fn().mockReturnValue({
|
||||||
|
where: vi.fn().mockReturnValue({
|
||||||
|
limit: vi.fn().mockResolvedValue([]),
|
||||||
|
}),
|
||||||
|
}),
|
||||||
|
});
|
||||||
|
|
||||||
|
const res = makeRes();
|
||||||
|
const next: NextFunction = vi.fn() as unknown as NextFunction;
|
||||||
|
await requireAdmin(makeReq('unknown'), res, next);
|
||||||
|
|
||||||
|
expect(res.status).toHaveBeenCalledWith(403);
|
||||||
|
expect(next).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 403 when userId is undefined (unauthenticated request)', async () => {
|
||||||
|
// DB will return empty — userId is undefined so the query matches nothing
|
||||||
|
mockSelect.mockReturnValue({
|
||||||
|
from: vi.fn().mockReturnValue({
|
||||||
|
where: vi.fn().mockReturnValue({
|
||||||
|
limit: vi.fn().mockResolvedValue([]),
|
||||||
|
}),
|
||||||
|
}),
|
||||||
|
});
|
||||||
|
|
||||||
|
const res = makeRes();
|
||||||
|
const next: NextFunction = vi.fn() as unknown as NextFunction;
|
||||||
|
await requireAdmin(makeReq(undefined), res, next);
|
||||||
|
|
||||||
|
expect(res.status).toHaveBeenCalledWith(403);
|
||||||
|
expect(next).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('propagates DB errors (does not swallow exceptions)', async () => {
|
||||||
|
mockSelect.mockReturnValue({
|
||||||
|
from: vi.fn().mockReturnValue({
|
||||||
|
where: vi.fn().mockReturnValue({
|
||||||
|
limit: vi.fn().mockRejectedValue(new Error('DB down')),
|
||||||
|
}),
|
||||||
|
}),
|
||||||
|
});
|
||||||
|
|
||||||
|
const res = makeRes();
|
||||||
|
const next: NextFunction = vi.fn() as unknown as NextFunction;
|
||||||
|
await expect(requireAdmin(makeReq('user-1'), res, next)).rejects.toThrow('DB down');
|
||||||
|
});
|
||||||
|
});
|
||||||
27
services/api/src/middleware/admin.ts
Normal file
27
services/api/src/middleware/admin.ts
Normal file
@@ -0,0 +1,27 @@
|
|||||||
|
import { Response, NextFunction } from 'express';
|
||||||
|
import { db } from '../db/index.js';
|
||||||
|
import { users } from '../db/schema.js';
|
||||||
|
import { eq } from 'drizzle-orm';
|
||||||
|
import { AuthenticatedRequest } from './session.js';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Requires the session user to have role='admin'.
|
||||||
|
* Must be used after requireAuth (which sets req.userId).
|
||||||
|
*/
|
||||||
|
export async function requireAdmin(
|
||||||
|
req: AuthenticatedRequest,
|
||||||
|
res: Response,
|
||||||
|
next: NextFunction,
|
||||||
|
) {
|
||||||
|
const [user] = await db
|
||||||
|
.select({ role: users.role })
|
||||||
|
.from(users)
|
||||||
|
.where(eq(users.id, req.userId!))
|
||||||
|
.limit(1);
|
||||||
|
|
||||||
|
if (!user || user.role !== 'admin') {
|
||||||
|
res.status(403).json({ error: 'Forbidden' });
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
next();
|
||||||
|
}
|
||||||
370
services/api/src/routes/__tests__/admin.test.ts
Normal file
370
services/api/src/routes/__tests__/admin.test.ts
Normal file
@@ -0,0 +1,370 @@
|
|||||||
|
/**
|
||||||
|
* Admin route integration tests.
|
||||||
|
*
|
||||||
|
* A real Express app + in-memory SQLite DB per test suite.
|
||||||
|
* Auth and admin middleware are mocked so we can focus on route logic.
|
||||||
|
*/
|
||||||
|
import { describe, it, expect, vi, beforeAll } from 'vitest';
|
||||||
|
import express from 'express';
|
||||||
|
import * as http from 'http';
|
||||||
|
import { makeTestDb } from '../../test/db.js';
|
||||||
|
import { users, integrationTokens, tipViews, tipFeedback } from '../../db/schema.js';
|
||||||
|
|
||||||
|
// ---- in-memory DB ----
|
||||||
|
const testDb = makeTestDb();
|
||||||
|
|
||||||
|
vi.mock('../../db/index.js', () => ({ db: testDb }));
|
||||||
|
|
||||||
|
// Bypass auth — all requests arrive pre-authenticated as 'admin-1'
|
||||||
|
vi.mock('../../middleware/session.js', () => ({
|
||||||
|
sessionMiddleware: (_req: express.Request, _res: express.Response, next: express.NextFunction) =>
|
||||||
|
next(),
|
||||||
|
requireAuth: (req: express.Request, _res: express.Response, next: express.NextFunction) => {
|
||||||
|
(req as any).userId = 'admin-1';
|
||||||
|
next();
|
||||||
|
},
|
||||||
|
}));
|
||||||
|
|
||||||
|
vi.mock('../../middleware/admin.js', () => ({
|
||||||
|
requireAdmin: (_req: express.Request, _res: express.Response, next: express.NextFunction) =>
|
||||||
|
next(),
|
||||||
|
}));
|
||||||
|
|
||||||
|
const { adminRouter } = await import('../admin.js');
|
||||||
|
|
||||||
|
// ---- seed ----
|
||||||
|
const NOW = new Date().toISOString();
|
||||||
|
const DAY_AGO = new Date(Date.now() - 23 * 60 * 60 * 1000).toISOString();
|
||||||
|
|
||||||
|
beforeAll(async () => {
|
||||||
|
await testDb.insert(users).values([
|
||||||
|
{ id: 'admin-1', email: 'admin@test.com', role: 'admin', consentGiven: true, consentAt: NOW, createdAt: NOW },
|
||||||
|
{ id: 'user-1', email: 'alice@test.com', role: 'user', consentGiven: true, consentAt: NOW, createdAt: NOW },
|
||||||
|
{ id: 'user-2', email: 'bob@test.com', role: 'user', consentGiven: false, createdAt: NOW },
|
||||||
|
]);
|
||||||
|
await testDb.insert(integrationTokens).values([
|
||||||
|
{ id: 'tok-1', userId: 'user-1', provider: 'todoist', accessToken: 'secret', connectedAt: NOW },
|
||||||
|
]);
|
||||||
|
await testDb.insert(tipViews).values([
|
||||||
|
{ id: 'tv-1', userId: 'user-1', tipId: 'tip:a', servedAt: DAY_AGO },
|
||||||
|
{ id: 'tv-2', userId: 'user-1', tipId: 'tip:b', servedAt: NOW },
|
||||||
|
{ id: 'tv-3', userId: 'user-2', tipId: 'tip:c', servedAt: NOW },
|
||||||
|
]);
|
||||||
|
await testDb.insert(tipFeedback).values([
|
||||||
|
{ id: 'tf-1', userId: 'user-1', tipId: 'tip:a', action: 'done', createdAt: DAY_AGO },
|
||||||
|
{ id: 'tf-2', userId: 'user-1', tipId: 'tip:b', action: 'snooze', createdAt: NOW },
|
||||||
|
]);
|
||||||
|
});
|
||||||
|
|
||||||
|
// ---- test helpers ----
|
||||||
|
function buildApp() {
|
||||||
|
const app = express();
|
||||||
|
app.use(express.json());
|
||||||
|
app.use('/api/admin', adminRouter);
|
||||||
|
return app;
|
||||||
|
}
|
||||||
|
|
||||||
|
function call(
|
||||||
|
server: http.Server,
|
||||||
|
method: string,
|
||||||
|
path: string,
|
||||||
|
body?: unknown,
|
||||||
|
): Promise<{ status: number; body: unknown }> {
|
||||||
|
return new Promise((resolve, reject) => {
|
||||||
|
const { port } = server.address() as { port: number };
|
||||||
|
const req = http.request(
|
||||||
|
{ method, hostname: '127.0.0.1', port, path, headers: { 'Content-Type': 'application/json' } },
|
||||||
|
(res) => {
|
||||||
|
let data = '';
|
||||||
|
res.on('data', (c) => (data += c));
|
||||||
|
res.on('end', () => {
|
||||||
|
try { resolve({ status: res.statusCode!, body: JSON.parse(data) }); }
|
||||||
|
catch { resolve({ status: res.statusCode!, body: data }); }
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
req.on('error', reject);
|
||||||
|
if (body) req.write(JSON.stringify(body));
|
||||||
|
req.end();
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
function startServer(app: express.Application): Promise<{ server: http.Server; call: (method: string, path: string, body?: unknown) => ReturnType<typeof call> }> {
|
||||||
|
return new Promise((resolve) => {
|
||||||
|
const server = http.createServer(app);
|
||||||
|
server.listen(0, () =>
|
||||||
|
resolve({ server, call: (m, p, b) => call(server, m, p, b) }),
|
||||||
|
);
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---- tests ----
|
||||||
|
describe('GET /api/admin/stats', () => {
|
||||||
|
it('returns dau, wau, tips, reactions, user totals', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/stats');
|
||||||
|
const b = body as Record<string, unknown>;
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(typeof b.dau).toBe('number');
|
||||||
|
expect(typeof b.wau).toBe('number');
|
||||||
|
expect(b.tipsServedLast7d).toBeGreaterThanOrEqual(3);
|
||||||
|
expect(b.totalUsers).toBe(3);
|
||||||
|
expect(b.activatedUsers).toBeGreaterThanOrEqual(2);
|
||||||
|
expect(b.reactionsLast7d).toBeDefined();
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/users', () => {
|
||||||
|
it('returns paginated list with total', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/users?limit=10&offset=0');
|
||||||
|
const b = body as { users: unknown[]; total: number };
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(b.total).toBe(3);
|
||||||
|
expect(b.users).toHaveLength(3);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('respects limit', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/users?limit=2&offset=0');
|
||||||
|
const b = body as { users: unknown[]; total: number };
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(b.users).toHaveLength(2);
|
||||||
|
expect(b.total).toBe(3);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/users/:id', () => {
|
||||||
|
it('returns user detail with integrations and tip stats', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/users/user-1');
|
||||||
|
const b = body as {
|
||||||
|
user: { email: string; role: string };
|
||||||
|
integrations: { provider: string }[];
|
||||||
|
tipsServed: number;
|
||||||
|
recentFeedback: unknown[];
|
||||||
|
};
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(b.user.email).toBe('alice@test.com');
|
||||||
|
expect(b.user.role).toBe('user');
|
||||||
|
expect(b.integrations).toHaveLength(1);
|
||||||
|
expect(b.integrations[0].provider).toBe('todoist');
|
||||||
|
expect(b.tipsServed).toBe(2);
|
||||||
|
expect(b.recentFeedback).toHaveLength(2);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 404 for unknown user', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status } = await call('GET', '/api/admin/users/nonexistent');
|
||||||
|
expect(status).toBe(404);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/audit', () => {
|
||||||
|
it('returns list and total', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/audit');
|
||||||
|
const b = body as { actions: unknown[]; total: number };
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(Array.isArray(b.actions)).toBe(true);
|
||||||
|
expect(typeof b.total).toBe('number');
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('POST /api/admin/users/:id/revoke-integration', () => {
|
||||||
|
it('removes the integration and writes an audit entry', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call(
|
||||||
|
'POST', '/api/admin/users/user-1/revoke-integration', { provider: 'todoist' },
|
||||||
|
);
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect((body as { ok: boolean }).ok).toBe(true);
|
||||||
|
|
||||||
|
// Integration should be gone
|
||||||
|
const detail = await call('GET', '/api/admin/users/user-1');
|
||||||
|
expect((detail.body as { integrations: unknown[] }).integrations).toHaveLength(0);
|
||||||
|
|
||||||
|
// Audit log should contain the action
|
||||||
|
const audit = await call('GET', '/api/admin/audit');
|
||||||
|
const actions = (audit.body as { actions: { action: string }[] }).actions;
|
||||||
|
expect(actions.some((x) => x.action === 'revoke_integration')).toBe(true);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 404 for non-existent integration', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status } = await call(
|
||||||
|
'POST', '/api/admin/users/user-2/revoke-integration', { provider: 'todoist' },
|
||||||
|
);
|
||||||
|
expect(status).toBe(404);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 400 when provider is missing', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status } = await call('POST', '/api/admin/users/user-1/revoke-integration', {});
|
||||||
|
expect(status).toBe(400);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('returns 404 when target user does not exist', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status } = await call(
|
||||||
|
'POST', '/api/admin/users/ghost/revoke-integration', { provider: 'todoist' },
|
||||||
|
);
|
||||||
|
expect(status).toBe(404);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/users — pagination', () => {
|
||||||
|
it('offset skips rows', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const page0 = await call('GET', '/api/admin/users?limit=2&offset=0');
|
||||||
|
const page1 = await call('GET', '/api/admin/users?limit=2&offset=2');
|
||||||
|
const b0 = page0.body as { users: { id: string }[]; total: number };
|
||||||
|
const b1 = page1.body as { users: { id: string }[]; total: number };
|
||||||
|
expect(b0.users).toHaveLength(2);
|
||||||
|
expect(b1.users).toHaveLength(1);
|
||||||
|
// total stays constant
|
||||||
|
expect(b0.total).toBe(3);
|
||||||
|
expect(b1.total).toBe(3);
|
||||||
|
// no overlap
|
||||||
|
const ids0 = b0.users.map((u) => u.id);
|
||||||
|
const ids1 = b1.users.map((u) => u.id);
|
||||||
|
expect(ids0.every((id) => !ids1.includes(id))).toBe(true);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('offset beyond total returns empty users array', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/users?limit=10&offset=999');
|
||||||
|
const b = body as { users: unknown[]; total: number };
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(b.users).toHaveLength(0);
|
||||||
|
expect(b.total).toBe(3);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it('limit is capped at 200', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
// Passing a huge limit should not crash and should return at most all users
|
||||||
|
const { status, body } = await call('GET', '/api/admin/users?limit=9999');
|
||||||
|
const b = body as { users: unknown[] };
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(b.users.length).toBeLessThanOrEqual(200);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/events', () => {
|
||||||
|
it('returns events array and nextSince', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/events');
|
||||||
|
const b = body as { events: unknown[]; nextSince: number };
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(Array.isArray(b.events)).toBe(true);
|
||||||
|
expect(typeof b.nextSince).toBe('number');
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/health', () => {
|
||||||
|
it('returns 200 with ok, services array, and checkedAt', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { status, body } = await call('GET', '/api/admin/health');
|
||||||
|
const b = body as { ok: boolean; services: { name: string; status: string }[]; checkedAt: string };
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(typeof b.ok).toBe('boolean');
|
||||||
|
expect(Array.isArray(b.services)).toBe(true);
|
||||||
|
expect(typeof b.checkedAt).toBe('string');
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/users/:id — edge cases', () => {
|
||||||
|
it('user with no integrations and no tips has empty arrays and 0 count', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
// user-2 has no integrations and no feedback seeded
|
||||||
|
const { status, body } = await call('GET', '/api/admin/users/user-2');
|
||||||
|
const b = body as {
|
||||||
|
user: { id: string };
|
||||||
|
integrations: unknown[];
|
||||||
|
tipsServed: number;
|
||||||
|
recentFeedback: unknown[];
|
||||||
|
};
|
||||||
|
expect(status).toBe(200);
|
||||||
|
expect(b.integrations).toHaveLength(0);
|
||||||
|
expect(b.recentFeedback).toHaveLength(0);
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('GET /api/admin/stats — field types', () => {
|
||||||
|
it('reactionsLast7d has correct action counts', async () => {
|
||||||
|
const { server, call } = await startServer(buildApp());
|
||||||
|
try {
|
||||||
|
const { body } = await call('GET', '/api/admin/stats');
|
||||||
|
const b = body as { reactionsLast7d: Record<string, number> };
|
||||||
|
// We seeded 'done' and 'snooze' feedback
|
||||||
|
expect(typeof b.reactionsLast7d['done']).toBe('number');
|
||||||
|
expect(typeof b.reactionsLast7d['snooze']).toBe('number');
|
||||||
|
} finally {
|
||||||
|
server.close();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
@@ -8,6 +8,8 @@ import {
|
|||||||
adminActions,
|
adminActions,
|
||||||
tipScores,
|
tipScores,
|
||||||
savedQueries,
|
savedQueries,
|
||||||
|
simRuns,
|
||||||
|
simEvents,
|
||||||
} from '../db/schema.js';
|
} from '../db/schema.js';
|
||||||
import { eq, desc, sql, gte, and, isNull, lt } from 'drizzle-orm';
|
import { eq, desc, sql, gte, and, isNull, lt } from 'drizzle-orm';
|
||||||
import { requireAuth, AuthenticatedRequest } from '../middleware/session.js';
|
import { requireAuth, AuthenticatedRequest } from '../middleware/session.js';
|
||||||
@@ -16,6 +18,16 @@ import { nanoid } from 'nanoid';
|
|||||||
import { bus } from '../events/bus.js';
|
import { bus } from '../events/bus.js';
|
||||||
import { config } from '../config.js';
|
import { config } from '../config.js';
|
||||||
import { getShadowPolicies, setPolicyActive } from './recommender.js';
|
import { getShadowPolicies, setPolicyActive } from './recommender.js';
|
||||||
|
import { spawn } from 'child_process';
|
||||||
|
import { existsSync, readFileSync, unlinkSync } from 'fs';
|
||||||
|
import { resolve, dirname } from 'path';
|
||||||
|
import { fileURLToPath } from 'url';
|
||||||
|
|
||||||
|
const __filename = fileURLToPath(import.meta.url);
|
||||||
|
const __dirname = dirname(__filename);
|
||||||
|
|
||||||
|
// In-memory tracker for running sim processes
|
||||||
|
const _simProcesses = new Map<string, { pid: number; startedAt: string }>();
|
||||||
|
|
||||||
const router: ExpressRouter = Router();
|
const router: ExpressRouter = Router();
|
||||||
router.use(requireAuth, requireAdmin);
|
router.use(requireAuth, requireAdmin);
|
||||||
@@ -606,4 +618,158 @@ router.delete('/saved-queries/:id', async (req: AuthenticatedRequest, res: Respo
|
|||||||
res.json({ ok: true });
|
res.json({ ok: true });
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// POST /api/admin/simulate/start
|
||||||
|
// Spawn ml/experiments/sim/runner.py in the background; return run_id.
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
router.post('/simulate/start', async (req: AuthenticatedRequest, res: Response) => {
|
||||||
|
const {
|
||||||
|
nUsers = 5,
|
||||||
|
nRounds = 20,
|
||||||
|
tasksPerRound = 8,
|
||||||
|
useLlm = false,
|
||||||
|
judgeMode = 'rule',
|
||||||
|
policies = ['linucb-v1', 'egreedy-v1'],
|
||||||
|
} = req.body as {
|
||||||
|
nUsers?: number;
|
||||||
|
nRounds?: number;
|
||||||
|
tasksPerRound?: number;
|
||||||
|
useLlm?: boolean;
|
||||||
|
judgeMode?: 'rule' | 'llm' | 'claude-code';
|
||||||
|
policies?: string[];
|
||||||
|
};
|
||||||
|
|
||||||
|
if (policies.length < 2) {
|
||||||
|
res.status(400).json({ error: 'At least two policies required' });
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const id = nanoid();
|
||||||
|
const now = new Date().toISOString();
|
||||||
|
|
||||||
|
await db.insert(simRuns).values({
|
||||||
|
id,
|
||||||
|
policyA: policies[0],
|
||||||
|
policyB: policies[1],
|
||||||
|
nUsers,
|
||||||
|
nRounds,
|
||||||
|
tasksPerRound,
|
||||||
|
useLlm,
|
||||||
|
status: 'running',
|
||||||
|
createdAt: now,
|
||||||
|
});
|
||||||
|
|
||||||
|
const runnerPath = resolve(__dirname, '../../../../ml/experiments/sim/runner.py');
|
||||||
|
const venvPython = resolve(__dirname, '../../../../ml/serving/.venv/bin/python');
|
||||||
|
const pythonBin = existsSync(venvPython) ? venvPython : 'python3';
|
||||||
|
const outPath = `/tmp/oo-sim-${id}.json`;
|
||||||
|
|
||||||
|
const args = [
|
||||||
|
runnerPath,
|
||||||
|
'--n-users', String(nUsers),
|
||||||
|
'--n-rounds', String(nRounds),
|
||||||
|
'--tasks-per-round', String(tasksPerRound),
|
||||||
|
'--ml-url', config.ML_SERVING_URL,
|
||||||
|
'--policies', ...policies,
|
||||||
|
'--out', outPath,
|
||||||
|
'--judge', judgeMode === 'llm' ? 'llm' : judgeMode === 'claude-code' ? 'rule' : 'rule',
|
||||||
|
// claude-code mode isn't auto-runnable from the API (requires human in the loop)
|
||||||
|
// it falls back to rule judge when triggered from the panel
|
||||||
|
];
|
||||||
|
|
||||||
|
const child = spawn(pythonBin, args, { stdio: ['ignore', 'pipe', 'pipe'] });
|
||||||
|
|
||||||
|
if (child.pid) {
|
||||||
|
_simProcesses.set(id, { pid: child.pid, startedAt: now });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Capture stderr for debugging
|
||||||
|
const stderrLines: string[] = [];
|
||||||
|
child.stderr?.on('data', (d: Buffer) => stderrLines.push(d.toString()));
|
||||||
|
|
||||||
|
child.on('exit', async (code) => {
|
||||||
|
_simProcesses.delete(id);
|
||||||
|
const finishedAt = new Date().toISOString();
|
||||||
|
|
||||||
|
if (code === 0 && existsSync(outPath)) {
|
||||||
|
try {
|
||||||
|
const raw = JSON.parse(readFileSync(outPath, 'utf-8'));
|
||||||
|
|
||||||
|
// Bulk-insert sim events
|
||||||
|
const eventRows = (raw.events ?? []).map((ev: Record<string, unknown>) => ({
|
||||||
|
id: nanoid(),
|
||||||
|
runId: id,
|
||||||
|
round: Number(ev.round),
|
||||||
|
userId: String(ev.user_id),
|
||||||
|
persona: String(ev.persona),
|
||||||
|
policy: String(ev.policy),
|
||||||
|
tipContent: String(ev.tip_content),
|
||||||
|
priority: Number(ev.priority),
|
||||||
|
isOverdue: Boolean(ev.is_overdue),
|
||||||
|
action: String(ev.action),
|
||||||
|
dwellMs: ev.dwell_ms != null ? Number(ev.dwell_ms) : null,
|
||||||
|
rewardMilli: Math.round(Number(ev.reward) * 1000),
|
||||||
|
hour: Number(ev.hour),
|
||||||
|
dayOfWeek: Number(ev.day_of_week),
|
||||||
|
createdAt: now,
|
||||||
|
}));
|
||||||
|
|
||||||
|
for (const row of eventRows) {
|
||||||
|
await db.insert(simEvents).values(row).catch(() => {});
|
||||||
|
}
|
||||||
|
|
||||||
|
await db.update(simRuns).set({
|
||||||
|
status: 'done',
|
||||||
|
summaryJson: JSON.stringify(raw.summary),
|
||||||
|
winner: raw.winner,
|
||||||
|
personaBreakdownJson: JSON.stringify(raw.persona_breakdown),
|
||||||
|
finishedAt,
|
||||||
|
}).where(eq(simRuns.id, id));
|
||||||
|
|
||||||
|
try { unlinkSync(outPath); } catch { /* ignore */ }
|
||||||
|
} catch (e) {
|
||||||
|
await db.update(simRuns).set({ status: 'failed', finishedAt }).where(eq(simRuns.id, id));
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
await db.update(simRuns).set({ status: 'failed', finishedAt }).where(eq(simRuns.id, id));
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
res.json({ id, status: 'running' });
|
||||||
|
});
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// GET /api/admin/simulate/runs
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
router.get('/simulate/runs', async (_req: AuthenticatedRequest, res: Response) => {
|
||||||
|
const rows = await db
|
||||||
|
.select()
|
||||||
|
.from(simRuns)
|
||||||
|
.orderBy(desc(simRuns.createdAt))
|
||||||
|
.limit(50);
|
||||||
|
res.json({ runs: rows });
|
||||||
|
});
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// GET /api/admin/simulate/:id
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
router.get('/simulate/:id', async (req: AuthenticatedRequest, res: Response) => {
|
||||||
|
const { id } = req.params as { id: string };
|
||||||
|
const [run] = await db.select().from(simRuns).where(eq(simRuns.id, id)).limit(1);
|
||||||
|
if (!run) {
|
||||||
|
res.status(404).json({ error: 'Run not found' });
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const events = await db
|
||||||
|
.select()
|
||||||
|
.from(simEvents)
|
||||||
|
.where(eq(simEvents.runId, id))
|
||||||
|
.orderBy(simEvents.round)
|
||||||
|
.limit(5000);
|
||||||
|
|
||||||
|
const isRunning = _simProcesses.has(id);
|
||||||
|
res.json({ run: { ...run, isRunning }, events });
|
||||||
|
});
|
||||||
|
|
||||||
export { router as adminRouter };
|
export { router as adminRouter };
|
||||||
|
|||||||
@@ -2,7 +2,7 @@ import { type Router as ExpressRouter, Router, Response } from 'express';
|
|||||||
import { nanoid } from 'nanoid';
|
import { nanoid } from 'nanoid';
|
||||||
import { db } from '../db/index.js';
|
import { db } from '../db/index.js';
|
||||||
import { integrationTokens, tipFeedback, tipViews, tipScores } from '../db/schema.js';
|
import { integrationTokens, tipFeedback, tipViews, tipScores } from '../db/schema.js';
|
||||||
import { eq, and } from 'drizzle-orm';
|
import { eq, and, desc } from 'drizzle-orm';
|
||||||
import { requireAuth, AuthenticatedRequest } from '../middleware/session.js';
|
import { requireAuth, AuthenticatedRequest } from '../middleware/session.js';
|
||||||
import { config } from '../config.js';
|
import { config } from '../config.js';
|
||||||
import { bus } from '../events/bus.js';
|
import { bus } from '../events/bus.js';
|
||||||
@@ -105,7 +105,7 @@ async function fetchTodoistTasks(userId: string, accessToken: string): Promise<C
|
|||||||
async function remotePolicy(
|
async function remotePolicy(
|
||||||
userId: string,
|
userId: string,
|
||||||
tasks: CachedTask[],
|
tasks: CachedTask[],
|
||||||
): Promise<{ tipId: string; score: number } | null> {
|
): Promise<{ tipId: string; score: number; policy: string } | null> {
|
||||||
const hour = new Date().getHours();
|
const hour = new Date().getHours();
|
||||||
const dayOfWeek = new Date().getDay();
|
const dayOfWeek = new Date().getDay();
|
||||||
|
|
||||||
@@ -121,8 +121,9 @@ async function remotePolicy(
|
|||||||
context: { hour_of_day: hour, day_of_week: dayOfWeek },
|
context: { hour_of_day: hour, day_of_week: dayOfWeek },
|
||||||
};
|
};
|
||||||
|
|
||||||
|
// Active policy: egreedy-v1 (selected over linucb-v1 after offline sim — ADR-0007)
|
||||||
try {
|
try {
|
||||||
const res = await fetch(`${config.ML_SERVING_URL}/score`, {
|
const res = await fetch(`${config.ML_SERVING_URL}/score/egreedy`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify(body),
|
body: JSON.stringify(body),
|
||||||
@@ -130,7 +131,7 @@ async function remotePolicy(
|
|||||||
});
|
});
|
||||||
if (!res.ok) return null;
|
if (!res.ok) return null;
|
||||||
const data = (await res.json()) as { tip_id: string; score: number };
|
const data = (await res.json()) as { tip_id: string; score: number };
|
||||||
return { tipId: data.tip_id, score: data.score };
|
return { tipId: data.tip_id, score: data.score, policy: 'egreedy-v1' };
|
||||||
} catch {
|
} catch {
|
||||||
return null;
|
return null;
|
||||||
}
|
}
|
||||||
@@ -178,7 +179,7 @@ router.post('/recommend', requireAuth, async (req: AuthenticatedRequest, res: Re
|
|||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
const policy = scored ? 'linucb-v1' : 'random';
|
const policy = scored ? scored.policy : 'random';
|
||||||
const servedAt = new Date().toISOString();
|
const servedAt = new Date().toISOString();
|
||||||
|
|
||||||
await db.insert(tipViews).values({ id: nanoid(), userId: req.userId!, tipId: tip.id, servedAt });
|
await db.insert(tipViews).values({ id: nanoid(), userId: req.userId!, tipId: tip.id, servedAt });
|
||||||
@@ -226,55 +227,85 @@ router.post('/recommend', requireAuth, async (req: AuthenticatedRequest, res: Re
|
|||||||
res.json({ tip });
|
res.json({ tip });
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// Reward inference from action + dwell time
|
||||||
|
//
|
||||||
|
// Feedback is now 3 signals only: done / snooze / dismiss.
|
||||||
|
// "Helpfulness" is inferred from how long the user took to act on a tip:
|
||||||
|
// dismiss → -1.0 (clear rejection)
|
||||||
|
// snooze → +0.1 (tip noticed, timing off — mild positive)
|
||||||
|
// done < 15 s → -0.3 (almost certainly a stale task, not magic)
|
||||||
|
// done 15 s – 2 min → +1.0 (magic zone: user saw tip and acted)
|
||||||
|
// done 2 – 10 min → +0.6 (good: user engaged, acted in same session)
|
||||||
|
// done > 10 min → +0.3 (eventually done; tip may have helped, unclear)
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
function inferReward(action: string, dwellMs: number | null): number {
|
||||||
|
if (action === 'dismiss') return -1.0;
|
||||||
|
if (action === 'snooze') return 0.1;
|
||||||
|
// done — use dwell time
|
||||||
|
if (dwellMs === null || dwellMs < 0) return 0.5; // unknown dwell: neutral positive
|
||||||
|
if (dwellMs < 15_000) return -0.3; // stale / reflex
|
||||||
|
if (dwellMs < 120_000) return 1.0; // magic zone
|
||||||
|
if (dwellMs < 600_000) return 0.6; // good
|
||||||
|
return 0.3; // eventually
|
||||||
|
}
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
// ---------------------------------------------------------------------------
|
||||||
// POST /api/tip/:id/feedback
|
// POST /api/tip/:id/feedback
|
||||||
// ---------------------------------------------------------------------------
|
// ---------------------------------------------------------------------------
|
||||||
router.post('/tip/:id/feedback', requireAuth, async (req: AuthenticatedRequest, res: Response) => {
|
router.post('/tip/:id/feedback', requireAuth, async (req: AuthenticatedRequest, res: Response) => {
|
||||||
const { action } = req.body as { action: string };
|
const { action } = req.body as { action: string };
|
||||||
const tipId = String(req.params.id);
|
const tipId = String(req.params.id);
|
||||||
|
const now = new Date();
|
||||||
|
|
||||||
const validActions = ['done', 'dismiss', 'snooze', 'helpful', 'not_helpful'];
|
const validActions = ['done', 'dismiss', 'snooze'];
|
||||||
if (!validActions.includes(action)) {
|
if (!validActions.includes(action)) {
|
||||||
res.status(400).json({ error: 'Invalid action' });
|
res.status(400).json({ error: 'Invalid action' });
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Compute dwell time from the most recent tipViews record for this user+tip
|
||||||
|
let dwellMs: number | null = null;
|
||||||
|
const [lastView] = await db
|
||||||
|
.select({ servedAt: tipViews.servedAt })
|
||||||
|
.from(tipViews)
|
||||||
|
.where(and(eq(tipViews.userId, req.userId!), eq(tipViews.tipId, tipId)))
|
||||||
|
.orderBy(desc(tipViews.servedAt))
|
||||||
|
.limit(1);
|
||||||
|
|
||||||
|
if (lastView?.servedAt) {
|
||||||
|
dwellMs = now.getTime() - new Date(lastView.servedAt).getTime();
|
||||||
|
}
|
||||||
|
|
||||||
|
const reward = inferReward(action, dwellMs);
|
||||||
|
|
||||||
await db.insert(tipFeedback).values({
|
await db.insert(tipFeedback).values({
|
||||||
id: nanoid(),
|
id: nanoid(),
|
||||||
userId: req.userId!,
|
userId: req.userId!,
|
||||||
tipId,
|
tipId,
|
||||||
action,
|
action,
|
||||||
sourceId: tipId.startsWith('todoist:') ? tipId.slice(8) : null,
|
sourceId: tipId.startsWith('todoist:') ? tipId.slice(8) : null,
|
||||||
createdAt: new Date().toISOString(),
|
dwellMs: dwellMs !== null ? Math.round(dwellMs) : null,
|
||||||
|
rewardMilli: Math.round(reward * 1000),
|
||||||
|
createdAt: now.toISOString(),
|
||||||
});
|
});
|
||||||
|
|
||||||
// Map action to reward (helpful/not_helpful supplement behavioural signals)
|
|
||||||
const rewardMap: Record<string, number> = {
|
|
||||||
done: 1.0,
|
|
||||||
helpful: 0.5,
|
|
||||||
snooze: 0.0,
|
|
||||||
not_helpful: -0.5,
|
|
||||||
dismiss: -1.0,
|
|
||||||
};
|
|
||||||
const reward = rewardMap[action] ?? 0.0;
|
|
||||||
|
|
||||||
const task = taskCache.get(req.userId!)?.tasks.find((t) => t.id === tipId);
|
const task = taskCache.get(req.userId!)?.tasks.find((t) => t.id === tipId);
|
||||||
|
|
||||||
// Clear cache on behavioural actions (not on explicit helpful/not_helpful)
|
taskCache.delete(req.userId!);
|
||||||
if (['done', 'dismiss', 'snooze'].includes(action)) {
|
|
||||||
taskCache.delete(req.userId!);
|
|
||||||
}
|
|
||||||
|
|
||||||
bus.publish('signals.tip.feedback', {
|
bus.publish('signals.tip.feedback', {
|
||||||
userId: req.userId!,
|
userId: req.userId!,
|
||||||
tipId,
|
tipId,
|
||||||
action: action as 'done' | 'dismiss' | 'snooze' | 'helpful' | 'not_helpful',
|
action: action as 'done' | 'dismiss' | 'snooze',
|
||||||
reward,
|
reward,
|
||||||
createdAt: new Date().toISOString(),
|
dwellMs,
|
||||||
|
createdAt: now.toISOString(),
|
||||||
});
|
});
|
||||||
|
|
||||||
if (task) {
|
if (task) {
|
||||||
fetch(`${config.ML_SERVING_URL}/reward`, {
|
// Send reward to egreedy-v1 (active policy — ADR-0007)
|
||||||
|
fetch(`${config.ML_SERVING_URL}/reward/egreedy`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify({
|
body: JSON.stringify({
|
||||||
@@ -282,6 +313,7 @@ router.post('/tip/:id/feedback', requireAuth, async (req: AuthenticatedRequest,
|
|||||||
tip_id: tipId,
|
tip_id: tipId,
|
||||||
reward,
|
reward,
|
||||||
features: task.features,
|
features: task.features,
|
||||||
|
day_of_week: new Date().getDay(),
|
||||||
}),
|
}),
|
||||||
}).catch(() => {});
|
}).catch(() => {});
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -18,6 +18,7 @@ router.get('/me', requireAuth, async (req: AuthenticatedRequest, res: Response)
|
|||||||
email: user.email,
|
email: user.email,
|
||||||
name: user.name,
|
name: user.name,
|
||||||
image: user.image,
|
image: user.image,
|
||||||
|
role: user.role,
|
||||||
createdAt: user.createdAt,
|
createdAt: user.createdAt,
|
||||||
consentGiven: user.consentGiven,
|
consentGiven: user.consentGiven,
|
||||||
});
|
});
|
||||||
|
|||||||
84
services/api/src/test/db.ts
Normal file
84
services/api/src/test/db.ts
Normal file
@@ -0,0 +1,84 @@
|
|||||||
|
/**
|
||||||
|
* Creates an isolated in-memory SQLite DB with the full schema applied.
|
||||||
|
* Use this in tests instead of the shared `db` singleton.
|
||||||
|
*/
|
||||||
|
import Database from 'better-sqlite3';
|
||||||
|
import { drizzle } from 'drizzle-orm/better-sqlite3';
|
||||||
|
import * as schema from '../db/schema.js';
|
||||||
|
|
||||||
|
export function makeTestDb() {
|
||||||
|
const sqlite = new Database(':memory:');
|
||||||
|
sqlite.pragma('foreign_keys = ON');
|
||||||
|
|
||||||
|
sqlite.exec(`
|
||||||
|
CREATE TABLE IF NOT EXISTS users (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
email TEXT NOT NULL UNIQUE,
|
||||||
|
name TEXT,
|
||||||
|
image TEXT,
|
||||||
|
google_id TEXT UNIQUE,
|
||||||
|
role TEXT NOT NULL DEFAULT 'user',
|
||||||
|
consent_given INTEGER NOT NULL DEFAULT 0,
|
||||||
|
consent_at TEXT,
|
||||||
|
created_at TEXT NOT NULL,
|
||||||
|
deleted_at TEXT
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS integration_tokens (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
user_id TEXT NOT NULL REFERENCES users(id),
|
||||||
|
provider TEXT NOT NULL,
|
||||||
|
access_token TEXT NOT NULL,
|
||||||
|
refresh_token TEXT,
|
||||||
|
expires_at TEXT,
|
||||||
|
connected_at TEXT NOT NULL,
|
||||||
|
UNIQUE(user_id, provider)
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS tip_feedback (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
user_id TEXT NOT NULL REFERENCES users(id),
|
||||||
|
tip_id TEXT NOT NULL,
|
||||||
|
action TEXT NOT NULL,
|
||||||
|
source_id TEXT,
|
||||||
|
created_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS tip_views (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
user_id TEXT NOT NULL REFERENCES users(id),
|
||||||
|
tip_id TEXT NOT NULL,
|
||||||
|
served_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS push_subscriptions (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
user_id TEXT NOT NULL REFERENCES users(id),
|
||||||
|
endpoint TEXT NOT NULL UNIQUE,
|
||||||
|
p256dh TEXT NOT NULL,
|
||||||
|
auth TEXT NOT NULL,
|
||||||
|
created_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS sessions (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
user_id TEXT NOT NULL REFERENCES users(id),
|
||||||
|
expires_at TEXT NOT NULL,
|
||||||
|
created_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS admin_actions (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
admin_id TEXT NOT NULL REFERENCES users(id),
|
||||||
|
action TEXT NOT NULL,
|
||||||
|
target_type TEXT,
|
||||||
|
target_id TEXT,
|
||||||
|
detail TEXT,
|
||||||
|
created_at TEXT NOT NULL
|
||||||
|
);
|
||||||
|
`);
|
||||||
|
|
||||||
|
return drizzle(sqlite, { schema });
|
||||||
|
}
|
||||||
|
|
||||||
|
export type TestDb = ReturnType<typeof makeTestDb>;
|
||||||
13
services/api/vitest.config.ts
Normal file
13
services/api/vitest.config.ts
Normal file
@@ -0,0 +1,13 @@
|
|||||||
|
import { defineConfig } from 'vitest/config';
|
||||||
|
|
||||||
|
export default defineConfig({
|
||||||
|
test: {
|
||||||
|
globals: true,
|
||||||
|
environment: 'node',
|
||||||
|
coverage: {
|
||||||
|
provider: 'v8',
|
||||||
|
reporter: ['text', 'lcov'],
|
||||||
|
include: ['src/**'],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
});
|
||||||
@@ -18,6 +18,10 @@
|
|||||||
"test": {
|
"test": {
|
||||||
"dependsOn": ["^build"]
|
"dependsOn": ["^build"]
|
||||||
},
|
},
|
||||||
|
"test:e2e": {
|
||||||
|
"dependsOn": ["build"],
|
||||||
|
"cache": false
|
||||||
|
},
|
||||||
"clean": {
|
"clean": {
|
||||||
"cache": false
|
"cache": false
|
||||||
}
|
}
|
||||||
|
|||||||
Reference in New Issue
Block a user