- Remove unused `httpx` import from bench.ts (package does not exist)
- Add explicit `IRouter` type on `router` in agent-outputs.ts and bench.ts
  to resolve TS2742 portable-type errors
- Remove `docs` from .dockerignore so Dockerfile.admin can copy it into
  the runner image (DOCS_ROOT=/app/docs is read at runtime by the admin)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deletes shadowPolicies map, getShadowPolicies, setPolicyActive from
recommender.ts; removes /api/admin/policies routes from admin.ts; removes
getPolicies, togglePolicy, PolicyInfo from admin api.ts; removes the
policy toggle section from the ops page.
168 API tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST /recommend now calls ml/serving /recommend with pre-computed agent
snippets + task context instead of /generate + /score/egreedy/v2. Falls
back to a random signal candidate when ml/serving is unavailable.
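The fallback path can be pictured with the sketch below. It is written in Python for brevity although the route itself lives in the TS API. `ServingUnavailable`, `recommend`, and the "random-fallback" label are hypothetical names; only the 'orchestrator' policy label comes from this commit.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


class ServingUnavailable(Exception):
    """Hypothetical error raised when ml/serving cannot be reached."""


def recommend(
    candidates: Sequence[str],
    call_serving: Callable[[Sequence[str]], str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Return (tip, policy_label); pick a random candidate if serving fails."""
    rng = rng or random.Random()
    try:
        return call_serving(candidates), "orchestrator"
    except ServingUnavailable:
        # The fallback label is a placeholder; the commit does not say how
        # fallback tips are tagged.
        return rng.choice(list(candidates)), "random-fallback"
```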
Removes: remotePolicy, fetchLlmCandidates, sendRewardWithRetry,
candidateCache, pickPromptVersion. Feedback handler keeps inferReward +
tipFeedback writes for observability; reward delivery to the bandit is gone.
tipScores.policy is now 'orchestrator'; promptVersion is 'v4-orchestrator'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts computeAndStore() from the /agents/:agentId/compute route so it
can be called without an HTTP round-trip. startAgentPrecomputeScheduler()
runs every 15 min: fetches active users (a tip view in the last 48h), runs
all agents in parallel per user, then purges outputs that expired more than
24h ago. Agent IDs are resolved from ml/serving /health at startup, with a
hardcoded list as fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum,
time_of_day, recent_patterns, focus_area) each producing a prompt snippet
from user signals. A registry wires them up; the orchestrator prompt in
ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM.
Also wires /api/agents route in the API and updates the Dockerfile to copy
the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks:
1. collect.py — generates candidates, logs to MLflow
2. export_for_judge — exports pending runs for Claude Code scoring
3. compare — generates leaderboard by (model, prompt) cell
Config via dag_run.conf supports all collect.py options (models, prompts,
n_tips, n_scenarios, temperature, experiment name, max_model_b).
New admin API endpoints (`services/api/src/routes/bench.ts`):
- GET /api/bench/experiments — list tip-bench-* experiments
- POST /api/bench/run — trigger DAG with custom config
- GET /api/bench/runs/:experiment — list runs in experiment
- GET /api/bench/leaderboard/:experiment — leaderboard by (model, prompt)
All endpoints require admin auth. Human judge (Claude Code) scores are
applied manually post-export; a future enhancement is to add a webhook to
the DAG.
Admin UI can now trigger and monitor benchmarks from a dashboard panel.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add POST /api/auth/token — validates ADMIN_TOKEN env var, creates a 24h
session and sets the sid cookie so automated tools can access the admin
panel without Google OAuth. Admin login page gains a token input form.
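A minimal sketch of the token check, in Python for illustration (the real handler is in the TS API). `create_session_from_token` and the session dict shape are assumptions; whether the actual handler compares in constant time, or how it behaves when ADMIN_TOKEN is unset, is not stated in this commit.

```python
import hmac
import secrets
import time
from typing import Optional

SESSION_TTL_SEC = 24 * 60 * 60  # 24h session, per the commit message


def create_session_from_token(
    presented: str,
    admin_token: Optional[str],
    now: Optional[float] = None,
) -> Optional[dict]:
    """Return a session record if the presented token matches, else None."""
    if not admin_token:
        # Assumption: token login is disabled when ADMIN_TOKEN is unset.
        return None
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(presented.encode(), admin_token.encode()):
        return None
    issued = time.time() if now is None else now
    return {
        "sid": secrets.token_urlsafe(32),  # value for the sid cookie
        "expires_at": issued + SESSION_TTL_SEC,
    }
```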
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TS: pino + pino-http; every HTTP request log includes traceId from
  W3C traceparent header (generated if absent); forwarded to ml/serving
  on all /score, /generate, /reward, and /api/ml proxy calls
- Python: structlog JSON; FastAPI middleware binds trace_id via
  contextvars so every log line within a request carries it
- Sentry: optional SENTRY_DSN init in both runtimes (no-op if unset)
- Replace all console.* calls across services/api with pino logger
- Update tests to spy on logger instead of console
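For reference, a sketch of the "use traceparent if present, generate if absent" rule. It handles only the version-00 header layout (`version-traceid-parentid-flags`); the helper names and the choice of the sampled flag `01` on outgoing headers are assumptions, not taken from the services.

```python
import re
import secrets
from typing import Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def trace_id_from_header(traceparent: Optional[str]) -> str:
    """Extract the trace-id from a traceparent header, or mint a new one."""
    if traceparent:
        m = _TRACEPARENT.match(traceparent.strip())
        # Version "ff" and all-zero ids are invalid per the W3C spec.
        if (
            m
            and m.group(1) != "ff"
            and set(m.group(2)) != {"0"}
            and set(m.group(3)) != {"0"}
        ):
            return m.group(2)
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex chars


def outgoing_traceparent(trace_id: str) -> str:
    """Build a header for downstream calls with a fresh parent-id."""
    return f"00-{trace_id}-{secrets.token_hex(8)}-01"
```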
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
integrations/README — replace stale Connector interface and fictional
libsodium vault with the actual SignalSource pattern, SQLite token table,
and real OAuth routes.
recommender/README — document the SignalAggregator pipeline, current
policy registry, and actual /recommend + /feedback contract shapes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TaskSyncedPayload in shared-types and ml/serving schemas both require
source, but TaskSyncedEvent in bus.ts and the todoist publish call both
omitted it — causing the JetStream consumer to nak every task.synced
message on validation failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ship the scaffolding for #99 (phase B.3 of #81):
- ml/serving: add /score/egreedy/v2, /reward/egreedy/v2, /stats/egreedy/v2
  endpoints (D=12). New feature dims: completion/dismiss rates, mean dwell
  (clipped 10min), preferred-hour alignment (cosine, 1-dim), tip volume (log).
  Separate state file per user (_egreedy_v2.json). /reset clears v2 state too.
- ADR-0012: documents D=7→12 dimension change, normalization choices, shadow
  rollout protocol, and promotion gate (offline sim win per ADR-0002).
- recommender.ts: register egreedy-v2-shadow in shadow-policy map (disabled by
  default). When enabled, calls /score/egreedy/v2 fire-and-forget and publishes
  shadow:egreedy-v2-shadow serve signal. No reward to shadow; sim is the gate.
- sim runner/personas: personas carry synthetic profile_features per persona;
  _call_score/_call_reward thread profile_features through (None-safe for v1/linucb).
- 18 new Python tests; all 56 Python + 170 TS tests pass.
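A rough sketch of how the five added dimensions (7 + 5 = 12) could be derived from profile features. The exact scaling is defined in ADR-0012 and is not reproduced here, so `profile_dims`, the division by the clip value, and the use of `log1p` are illustrative assumptions.

```python
import math

DWELL_CLIP_MS = 10 * 60 * 1000  # mean dwell is clipped at 10 minutes


def profile_dims(
    completion_rate: float,
    dismiss_rate: float,
    mean_dwell_ms: float,
    preferred_hour: float,
    tip_volume: float,
    current_hour: float,
) -> list:
    """Return the five extra feature dims appended to the base vector."""
    # Clip dwell to [0, 10min] and scale to [0, 1] (scaling is an assumption).
    dwell = min(max(mean_dwell_ms, 0.0), DWELL_CLIP_MS) / DWELL_CLIP_MS
    # One-dimensional alignment: +1 at the preferred hour, -1 twelve hours away.
    alignment = math.cos(2 * math.pi * (current_hour - preferred_hour) / 24)
    # Log-compress tip volume so heavy users do not dominate the vector.
    volume = math.log1p(max(tip_volume, 0.0))
    return [completion_rate, dismiss_rate, dwell, alignment, volume]
```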
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Features now declare invalidatedBy subjects in the registry; the new
profile/subscriber.ts subscribes to each unique subject and drops
matching stored rows for the userId in the payload. Next getProfile
call recomputes from current data instead of waiting up to ttlSec.
Wiring:
- completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d,
  preferred_hour ← signals.tip.feedback
- tip_volume_30d ← signals.tip.served
TTL stays as a safety net for clock drift and dropped events.
Registration validates each declared subject against KNOWN_SUBJECTS
(mirror of EventMap) so typos throw at startup, not silently.
ADR-0011 updated.
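The invalidation flow, sketched in Python for illustration (the registry and subscriber are TS). `FeatureRegistry`, `on_event`, and the dict-based store are hypothetical stand-ins; the subject names are the ones the event bus defines.

```python
from collections import defaultdict

# Mirror of the bus subjects; registering anything else is treated as a typo.
KNOWN_SUBJECTS = {
    "signals.tip.served",
    "signals.tip.feedback",
    "signals.task.synced",
}


class FeatureRegistry:
    def __init__(self):
        self._by_subject = defaultdict(set)

    def register(self, name, invalidated_by):
        unknown = set(invalidated_by) - KNOWN_SUBJECTS
        if unknown:
            # Fail at startup instead of silently never invalidating.
            raise ValueError(f"{name}: unknown subjects {sorted(unknown)}")
        for subject in invalidated_by:
            self._by_subject[subject].add(name)

    def subjects(self):
        """Unique subjects the subscriber needs to listen on."""
        return set(self._by_subject)

    def features_for(self, subject):
        return set(self._by_subject.get(subject, ()))


def on_event(registry, store, subject, payload):
    """Drop stored rows for the payload's user so the next read recomputes."""
    user_id = payload["userId"]
    for name in registry.features_for(subject):
        store.pop((user_id, name), None)
```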
Refs #81.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a per-feature freshness summary to /admin/data-quality so the admin
can spot features that are systematically stale or never computed:
- totalEligible: distinct users with tip_views in the last 30 days
- missing: eligible users with no row stored for the feature
- stale: eligible users whose stored row is past its TTL
Backend exposes summarizeProfileFreshness() in profile/builder.ts; one
query per feature joins eligible users LEFT JOIN profile rows.
Coverage = (eligible − missing − stale) / eligible, colored
green/yellow/red via the new PctGood helper (high-is-good, opposite of
the existing Pct used for missing-feature/stale-token rates).
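As a worked example of the coverage formula: with 200 eligible users, 10 missing, and 30 stale, coverage is (200 − 10 − 30) / 200 = 0.8. The sketch below computes it; the colour thresholds are hypothetical, since this commit does not state the cut-offs PctGood uses.

```python
from typing import Optional


def coverage(total_eligible: int, missing: int, stale: int) -> Optional[float]:
    """Share of eligible users with a fresh stored row for the feature."""
    if total_eligible == 0:
        return None  # nothing to report when nobody is eligible
    return (total_eligible - missing - stale) / total_eligible


def pct_good_color(ratio: float, green_at: float = 0.9, yellow_at: float = 0.7) -> str:
    """High-is-good colouring; both thresholds are illustrative assumptions."""
    if ratio >= green_at:
        return "green"
    if ratio >= yellow_at:
        return "yellow"
    return "red"
```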
Refs #81.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surfaces phase A's profile features in /admin/users/:id so we can verify
they're actually computing useful values before investing in bandit
consumption. The detail GET now includes profile rows joined with registry
metadata (name, value, age, fresh badge, ttlSec, description). Read does
NOT trigger compute — staleness must be visible. A new POST
.../profile/rebuild button force-recomputes and is audit-logged like
reset-bandit.
Refs #81.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Centralizes user-level features (completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that
owns both definition and SQL aggregation, since the data lives in the
TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps
recommend latency bounded; values persist in user_profile_features (KV).
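Lazy TTL refresh here means a read only recomputes a feature when its stored row is older than the TTL. A minimal in-memory sketch follows (Python for illustration; the real store is the user_profile_features table, and `LazyProfileStore` is a hypothetical name):

```python
import time


class LazyProfileStore:
    """Recompute a feature on read only when the stored value is past its TTL."""

    def __init__(self, compute, ttl_sec, clock=time.time):
        self._compute = compute  # callable(user_id, name) -> value
        self._ttl = ttl_sec
        self._clock = clock
        self._rows = {}  # (user_id, name) -> (value, computed_at)

    def get(self, user_id, name):
        now = self._clock()
        row = self._rows.get((user_id, name))
        if row is not None and now - row[1] < self._ttl:
            return row[0]  # fresh enough: no aggregation query on this request
        value = self._compute(user_id, name)
        self._rows[(user_id, name)] = (value, now)
        return value
```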
ml/serving accepts profile_features on /score + /generate but does not
yet consume them — extending the bandit feature vector changes D and
resets every user's learned state, so that's a deliberate phase-B step.
Includes ml/features/profile_schema.py as a contract mirror with a sync
test that diffs name sets against registry.ts.
ADR-0011 records the data-locality reasoning (registry in TS, not Python
as the issue originally suggested).
Phase B (deferred): event-driven incremental updates, bandit consumption
with state migration, admin per-user profile page, staleness alerts.
Refs #81.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the hardcoded "v1" label with a real prompt registry:
- ml/serving/prompts.py: registry keyed by version: v1 (baseline),
  v2-mentor (calm/specific persona), v3-few-shot (v1 persona + curated
  examples)
- ml/serving/main.py: POST /generate accepts optional prompt_version,
  422 on unknown, echoes the version actually used back in the response
- services/api/src/config.ts: TIP_PROMPT_VERSION is empty / single /
  comma-list (uniform random per request)
- services/api/src/routes/recommender.ts: pickPromptVersion() drives
  selection; the response's prompt_version (not a stale TS constant) is
  what lands in tip_scores so the #92 reward-analytics dashboard shows
  real per-variant reaction rates
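The selection rule for TIP_PROMPT_VERSION can be sketched as follows (Python for illustration; pickPromptVersion() itself is TS). Returning None for an empty setting, leaving the default to ml/serving, is an assumption.

```python
import random
from typing import Optional


def pick_prompt_version(setting: str, rng=random) -> Optional[str]:
    """Empty -> None, single -> that version, comma-list -> uniform random."""
    versions = [v.strip() for v in setting.split(",") if v.strip()]
    if not versions:
        return None  # assumption: the server falls back to its own default
    return rng.choice(versions)
```

For example, `pick_prompt_version("v1,v2-mentor,v3-few-shot")` returns each of the three versions with equal probability on every request.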
Closes #84.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
/admin/reward-analytics now surfaces served count, reaction rate, and avg
reward grouped by llm_model, prompt_version, and tip_kind — closing the
loop so model/prompt iterations in M2 are legible next to the bandit
policy view. Data comes from the tip_scores columns added in ffdf707 and
tip_feedback.reward_milli; bandit-only tips show as "(bandit-only)".
Closes #92.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Issue 21 — event infrastructure:
- NormalizedEvent<T> + payload types in packages/shared-types/src/events/
- Bus.onPublish() hook for side-effect bridges
- NATS JetStream adapter (services/api/src/events/nats.ts): connects when
NATS_URL is set, creates signals.> and feedback.> streams, bridges all
in-process bus publishes to JetStream — no-ops gracefully when NATS is absent
- NATS service added to docker-compose (profile: events|full, port 4222/8222)
Issue 22 — Todoist background sync:
- services/api/src/signals/scheduler.ts: queries all active-token users every
15 min (TODOIST_SYNC_INTERVAL_MS), fan-out via todoistSource.fetchSignals()
which emits signals.task.synced; on-demand fetch remains as freshness fallback
- NATS_URL + TODOIST_SYNC_INTERVAL_MS added to config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Signal + SignalSource interfaces to packages/shared-types
- TipCandidate.features widened to Record<string,number|boolean> to match Signal
- TodoistSignalSource: encapsulates fetch, cache, 401 handling, bus events, and act()
- SignalAggregator: parallel fan-out across sources with per-source failure isolation
- Recommender refactored to consume Signal[] via aggregator; source action dispatch via aggregator.act()
- ADR-0009: signal normalization strategy
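Parallel fan-out with per-source failure isolation, sketched with asyncio (Python for illustration; SignalAggregator is TS). Here a source is modelled as an async callable taking a user id, which is a simplification of the SignalSource interface.

```python
import asyncio


async def aggregate(user_id, sources):
    """Query every source concurrently; a failing source contributes nothing."""
    results = await asyncio.gather(
        *(source(user_id) for source in sources),
        return_exceptions=True,  # keep one failure from cancelling the rest
    )
    signals = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # a real implementation would log the failure here
        signals.extend(result)
    return signals
```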
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ML serving:
- LinUCB contextual bandit (disjoint, d=5 features: hour_sin/cos, is_overdue, task_age, priority)
- /score endpoint replaces stub random; /reward endpoint for online learning
- Per-user model state persisted to disk as JSON (survives restarts)
- venv at ml/serving/.venv; start with pnpm dev from ml/serving
Recommender:
- Todoist fetch now extracts features (is_overdue, task_age_days, priority)
- RemotePolicy calls ml/serving with 3s timeout; falls back to RandomPolicy
- Reward sent to /reward on feedback (done=+1, snooze=0, dismiss=-1)
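A dependency-free sketch of one LinUCB arm with the d=5 feature layout and the reward mapping listed above. The disjoint variant keeps one such (A, b) pair per arm. This is not the ml/serving code: the feature scaling, `alpha`, and the Sherman-Morrison update of the inverse are illustrative choices.

```python
import math

REWARD = {"done": 1.0, "snooze": 0.0, "dismiss": -1.0}


def features(hour, is_overdue, task_age_days, priority):
    """hour_sin, hour_cos, is_overdue, task_age, priority (unscaled here)."""
    angle = 2 * math.pi * hour / 24
    return [math.sin(angle), math.cos(angle), float(is_overdue),
            float(task_age_days), float(priority)]


class LinUCBArm:
    def __init__(self, d=5, alpha=1.0):
        self.alpha = alpha
        # Start from A = I, so A^-1 = I as well.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _a_inv_x(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def score(self, x):
        v = self._a_inv_x(x)  # A^-1 x
        # theta . x with theta = A^-1 b (A^-1 is symmetric, so b . v works).
        mean = sum(bi * vi for bi, vi in zip(self.b, v))
        width = math.sqrt(max(0.0, sum(xi * vi for xi, vi in zip(x, v))))
        return mean + self.alpha * width

    def update(self, x, reward):
        v = self._a_inv_x(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - v v^T / (1 + x^T A^-1 x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
```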
Web Push:
- VAPID keys in config; push_subscriptions table in DB
- POST/DELETE /api/push/subscribe; GET /api/push/vapid-public-key
- Service worker (public/sw.js): push → showNotification, notificationclick → focus/open
- "notify me" button on tip page; registers SW + subscribes on permission grant
Event bus:
- services/api/src/events/bus.ts: typed EventEmitter wrapper
- Subjects: signals.tip.served, signals.tip.feedback, signals.task.synced
- Same publish/subscribe API NATS JetStream will implement — swap is mechanical
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- /legal/terms and /legal/privacy pages (linked from sign-in)
- Consent (consentGiven=true) recorded on first Google sign-in
- tip_views table: one row per tip served — enables activation + reaction rate queries
- tip_views purged on account deletion
- Delete account button on /connect (confirm → revoke tokens → purge data → sign out)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)