Commit Graph

80 Commits

Author SHA1 Message Date
afacc34969 fix(agents): instruct orchestrator to output tip in English
Small models (qwen2.5:1.5b) mirror the language of task title content
in the prompt. Adding an explicit English note to snippets that embed
raw task titles (focus-area, overdue-task) prevents language bleed.
Also added the instruction to the orchestrator system prompt and user
message as belt-and-suspenders.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 11:53:21 +00:00
c124ff4d24 docs: update CLAUDE.md with session learnings (#118 tracing, compose gotchas)
- Clarify compose profile requirement for build/up (silent no-op without --profile)
- Add --force-recreate pattern for env-var-only changes
- Document MLflow host_header and auth gotchas for container-to-container calls
- Record MLflow tracing addition and #118 M4 tracking issue

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 10:41:57 +00:00
95e1b342b4 fix(serving): wire MLflow auth and Host header for container-to-container calls
- Pass MLFLOW_ADMIN_PASSWORD as fallback password credential
- Set host_header='localhost' to satisfy MLflow's --allowed-hosts check
  (MLflow rejects Host: mlflow but accepts Host: localhost)
- Default MLFLOW_TRACKING_URI to http://mlflow:5000 in compose so the
  env_file value is not silently overridden to empty

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 10:39:08 +00:00
c43dbaf23d feat(serving): add MLflow tracing to ml-serving for all agent calls
Logs one MLflow run per /recommend (params, token metrics, latency,
full prompt + tip as artifacts) and per /agents/{id}/compute and
/infer call (signals snapshot, inferred prefs, latency).

Tracing is a no-op when MLFLOW_TRACKING_URI is unset; ml-serving
starts and serves tips correctly without MLflow configured.

Refs #118 (M4: remove from production / move off critical path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 10:30:24 +00:00
488a764519 docs: mark M2 complete in README
All M2 items shipped: ADR-0014 (unified profile + inference framework),
per-agent auto-inference, tip generator, TipCandidate schema, prompt
versioning, model benchmark, task clustering, UX refinements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 08:02:44 +00:00
c67f2b14c4 docs: update CLAUDE.md with #61 completion and feature test patterns
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:45:40 +00:00
17b9516903 feat(features): mirror invalidatedBy into Python ProfileFeature (#61)
Adds invalidated_by: tuple[str, ...] to ProfileFeature, mirroring the
invalidatedBy bus subjects from registry.ts. Adds a test that parses the
TS source and asserts Python stays in sync — same drift-detection pattern
used for names and ttlSec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:10:36 +00:00
a75be0d832 docs: update CLAUDE.md with session learnings (#97, #113)
- focus-area v2.0.0 completion in recent completions; remove from active work
- Update focus-area inferred params table row
- min_history gotcha: checked against events, not task_completions
- httpx trust_env=False rule for ml/ code
- Agent test command

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 06:56:17 +00:00
26fc67776f feat(agents): semantic task clustering + focus-area inferred preferred_areas (#97, #113)
- New ml/agents/clustering.py: embed task content via nomic-embed-text
  (Ollama), greedy cosine clustering (threshold 0.72, max 6 clusters),
  graceful fallback to project-id grouping when Ollama is unreachable
- focus_area v2.0.0: compute() uses semantic clusters as focus areas;
  adds preferred_areas InferredParam inferred from top-2 projects by
  task_completion count
- 135 tests, all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 06:54:46 +00:00
336644a90a docs: update CLAUDE.md with rich per-agent inference completions (#112–#116)
- Inference framework table updated: all agents at v1.2.0 with full param list
- Documents UserHistory.task_completions and AgentInferRequest.task_completions
- Marks #112/114/115/116 complete in recent completions
- Active work updated: #78 closed, #61 and #97/#113 as next priorities

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 06:28:30 +00:00
1d9a395591 feat(agents): quiet window + peak hours + tz prefs for time-of-day agent (#112)
Adds four InferredParams (all TTL=24h, min_history=50 except preferred_hour=10):
- quiet_start / quiet_end: longest contiguous below-baseline hour run (HH:MM)
- peak_hours: top-quartile done-event hours, sorted ascending
- tz: cold-start only ("UTC"); populated from auth provider, no inference function

compute() updated:
- in_quiet check (quiet window) takes precedence over peak hours
- in_peak emits "peak productivity hour" language when current hour is in peak_hours
- approaching peak (within 2h) surfaces for orchestrator timing
- tz surfaced in snippet header when not UTC
- snapshot adds peak_hours, in_quiet, in_peak, tz

- Agent bumped to v1.2.0
- 21 new tests: night-owl, early-bird, shift-worker, quiet/peak snippet rendering
- Fixed test_snapshot_keys in test_agents.py to include new snapshot fields

Closes #112

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 06:05:51 +00:00
bc71dc203d feat(agents): adaptive lookback + weekly/daily cycle detection for recent-patterns (#116)
Replaces the coarse density-bucket window_days with three InferredParams (all TTL=24h):
- lookback_days: min window containing ≥30 done events, capped at 30d (min_history=5)
- weekly_cycle: per-DOW peak-to-mean strength list (min_history=21, ≥3 weeks of signal)
- daily_cycle: per-hour peak-to-mean strength list (min_history=14)

compute() renders cycle hints when strength > 0.5:
  "User tends to complete tips on Tuesdays and Saturdays."
  "User is most active around 8pm."
Legacy window_days pref key still accepted as a fallback.

- window_days pref renamed lookback_days; backward-compat fallback in compute()
- Agent bumped to v1.2.0
- 19 new tests: weekend-warrior, weekday-only, evening-person, no-pattern,
  legacy compat, snippet rendering with strong/weak signals

Closes #116

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 05:51:45 +00:00
4cade4868b feat(agents): per-user baseline + stdev inference for momentum agent (#114)
Adds two InferredParams (TTL=7d) computed from 28-day rolling daily done counts:
- baseline_completions_per_day: mean done events/day over the window
- stdev: stdev of daily counts (floored at 0.1 to avoid division by zero)

MomentumAgent.compute() now calculates a z-score from recent done events in
inp.feedback_history vs the inferred baseline. Snippet language switches to
z-score framing ("above your usual pace", "slowing down") when |z| >= 1.0,
falling back to engagement_trend labels when in the normal range.

- engagement_trend InferredParam preserved for backward compatibility
- momentum_window pref added (default 7, user-overridable)
- 14 new tests covering power user, casual user, returning-from-break, and
  relative stdev comparison; engagement_trend tests updated for z-score priority
- Agent bumped to v1.2.0

Closes #114

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 05:18:29 +00:00
04212ff318 feat(agents): p50-lateness tolerance + per-project realness for overdue-task (#115)
Replaces snooze-rate heuristic with p50 of actual task lateness (completedAt − dueAt).
Adds project_realness inference: projects with chronic lateness get realness < 1 and
the agent softens its snippet language from "overdue" to "past target date".

- TaskCompletion added to UserHistory with lateness_days computed property
- _infer_lateness_tolerance: p50 of task_completions, clipped at 0, float
- _infer_project_realness: per-project median lateness normalised by global median
- Both InferredParams use 7d TTL; cold_start = 0.0 / {}
- AgentInferRequest accepts task_completions; endpoint wires them through
- 12 new tests covering punctual/chronic/mixed users and language softening
- Agent bumped to v1.2.0

Closes #115

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 05:14:04 +00:00
35257b7756 docs: mark ADR-0014 complete in CLAUDE.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:50:42 +00:00
ed1705cb5d feat(db): drop users.consentGiven/consentAt (ADR-0014 step 8)
Backfills consent_given=1 rows into user_consents as data:core before
dropping the legacy columns. auth.ts now writes user_consents on signup;
POST /consent writes user_consents; admin/user routes cleaned of the old
fields. Migration is idempotent — DROP COLUMN is wrapped in try/catch so
it no-ops on fresh DBs that never had the columns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:50:27 +00:00
afb0e9b0cb feat(agents): per-agent inference — momentum, overdue-task, recent-patterns, focus-area (ADR-0014 step 7)
All four agents bumped to v1.1.0.

momentum (#114): infers engagement_trend ('up'|'stable'|'down') by comparing
done-rate in the last 7 days vs the prior 7 days. Agent surfaces the trend
in its snippet ("trending up — build on the momentum").

overdue-task (#115): infers lateness_tolerance_days (0/1/2) from snooze rate.
Agent now filters tasks against the tolerance so low-urgency users aren't
nagged about tasks that are only hours overdue.

recent-patterns (#116): infers window_days (7/14/30) from feedback event
density — sparse users get a wider window so the snippet isn't always empty.

focus-area (#113): no inferred params (project-level feedback linkage needed,
tracked under #78). preferred_areas pref was declared but ignored; agent now
honours it as a tiebreaker and mentions it in the snippet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:21:10 +00:00
ad6747c242 feat(profile): /api/profile + eligibility filter + inference framework (ADR-0014 steps 4-6)
Step 4 — /api/profile read-through API:
  GET  /api/profile          → { user, prefs, consents, contexts }
  PATCH /api/profile/prefs/:scope  upsert user_preferences (source='user')
  PATCH /api/profile/consents      grant / revoke consent keys
  PATCH /api/profile/contexts      create / activate / deactivate contexts
  Legacy consentGiven bit folded in as data:core fallback.

Step 5 — registry-driven eligibility filter:
  fetchRegistry() exported from agent-registry.ts.
  profile/eligibility.ts: getEligibleAgentIds(userId) — filters by required
  consents, silenced_in_contexts, and user_preferences[enabled=false].
  fetchOrchestratorTip filters agent_outputs to eligible set before calling
  ml/serving /recommend. Fail-closed: registry unavailable → empty set.

Step 6 — shared context-inference framework (#111) + time-of-day proof (#112):
  ml/agents/inference/: UserHistory, FeedbackEvent, run_inference().
  Framework: cold-start, min_history gating, error fallback, structured logs.
  TimeOfDayAgent v1.1.0: inferred_params=[preferred_hour]; also reads
  quiet_start/quiet_end from agent_prefs. agent_prefs injected by TS caller.
  AgentInput gains agent_prefs field.
  ml/serving: POST /agents/{agent_id}/infer endpoint.
  agent-outputs.ts computeAndStore: loads prefs before compute, calls /infer
  after, persists results (source='inferred'); user overrides never touched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:14:25 +00:00
305eeae38b feat(agents): manifest plumbing + GET /agents/registry (ADR-0014 step 3)
Each agent now exports a module-level MANIFEST declaring id, version,
pref_schema, required_consents, ttl_sec, and silenced_in_contexts. The
registry surfaces both the agent and its manifest, and rejects on
mismatch so the two cannot drift.

ml/serving exposes GET /agents/registry; services/api proxies it as
GET /api/agents/registry with a 60s in-process cache so admin pageviews
don't hammer upstream. Failures aren't cached.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:55:54 +00:00
5d43339616 feat(api): unified Profile schema + consent backfill (ADR-0014 step 1-2)
Adds user_preferences, user_consents, user_contexts and the tone /
tip_kinds_json columns on users. Backfills consent_given=1 rows into
user_consents as data:core; INSERT OR IGNORE keeps it idempotent and
respects later revocations.

Migration body moves to db/migrations.ts so tests can apply it to a
fresh in-memory handle without opening the prod DB on import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:28:47 +00:00
d454a0a8bf docs: ADR-0014 — unified Profile model + agent registry
Propose a shared substrate for per-user prefs, contexts, per-key
consents, and per-agent state so adding an agent stays a manifest
change. Updates CLAUDE.md, README, and architecture docs to reflect
the multi-agent pipeline (ADR-0013) and the registry direction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:19:07 +00:00
41302d9f36 fix: repair Docker build — TS errors and missing docs in image
- Remove unused `httpx` import from bench.ts (package does not exist)
- Add explicit `IRouter` type on `router` in agent-outputs.ts and bench.ts
  to resolve TS2742 portable-type errors
- Remove `docs` from .dockerignore so Dockerfile.admin can copy it into
  the runner image (DOCS_ROOT=/app/docs is read at runtime by the admin)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:52:27 +00:00
05f748159b chore: remove shadow policy machinery (ADR-0013 step 10)
Deletes shadowPolicies map, getShadowPolicies, setPolicyActive from
recommender.ts; removes /api/admin/policies routes from admin.ts; removes
getPolicies, togglePolicy, PolicyInfo from admin api.ts; removes the
policy toggle section from the ops page.

168 API tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:45:32 +00:00
8e9718e8ba chore(ml): remove bandit endpoints + helpers (ADR-0013 step 9)
Deletes all LinUCB and ε-greedy code from ml/serving: score, reward,
stats, reset, features endpoints; feature vector builders; per-user state
file helpers; related Pydantic models; numpy/math/time imports.

Removes test_score.py (pure bandit unit tests). 40 remaining tests pass.
STATE_DIR kept — nats_consumer still writes sync metadata there.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:41:58 +00:00
c65bedcf68 feat(api): orchestrator cutover — replace bandit with multi-agent pipeline (ADR-0013 step 6)
POST /recommend now calls ml/serving /recommend with pre-computed agent
snippets + task context instead of /generate + /score/egreedy/v2. Falls
back to a random signal candidate when ml/serving is unavailable.

Removes: remotePolicy, fetchLlmCandidates, sendRewardWithRetry,
candidateCache, pickPromptVersion. Feedback handler keeps inferReward +
tipFeedback writes for observability; reward delivery to the bandit is gone.
tipScores.policy is now 'orchestrator'; promptVersion is 'v4-orchestrator'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:37:15 +00:00
7e958a779d feat(api): agent pre-compute scheduler (ADR-0013 step 5)
Extracts computeAndStore() from the /agents/:agentId/compute route so it
can be called without an HTTP round-trip. startAgentPrecomputeScheduler()
runs every 15 min: fetches active users (tip view in 48h), runs all agents
in parallel per user, then purges outputs expired >24h. Agent IDs are
resolved from ml/serving /health at startup with a fallback hardcoded list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:29:50 +00:00
37aec4fee1 chore: ADR-0007/0012 superseded status + admin users ID column
ADR-0007 and ADR-0012 both superseded by ADR-0013 as of 2026-05-01.
UsersTable gains a truncated ID column for quick user identification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:20:44 +00:00
b3cf588f2f feat(ml): multi-agent context framework + v4 orchestrator prompt
Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum,
time_of_day, recent_patterns, focus_area) each producing a prompt snippet
from user signals. A registry wires them up; the orchestrator prompt in
ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM.

Also wires /api/agents route in the API and updates the Dockerfile to copy
the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:20:05 +00:00
f8d66aa01f chore: remove Airflow completely from the stack
Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:38:46 +00:00
ce1c8bde57 fix(admin): simulations view-only + docs path in Docker (#109 #110)
- simulate/page.tsx: remove launch form — simulations are triggered via
  Airflow DAG, not the admin UI. Page now shows run history + links to
  Airflow and MLflow only (#109)
- docs.ts: use DOCS_ROOT env var (fallback: ../../docs for local dev) so
  the path works in Docker standalone where CWD is /app (#110)
- Dockerfile.admin: copy docs/ into the runner image at /app/docs and set
  DOCS_ROOT=/app/docs so listAllDocs() finds the files at runtime (#110)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:55:50 +00:00
c1f5fcb561 fix(admin): ops page — add section description, remove redundant footer (#107)
Adds a one-line purpose description under the Ops heading so it is clear
what the section is for (shadow policy toggles, signal replay, per-user
actions). Removes the duplicate "User-level actions" subsection whose
content is now covered by the header description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:53:35 +00:00
9bd60a9835 feat(web): action sheet cleanup + settings page (#100 #101 #102)
- Remove "Helpful"/"Not helpful" from action sheet — reward is inferred
  from done/snooze/dismiss + dwell time; explicit sentiment buttons were
  redundant and cluttered the UI (#100)
- Move "notify me" push subscription button to new /config page (#101)
- Add settings gear icon (bottom-right, fixed) on tip page linking to /config (#102)
- New /config page: push notification toggle + link to /connect integrations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:52:45 +00:00
4267e6ac68 feat(ml/serving): inject profile features + sort tasks in tip prompt (#79)
- prompts.py: sort tasks overdue-first → priority desc → age desc before
  rendering into the LLM prompt (same ordering as ml/features/context.py)
- prompts.py: render User profile summary line (completion_rate, dismiss_rate,
  preferred_hour) when profile_features are present
- main.py: add profile_features field to PromptContext; plumb from
  GenerateRequest into the prompt builder via model_copy
- logging_config.py: drop add_logger_name processor (incompatible with
  PrintLoggerFactory — caused test ordering failures)
- test_generate.py: 6 new tests covering sort order, profile rendering,
  partial fields, empty profile, and end-to-end plumbing through /generate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:46:16 +00:00
0474ad4deb feat(airflow): integrate bench harness into bench_collect DAG
New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks:
1. collect.py — generates candidates, logs to MLflow
2. export_for_judge — exports pending runs for Claude Code scoring
3. compare — generates leaderboard by (model, prompt) cell

Config via dag_run.conf supports all collect.py options (models, prompts,
n_tips, n_scenarios, temperature, experiment name, max_model_b).

New admin API endpoints (`services/api/src/routes/bench.ts`):
- GET /api/bench/experiments — list tip-bench-* experiments
- POST /api/bench/run — trigger DAG with custom config
- GET /api/bench/runs/:experiment — list runs in experiment
- GET /api/bench/leaderboard/:experiment — leaderboard by (model, prompt)

All endpoints require admin auth. Human judge (Claude Code) scores are
applied manually post-export; future enhancement: add webhook to DAG.

Admin UI can now trigger and monitor benchmarks from a dashboard panel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 11:54:30 +00:00
556019b060 feat(bench): MLflow-based tip-generation benchmark harness (#93, #95)
Combines model evaluation (#93) and prompt A/B testing (#95) into one
experiment. Evaluates all (model × prompt × scenario) cells on the same
fixed contexts so quality differences are attributable.

Architecture:
- Phase A (collect.py): generates candidates per cell, logs to MLflow
  with judge_pending=true. Rejects models >4B, uses keep_alive=0 for
  RAM safety (no concurrent model weights in VRAM).
- Phase B (judge_cli.py): exports pending runs as JSON for Claude Code
  to score per the rubric, then applies scores back to MLflow.
- Phase C (compare.py): leaderboard by (model, prompt) cell.

Rubric (tip-v1) defines 1–5 scales for relevance, actionability, tone,
plus format_ok and overlong flags. Composite = rel + act + tone +
2×format_ok − overlong. Rubric is self-describing and persisted in every
run so judges use consistent criteria across sessions.

Artifacts (prompts, candidates, raw responses) stored as MLflow tags
because the server uses a file:// backend not accessible via REST. Full
artifacts accessible in MLflow UI → run → Tags section.

Tested end-to-end on local machine:
- 4 models (qwen2.5:0.5b/1.5b, gemma3:1b, llama3.2:3b) ≤4B
- 3 prompts (v1, v2-mentor, v3-few-shot)
- 4 scenarios (4 personas × 2 time-slots)
- 48 cells total, all judged and ranked

Winner: qwen2.5:1.5b × v3-few-shot (composite=12.75).

Ready for integration into Airflow prompt_ab_eval DAG and admin UI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 11:48:59 +00:00
e40dfdcbb0 chore(infra): wire MLflow/Airflow env vars, fix healthcheck, add .dockerignore
Some checks failed
buf-check / Lint & breaking-change check (push) Has been cancelled
- docker-compose: pass ML_SERVING_URL, MLFLOW_URL, AIRFLOW_URL + creds to api service
- docker-compose: pass NEXT_PUBLIC_MLFLOW_URL/AIRFLOW_URL to admin service
- docker-compose: replace wget healthcheck with node fetch (wget not in node image)
- docker-compose: enable Airflow basic_auth API backend; add MLflow pip dep for DAGs
- Dockerfiles: tighten layer caching, add .dockerignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:08:43 +00:00
bad1bb2cba feat(simulate): MLflow tracking, Airflow DAG integration, health checks for mlflow/airflow
- sim_runs schema: add judge_mode, n_policies, airflow_dag_run_id, mlflow_run_id columns
- admin health endpoint: add mlflow + airflow checks (Basic auth for Airflow API)
- admin nav: add Simulations page link; rename section label
- runner.py: optional MLflow experiment tracking; multi-policy support
- sim_dag.py: Airflow DAG for offline sim pipeline
- admin simulate page + API client methods for sim runs
- shared-types tsconfig: exclude test files from build

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:08:36 +00:00
e96ceb7ee1 feat(auth): token-based admin authentication for Playwright/CI (#105)
Add POST /api/auth/token — validates ADMIN_TOKEN env var, creates a 24h
session and sets the sid cookie so automated tools can access the admin
panel without Google OAuth. Admin login page gains a token input form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:07:43 +00:00
b554970032 docs(observability): add services/api README; update ml/serving + recommender docs (#18)
- services/api/README.md: new — contract, middleware stack, background
  tasks, config table (LOG_LEVEL, SENTRY_DSN), health story, extraction
  criteria
- ml/serving/README.md: add Observability section (structlog JSON,
  traceparent → trace_id binding), add SENTRY_DSN + ENV to config table
- services/recommender/README.md: fix policy table — egreedy-v2 is
  active (#99), egreedy-v1 is shadow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:41:39 +00:00
c4960d0601 feat(observability): structured logs, W3C trace IDs, Sentry hooks (#18)
- TS: pino + pino-http; every HTTP request log includes traceId from
  W3C traceparent header (generated if absent); forwarded to ml/serving
  on all /score, /generate, /reward, and /api/ml proxy calls
- Python: structlog JSON; FastAPI middleware binds trace_id via
  contextvars so every log line within a request carries it
- Sentry: optional SENTRY_DSN init in both runtimes (no-op if unset)
- Replace all console.* calls across services/api with pino logger
- Update tests to spy on logger instead of console

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:37:28 +00:00
7281af83a4 feat(bandit): promote egreedy-v2 (D=12, profile features) as active policy (#99)
Offline sim gate passed — egreedy-v2 mean reward −0.629 vs egreedy-v1 −0.642
(5 users × 20 rounds, rule judge, seed 42). v2 wins 3/5 personas.

- recommender.ts: switch remotePolicy() to /score/egreedy/v2
- recommender.ts: switch sendRewardWithRetry() to /reward/egreedy/v2 with
  profile_features payload so the ridge update uses the full D=12 vector
- recommender.ts: re-fetch profile at feedback time (TTL-cached, near-instant)
- ADR-0012: status Accepted → Promoted, promotion record appended

Shadow entry egreedy-v2-shadow kept in registry (active: false) for rollback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:08:28 +00:00
cba3f1a184 docs(services): update integrations + recommender READMEs for signal abstraction (#78)
integrations/README — replace stale Connector interface and fictional
libsodium vault with the actual SignalSource pattern, SQLite token table,
and real OAuth routes.

recommender/README — document the SignalAggregator pipeline, current
policy registry, and actual /recommend + /feedback contract shapes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:17:38 +00:00
352469162d fix(signals): add missing source field to TaskSyncedEvent (#78)
TaskSyncedPayload in shared-types and ml/serving schemas both require
source, but TaskSyncedEvent in bus.ts and the todoist publish call both
omitted it — causing the JetStream consumer to nak every task.synced
message on validation failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:15:32 +00:00
45416000f9 feat(features): per-feature freshness spec — JIT vs batched (#61)
Each ml/features/*.py now declares freshness, source, and fallback per
feature. ProfileFeature gains ttl_sec (mirrored from registry.ts),
freshness="batched", source, and fallback. context.py adds
ContextFeatureSpec + CONTEXT_FEATURES for the three JIT features
(hour_of_day, day_of_week, tasks). CI test parses ttlSec from registry.ts
to catch drift. ml/README updated with split JIT/batched feature contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:02:55 +00:00
bd3ea1b8b1 docs(schema): update docs for #54 — proto registry + buf CI gate
- packages/shared-types/README.md: new — documents HTTP vs event surfaces,
  proto file layout, schema evolution rules, and how to run buf locally
- ml/serving/README.md: note pydantic payload validation in consumer section
- CLAUDE.md: replace "schema registry enforced when #54 lands" with
  the actual state; remove #54 from active-work list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:53:20 +00:00
377373a95d test(schema): unit tests for schemas.py and nats_consumer._handle (#54)
17 tests covering: pydantic model validation (all payload types, optional
fields, invalid enum values, missing required fields), _handle write path
for task_synced, validation errors surfaced through _make_handler causing
nak instead of ack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:51:15 +00:00
d539fde0c1 feat(schema): protobuf event registry + buf CI gate (#54)
- Add proto schemas in packages/shared-types/events/ (oo.events.v1):
  envelope.proto, signals.proto, integration.proto
- buf.yaml with STANDARD lint + FILE breaking-change rules
- .gitea/workflows/buf-check.yaml: lint + breaking check on every PR
  touching events/ (needs a Gitea Actions runner to execute)
- scripts/buf-check.sh: local equivalent of the CI check
- NormalizedEvent TS envelope gains eventId, schemaVersion, producer
  to align with the proto Envelope message
- ml/serving/schemas.py: pydantic models mirroring the v1 proto types
- nats_consumer.py: validate payloads via pydantic instead of raw .get()

A field-rename PR will now fail buf breaking with exit code 100 and
show the offending messages. To make a breaking change: keep the old
field reserved, add the new one, bump schema_version to v2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:48:24 +00:00
f48b5a7646 docs(ml): serving README + update ml/README and CLAUDE.md for #98
- ml/serving/README.md: new — contract, JetStream consumer docs, config,
  health story, extraction criteria, state file reference
- ml/README.md: note JetStream consumers in serving/ row
- CLAUDE.md: update active work to reflect #98 shipped, #99 still pending

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:21:40 +00:00
4652e4b582 feat(ml): JetStream durable consumers in ml/serving (#98)
Adds a NATS JetStream consumer to ml/serving so the feature pipeline
can react to events without the API triggering every read.

- nats_consumer.py: durable push consumers for signals.> and feedback.>
  streams; acks on success, naks for redeliver, up to NATS_MAX_DELIVER
  attempts; per-consumer health state (last_msg_ts, processed, errors)
- main.py: FastAPI lifespan wires start/stop; /health exposes nats state
- requirements.txt: adds nats-py>=2.9.0
- Dockerfile.ml: copy all *.py from ml/serving (was missing prompts.py)

Handled subjects:
  signals.task.synced   → writes per-user sync metadata to STATE_DIR
  signals.tip.feedback  → logged for observability (reward via HTTP path)

Config: NATS_URL (empty = disabled), NATS_DURABLE_PREFIX, NATS_MAX_DELIVER

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:19:47 +00:00
2d7cf217a9 feat(ml): egreedy-v2 shadow policy — D=12 with profile features (#99)
Ship the scaffolding for #99 (phase B.3 of #81):

- ml/serving: add /score/egreedy/v2, /reward/egreedy/v2, /stats/egreedy/v2
  endpoints (D=12). New feature dims: completion/dismiss rates, mean dwell
  (clipped 10min), preferred-hour alignment (cosine, 1-dim), tip volume (log).
  Separate state file per user (_egreedy_v2.json). /reset clears v2 state too.
- ADR-0012: documents D=7→12 dimension change, normalization choices, shadow
  rollout protocol, and promotion gate (offline sim win per ADR-0002).
- recommender.ts: register egreedy-v2-shadow in shadow-policy map (disabled by
  default). When enabled, calls /score/egreedy/v2 fire-and-forget and publishes
  shadow:egreedy-v2-shadow serve signal. No reward to shadow — sim is the gate.
- sim runner/personas: personas carry synthetic profile_features per persona;
  _call_score/_call_reward thread profile_features through (None-safe for v1/linucb).
- 18 new Python tests; all 56 Python + 170 TS tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:00:38 +00:00