Commit Graph

68 Commits

Author SHA1 Message Date
4cade4868b feat(agents): per-user baseline + stdev inference for momentum agent (#114)
Adds two InferredParams (TTL=7d) computed from 28-day rolling daily done counts:
- baseline_completions_per_day: mean done events/day over the window
- stdev: stdev of daily counts (floored at 0.1 to avoid division by zero)

MomentumAgent.compute() now calculates a z-score from recent done events in
inp.feedback_history vs the inferred baseline. Snippet language switches to
z-score framing ("above your usual pace", "slowing down") when |z| >= 1.0,
falling back to engagement_trend labels when in the normal range.

- engagement_trend InferredParam preserved for backward compatibility
- momentum_window pref added (default 7, user-overridable)
- 14 new tests covering power user, casual user, returning-from-break, and
  relative stdev comparison; engagement_trend tests updated for z-score priority
- Agent bumped to v1.2.0

Closes #114

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 05:18:29 +00:00
04212ff318 feat(agents): p50-lateness tolerance + per-project realness for overdue-task (#115)
Replaces snooze-rate heuristic with p50 of actual task lateness (completedAt − dueAt).
Adds project_realness inference: projects with chronic lateness get realness < 1 and
the agent softens its snippet language from "overdue" to "past target date".

- TaskCompletion added to UserHistory with lateness_days computed property
- _infer_lateness_tolerance: p50 of task_completions, clipped at 0, float
- _infer_project_realness: per-project median lateness normalised by global median
- Both InferredParams use 7d TTL; cold_start = 0.0 / {}
- AgentInferRequest accepts task_completions; endpoint wires them through
- 12 new tests covering punctual/chronic/mixed users and language softening
- Agent bumped to v1.2.0

Closes #115

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 05:14:04 +00:00
35257b7756 docs: mark ADR-0014 complete in CLAUDE.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:50:42 +00:00
ed1705cb5d feat(db): drop users.consentGiven/consentAt (ADR-0014 step 8)
Backfills consent_given=1 rows into user_consents as data:core before
dropping the legacy columns. auth.ts now writes user_consents on signup;
POST /consent writes user_consents; admin/user routes cleaned of the old
fields. Migration is idempotent — DROP COLUMN is wrapped in try/catch so
it no-ops on fresh DBs that never had the columns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:50:27 +00:00
afb0e9b0cb feat(agents): per-agent inference — momentum, overdue-task, recent-patterns, focus-area (ADR-0014 step 7)
All four agents bumped to v1.1.0.

momentum (#114): infers engagement_trend ('up'|'stable'|'down') by comparing
done-rate in the last 7 days vs the prior 7 days. Agent surfaces the trend
in its snippet ("trending up — build on the momentum").

overdue-task (#115): infers lateness_tolerance_days (0/1/2) from snooze rate.
Agent now filters tasks against the tolerance so low-urgency users aren't
nagged about tasks that are only hours overdue.

recent-patterns (#116): infers window_days (7/14/30) from feedback event
density — sparse users get a wider window so the snippet isn't always empty.

focus-area (#113): no inferred params (project-level feedback linkage needed,
tracked under #78). preferred_areas pref was declared but ignored; agent now
honours it as a tiebreaker and mentions it in the snippet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:21:10 +00:00
ad6747c242 feat(profile): /api/profile + eligibility filter + inference framework (ADR-0014 steps 4-6)
Step 4 — /api/profile read-through API:
  GET  /api/profile          → { user, prefs, consents, contexts }
  PATCH /api/profile/prefs/:scope  upsert user_preferences (source='user')
  PATCH /api/profile/consents      grant / revoke consent keys
  PATCH /api/profile/contexts      create / activate / deactivate contexts
  Legacy consentGiven bit folded in as data:core fallback.

Step 5 — registry-driven eligibility filter:
  fetchRegistry() exported from agent-registry.ts.
  profile/eligibility.ts: getEligibleAgentIds(userId) — filters by required
  consents, silenced_in_contexts, and user_preferences[enabled=false].
  fetchOrchestratorTip filters agent_outputs to eligible set before calling
  ml/serving /recommend. Fail-closed: registry unavailable → empty set.

Step 6 — shared context-inference framework (#111) + time-of-day proof (#112):
  ml/agents/inference/: UserHistory, FeedbackEvent, run_inference().
  Framework: cold-start, min_history gating, error fallback, structured logs.
  TimeOfDayAgent v1.1.0: inferred_params=[preferred_hour]; also reads
  quiet_start/quiet_end from agent_prefs. agent_prefs injected by TS caller.
  AgentInput gains agent_prefs field.
  ml/serving: POST /agents/{agent_id}/infer endpoint.
  agent-outputs.ts computeAndStore: loads prefs before compute, calls /infer
  after, persists results (source='inferred'); user overrides never touched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:14:25 +00:00
305eeae38b feat(agents): manifest plumbing + GET /agents/registry (ADR-0014 step 3)
Each agent now exports a module-level MANIFEST declaring id, version,
pref_schema, required_consents, ttl_sec, and silenced_in_contexts. The
registry surfaces both the agent and its manifest, and rejects on
mismatch so the two cannot drift.

ml/serving exposes GET /agents/registry; services/api proxies it as
GET /api/agents/registry with a 60s in-process cache so admin pageviews
don't hammer upstream. Failures aren't cached.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:55:54 +00:00
5d43339616 feat(api): unified Profile schema + consent backfill (ADR-0014 step 1-2)
Adds user_preferences, user_consents, user_contexts and the tone /
tip_kinds_json columns on users. Backfills consent_given=1 rows into
user_consents as data:core; INSERT OR IGNORE keeps it idempotent and
respects later revocations.

Migration body moves to db/migrations.ts so tests can apply it to a
fresh in-memory handle without opening the prod DB on import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:28:47 +00:00
d454a0a8bf docs: ADR-0014 — unified Profile model + agent registry
Propose a shared substrate for per-user prefs, contexts, per-key
consents, and per-agent state so adding an agent stays a manifest
change. Updates CLAUDE.md, README, and architecture docs to reflect
the multi-agent pipeline (ADR-0013) and the registry direction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:19:07 +00:00
41302d9f36 fix: repair Docker build — TS errors and missing docs in image
- Remove unused `httpx` import from bench.ts (package does not exist)
- Add explicit `IRouter` type on `router` in agent-outputs.ts and bench.ts
  to resolve TS2742 portable-type errors
- Remove `docs` from .dockerignore so Dockerfile.admin can copy it into
  the runner image (DOCS_ROOT=/app/docs is read at runtime by the admin)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:52:27 +00:00
05f748159b chore: remove shadow policy machinery (ADR-0013 step 10)
Deletes shadowPolicies map, getShadowPolicies, setPolicyActive from
recommender.ts; removes /api/admin/policies routes from admin.ts; removes
getPolicies, togglePolicy, PolicyInfo from admin api.ts; removes the
policy toggle section from the ops page.

168 API tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:45:32 +00:00
8e9718e8ba chore(ml): remove bandit endpoints + helpers (ADR-0013 step 9)
Deletes all LinUCB and ε-greedy code from ml/serving: score, reward,
stats, reset, features endpoints; feature vector builders; per-user state
file helpers; related Pydantic models; numpy/math/time imports.

Removes test_score.py (pure bandit unit tests). 40 remaining tests pass.
STATE_DIR kept — nats_consumer still writes sync metadata there.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:41:58 +00:00
c65bedcf68 feat(api): orchestrator cutover — replace bandit with multi-agent pipeline (ADR-0013 step 6)
POST /recommend now calls ml/serving /recommend with pre-computed agent
snippets + task context instead of /generate + /score/egreedy/v2. Falls
back to a random signal candidate when ml/serving is unavailable.

Removes: remotePolicy, fetchLlmCandidates, sendRewardWithRetry,
candidateCache, pickPromptVersion. Feedback handler keeps inferReward +
tipFeedback writes for observability; reward delivery to the bandit is gone.
tipScores.policy is now 'orchestrator'; promptVersion is 'v4-orchestrator'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:37:15 +00:00
7e958a779d feat(api): agent pre-compute scheduler (ADR-0013 step 5)
Extracts computeAndStore() from the /agents/:agentId/compute route so it
can be called without an HTTP round-trip. startAgentPrecomputeScheduler()
runs every 15 min: fetches active users (tip view in 48h), runs all agents
in parallel per user, then purges outputs expired >24h. Agent IDs are
resolved from ml/serving /health at startup with a fallback hardcoded list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:29:50 +00:00
37aec4fee1 chore: ADR-0007/0012 superseded status + admin users ID column
ADR-0007 and ADR-0012 both superseded by ADR-0013 as of 2026-05-01.
UsersTable gains a truncated ID column for quick user identification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:20:44 +00:00
b3cf588f2f feat(ml): multi-agent context framework + v4 orchestrator prompt
Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum,
time_of_day, recent_patterns, focus_area) each producing a prompt snippet
from user signals. A registry wires them up; the orchestrator prompt in
ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM.

Also wires /api/agents route in the API and updates the Dockerfile to copy
the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:20:05 +00:00
f8d66aa01f chore: remove Airflow completely from the stack
Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:38:46 +00:00
ce1c8bde57 fix(admin): simulations view-only + docs path in Docker (#109 #110)
- simulate/page.tsx: remove launch form — simulations are triggered via
  Airflow DAG, not the admin UI. Page now shows run history + links to
  Airflow and MLflow only (#109)
- docs.ts: use DOCS_ROOT env var (fallback: ../../docs for local dev) so
  the path works in Docker standalone where CWD is /app (#110)
- Dockerfile.admin: copy docs/ into the runner image at /app/docs and set
  DOCS_ROOT=/app/docs so listAllDocs() finds the files at runtime (#110)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:55:50 +00:00
c1f5fcb561 fix(admin): ops page — add section description, remove redundant footer (#107)
Adds a one-line purpose description under the Ops heading so it is clear
what the section is for (shadow policy toggles, signal replay, per-user
actions). Removes the duplicate "User-level actions" subsection whose
content is now covered by the header description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:53:35 +00:00
9bd60a9835 feat(web): action sheet cleanup + settings page (#100 #101 #102)
- Remove "Helpful"/"Not helpful" from action sheet — reward is inferred
  from done/snooze/dismiss + dwell time; explicit sentiment buttons were
  redundant and cluttered the UI (#100)
- Move "notify me" push subscription button to new /config page (#101)
- Add settings gear icon (bottom-right, fixed) on tip page linking to /config (#102)
- New /config page: push notification toggle + link to /connect integrations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:52:45 +00:00
4267e6ac68 feat(ml/serving): inject profile features + sort tasks in tip prompt (#79)
- prompts.py: sort tasks overdue-first → priority desc → age desc before
  rendering into the LLM prompt (same ordering as ml/features/context.py)
- prompts.py: render User profile summary line (completion_rate, dismiss_rate,
  preferred_hour) when profile_features are present
- main.py: add profile_features field to PromptContext; plumb from
  GenerateRequest into the prompt builder via model_copy
- logging_config.py: drop add_logger_name processor (incompatible with
  PrintLoggerFactory — caused test ordering failures)
- test_generate.py: 6 new tests covering sort order, profile rendering,
  partial fields, empty profile, and end-to-end plumbing through /generate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:46:16 +00:00
0474ad4deb feat(airflow): integrate bench harness into bench_collect DAG
New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks:
1. collect.py — generates candidates, logs to MLflow
2. export_for_judge — exports pending runs for Claude Code scoring
3. compare — generates leaderboard by (model, prompt) cell

Config via dag_run.conf supports all collect.py options (models, prompts,
n_tips, n_scenarios, temperature, experiment name, max_model_b).

New admin API endpoints (`services/api/src/routes/bench.ts`):
- GET /api/bench/experiments — list tip-bench-* experiments
- POST /api/bench/run — trigger DAG with custom config
- GET /api/bench/runs/:experiment — list runs in experiment
- GET /api/bench/leaderboard/:experiment — leaderboard by (model, prompt)

All endpoints require admin auth. Human judge (Claude Code) scores are
applied manually post-export; future enhancement: add webhook to DAG.

Admin UI can now trigger and monitor benchmarks from a dashboard panel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 11:54:30 +00:00
556019b060 feat(bench): MLflow-based tip-generation benchmark harness (#93, #95)
Combines model evaluation (#93) and prompt A/B testing (#95) into one
experiment. Evaluates all (model × prompt × scenario) cells on the same
fixed contexts so quality differences are attributable.

Architecture:
- Phase A (collect.py): generates candidates per cell, logs to MLflow
  with judge_pending=true. Rejects models >4B, uses keep_alive=0 for
  RAM safety (no concurrent model weights in VRAM).
- Phase B (judge_cli.py): exports pending runs as JSON for Claude Code
  to score per the rubric, then applies scores back to MLflow.
- Phase C (compare.py): leaderboard by (model, prompt) cell.

Rubric (tip-v1) defines 1–5 scales for relevance, actionability, tone,
plus format_ok and overlong flags. Composite = rel + act + tone +
2×format_ok − overlong. Rubric is self-describing and persisted in every
run so judges use consistent criteria across sessions.

Artifacts (prompts, candidates, raw responses) stored as MLflow tags
because the server uses a file:// backend not accessible via REST. Full
artifacts accessible in MLflow UI → run → Tags section.

Tested end-to-end on local machine:
- 4 models (qwen2.5:0.5b/1.5b, gemma3:1b, llama3.2:3b) ≤4B
- 3 prompts (v1, v2-mentor, v3-few-shot)
- 4 scenarios (4 personas × 2 time-slots)
- 48 cells total, all judged and ranked

Winner: qwen2.5:1.5b × v3-few-shot (composite=12.75).

Ready for integration into Airflow prompt_ab_eval DAG and admin UI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 11:48:59 +00:00
e40dfdcbb0 chore(infra): wire MLflow/Airflow env vars, fix healthcheck, add .dockerignore
Some checks failed
buf-check / Lint & breaking-change check (push) Has been cancelled
- docker-compose: pass ML_SERVING_URL, MLFLOW_URL, AIRFLOW_URL + creds to api service
- docker-compose: pass NEXT_PUBLIC_MLFLOW_URL/AIRFLOW_URL to admin service
- docker-compose: replace wget healthcheck with node fetch (wget not in node image)
- docker-compose: enable Airflow basic_auth API backend; add MLflow pip dep for DAGs
- Dockerfiles: tighten layer caching, add .dockerignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:08:43 +00:00
bad1bb2cba feat(simulate): MLflow tracking, Airflow DAG integration, health checks for mlflow/airflow
- sim_runs schema: add judge_mode, n_policies, airflow_dag_run_id, mlflow_run_id columns
- admin health endpoint: add mlflow + airflow checks (Basic auth for Airflow API)
- admin nav: add Simulations page link; rename section label
- runner.py: optional MLflow experiment tracking; multi-policy support
- sim_dag.py: Airflow DAG for offline sim pipeline
- admin simulate page + API client methods for sim runs
- shared-types tsconfig: exclude test files from build

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:08:36 +00:00
e96ceb7ee1 feat(auth): token-based admin authentication for Playwright/CI (#105)
Add POST /api/auth/token — validates ADMIN_TOKEN env var, creates a 24h
session and sets the sid cookie so automated tools can access the admin
panel without Google OAuth. Admin login page gains a token input form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:07:43 +00:00
b554970032 docs(observability): add services/api README; update ml/serving + recommender docs (#18)
- services/api/README.md: new — contract, middleware stack, background
  tasks, config table (LOG_LEVEL, SENTRY_DSN), health story, extraction
  criteria
- ml/serving/README.md: add Observability section (structlog JSON,
  traceparent → trace_id binding), add SENTRY_DSN + ENV to config table
- services/recommender/README.md: fix policy table — egreedy-v2 is
  active (#99), egreedy-v1 is shadow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:41:39 +00:00
c4960d0601 feat(observability): structured logs, W3C trace IDs, Sentry hooks (#18)
- TS: pino + pino-http; every HTTP request log includes traceId from
  W3C traceparent header (generated if absent); forwarded to ml/serving
  on all /score, /generate, /reward, and /api/ml proxy calls
- Python: structlog JSON; FastAPI middleware binds trace_id via
  contextvars so every log line within a request carries it
- Sentry: optional SENTRY_DSN init in both runtimes (no-op if unset)
- Replace all console.* calls across services/api with pino logger
- Update tests to spy on logger instead of console

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:37:28 +00:00
7281af83a4 feat(bandit): promote egreedy-v2 (D=12, profile features) as active policy (#99)
Offline sim gate passed — egreedy-v2 mean reward −0.629 vs egreedy-v1 −0.642
(5 users × 20 rounds, rule judge, seed 42). v2 wins 3/5 personas.

- recommender.ts: switch remotePolicy() to /score/egreedy/v2
- recommender.ts: switch sendRewardWithRetry() to /reward/egreedy/v2 with
  profile_features payload so the ridge update uses the full D=12 vector
- recommender.ts: re-fetch profile at feedback time (TTL-cached, near-instant)
- ADR-0012: status Accepted → Promoted, promotion record appended

Shadow entry egreedy-v2-shadow kept in registry (active: false) for rollback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:08:28 +00:00
cba3f1a184 docs(services): update integrations + recommender READMEs for signal abstraction (#78)
integrations/README — replace stale Connector interface and fictional
libsodium vault with the actual SignalSource pattern, SQLite token table,
and real OAuth routes.

recommender/README — document the SignalAggregator pipeline, current
policy registry, and actual /recommend + /feedback contract shapes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:17:38 +00:00
352469162d fix(signals): add missing source field to TaskSyncedEvent (#78)
TaskSyncedPayload in shared-types and ml/serving schemas both require
source, but TaskSyncedEvent in bus.ts and the todoist publish call both
omitted it — causing the JetStream consumer to nak every task.synced
message on validation failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:15:32 +00:00
45416000f9 feat(features): per-feature freshness spec — JIT vs batched (#61)
Each ml/features/*.py now declares freshness, source, and fallback per
feature. ProfileFeature gains ttl_sec (mirrored from registry.ts),
freshness="batched", source, and fallback. context.py adds
ContextFeatureSpec + CONTEXT_FEATURES for the three JIT features
(hour_of_day, day_of_week, tasks). CI test parses ttlSec from registry.ts
to catch drift. ml/README updated with split JIT/batched feature contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:02:55 +00:00
bd3ea1b8b1 docs(schema): update docs for #54 — proto registry + buf CI gate
- packages/shared-types/README.md: new — documents HTTP vs event surfaces,
  proto file layout, schema evolution rules, and how to run buf locally
- ml/serving/README.md: note pydantic payload validation in consumer section
- CLAUDE.md: replace "schema registry enforced when #54 lands" with
  the actual state; remove #54 from active-work list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:53:20 +00:00
377373a95d test(schema): unit tests for schemas.py and nats_consumer._handle (#54)
17 tests covering: pydantic model validation (all payload types, optional
fields, invalid enum values, missing required fields), _handle write path
for task_synced, validation errors surfaced through _make_handler causing
nak instead of ack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:51:15 +00:00
d539fde0c1 feat(schema): protobuf event registry + buf CI gate (#54)
- Add proto schemas in packages/shared-types/events/ (oo.events.v1):
  envelope.proto, signals.proto, integration.proto
- buf.yaml with STANDARD lint + FILE breaking-change rules
- .gitea/workflows/buf-check.yaml: lint + breaking check on every PR
  touching events/ (needs a Gitea Actions runner to execute)
- scripts/buf-check.sh: local equivalent of the CI check
- NormalizedEvent TS envelope gains eventId, schemaVersion, producer
  to align with the proto Envelope message
- ml/serving/schemas.py: pydantic models mirroring the v1 proto types
- nats_consumer.py: validate payloads via pydantic instead of raw .get()

A field-rename PR will now fail buf breaking with exit code 100 and
show the offending messages. To make a breaking change: keep the old
field reserved, add the new one, bump schema_version to v2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:48:24 +00:00
f48b5a7646 docs(ml): serving README + update ml/README and CLAUDE.md for #98
- ml/serving/README.md: new — contract, JetStream consumer docs, config,
  health story, extraction criteria, state file reference
- ml/README.md: note JetStream consumers in serving/ row
- CLAUDE.md: update active work to reflect #98 shipped, #99 still pending

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:21:40 +00:00
4652e4b582 feat(ml): JetStream durable consumers in ml/serving (#98)
Adds a NATS JetStream consumer to ml/serving so the feature pipeline
can react to events without the API triggering every read.

- nats_consumer.py: durable push consumers for signals.> and feedback.>
  streams; acks on success, naks for redeliver, up to NATS_MAX_DELIVER
  attempts; per-consumer health state (last_msg_ts, processed, errors)
- main.py: FastAPI lifespan wires start/stop; /health exposes nats state
- requirements.txt: adds nats-py>=2.9.0
- Dockerfile.ml: copy all *.py from ml/serving (was missing prompts.py)

Handled subjects:
  signals.task.synced   → writes per-user sync metadata to STATE_DIR
  signals.tip.feedback  → logged for observability (reward via HTTP path)

Config: NATS_URL (empty = disabled), NATS_DURABLE_PREFIX, NATS_MAX_DELIVER

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:19:47 +00:00
2d7cf217a9 feat(ml): egreedy-v2 shadow policy — D=12 with profile features (#99)
Ship the scaffolding for #99 (phase B.3 of #81):

- ml/serving: add /score/egreedy/v2, /reward/egreedy/v2, /stats/egreedy/v2
  endpoints (D=12). New feature dims: completion/dismiss rates, mean dwell
  (clipped 10min), preferred-hour alignment (cosine, 1-dim), tip volume (log).
  Separate state file per user (_egreedy_v2.json). /reset clears v2 state too.
- ADR-0012: documents D=7→12 dimension change, normalization choices, shadow
  rollout protocol, and promotion gate (offline sim win per ADR-0002).
- recommender.ts: register egreedy-v2-shadow in shadow-policy map (disabled by
  default). When enabled, calls /score/egreedy/v2 fire-and-forget and publishes
  shadow:egreedy-v2-shadow serve signal. No reward to shadow — sim is the gate.
- sim runner/personas: personas carry synthetic profile_features per persona;
  _call_score/_call_reward thread profile_features through (None-safe for v1/linucb).
- 18 new Python tests; all 56 Python + 170 TS tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:00:38 +00:00
b8113d4bda docs(adr-0011): point B.3 at new issue #99
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:41:20 +00:00
ee4eb15022 feat(profile): event-driven invalidation (#81 phase B.2)
Features now declare invalidatedBy subjects in the registry; the new
profile/subscriber.ts subscribes to each unique subject and drops
matching stored rows for the userId in the payload. Next getProfile
call recomputes from current data instead of waiting up to ttlSec.

Wiring:
  completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d,
  preferred_hour  ← signals.tip.feedback
  tip_volume_30d  ← signals.tip.served

TTL stays as a safety net for clock drift and dropped events.
Registration validates each declared subject against KNOWN_SUBJECTS
(mirror of EventMap) so typos throw at startup, not silently.

ADR-0011 updated.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:38:45 +00:00
4a42a6aabf feat(admin): profile freshness panel in data-quality (#81 phase B.4)
Adds a per-feature freshness summary to /admin/data-quality so the admin
can spot features that are systematically stale or never computed:

  totalEligible — distinct users with tip_views in the last 30 days
  missing       — eligible users with no row stored for the feature
  stale         — eligible users whose stored row is past its TTL

Backend exposes summarizeProfileFreshness() in profile/builder.ts; one
query per feature joins eligible users LEFT JOIN profile rows.
Coverage = (eligible − missing − stale) / eligible, colored
green/yellow/red via the new PctGood helper (high-is-good, opposite of
the existing Pct used for missing-feature/stale-token rates).

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:34:46 +00:00
9e96540bcc feat(admin): per-user profile view + rebuild action (#81 phase B.1)
Surfaces phase A's profile features in /admin/users/:id so we can verify
they're actually computing useful values before investing in bandit
consumption. The detail GET now includes profile rows joined with registry
metadata (name, value, age, fresh badge, ttlSec, description). Read does
NOT trigger compute — staleness must be visible. A new POST
.../profile/rebuild button force-recomputes and is audit-logged like
reset-bandit.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:27:08 +00:00
7d4c29e137 feat(profile): user-profile feature registry + builder (phase A)
Centralizes user-level features (completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that
owns both definition and SQL aggregation, since the data lives in the
TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps
recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not
yet consume them — extending the bandit feature vector changes D and
resets every user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror with a sync
test that diffs name sets against registry.ts.

ADR-0011 records the data-locality reasoning (registry in TS, not Python
as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption
with state migration, admin per-user profile page, staleness alerts.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:22:22 +00:00
430804e9a5 feat(ml): prompt registry + per-request variant selection
Replaces the hardcoded "v1" label with a real prompt registry:

  ml/serving/prompts.py       — keyed by version: v1 (baseline),
                                v2-mentor (calm/specific persona),
                                v3-few-shot (v1 persona + curated examples)
  ml/serving/main.py          — POST /generate accepts optional prompt_version,
                                422 on unknown, echoes the version actually used
                                back in the response
  services/api/src/config.ts  — TIP_PROMPT_VERSION: empty / single / comma-list
                                (uniform random per request)
  services/api/src/routes/recommender.ts
                              — pickPromptVersion() drives selection; the
                                response's prompt_version (not a stale TS
                                constant) is what lands in tip_scores so the
                                #92 reward-analytics dashboard shows real
                                per-variant reaction rates

Closes #84.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 15:44:04 +00:00
aa4bdd8f09 feat(admin): LLM tip quality dashboard — per-model/prompt/kind breakdowns
/admin/reward-analytics now surfaces served count, reaction rate, and avg
reward grouped by llm_model, prompt_version, and tip_kind — closing the
loop so model/prompt iterations in M2 are legible next to the bandit
policy view. Data comes from the tip_scores columns added in ffdf707 and
tip_feedback.reward_milli; bandit-only tips show as "(bandit-only)".

Closes #92.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 15:24:52 +00:00
75d0e89906 fix(infra): ml-serving LITELLM_URL default → host.docker.internal:4000
Inside the container, llm.alogins.net times out (public-DNS route, not the
loopback path Caddy listens on). host.docker.internal:4000 reaches the Agap
LiteLLM directly and is equivalent for dev. Prod deploys override via env.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 12:20:41 +00:00
d4205a00cf refactor(infra): drop ai profile; ollama + litellm move to Agap
Ollama and LiteLLM are shared Agap services (agap_git/openai/docker-compose.yml);
oO never starts them. Removes the ai profile, the litellm config, and the
--profile ai runbook; points ml-serving at https://llm.alogins.net by default
and adds host.docker.internal host-gateway so the container can hit Agap ollama
on the host.

Also updates the tip-generator model alias to qwen2.5:1.5b to match the model
actually pulled on Agap ollama (7b is ~4.7 GB and would blow VRAM budget).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 12:16:21 +00:00
d7a2423940 fix(infra): mlflow image tag + python-based healthchecks for ml-serving/mlflow
- Corrects mlflow image tag (2.14.3 → v2.14.3); the former tag does not exist
  on ghcr.io/mlflow/mlflow and caused a manifest-unknown error on pull.
- Replaces wget/curl healthchecks with inline python urllib calls — the
  python:3.12-slim (ml-serving) and ghcr.io/mlflow/mlflow images ship
  neither wget nor curl, so both containers reported unhealthy despite
  /health returning 200.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 15:04:18 +00:00
bb879c5f0f refactor(admin): drop simulations/experiments/models pages; group nav into sections
Removes the in-shell MLOps pages (experiments, models, simulations) and their
client API helpers in favour of external MLflow/Airflow links. Nav is regrouped
into Signals / Recommender status / Ops sections for clarity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 14:41:17 +00:00
5b52c6bf40 test: cover NATS bridge + Todoist scheduler; ADR-0010
- bus.test.ts: 4 cases for the new onPublish hook contract
- nats.test.ts: stream creation idempotency + JSON publish bridge
- scheduler.test.ts: startup delay, fan-out, per-user failure isolation
- ADR-0010 documents the bridge-don't-replace decision and the
  Todoist scheduler isolation, plus open follow-ups (#98 ml/serving
  consumer, #54 protobuf migration, graceful shutdown, metrics)
- README/overview/services README reflect the bridged event substrate
- CLAUDE.md gains a "don't nats.publish() directly" rule
- .env.example documents NATS_URL + TODOIST_SYNC_INTERVAL_MS

Verified in deployment 2026-04-18: api -> nats bridge connects on
boot, signals + feedback streams created, scheduler tick logs
"todoist sync: 1 ok, 0 failed (1 users)" within 10s. Closes #21, #22.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 07:55:25 +00:00