Logs one MLflow run per /recommend (params, token metrics, latency,
full prompt + tip as artifacts) and per /agents/{id}/compute and
/infer call (signals snapshot, inferred prefs, latency).
Tracing is a no-op when MLFLOW_TRACKING_URI is unset; ml-serving
starts and serves tips correctly without MLflow configured.
Refs #118 (M4: remove from production / move off critical path).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces snooze-rate heuristic with p50 of actual task lateness (completedAt − dueAt).
Adds project_realness inference: projects with chronic lateness get realness < 1 and
the agent softens its snippet language from "overdue" to "past target date".
- TaskCompletion added to UserHistory with lateness_days computed property
- _infer_lateness_tolerance: p50 of task_completions, clipped at 0, float
- _infer_project_realness: per-project median lateness normalised by global median
- Both InferredParams use 7d TTL; cold_start = 0.0 / {}
- AgentInferRequest accepts task_completions; endpoint wires them through
- 12 new tests covering punctual/chronic/mixed users and language softening
- Agent bumped to v1.2.0
Closes#115
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All four agents bumped to v1.1.0.
momentum (#114): infers engagement_trend ('up'|'stable'|'down') by comparing
done-rate in the last 7 days vs the prior 7 days. Agent surfaces the trend
in its snippet ("trending up — build on the momentum").
overdue-task (#115): infers lateness_tolerance_days (0/1/2) from snooze rate.
Agent now filters tasks against the tolerance so low-urgency users aren't
nagged about tasks that are only hours overdue.
recent-patterns (#116): infers window_days (7/14/30) from feedback event
density — sparse users get a wider window so the snippet isn't always empty.
focus-area (#113): no inferred params (project-level feedback linkage needed,
tracked under #78). preferred_areas pref was declared but ignored; agent now
honours it as a tiebreaker and mentions it in the snippet.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each agent now exports a module-level MANIFEST declaring id, version,
pref_schema, required_consents, ttl_sec, and silenced_in_contexts. The
registry surfaces both the agent and its manifest, and rejects on
mismatch so the two cannot drift.
ml/serving exposes GET /agents/registry; services/api proxies it as
GET /api/agents/registry with a 60s in-process cache so admin pageviews
don't hammer upstream. Failures aren't cached.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum,
time_of_day, recent_patterns, focus_area) each producing a prompt snippet
from user signals. A registry wires them up; the orchestrator prompt in
ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM.
Also wires /api/agents route in the API and updates the Dockerfile to copy
the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- prompts.py: sort tasks overdue-first → priority desc → age desc before
rendering into the LLM prompt (same ordering as ml/features/context.py)
- prompts.py: render User profile summary line (completion_rate, dismiss_rate,
preferred_hour) when profile_features are present
- main.py: add profile_features field to PromptContext; plumb from
GenerateRequest into the prompt builder via model_copy
- logging_config.py: drop add_logger_name processor (incompatible with
PrintLoggerFactory — caused test ordering failures)
- test_generate.py: 6 new tests covering sort order, profile rendering,
partial fields, empty profile, and end-to-end plumbing through /generate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TS: pino + pino-http; every HTTP request log includes traceId from
W3C traceparent header (generated if absent); forwarded to ml/serving
on all /score, /generate, /reward, and /api/ml proxy calls
- Python: structlog JSON; FastAPI middleware binds trace_id via
contextvars so every log line within a request carries it
- Sentry: optional SENTRY_DSN init in both runtimes (no-op if unset)
- Replace all console.* calls across services/api with pino logger
- Update tests to spy on logger instead of console
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- packages/shared-types/README.md: new — documents HTTP vs event surfaces,
proto file layout, schema evolution rules, and how to run buf locally
- ml/serving/README.md: note pydantic payload validation in consumer section
- CLAUDE.md: replace "schema registry enforced when #54 lands" with
the actual state; remove #54 from active-work list
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add proto schemas in packages/shared-types/events/ (oo.events.v1):
envelope.proto, signals.proto, integration.proto
- buf.yaml with STANDARD lint + FILE breaking-change rules
- .gitea/workflows/buf-check.yaml: lint + breaking check on every PR
touching events/ (needs a Gitea Actions runner to execute)
- scripts/buf-check.sh: local equivalent of the CI check
- NormalizedEvent TS envelope gains eventId, schemaVersion, producer
to align with the proto Envelope message
- ml/serving/schemas.py: pydantic models mirroring the v1 proto types
- nats_consumer.py: validate payloads via pydantic instead of raw .get()
A field-rename PR will now fail buf breaking with exit code 100 and
show the offending messages. To make a breaking change: keep the old
field reserved, add the new one, bump schema_version to v2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ml/serving/README.md: new — contract, JetStream consumer docs, config,
health story, extraction criteria, state file reference
- ml/README.md: note JetStream consumers in serving/ row
- CLAUDE.md: update active work to reflect #98 shipped, #99 still pending
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a NATS JetStream consumer to ml/serving so the feature pipeline
can react to events without the API triggering every read.
- nats_consumer.py: durable push consumers for signals.> and feedback.>
streams; acks on success, naks for redeliver, up to NATS_MAX_DELIVER
attempts; per-consumer health state (last_msg_ts, processed, errors)
- main.py: FastAPI lifespan wires start/stop; /health exposes nats state
- requirements.txt: adds nats-py>=2.9.0
- Dockerfile.ml: copy all *.py from ml/serving (was missing prompts.py)
Handled subjects:
signals.task.synced → writes per-user sync metadata to STATE_DIR
signals.tip.feedback → logged for observability (reward via HTTP path)
Config: NATS_URL (empty = disabled), NATS_DURABLE_PREFIX, NATS_MAX_DELIVER
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ship the scaffolding for #99 (phase B.3 of #81):
- ml/serving: add /score/egreedy/v2, /reward/egreedy/v2, /stats/egreedy/v2
endpoints (D=12). New feature dims: completion/dismiss rates, mean dwell
(clipped 10min), preferred-hour alignment (cosine, 1-dim), tip volume (log).
Separate state file per user (_egreedy_v2.json). /reset clears v2 state too.
- ADR-0012: documents D=7→12 dimension change, normalization choices, shadow
rollout protocol, and promotion gate (offline sim win per ADR-0002).
- recommender.ts: register egreedy-v2-shadow in shadow-policy map (disabled by
default). When enabled, calls /score/egreedy/v2 fire-and-forget and publishes
shadow:egreedy-v2-shadow serve signal. No reward to shadow — sim is the gate.
- sim runner/personas: personas carry synthetic profile_features per persona;
_call_score/_call_reward thread profile_features through (None-safe for v1/linucb).
- 18 new Python tests; all 56 Python + 170 TS tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Centralizes user-level features (completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that
owns both definition and SQL aggregation, since the data lives in the
TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps
recommend latency bounded; values persist in user_profile_features (KV).
ml/serving accepts profile_features on /score + /generate but does not
yet consume them — extending the bandit feature vector changes D and
resets every user's learned state, so that's a deliberate phase-B step.
Includes ml/features/profile_schema.py as a contract mirror with a sync
test that diffs name sets against registry.ts.
ADR-0011 records the data-locality reasoning (registry in TS, not Python
as the issue originally suggested).
Phase B (deferred): event-driven incremental updates, bandit consumption
with state migration, admin per-user profile page, staleness alerts.
Refs #81.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the hardcoded "v1" label with a real prompt registry:
ml/serving/prompts.py — keyed by version: v1 (baseline),
v2-mentor (calm/specific persona),
v3-few-shot (v1 persona + curated examples)
ml/serving/main.py — POST /generate accepts optional prompt_version,
422 on unknown, echoes the version actually used
back in the response
services/api/src/config.ts — TIP_PROMPT_VERSION: empty / single / comma-list
(uniform random per request)
services/api/src/routes/recommender.ts
— pickPromptVersion() drives selection; the
response's prompt_version (not a stale TS
constant) is what lands in tip_scores so the
#92 reward-analytics dashboard shows real
per-variant reaction rates
Closes#84.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ML serving:
- LinUCB contextual bandit (disjoint, d=5 features: hour_sin/cos, is_overdue, task_age, priority)
- /score endpoint replaces stub random; /reward endpoint for online learning
- Per-user model state persisted to disk as JSON (survives restarts)
- venv at ml/serving/.venv; start with pnpm dev from ml/serving
Recommender:
- Todoist fetch now extracts features (is_overdue, task_age_days, priority)
- RemotePolicy calls ml/serving with 3s timeout; falls back to RandomPolicy
- Reward sent to /reward on feedback (done=+1, snooze=0, dismiss=-1)
Web Push:
- VAPID keys in config; push_subscriptions table in DB
- POST/DELETE /api/push/subscribe; GET /api/push/vapid-public-key
- Service worker (public/sw.js): push → showNotification, notificationclick → focus/open
- "notify me" button on tip page; registers SW + subscribes on permission grant
Event bus:
- services/api/src/events/bus.ts: typed EventEmitter wrapper
- Subjects: signals.tip.served, signals.tip.feedback, signals.task.synced
- Same publish/subscribe API NATS JetStream will implement — swap is mechanical
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>