alvis/oO - oO - AgapGit

Author	SHA1	Message	Date
alvis	772bb6e194	feat(consents): auto-grant data:<provider> on connect; remove agent: consents (ADR-0015) - integrations.ts: grant data:<provider> on OAuth callback, revoke on disconnect - Backfill migration: INSERT OR IGNORE data:<provider> for all active tokens - Agent manifests: drop agent:<id> from required_consents (momentum, time-of-day, overdue-task, recent-patterns, health-vitals) — per-agent control is a preference - eligibility.ts: update comment to reflect data:-only consent model - test_manifest.py: assert no agent: consents remain in any manifest - migrations.test.ts: backfill idempotency tests for issue #127 - Dockerfile.api: drop --offline flag (fixes ERR_PNPM_NO_OFFLINE_META) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 15:09:58 +00:00
alvis	34925310cf	docs: update focus-area manifest description and CLAUDE.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 15:00:06 +00:00
alvis	f66f337779	feat(focus-area): use enriched descriptions in cluster output cluster_tasks now attaches enriched_description to each task dict. focus-area reads enriched_description (falling back to raw content) when building the area summary, so the orchestrator sees the expanded 3-sentence descriptions instead of terse raw titles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 14:58:31 +00:00
alvis	f6b89fc849	refactor(focus-area): output all clusters as context; remove scoring and preferred_areas The agent no longer picks a winner — it summarises every cluster so the orchestrator can decide what's relevant. Scoring by overdue count overlapped with the overdue-task agent. preferred_areas (project-ID based, broken label matching) removed entirely. Output format: numbered list of areas with task titles included. Snapshot: {cluster_count, clusters: [{label, task_count, tasks}]}. Version bumped to 3.0.0; inferred_params cleared. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 14:57:04 +00:00
alvis	12c956b588	fix(clustering): drop TTL check from isUpToDate; task hash is the only signal If tasks haven't changed, the output is valid forever. If they changed, always recompute regardless of age. TTL on focus-area restored to 24h — it only controls recommender eligibility, not recompute frequency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 14:46:43 +00:00
alvis	d12f11d29d	feat(clustering): 1h TTL + skip recompute when tasks unchanged focus-area now recomputes at most once per hour, and only if the task list actually changed since the last compute. - focus-area TTL: 43200s → 3600s; version bumped to 2.1.0 - computeAndStore hashes sorted task contents (MD5) and checks the stored _task_hash in the existing snapshot; skips the ml-serving call when the hash matches and the output isn't expired - ml-serving injects _task_hash into the snapshot so the next cycle can compare Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 14:45:15 +00:00
alvis	9ddeea6cac	feat(clustering): persistent enrichment cache in task_enrichments table Each unique task title is now enriched by LiteLLM once and cached in the DB. Subsequent agent compute cycles (every 12h) fetch the cache before calling ml-serving; only new titles hit the tip-generator. - DB: task_enrichments(content_hash PK, description, model, created_at) - TS: fetchEnrichmentCache / persistEnrichments helpers in agent-outputs.ts; enrichment_cache passed in compute request, new_enrichments persisted from response - Python: AgentComputeRequest.enrichment_cache / AgentComputeResponse.new_enrichments; AgentInput.enrichment_cache; _enrich_batch returns (descriptions, new_entries); cluster_tasks returns (clusters, new_enrichments) - FocusAreaAgent stashes new_enrichments in signals_snapshot under _new_enrichments; compute_agent endpoint pops it before storing the snapshot Closes part of #129 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 14:39:35 +00:00
alvis	08d08ad7b0	feat(clustering): LLM-enrichment before embedding (port from taskpile #129) Ported from taskpile experiments/clustering_eval (prompt v1, qwen2.5:1.5b). The experiment showed ARI 0.22→0.77 and AUROC 0.76→0.91 on synthetic tasks when embedding LLM-expanded descriptions instead of raw titles. - Expand each task title via LiteLLM tip-generator before embedding - Prefix with "clustering: " (nomic-embed-text task instruction prefix) - Cache expansions in-memory by content hash within a compute cycle - Falls back to raw title if enrichment fails; no change to fallback behaviour Fixes #129 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 14:20:48 +00:00
alvis	1ca2351488	fix(clustering): route embeddings through LiteLLM instead of Ollama directly The old code called Ollama's /api/embeddings one task at a time, which caused silent fallback to project-based grouping when host.docker.internal:11434 was unreachable from the ml-serving container. - Switch to LiteLLM /embeddings (model alias "embedder") as primary path - Batch all task contents in one request instead of N serial calls - Fall back to Ollama /api/embed (updated to current API) when LITELLM_URL is absent - Update tests to mock _embed_batch instead of the removed _embed Fixes #123 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 13:42:53 +00:00
alvis	4e9210fcef	fix(web): wrap loadTip in arrow fn to satisfy MouseEventHandler type	2026-05-12 13:34:46 +00:00
alvis	59c493323f	fix(recommender): remove Todoist fallback on orchestrator failure; add snooze exclusion When fetchOrchestratorTip returned null (LiteLLM timeout, bad JSON, etc.) the recommender silently fell back to randomPolicy, serving a raw Todoist task with no rationale — explaining both reported symptoms. - Remove randomPolicy/signalToCandidate; return 204 when orchestrator fails so the UI shows "All clear" instead of a confusing Todoist task - Pass recent_tip through the stack (frontend → POST /recommend → fetchOrchestratorTip → ml/serving RecommendRequest → build_orchestrator_messages) so after snooze the LLM is instructed not to repeat the snoozed content Fixes #122 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 13:28:32 +00:00
alvis	d4b40e2590	docs: document MLflow trace API, span inspection, and no-agent diagnosis Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 11:23:13 +00:00
alvis	a0a069c525	fix(admin): break redirect loop on /forbidden for non-admin users The middleware was redirecting non-admins to /forbidden but /forbidden wasn't excluded from the matcher, so the middleware ran again on that page, saw a non-admin, and redirected again — infinite loop. Added /forbidden to the pass-through list alongside /login. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 11:12:16 +00:00
alvis	d1f28666b0	feat(integrations): add Google Health (Fit) integration with full permissions OAuth2 flow with all 11 Google Fitness scopes (activity, body, sleep, heart rate, nutrition, location, blood glucose/pressure/temperature, oxygen saturation, reproductive health). Stores access + refresh tokens; auto-refreshes on expiry. GoogleHealthSignalSource fetches steps, sleep sessions, active minutes, calories, and heart rate from the Fit aggregate + sessions APIs. Signals flow into both the tip orchestrator and the health-vitals pre-compute agent, which generates prompt snippets about step progress, sleep deficit, sedentary time, and elevated heart rate. Signal.kind extended with 'health'; IntegrationProvider extended with 'google-health'. Agent compute signal mapping enriched to include source, kind, and all features so health-vitals can filter its own signals. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 11:12:11 +00:00
alvis	161e654027	feat(serving): replace MLflow run logging with native trace spans Convert ml-serving from isolated MLflow runs to nested traces using mlflow.start_span_no_context(). The recommend endpoint now emits a full span tree: recommend (CHAIN) → build_context (TOOL), agent:* (AGENT) ×N, llm_orchestrator (LLM). Compute and infer endpoints each emit a single span. Supporting changes: - mlflow-skinny>=3.1.0 added to requirements - MLflow configured with --serve-artifacts + mlflow-artifacts:/ default root for cross-container artifact proxy (spans now persist from ml-serving) - --allowed-hosts extended to include mlflow:5000 (SDK includes port in Host) - science_destiny slider wired through prompts.py and recommend endpoint - Config page exposes science/destiny slider (0=data-driven, 100=intuitive) - Tip page shows rationale inline on tap Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 08:26:05 +00:00
alvis	afacc34969	fix(agents): instruct orchestrator to output tip in English Small models (qwen2.5:1.5b) mirror the language of task title content in the prompt. Adding an explicit English note to snippets that embed raw task titles (focus-area, overdue-task) prevents language bleed. Also added the instruction to the orchestrator system prompt and user message as belt-and-suspenders. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 11:53:21 +00:00
alvis	c124ff4d24	docs: update CLAUDE.md with session learnings (#118 tracing, compose gotchas) - Clarify compose profile requirement for build/up (silent no-op without --profile) - Add --force-recreate pattern for env-var-only changes - Document MLflow host_header and auth gotchas for container-to-container calls - Record MLflow tracing addition and #118 M4 tracking issue Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 10:41:57 +00:00
alvis	95e1b342b4	fix(serving): wire MLflow auth and Host header for container-to-container calls - Pass MLFLOW_ADMIN_PASSWORD as fallback password credential - Set host_header='localhost' to satisfy MLflow's --allowed-hosts check (MLflow rejects Host: mlflow but accepts Host: localhost) - Default MLFLOW_TRACKING_URI to http://mlflow:5000 in compose so the env_file value is not silently overridden to empty Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 10:39:08 +00:00
alvis	c43dbaf23d	feat(serving): add MLflow tracing to ml-serving for all agent calls Logs one MLflow run per /recommend (params, token metrics, latency, full prompt + tip as artifacts) and per /agents/{id}/compute and /infer call (signals snapshot, inferred prefs, latency). Tracing is a no-op when MLFLOW_TRACKING_URI is unset; ml-serving starts and serves tips correctly without MLflow configured. Refs #118 (M4: remove from production / move off critical path). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 10:30:24 +00:00
alvis	488a764519	docs: mark M2 complete in README All M2 items shipped: ADR-0014 (unified profile + inference framework), per-agent auto-inference, tip generator, TipCandidate schema, prompt versioning, model benchmark, task clustering, UX refinements. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 08:02:44 +00:00
alvis	c67f2b14c4	docs: update CLAUDE.md with #61 completion and feature test patterns Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 07:45:40 +00:00
alvis	17b9516903	feat(features): mirror invalidatedBy into Python ProfileFeature (#61) Adds invalidated_by: tuple[str, ...] to ProfileFeature, mirroring the invalidatedBy bus subjects from registry.ts. Adds a test that parses the TS source and asserts Python stays in sync — same drift-detection pattern used for names and ttlSec. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 07:10:36 +00:00
alvis	a75be0d832	docs: update CLAUDE.md with session learnings (#97, #113) - focus-area v2.0.0 completion in recent completions; remove from active work - Update focus-area inferred params table row - min_history gotcha: checked against events, not task_completions - httpx trust_env=False rule for ml/ code - Agent test command Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 06:56:17 +00:00
alvis	26fc67776f	feat(agents): semantic task clustering + focus-area inferred preferred_areas (#97, #113) - New ml/agents/clustering.py: embed task content via nomic-embed-text (Ollama), greedy cosine clustering (threshold 0.72, max 6 clusters), graceful fallback to project-id grouping when Ollama is unreachable - focus_area v2.0.0: compute() uses semantic clusters as focus areas; adds preferred_areas InferredParam inferred from top-2 projects by task_completion count - 135 tests, all passing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 06:54:46 +00:00
alvis	336644a90a	docs: update CLAUDE.md with rich per-agent inference completions (#112–#116) - Inference framework table updated: all agents at v1.2.0 with full param list - Documents UserHistory.task_completions and AgentInferRequest.task_completions - Marks #112/114/115/116 complete in recent completions - Active work updated: #78 closed, #61 and #97/#113 as next priorities Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 06:28:30 +00:00
alvis	1d9a395591	feat(agents): quiet window + peak hours + tz prefs for time-of-day agent (#112) Adds four InferredParams (all TTL=24h, min_history=50 except preferred_hour=10): - quiet_start / quiet_end: longest contiguous below-baseline hour run (HH:MM) - peak_hours: top-quartile done-event hours, sorted ascending - tz: cold-start only ("UTC"); populated from auth provider, no inference function compute() updated: - in_quiet check (quiet window) takes precedence over peak hours - in_peak emits "peak productivity hour" language when current hour is in peak_hours - approaching peak (within 2h) surfaces for orchestrator timing - tz surfaced in snippet header when not UTC - snapshot adds peak_hours, in_quiet, in_peak, tz - Agent bumped to v1.2.0 - 21 new tests: night-owl, early-bird, shift-worker, quiet/peak snippet rendering - Fixed test_snapshot_keys in test_agents.py to include new snapshot fields Closes #112 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 06:05:51 +00:00
alvis	bc71dc203d	feat(agents): adaptive lookback + weekly/daily cycle detection for recent-patterns (#116) Replaces the coarse density-bucket window_days with three InferredParams (all TTL=24h): - lookback_days: min window containing ≥30 done events, capped at 30d (min_history=5) - weekly_cycle: per-DOW peak-to-mean strength list (min_history=21, ≥3 weeks of signal) - daily_cycle: per-hour peak-to-mean strength list (min_history=14) compute() renders cycle hints when strength > 0.5: "User tends to complete tips on Tuesdays and Saturdays." "User is most active around 8pm." Legacy window_days pref key still accepted as a fallback. - window_days pref renamed lookback_days; backward-compat fallback in compute() - Agent bumped to v1.2.0 - 19 new tests: weekend-warrior, weekday-only, evening-person, no-pattern, legacy compat, snippet rendering with strong/weak signals Closes #116 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 05:51:45 +00:00
alvis	4cade4868b	feat(agents): per-user baseline + stdev inference for momentum agent (#114) Adds two InferredParams (TTL=7d) computed from 28-day rolling daily done counts: - baseline_completions_per_day: mean done events/day over the window - stdev: stdev of daily counts (floored at 0.1 to avoid division by zero) MomentumAgent.compute() now calculates a z-score from recent done events in inp.feedback_history vs the inferred baseline. Snippet language switches to z-score framing ("above your usual pace", "slowing down") when |z| >= 1.0, falling back to engagement_trend labels when in the normal range. - engagement_trend InferredParam preserved for backward compatibility - momentum_window pref added (default 7, user-overridable) - 14 new tests covering power user, casual user, returning-from-break, and relative stdev comparison; engagement_trend tests updated for z-score priority - Agent bumped to v1.2.0 Closes #114 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 05:18:29 +00:00
alvis	04212ff318	feat(agents): p50-lateness tolerance + per-project realness for overdue-task (#115) Replaces snooze-rate heuristic with p50 of actual task lateness (completedAt − dueAt). Adds project_realness inference: projects with chronic lateness get realness < 1 and the agent softens its snippet language from "overdue" to "past target date". - TaskCompletion added to UserHistory with lateness_days computed property - _infer_lateness_tolerance: p50 of task_completions, clipped at 0, float - _infer_project_realness: per-project median lateness normalised by global median - Both InferredParams use 7d TTL; cold_start = 0.0 / {} - AgentInferRequest accepts task_completions; endpoint wires them through - 12 new tests covering punctual/chronic/mixed users and language softening - Agent bumped to v1.2.0 Closes #115 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 05:14:04 +00:00
alvis	35257b7756	docs: mark ADR-0014 complete in CLAUDE.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 11:50:42 +00:00
alvis	ed1705cb5d	feat(db): drop users.consentGiven/consentAt (ADR-0014 step 8) Backfills consent_given=1 rows into user_consents as data:core before dropping the legacy columns. auth.ts now writes user_consents on signup; POST /consent writes user_consents; admin/user routes cleaned of the old fields. Migration is idempotent — DROP COLUMN is wrapped in try/catch so it no-ops on fresh DBs that never had the columns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 11:50:27 +00:00
alvis	afb0e9b0cb	feat(agents): per-agent inference — momentum, overdue-task, recent-patterns, focus-area (ADR-0014 step 7) All four agents bumped to v1.1.0. momentum (#114): infers engagement_trend ('up'|'stable'|'down') by comparing done-rate in the last 7 days vs the prior 7 days. Agent surfaces the trend in its snippet ("trending up — build on the momentum"). overdue-task (#115): infers lateness_tolerance_days (0/1/2) from snooze rate. Agent now filters tasks against the tolerance so low-urgency users aren't nagged about tasks that are only hours overdue. recent-patterns (#116): infers window_days (7/14/30) from feedback event density — sparse users get a wider window so the snippet isn't always empty. focus-area (#113): no inferred params (project-level feedback linkage needed, tracked under #78). preferred_areas pref was declared but ignored; agent now honours it as a tiebreaker and mentions it in the snippet. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 11:21:10 +00:00
alvis	ad6747c242	feat(profile): /api/profile + eligibility filter + inference framework (ADR-0014 steps 4-6) Step 4 — /api/profile read-through API: GET /api/profile → { user, prefs, consents, contexts } PATCH /api/profile/prefs/:scope upsert user_preferences (source='user') PATCH /api/profile/consents grant / revoke consent keys PATCH /api/profile/contexts create / activate / deactivate contexts Legacy consentGiven bit folded in as data:core fallback. Step 5 — registry-driven eligibility filter: fetchRegistry() exported from agent-registry.ts. profile/eligibility.ts: getEligibleAgentIds(userId) — filters by required consents, silenced_in_contexts, and user_preferences[enabled=false]. fetchOrchestratorTip filters agent_outputs to eligible set before calling ml/serving /recommend. Fail-closed: registry unavailable → empty set. Step 6 — shared context-inference framework (#111) + time-of-day proof (#112): ml/agents/inference/: UserHistory, FeedbackEvent, run_inference(). Framework: cold-start, min_history gating, error fallback, structured logs. TimeOfDayAgent v1.1.0: inferred_params=[preferred_hour]; also reads quiet_start/quiet_end from agent_prefs. agent_prefs injected by TS caller. AgentInput gains agent_prefs field. ml/serving: POST /agents/{agent_id}/infer endpoint. agent-outputs.ts computeAndStore: loads prefs before compute, calls /infer after, persists results (source='inferred'); user overrides never touched. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 11:14:25 +00:00
alvis	305eeae38b	feat(agents): manifest plumbing + GET /agents/registry (ADR-0014 step 3) Each agent now exports a module-level MANIFEST declaring id, version, pref_schema, required_consents, ttl_sec, and silenced_in_contexts. The registry surfaces both the agent and its manifest, and rejects on mismatch so the two cannot drift. ml/serving exposes GET /agents/registry; services/api proxies it as GET /api/agents/registry with a 60s in-process cache so admin pageviews don't hammer upstream. Failures aren't cached. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 10:55:54 +00:00
alvis	5d43339616	feat(api): unified Profile schema + consent backfill (ADR-0014 step 1-2) Adds user_preferences, user_consents, user_contexts and the tone / tip_kinds_json columns on users. Backfills consent_given=1 rows into user_consents as data:core; INSERT OR IGNORE keeps it idempotent and respects later revocations. Migration body moves to db/migrations.ts so tests can apply it to a fresh in-memory handle without opening the prod DB on import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 10:28:47 +00:00
alvis	d454a0a8bf	docs: ADR-0014 — unified Profile model + agent registry Propose a shared substrate for per-user prefs, contexts, per-key consents, and per-agent state so adding an agent stays a manifest change. Updates CLAUDE.md, README, and architecture docs to reflect the multi-agent pipeline (ADR-0013) and the registry direction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 10:19:07 +00:00
alvis	41302d9f36	fix: repair Docker build — TS errors and missing docs in image - Remove unused `httpx` import from bench.ts (package does not exist) - Add explicit `IRouter` type on `router` in agent-outputs.ts and bench.ts to resolve TS2742 portable-type errors - Remove `docs` from .dockerignore so Dockerfile.admin can copy it into the runner image (DOCS_ROOT=/app/docs is read at runtime by the admin) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:52:27 +00:00
alvis	05f748159b	chore: remove shadow policy machinery (ADR-0013 step 10) Deletes shadowPolicies map, getShadowPolicies, setPolicyActive from recommender.ts; removes /api/admin/policies routes from admin.ts; removes getPolicies, togglePolicy, PolicyInfo from admin api.ts; removes the policy toggle section from the ops page. 168 API tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:45:32 +00:00
alvis	8e9718e8ba	chore(ml): remove bandit endpoints + helpers (ADR-0013 step 9) Deletes all LinUCB and ε-greedy code from ml/serving: score, reward, stats, reset, features endpoints; feature vector builders; per-user state file helpers; related Pydantic models; numpy/math/time imports. Removes test_score.py (pure bandit unit tests). 40 remaining tests pass. STATE_DIR kept — nats_consumer still writes sync metadata there. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:41:58 +00:00
alvis	c65bedcf68	feat(api): orchestrator cutover — replace bandit with multi-agent pipeline (ADR-0013 step 6) POST /recommend now calls ml/serving /recommend with pre-computed agent snippets + task context instead of /generate + /score/egreedy/v2. Falls back to a random signal candidate when ml/serving is unavailable. Removes: remotePolicy, fetchLlmCandidates, sendRewardWithRetry, candidateCache, pickPromptVersion. Feedback handler keeps inferReward + tipFeedback writes for observability; reward delivery to the bandit is gone. tipScores.policy is now 'orchestrator'; promptVersion is 'v4-orchestrator'. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:37:15 +00:00
alvis	7e958a779d	feat(api): agent pre-compute scheduler (ADR-0013 step 5) Extracts computeAndStore() from the /agents/:agentId/compute route so it can be called without an HTTP round-trip. startAgentPrecomputeScheduler() runs every 15 min: fetches active users (tip view in 48h), runs all agents in parallel per user, then purges outputs expired >24h. Agent IDs are resolved from ml/serving /health at startup with a fallback hardcoded list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:29:50 +00:00
alvis	37aec4fee1	chore: ADR-0007/0012 superseded status + admin users ID column ADR-0007 and ADR-0012 both superseded by ADR-0013 as of 2026-05-01. UsersTable gains a truncated ID column for quick user identification. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:20:44 +00:00
alvis	b3cf588f2f	feat(ml): multi-agent context framework + v4 orchestrator prompt Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum, time_of_day, recent_patterns, focus_area) each producing a prompt snippet from user signals. A registry wires them up; the orchestrator prompt in ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM. Also wires /api/agents route in the API and updates the Dockerfile to copy the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:20:05 +00:00
alvis	f8d66aa01f	chore: remove Airflow completely from the stack Drop all four Airflow containers (db, init, webserver, scheduler) from the mlops compose profile, leaving MLflow as the sole mlops service. Remove AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav links and DAG-run links in the admin UI, the two Airflow DAG files (bench_dag.py, sim_dag.py), and all related docs/ADR references. Simulations now run exclusively via the subprocess path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-03 16:38:46 +00:00
alvis	ce1c8bde57	fix(admin): simulations view-only + docs path in Docker (#109 #110) - simulate/page.tsx: remove launch form — simulations are triggered via Airflow DAG, not the admin UI. Page now shows run history + links to Airflow and MLflow only (#109) - docs.ts: use DOCS_ROOT env var (fallback: ../../docs for local dev) so the path works in Docker standalone where CWD is /app (#110) - Dockerfile.admin: copy docs/ into the runner image at /app/docs and set DOCS_ROOT=/app/docs so listAllDocs() finds the files at runtime (#110) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:55:50 +00:00
alvis	c1f5fcb561	fix(admin): ops page — add section description, remove redundant footer (#107) Adds a one-line purpose description under the Ops heading so it is clear what the section is for (shadow policy toggles, signal replay, per-user actions). Removes the duplicate "User-level actions" subsection whose content is now covered by the header description. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:53:35 +00:00
alvis	9bd60a9835	feat(web): action sheet cleanup + settings page (#100 #101 #102) - Remove "Helpful"/"Not helpful" from action sheet — reward is inferred from done/snooze/dismiss + dwell time; explicit sentiment buttons were redundant and cluttered the UI (#100) - Move "notify me" push subscription button to new /config page (#101) - Add settings gear icon (bottom-right, fixed) on tip page linking to /config (#102) - New /config page: push notification toggle + link to /connect integrations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:52:45 +00:00
alvis	4267e6ac68	feat(ml/serving): inject profile features + sort tasks in tip prompt (#79) - prompts.py: sort tasks overdue-first → priority desc → age desc before rendering into the LLM prompt (same ordering as ml/features/context.py) - prompts.py: render User profile summary line (completion_rate, dismiss_rate, preferred_hour) when profile_features are present - main.py: add profile_features field to PromptContext; plumb from GenerateRequest into the prompt builder via model_copy - logging_config.py: drop add_logger_name processor (incompatible with PrintLoggerFactory — caused test ordering failures) - test_generate.py: 6 new tests covering sort order, profile rendering, partial fields, empty profile, and end-to-end plumbing through /generate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:46:16 +00:00
alvis	0474ad4deb	feat(airflow): integrate bench harness into bench_collect DAG New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks: 1. collect.py — generates candidates, logs to MLflow 2. export_for_judge — exports pending runs for Claude Code scoring 3. compare — generates leaderboard by (model, prompt) cell Config via dag_run.conf supports all collect.py options (models, prompts, n_tips, n_scenarios, temperature, experiment name, max_model_b). New admin API endpoints (`services/api/src/routes/bench.ts`): - GET /api/bench/experiments — list tip-bench-* experiments - POST /api/bench/run — trigger DAG with custom config - GET /api/bench/runs/:experiment — list runs in experiment - GET /api/bench/leaderboard/:experiment — leaderboard by (model, prompt) All endpoints require admin auth. Human judge (Claude Code) scores are applied manually post-export; future enhancement: add webhook to DAG. Admin UI can now trigger and monitor benchmarks from a dashboard panel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 11:54:30 +00:00
alvis	556019b060	feat(bench): MLflow-based tip-generation benchmark harness (#93, #95) Combines model evaluation (#93) and prompt A/B testing (#95) into one experiment. Evaluates all (model × prompt × scenario) cells on the same fixed contexts so quality differences are attributable. Architecture: - Phase A (collect.py): generates candidates per cell, logs to MLflow with judge_pending=true. Rejects models >4B, uses keep_alive=0 for RAM safety (no concurrent model weights in VRAM). - Phase B (judge_cli.py): exports pending runs as JSON for Claude Code to score per the rubric, then applies scores back to MLflow. - Phase C (compare.py): leaderboard by (model, prompt) cell. Rubric (tip-v1) defines 1–5 scales for relevance, actionability, tone, plus format_ok and overlong flags. Composite = rel + act + tone + 2×format_ok − overlong. Rubric is self-describing and persisted in every run so judges use consistent criteria across sessions. Artifacts (prompts, candidates, raw responses) stored as MLflow tags because the server uses a file:// backend not accessible via REST. Full artifacts accessible in MLflow UI → run → Tags section. Tested end-to-end on local machine: - 4 models (qwen2.5:0.5b/1.5b, gemma3:1b, llama3.2:3b) ≤4B - 3 prompts (v1, v2-mentor, v3-few-shot) - 4 scenarios (4 personas × 2 time-slots) - 48 cells total, all judged and ranked Winner: qwen2.5:1.5b × v3-few-shot (composite=12.75). Ready for integration into Airflow prompt_ab_eval DAG and admin UI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 11:48:59 +00:00

95 Commits