alvis/oO - oO - AgapGit


Author	SHA1	Message	Date
alvis	b3cf588f2f	feat(ml): multi-agent context framework + v4 orchestrator prompt Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum, time_of_day, recent_patterns, focus_area) each producing a prompt snippet from user signals. A registry wires them up; the orchestrator prompt in ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM. Also wires /api/agents route in the API and updates the Dockerfile to copy the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 10:20:05 +00:00
alvis	f8d66aa01f	chore: remove Airflow completely from the stack Drop all four Airflow containers (db, init, webserver, scheduler) from the mlops compose profile, leaving MLflow as the sole mlops service. Remove AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav links and DAG-run links in the admin UI, the two Airflow DAG files (bench_dag.py, sim_dag.py), and all related docs/ADR references. Simulations now run exclusively via the subprocess path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-03 16:38:46 +00:00
alvis	ce1c8bde57	fix(admin): simulations view-only + docs path in Docker (#109 #110) - simulate/page.tsx: remove launch form — simulations are triggered via Airflow DAG, not the admin UI. Page now shows run history + links to Airflow and MLflow only (#109) - docs.ts: use DOCS_ROOT env var (fallback: ../../docs for local dev) so the path works in Docker standalone where CWD is /app (#110) - Dockerfile.admin: copy docs/ into the runner image at /app/docs and set DOCS_ROOT=/app/docs so listAllDocs() finds the files at runtime (#110) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:55:50 +00:00
alvis	c1f5fcb561	fix(admin): ops page — add section description, remove redundant footer (#107) Adds a one-line purpose description under the Ops heading so it is clear what the section is for (shadow policy toggles, signal replay, per-user actions). Removes the duplicate "User-level actions" subsection whose content is now covered by the header description. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:53:35 +00:00
alvis	9bd60a9835	feat(web): action sheet cleanup + settings page (#100 #101 #102) - Remove "Helpful"/"Not helpful" from action sheet — reward is inferred from done/snooze/dismiss + dwell time; explicit sentiment buttons were redundant and cluttered the UI (#100) - Move "notify me" push subscription button to new /config page (#101) - Add settings gear icon (bottom-right, fixed) on tip page linking to /config (#102) - New /config page: push notification toggle + link to /connect integrations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:52:45 +00:00
alvis	4267e6ac68	feat(ml/serving): inject profile features + sort tasks in tip prompt (#79) - prompts.py: sort tasks overdue-first → priority desc → age desc before rendering into the LLM prompt (same ordering as ml/features/context.py) - prompts.py: render User profile summary line (completion_rate, dismiss_rate, preferred_hour) when profile_features are present - main.py: add profile_features field to PromptContext; plumb from GenerateRequest into the prompt builder via model_copy - logging_config.py: drop add_logger_name processor (incompatible with PrintLoggerFactory — caused test ordering failures) - test_generate.py: 6 new tests covering sort order, profile rendering, partial fields, empty profile, and end-to-end plumbing through /generate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-27 13:46:16 +00:00
alvis	0474ad4deb	feat(airflow): integrate bench harness into bench_collect DAG New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks: 1. collect.py — generates candidates, logs to MLflow 2. export_for_judge — exports pending runs for Claude Code scoring 3. compare — generates leaderboard by (model, prompt) cell Config via dag_run.conf supports all collect.py options (models, prompts, n_tips, n_scenarios, temperature, experiment name, max_model_b). New admin API endpoints (`services/api/src/routes/bench.ts`): - GET /api/bench/experiments — list tip-bench-* experiments - POST /api/bench/run — trigger DAG with custom config - GET /api/bench/runs/:experiment — list runs in experiment - GET /api/bench/leaderboard/:experiment — leaderboard by (model, prompt) All endpoints require admin auth. Human judge (Claude Code) scores are applied manually post-export; future enhancement: add webhook to DAG. Admin UI can now trigger and monitor benchmarks from a dashboard panel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 11:54:30 +00:00
alvis	556019b060	feat(bench): MLflow-based tip-generation benchmark harness (#93, #95) Combines model evaluation (#93) and prompt A/B testing (#95) into one experiment. Evaluates all (model × prompt × scenario) cells on the same fixed contexts so quality differences are attributable. Architecture: - Phase A (collect.py): generates candidates per cell, logs to MLflow with judge_pending=true. Rejects models >4B, uses keep_alive=0 for RAM safety (no concurrent model weights in VRAM). - Phase B (judge_cli.py): exports pending runs as JSON for Claude Code to score per the rubric, then applies scores back to MLflow. - Phase C (compare.py): leaderboard by (model, prompt) cell. Rubric (tip-v1) defines 1–5 scales for relevance, actionability, tone, plus format_ok and overlong flags. Composite = rel + act + tone + 2×format_ok − overlong. Rubric is self-describing and persisted in every run so judges use consistent criteria across sessions. Artifacts (prompts, candidates, raw responses) stored as MLflow tags because the server uses a file:// backend not accessible via REST. Full artifacts accessible in MLflow UI → run → Tags section. Tested end-to-end on local machine: - 4 models (qwen2.5:0.5b/1.5b, gemma3:1b, llama3.2:3b) ≤4B - 3 prompts (v1, v2-mentor, v3-few-shot) - 4 scenarios (4 personas × 2 time-slots) - 48 cells total, all judged and ranked Winner: qwen2.5:1.5b × v3-few-shot (composite=12.75). Ready for integration into Airflow prompt_ab_eval DAG and admin UI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 11:48:59 +00:00
alvis	e40dfdcbb0	chore(infra): wire MLflow/Airflow env vars, fix healthcheck, add .dockerignore - docker-compose: pass ML_SERVING_URL, MLFLOW_URL, AIRFLOW_URL + creds to api service - docker-compose: pass NEXT_PUBLIC_MLFLOW_URL/AIRFLOW_URL to admin service - docker-compose: replace wget healthcheck with node fetch (wget not in node image) - docker-compose: enable Airflow basic_auth API backend; add MLflow pip dep for DAGs - Dockerfiles: tighten layer caching, add .dockerignore Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 12:08:43 +00:00
alvis	bad1bb2cba	feat(simulate): MLflow tracking, Airflow DAG integration, health checks for mlflow/airflow - sim_runs schema: add judge_mode, n_policies, airflow_dag_run_id, mlflow_run_id columns - admin health endpoint: add mlflow + airflow checks (Basic auth for Airflow API) - admin nav: add Simulations page link; rename section label - runner.py: optional MLflow experiment tracking; multi-policy support - sim_dag.py: Airflow DAG for offline sim pipeline - admin simulate page + API client methods for sim runs - shared-types tsconfig: exclude test files from build Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 12:08:36 +00:00
alvis	e96ceb7ee1	feat(auth): token-based admin authentication for Playwright/CI (#105) Add POST /api/auth/token — validates ADMIN_TOKEN env var, creates a 24h session and sets the sid cookie so automated tools can access the admin panel without Google OAuth. Admin login page gains a token input form. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 12:07:43 +00:00
alvis	b554970032	docs(observability): add services/api README; update ml/serving + recommender docs (#18) - services/api/README.md: new — contract, middleware stack, background tasks, config table (LOG_LEVEL, SENTRY_DSN), health story, extraction criteria - ml/serving/README.md: add Observability section (structlog JSON, traceparent → trace_id binding), add SENTRY_DSN + ENV to config table - services/recommender/README.md: fix policy table — egreedy-v2 is active (#99), egreedy-v1 is shadow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 03:41:39 +00:00
alvis	c4960d0601	feat(observability): structured logs, W3C trace IDs, Sentry hooks (#18) - TS: pino + pino-http; every HTTP request log includes traceId from W3C traceparent header (generated if absent); forwarded to ml/serving on all /score, /generate, /reward, and /api/ml proxy calls - Python: structlog JSON; FastAPI middleware binds trace_id via contextvars so every log line within a request carries it - Sentry: optional SENTRY_DSN init in both runtimes (no-op if unset) - Replace all console.* calls across services/api with pino logger - Update tests to spy on logger instead of console Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 03:37:28 +00:00
alvis	7281af83a4	feat(bandit): promote egreedy-v2 (D=12, profile features) as active policy (#99) Offline sim gate passed — egreedy-v2 mean reward −0.629 vs egreedy-v1 −0.642 (5 users × 20 rounds, rule judge, seed 42). v2 wins 3/5 personas. - recommender.ts: switch remotePolicy() to /score/egreedy/v2 - recommender.ts: switch sendRewardWithRetry() to /reward/egreedy/v2 with profile_features payload so the ridge update uses the full D=12 vector - recommender.ts: re-fetch profile at feedback time (TTL-cached, near-instant) - ADR-0012: status Accepted → Promoted, promotion record appended Shadow entry egreedy-v2-shadow kept in registry (active: false) for rollback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 03:08:28 +00:00
alvis	cba3f1a184	docs(services): update integrations + recommender READMEs for signal abstraction (#78) integrations/README — replace stale Connector interface and fictional libsodium vault with the actual SignalSource pattern, SQLite token table, and real OAuth routes. recommender/README — document the SignalAggregator pipeline, current policy registry, and actual /recommend + /feedback contract shapes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 17:17:38 +00:00
alvis	352469162d	fix(signals): add missing source field to TaskSyncedEvent (#78) TaskSyncedPayload in shared-types and ml/serving schemas both require source, but TaskSyncedEvent in bus.ts and the todoist publish call both omitted it — causing the JetStream consumer to nak every task.synced message on validation failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 17:15:32 +00:00
alvis	45416000f9	feat(features): per-feature freshness spec — JIT vs batched (#61) Each ml/features/*.py now declares freshness, source, and fallback per feature. ProfileFeature gains ttl_sec (mirrored from registry.ts), freshness="batched", source, and fallback. context.py adds ContextFeatureSpec + CONTEXT_FEATURES for the three JIT features (hour_of_day, day_of_week, tasks). CI test parses ttlSec from registry.ts to catch drift. ml/README updated with split JIT/batched feature contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 17:02:55 +00:00
alvis	bd3ea1b8b1	docs(schema): update docs for #54 — proto registry + buf CI gate - packages/shared-types/README.md: new — documents HTTP vs event surfaces, proto file layout, schema evolution rules, and how to run buf locally - ml/serving/README.md: note pydantic payload validation in consumer section - CLAUDE.md: replace "schema registry enforced when #54 lands" with the actual state; remove #54 from active-work list Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:53:20 +00:00
alvis	377373a95d	test(schema): unit tests for schemas.py and nats_consumer._handle (#54) 17 tests covering: pydantic model validation (all payload types, optional fields, invalid enum values, missing required fields), _handle write path for task_synced, validation errors surfaced through _make_handler causing nak instead of ack. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:51:15 +00:00
alvis	d539fde0c1	feat(schema): protobuf event registry + buf CI gate (#54) - Add proto schemas in packages/shared-types/events/ (oo.events.v1): envelope.proto, signals.proto, integration.proto - buf.yaml with STANDARD lint + FILE breaking-change rules - .gitea/workflows/buf-check.yaml: lint + breaking check on every PR touching events/ (needs a Gitea Actions runner to execute) - scripts/buf-check.sh: local equivalent of the CI check - NormalizedEvent TS envelope gains eventId, schemaVersion, producer to align with the proto Envelope message - ml/serving/schemas.py: pydantic models mirroring the v1 proto types - nats_consumer.py: validate payloads via pydantic instead of raw .get() A field-rename PR will now fail buf breaking with exit code 100 and show the offending messages. To make a breaking change: keep the old field reserved, add the new one, bump schema_version to v2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:48:24 +00:00
alvis	f48b5a7646	docs(ml): serving README + update ml/README and CLAUDE.md for #98 - ml/serving/README.md: new — contract, JetStream consumer docs, config, health story, extraction criteria, state file reference - ml/README.md: note JetStream consumers in serving/ row - CLAUDE.md: update active work to reflect #98 shipped, #99 still pending Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:21:40 +00:00
alvis	4652e4b582	feat(ml): JetStream durable consumers in ml/serving (#98) Adds a NATS JetStream consumer to ml/serving so the feature pipeline can react to events without the API triggering every read. - nats_consumer.py: durable push consumers for signals.> and feedback.> streams; acks on success, naks for redeliver, up to NATS_MAX_DELIVER attempts; per-consumer health state (last_msg_ts, processed, errors) - main.py: FastAPI lifespan wires start/stop; /health exposes nats state - requirements.txt: adds nats-py>=2.9.0 - Dockerfile.ml: copy all *.py from ml/serving (was missing prompts.py) Handled subjects: signals.task.synced → writes per-user sync metadata to STATE_DIR signals.tip.feedback → logged for observability (reward via HTTP path) Config: NATS_URL (empty = disabled), NATS_DURABLE_PREFIX, NATS_MAX_DELIVER Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:19:47 +00:00
alvis	2d7cf217a9	feat(ml): egreedy-v2 shadow policy — D=12 with profile features (#99) Ship the scaffolding for #99 (phase B.3 of #81): - ml/serving: add /score/egreedy/v2, /reward/egreedy/v2, /stats/egreedy/v2 endpoints (D=12). New feature dims: completion/dismiss rates, mean dwell (clipped 10min), preferred-hour alignment (cosine, 1-dim), tip volume (log). Separate state file per user (_egreedy_v2.json). /reset clears v2 state too. - ADR-0012: documents D=7→12 dimension change, normalization choices, shadow rollout protocol, and promotion gate (offline sim win per ADR-0002). - recommender.ts: register egreedy-v2-shadow in shadow-policy map (disabled by default). When enabled, calls /score/egreedy/v2 fire-and-forget and publishes shadow:egreedy-v2-shadow serve signal. No reward to shadow — sim is the gate. - sim runner/personas: personas carry synthetic profile_features per persona; _call_score/_call_reward thread profile_features through (None-safe for v1/linucb). - 18 new Python tests; all 56 Python + 170 TS tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:00:38 +00:00
alvis	b8113d4bda	docs(adr-0011): point B.3 at new issue #99 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:41:20 +00:00
alvis	ee4eb15022	feat(profile): event-driven invalidation (#81 phase B.2) Features now declare invalidatedBy subjects in the registry; the new profile/subscriber.ts subscribes to each unique subject and drops matching stored rows for the userId in the payload. Next getProfile call recomputes from current data instead of waiting up to ttlSec. Wiring: completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour ← signals.tip.feedback tip_volume_30d ← signals.tip.served TTL stays as a safety net for clock drift and dropped events. Registration validates each declared subject against KNOWN_SUBJECTS (mirror of EventMap) so typos throw at startup, not silently. ADR-0011 updated. Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:38:45 +00:00
alvis	4a42a6aabf	feat(admin): profile freshness panel in data-quality (#81 phase B.4) Adds a per-feature freshness summary to /admin/data-quality so the admin can spot features that are systematically stale or never computed: totalEligible — distinct users with tip_views in the last 30 days missing — eligible users with no row stored for the feature stale — eligible users whose stored row is past its TTL Backend exposes summarizeProfileFreshness() in profile/builder.ts; one query per feature joins eligible users LEFT JOIN profile rows. Coverage = (eligible − missing − stale) / eligible, colored green/yellow/red via the new PctGood helper (high-is-good, opposite of the existing Pct used for missing-feature/stale-token rates). Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:34:46 +00:00
alvis	9e96540bcc	feat(admin): per-user profile view + rebuild action (#81 phase B.1) Surfaces phase A's profile features in /admin/users/:id so we can verify they're actually computing useful values before investing in bandit consumption. The detail GET now includes profile rows joined with registry metadata (name, value, age, fresh badge, ttlSec, description). Read does NOT trigger compute — staleness must be visible. A new POST .../profile/rebuild button force-recomputes and is audit-logged like reset-bandit. Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:27:08 +00:00
alvis	7d4c29e137	feat(profile): user-profile feature registry + builder (phase A) Centralizes user-level features (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that owns both definition and SQL aggregation, since the data lives in the TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps recommend latency bounded; values persist in user_profile_features (KV). ml/serving accepts profile_features on /score + /generate but does not yet consume them — extending the bandit feature vector changes D and resets every user's learned state, so that's a deliberate phase-B step. Includes ml/features/profile_schema.py as a contract mirror with a sync test that diffs name sets against registry.ts. ADR-0011 records the data-locality reasoning (registry in TS, not Python as the issue originally suggested). Phase B (deferred): event-driven incremental updates, bandit consumption with state migration, admin per-user profile page, staleness alerts. Refs #81. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:22:22 +00:00
alvis	430804e9a5	feat(ml): prompt registry + per-request variant selection Replaces the hardcoded "v1" label with a real prompt registry: ml/serving/prompts.py — keyed by version: v1 (baseline), v2-mentor (calm/specific persona), v3-few-shot (v1 persona + curated examples) ml/serving/main.py — POST /generate accepts optional prompt_version, 422 on unknown, echoes the version actually used back in the response services/api/src/config.ts — TIP_PROMPT_VERSION: empty / single / comma-list (uniform random per request) services/api/src/routes/recommender.ts — pickPromptVersion() drives selection; the response's prompt_version (not a stale TS constant) is what lands in tip_scores so the #92 reward-analytics dashboard shows real per-variant reaction rates Closes #84. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 15:44:04 +00:00
alvis	aa4bdd8f09	feat(admin): LLM tip quality dashboard — per-model/prompt/kind breakdowns /admin/reward-analytics now surfaces served count, reaction rate, and avg reward grouped by llm_model, prompt_version, and tip_kind — closing the loop so model/prompt iterations in M2 are legible next to the bandit policy view. Data comes from the tip_scores columns added in `ffdf707` and tip_feedback.reward_milli; bandit-only tips show as "(bandit-only)". Closes #92. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 15:24:52 +00:00
alvis	75d0e89906	fix(infra): ml-serving LITELLM_URL default → host.docker.internal:4000 Inside the container, llm.alogins.net times out (public-DNS route, not the loopback path Caddy listens on). host.docker.internal:4000 reaches the Agap LiteLLM directly and is equivalent for dev. Prod deploys override via env. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-20 12:20:41 +00:00
alvis	d4205a00cf	refactor(infra): drop ai profile; ollama + litellm move to Agap Ollama and LiteLLM are shared Agap services (agap_git/openai/docker-compose.yml); oO never starts them. Removes the ai profile, the litellm config, and the --profile ai runbook; points ml-serving at https://llm.alogins.net by default and adds host.docker.internal host-gateway so the container can hit Agap ollama on the host. Also updates the tip-generator model alias to qwen2.5:1.5b to match the model actually pulled on Agap ollama (7b is ~4.7 GB and would blow VRAM budget). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-20 12:16:21 +00:00
alvis	d7a2423940	fix(infra): mlflow image tag + python-based healthchecks for ml-serving/mlflow - Corrects mlflow image tag (2.14.3 → v2.14.3); the former tag does not exist on ghcr.io/mlflow/mlflow and caused a manifest-unknown error on pull. - Replaces wget/curl healthchecks with inline python urllib calls — the python:3.12-slim (ml-serving) and ghcr.io/mlflow/mlflow images ship neither wget nor curl, so both containers reported unhealthy despite /health returning 200. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 15:04:18 +00:00
alvis	bb879c5f0f	refactor(admin): drop simulations/experiments/models pages; group nav into sections Removes the in-shell MLOps pages (experiments, models, simulations) and their client API helpers in favour of external MLflow/Airflow links. Nav is regrouped into Signals / Recommender status / Ops sections for clarity. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 14:41:17 +00:00
alvis	5b52c6bf40	test: cover NATS bridge + Todoist scheduler; ADR-0010 - bus.test.ts: 4 cases for the new onPublish hook contract - nats.test.ts: stream creation idempotency + JSON publish bridge - scheduler.test.ts: startup delay, fan-out, per-user failure isolation - ADR-0010 documents the bridge-don't-replace decision and the Todoist scheduler isolation, plus open follow-ups (#98 ml/serving consumer, #54 protobuf migration, graceful shutdown, metrics) - README/overview/services README reflect the bridged event substrate - CLAUDE.md gains a "don't nats.publish() directly" rule - .env.example documents NATS_URL + TODOIST_SYNC_INTERVAL_MS Verified in deployment 2026-04-18: api -> nats bridge connects on boot, signals + feedback streams created, scheduler tick logs "todoist sync: 1 ok, 0 failed (1 users)" within 10s. Closes #21, #22. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 07:55:25 +00:00
alvis	2a7380933c	feat: NATS JetStream + Todoist background sync (#21, #22) Issue 21 — event infrastructure: - NormalizedEvent<T> + payload types in packages/shared-types/src/events/ - Bus.onPublish() hook for side-effect bridges - NATS JetStream adapter (services/api/src/events/nats.ts): connects when NATS_URL is set, creates signals.> and feedback.> streams, bridges all in-process bus publishes to JetStream — no-ops gracefully when NATS is absent - NATS service added to docker-compose (profile: events|full, port 4222/8222) Issue 22 — Todoist background sync: - services/api/src/signals/scheduler.ts: queries all active-token users every 15 min (TODOIST_SYNC_INTERVAL_MS), fan-out via todoistSource.fetchSignals() which emits signals.task.synced; on-demand fetch remains as freshness fallback - NATS_URL + TODOIST_SYNC_INTERVAL_MS added to config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 01:18:51 +00:00
alvis	e3ca3ba733	feat: SignalSource abstraction — generalize signal ingestion beyond Todoist (#78) - Add Signal + SignalSource interfaces to packages/shared-types - TipCandidate.features widened to Record<string,number|boolean> to match Signal - TodoistSignalSource: encapsulates fetch, cache, 401 handling, bus events, and act() - SignalAggregator: parallel fan-out across sources with per-source failure isolation - Recommender refactored to consume Signal[] via aggregator; source action dispatch via aggregator.act() - ADR-0009: signal normalization strategy Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 01:11:56 +00:00
alvis	46dee7377e	fix: api healthcheck + port mapping corrected to 3078 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:17:52 +00:00
alvis	4c8ef9ad86	fix: consentGiven boolean in test fixture (was number, broke docker build) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:14:07 +00:00
alvis	ffdf70733f	feat: M2 AI tips — LiteLLM gateway, context assembler, end-to-end generation pipeline Issues closed: #86, #87, #88, #89, #90, #91, #79, #80, #82 infra: - docker-compose `ai` profile: Ollama + LiteLLM services - infra/litellm/litellm_config.yaml: tip-generator / embedder / judge aliases - .env.example: LITELLM_URL, LITELLM_MASTER_KEY, OLLAMA_URL ml/serving: - POST /generate: calls LiteLLM tip-generator alias, returns TipCandidate[] - JSON retry loop (2 retries with correction prompt on malformed response) - _parse_llm_json strips markdown fences ml/features: - context.py: build_context() assembles user signals → PromptContext (sorts overdue/high-priority tasks first for LLM prompt quality) shared-types: - TipKind, TipSource, TipCandidate types - Tip gains kind + rationale fields services/api: - recommender: 3-stage pipeline (assemble → score → serve) Stage 1: Todoist tasks + LLM candidates fetched in parallel Stage 2: egreedy bandit scores merged candidate pool Stage 3: serve + log with prompt_version, llm_model, tip_kind - tip_scores: prompt_version, llm_model, tip_kind columns + migrations - config: LITELLM_URL added - integrations: surface token_status in /integrations response tests: - ml/serving/tests/test_generate.py: 13 tests (retry, 502/503, fence variants) - ml/features/test_context.py: 9 tests (sorting, edge cases) - services/api recommender.unit.test.ts: 16 pure-function tests (inferReward, dueAgeDays) - services/api recommender.test.ts: 4 integration tests (tip_scores columns, LLM fallback) - shared-types: TipCandidate, rationale, full TipFeedback action set docs: - ADR-0008: LiteLLM AI gateway decision - overview.md: M2 pipeline description updated - ml/README.md: serving + features roles updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:09:02 +00:00
alvis	85367aeaa0	feat: MLOps external services, AI stack planning, admin MLOps hub Infrastructure: - Add `mlops` compose profile: MLflow (basic-auth, /mlflow path) + Airflow (LocalExecutor, /airflow path) + airflow-db - infra/mlflow/basic_auth.ini for MLflow auth config - Caddy routes /mlflow* and /airflow* inside existing o.alogins.net block (see agap_git) - Dockerfile.admin: NEXT_PUBLIC_MLFLOW_URL / NEXT_PUBLIC_AIRFLOW_URL build args (default /mlflow, /airflow) Admin panel: - /admin/models: replace MLflow iframe with external link cards - /admin/experiments: replace LinUCB stats with MLOps hub (links to MLflow experiments/models + Airflow DAGs/datasets) - AdminShell: external nav links for MLflow ↗ and Airflow ↗ under MLOps section Docs & planning: - README: new AI stack section (Ollama/LiteLLM/OpenWebUI three-tier, tip generation pipeline, model aliases) - README: Phase 2 expanded with AI infra issues (#86-#93) and granular pipeline breakdown - README: Phase 4 expanded with LLM MLOps items (#94-#97) - CLAUDE.md: AI stack section, updated current phase (M1 shipped / M2 in progress), compose profiles, updated What NOT to do - docs/architecture/overview.md: AI stack section, updated decision flow diagram for Phase 2 LLM pipeline - ADR-0006: updated to reflect external services (path-based, not embedded) - Gitea issues #86-#97 created (M2: AI infra + pipeline; M4: LLM MLOps) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 08:20:44 +00:00
alvis	faf44c18fc	feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework - Promote egreedy-v1 to active serving policy (ADR-0007): /score/egreedy + /reward/egreedy replaces linucb-v1 endpoints after offline sim shows +10.7% mean reward (−0.548 vs −0.606) - Replace explicit helpful/not_helpful feedback with dwell-time inferred reward (inferReward): dismiss=−1.0, snooze=+0.1, done<15s=−0.3, done 15s–2min=+1.0, done 2–10min=+0.6, done>10min=+0.3 - Add ml/serving ε-greedy endpoints: /score/egreedy, /reward/egreedy, /stats/egreedy/{user_id} with d=7 feature vector (base 5 + sin/cos day-of-week encoding) - Add offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges, two-phase score+reward, synthetic personas, task generator; results stored in sim_runs/sim_events - Add /admin/simulations page: start runs, live-poll status, reward curve SVG, action/persona tables - Fix egreedy day_of_week training skew: reward endpoint now uses actual dow instead of hardcoded 0 - Fix runner.py proxy bypass: httpx.Client(trust_env=False) for localhost ML calls - Add dwellMs to TipFeedbackEvent contract and bus.test.ts fixture - Schema: sim_runs, sim_events tables; tip_feedback gains dwell_ms, reward_milli columns - ADR-0006: admin console framework; ADR-0007: egreedy-v1 policy selection rationale Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 07:44:37 +00:00
alvis	c5ea18ec6e	docs: mark M1 fully shipped in roadmap Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 03:57:29 +00:00
alvis	e62c726ea4	feat: M1 admin console — all 10 remaining pages + signal/quality/ops infrastructure Admin console (issues #63–72): - Event stream viewer: live-tail ring buffer (500 events) with subject/user filters - Feature store browser: per-user feature vector history from ml/serving - Model registry panel: MLflow embed at /admin/models - Experiment dashboard: LinUCB per-user stats (pulls, reward, θ) + bandit reset - Recommendation log: per-tip explainability (policy, score, features, latency) - Reward analytics: daily reaction breakdown + per-policy compare - Data quality widget: missing-feature rate, stale-token rate, daily completeness - Ops actions: replay-signal, policy enable/disable; user actions link to Users page - SQL runner: read-only SELECT runner with saved queries - Health rollup: fan-out to api/ml/sqlite/event-bus with auto-refresh Backend: - tip_scores table: logs features+policy+score+latency at every scoring call (#67) - saved_queries table: per-admin saved SQL (#71) - Event bus: 500-event ring buffer + tail() API (#63) - Admin routes: /events, /tips, /reward-analytics, /data-quality, /health, /policies, /replay-signal, /sql, /saved-queries endpoints - /api/ml/* admin-gated proxy to ml/serving (#64, #66) - Shadow-policy registry in recommender (#56) ML serving: - /reset/{user_id}: clear bandit state + feature history (#66) - /stats/{user_id}: pulls, cumulative reward, estimated mean, θ (#66) - /features/{user_id}: last 100 feature vectors logged at scoring time (#64) - Meta (pulls, rewards) persisted alongside A/b matrices Web: - Tip action sheet adds Helpful / Not helpful buttons (#62) - TipFeedback type extended with helpful/not_helpful actions - Rewards mapped: helpful=+0.5, not_helpful=−0.5 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 03:56:48 +00:00
alvis	2402a140e9	docs: mark M1 shipped in roadmap Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 14:08:20 +00:00
alvis	c7edd92e15	feat: M1 — LinUCB bandit, RemotePolicy, Web Push, event bus ML serving: - LinUCB contextual bandit (disjoint, d=5 features: hour_sin/cos, is_overdue, task_age, priority) - /score endpoint replaces stub random; /reward endpoint for online learning - Per-user model state persisted to disk as JSON (survives restarts) - venv at ml/serving/.venv; start with pnpm dev from ml/serving Recommender: - Todoist fetch now extracts features (is_overdue, task_age_days, priority) - RemotePolicy calls ml/serving with 3s timeout; falls back to RandomPolicy - Reward sent to /reward on feedback (done=+1, snooze=0, dismiss=-1) Web Push: - VAPID keys in config; push_subscriptions table in DB - POST/DELETE /api/push/subscribe; GET /api/push/vapid-public-key - Service worker (public/sw.js): push → showNotification, notificationclick → focus/open - "notify me" button on tip page; registers SW + subscribes on permission grant Event bus: - services/api/src/events/bus.ts: typed EventEmitter wrapper - Subjects: signals.tip.served, signals.tip.feedback, signals.task.synced - Same publish/subscribe API NATS JetStream will implement — swap is mechanical Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 14:08:00 +00:00
alvis	08dfa1d8c9	chore: gitignore playwright artifacts; mark M0 fully complete in README Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 09:09:26 +00:00
alvis	f6c890213b	feat: complete M0 — legal pages, consent, tip_views metrics, account deletion UI - /legal/terms and /legal/privacy pages (linked from sign-in) - Consent (consentGiven=true) recorded on first Google sign-in - tip_views table: one row per tip served — enables activation + reaction rate queries - tip_views purged on account deletion - Delete account button on /connect (confirm → revoke tokens → purge data → sign out) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 09:09:08 +00:00
alvis	888f8b9a99	docs: mark Phase 0 shipped in roadmap, note remaining M0 items Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 08:53:56 +00:00
alvis	3123cb73fb	feat: Phase 0 walking skeleton — auth, Todoist integration, tip page - Google OAuth2/PKCE flow via openid-client v6; session cookie (30-day) - Next.js middleware auth guard — redirects before any client render - Todoist OAuth2 connect/disconnect; REST v1 task fetch (today|overdue) - RandomPolicy recommender behind stable POST /recommend contract - Feedback endpoint (done/dismiss/snooze); marks task complete in Todoist - 30s in-memory task cache per user (~1ms recommend on cache hit) - Tip page: pure opacity fade-in (3.5s), fast fade-out (0.3s), no motion - "reading you…" loading text with breathe animation - PWA icons + manifest - Ports pinned: API=3078, web=3079; Caddy at o.alogins.net Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 08:53:38 +00:00


53 Commits
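Commit 4267e6ac68 sorts tasks overdue-first, then by priority descending, then by age descending before rendering them into the tip prompt. A minimal Python sketch of that ordering as one composite sort key follows. The `Task` shape is hypothetical, and it assumes a larger `priority` number means a more urgent task; the real field names and scale in prompts.py and ml/features/context.py may differ.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task shape; field names are illustrative only.
    title: str
    is_overdue: bool
    priority: int      # assumption: higher number = more urgent
    age_days: float


def sort_tasks(tasks: list[Task]) -> list[Task]:
    """Order tasks overdue-first, then priority desc, then age desc."""
    # False sorts before True, so `not is_overdue` puts overdue tasks first;
    # negating priority and age turns Python's ascending sort into descending.
    return sorted(tasks, key=lambda t: (not t.is_overdue, -t.priority, -t.age_days))
```

Because `sorted` is stable, tasks that tie on all three keys keep their input order.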
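The tip-v1 rubric in commit 556019b060 defines composite = rel + act + tone + 2×format_ok − overlong, so a single candidate scores between 2 (all 1s, bad format, overlong) and 17 (all 5s, good format, not overlong). A Python sketch of that formula follows. Averaging per-candidate composites into one figure per (model, prompt) cell is an assumption about how a non-integer value such as the reported 12.75 arises; `RubricScore` and `cell_composite` are hypothetical names, not compare.py's API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RubricScore:
    relevance: int       # 1–5
    actionability: int   # 1–5
    tone: int            # 1–5
    format_ok: bool
    overlong: bool


def composite(s: RubricScore) -> int:
    """composite = rel + act + tone + 2*format_ok - overlong (tip-v1 rubric)."""
    for name in ("relevance", "actionability", "tone"):
        value = getattr(s, name)
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be in 1..5, got {value}")
    return s.relevance + s.actionability + s.tone + 2 * int(s.format_ok) - int(s.overlong)


def cell_composite(scores: list[RubricScore]) -> float:
    # Hypothetical aggregation: mean composite over one (model, prompt) cell.
    return mean(composite(s) for s in scores)
```

Weighting format_ok by 2 means a malformed reply costs as much as dropping two points on one of the 1–5 scales.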
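Commit 2d7cf217a9 lists the five dimensions egreedy-v2 adds to v1's seven: completion and dismiss rates, mean dwell clipped at 10 minutes, a one-dimensional cosine preferred-hour alignment, and a log tip volume. A Python sketch of those transforms follows. The exact normalisation is documented in ADR-0012 and is not reproduced here: scaling clipped dwell by 10 minutes, using cos(2π·Δh/24) for alignment, and using log1p for volume are all assumptions, and `profile_dims` is a hypothetical name.

```python
import math

MAX_DWELL_MS = 10 * 60 * 1000  # clip mean dwell at 10 minutes


def profile_dims(
    completion_rate: float,
    dismiss_rate: float,
    mean_dwell_ms: float,
    preferred_hour: float,
    tip_volume: int,
    hour_now: float,
) -> list[float]:
    """Five profile dimensions appended to the 7-dim v1 vector (7 + 5 = 12)."""
    # Clip dwell to [0, 10 min] and scale to [0, 1] (scaling is an assumption).
    dwell = min(max(mean_dwell_ms, 0.0), MAX_DWELL_MS) / MAX_DWELL_MS
    # Cosine of the angular gap on a 24-hour clock: 1.0 at the preferred hour,
    # -1.0 twelve hours away; one dimension instead of a sin/cos pair.
    alignment = math.cos(2 * math.pi * (hour_now - preferred_hour) / 24)
    # log1p keeps zero volume at 0 and compresses heavy users.
    volume = math.log1p(tip_volume)
    return [completion_rate, dismiss_rate, dwell, alignment, volume]
```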
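Commit ee4eb15022 declares `invalidatedBy` subjects per profile feature, validates them against KNOWN_SUBJECTS at registration, and drops the stored rows for the event's userId. The feature-to-subject mapping below is the one listed in that commit; everything else is a hypothetical Python sketch, since the real code is TypeScript in profile/subscriber.ts. KNOWN_SUBJECTS here holds only the three subjects named in this log (the real set mirrors EventMap), and the in-memory dict standing in for user_profile_features is an assumption.

```python
KNOWN_SUBJECTS = frozenset(
    {"signals.tip.served", "signals.tip.feedback", "signals.task.synced"}
)

# feature -> subjects that invalidate it (mapping from commit ee4eb15022)
INVALIDATED_BY = {
    "completion_rate_30d": ["signals.tip.feedback"],
    "dismiss_rate_30d": ["signals.tip.feedback"],
    "mean_dwell_ms_30d": ["signals.tip.feedback"],
    "preferred_hour": ["signals.tip.feedback"],
    "tip_volume_30d": ["signals.tip.served"],
}


def build_index(registry: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert feature->subjects into subject->features, failing fast on typos."""
    index: dict[str, list[str]] = {}
    for feature, subjects in registry.items():
        for subject in subjects:
            if subject not in KNOWN_SUBJECTS:
                # Raise at startup instead of silently never invalidating.
                raise ValueError(f"{feature}: unknown subject {subject!r}")
            index.setdefault(subject, []).append(feature)
    return index


def on_event(store: dict, index: dict[str, list[str]], subject: str, payload: dict) -> None:
    """Drop stored (userId, feature) rows so the next read recomputes them."""
    user_id = payload["userId"]
    for feature in index.get(subject, []):
        store.pop((user_id, feature), None)
```

Subscribing once per unique subject in the index, rather than once per feature, keeps the number of subscriptions equal to the number of distinct subjects.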
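The freshness panel in commit 4a42a6aabf computes coverage = (eligible − missing − stale) / eligible per feature; for example, 40 eligible users with 4 missing and 6 stale rows give (40 − 4 − 6) / 40 = 0.75. A small Python sketch of that arithmetic follows. Returning None when no users are eligible is an assumption, since the commit does not say how an empty denominator is displayed.

```python
from typing import Optional


def coverage(eligible: int, missing: int, stale: int) -> Optional[float]:
    """Share of eligible users whose stored row exists and is within TTL."""
    if eligible == 0:
        return None  # nothing to measure (assumption: shown as n/a)
    if missing < 0 or stale < 0 or missing + stale > eligible:
        raise ValueError("missing and stale must be non-negative and sum to <= eligible")
    return (eligible - missing - stale) / eligible
```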
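Commit 430804e9a5 lets TIP_PROMPT_VERSION be empty, a single version, or a comma-separated list from which one version is drawn uniformly at random per request. A Python sketch of that selection follows; the real pickPromptVersion() is TypeScript. Returning None for an empty value, so that the request omits prompt_version and ml/serving falls back to its default, is an assumption.

```python
import random
from typing import Optional


def parse_versions(raw: str) -> list[str]:
    """Split a comma-list, trimming whitespace and dropping empty entries."""
    return [v.strip() for v in raw.split(",") if v.strip()]


def pick_prompt_version(raw: str, rng: Optional[random.Random] = None) -> Optional[str]:
    versions = parse_versions(raw)
    if not versions:
        return None  # assumption: omit prompt_version, let ml/serving default
    # Uniform choice per request; a single entry always returns itself.
    return (rng or random).choice(versions)
```

Passing a seeded `random.Random` makes the draw reproducible in tests.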
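Commit e3ca3ba733 introduces a SignalAggregator that fans out to every SignalSource in parallel and isolates per-source failures. Below is a hypothetical asyncio sketch of that pattern in Python; the real implementation is TypeScript, and the `Signal` fields, the `fetch_signals` method name, and the (signals, failed) return shape are illustrative assumptions.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Signal:
    source: str
    id: str
    features: dict[str, float | bool] = field(default_factory=dict)


class SignalSource(Protocol):
    name: str

    async def fetch_signals(self, user_id: str) -> list[Signal]: ...


async def aggregate(sources: list[SignalSource], user_id: str) -> tuple[list[Signal], list[str]]:
    """Fetch from all sources concurrently; one failing source does not sink the rest."""
    # return_exceptions=True makes gather return errors as values instead of raising.
    results = await asyncio.gather(
        *(src.fetch_signals(user_id) for src in sources), return_exceptions=True
    )
    signals: list[Signal] = []
    failed: list[str] = []
    for src, result in zip(sources, results):
        if isinstance(result, BaseException):
            failed.append(src.name)  # e.g. an expired token or a network error
            continue
        signals.extend(result)
    return signals, failed
```

Returning the failed source names alongside the signals lets the caller still serve a tip from the healthy sources while logging which ones dropped out.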
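Commit ffdf70733f describes `_parse_llm_json` stripping markdown fences and a retry loop that re-prompts up to two times with a correction prompt when the reply is not valid JSON. The Python sketch below shows one way to do both; it is not the ml/serving code, `call_llm` stands in for the LiteLLM request, and the correction-prompt wording is invented for illustration.

```python
import json
import re
from typing import Any, Callable, Optional

FENCE = "`" * 3  # a Markdown code fence (three backticks)
# Optional language tag (e.g. json) after the opening fence.
_FENCED = re.compile(r"^\s*" + FENCE + r"[A-Za-z]*\s*(.*?)\s*" + FENCE + r"\s*$", re.DOTALL)


def parse_llm_json(text: str) -> Any:
    """Strip a surrounding Markdown fence, if any, then parse JSON."""
    m = _FENCED.match(text)
    return json.loads(m.group(1) if m else text)


def generate_with_retry(call_llm: Callable[[str], str], prompt: str, max_retries: int = 2) -> Any:
    """One initial attempt plus up to `max_retries` correction attempts."""
    current = prompt
    last_error: Optional[json.JSONDecodeError] = None
    for _ in range(max_retries + 1):
        reply = call_llm(current)
        try:
            return parse_llm_json(reply)
        except json.JSONDecodeError as err:
            last_error = err
            # Hypothetical correction prompt: restate the task plus the parse error.
            current = f"{prompt}\n\nYour previous reply was not valid JSON ({err.msg}). Reply with JSON only."
    raise ValueError(f"no valid JSON after {max_retries + 1} attempts") from last_error
```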
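Commit faf44c18fc replaces explicit feedback with a dwell-time inferred reward: dismiss=−1.0, snooze=+0.1, and done mapped to −0.3, +1.0, +0.6 or +0.3 depending on dwell. A Python sketch of that mapping follows; the real inferReward is TypeScript. The commit does not say which bucket the exact 2-minute and 10-minute boundaries fall into, so the intervals below are assumptions, and `to_reward_milli` is a hypothetical helper matching the integer reward_milli column.

```python
def infer_reward(action: str, dwell_ms: float) -> float:
    """Map a feedback action plus dwell time to a scalar reward."""
    if action == "dismiss":
        return -1.0
    if action == "snooze":
        return 0.1
    if action != "done":
        raise ValueError(f"unknown action: {action!r}")
    seconds = dwell_ms / 1000
    if seconds < 15:        # done < 15 s
        return -0.3
    if seconds < 120:       # 15 s <= done < 2 min (boundary placement assumed)
        return 1.0
    if seconds <= 600:      # 2 min <= done <= 10 min (boundary placement assumed)
        return 0.6
    return 0.3              # done > 10 min


def to_reward_milli(reward: float) -> int:
    # Store as integer thousandths, e.g. 0.6 -> 600 (hypothetical helper).
    return round(reward * 1000)
```

The mapping peaks in the 15 s to 2 min band and decays for longer dwell, so a very fast or very slow "done" earns less than a prompt one.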