Commit Graph

38 Commits

Author SHA1 Message Date
ad6747c242 feat(profile): /api/profile + eligibility filter + inference framework (ADR-0014 steps 4-6)
Step 4 — /api/profile read-through API:
  GET  /api/profile          → { user, prefs, consents, contexts }
  PATCH /api/profile/prefs/:scope  upsert user_preferences (source='user')
  PATCH /api/profile/consents      grant / revoke consent keys
  PATCH /api/profile/contexts      create / activate / deactivate contexts
  Legacy consentGiven bit folded in as data:core fallback.

Step 5 — registry-driven eligibility filter:
  fetchRegistry() exported from agent-registry.ts.
  profile/eligibility.ts: getEligibleAgentIds(userId) — filters by required
  consents, silenced_in_contexts, and user_preferences[enabled=false].
  fetchOrchestratorTip filters agent_outputs to eligible set before calling
  ml/serving /recommend. Fail-closed: registry unavailable → empty set.

Step 6 — shared context-inference framework (#111) + time-of-day proof (#112):
  ml/agents/inference/: UserHistory, FeedbackEvent, run_inference().
  Framework: cold-start, min_history gating, error fallback, structured logs.
  TimeOfDayAgent v1.1.0: inferred_params=[preferred_hour]; also reads
  quiet_start/quiet_end from agent_prefs. agent_prefs injected by TS caller.
  AgentInput gains agent_prefs field.
  ml/serving: POST /agents/{agent_id}/infer endpoint.
  agent-outputs.ts computeAndStore: loads prefs before compute, calls /infer
  after, persists results (source='inferred'); user overrides never touched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:14:25 +00:00
305eeae38b feat(agents): manifest plumbing + GET /agents/registry (ADR-0014 step 3)
Each agent now exports a module-level MANIFEST declaring id, version,
pref_schema, required_consents, ttl_sec, and silenced_in_contexts. The
registry surfaces both the agent and its manifest, and rejects on
mismatch so the two cannot drift.

ml/serving exposes GET /agents/registry; services/api proxies it as
GET /api/agents/registry with a 60s in-process cache so admin pageviews
don't hammer upstream. Failures aren't cached.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:55:54 +00:00
5d43339616 feat(api): unified Profile schema + consent backfill (ADR-0014 step 1-2)
Adds user_preferences, user_consents, user_contexts and the tone /
tip_kinds_json columns on users. Backfills consent_given=1 rows into
user_consents as data:core; INSERT OR IGNORE keeps it idempotent and
respects later revocations.

Migration body moves to db/migrations.ts so tests can apply it to a
fresh in-memory handle without opening the prod DB on import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:28:47 +00:00
41302d9f36 fix: repair Docker build — TS errors and missing docs in image
- Remove unused `httpx` import from bench.ts (package does not exist)
- Add explicit `IRouter` type on `router` in agent-outputs.ts and bench.ts
  to resolve TS2742 portable-type errors
- Remove `docs` from .dockerignore so Dockerfile.admin can copy it into
  the runner image (DOCS_ROOT=/app/docs is read at runtime by the admin)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:52:27 +00:00
05f748159b chore: remove shadow policy machinery (ADR-0013 step 10)
Deletes shadowPolicies map, getShadowPolicies, setPolicyActive from
recommender.ts; removes /api/admin/policies routes from admin.ts; removes
getPolicies, togglePolicy, PolicyInfo from admin api.ts; removes the
policy toggle section from the ops page.

168 API tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:45:32 +00:00
c65bedcf68 feat(api): orchestrator cutover — replace bandit with multi-agent pipeline (ADR-0013 step 6)
POST /recommend now calls ml/serving /recommend with pre-computed agent
snippets + task context instead of /generate + /score/egreedy/v2. Falls
back to a random signal candidate when ml/serving is unavailable.

Removes: remotePolicy, fetchLlmCandidates, sendRewardWithRetry,
candidateCache, pickPromptVersion. Feedback handler keeps inferReward +
tipFeedback writes for observability; reward delivery to the bandit is gone.
tipScores.policy is now 'orchestrator'; promptVersion is 'v4-orchestrator'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:37:15 +00:00
7e958a779d feat(api): agent pre-compute scheduler (ADR-0013 step 5)
Extracts computeAndStore() from the /agents/:agentId/compute route so it
can be called without an HTTP round-trip. startAgentPrecomputeScheduler()
runs every 15 min: fetches active users (tip view in 48h), runs all agents
in parallel per user, then purges outputs expired >24h. Agent IDs are
resolved from ml/serving /health at startup with a fallback hardcoded list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:29:50 +00:00
b3cf588f2f feat(ml): multi-agent context framework + v4 orchestrator prompt
Adds ml/agents/ — five specialised sub-agents (overdue_task, momentum,
time_of_day, recent_patterns, focus_area) each producing a prompt snippet
from user signals. A registry wires them up; the orchestrator prompt in
ml/serving/prompts.py synthesises their outputs into one tip via LiteLLM.

Also wires /api/agents route in the API and updates the Dockerfile to copy
the full ml/ tree with PYTHONPATH=/app so agent imports resolve correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 10:20:05 +00:00
f8d66aa01f chore: remove Airflow completely from the stack
Drop all four Airflow containers (db, init, webserver, scheduler) from the
mlops compose profile, leaving MLflow as the sole mlops service. Remove
AIRFLOW_* env vars, config fields, health-check entries, DAG trigger code
in admin/bench routes, the airflow_dag_run_id schema column, Airflow nav
links and DAG-run links in the admin UI, the two Airflow DAG files
(bench_dag.py, sim_dag.py), and all related docs/ADR references.
Simulations now run exclusively via the subprocess path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:38:46 +00:00
0474ad4deb feat(airflow): integrate bench harness into bench_collect DAG
New DAG (`ml/pipelines/bench_dag.py`) with three linked tasks:
1. collect.py — generates candidates, logs to MLflow
2. export_for_judge — exports pending runs for Claude Code scoring
3. compare — generates leaderboard by (model, prompt) cell

Config via dag_run.conf supports all collect.py options (models, prompts,
n_tips, n_scenarios, temperature, experiment name, max_model_b).

New admin API endpoints (`services/api/src/routes/bench.ts`):
- GET /api/bench/experiments — list tip-bench-* experiments
- POST /api/bench/run — trigger DAG with custom config
- GET /api/bench/runs/:experiment — list runs in experiment
- GET /api/bench/leaderboard/:experiment — leaderboard by (model, prompt)

All endpoints require admin auth. Human judge (Claude Code) scores are
applied manually post-export; future enhancement: add webhook to DAG.

Admin UI can now trigger and monitor benchmarks from a dashboard panel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 11:54:30 +00:00
bad1bb2cba feat(simulate): MLflow tracking, Airflow DAG integration, health checks for mlflow/airflow
- sim_runs schema: add judge_mode, n_policies, airflow_dag_run_id, mlflow_run_id columns
- admin health endpoint: add mlflow + airflow checks (Basic auth for Airflow API)
- admin nav: add Simulations page link; rename section label
- runner.py: optional MLflow experiment tracking; multi-policy support
- sim_dag.py: Airflow DAG for offline sim pipeline
- admin simulate page + API client methods for sim runs
- shared-types tsconfig: exclude test files from build

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:08:36 +00:00
e96ceb7ee1 feat(auth): token-based admin authentication for Playwright/CI (#105)
Add POST /api/auth/token — validates ADMIN_TOKEN env var, creates a 24h
session and sets the sid cookie so automated tools can access the admin
panel without Google OAuth. Admin login page gains a token input form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 12:07:43 +00:00
b554970032 docs(observability): add services/api README; update ml/serving + recommender docs (#18)
- services/api/README.md: new — contract, middleware stack, background
  tasks, config table (LOG_LEVEL, SENTRY_DSN), health story, extraction
  criteria
- ml/serving/README.md: add Observability section (structlog JSON,
  traceparent → trace_id binding), add SENTRY_DSN + ENV to config table
- services/recommender/README.md: fix policy table — egreedy-v2 is
  active (#99), egreedy-v1 is shadow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:41:39 +00:00
c4960d0601 feat(observability): structured logs, W3C trace IDs, Sentry hooks (#18)
- TS: pino + pino-http; every HTTP request log includes traceId from
  W3C traceparent header (generated if absent); forwarded to ml/serving
  on all /score, /generate, /reward, and /api/ml proxy calls
- Python: structlog JSON; FastAPI middleware binds trace_id via
  contextvars so every log line within a request carries it
- Sentry: optional SENTRY_DSN init in both runtimes (no-op if unset)
- Replace all console.* calls across services/api with pino logger
- Update tests to spy on logger instead of console

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:37:28 +00:00
7281af83a4 feat(bandit): promote egreedy-v2 (D=12, profile features) as active policy (#99)
Offline sim gate passed — egreedy-v2 mean reward −0.629 vs egreedy-v1 −0.642
(5 users × 20 rounds, rule judge, seed 42). v2 wins 3/5 personas.

- recommender.ts: switch remotePolicy() to /score/egreedy/v2
- recommender.ts: switch sendRewardWithRetry() to /reward/egreedy/v2 with
  profile_features payload so the ridge update uses the full D=12 vector
- recommender.ts: re-fetch profile at feedback time (TTL-cached, near-instant)
- ADR-0012: status Accepted → Promoted, promotion record appended

Shadow entry egreedy-v2-shadow kept in registry (active: false) for rollback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 03:08:28 +00:00
cba3f1a184 docs(services): update integrations + recommender READMEs for signal abstraction (#78)
integrations/README — replace stale Connector interface and fictional
libsodium vault with the actual SignalSource pattern, SQLite token table,
and real OAuth routes.

recommender/README — document the SignalAggregator pipeline, current
policy registry, and actual /recommend + /feedback contract shapes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:17:38 +00:00
352469162d fix(signals): add missing source field to TaskSyncedEvent (#78)
TaskSyncedPayload in shared-types and ml/serving schemas both require
source, but TaskSyncedEvent in bus.ts and the todoist publish call both
omitted it — causing the JetStream consumer to nak every task.synced
message on validation failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 17:15:32 +00:00
2d7cf217a9 feat(ml): egreedy-v2 shadow policy — D=12 with profile features (#99)
Ship the scaffolding for #99 (phase B.3 of #81):

- ml/serving: add /score/egreedy/v2, /reward/egreedy/v2, /stats/egreedy/v2
  endpoints (D=12). New feature dims: completion/dismiss rates, mean dwell
  (clipped 10min), preferred-hour alignment (cosine, 1-dim), tip volume (log).
  Separate state file per user (_egreedy_v2.json). /reset clears v2 state too.
- ADR-0012: documents D=7→12 dimension change, normalization choices, shadow
  rollout protocol, and promotion gate (offline sim win per ADR-0002).
- recommender.ts: register egreedy-v2-shadow in shadow-policy map (disabled by
  default). When enabled, calls /score/egreedy/v2 fire-and-forget and publishes
  shadow:egreedy-v2-shadow serve signal. No reward to shadow — sim is the gate.
- sim runner/personas: personas carry synthetic profile_features per persona;
  _call_score/_call_reward thread profile_features through (None-safe for v1/linucb).
- 18 new Python tests; all 56 Python + 170 TS tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:00:38 +00:00
ee4eb15022 feat(profile): event-driven invalidation (#81 phase B.2)
Features now declare invalidatedBy subjects in the registry; the new
profile/subscriber.ts subscribes to each unique subject and drops
matching stored rows for the userId in the payload. Next getProfile
call recomputes from current data instead of waiting up to ttlSec.

Wiring:
  completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d,
  preferred_hour  ← signals.tip.feedback
  tip_volume_30d  ← signals.tip.served

TTL stays as a safety net for clock drift and dropped events.
Registration validates each declared subject against KNOWN_SUBJECTS
(mirror of EventMap) so typos throw at startup, not silently.

ADR-0011 updated.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:38:45 +00:00
4a42a6aabf feat(admin): profile freshness panel in data-quality (#81 phase B.4)
Adds a per-feature freshness summary to /admin/data-quality so the admin
can spot features that are systematically stale or never computed:

  totalEligible — distinct users with tip_views in the last 30 days
  missing       — eligible users with no row stored for the feature
  stale         — eligible users whose stored row is past its TTL

Backend exposes summarizeProfileFreshness() in profile/builder.ts; one
query per feature joins eligible users LEFT JOIN profile rows.
Coverage = (eligible − missing − stale) / eligible, colored
green/yellow/red via the new PctGood helper (high-is-good, opposite of
the existing Pct used for missing-feature/stale-token rates).

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:34:46 +00:00
9e96540bcc feat(admin): per-user profile view + rebuild action (#81 phase B.1)
Surfaces phase A's profile features in /admin/users/:id so we can verify
they're actually computing useful values before investing in bandit
consumption. The detail GET now includes profile rows joined with registry
metadata (name, value, age, fresh badge, ttlSec, description). Read does
NOT trigger compute — staleness must be visible. A new POST
.../profile/rebuild button force-recomputes and is audit-logged like
reset-bandit.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:27:08 +00:00
7d4c29e137 feat(profile): user-profile feature registry + builder (phase A)
Centralizes user-level features (completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that
owns both definition and SQL aggregation, since the data lives in the
TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps
recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not
yet consume them — extending the bandit feature vector changes D and
resets every user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror with a sync
test that diffs name sets against registry.ts.

ADR-0011 records the data-locality reasoning (registry in TS, not Python
as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption
with state migration, admin per-user profile page, staleness alerts.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:22:22 +00:00
430804e9a5 feat(ml): prompt registry + per-request variant selection
Replaces the hardcoded "v1" label with a real prompt registry:

  ml/serving/prompts.py       — keyed by version: v1 (baseline),
                                v2-mentor (calm/specific persona),
                                v3-few-shot (v1 persona + curated examples)
  ml/serving/main.py          — POST /generate accepts optional prompt_version,
                                422 on unknown, echoes the version actually used
                                back in the response
  services/api/src/config.ts  — TIP_PROMPT_VERSION: empty / single / comma-list
                                (uniform random per request)
  services/api/src/routes/recommender.ts
                              — pickPromptVersion() drives selection; the
                                response's prompt_version (not a stale TS
                                constant) is what lands in tip_scores so the
                                #92 reward-analytics dashboard shows real
                                per-variant reaction rates

Closes #84.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 15:44:04 +00:00
aa4bdd8f09 feat(admin): LLM tip quality dashboard — per-model/prompt/kind breakdowns
/admin/reward-analytics now surfaces served count, reaction rate, and avg
reward grouped by llm_model, prompt_version, and tip_kind — closing the
loop so model/prompt iterations in M2 are legible next to the bandit
policy view. Data comes from the tip_scores columns added in ffdf707 and
tip_feedback.reward_milli; bandit-only tips show as "(bandit-only)".

Closes #92.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 15:24:52 +00:00
5b52c6bf40 test: cover NATS bridge + Todoist scheduler; ADR-0010
- bus.test.ts: 4 cases for the new onPublish hook contract
- nats.test.ts: stream creation idempotency + JSON publish bridge
- scheduler.test.ts: startup delay, fan-out, per-user failure isolation
- ADR-0010 documents the bridge-don't-replace decision and the
  Todoist scheduler isolation, plus open follow-ups (#98 ml/serving
  consumer, #54 protobuf migration, graceful shutdown, metrics)
- README/overview/services README reflect the bridged event substrate
- CLAUDE.md gains a "don't nats.publish() directly" rule
- .env.example documents NATS_URL + TODOIST_SYNC_INTERVAL_MS

Verified in deployment 2026-04-18: api -> nats bridge connects on
boot, signals + feedback streams created, scheduler tick logs
"todoist sync: 1 ok, 0 failed (1 users)" within 10s. Closes #21, #22.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 07:55:25 +00:00
2a7380933c feat: NATS JetStream + Todoist background sync (#21, #22)
Issue 21 — event infrastructure:
- NormalizedEvent<T> + payload types in packages/shared-types/src/events/
- Bus.onPublish() hook for side-effect bridges
- NATS JetStream adapter (services/api/src/events/nats.ts): connects when
  NATS_URL is set, creates signals.> and feedback.> streams, bridges all
  in-process bus publishes to JetStream — no-ops gracefully when NATS is absent
- NATS service added to docker-compose (profile: events|full, port 4222/8222)

Issue 22 — Todoist background sync:
- services/api/src/signals/scheduler.ts: queries all active-token users every
  15 min (TODOIST_SYNC_INTERVAL_MS), fan-out via todoistSource.fetchSignals()
  which emits signals.task.synced; on-demand fetch remains as freshness fallback
- NATS_URL + TODOIST_SYNC_INTERVAL_MS added to config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 01:18:51 +00:00
e3ca3ba733 feat: SignalSource abstraction — generalize signal ingestion beyond Todoist (#78)
- Add Signal + SignalSource interfaces to packages/shared-types
- TipCandidate.features widened to Record<string,number|boolean> to match Signal
- TodoistSignalSource: encapsulates fetch, cache, 401 handling, bus events, and act()
- SignalAggregator: parallel fan-out across sources with per-source failure isolation
- Recommender refactored to consume Signal[] via aggregator; source action dispatch via aggregator.act()
- ADR-0009: signal normalization strategy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 01:11:56 +00:00
4c8ef9ad86 fix: consentGiven boolean in test fixture (was number, broke docker build)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:14:07 +00:00
ffdf70733f feat: M2 AI tips — LiteLLM gateway, context assembler, end-to-end generation pipeline
Issues closed: #86, #87, #88, #89, #90, #91, #79, #80, #82

infra:
- docker-compose `ai` profile: Ollama + LiteLLM services
- infra/litellm/litellm_config.yaml: tip-generator / embedder / judge aliases
- .env.example: LITELLM_URL, LITELLM_MASTER_KEY, OLLAMA_URL

ml/serving:
- POST /generate: calls LiteLLM tip-generator alias, returns TipCandidate[]
- JSON retry loop (2 retries with correction prompt on malformed response)
- _parse_llm_json strips markdown fences

ml/features:
- context.py: build_context() assembles user signals → PromptContext
  (sorts overdue/high-priority tasks first for LLM prompt quality)

shared-types:
- TipKind, TipSource, TipCandidate types
- Tip gains kind + rationale fields

services/api:
- recommender: 3-stage pipeline (assemble → score → serve)
  Stage 1: Todoist tasks + LLM candidates fetched in parallel
  Stage 2: egreedy bandit scores merged candidate pool
  Stage 3: serve + log with prompt_version, llm_model, tip_kind
- tip_scores: prompt_version, llm_model, tip_kind columns + migrations
- config: LITELLM_URL added
- integrations: surface token_status in /integrations response

tests:
- ml/serving/tests/test_generate.py: 13 tests (retry, 502/503, fence variants)
- ml/features/test_context.py: 9 tests (sorting, edge cases)
- services/api recommender.unit.test.ts: 16 pure-function tests (inferReward, dueAgeDays)
- services/api recommender.test.ts: 4 integration tests (tip_scores columns, LLM fallback)
- shared-types: TipCandidate, rationale, full TipFeedback action set

docs:
- ADR-0008: LiteLLM AI gateway decision
- overview.md: M2 pipeline description updated
- ml/README.md: serving + features roles updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:09:02 +00:00
85367aeaa0 feat: MLOps external services, AI stack planning, admin MLOps hub
Infrastructure:
- Add `mlops` compose profile: MLflow (basic-auth, /mlflow path) + Airflow (LocalExecutor, /airflow path) + airflow-db
- infra/mlflow/basic_auth.ini for MLflow auth config
- Caddy routes /mlflow* and /airflow* inside existing o.alogins.net block (see agap_git)
- Dockerfile.admin: NEXT_PUBLIC_MLFLOW_URL / NEXT_PUBLIC_AIRFLOW_URL build args (default /mlflow, /airflow)

Admin panel:
- /admin/models: replace MLflow iframe with external link cards
- /admin/experiments: replace LinUCB stats with MLOps hub (links to MLflow experiments/models + Airflow DAGs/datasets)
- AdminShell: external nav links for MLflow ↗ and Airflow ↗ under MLOps section

Docs & planning:
- README: new AI stack section (Ollama/LiteLLM/OpenWebUI three-tier, tip generation pipeline, model aliases)
- README: Phase 2 expanded with AI infra issues (#86-#93) and granular pipeline breakdown
- README: Phase 4 expanded with LLM MLOps items (#94-#97)
- CLAUDE.md: AI stack section, updated current phase (M1 shipped / M2 in progress), compose profiles, updated What NOT to do
- docs/architecture/overview.md: AI stack section, updated decision flow diagram for Phase 2 LLM pipeline
- ADR-0006: updated to reflect external services (path-based, not embedded)
- Gitea issues #86-#97 created (M2: AI infra + pipeline; M4: LLM MLOps)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:20:44 +00:00
faf44c18fc feat: ε-greedy v1 as active policy; dwell-time reward inference; offline sim framework
- Promote egreedy-v1 to active serving policy (ADR-0007): /score/egreedy + /reward/egreedy
  replaces linucb-v1 endpoints after offline sim shows +10.7% mean reward (−0.548 vs −0.606)
- Replace explicit helpful/not_helpful feedback with dwell-time inferred reward (inferReward):
  dismiss=−1.0, snooze=+0.1, done<15s=−0.3, done 15s–2min=+1.0, done 2–10min=+0.6, done>10min=+0.3
- Add ml/serving ε-greedy endpoints: /score/egreedy, /reward/egreedy, /stats/egreedy/{user_id}
  with d=7 feature vector (base 5 + sin/cos day-of-week encoding)
- Add offline simulation framework (ml/experiments/sim): rule/LLM/claude-code judges,
  two-phase score+reward, synthetic personas, task generator; results stored in sim_runs/sim_events
- Add /admin/simulations page: start runs, live-poll status, reward curve SVG, action/persona tables
- Fix egreedy day_of_week training skew: reward endpoint now uses actual dow instead of hardcoded 0
- Fix runner.py proxy bypass: httpx.Client(trust_env=False) for localhost ML calls
- Add dwellMs to TipFeedbackEvent contract and bus.test.ts fixture
- Schema: sim_runs, sim_events tables; tip_feedback gains dwell_ms, reward_milli columns
- ADR-0006: admin console framework; ADR-0007: egreedy-v1 policy selection rationale

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 07:44:37 +00:00
e62c726ea4 feat: M1 admin console — all 10 remaining pages + signal/quality/ops infrastructure
Admin console (issues #63–72):
- Event stream viewer: live-tail ring buffer (500 events) with subject/user filters
- Feature store browser: per-user feature vector history from ml/serving
- Model registry panel: MLflow embed at /admin/models
- Experiment dashboard: LinUCB per-user stats (pulls, reward, θ) + bandit reset
- Recommendation log: per-tip explainability (policy, score, features, latency)
- Reward analytics: daily reaction breakdown + per-policy compare
- Data quality widget: missing-feature rate, stale-token rate, daily completeness
- Ops actions: replay-signal, policy enable/disable; user actions link to Users page
- SQL runner: read-only SELECT runner with saved queries
- Health rollup: fan-out to api/ml/sqlite/event-bus with auto-refresh

Backend:
- tip_scores table: logs features+policy+score+latency at every scoring call (#67)
- saved_queries table: per-admin saved SQL (#71)
- Event bus: 500-event ring buffer + tail() API (#63)
- Admin routes: /events, /tips, /reward-analytics, /data-quality, /health,
  /policies, /replay-signal, /sql, /saved-queries endpoints
- /api/ml/* admin-gated proxy to ml/serving (#64, #66)
- Shadow-policy registry in recommender (#56)

ML serving:
- /reset/{user_id}: clear bandit state + feature history (#66)
- /stats/{user_id}: pulls, cumulative reward, estimated mean, θ (#66)
- /features/{user_id}: last 100 feature vectors logged at scoring time (#64)
- Meta (pulls, rewards) persisted alongside A/b matrices

Web:
- Tip action sheet adds Helpful / Not helpful buttons (#62)
- TipFeedback type extended with helpful/not_helpful actions
- Rewards mapped: helpful=+0.5, not_helpful=−0.5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:56:48 +00:00
c7edd92e15 feat: M1 — LinUCB bandit, RemotePolicy, Web Push, event bus
ML serving:
- LinUCB contextual bandit (disjoint, d=5 features: hour_sin/cos, is_overdue, task_age, priority)
- /score endpoint replaces stub random; /reward endpoint for online learning
- Per-user model state persisted to disk as JSON (survives restarts)
- venv at ml/serving/.venv; start with pnpm dev from ml/serving

Recommender:
- Todoist fetch now extracts features (is_overdue, task_age_days, priority)
- RemotePolicy calls ml/serving with 3s timeout; falls back to RandomPolicy
- Reward sent to /reward on feedback (done=+1, snooze=0, dismiss=-1)

Web Push:
- VAPID keys in config; push_subscriptions table in DB
- POST/DELETE /api/push/subscribe; GET /api/push/vapid-public-key
- Service worker (public/sw.js): push → showNotification, notificationclick → focus/open
- "notify me" button on tip page; registers SW + subscribes on permission grant

Event bus:
- services/api/src/events/bus.ts: typed EventEmitter wrapper
- Subjects: signals.tip.served, signals.tip.feedback, signals.task.synced
- Same publish/subscribe API NATS JetStream will implement — swap is mechanical

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 14:08:00 +00:00
f6c890213b feat: complete M0 — legal pages, consent, tip_views metrics, account deletion UI
- /legal/terms and /legal/privacy pages (linked from sign-in)
- Consent (consentGiven=true) recorded on first Google sign-in
- tip_views table: one row per tip served — enables activation + reaction rate queries
- tip_views purged on account deletion
- Delete account button on /connect (confirm → revoke tokens → purge data → sign out)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 09:09:08 +00:00
3123cb73fb feat: Phase 0 walking skeleton — auth, Todoist integration, tip page
- Google OAuth2/PKCE flow via openid-client v6; session cookie (30-day)
- Next.js middleware auth guard — redirects before any client render
- Todoist OAuth2 connect/disconnect; REST v1 task fetch (today|overdue)
- RandomPolicy recommender behind stable POST /recommend contract
- Feedback endpoint (done/dismiss/snooze); marks task complete in Todoist
- 30s in-memory task cache per user (~1ms recommend on cache hit)
- Tip page: pure opacity fade-in (3.5s), fast fade-out (0.3s), no motion
- "reading you…" loading text with breathe animation
- PWA icons + manifest
- Ports pinned: API=3078, web=3079; Caddy at o.alogins.net

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:53:38 +00:00
65218762be feat: Phase 0 walking skeleton — monorepo, API, web, ML stub
Sets up the full Phase 0 foundation:

- pnpm workspaces + turbo build graph; native module build approval
- packages/shared-types: HTTP contracts (Tip, Auth, Integrations, User)
- services/api: Express modular monolith with better-sqlite3/drizzle
  - auth: Google OAuth2 + PKCE via openid-client v6, cookie sessions
  - integrations: Todoist OAuth2 connect/disconnect, token vault
  - recommender: RandomPolicy over Todoist tasks, feedback sink
  - user: profile, consent capture, full account deletion (GDPR)
- apps/web: Next.js 15, three pages (sign-in → connect → tip)
  - tip page: black canvas, hold-to-act gesture, action sheet
  - PWA manifest + theme
- ml/serving: FastAPI stub implementing the POST /score contract
- infra: docker-compose (core/full profiles), Dockerfiles, CI skeleton
- .env.example with all required vars documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 12:41:24 +00:00
7f173f88d3 refactor: architecture revision — modular monolith, auth-commit, event protobuf, privacy-from-day-0
- ADR-0003: modular monolith for Phase 0 with documented extraction triggers
- ADR-0004: Auth.js + OIDC-shaped boundary; dedicated provider when mobile ships
- ADR-0005: protobuf for events, OpenAPI for HTTP, schema-registry CI gate
- New architecture docs: data-model, metrics (magic proxies), privacy (Phase-0 feature)
- Prime directives updated: privacy-as-feature, modular-by-package-deployable-by-stage
- Roadmap revised: Apple OAuth deferred to M1; web push in M1; k3s intermediate; tip-kind-aware UI
- PLAN updated: Phase-0 deletion endpoint, metrics baseline, compose profiles, import-boundary lint
- License decision in README (ARR with OSS plan in Phase 5)
2026-04-13 14:36:11 +00:00
cf4c7a0eb4 chore: scaffold oO monorepo with architecture, roadmap, and module stubs 2026-04-13 14:19:56 +00:00