feat: MLOps external services, AI stack planning, admin MLOps hub

Infrastructure:
- Add `mlops` compose profile: MLflow (basic-auth, /mlflow path) + Airflow (LocalExecutor, /airflow path) + airflow-db
- infra/mlflow/basic_auth.ini for MLflow auth config
- Caddy routes /mlflow* and /airflow* inside existing o.alogins.net block (see agap_git)
- Dockerfile.admin: NEXT_PUBLIC_MLFLOW_URL / NEXT_PUBLIC_AIRFLOW_URL build args (default /mlflow, /airflow)

Admin panel:
- /admin/models: replace MLflow iframe with external link cards
- /admin/experiments: replace LinUCB stats with MLOps hub (links to MLflow experiments/models + Airflow DAGs/datasets)
- AdminShell: external nav links for MLflow ↗ and Airflow ↗ under MLOps section

Docs & planning:
- README: new AI stack section (Ollama/LiteLLM/OpenWebUI three-tier, tip generation pipeline, model aliases)
- README: Phase 2 expanded with AI infra issues (#86-#93) and granular pipeline breakdown
- README: Phase 4 expanded with LLM MLOps items (#94-#97)
- CLAUDE.md: AI stack section, updated current phase (M1 shipped / M2 in progress), compose profiles, updated What NOT to do
- docs/architecture/overview.md: AI stack section, updated decision flow diagram for Phase 2 LLM pipeline
- ADR-0006: updated to reflect external services (path-based, not embedded)
- Gitea issues #86-#97 created (M2: AI infra + pipeline; M4: LLM MLOps)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:20:44 +00:00
parent faf44c18fc
commit 85367aeaa0
25 changed files with 695 additions and 222 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -65,7 +65,7 @@ docs/              architecture notes, ADRs, API specs
 - One PR = one concern. Conventional-commit prefixes (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`).
 - ADRs go in `docs/adr/NNNN-title.md` for any decision that constrains future work.
 - No secrets in repo. Local dev via `.env.local` (gitignored), prod via the server's secret store (Vaultwarden now; k8s secrets later).
- Compose profiles (`core`, `full`) so devs can run a subset without 16 GB of RAM.
+- Compose profiles: `core` (api + web + admin), `full` (adds ml-serving), `mlops` (adds MLflow + Airflow), `ai` (adds Ollama + LiteLLM). Mix as needed.
 ## Definition of done (per feature)
@@ -76,15 +76,38 @@ docs/              architecture notes, ADRs, API specs
 5. Deployable via `docker compose up` locally.
 6. If it touches user data → a deletion path exists and is tested.
 ## AI stack
 oO generates tips with an LLM and ranks them with a bandit. All LLM calls route through **LiteLLM** at `llm.alogins.net` using model aliases — swapping models is a config change, not a code change.
 | Alias | Model | Used by |
 |-------|-------|---------|
 | `tip-generator` | qwen2.5:7b (default) | `ml/serving` tip generation |
 | `embedder` | nomic-embed-text | task clustering, dedup |
 | `judge` | claude-haiku-4-5 (cloud, eval only) | offline sim |
 Env vars: `LITELLM_URL` (default `http://localhost:4000`), `OLLAMA_URL` (default `http://localhost:11434`).
 Start with: `docker compose --profile ai up` (adds Ollama + LiteLLM locally). In prod both are shared Agap services.
 **LLM tip generation pipeline:**
 1. `ml/features/context.py` assembles user signals → structured prompt context
 2. `POST /generate` in `ml/serving` calls LiteLLM → returns `TipCandidate[]`
 3. Bandit policy in `ml/serving` scores + ranks candidates
 4. Best candidate returned as tip; reaction closes the online reward loop
 ## Current phase
-**Phase 0 — Prototype.** See `README.md` for the phase roadmap and `docs/architecture/` for diagrams. Work is tracked as Gitea milestones + issues on `alvis/oO`.
+**M1 shipped. M2 (AI tips) in progress.** See `README.md` for the phase roadmap and `docs/architecture/` for diagrams. Work is tracked as Gitea milestones + issues on `alvis/oO`.
 Active work: AI tip generation pipeline — issues #86–#93 in M2 milestone.
 ## What NOT to do
 - Don't copy Todoist's data into our DB. Store the OAuth token + computed features/derivatives we need, fetch raw on demand.
- Don't implement auth by hand. Phase 0 uses **Auth.js** behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
+- Don't implement auth by hand. Auth.js behind an OIDC-shaped boundary (ADR-0004); swap to a dedicated OIDC provider only when mobile ships.
- Don't hardwire a recommender. The "random todo" v0 must live behind the same interface the real ML model will implement (`POST /recommend` → `{tip}`). Swap internals, keep contract.
+- Don't hardwire a recommender. The contract is `POST /recommend → {tip}`. Swap internals (bandit, LLM, hybrid), keep contract.
 - Don't replace a policy in one step. New policies deploy shadow-first; promoted only after offline + online agreement with the incumbent (ADR-0002).
 - Don't build an admin UI before the user-facing black page is polished.
 - Don't over-split processes. Extract a service when pressure demands it, not in anticipation (ADR-0003).
 - Don't call LLMs directly from application code. All LLM calls go through `ml/serving` (Python) via `LITELLM_URL`. The TS recommender never holds a model name.
 - Don't embed MLflow/Airflow/OpenWebUI in the admin panel. They are external services; link out to them. The admin shell links to `o.alogins.net/mlflow`, `/airflow`, `ai.alogins.net`.
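The "swap internals, keep contract" rule for `POST /recommend → {tip}` can be pictured as one TypeScript interface that every policy implements. This is a minimal sketch under stated assumptions: the `TodoItem` and `Tip` shapes and the `RandomTodoRecommender` / `HighestPriorityRecommender` classes are hypothetical illustrations, not the types in `packages/shared-types`. Note that no model name appears anywhere, in line with the rule that the TS recommender never holds one.

```typescript
// Hypothetical shapes for illustration; the real types live in packages/shared-types.
interface TodoItem {
  id: string;
  content: string;
  priority: number; // assumed: higher means more urgent
}

interface Tip {
  content: string;
  source: string; // which policy produced the tip
}

// The contract every policy implements: POST /recommend -> { tip }.
interface Recommender {
  recommend(userId: string, todos: TodoItem[]): Promise<{ tip: Tip | null }>;
}

// v0-style policy: pick a random todo. The RNG is injectable so the pick is testable.
class RandomTodoRecommender implements Recommender {
  constructor(private readonly rng: () => number = Math.random) {}

  async recommend(_userId: string, todos: TodoItem[]): Promise<{ tip: Tip | null }> {
    if (todos.length === 0) return { tip: null };
    const pick = todos[Math.floor(this.rng() * todos.length)];
    return { tip: { content: pick.content, source: 'random-v0' } };
  }
}

// A different internal strategy behind the same contract.
class HighestPriorityRecommender implements Recommender {
  async recommend(_userId: string, todos: TodoItem[]): Promise<{ tip: Tip | null }> {
    if (todos.length === 0) return { tip: null };
    const best = todos.reduce((a, b) => (b.priority > a.priority ? b : a));
    return { tip: { content: best.content, source: 'priority' } };
  }
}

// The route handler only depends on the interface, so swapping policies changes one argument.
async function handleRecommend(rec: Recommender, userId: string, todos: TodoItem[]) {
  return rec.recommend(userId, todos);
}
```

Because callers only see `Recommender`, a bandit or LLM-backed implementation can replace either class without touching the route.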
--- a/README.md
+++ b/README.md
@@ -67,6 +67,53 @@ docs/        architecture, adr, api
 ---
 ## AI stack
 oO is AI-native: the recommender's job is to **rank**, not to write. An LLM generates candidate tips from the user's context; the bandit picks the best one.
 ### Three-tier layout
 | Tier | Service | Purpose | Where |
 |------|---------|---------|-------|
 | Inference | **Ollama** | Local LLM + embedding; no data leaves the host | `localhost:11434` |
 | Routing | **LiteLLM** | Unified OpenAI-compatible API; model aliases; cloud fallback | `llm.alogins.net` (Agap shared) |
 | Testing | **OpenWebUI** | Prompt iteration, model comparison, manual evals | `ai.alogins.net` (Agap shared) |
 ### Tip generation pipeline (Phase 2 target)
 ```
 User signals  ──▶  Context assembler  ──▶  LiteLLM  ──▶  Ollama (local)
 (tasks, calendar,    (ml/features/)         (routing)     or cloud fallback
 patterns, time)
                                                ▼
                                     N typed TipCandidates
                                     {content, kind, model,
                                      prompt_version, confidence}
                                                ▼
                                    Bandit policy (ml/serving)
                                    scores + ranks candidates
                                                ▼
                                         Best tip shown
                                                ▼
                              User reaction (done / snooze / dismiss + dwell)
                                                ▼
                              Online bandit update + prompt_version tracking
 ```
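The "Bandit policy scores + ranks candidates" step can be illustrated with a LinUCB-style upper confidence bound, score = θᵀx + α·√(xᵀA⁻¹x). This TypeScript version is for readability only, since the real policy lives in Python in `ml/serving`. It assumes each candidate has already been turned into a feature vector `x`, that the per-user `theta` and inverse design matrix `aInv` are precomputed, and that `alpha` is the exploration weight; the `Featurized` type and `rankCandidates` helper are hypothetical.

```typescript
// TipCandidate fields as listed in the diagram above; value formats are assumed.
interface TipCandidate {
  content: string;
  kind: string;
  model: string;
  prompt_version: string;
  confidence: number;
}

// Hypothetical pairing of a candidate with its (assumed precomputed) context features.
interface Featurized {
  candidate: TipCandidate;
  x: number[];
}

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Matrix-vector product; aInv is a square d x d matrix.
const matVec = (m: number[][], v: number[]): number[] => m.map((row) => dot(row, v));

// LinUCB upper confidence bound: theta^T x + alpha * sqrt(x^T A^-1 x).
function linUcbScore(theta: number[], aInv: number[][], x: number[], alpha: number): number {
  const exploit = dot(theta, x);
  const explore = Math.sqrt(Math.max(0, dot(x, matVec(aInv, x)))); // clamp guards float noise
  return exploit + alpha * explore;
}

// Score every candidate and sort highest first; index 0 is the tip that gets shown.
function rankCandidates(
  items: Featurized[],
  theta: number[],
  aInv: number[][],
  alpha: number,
): { candidate: TipCandidate; score: number }[] {
  return items
    .map(({ candidate, x }) => ({ candidate, score: linUcbScore(theta, aInv, x, alpha) }))
    .sort((a, b) => b.score - a.score);
}
```

Raising `alpha` shifts the ranking toward candidates whose features the bandit is less certain about, which is how new tip kinds or prompt versions get explored before their reward estimates settle.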
 **Why LiteLLM as gateway:**  All LLM calls use a single `LITELLM_URL` env var. Swapping from qwen2.5 to llama3.2, or routing a fraction to Claude for A/B, is a config change in LiteLLM — zero code change in oO. The model name in `tip_scores` tells you exactly which model produced each tip.
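Attributing each tip to its model is most useful when paired with a prompt version that changes whenever the prompt does (the content-hash invalidation planned in #91). A minimal sketch, assuming the version is a SHA-256 of the model alias plus the prompt template, truncated to 12 hex characters; `TipScoreRow`, `promptVersion`, and `toRow` are hypothetical names, not existing code.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical row shape mirroring the planned `model` + `prompt_version` columns in tip_scores.
interface TipScoreRow {
  user_id: string;
  tip: string;
  model: string;
  prompt_version: string;
}

// Derive a stable version id from the model alias and the prompt template.
// Identical inputs always give the same id; editing either one gives a new id.
function promptVersion(template: string, modelAlias: string): string {
  return createHash('sha256')
    .update(modelAlias)
    .update('\u0000') // separator so ("ab", "c") and ("a", "bc") hash differently
    .update(template)
    .digest('hex')
    .slice(0, 12); // assumed truncation length
}

function toRow(userId: string, tip: string, model: string, template: string, alias: string): TipScoreRow {
  return { user_id: userId, tip, model, prompt_version: promptVersion(template, alias) };
}
```

Hashing content instead of bumping a manual counter means a forgotten version bump cannot silently mix reactions from two different prompts under one label.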
 **Why Ollama first:**  Tips contain personal context. Local inference means no user data leaves the host for the inference path. Cloud models (Anthropic, OpenAI) are opt-in fallbacks for evaluation and simulation only, gated behind `ANTHROPIC_API_KEY`.
 ### Models (planned)
 | Alias | Model | Task |
 |-------|-------|------|
 | `tip-generator` | qwen2.5:7b (default) | Generate typed tip candidates from user context |
 | `embedder` | nomic-embed-text | Task clustering, semantic similarity for dedup |
 | `judge` | claude-haiku-4-5 (cloud, eval-only) | Offline sim judge; rates tip quality for A/B |
 ---
 ## Roadmap
 ### Phase 0 — Walking skeleton  *(M0)* ✓ shipped
@@ -102,7 +149,7 @@ Goal: tips are picked, not drawn from a hat — and they arrive at the right mom
 oO is ML-heavy. Without a cockpit, every model change ships blind. This console is the team's single pane for users, signals, features, models, experiments, and tip outcomes — with the ability to *act* on them (revoke a token, replay an event, promote a model, reset a bandit).
-**Framework pick — `apps/admin` on Next.js 15 + Tremor + shadcn/ui.**  Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses `packages/shared-types`, `sdk-js`, and the Auth.js session. Specialized ML tooling (MLflow, Grafana, Marimo) is **embedded** via authenticated reverse-proxy, not re-implemented.
+**Framework pick — `apps/admin` on Next.js 15 + Tremor + shadcn/ui.**  Analytics-first UI for an analytics-first product, stays on our existing TS/React/Tailwind stack, reuses `packages/shared-types`, `sdk-js`, and the Auth.js session. Specialized ML tooling (MLflow, Airflow) runs as **separate external services** linked from the admin shell; Grafana panels are embedded.
 | Layer | Tool | Why |
 |-------|------|-----|
@@ -111,7 +158,8 @@ oO is ML-heavy. Without a cockpit, every model change ships blind. This console
 | CRUD primitives | **[shadcn/ui](https://ui.shadcn.com)** | Copy-paste Radix components; forms, dialogs, command palette |
 | Heavy grids | **[TanStack Table v8](https://tanstack.com/table)** | Sortable / paginated / virtualized tables (events, users, tips) |
 | Extra charts | **[Recharts](https://recharts.org)** / **[visx](https://airbnb.io/visx)** | Fallbacks where Tremor falls short (e.g. force graphs, Sankey) |
-| Model registry | **[MLflow UI](https://mlflow.org)** *(embedded)* | Artifact + run browser; don't re-build |
+| Model registry / experiments | **[MLflow](https://mlflow.org)** *(external — `o.alogins.net/mlflow`)* | Experiment tracking, artifact browser, model registry; own basic-auth |
 | Pipeline orchestration | **[Airflow](https://airflow.apache.org)** *(external — `o.alogins.net/airflow`)* | Batch feature + retraining DAGs; own web-auth |
 | Infra metrics | **[Grafana](https://grafana.com)** *(embedded panels)* | One ops source of truth |
 | Ad-hoc analysis | **[Marimo](https://marimo.io)** reactive notebooks | Python-native for the ML side; launch-out link |
 | AuthZ | `profile.role='admin'` + Next.js middleware | Reuses existing session; no new auth surface |
@@ -130,8 +178,8 @@ oO is ML-heavy. Without a cockpit, every model change ships blind. This console
 5. [x] **User explorer** — list + detail page: identity, consents, integrations, last tip, reward history; revoke-integration + reset-bandit actions
 6. [x] **Event stream viewer** — live tail of `signals.*` with filters by subject/user/time; same UI when the bus swaps to NATS
 7. [x] **Feature store browser** — features sent to `ml/serving` per scoring call; diff across time for a user
-8. [x] **Model registry panel** — embed MLflow UI at `/admin/models`; promote / archive via admin context menu (writes audit-logged)
+8. [x] **Model registry panel** — `/admin/models` links out to MLflow (`o.alogins.net/mlflow`); experiment tracking and dataset management in MLflow + Airflow
-9. [x] **Experiment dashboard** — LinUCB per-arm stats (pulls, reward mean, α), cohort compare, bandit reset control
+9. [x] **MLOps hub** — `/admin/experiments` links to MLflow experiments/models and Airflow DAGs/datasets; bandit reset on Users page
 10. [x] **Recommendation log (explainability)** — per served tip: `(user, features, policy, score, feedback, latency)`; `tip_scores` table, 30-day retention
 11. [x] **Reward analytics** — reaction distribution over time; per-policy compare; slice by `hour_of_day`, `priority`, cohort
 12. [x] **Data quality widget** — missing-feature rate, stale-token rate, daily completeness heatmap
@@ -142,28 +190,69 @@ oO is ML-heavy. Without a cockpit, every model change ships blind. This console
 - [ ] Apple OAuth (deferred to M2)
-### Phase 2 — Multi-source profile & trust  *(M2)*
+### Phase 2 — AI tips + multi-source signals  *(M2)*
-Goal: oO knows more than tasks, and users can see/control what we know.
+Goal: tips are AI-generated from user context, not just raw Todoist tasks. Multiple signal sources feed a generalized pipeline. Research-intensive milestone.
- [ ] Integrations: Google Calendar, Apple Health (web import), generic webhook ingress
+
- [ ] Unified `Profile` model (identity, preferences, contexts, consents)
+**AI infrastructure (unblock everything else):**
- [ ] Timing signals (Page Visibility, Idle Detection, coarse location) — opt-in, transparent
+- [ ] `ai` compose profile — Ollama + LiteLLM for local dev; env vars `OLLAMA_URL` / `LITELLM_URL` (#86)
- [ ] Advice library + mixing policy (todo vs advice vs ambient)
+- [ ] AI gateway — wire `ml/serving` to LiteLLM; model aliases `tip-generator` + `embedder` (#87)
- [ ] User-facing data dashboard: what's stored, what's computed, export, delete-by-category
+
- [ ] Cost/usage observability
+**AI tip generation pipeline:**
 - [ ] Context assembler — user signals + feature store → structured prompt context (`ml/features/context.py`) (#88)
 - [ ] Tip generator endpoint — `POST /generate` in `ml/serving`; LLM → N typed `TipCandidate` objects (#79)
 - [ ] `TipCandidate` shared schema — `{content, kind, source, model, prompt_version, confidence}`; update recommender pipeline (#89)
 - [ ] LLM output validation + retry — JSON schema gate, clarification retry (2×), fallback to task-based (#90)
 - [ ] Prompt versioning — `prompt_version` + `model` columns in `tip_scores`; content-hash invalidation (#91)
 - [ ] LLM tip quality dashboard — reaction breakdown by model / prompt_version in `/admin/reward-analytics` (#92)
 **Evaluation & model selection:**
 - [ ] Model benchmark — compare qwen2.5:7b / llama3.2:3b / gemma3:4b via offline sim + LLM judge (#93)
 - [ ] LLM prompt research — persona design, context injection strategies, few-shot examples (#84)
 **Pipeline architecture:**
 - [ ] Signal source abstraction — `SignalSource` interface generalizing beyond Todoist (#78)
 - [ ] Generalized recommendation pipeline — candidate → rank → render stages (#80)
 - [ ] Feature registry + user profile builder — centralized features, persistent profiles (#81)
 - [ ] Tip kind system — task, advice, insight, reminder with kind-aware UI + rewards (#82)
 **Policy research:**
 - [ ] Next-gen policies — Thompson sampling, neural bandits, hybrid transfer learning (#83)
 **Integrations & infra (carried from M1):**
 - [ ] Apple OAuth (#7)
 - [ ] NATS JetStream replacing in-process bus (#21)
 - [ ] Todoist sync via events (#22)
 - [ ] Event schema registry + protobuf CI gate (#54)
 - [ ] Per-user freshness SLAs for features (#61)
 - [ ] CI skeleton (#3), observability (#18), E2E tests (#20)
 **Bugs (fix before new features):**
 - [ ] TipFeedback type mismatch (#73)
 - [ ] Todoist token refresh (#74)
 - [ ] Reward fire-and-forget (#75)
 - [ ] Data retention purge (#76)
 - [ ] Port mismatch (#77)
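The output validation gate planned in #90 (schema check, up to 2 clarification retries, then fall back to task-based tips) could look roughly like this. It is a TypeScript sketch for illustration only, since the gate belongs in `ml/serving` (Python). `generate` stands in for the LiteLLM call, the hand-rolled `isTipCandidate` check replaces a real JSON Schema validator, the clarification wording is invented, and the `kind` values come from the tip kind system (#82).

```typescript
interface TipCandidate {
  content: string;
  kind: 'task' | 'advice' | 'insight' | 'reminder';
  source: string;
  model: string;
  prompt_version: string;
  confidence: number;
}

const KINDS = new Set(['task', 'advice', 'insight', 'reminder']);

// Minimal structural check standing in for a JSON Schema gate.
function isTipCandidate(v: unknown): v is TipCandidate {
  if (typeof v !== 'object' || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.content === 'string' && o.content.length > 0 &&
    typeof o.kind === 'string' && KINDS.has(o.kind) &&
    typeof o.source === 'string' &&
    typeof o.model === 'string' &&
    typeof o.prompt_version === 'string' &&
    typeof o.confidence === 'number' && o.confidence >= 0 && o.confidence <= 1
  );
}

// Returns validated candidates, or null if the raw LLM text fails the gate.
function parseCandidates(raw: string): TipCandidate[] | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed) || parsed.length === 0) return null;
    return parsed.every(isTipCandidate) ? parsed : null;
  } catch {
    return null; // not valid JSON at all
  }
}

// One initial attempt plus up to `maxRetries` clarification retries, then the fallback.
async function generateWithGate(
  generate: (clarification?: string) => Promise<string>,
  fallback: () => TipCandidate[],
  maxRetries = 2,
): Promise<TipCandidate[]> {
  let clarification: string | undefined;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const candidates = parseCandidates(await generate(clarification));
    if (candidates) return candidates;
    clarification = 'Respond with a JSON array of TipCandidate objects only.'; // hypothetical wording
  }
  return fallback();
}
```

Keeping the fallback as an injected function means the task-based path stays the guaranteed floor: a misbehaving model degrades tip quality, never availability.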
 ### Phase 3 — Native mobile  *(M3)*
 - [ ] iOS app (SwiftUI) with APNs push
 - [ ] Android app (Compose) with FCM push
 - [ ] `notifier` gains APNs + FCM channels, per-device rate limits
 - [ ] Migrate auth from Auth.js to dedicated OIDC provider (trigger from ADR-0004)
 - [ ] Consolidate MLflow + Airflow behind shared OIDC (SSO for all internal services)
 - [ ] Decide-and-deliver scheduler: per-user "is this tip worth interrupting now?" threshold
 ### Phase 4 — MLOps at scale  *(M4)*
- [ ] Prefect/Airflow for batch feature materialization + retraining
+- [x] Airflow + MLflow deployed as external services (`mlops` compose profile); each with its own auth
- [ ] MLflow registry; shadow → A/B → launch pipeline as first-class
+- [ ] Write first retraining DAG (Airflow) + first MLflow experiment logging from `ml/serving`
 - [ ] Feature-to-prompt pipeline — nightly Airflow DAG materializes context for LLM; cuts inline latency (#94)
 - [ ] Prompt optimization loop — sim A/B → MLflow experiment → human-approved promotion (#95)
 - [ ] LLM fine-tuning — tip reactions as training signal; LoRA on base model; MLflow tracks runs (#96)
 - [ ] Embedding-based task clustering — `nomic-embed-text` for dedup + user pattern features (#97)
 - [ ] Consolidate MLflow + Airflow auth into shared OIDC provider (tracked as M3 issue #85)
 - [ ] Shadow → A/B → launch pipeline as first-class in MLflow
 - [ ] Online experiments framework: deterministic assignment + bandit policies alongside fixed-split A/B
 - [ ] Cross-user collaborative features (opt-in only); cohort slicing; fairness checks
- [ ] Drift monitoring (feature drift, prediction drift, reward drift); model cards per version
+- [ ] Drift monitoring (feature + prediction + reward drift); model cards per LLM version
 ### Phase 5 — Production hardening  *(M5)*
 - [ ] Audit logging, rotation of provider tokens + internal signing keys
--- a/apps/admin/next.config.ts
+++ b/apps/admin/next.config.ts
@@ -1,6 +1,10 @@
 import type { NextConfig } from 'next';
 import path from 'node:path';
 const nextConfig: NextConfig = {
  output: 'standalone',
  outputFileTracingRoot: path.join(__dirname, '../../'),
  basePath: '/admin',
  async rewrites() {
    return [
      {
--- a/apps/admin/src/app/docs/[category]/[slug]/page.tsx
+++ b/apps/admin/src/app/docs/[category]/[slug]/page.tsx
@@ -17,14 +17,15 @@ function isDocCategory(value: string): value is DocCategory {
 export default async function DocDetailPage({
  params,
 }: {
-  params: { category: string; slug: string };
+  params: Promise<{ category: string; slug: string }>;
 }) {
-  if (!isDocCategory(params.category)) notFound();
+  const { category, slug } = await params;
  if (!isDocCategory(category)) notFound();
-  const doc = await getDoc(params.category, params.slug);
+  const doc = await getDoc(category, slug);
  if (!doc) notFound();
-  const categoryLabel = CATEGORY_LABELS[params.category];
+  const categoryLabel = CATEGORY_LABELS[category];
  return (
    <AdminShell>
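Next.js 15 passes dynamic route `params` to pages as a Promise, which is why `DocDetailPage` (and `UserDetailPage` below) now `await params` once and then use plain values. The pattern reduces to a framework-free sketch; the `PageProps` helper, the `renderTitle` function, and the category labels (other than `ops`, which the MLOps runbook link uses) are hypothetical.

```typescript
// Hypothetical helper mirroring the Next.js 15 App Router page-prop shape.
type PageProps<P extends Record<string, string>> = { params: Promise<P> };

const CATEGORY_LABELS = { ops: 'Operations', adr: 'Decisions' } as const; // assumed labels
type DocCategory = keyof typeof CATEGORY_LABELS;

// Own-property check, so inherited keys like "toString" are not accepted as categories.
function isDocCategory(value: string): value is DocCategory {
  return Object.prototype.hasOwnProperty.call(CATEGORY_LABELS, value);
}

// Await params once, destructure, then use the plain strings everywhere below.
async function renderTitle(
  { params }: PageProps<{ category: string; slug: string }>,
): Promise<string | null> {
  const { category, slug } = await params;
  if (!isDocCategory(category)) return null; // stands in for notFound()
  return `${CATEGORY_LABELS[category]} / ${slug}`;
}
```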
--- a/apps/admin/src/app/experiments/page.tsx
+++ b/apps/admin/src/app/experiments/page.tsx
@@ -1,124 +1,89 @@
 'use client';
 import { useEffect, useState } from 'react';
 import { AdminShell } from '@/components/AdminShell';
 import { resetBandit } from '@/lib/api';
-interface BanditStats {
+const mlflowUrl = process.env.NEXT_PUBLIC_MLFLOW_URL ?? '/mlflow';
-  user_id: string;
+const airflowUrl = process.env.NEXT_PUBLIC_AIRFLOW_URL ?? '/airflow';
  pulls: number;
  reward_count: number;
  cumulative_reward: number;
  estimated_mean_reward: number;
  theta: number[];
  last_updated: string | null;
 }
 const FEATURE_LABELS = ['hour_sin', 'hour_cos', 'is_overdue', 'task_age', 'priority'];
 export default function ExperimentsPage() {
  const [userId, setUserId] = useState('');
  const [stats, setStats] = useState<BanditStats | null>(null);
  const [loading, setLoading] = useState(false);
  const [resetting, setResetting] = useState(false);
  const [error, setError] = useState('');
  const [resetMsg, setResetMsg] = useState('');
  const fetchStats = async () => {
    if (!userId.trim()) return;
    setLoading(true);
    setError('');
    try {
      const res = await fetch(`/api/ml/stats/${encodeURIComponent(userId.trim())}`, { credentials: 'include' });
      if (!res.ok) throw new Error(res.statusText);
      setStats(await res.json());
    } catch (e: any) {
      setError(e.message);
    } finally {
      setLoading(false);
    }
  };
  const handleReset = async () => {
    if (!userId.trim()) return;
    if (!confirm(`Reset LinUCB state for user ${userId}?`)) return;
    setResetting(true);
    try {
      await resetBandit(userId.trim());
      setResetMsg('Bandit state reset.');
      setStats(null);
    } catch (e: any) {
      setError(e.message);
    } finally {
      setResetting(false);
    }
  };
  return (
    <AdminShell>
      <div className="space-y-6">
-        <h1 className="text-xl font-semibold">Experiment dashboard</h1>
+        <h1 className="text-xl font-semibold">MLOps</h1>
-        <p className="text-sm text-gray-500">LinUCB per-user bandit stats pulled from ml/serving.</p>
+        <p className="text-sm text-gray-500">
          Experiment tracking, dataset management, and pipeline orchestration live in dedicated external services.
          Each has its own auth — see{' '}
          <a href="/admin/docs/ops/mlops" className="text-indigo-400 hover:underline">MLOps runbook</a>
          {' '}for credentials and first-time setup.
        </p>
-        <div className="flex gap-2">
+        <section className="space-y-3">
-          <input
+          <h2 className="text-sm font-semibold text-gray-400 uppercase tracking-wider">Experiment tracking</h2>
-            value={userId}
+          <div className="grid gap-3 md:grid-cols-2">
-            onChange={(e) => setUserId(e.target.value)}
+            <ExternalCard
-            onKeyDown={(e) => e.key === 'Enter' && fetchStats()}
+              title="Experiments"
-            placeholder="User ID"
+              description="Training runs · metrics · parameter sweeps · run comparison"
-            className="bg-gray-900 border border-gray-700 rounded px-3 py-1.5 text-sm text-gray-300 w-80"
+              href={`${mlflowUrl}/#/experiments`}
              label="Open in MLflow ↗"
            />
            <ExternalCard
              title="Registered models"
              description="Model versions · stage promotion (Staging → Production) · artifact browser"
              href={`${mlflowUrl}/#/models`}
              label="Open in MLflow ↗"
            />
          <button onClick={fetchStats} className="bg-indigo-600 hover:bg-indigo-500 text-white rounded px-4 py-1.5 text-sm">
            Load
          </button>
          {stats && (
            <button onClick={handleReset} disabled={resetting} className="bg-red-800 hover:bg-red-700 text-white rounded px-4 py-1.5 text-sm disabled:opacity-50">
              Reset bandit
            </button>
          )}
          </div>
        </section>
-        {error && <p className="text-red-400 text-sm">{error}</p>}
+        <section className="space-y-3">
-        {resetMsg && <p className="text-green-400 text-sm">{resetMsg}</p>}
+          <h2 className="text-sm font-semibold text-gray-400 uppercase tracking-wider">Pipeline orchestration</h2>
-        {loading && <p className="text-gray-500 text-sm">Loading…</p>}
+          <div className="grid gap-3 md:grid-cols-2">
            <ExternalCard
              title="DAGs"
              description="Batch feature materialization · retraining pipelines · data quality jobs"
              href={`${airflowUrl}/dags`}
              label="Open in Airflow ↗"
            />
            <ExternalCard
              title="Dataset lineage"
              description="Pipeline runs · dataset inputs/outputs · data versioning"
              href={`${airflowUrl}/datasets`}
              label="Open in Airflow ↗"
            />
          </div>
        </section>
-        {stats && (
+        <section className="space-y-2 pt-2 border-t border-gray-800">
-          <div className="grid grid-cols-2 gap-4 md:grid-cols-4">
+          <h2 className="text-sm font-semibold text-gray-400 uppercase tracking-wider">Bandit state ops</h2>
-            <StatCard label="Pulls" value={stats.pulls} />
+          <p className="text-xs text-gray-500">
-            <StatCard label="Reward samples" value={stats.reward_count} />
+            Per-user LinUCB reset is available on the{' '}
-            <StatCard label="Cumulative reward" value={stats.cumulative_reward.toFixed(2)} />
+            <a href="/admin/users" className="text-indigo-400 hover:underline">Users page</a>
-            <StatCard label="Mean reward" value={stats.estimated_mean_reward.toFixed(3)} />
+            {' '}→ user detail view.
-          </div>
+          </p>
-        )}
+        </section>
        {stats?.theta && (
          <div className="space-y-2">
            <h2 className="text-sm font-medium text-gray-400">θ (learned weight vector)</h2>
            <div className="flex gap-3 flex-wrap">
              {stats.theta.map((v, i) => (
                <div key={i} className="bg-gray-900 border border-gray-800 rounded p-3 text-center min-w-[100px]">
                  <div className="text-xs text-gray-500 mb-1">{FEATURE_LABELS[i] ?? `feat_${i}`}</div>
                  <div className={`text-sm font-mono ${v > 0 ? 'text-green-400' : v < 0 ? 'text-red-400' : 'text-gray-400'}`}>
                    {v.toFixed(4)}
                  </div>
                </div>
              ))}
            </div>
            {stats.last_updated && (
              <p className="text-xs text-gray-600">Last updated: {stats.last_updated}</p>
            )}
          </div>
        )}
      </div>
    </AdminShell>
  );
 }
-function StatCard({ label, value }: { label: string; value: string | number }) {
+function ExternalCard({ title, description, href, label }: {
  title: string;
  description: string;
  href: string;
  label: string;
 }) {
  return (
-    <div className="bg-gray-900 border border-gray-800 rounded p-4">
+    <div className="bg-gray-900 border border-gray-800 rounded-lg p-5 flex items-start justify-between gap-4">
-      <div className="text-xs text-gray-500 mb-1">{label}</div>
+      <div className="space-y-1">
-      <div className="text-2xl font-semibold text-white">{value}</div>
+        <h2 className="text-sm font-medium text-gray-200">{title}</h2>
        <p className="text-xs text-gray-500">{description}</p>
      </div>
      <a
        href={href}
        target="_blank"
        rel="noreferrer"
        className="flex-shrink-0 text-indigo-400 hover:text-indigo-300 text-xs whitespace-nowrap"
      >
        {label}
      </a>
    </div>
  );
 }
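The MLOps hub builds links like `` `${mlflowUrl}/#/experiments` `` by plain concatenation, so a `NEXT_PUBLIC_MLFLOW_URL` set with a trailing slash would produce `/mlflow//#/experiments`, and `??` would keep an empty string instead of using the `/mlflow` default. A small sketch of a safer join; `externalLink` and `resolveBase` are hypothetical helpers, and the defaults mirror the path-based fallbacks in the diff.

```typescript
// Join a base URL (absolute or path-only) with a path, avoiding duplicate slashes.
function externalLink(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, '');
  const trimmedPath = path.replace(/^\/+/, '');
  return trimmedPath ? `${trimmedBase}/${trimmedPath}` : trimmedBase || '/';
}

// Use the env value when it is non-blank, otherwise the path-based default.
function resolveBase(envValue: string | undefined, fallback: string): string {
  return envValue && envValue.trim() ? envValue.trim() : fallback;
}

// Example wiring, assuming the same env vars and defaults as the page.
const mlflowBase = resolveBase(process.env.NEXT_PUBLIC_MLFLOW_URL, '/mlflow');
const experimentsHref = externalLink(mlflowBase, '/#/experiments');
```

Because `NEXT_PUBLIC_*` values are inlined at build time, the values that matter here are the `Dockerfile.admin` build args, not the container's runtime environment.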
--- a/apps/admin/src/app/login/page.tsx
+++ b/apps/admin/src/app/login/page.tsx
@@ -5,7 +5,7 @@ export default function LoginPage() {
        <h1 className="text-2xl font-semibold">oO Admin</h1>
        <p className="text-gray-400 text-sm">Sign in via the main app first, then return here.</p>
        <a
-          href={`${process.env.NEXT_PUBLIC_WEB_URL ?? 'http://localhost:3079'}/sign-in`}
+          href="/sign-in"
          className="inline-block px-4 py-2 bg-white text-black rounded text-sm font-medium hover:bg-gray-200 transition-colors"
        >
          Sign in with Google
--- a/apps/admin/src/app/models/page.tsx
+++ b/apps/admin/src/app/models/page.tsx
@@ -1,30 +1,53 @@
 import { AdminShell } from '@/components/AdminShell';
-export default function ModelsPage() {
+const mlflowUrl = process.env.NEXT_PUBLIC_MLFLOW_URL ?? '/mlflow';
  const mlflowUrl = process.env.NEXT_PUBLIC_MLFLOW_URL ?? 'http://localhost:5000';
 export default function ModelsPage() {
  return (
    <AdminShell>
-      <div className="space-y-4 h-[calc(100vh-4rem)]">
+      <div className="space-y-6">
        <div className="flex items-center justify-between flex-shrink-0">
        <h1 className="text-xl font-semibold">Model registry</h1>
-          <a href={mlflowUrl} target="_blank" rel="noreferrer" className="text-xs text-gray-400 hover:text-white border border-gray-700 rounded px-2 py-1">
+        <p className="text-sm text-gray-500">
-            Open MLflow ↗
+          Model lifecycle (runs, versions, promotions, artifacts) is managed in MLflow.
-          </a>
+          Auth is separate — log in with your MLflow credentials.
        </div>
        <p className="text-sm text-gray-500 flex-shrink-0">
          MLflow is embedded below when running under the <code className="text-xs bg-gray-800 px-1 rounded">full</code> compose profile.
          Promote or archive model versions via the MLflow UI; each action writes to the audit log automatically.
        </p>
-        <div className="flex-1 rounded border border-gray-800 overflow-hidden" style={{ height: 'calc(100vh - 12rem)' }}>
+        <ExternalCard
          <iframe
            src={`${mlflowUrl}/#/models`}
            className="w-full h-full bg-white"
          title="MLflow Model Registry"
-            sandbox="allow-scripts allow-same-origin allow-forms allow-popups"
+          description="Experiment runs · registered models · version promotion · artifact browser"
          href={mlflowUrl}
          label="Open MLflow"
        />
        <ExternalCard
          title="MLflow Experiments"
          description="Training runs, metrics, parameters, and comparison across runs"
          href={`${mlflowUrl}/#/experiments`}
          label="Browse experiments"
        />
        </div>
      </div>
    </AdminShell>
  );
 }
 function ExternalCard({ title, description, href, label }: {
  title: string;
  description: string;
  href: string;
  label: string;
 }) {
  return (
    <div className="bg-gray-900 border border-gray-800 rounded-lg p-5 flex items-start justify-between gap-4">
      <div className="space-y-1">
        <h2 className="text-sm font-medium text-gray-200">{title}</h2>
        <p className="text-xs text-gray-500">{description}</p>
      </div>
      <a
        href={href}
        target="_blank"
        rel="noreferrer"
        className="flex-shrink-0 bg-indigo-600 hover:bg-indigo-500 text-white text-xs rounded px-3 py-1.5 whitespace-nowrap"
      >
        {label} ↗
      </a>
    </div>
  );
 }
--- a/apps/admin/src/app/users/[id]/page.tsx
+++ b/apps/admin/src/app/users/[id]/page.tsx
@@ -3,10 +3,15 @@ import { UserDetail } from '@/components/UserDetail';
 export const dynamic = 'force-dynamic';
-export default function UserDetailPage({ params }: { params: { id: string } }) {
+export default async function UserDetailPage({
  params,
 }: {
  params: Promise<{ id: string }>;
 }) {
  const { id } = await params;
  return (
    <AdminShell>
-      <UserDetail userId={params.id} />
+      <UserDetail userId={id} />
    </AdminShell>
  );
 }
--- a/apps/admin/src/components/AdminShell.tsx
+++ b/apps/admin/src/components/AdminShell.tsx
@@ -3,14 +3,21 @@
 import Link from 'next/link';
 import { usePathname } from 'next/navigation';
-const NAV = [
+const mlflowUrl = process.env.NEXT_PUBLIC_MLFLOW_URL ?? '/mlflow';
 const airflowUrl = process.env.NEXT_PUBLIC_AIRFLOW_URL ?? '/airflow';
 type NavItem =
  | { href: string; label: string; external?: false }
  | { href: string; label: string; external: true };
 const NAV: NavItem[] = [
  { href: '/', label: 'Overview' },
  { href: '/users', label: 'Users' },
  { href: '/events', label: 'Events' },
  { href: '/features', label: 'Features' },
  { href: '/tips', label: 'Rec log' },
  { href: '/reward-analytics', label: 'Rewards' },
-  { href: '/experiments', label: 'Experiments' },
+  { href: '/experiments', label: 'MLOps' },
  { href: '/simulations', label: 'Simulations' },
  { href: '/models', label: 'Models' },
  { href: '/data-quality', label: 'Data quality' },
@@ -21,6 +28,11 @@ const NAV = [
  { href: '/docs', label: 'Docs' },
 ];
 const NAV_EXTERNAL: NavItem[] = [
  { href: mlflowUrl, label: 'MLflow ↗', external: true },
  { href: airflowUrl, label: 'Airflow ↗', external: true },
 ];
 export function AdminShell({ children }: { children: React.ReactNode }) {
  const pathname = usePathname();
  return (
@@ -33,7 +45,7 @@ export function AdminShell({ children }: { children: React.ReactNode }) {
            Admin
          </span>
        </div>
-        <nav className="flex-1 px-2 py-3 space-y-0.5">
+        <nav className="flex-1 px-2 py-3 space-y-0.5 overflow-y-auto">
          {NAV.map(({ href, label }) => {
            const active = href === '/' ? pathname === '/' : pathname.startsWith(href);
            return (
@@ -50,6 +62,20 @@ export function AdminShell({ children }: { children: React.ReactNode }) {
              </Link>
            );
          })}
          <div className="pt-3 pb-1 px-3">
            <span className="text-xs text-gray-600 uppercase tracking-wider font-medium">MLOps</span>
          </div>
          {NAV_EXTERNAL.map(({ href, label }) => (
            <a
              key={href}
              href={href}
              target="_blank"
              rel="noreferrer"
              className="flex items-center px-3 py-2 rounded text-sm text-gray-500 hover:text-white hover:bg-gray-900 transition-colors"
            >
              {label}
            </a>
          ))}
        </nav>
      </aside>
      {/* Main content */}
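The sidebar marks a link active when `pathname.startsWith(href)`. That also matches any route that merely shares a prefix: a hypothetical `/users-archive` page would highlight `Users`. Below is a segment-aware check that needs only the pathname string. `isNavActive` is a hypothetical helper and is not part of the admin app.

```typescript
// Hypothetical helper: is the sidebar link `href` active for the current `pathname`?
// Unlike a bare startsWith(), it only matches whole path segments.
function isNavActive(href: string, pathname: string): boolean {
  // The Overview link ('/') is active only on the exact root path.
  if (href === '/') return pathname === '/';
  // Tolerate a trailing slash on the link target, e.g. '/users/' -> '/users'.
  const base = href.endsWith('/') ? href.slice(0, -1) : href;
  // Active on the route itself or on any route nested below it.
  return pathname === base || pathname.startsWith(`${base}/`);
}
```

In `AdminShell`, this would replace the inline ternary inside `NAV.map`.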
--- a/apps/admin/src/middleware.ts
+++ b/apps/admin/src/middleware.ts
@@ -16,9 +16,13 @@ export async function middleware(req: NextRequest) {
    return NextResponse.redirect(url);
  }
-  // Verify admin role via API. The API is same-origin in production (Caddy routes
+  // Verify admin role via API. INTERNAL_API_URL (e.g. http://api:3078) is preferred
-  // /api/* to the Express service), so we use the rewrite target in dev.
+  // when set — it points to the API service on the internal Docker network, avoiding
-  const apiBase = process.env.NEXT_PUBLIC_API_URL ?? 'http://localhost:3078';
+  // a Caddy round-trip. Falls back to NEXT_PUBLIC_API_URL for dev, or localhost.
  const apiBase =
    process.env.INTERNAL_API_URL ||
    process.env.NEXT_PUBLIC_API_URL ||
    'http://localhost:3078';
  try {
    const profile = await fetch(`${apiBase}/api/user/me`, {
      headers: { cookie: `sid=${sid}` },
@@ -41,5 +45,5 @@ export async function middleware(req: NextRequest) {
 }
 export const config = {
-  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
+  matcher: ['/', '/((?!_next/static|_next/image|favicon.ico).*)'],
 };
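The switch from `??` to `||` matters because `docker-compose.yml` sets `NEXT_PUBLIC_API_URL: ""` for the admin container. With `??`, the empty string would be kept, and the profile check would become a relative `fetch('/api/user/me')` that a server-side `fetch` generally cannot resolve. With `||`, the empty string is skipped. The sketch below shows the fallback chain; `resolveApiBase` is a hypothetical name.

```typescript
// Hypothetical helper mirroring the middleware's fallback chain.
type Env = Record<string, string | undefined>;

function resolveApiBase(env: Env): string {
  // `||` treats '' like "unset", so NEXT_PUBLIC_API_URL: "" from docker-compose.yml
  // falls through to the next candidate.
  return env['INTERNAL_API_URL'] || env['NEXT_PUBLIC_API_URL'] || 'http://localhost:3078';
}

// The previous `??` chain only skips null/undefined, so it keeps the empty string.
function resolveApiBaseNullish(env: Env): string {
  return env['NEXT_PUBLIC_API_URL'] ?? 'http://localhost:3078';
}
```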
--- a/apps/web/next.config.ts
+++ b/apps/web/next.config.ts
@@ -1,6 +1,9 @@
 import type { NextConfig } from 'next';
 import path from 'node:path';
 const nextConfig: NextConfig = {
  output: 'standalone',
  outputFileTracingRoot: path.join(__dirname, '../../'),
  async rewrites() {
    return [
      {
--- a/apps/web/tsconfig.tsbuildinfo
+++ b/apps/web/tsconfig.tsbuildinfo
--- a/docs/adr/0006-admin-console-framework.md
+++ b/docs/adr/0006-admin-console-framework.md
@@ -28,15 +28,16 @@ Same stack as `apps/web`. Reuses `packages/shared-types`, the Auth.js session co
 | Heavy grids | **TanStack Table v8** | Sortable / paginated / virtualized tables for events, users, tips. |
 | Extra charts | **Recharts** | Fallback where Tremor falls short (histograms, distributions). |
-### Embed, don't rebuild
+### Link out, don't embed
-Specialized tooling is **reverse-proxied into the admin shell**, not reimplemented:
+Specialized MLOps tooling runs as **separate external services** with their own auth, linked from the admin shell — not embedded or reimplemented:
-- **MLflow UI** → `/admin/models` (Caddy sub-path proxy)
+- **MLflow** → `https://o.alogins.net/mlflow` — experiment tracking, model registry, artifact browser; own basic-auth for now; see M3 for SSO consolidation
-- **Grafana panels** → `/admin/infra` (iframed or embedded panels)
+- **Airflow** → `https://o.alogins.net/airflow` — batch pipeline orchestration, dataset management; own web-auth for now
 - **Grafana panels** → `/admin/infra` (iframed panels) — infra metrics
 - **Marimo notebooks** → launch-out link from admin
-This prevents reimplementing artifact browsers or graph renderers we'd never do as well.
+The admin shell links to these services, and each link opens in a new tab. The `/experiments` and `/models` admin pages are hubs with direct links to the relevant MLflow and Airflow views.
 ### AuthZ
@@ -55,5 +56,7 @@ This prevents reimplementing artifact browsers or graph renderers we'd never do
 - One more Next.js app in the monorepo. Build/dev added to Turborepo.
 - Tremor + shadcn/ui are added as dependencies. shadcn components are copied into `apps/admin/src/components/ui/` — no runtime version coupling.
-- MLflow and Grafana must be reachable from the Caddy reverse proxy; they are not embedded in the JS bundle.
+- MLflow (`o.alogins.net/mlflow*` → port 5000) and Airflow (`o.alogins.net/airflow*` → port 8080) are path-based routes in the existing `o.alogins.net` Caddy block, started via `docker compose --profile mlops up`.
+- Each service manages its own auth (MLflow: built-in basic-auth; Airflow: built-in web UI auth). M3 will consolidate both behind the shared OIDC provider.
+- The `NEXT_PUBLIC_MLFLOW_URL` and `NEXT_PUBLIC_AIRFLOW_URL` build args in `Dockerfile.admin` default to the root-relative paths `/mlflow` and `/airflow`. These resolve against the admin console's own origin, which gives the production routes when the console is served from `o.alogins.net`. Next.js inlines `NEXT_PUBLIC_*` values at build time, so dev builds must override them with `--build-arg`.
 - `admin_actions` audit log grows unboundedly — needs a retention policy before M4.
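On the hub pages, the base URLs come from `NEXT_PUBLIC_MLFLOW_URL` and `NEXT_PUBLIC_AIRFLOW_URL`. Whether they end in a slash depends on how a build overrides them. A small joining helper keeps deep links well-formed in either case. `serviceLink` is hypothetical. The deep-link paths used in the tests are assumptions about those UIs, not routes taken from this change: `#/models` for MLflow's hash-routed UI and `datasets` for Airflow.

```typescript
// Hypothetical helper: join a service base URL (e.g. the value of NEXT_PUBLIC_MLFLOW_URL)
// with a deep-link path, whether or not either side carries a slash at the join.
function serviceLink(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ''); // drop trailing slashes
  const trimmedPath = path.replace(/^\/+/, ''); // drop leading slashes
  return trimmedPath ? `${trimmedBase}/${trimmedPath}` : trimmedBase;
}
```

For example, `serviceLink('/mlflow', '#/models')` returns `'/mlflow/#/models'`.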
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -46,21 +46,42 @@ User reactions (done / snooze / dismiss) are events too. They close the loop as
 - **Protobuf** for event schemas with a schema registry (ADR-0005) — train/serve parity depends on this.
 - **OpenAPI** for HTTP; TS client auto-generated; Python pydantic hand-written while consumers are few.
 - **Feast** for feature store when we get there; homegrown adapter until then (Phase 1 seam).
- **MLflow** for model registry; artifacts in MinIO/S3.
+- **MLflow** for model registry and experiment tracking; deployed at `o.alogins.net/mlflow`.
 - **Airflow** for batch pipelines; deployed at `o.alogins.net/airflow`.
 - **Auth.js** embedded behind an OIDC-shaped boundary (ADR-0004). Swap to a standalone OIDC provider when mobile ships.
 - **k3s** as the first step beyond docker-compose — no "compose → full k8s" cliff.
-## Decision flow for a new tip
+## AI stack
 All LLM inference routes through **LiteLLM** (`llm.alogins.net`) backed by **Ollama** (local, `localhost:11434`). This means:
 - Model aliases (`tip-generator`, `embedder`, `judge`) decouple code from model names.
- Swapping qwen2.5 for llama3.2 is a one-line change in the LiteLLM config, with no code change in oO.
 - Cloud fallback (Anthropic) is opt-in and gated behind `ANTHROPIC_API_KEY` — used only in offline simulation.
 **OpenWebUI** (`ai.alogins.net`) is the human-facing interface for prompt iteration and model testing during development.
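Code refers only to aliases, so a caller needs nothing model-specific. LiteLLM's proxy speaks the OpenAI chat-completions protocol, which makes a request to `tip-generator` an ordinary `POST /v1/chat/completions`. The sketch below only builds the request. The system prompt and the bearer-key handling are assumptions. It is written in TypeScript to match the rest of this change, although the Phase 2 caller is the Python `ml/serving` service.

```typescript
// Sketch: build an OpenAI-compatible chat request for the LiteLLM proxy.
// Only the alias appears here; LiteLLM's config maps it to a concrete Ollama model.
type ChatMessage = { role: 'system' | 'user'; content: string };

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

function buildTipRequest(baseUrl: string, apiKey: string, prompt: string): ChatRequest {
  const messages: ChatMessage[] = [
    // Hypothetical system prompt; the real one would be versioned (prompt_version).
    { role: 'system', content: 'You write one short, actionable productivity tip.' },
    { role: 'user', content: prompt },
  ];
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ model: 'tip-generator', messages }),
    },
  };
}
```

A caller would then run `fetch(req.url, req.init)` and read `choices[0].message.content` from the JSON response.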
 ## Decision flow for a new tip (Phase 2 target)
 ```
-client ─► gateway ─► recommender
+client ─► gateway ─► recommender (TS)
                          │
-                       ├─► candidates:   integrations.fetchCandidates(user)  + advice.library
+                          ▼
-                       ├─► context:      FeatureAssembler(user, request)
+                     ml/serving (Python)
-                       ├─► policy:       PolicyRegistry.get(policyName).pick(candidates, context)
+                          │
-                       ├─► shadows:      run shadow policies in parallel, log their picks
+                          ├─► context:    ml/features/context.py
-                       └─► persist:      TipInstance{context_snapshot, policy, tip}
+                          │               (tasks + reactions + time patterns → prompt)
-                       ◄─  tip
+                          │
                          ├─► generate:   LiteLLM → Ollama
                          │               → N TipCandidates {content, kind, model, prompt_version}
                          │
                          ├─► score:      bandit policy scores each candidate
                          │
                          ├─► shadows:    shadow policies log picks without serving
                          │
                          └─► persist:    tip_scores {candidate, policy, features, latency}
                          ◄─  best TipCandidate
 ```
-Feedback travels back the same path: `POST /feedback → events.emit(feedback.reaction)` → pipelines consume → bandit/model updated on next retrain.
+**Phase 1 (current):** candidates come from the user's Todoist task list, and there is no LLM step; the bandit scores tasks directly.
 Feedback: `POST /feedback → events.emit(reaction)` → online bandit update + `prompt_version` tracked for A/B analysis.
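The `score` step leaves the selection rule to the policy. Assume the active `egreedy` policy (ADR-0007) does plain epsilon-greedy over per-candidate scores. Selection then reduces to the sketch below. `pickEpsilonGreedy` is hypothetical, the score function stands in for the bandit's estimates, and `rng` is injectable so the choice can be tested.

```typescript
// Sketch of epsilon-greedy selection over generated candidates.
interface TipCandidate {
  content: string;
  kind: string;
  model: string;
  prompt_version: string;
}

function pickEpsilonGreedy<T>(
  candidates: readonly T[],
  score: (c: T) => number,
  epsilon: number,
  rng: () => number = Math.random,
): T {
  if (candidates.length === 0) throw new Error('no candidates to pick from');
  // Explore: with probability epsilon, serve a uniformly random candidate.
  if (rng() < epsilon) {
    return candidates[Math.floor(rng() * candidates.length)] as T;
  }
  // Exploit: otherwise serve the highest-scoring candidate (the first one wins ties).
  let best = candidates[0] as T;
  let bestScore = score(best);
  for (let i = 1; i < candidates.length; i++) {
    const c = candidates[i] as T;
    const s = score(c);
    if (s > bestScore) {
      best = c;
      bestScore = s;
    }
  }
  return best;
}
```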
--- a/infra/docker/Dockerfile.admin
+++ b/infra/docker/Dockerfile.admin
@@ -0,0 +1,32 @@
 FROM node:22-alpine AS base
 RUN npm install -g pnpm
 FROM base AS deps
 WORKDIR /app
 COPY package.json pnpm-workspace.yaml pnpm-lock.yaml* ./
 COPY packages/shared-types/package.json ./packages/shared-types/
 COPY apps/admin/package.json ./apps/admin/
 RUN pnpm install --frozen-lockfile
 FROM base AS builder
 WORKDIR /app
 COPY --from=deps /app/node_modules ./node_modules
 COPY --from=deps /app/packages/shared-types/node_modules ./packages/shared-types/node_modules
 COPY --from=deps /app/apps/admin/node_modules ./apps/admin/node_modules
 COPY tsconfig.base.json ./
 COPY packages/shared-types ./packages/shared-types
 COPY apps/admin ./apps/admin
 RUN pnpm --filter @oo/shared-types build
 ARG NEXT_PUBLIC_MLFLOW_URL=/mlflow
 ARG NEXT_PUBLIC_AIRFLOW_URL=/airflow
 ENV NEXT_TELEMETRY_DISABLED=1 \
    NEXT_PUBLIC_MLFLOW_URL=$NEXT_PUBLIC_MLFLOW_URL \
    NEXT_PUBLIC_AIRFLOW_URL=$NEXT_PUBLIC_AIRFLOW_URL
 RUN pnpm --filter @oo/admin build
 FROM node:22-alpine AS runner
 ENV NODE_ENV=production NEXT_TELEMETRY_DISABLED=1 PORT=3080
 WORKDIR /app
 COPY --from=builder /app/apps/admin/.next/standalone ./
 COPY --from=builder /app/apps/admin/.next/static ./apps/admin/.next/static
 CMD ["node", "apps/admin/server.js"]
--- a/infra/docker/Dockerfile.api
+++ b/infra/docker/Dockerfile.api
@@ -22,7 +22,7 @@ RUN pnpm --filter @oo/api build
 FROM node:22-alpine AS runner
 WORKDIR /app
 RUN npm install -g pnpm
-COPY package.json pnpm-workspace.yaml ./
+COPY package.json pnpm-workspace.yaml pnpm-lock.yaml* ./
 COPY packages/shared-types/package.json ./packages/shared-types/
 COPY services/api/package.json ./services/api/
 RUN pnpm install --prod --frozen-lockfile
--- a/infra/docker/docker-compose.yml
+++ b/infra/docker/docker-compose.yml
@@ -10,15 +10,13 @@ services:
    profiles: [core, full]
    env_file: ../../.env.local
    environment:
-      DATABASE_PATH: /data/oo.db
+      DATABASE_PATH: /mnt/ssd/dbs/oo/oo.db
-      PORT: "3001"
+      PORT: "3078"
      NODE_ENV: production
    volumes:
-      - api-data:/data
+      - /mnt/ssd/dbs/oo:/mnt/ssd/dbs/oo
    ports:
-      - "3001:3001"
+      - "127.0.0.1:3078:3078"
    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:3001/health"]
+      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3078/health"]
      interval: 10s
      timeout: 5s
      retries: 5
@@ -30,9 +28,30 @@ services:
    profiles: [core, full]
    env_file: ../../.env.local
    environment:
-      NEXT_PUBLIC_API_URL: ""   # rewrites proxy to /api, no cross-origin needed in prod
+      NODE_ENV: production
      PORT: "3079"
      HOSTNAME: "0.0.0.0"
      NEXT_PUBLIC_API_URL: ""   # Caddy routes /api/* directly to the API in prod
    ports:
-      - "3000:3000"
+      - "127.0.0.1:3079:3079"
    depends_on:
      api:
        condition: service_healthy
  admin:
    build:
      context: ../..
      dockerfile: infra/docker/Dockerfile.admin
    profiles: [core, full]
    env_file: ../../.env.local
    environment:
      NODE_ENV: production
      PORT: "3080"
      HOSTNAME: "0.0.0.0"
      NEXT_PUBLIC_API_URL: ""
      INTERNAL_API_URL: "http://api:3078"
    ports:
      - "127.0.0.1:3080:3080"
    depends_on:
      api:
        condition: service_healthy
@@ -45,12 +64,117 @@ services:
      dockerfile: infra/docker/Dockerfile.ml
    profiles: [full]
    ports:
-      - "8000:8000"
+      - "127.0.0.1:8000:8000"
    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
+      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 5
-volumes:
+  # ── mlops profile — MLflow + Airflow ──────────────────────────────────────
-  api-data:
+  # Start: docker compose --profile mlops up
  # MLflow UI:  http://localhost:5000/mlflow  or https://o.alogins.net/mlflow  (admin / password; change via basic_auth.ini)
  # Airflow UI: http://localhost:8080/airflow  or https://o.alogins.net/airflow  (admin / AIRFLOW_ADMIN_PASSWORD)
  # Caddy routes /mlflow* and /airflow* inside the o.alogins.net block
  airflow-db:
    image: postgres:16-alpine
    profiles: [mlops]
    environment:
      POSTGRES_DB: airflow
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: ${AIRFLOW_DB_PASSWORD:-airflow}
    volumes:
      - /mnt/ssd/dbs/oo/airflow-db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U airflow"]
      interval: 10s
      timeout: 5s
      retries: 5
  airflow-init:
    image: apache/airflow:2.9.3
    profiles: [mlops]
    entrypoint: /bin/bash
    command:
      - -c
      - |
        airflow db migrate
        airflow users create \
          --username admin \
          --firstname Admin \
          --lastname User \
          --role Admin \
          --email admin@oo.local \
          --password "$${AIRFLOW_ADMIN_PASSWORD:-admin}"
    environment:
      AIRFLOW_ADMIN_PASSWORD: ${AIRFLOW_ADMIN_PASSWORD:-admin}
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${AIRFLOW_DB_PASSWORD:-airflow}@airflow-db/airflow
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__WEBSERVER__SECRET_KEY: ${AIRFLOW_SECRET_KEY:-change-me-in-prod}
      AIRFLOW__WEBSERVER__BASE_URL: ${AIRFLOW_BASE_URL:-https://o.alogins.net/airflow}
    depends_on:
      airflow-db:
        condition: service_healthy
    restart: "no"
  airflow-webserver:
    image: apache/airflow:2.9.3
    profiles: [mlops]
    command: webserver
    environment:
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${AIRFLOW_DB_PASSWORD:-airflow}@airflow-db/airflow
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__WEBSERVER__SECRET_KEY: ${AIRFLOW_SECRET_KEY:-change-me-in-prod}
      AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY:-}
      AIRFLOW__WEBSERVER__BASE_URL: ${AIRFLOW_BASE_URL:-https://o.alogins.net/airflow}
    volumes:
      - ../../ml/pipelines:/opt/airflow/dags:ro
    ports:
      - "127.0.0.1:8080:8080"
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
  airflow-scheduler:
    image: apache/airflow:2.9.3
    profiles: [mlops]
    command: scheduler
    environment:
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${AIRFLOW_DB_PASSWORD:-airflow}@airflow-db/airflow
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__WEBSERVER__SECRET_KEY: ${AIRFLOW_SECRET_KEY:-change-me-in-prod}
      AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY:-}
    volumes:
      - ../../ml/pipelines:/opt/airflow/dags:ro
    depends_on:
      airflow-init:
        condition: service_completed_successfully
  mlflow:
    image: ghcr.io/mlflow/mlflow:2.14.3
    profiles: [mlops]
    command: >
      mlflow server
      --backend-store-uri sqlite:////mlflow/mlflow.db
      --default-artifact-root /mlflow/artifacts
      --host 0.0.0.0
      --port 5000
      --app-name basic-auth
      --static-prefix /mlflow
    environment:
      MLFLOW_AUTH_CONFIG_PATH: /mlflow/basic_auth.ini
    volumes:
      - /mnt/ssd/dbs/oo/mlflow:/mlflow
      - ../../infra/mlflow/basic_auth.ini:/mlflow/basic_auth.ini:ro
    ports:
      - "127.0.0.1:5000:5000"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5000/health"]
      interval: 10s
      timeout: 5s
      retries: 5
--- a/infra/mlflow/basic_auth.ini
+++ b/infra/mlflow/basic_auth.ini
@@ -0,0 +1,6 @@
 [mlflow]
 default_permission = NO_PERMISSIONS
 database_uri = sqlite:////mlflow/basic_auth.db
 admin_username = admin
# Change this before the first start: the admin user is created from these values only if it does not exist yet. The admin can reset other users' passwords via the MLflow UI.
 admin_password = password
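When the server runs with `--app-name basic-auth`, every client call must carry HTTP Basic credentials for a user in this auth database. Below is a sketch of a Node client request. The helper names are hypothetical. The base URL is the container port because this change does not confirm whether `--static-prefix` also prefixes the `/api/2.0` REST routes.

```typescript
// Sketch: build an authenticated MLflow REST request when the server runs with
// --app-name basic-auth. Helper names are hypothetical; uses Node's Buffer.
function basicAuthHeader(username: string, password: string): string {
  // HTTP Basic auth: "Basic " + base64("username:password").
  return `Basic ${Buffer.from(`${username}:${password}`, 'utf8').toString('base64')}`;
}

function searchExperimentsRequest(baseUrl: string, username: string, password: string) {
  return {
    // Assumption: baseUrl reaches the REST API directly, e.g. http://localhost:5000.
    url: `${baseUrl.replace(/\/+$/, '')}/api/2.0/mlflow/experiments/search`,
    init: {
      method: 'POST',
      headers: {
        Authorization: basicAuthHeader(username, password),
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ max_results: 20 }),
    },
  };
}
```

A caller would pass the result to `fetch(req.url, req.init)`.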
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -45,9 +45,6 @@ importers:
        specifier: ^2.15.3
        version: 2.15.4(react-dom@19.2.5(react@19.2.5))(react@19.2.5)
    devDependencies:
      '@types/marked':
        specifier: ^6.0.0
        version: 6.0.0
      '@types/node':
        specifier: ^22.10.5
        version: 22.19.17
@@ -1335,10 +1332,6 @@ packages:
  '@types/http-errors@2.0.5':
    resolution: {integrity: sha512-r8Tayk8HJnX0FztbZN7oVqGccWgw98T/0neJphO91KkmOzug1KkofZURD4UaD5uH8AqcFLfdPErnBod0u71/qg==}
  '@types/marked@6.0.0':
    resolution: {integrity: sha512-jmjpa4BwUsmhxcfsgUit/7A9KbrC48Q0q8KvnY107ogcjGgTFDlIL3RpihNpx2Mu1hM4mdFQjoVc4O6JoGKHsA==}
    deprecated: This is a stub types definition. marked provides its own type definitions, so you do not need this installed.
  '@types/node@22.19.17':
    resolution: {integrity: sha512-wGdMcf+vPYM6jikpS/qhg6WiqSV/OhG+jeeHT/KlVqxYfD40iYJf9/AE1uQxVWFvU7MipKRkRv8NSHiCGgPr8Q==}
@@ -3817,10 +3810,6 @@ snapshots:
  '@types/http-errors@2.0.5': {}
  '@types/marked@6.0.0':
    dependencies:
      marked: 14.1.4
  '@types/node@22.19.17':
    dependencies:
      undici-types: 6.21.0
--- a/services/api/src/config.ts
+++ b/services/api/src/config.ts
@@ -14,7 +14,7 @@ function optional(name: string, fallback: string): string {
 }
 export const config = {
  PORT: parseInt(optional('PORT', '3078'), 10),
  NODE_ENV: optional('NODE_ENV', 'development'),
  DATABASE_PATH: optional('DATABASE_PATH', './data/oo.db'),
--- a/services/api/src/events/bus.ts
+++ b/services/api/src/events/bus.ts
@@ -22,12 +22,27 @@ export type TipServedEvent = {
 export type TipFeedbackEvent = {
  userId: string;
  tipId: string;
-  action: 'done' | 'dismiss' | 'snooze';
+  action: 'done' | 'dismiss' | 'snooze' | 'helpful' | 'not_helpful';
  reward: number;   // inferred from action + dwellMs (see inferReward in recommender.ts)
  dwellMs: number | null;
  createdAt: string;
 };
 export type IntegrationTokenExpiredEvent = {
  userId: string;
  provider: string;
  detectedAt: string;
 };
 export type RewardDeliveryFailedEvent = {
  userId: string;
  tipId: string;
  reward: number;
  attempts: number;
  error: string;
  failedAt: string;
 };
 export type TaskSyncedEvent = {
  userId: string;
  count: number;
@@ -37,7 +52,9 @@ export type TaskSyncedEvent = {
 type EventMap = {
  'signals.tip.served': TipServedEvent;
  'signals.tip.feedback': TipFeedbackEvent;
  'signals.tip.reward_failed': RewardDeliveryFailedEvent;
  'signals.task.synced': TaskSyncedEvent;
  'signals.integration.token_expired': IntegrationTokenExpiredEvent;
 };
 export type StoredEvent = {
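The bus implementation is outside this diff. The `EventMap` shape is what lets `bus.publish('signals.integration.token_expired', ...)` in `recommender.ts` be type-checked against `IntegrationTokenExpiredEvent`. Below is a minimal sketch of such a bus; `TypedBus` and `DemoEventMap` are hypothetical names.

```typescript
// Hypothetical minimal typed bus: each topic key is pinned to its payload type,
// so publishing a payload of the wrong shape on a topic fails to compile.
type Handler<T> = (payload: T) => void;

class TypedBus<M extends Record<string, unknown>> {
  private handlers = new Map<keyof M, Handler<any>[]>();

  subscribe<K extends keyof M>(topic: K, handler: Handler<M[K]>): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  publish<K extends keyof M>(topic: K, payload: M[K]): void {
    // Synchronously call every handler registered for this topic.
    for (const h of this.handlers.get(topic) ?? []) h(payload);
  }
}

type DemoEventMap = {
  'signals.integration.token_expired': { userId: string; provider: string; detectedAt: string };
  'signals.tip.reward_failed': {
    userId: string;
    tipId: string;
    reward: number;
    attempts: number;
    error: string;
    failedAt: string;
  };
};
```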
--- a/services/api/src/index.ts
+++ b/services/api/src/index.ts
@@ -3,7 +3,9 @@ import express from 'express';
 import cookieParser from 'cookie-parser';
 import cors from 'cors';
 import { config } from './config.js';
-import { runMigrations } from './db/index.js';
+import { db, runMigrations } from './db/index.js';
 import { tipScores, tipFeedback } from './db/schema.js';
 import { lt } from 'drizzle-orm';
 import { sessionMiddleware } from './middleware/session.js';
 import { authRouter } from './routes/auth.js';
 import { integrationsRouter } from './routes/integrations.js';
@@ -20,6 +22,15 @@ import type { Request, Response } from 'express';
 await mkdir(dirname(config.DATABASE_PATH), { recursive: true });
 runMigrations();
 // Keep the API alive on stray async faults (e.g. a single bad admin route)
 // rather than dropping the whole process.
 process.on('unhandledRejection', (reason) => {
  console.error('[api] unhandledRejection', reason);
 });
 process.on('uncaughtException', (err) => {
  console.error('[api] uncaughtException', err);
 });
 const app = express();
 app.use(
@@ -61,6 +72,19 @@ app.use('/api/ml', requireAuth as any, requireAdmin as any, async (req: Request,
  }
 });
 async function purgeExpiredData() {
  const cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000).toISOString();
  try {
    await db.delete(tipScores).where(lt(tipScores.servedAt, cutoff));
    await db.delete(tipFeedback).where(lt(tipFeedback.createdAt, cutoff));
  } catch (err: any) {
    console.error(`[purge] retention cleanup failed: ${err.message}`);
  }
 }
 purgeExpiredData();
 setInterval(purgeExpiredData, 24 * 60 * 60 * 1000);
 app.listen(config.PORT, () => {
  console.log(`oO API listening on http://localhost:${config.PORT}`);
 });
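`purgeExpiredData` compares ISO-8601 strings with `lt`. That orders rows chronologically only because `toISOString()` always produces fixed-width UTC text. Below is a pure sketch of the cutoff; `retentionCutoff` and `isExpired` are hypothetical names. The same comparison would misorder values written in another format, such as SQLite's `datetime('now')`, which uses a space instead of `T`.

```typescript
// Sketch of the retention cutoff used by purgeExpiredData, as pure functions.
const DAY_MS = 24 * 60 * 60 * 1000;

function retentionCutoff(now: Date, days: number): string {
  // toISOString() always emits UTC as YYYY-MM-DDTHH:mm:ss.sssZ, so string order
  // equals chronological order for timestamps written the same way.
  return new Date(now.getTime() - days * DAY_MS).toISOString();
}

function isExpired(storedAt: string, cutoff: string): boolean {
  // Mirrors lt(column, cutoff): rows strictly older than the cutoff are purged.
  return storedAt < cutoff;
}
```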
--- a/services/api/src/routes/admin.ts
+++ b/services/api/src/routes/admin.ts
@@ -368,7 +368,7 @@ router.get('/reward-analytics', async (req: AuthenticatedRequest, res: Response)
    .select({
      action: tipFeedback.action,
      count: sql<number>`count(*)`,
-      avgHour: sql<number>`avg(json_extract(ts.features_json, '$.hour_of_day'))`,
+      avgHour: sql<number>`avg(json_extract(${tipScores.featuresJson}, '$.hour_of_day'))`,
    })
    .from(tipFeedback)
    .leftJoin(tipScores, eq(tipFeedback.tipId, tipScores.tipId))
@@ -683,6 +683,18 @@ router.post('/simulate/start', async (req: AuthenticatedRequest, res: Response)
    _simProcesses.set(id, { pid: child.pid, startedAt: now });
  }
  // Without this listener, a spawn failure (ENOENT when python3 is absent
  // — e.g. in the alpine api container) would emit an unhandled 'error' event
  // and crash the whole API process.
  child.on('error', async (err) => {
    console.error('[sim] spawn error', err);
    _simProcesses.delete(id);
    await db
      .update(simRuns)
      .set({ status: 'failed', finishedAt: new Date().toISOString() })
      .where(eq(simRuns.id, id));
  });
  // Capture stderr for debugging
  const stderrLines: string[] = [];
  child.stderr?.on('data', (d: Buffer) => stderrLines.push(d.toString()));
--- a/services/api/src/routes/recommender.ts
+++ b/services/api/src/routes/recommender.ts
@@ -65,7 +65,17 @@ async function fetchTodoistTasks(userId: string, accessToken: string): Promise<C
    headers: { Authorization: `Bearer ${accessToken}` },
  });
-  if (!res.ok) return cached?.tasks ?? [];
+  if (!res.ok) {
    if (res.status === 401) {
      console.error(`[todoist] token expired for user ${userId}`);
      bus.publish('signals.integration.token_expired', {
        userId,
        provider: 'todoist',
        detectedAt: new Date().toISOString(),
      });
    }
    return cached?.tasks ?? [];
  }
  const body = (await res.json()) as {
    results: Array<{
@@ -230,10 +240,10 @@ router.post('/recommend', requireAuth, async (req: AuthenticatedRequest, res: Re
 // ---------------------------------------------------------------------------
 // Reward inference from action + dwell time
 //
 // Feedback has five actions: done / snooze / dismiss plus the explicit
 // helpful / not_helpful ratings. Rewards per action (for "done", helpfulness
 // is inferred from how long the user took to act on the tip):
 //   dismiss              → -1.0 (clear rejection)
 //   snooze               → +0.1 (tip noticed, timing off — mild positive)
 //   helpful              → +0.5 (explicit positive signal)
 //   not_helpful          → -0.5 (explicit negative signal)
 //   done < 15 s          → -0.3 (almost certainly a stale task, not magic)
 //   done 15 s – 2 min    → +1.0 (magic zone: user saw tip and acted)
 //   done 2 – 10 min      → +0.6 (good: user engaged, acted in same session)
@@ -242,6 +252,8 @@ router.post('/recommend', requireAuth, async (req: AuthenticatedRequest, res: Re
 function inferReward(action: string, dwellMs: number | null): number {
  if (action === 'dismiss')     return -1.0;
  if (action === 'snooze')      return 0.1;
  if (action === 'helpful')     return 0.5;
  if (action === 'not_helpful') return -0.5;
  // done — use dwell time
  if (dwellMs === null || dwellMs < 0) return 0.5; // unknown dwell: neutral positive
  if (dwellMs < 15_000)   return -0.3; // stale / reflex
@@ -250,6 +262,51 @@ function inferReward(action: string, dwellMs: number | null): number {
  return 0.3;                           // eventually
 }
 // ---------------------------------------------------------------------------
 // Reward delivery with retry (bug #75 — was fire-and-forget)
 // ---------------------------------------------------------------------------
 async function sendRewardWithRetry(
  userId: string,
  tipId: string,
  reward: number,
  features: TaskFeatures,
 ): Promise<void> {
  const body = JSON.stringify({
    user_id: userId,
    tip_id: tipId,
    reward,
    features,
    day_of_week: new Date().getDay(),
  });
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      const res = await fetch(`${config.ML_SERVING_URL}/reward/egreedy`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body,
        signal: AbortSignal.timeout(3000),
      });
      if (res.ok) return;
      throw new Error(`HTTP ${res.status}`);
    } catch (err: any) {
      if (attempt === 3) {
        console.error(`[reward] failed after 3 attempts for tip ${tipId}: ${err.message}`);
        bus.publish('signals.tip.reward_failed', {
          userId,
          tipId,
          reward,
          attempts: 3,
          error: err.message,
          failedAt: new Date().toISOString(),
        });
        return;
      }
      await new Promise((r) => setTimeout(r, 250 * Math.pow(2, attempt)));
    }
  }
 }
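`sendRewardWithRetry` sleeps `250 * 2^attempt` ms after a failed attempt, which is 500 ms after the first failure and 1000 ms after the second. It gives up after the third attempt. Below is the same schedule as a reusable helper. The names are hypothetical, and `sleep` is injectable so tests need not wait.

```typescript
// Sketch: the retry schedule of sendRewardWithRetry, factored into a reusable helper.
// Delay after failed attempt n is baseMs * 2^n (500 ms, then 1000 ms with the defaults).
function backoffDelayMs(failedAttempt: number, baseMs = 250): number {
  return baseMs * 2 ** failedAttempt;
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // No sleep after the final attempt; the caller handles the failure
      // (e.g. by publishing signals.tip.reward_failed).
      if (attempt < maxAttempts) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastErr;
}
```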
 // ---------------------------------------------------------------------------
 // POST /api/tip/:id/feedback
 // ---------------------------------------------------------------------------
@@ -258,7 +315,7 @@ router.post('/tip/:id/feedback', requireAuth, async (req: AuthenticatedRequest,
  const tipId = String(req.params.id);
  const now = new Date();
-  const validActions = ['done', 'dismiss', 'snooze'];
+  const validActions = ['done', 'dismiss', 'snooze', 'helpful', 'not_helpful'];
  if (!validActions.includes(action)) {
    res.status(400).json({ error: 'Invalid action' });
    return;
@@ -297,25 +354,14 @@ router.post('/tip/:id/feedback', requireAuth, async (req: AuthenticatedRequest,
  bus.publish('signals.tip.feedback', {
    userId: req.userId!,
    tipId,
-    action: action as 'done' | 'dismiss' | 'snooze',
+    action: action as 'done' | 'dismiss' | 'snooze' | 'helpful' | 'not_helpful',
    reward,
    dwellMs,
    createdAt: now.toISOString(),
  });
  if (task) {
-    // Send reward to egreedy-v1 (active policy — ADR-0007)
+    sendRewardWithRetry(req.userId!, tipId, reward, task.features);
-    fetch(`${config.ML_SERVING_URL}/reward/egreedy`, {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
-      body: JSON.stringify({
-        user_id: req.userId!,
-        tip_id: tipId,
-        reward,
-        features: task.features,
-        day_of_week: new Date().getDay(),
-      }),
-    }).catch(() => {});
  }
  // Mark complete in Todoist if done
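The five-action union now appears in three places: `TipFeedbackEvent` in `bus.ts`, `validActions`, and the `action as ...` cast. Deriving the type from a single `as const` array keeps all three in step. `FEEDBACK_ACTIONS` and `isFeedbackAction` are hypothetical names, sketched here as an alternative to the cast.

```typescript
// Sketch: one source of truth for feedback actions; the union type is derived from it.
const FEEDBACK_ACTIONS = ['done', 'dismiss', 'snooze', 'helpful', 'not_helpful'] as const;
type FeedbackAction = (typeof FEEDBACK_ACTIONS)[number];

// Type guard: narrows an untrusted request field to FeedbackAction, replacing the cast.
function isFeedbackAction(value: unknown): value is FeedbackAction {
  return typeof value === 'string' && (FEEDBACK_ACTIONS as readonly string[]).includes(value);
}
```

In the route, `if (!isFeedbackAction(action)) { res.status(400)...; return; }` would both validate the action and type it for `bus.publish`.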
--- a/services/api/src/test/db.ts
+++ b/services/api/src/test/db.ts
@@ -41,6 +41,8 @@ export function makeTestDb() {
      tip_id TEXT NOT NULL,
      action TEXT NOT NULL,
      source_id TEXT,
      dwell_ms INTEGER,
      reward_milli INTEGER,
      created_at TEXT NOT NULL
    );
@@ -76,6 +78,60 @@ export function makeTestDb() {
      detail TEXT,
      created_at TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS tip_scores (
      id TEXT PRIMARY KEY,
      user_id TEXT NOT NULL REFERENCES users(id),
      tip_id TEXT NOT NULL,
      policy TEXT NOT NULL,
      ml_score INTEGER,
      features_json TEXT,
      candidate_count INTEGER,
      latency_ms INTEGER,
      served_at TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS saved_queries (
      id TEXT PRIMARY KEY,
      admin_id TEXT NOT NULL REFERENCES users(id),
      name TEXT NOT NULL,
      sql TEXT NOT NULL,
      created_at TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS sim_runs (
      id TEXT PRIMARY KEY,
      policy_a TEXT NOT NULL,
      policy_b TEXT NOT NULL,
      n_users INTEGER NOT NULL,
      n_rounds INTEGER NOT NULL,
      tasks_per_round INTEGER NOT NULL DEFAULT 8,
      use_llm INTEGER NOT NULL DEFAULT 0,
      status TEXT NOT NULL DEFAULT 'pending',
      summary_json TEXT,
      winner TEXT,
      persona_breakdown_json TEXT,
      created_at TEXT NOT NULL,
      finished_at TEXT
    );
    CREATE TABLE IF NOT EXISTS sim_events (
      id TEXT PRIMARY KEY,
      run_id TEXT NOT NULL REFERENCES sim_runs(id),
      round INTEGER NOT NULL,
      user_id TEXT NOT NULL,
      persona TEXT NOT NULL,
      policy TEXT NOT NULL,
      tip_content TEXT NOT NULL,
      priority INTEGER NOT NULL,
      is_overdue INTEGER NOT NULL,
      action TEXT NOT NULL,
      dwell_ms INTEGER,
      reward_milli INTEGER NOT NULL,
      hour INTEGER NOT NULL,
      day_of_week INTEGER NOT NULL,
      created_at TEXT NOT NULL
    );
  `);
  return drizzle(sqlite, { schema });