Add MLOps feature store, fix UI layout, add docs and Gitea remote

Backend:
- Replace on-the-fly Ollama calls with versioned feature store (task_features, task_edges)
- Background Tokio worker drains pending rows; write path returns immediately
- MLConfig versioning: changing model IDs triggers automatic backfill via next_stale()
- AppState with FromRef; new GET /api/ml/status observability endpoint
- Idempotent mark_pending (content hash guards), retry failed rows after 30s
- Remove tracked build artifacts (backend/target/, frontend/.next/, node_modules/)

Frontend:
- TaskItem: items-center alignment (fixes checkbox/text offset), break-words for overflow
- TaskDetailPanel: fix invisible AI context (text-gray-700→text-gray-400), show all fields
- TaskDetailPanel: pending placeholder when latent_desc not yet computed, show task ID
- GraphView: surface pending_count as amber pulsing "analyzing N tasks…" hint in legend
- Fix Task.created_at type (number/Unix seconds, not string)
- Auth gate: LoginPage + sessionStorage; fix e2e tests to bypass gate in jsdom
- Fix deleteTask test assertion and '1 remaining'→'1 left' stale text

Docs:
- VitePress docs in docs/ with guide, MLOps pipeline, and API reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alvis committed 2026-04-10 06:16:28 +00:00
parent 95342f852f
commit 9b77d6ea67
23998 changed files with 2593 additions and 3230377 deletions

docs/.vitepress/config.ts Normal file

@@ -0,0 +1,43 @@
import { defineConfig } from 'vitepress'
export default defineConfig({
  title: 'Taskpile',
  description: 'Task manager with force-directed graph visualization and MLOps feature store',
  themeConfig: {
    logo: '/logo.svg',
    nav: [
      { text: 'Guide', link: '/guide/getting-started' },
      { text: 'MLOps', link: '/mlops/overview' },
      { text: 'API', link: '/api/reference' },
    ],
    sidebar: [
      {
        text: 'Guide',
        items: [
          { text: 'Getting Started', link: '/guide/getting-started' },
          { text: 'Architecture', link: '/guide/architecture' },
        ],
      },
      {
        text: 'MLOps Pipeline',
        items: [
          { text: 'Overview', link: '/mlops/overview' },
          { text: 'Feature Pipeline', link: '/mlops/pipeline' },
        ],
      },
      {
        text: 'API Reference',
        link: '/api/reference',
      },
    ],
    socialLinks: [
      { icon: 'github', link: 'https://git.alogins.net/alvis/taskpile' },
    ],
    footer: {
      message: 'Built with VitePress',
    },
    search: {
      provider: 'local',
    },
  },
})

docs/api/reference.md Normal file

@@ -0,0 +1,107 @@
# API Reference
All endpoints are under `http://localhost:3001/api` and require HTTP Basic Auth (`admin` / configured password).
## Tasks
### `GET /api/tasks`
Returns all tasks ordered by `created_at` ascending.
**Response**
```json
[
  {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "title": "Buy groceries @home #errand",
    "description": null,
    "completed": false,
    "created_at": 1760000000,
    "project": "home",
    "tags": "errand",
    "latent_desc": "This task involves…"
  }
]
```
---
### `POST /api/tasks`
Create a new task. Automatically parses `@project` and `#tag` tokens from the title. Seeds a `pending` feature row and wakes the ML worker.
**Body**
```json
{ "title": "Deploy to prod @work #urgent", "description": "optional" }
```
**Response** `201 Created` with the created Task object. `latent_desc` is always `null` on creation.
---
### `PATCH /api/tasks/:id`
Update one or more fields of a task. If the title changes, the feature row is re-queued.
**Body** (all fields optional)
```json
{ "title": "New title", "description": "Updated", "completed": true }
```
**Response** `200 OK` with the updated Task.
---
### `DELETE /api/tasks/:id`
Delete a task and cascade-delete its feature row and all edges it appears in.
**Response** `204 No Content`
---
## Graph
### `GET /api/graph`
Returns the task graph ready for `react-force-graph-2d`. **No inference is performed at query time.**
**Response**
```json
{
  "nodes": [
    { "id": "uuid…", "label": "Buy groceries", "completed": false, "project": "home" }
  ],
  "edges": [
    { "source": "uuid-a", "target": "uuid-b", "weight": 0.923 }
  ],
  "pending_count": 3
}
```
`pending_count` is the number of tasks whose embeddings haven't been computed (or whose last attempt failed). The frontend uses this to show an "analyzing N tasks…" hint in the graph legend.
---
## ML
### `GET /api/ml/status`
Observability endpoint — returns current ML config and feature store counters.
**Response**
```json
{
  "desc_model": "qwen2.5:1.5b",
  "embed_model": "nomic-embed-text",
  "prompt_version": "v1",
  "min_similarity": 0.8,
  "pending": 2,
  "ready": 15,
  "failed": 0,
  "edges": 28,
  "last_error": null
}
```
`last_error` is the error string from the most recently failed feature row (useful for diagnosing Ollama connectivity issues).

docs/guide/architecture.md Normal file

@@ -0,0 +1,111 @@
# Architecture
## Stack
```
┌─────────────────────────────────────────┐
│ Frontend (Next.js 14 + Tailwind CSS)    │
│ port 3003 — proxies /api → backend      │
└────────────────────┬────────────────────┘
                     │ HTTP
┌────────────────────▼────────────────────┐
│ Backend (Rust / Axum 0.7)               │
│ port 3001 — Basic Auth                  │
│                                         │
│ ┌──────────────┐  ┌──────────────────┐  │
│ │ HTTP routes  │  │ ML worker        │  │
│ │ /api/tasks   │  │ (Tokio task)     │  │
│ │ /api/graph   │  │ Ollama client    │  │
│ │ /api/ml/…    │  └──────┬───────────┘  │
│ └──────┬───────┘         │              │
│        └────────┬────────┘              │
│        SQLite (taskpile.db)             │
│          ├── tasks                      │
│          ├── task_features              │
│          └── task_edges                 │
└─────────────────────────────────────────┘
                     │ HTTP
┌────────────────────▼────────────────────┐
│ Ollama (localhost:11434)                │
│ nomic-embed-text — embeddings           │
│ qwen2.5:1.5b — descriptions             │
└─────────────────────────────────────────┘
```
## Key modules
| Path | Responsibility |
|------|---------------|
| `backend/src/ml/config.rs` | Single source of truth for model IDs, prompt version, similarity threshold. Changing any field triggers automatic backfill. |
| `backend/src/ml/features.rs` | Content hash, embedding encode/decode, `mark_pending`, `compute`, `next_stale`. |
| `backend/src/ml/edges.rs` | Pairwise cosine similarity, canonical ordering, transactional edge recompute. |
| `backend/src/ml/worker.rs` | Tokio background task. Drains pending rows, retries failures after 30 s, 60 s slow-poll. |
| `backend/src/routes/graph.rs` | Pure read over `task_features` + `task_edges`. Zero Ollama calls at query time. |
| `backend/src/state.rs` | `AppState` with `SqlitePool`, `Arc<Notify>`, `Arc<MLConfig>`. `FromRef` lets read-only routes extract just the pool. |
| `frontend/src/components/GraphView.tsx` | `react-force-graph-2d` canvas with 3-phase node centering animation, 2-hop BFS filtering, edge threshold slider. |
## Data flow
```
User creates task
  POST /api/tasks
    INSERT tasks
    INSERT task_features (status='pending', content_hash=sha256(pv+title))
    notify.notify_one()
        ▼ (async)
ML Worker
  next_stale() → pick pending row
  Ollama generate_description()
  Ollama get_embedding()
  UPDATE task_features (status='ready', embedding=blob)
  DELETE task_edges WHERE source=id OR target=id
  INSERT task_edges for each pair with sim ≥ threshold
        ▼ (next request)
GET /api/graph
  SELECT tasks → nodes
  SELECT task_edges → edges
  SELECT COUNT(*) WHERE status IN ('pending','failed') → pending_count
  Return JSON (zero Ollama calls)
```
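The "INSERT task_edges for each pair" step is plain cosine similarity over the stored embeddings. A minimal sketch of the math (the actual helper in `ml/edges.rs` may be shaped differently):
```rust
/// Cosine similarity between two embedding vectors.
/// Returns 0.0 for mismatched or empty inputs rather than panicking.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}
```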
## Database schema
### `tasks`
| Column | Type | Notes |
|--------|------|-------|
| id | TEXT PK | UUID v4 |
| title | TEXT | May contain `@project` and `#tag` tokens |
| description | TEXT | Optional user description |
| completed | BOOLEAN | |
| created_at | INTEGER | Unix seconds |
| project | TEXT | Parsed from title |
| tags | TEXT | Comma-separated, parsed from title |
| latent_desc | TEXT | Legacy — kept for migration, not read |
### `task_features`
| Column | Type | Notes |
|--------|------|-------|
| task_id | TEXT PK FK | → tasks.id ON DELETE CASCADE |
| content_hash | TEXT | sha256(prompt_version + title) |
| latent_desc | TEXT | AI-generated standalone description |
| embedding | BLOB | LE-encoded f32 array |
| embed_dim | INTEGER | Length of embedding |
| desc_model | TEXT | e.g. `qwen2.5:1.5b` |
| embed_model | TEXT | e.g. `nomic-embed-text` |
| prompt_version | TEXT | e.g. `v1` |
| status | TEXT | `pending` \| `ready` \| `failed` |
| error | TEXT | Last error message if failed |
| updated_at | INTEGER | Unix seconds |
### `task_edges`
| Column | Type | Notes |
|--------|------|-------|
| source | TEXT FK | Canonical: source < target |
| target | TEXT FK | → tasks.id ON DELETE CASCADE |
| weight | REAL | Cosine similarity ∈ [0, 1] |
| model_key | TEXT | `{embed_model}@{prompt_version}` |
| updated_at | INTEGER | Unix seconds |
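For illustration, the `task_features` table above corresponds to DDL along these lines (reconstructed from the column list, not copied from the actual migration):
```rust
// Illustrative DDL reconstructed from the table above; the real
// migration in the backend may differ in detail.
const CREATE_TASK_FEATURES: &str = r#"
CREATE TABLE IF NOT EXISTS task_features (
    task_id        TEXT PRIMARY KEY REFERENCES tasks(id) ON DELETE CASCADE,
    content_hash   TEXT NOT NULL,
    latent_desc    TEXT,
    embedding      BLOB,
    embed_dim      INTEGER,
    desc_model     TEXT,
    embed_model    TEXT,
    prompt_version TEXT,
    status         TEXT NOT NULL DEFAULT 'pending',
    error          TEXT,
    updated_at     INTEGER NOT NULL
)
"#;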

docs/guide/getting-started.md Normal file
@@ -0,0 +1,51 @@
# Getting Started
## Prerequisites
| Tool | Version | Notes |
|------|---------|-------|
| Rust | ≥ 1.78 | `rustup update stable` |
| Node.js | ≥ 20 | For the frontend |
| Ollama | any | `ollama pull nomic-embed-text && ollama pull qwen2.5:1.5b` |
> **Port note** — Port 3000 is used by Gitea on this machine. The frontend runs on **3003**; the backend on **3001**.
## Running locally
```bash
# 1. Backend (Rust + SQLite)
cd backend
cargo run
# → Listening on http://0.0.0.0:3001
# 2. Frontend (Next.js)
cd frontend
npm install
npm run dev -- -p 3003
# → http://localhost:3003
```
The backend auto-creates `taskpile.db` and runs schema migrations on startup. It also seeds `task_features` pending rows for any existing task that doesn't have embeddings yet, then wakes the ML worker to process them.
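Under the hood that seeding step amounts to one `INSERT … SELECT` plus a wake-up. An illustrative sketch (function and column usage are assumptions, not the actual backend code, which also computes the content hash at seed time):
```rust
use tokio::sync::Notify;

// Any task without a feature row gets a 'pending' one, then the
// worker is woken once to start draining.
async fn seed_missing_features(pool: &sqlx::SqlitePool, notify: &Notify) -> sqlx::Result<()> {
    sqlx::query(
        "INSERT INTO task_features (task_id, content_hash, status, updated_at)
         SELECT t.id, '', 'pending', strftime('%s','now')
         FROM tasks t
         LEFT JOIN task_features f ON f.task_id = t.id
         WHERE f.task_id IS NULL",
    )
    .execute(pool)
    .await?;
    notify.notify_one(); // worker drains in the background
    Ok(())
}
```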
## First login
The default credentials are `admin` / `VQ7q1CzFe3Y` (configured via `ValidateRequestHeaderLayer::basic` in `backend/src/main.rs`).
## Verifying the ML pipeline
```bash
# Check ML status (requires auth)
curl -u admin:VQ7q1CzFe3Y --noproxy '*' http://localhost:3001/api/ml/status | jq
```
You should see `pending` ticking down toward 0 as the worker processes tasks. Once `ready` matches your task count, edges will appear in the graph.
## Running tests
```bash
# Backend (Rust)
cd backend && cargo test
# Frontend (Jest)
cd frontend && npx jest
```

docs/index.md Normal file

@@ -0,0 +1,32 @@
---
layout: home
hero:
  name: Taskpile
  text: Task manager with intelligent graph visualization
  tagline: Force-directed graphs powered by semantic embeddings. MLOps-grade feature store. Zero-latency writes.
  actions:
    - theme: brand
      text: Get Started
      link: /guide/getting-started
    - theme: alt
      text: Architecture
      link: /guide/architecture
features:
  - icon: 🕸️
    title: Force-directed graph
    details: Tasks are laid out as nodes connected by semantic similarity edges. Select any node to explore its 2-hop neighborhood with smooth physics animation.
  - icon: 🤖
    title: MLOps feature store
    details: Descriptions and embeddings are computed once, versioned by model+prompt, and served instantly. Changing a model config triggers automatic backfill — no manual intervention needed.
  - icon: ⚡
    title: Zero-latency writes
    details: Creating or updating tasks returns immediately. Background Tokio worker drains the pending queue asynchronously. The graph always shows whatever is ready.
  - icon: 🔍
    title: Semantic edge weights
    details: Task pairs are connected by cosine similarity of their nomic-embed-text embeddings. Higher similarity → thicker, more opaque edges. An in-graph slider adjusts the visibility threshold in real time.
---

docs/mlops/overview.md Normal file

@@ -0,0 +1,60 @@
# MLOps Overview
## Design principles
Taskpile's ML subsystem follows three core MLOps practices applied to a small-scale Ollama setup:
### 1. Decouple inference from serving
The write path (`POST /tasks`, `PATCH /tasks/:id`) never calls Ollama. It only writes a `pending` row to `task_features` and wakes a `tokio::sync::Notify`. The read path (`GET /graph`) is a pure SQL query — no model calls, no blocking.
**Result:** sub-millisecond graph reads regardless of Ollama availability.
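In code, the entire write-path contribution is one small INSERT and one wake. A self-contained sketch assuming sqlx and the schema from the architecture guide (the content-hash guard from principle 3 below is omitted; the real handler also inserts the task row itself):
```rust
use tokio::sync::Notify;

// Write path in miniature: queue a feature row, wake the worker, return.
// No Ollama call happens anywhere on this path.
async fn queue_features(
    pool: &sqlx::SqlitePool,
    notify: &Notify,
    task_id: &str,
    content_hash: &str,
) -> sqlx::Result<()> {
    sqlx::query(
        "INSERT INTO task_features (task_id, content_hash, status, updated_at)
         VALUES (?1, ?2, 'pending', strftime('%s','now'))",
    )
    .bind(task_id)
    .bind(content_hash)
    .execute(pool)
    .await?;
    notify.notify_one(); // fire-and-forget: inference happens in the background
    Ok(())
}
```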
### 2. Versioned feature store
Every feature row records which model produced it:
```
desc_model = "qwen2.5:1.5b"
embed_model = "nomic-embed-text"
prompt_version = "v1"
content_hash = sha256("v1" + title)
```
Changing **any** of these in `MLConfig` causes `next_stale()` to pick up those rows on the next worker tick — automatic backfill, no migration scripts.
### 3. Idempotent pipelines
`mark_pending` uses `INSERT … ON CONFLICT DO UPDATE` with a content-hash guard: re-saving a task without changing its title does **not** re-queue it. The hash is derived from `prompt_version + title`, so it changes when either changes.
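A sketch of that guard, with the hash computed by the caller (illustrative; the real `mark_pending` may also special-case `failed` rows):
```rust
// Idempotent mark_pending: the DO UPDATE only fires when the content
// hash changed, so a no-op PATCH leaves a 'ready' row untouched.
async fn mark_pending(
    pool: &sqlx::SqlitePool,
    task_id: &str,
    content_hash: &str,
) -> sqlx::Result<bool> {
    let res = sqlx::query(
        "INSERT INTO task_features (task_id, content_hash, status, updated_at)
         VALUES (?1, ?2, 'pending', strftime('%s','now'))
         ON CONFLICT(task_id) DO UPDATE SET
             content_hash = excluded.content_hash,
             status = 'pending',
             updated_at = excluded.updated_at
         WHERE task_features.content_hash <> excluded.content_hash",
    )
    .bind(task_id)
    .bind(content_hash)
    .execute(pool)
    .await?;
    Ok(res.rows_affected() > 0) // true → something was queued; wake the worker
}
```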
## Observability
```bash
curl -u admin:VQ7q1CzFe3Y --noproxy '*' \
http://localhost:3001/api/ml/status | jq
```
```json
{
  "desc_model": "qwen2.5:1.5b",
  "embed_model": "nomic-embed-text",
  "prompt_version": "v1",
  "min_similarity": 0.8,
  "pending": 3,
  "ready": 14,
  "failed": 0,
  "edges": 22,
  "last_error": null
}
```
The graph endpoint also returns `pending_count` so the frontend can display an "analyzing N tasks…" indicator in the legend without a second API call.
## Failure modes
| Scenario | Behavior |
|----------|----------|
| Ollama unreachable | Worker marks rows `failed` with current model IDs, sleeps 5 s, backs off 30 s before retry. Graph returns nodes + 0 edges + `pending_count > 0`. |
| Model changed in config | `next_stale()` picks up all `ready` rows where stored model IDs differ. They're re-processed in background. Old edges remain until recomputed. |
| Task deleted | `ON DELETE CASCADE` on `task_features` and `task_edges` cleans up immediately. No orphaned embeddings. |
| Title unchanged on PATCH | `mark_pending` detects matching content hash + `ready` status → no-op. Worker not woken. |

docs/mlops/pipeline.md Normal file

@@ -0,0 +1,90 @@
# Feature Pipeline
## Worker lifecycle
```
startup
  notify.notify_one()          ← wake immediately to drain any pending rows
loop:
  drain loop:
    next_stale() ──► None → break
    generate_description(title)
      │ error → set_failed(current model IDs), sleep 5s, break
    get_embedding(description)
      │ error → set_failed(current model IDs), sleep 5s, break
    UPDATE task_features SET status='ready', embedding=blob, …
    recompute_for_task(task_id)
      DELETE task_edges WHERE source=id OR target=id
      load all other 'ready' embeddings
      INSERT pairs with cosine_sim ≥ min_similarity
  tokio::select!
    notified()  ← new task created/updated
    sleep(60s)  ← retry failed rows
```
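The idle half of the loop is a two-armed `tokio::select!`. A simplified, compilable sketch with stubbed helpers (the real `next_stale` and per-row processing live in `ml/features.rs` and `ml/worker.rs`):
```rust
use std::{sync::Arc, time::Duration};
use tokio::sync::Notify;

// Placeholder stubs so the shape compiles; the real versions query
// task_features and run the describe → embed → edge-recompute steps.
async fn next_stale() -> Option<String> { None }
async fn process(_task_id: String) {}

async fn run_worker(notify: Arc<Notify>) {
    loop {
        // Drain everything stale, oldest first.
        while let Some(task_id) = next_stale().await {
            process(task_id).await;
        }
        // Idle until a write wakes us, or the 60s slow-poll retries failures.
        tokio::select! {
            _ = notify.notified() => {}
            _ = tokio::time::sleep(Duration::from_secs(60)) => {}
        }
    }
}
```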
## Content hash and cache invalidation
```
content_hash = sha256( prompt_version || "\0" || title )
```
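With the `sha2` crate this is a few lines. A sketch matching the formula above:
```rust
use sha2::{Digest, Sha256};

// content_hash = sha256(prompt_version || "\0" || title), hex-encoded.
// The NUL separator keeps ("v1", "x") and ("v", "1x") from colliding.
fn content_hash(prompt_version: &str, title: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(prompt_version.as_bytes());
    hasher.update(b"\0");
    hasher.update(title.as_bytes());
    format!("{:x}", hasher.finalize())
}
```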
A `task_features` row is considered **stale** when:
- `status = 'pending'` — explicitly queued
- `status = 'failed'` and `updated_at < now − 30 s` — retry after backoff
- `status = 'ready'` and `desc_model ≠ config.desc_model` — model changed
- `status = 'ready'` and `embed_model ≠ config.embed_model`
- `status = 'ready'` and `prompt_version ≠ config.prompt_version`
A stale-but-ready row serves its existing data until the worker overwrites it, so the graph never shows a "hole" during recomputation.
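Those rules translate almost one-for-one into SQL. An illustrative version of the `next_stale` query (the real one in `ml/features.rs` may differ in detail):
```rust
// Mirrors the relevant fields of backend/src/ml/config.rs.
struct MLConfig {
    desc_model: String,
    embed_model: String,
    prompt_version: String,
}

async fn next_stale(pool: &sqlx::SqlitePool, cfg: &MLConfig) -> sqlx::Result<Option<String>> {
    sqlx::query_scalar(
        "SELECT task_id FROM task_features
         WHERE status = 'pending'
            OR (status = 'failed' AND updated_at < strftime('%s','now') - 30)
            OR (status = 'ready' AND (desc_model <> ?1
                                   OR embed_model <> ?2
                                   OR prompt_version <> ?3))
         ORDER BY updated_at ASC
         LIMIT 1",
    )
    .bind(&cfg.desc_model)
    .bind(&cfg.embed_model)
    .bind(&cfg.prompt_version)
    .fetch_optional(pool)
    .await
}
```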
## Changing models
Edit `backend/src/ml/config.rs`:
```rust
pub fn default() -> Self {
    Self {
        desc_model: "qwen2.5:7b".to_string(),       // upgraded
        embed_model: "nomic-embed-text".to_string(),
        prompt_version: "v2".to_string(),           // bump when prompt changes
        min_similarity: 0.75,                       // wider edges
        ..
    }
}
```
On the next startup (or `notify_one()`), `next_stale()` returns every row whose stored config no longer matches. The worker re-runs them in oldest-first order.
## Prompt versioning
The prompt template is matched on `prompt_version` in `ml/ollama.rs::render_prompt`. Old versions remain compilable — bumping the version adds a new match arm rather than overwriting the old one, so descriptions produced by `v1` can always be reproduced.
```rust
fn render_prompt(prompt_version: &str, task_title: &str) -> String {
    match prompt_version {
        "v2" => format!("…new prompt…{task_title}"),
        _ => format!("…v1 prompt…{task_title}"), // "v1" and legacy
    }
}
```
## Embedding storage
Embeddings are stored as raw little-endian f32 bytes in a `BLOB` column:
```
[f32 LE] [f32 LE] [f32 LE] … (768 floats for nomic-embed-text = 3072 bytes)
```
`encode_embedding` / `decode_embedding` in `ml/features.rs` handle the conversion. The `embed_dim` column records the dimension so readers don't have to hard-code the model's output size.
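A sketch of the two helpers (signatures are illustrative):
```rust
// f32 slice → raw little-endian bytes for the BLOB column.
fn encode_embedding(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|f| f.to_le_bytes()).collect()
}

// BLOB bytes → f32 vector; expects a length that is a multiple of 4.
fn decode_embedding(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```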

docs/package.json Normal file

@@ -0,0 +1,12 @@
{
  "name": "taskpile-docs",
  "private": true,
  "scripts": {
    "docs:dev": "vitepress dev",
    "docs:build": "vitepress build",
    "docs:preview": "vitepress preview"
  },
  "devDependencies": {
    "vitepress": "^1.5.0"
  }
}
}