Add MLOps feature store, fix UI layout, add docs and Gitea remote

Backend:
- Replace on-the-fly Ollama calls with versioned feature store (task_features, task_edges)
- Background Tokio worker drains pending rows; write path returns immediately
- MLConfig versioning: changing model IDs triggers automatic backfill via next_stale()
- AppState with FromRef; new GET /api/ml/status observability endpoint
- Idempotent mark_pending (content hash guards), retry failed rows after 30s
- Remove tracked build artifacts (backend/target/, frontend/.next/, node_modules/)

Frontend:
- TaskItem: items-center alignment (fixes checkbox/text offset), break-words for overflow
- TaskDetailPanel: fix invisible AI context (text-gray-700→text-gray-400), show all fields
- TaskDetailPanel: pending placeholder when latent_desc not yet computed, show task ID
- GraphView: surface pending_count as amber pulsing "analyzing N tasks…" hint in legend
- Fix Task.created_at type (number/Unix seconds, not string)
- Auth gate: LoginPage + sessionStorage; fix e2e tests to bypass gate in jsdom
- Fix deleteTask test assertion and '1 remaining'→'1 left' stale text

Docs:
- VitePress docs in docs/ with guide, MLOps pipeline, and API reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alvis committed 2026-04-10 06:16:28 +00:00
parent 95342f852f · commit 9b77d6ea67
23998 changed files with 2593 additions and 3230377 deletions

docs/guide/architecture.md

@@ -0,0 +1,111 @@
# Architecture
## Stack
```
┌─────────────────────────────────────────┐
│ Frontend (Next.js 14 + Tailwind CSS) │
│ port 3003 — proxies /api → backend │
└────────────────────┬────────────────────┘
│ HTTP
┌────────────────────▼────────────────────┐
│ Backend (Rust / Axum 0.7) │
│ port 3001 — Basic Auth │
│ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ HTTP routes │ │ ML worker │ │
│ │ /api/tasks │ │ (Tokio task) │ │
│ │ /api/graph │ │ Ollama client │ │
│ │ /api/ml/… │ └──────┬───────────┘ │
│ └──────┬───────┘ │ │
│ └────────┬────────┘ │
│ SQLite (taskpile.db) │
│ ├── tasks │
│ ├── task_features │
│ └── task_edges │
└─────────────────────────────────────────┘
│ HTTP
┌────────────────────▼────────────────────┐
│ Ollama (localhost:11434) │
│ nomic-embed-text — embeddings │
│ qwen2.5:1.5b — descriptions │
└─────────────────────────────────────────┘
```
## Key modules
| Path | Responsibility |
|------|---------------|
| `backend/src/ml/config.rs` | Single source of truth for model IDs, prompt version, similarity threshold. Changing any field triggers automatic backfill. |
| `backend/src/ml/features.rs` | Content hash, embedding encode/decode, `mark_pending`, `compute`, `next_stale`. |
| `backend/src/ml/edges.rs` | Pairwise cosine similarity, canonical ordering, transactional edge recompute. |
| `backend/src/ml/worker.rs` | Tokio background task: drains pending rows, retries failures after 30 s, slow-polls every 60 s. |
| `backend/src/routes/graph.rs` | Pure read over `task_features` + `task_edges`. Zero Ollama calls at query time. |
| `backend/src/state.rs` | `AppState` with `SqlitePool`, `Arc<Notify>`, `Arc<MLConfig>`. `FromRef` lets read-only routes extract just the pool (sketched below). |
| `frontend/src/components/GraphView.tsx` | `react-force-graph-2d` canvas with 3-phase node centering animation, 2-hop BFS filtering, edge threshold slider. |
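The `AppState`/`FromRef` arrangement from the table is roughly this shape. A sketch only: the `MLConfig` fields are inferred from the module table above, and the `FromRef` derive assumes axum's `macros` feature.
```rust
use std::sync::Arc;

use axum::extract::FromRef;
use sqlx::SqlitePool;
use tokio::sync::Notify;

// Stand-in for backend/src/ml/config.rs; fields per the table above.
#[derive(Clone)]
struct MLConfig {
    desc_model: String,
    embed_model: String,
    prompt_version: String,
    similarity_threshold: f32,
}

// FromRef generates an impl per field, so a read-only handler can declare
// `State(pool): State<SqlitePool>` and never see Notify or MLConfig.
#[derive(Clone, FromRef)]
struct AppState {
    pool: SqlitePool,
    notify: Arc<Notify>,
    ml: Arc<MLConfig>,
}
```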
## Data flow
```
User creates task
POST /api/tasks
INSERT tasks
INSERT task_features (status='pending', content_hash=sha256(pv+title))
notify.notify_one()
▼ (async)
ML Worker
next_stale() → pick pending row
Ollama generate_description()
Ollama get_embedding()
UPDATE task_features (status='ready', embedding=blob)
DELETE task_edges WHERE source=id OR target=id
INSERT task_edges for each pair with sim ≥ threshold
▼ (next request)
GET /api/graph
SELECT tasks → nodes
SELECT task_edges → edges
SELECT COUNT(*) WHERE status IN ('pending','failed') → pending_count
Return JSON (zero Ollama calls)
```
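A minimal sketch of the worker half of this flow. The query shape and signatures are assumptions; the real `worker.rs` also marks rows `failed` and records the error text.
```rust
use std::{sync::Arc, time::Duration};

use sqlx::SqlitePool;
use tokio::sync::Notify;

async fn run_worker(pool: SqlitePool, notify: Arc<Notify>) {
    loop {
        // One pending row, or a failed row older than 30 s; this is where
        // the "retry failures after 30 s" policy lives.
        let next: Option<(String,)> = sqlx::query_as(
            "SELECT task_id FROM task_features
             WHERE status = 'pending'
                OR (status = 'failed' AND updated_at < unixepoch() - 30)
             LIMIT 1",
        )
        .fetch_optional(&pool)
        .await
        .unwrap_or(None);

        match next {
            // compute(): Ollama description + embedding, write features,
            // recompute edges. Elided here; see the module table above.
            Some((task_id,)) => compute(&pool, &task_id).await,
            // Idle: wake on notify_one() from the write path, or after the
            // 60 s slow-poll in case a wakeup was missed.
            None => tokio::select! {
                _ = notify.notified() => {}
                _ = tokio::time::sleep(Duration::from_secs(60)) => {}
            },
        }
    }
}

async fn compute(_pool: &SqlitePool, _task_id: &str) { /* stub */ }
```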
## Database schema
### `tasks`
| Column | Type | Notes |
|--------|------|-------|
| id | TEXT PK | UUID v4 |
| title | TEXT | May contain `@project` and `#tag` tokens |
| description | TEXT | Optional user description |
| completed | BOOLEAN | |
| created_at | INTEGER | Unix seconds |
| project | TEXT | Parsed from title |
| tags | TEXT | Comma-separated, parsed from title |
| latent_desc | TEXT | Legacy — kept for migration, not read |
### `task_features`
| Column | Type | Notes |
|--------|------|-------|
| task_id | TEXT PK FK | → tasks.id ON DELETE CASCADE |
| content_hash | TEXT | sha256(prompt_version + title) |
| latent_desc | TEXT | AI-generated standalone description |
| embedding | BLOB | LE-encoded f32 array |
| embed_dim | INTEGER | Length of embedding |
| desc_model | TEXT | e.g. `qwen2.5:1.5b` |
| embed_model | TEXT | e.g. `nomic-embed-text` |
| prompt_version | TEXT | e.g. `v1` |
| status | TEXT | `pending` \| `ready` \| `failed` |
| error | TEXT | Last error message if failed |
| updated_at | INTEGER | Unix seconds |
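The `content_hash` and `embedding` columns are simple to reproduce. A sketch, assuming plain concatenation for the hash input (the real separator, if any, lives in `features.rs`):
```rust
use sha2::{Digest, Sha256};

/// sha256(prompt_version + title), as stored in content_hash. Recomputing
/// this on every write is what makes mark_pending idempotent: an unchanged
/// hash means no re-embedding is needed.
fn content_hash(prompt_version: &str, title: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(prompt_version.as_bytes());
    hasher.update(title.as_bytes());
    format!("{:x}", hasher.finalize())
}

/// embedding BLOB: little-endian f32s, embed_dim entries.
fn encode_embedding(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|f| f.to_le_bytes()).collect()
}

fn decode_embedding(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|c| f32::from_le_bytes(c.try_into().unwrap()))
        .collect()
}
```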
### `task_edges`
| Column | Type | Notes |
|--------|------|-------|
| source | TEXT FK | Canonical: source < target |
| target | TEXT FK | → tasks.id ON DELETE CASCADE |
| weight | REAL | Cosine similarity ∈ [0, 1] |
| model_key | TEXT | `{embed_model}@{prompt_version}` |
| updated_at | INTEGER | Unix seconds |
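Edge weight and canonical ordering are short to state in code; a sketch of what `edges.rs` presumably does per pair:
```rust
/// Cosine similarity of two embeddings of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Canonical ordering: source < target, so each undirected pair is stored
/// exactly once and lookups never have to check both directions.
fn canonical_edge<'a>(a: &'a str, b: &'a str) -> (&'a str, &'a str) {
    if a < b { (a, b) } else { (b, a) }
}
```
Only pairs at or above the similarity threshold get a row, which keeps the table sparse.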


@@ -0,0 +1,51 @@
# Getting Started
## Prerequisites
| Tool | Version | Notes |
|------|---------|-------|
| Rust | ≥ 1.78 | `rustup update stable` |
| Node.js | ≥ 20 | For the frontend |
| Ollama | any | `ollama pull nomic-embed-text && ollama pull qwen2.5:1.5b` |
> **Port note** — Port 3000 is used by Gitea on this machine. The frontend runs on **3003**; the backend on **3001**.
## Running locally
```bash
# 1. Backend (Rust + SQLite)
cd backend
cargo run
# → Listening on http://0.0.0.0:3001
# 2. Frontend (Next.js)
cd frontend
npm install
npm run dev -- -p 3003
# → http://localhost:3003
```
The backend auto-creates `taskpile.db` and runs schema migrations on startup. It also seeds pending `task_features` rows for any existing task that doesn't yet have embeddings, then wakes the ML worker to process them.
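That seeding step amounts to a set-difference insert plus a wakeup. Roughly (the exact SQL is an assumption, and the real version fills in `content_hash` as well):
```rust
use sqlx::SqlitePool;
use tokio::sync::Notify;

async fn seed_pending(pool: &SqlitePool, notify: &Notify) -> sqlx::Result<()> {
    // Every task without a task_features row gets a pending one.
    sqlx::query(
        "INSERT OR IGNORE INTO task_features (task_id, status, updated_at)
         SELECT id, 'pending', unixepoch() FROM tasks
         WHERE id NOT IN (SELECT task_id FROM task_features)",
    )
    .execute(pool)
    .await?;
    // Wake the ML worker immediately instead of waiting for the slow-poll.
    notify.notify_one();
    Ok(())
}
```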
## First login
The default credentials are `admin` / `VQ7q1CzFe3Y` (configured via `ValidateRequestHeaderLayer::basic` in `backend/src/main.rs`).
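In `tower-http` terms that is a single layer on the router. A sketch (the route is illustrative; `ValidateRequestHeaderLayer::basic` is the call named above):
```rust
use axum::{routing::get, Router};
use tower_http::validate_request::ValidateRequestHeaderLayer;

fn app() -> Router {
    Router::new()
        .route("/api/ml/status", get(|| async { "ok" }))
        // Rejects any request lacking a matching `Authorization: Basic` header.
        .layer(ValidateRequestHeaderLayer::basic("admin", "VQ7q1CzFe3Y"))
}
```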
## Verifying the ML pipeline
```bash
# Check ML status (requires auth)
curl -u admin:VQ7q1CzFe3Y --noproxy '*' http://localhost:3001/api/ml/status | jq
```
You should see `pending` ticking down toward 0 as the worker processes tasks. Once `ready` matches your task count, edges will appear in the graph.
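The numbers come from simple counts over `task_features.status`; a guess at the handler's shape (field names in the real JSON response may differ):
```rust
use axum::{extract::State, Json};
use serde::Serialize;
use sqlx::SqlitePool;

#[derive(Serialize)]
struct MlStatus {
    pending: i64,
    ready: i64,
    failed: i64,
}

async fn ml_status(State(pool): State<SqlitePool>) -> Json<MlStatus> {
    async fn count(pool: &SqlitePool, status: &str) -> i64 {
        sqlx::query_scalar("SELECT COUNT(*) FROM task_features WHERE status = ?")
            .bind(status)
            .fetch_one(pool)
            .await
            .unwrap_or(0)
    }
    Json(MlStatus {
        pending: count(&pool, "pending").await,
        ready: count(&pool, "ready").await,
        failed: count(&pool, "failed").await,
    })
}
```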
## Running tests
```bash
# Backend (Rust)
cd backend && cargo test
# Frontend (Jest)
cd frontend && npx jest
```