# Architecture

## Stack

```
┌─────────────────────────────────────────┐
│  Frontend  (Next.js 14 + Tailwind CSS)   │
│  port 3003 — proxies /api → backend      │
└────────────────────┬────────────────────┘
                     │ HTTP
┌────────────────────▼────────────────────┐
│  Backend   (Rust / Axum 0.7)            │
│  port 3001 — Basic Auth                  │
│                                          │
│  ┌──────────────┐  ┌──────────────────┐ │
│  │ HTTP routes  │  │  ML worker       │ │
│  │ /api/tasks   │  │  (Tokio task)    │ │
│  │ /api/graph   │  │  Ollama client   │ │
│  │ /api/ml/…    │  └──────┬───────────┘ │
│  └──────┬───────┘         │             │
│         └────────┬────────┘             │
│            SQLite (taskpile.db)          │
│            ├── tasks                    │
│            ├── task_features            │
│            └── task_edges               │
└─────────────────────────────────────────┘
                     │ HTTP
┌────────────────────▼────────────────────┐
│  Ollama  (localhost:11434)               │
│  nomic-embed-text  — embeddings          │
│  qwen2.5:1.5b      — descriptions       │
└─────────────────────────────────────────┘
```

## Key modules

| Path | Responsibility |
|------|---------------|
| `backend/src/ml/config.rs` | Single source of truth for model IDs, prompt version, similarity threshold. Changing any field triggers automatic backfill. |
| `backend/src/ml/features.rs` | Content hash, embedding encode/decode, `mark_pending`, `compute`, `next_stale`. |
| `backend/src/ml/edges.rs` | Pairwise cosine similarity, canonical ordering, transactional edge recompute. |
| `backend/src/ml/worker.rs` | Tokio background task. Drains pending rows, retries failed rows after 30 s, and slow-polls every 60 s. |
| `backend/src/routes/graph.rs` | Pure read over `task_features` + `task_edges`. Zero Ollama calls at query time. |
| `backend/src/state.rs` | `AppState` with `SqlitePool`, `Arc<Notify>`, `Arc<MLConfig>`. `FromRef` lets read-only routes extract just the pool. |
| `frontend/src/components/GraphView.tsx` | `react-force-graph-2d` canvas with 3-phase node centering animation, 2-hop BFS filtering, edge threshold slider. |
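
The pairwise similarity check attributed to `edges.rs` can be illustrated with a minimal, std-only sketch. The function names `cosine_similarity` and `edge_weight` and the threshold values are hypothetical; the real module may be structured differently and reads its threshold from `MLConfig`.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None when the lengths differ, the input is empty,
/// or either vector has zero norm (the similarity is undefined there).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let mut dot = 0.0f32;
    let mut norm_a = 0.0f32;
    let mut norm_b = 0.0f32;
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Hypothetical helper: the weight to store for a pair, or None when
/// the similarity is undefined or falls below the threshold.
fn edge_weight(a: &[f32], b: &[f32], threshold: f32) -> Option<f32> {
    cosine_similarity(a, b).filter(|sim| *sim >= threshold)
}
```

Returning `Option` instead of `0.0` for degenerate inputs keeps a zero-norm embedding from being mistaken for a genuinely unrelated task.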

## Data flow

```
User creates task
      │
      ▼
POST /api/tasks
  INSERT tasks
  INSERT task_features (status='pending', content_hash=sha256(prompt_version+title))
  notify.notify_one()
      │
      ▼ (async)
ML Worker
  next_stale()  →  pick pending row
  Ollama generate_description()
  Ollama get_embedding()
  UPDATE task_features (status='ready', embedding=blob)
  DELETE task_edges WHERE source=id OR target=id
  INSERT task_edges for each pair with sim ≥ threshold
      │
      ▼ (next request)
GET /api/graph
  SELECT tasks → nodes
  SELECT task_edges → edges
  SELECT COUNT(*) FROM task_features WHERE status IN ('pending','failed') → pending_count
  Return JSON (zero Ollama calls)
```
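
The status transitions in the flow above can be sketched in memory, without SQLite, Tokio, or Ollama. This is an illustration under stated assumptions, not the worker's actual code: `FeatureRow`, `drain_pending`, and the `embed` closure are hypothetical stand-ins, `next_stale` here only mimics the name used in `features.rs`, and the retry timing and edge recompute are left out.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Ready,
    Failed,
}

/// Hypothetical in-memory stand-in for a `task_features` row.
#[derive(Debug, Clone)]
struct FeatureRow {
    task_id: String,
    title: String,
    status: Status,
    embedding: Option<Vec<f32>>,
    error: Option<String>,
}

/// Index of the first pending row, if any.
fn next_stale(rows: &[FeatureRow]) -> Option<usize> {
    rows.iter().position(|r| r.status == Status::Pending)
}

/// Process pending rows until none remain; returns how many were handled.
/// `embed` stands in for the description and embedding calls to Ollama.
fn drain_pending<F>(rows: &mut [FeatureRow], embed: F) -> usize
where
    F: Fn(&str) -> Result<Vec<f32>, String>,
{
    let mut handled = 0;
    while let Some(i) = next_stale(rows) {
        let result = embed(&rows[i].title);
        match result {
            Ok(vector) => {
                rows[i].embedding = Some(vector);
                rows[i].error = None;
                rows[i].status = Status::Ready;
            }
            Err(message) => {
                // Failed rows keep the last error message.
                rows[i].error = Some(message);
                rows[i].status = Status::Failed;
            }
        }
        handled += 1;
    }
    handled
}

/// Mirrors the `pending_count` query: rows that are pending or failed.
fn pending_count(rows: &[FeatureRow]) -> usize {
    rows.iter()
        .filter(|r| matches!(r.status, Status::Pending | Status::Failed))
        .count()
}
```

Because a failed row leaves the `Pending` state, the drain loop always terminates; counting failed rows in `pending_count` still lets a client see that work is outstanding.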

## Database schema

### `tasks`
| Column | Type | Notes |
|--------|------|-------|
| id | TEXT PK | UUID v4 |
| title | TEXT | May contain `@project` and `#tag` tokens |
| description | TEXT | Optional user description |
| completed | BOOLEAN | |
| created_at | INTEGER | Unix seconds |
| project | TEXT | Parsed from title |
| tags | TEXT | Comma-separated, parsed from title |
| latent_desc | TEXT | Legacy — kept for migration, not read |
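
As an illustration of how `project` and `tags` might be derived from a title, here is a minimal sketch. The parsing rules are assumptions, since the schema only says the values are parsed from the title: tokens are taken to be whitespace-delimited, the first `@` token is taken as the project, and duplicate tags are dropped. `parse_title` is a hypothetical name.

```rust
/// Extract (project, comma-separated tags) from a task title.
/// Assumed rules: whitespace-delimited tokens, first `@token` wins,
/// `#token`s are collected in order without duplicates.
fn parse_title(title: &str) -> (Option<String>, String) {
    let mut project: Option<String> = None;
    let mut tags: Vec<&str> = Vec::new();
    for token in title.split_whitespace() {
        if let Some(name) = token.strip_prefix('@') {
            if !name.is_empty() && project.is_none() {
                project = Some(name.to_string());
            }
        } else if let Some(tag) = token.strip_prefix('#') {
            if !tag.is_empty() && !tags.contains(&tag) {
                tags.push(tag);
            }
        }
    }
    (project, tags.join(","))
}
```

For example, `parse_title("Fix login bug @web #auth #urgent")` yields the project `web` and the tag string `auth,urgent`.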

### `task_features`
| Column | Type | Notes |
|--------|------|-------|
| task_id | TEXT PK FK | → tasks.id ON DELETE CASCADE |
| content_hash | TEXT | sha256(prompt_version + title) |
| latent_desc | TEXT | AI-generated standalone description |
| embedding | BLOB | LE-encoded f32 array |
| embed_dim | INTEGER | Length of embedding |
| desc_model | TEXT | e.g. `qwen2.5:1.5b` |
| embed_model | TEXT | e.g. `nomic-embed-text` |
| prompt_version | TEXT | e.g. `v1` |
| status | TEXT | `pending` \| `ready` \| `failed` |
| error | TEXT | Last error message if failed |
| updated_at | INTEGER | Unix seconds |
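
The `embedding` and `embed_dim` columns describe a little-endian `f32` blob and its length. A minimal sketch of a matching encode/decode pair follows; the function names are hypothetical, and the length check against `embed_dim` is an assumption about how the two columns relate.

```rust
/// Serialize an embedding as consecutive little-endian f32 values.
fn encode_embedding(values: &[f32]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(values.len() * 4);
    for v in values {
        blob.extend_from_slice(&v.to_le_bytes());
    }
    blob
}

/// Decode a blob back into f32 values.
/// Returns None when the blob length does not equal `embed_dim * 4` bytes.
fn decode_embedding(blob: &[u8], embed_dim: usize) -> Option<Vec<f32>> {
    if embed_dim.checked_mul(4) != Some(blob.len()) {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Fixing the byte order explicitly keeps the stored blob portable across machines, and checking the length against `embed_dim` catches truncated rows before they reach the similarity code.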

### `task_edges`
| Column | Type | Notes |
|--------|------|-------|
| source | TEXT FK | Canonical: source < target |
| target | TEXT FK | → tasks.id ON DELETE CASCADE |
| weight | REAL | Cosine similarity ∈ [0, 1] |
| model_key | TEXT | `{embed_model}@{prompt_version}` |
| updated_at | INTEGER | Unix seconds |
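
The canonical ordering and the `model_key` format can be sketched as a small constructor. `EdgeRow` and `make_edge` are hypothetical names, and comparing IDs as plain strings is an assumption; the table only states that `source < target`.

```rust
/// Hypothetical in-memory stand-in for a `task_edges` row.
#[derive(Debug, Clone, PartialEq)]
struct EdgeRow {
    source: String,
    target: String,
    weight: f32,
    model_key: String,
}

/// Build an edge with source < target and a `{embed_model}@{prompt_version}` key.
/// Returns None for a self-pair, which has no canonical ordering.
fn make_edge(
    a: &str,
    b: &str,
    weight: f32,
    embed_model: &str,
    prompt_version: &str,
) -> Option<EdgeRow> {
    if a == b {
        return None;
    }
    // Assumed ordering: plain string comparison of the two task IDs.
    let (source, target) = if a < b { (a, b) } else { (b, a) };
    Some(EdgeRow {
        source: source.to_string(),
        target: target.to_string(),
        weight,
        model_key: format!("{}@{}", embed_model, prompt_version),
    })
}
```

Storing each undirected pair in one canonical direction means `(a, b)` and `(b, a)` map to the same row, so a pair is never duplicated.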