# Feature Pipeline

## Worker lifecycle

```
startup
   │
   ▼
notify.notify_one()      ← wake immediately to drain any pending rows
   │
   ▼
loop:
  drain loop:
    next_stale() ──► None → break
         │
         ▼
    generate_description(title)
         │ error → set_failed(current model IDs), sleep 5s, break
         ▼
    get_embedding(description)
         │ error → set_failed(current model IDs), sleep 5s, break
         ▼
    UPDATE task_features SET status='ready', embedding=blob, …
         │
         ▼
    recompute_for_task(task_id)
      DELETE task_edges WHERE source=id OR target=id
      load all other 'ready' embeddings
      INSERT pairs with cosine_sim ≥ min_similarity

  tokio::select!
    notified()   ← new task created/updated
    sleep(60s)   ← retry failed rows
```

## Content hash and cache invalidation

```
content_hash = sha256( prompt_version || "\0" || title )
```

A `task_features` row is considered **stale** when:

- `status = 'pending'` — explicitly queued
- `status = 'failed'` and `updated_at < now − 30s` — retry after backoff
- `status = 'ready'` and `desc_model ≠ config.desc_model` — model changed
- `status = 'ready'` and `embed_model ≠ config.embed_model`
- `status = 'ready'` and `prompt_version ≠ config.prompt_version`

A stale-but-ready row serves its existing data until the worker overwrites it, so the graph never shows a "hole" during recomputation.

## Changing models

Edit `backend/src/ml/config.rs`:

```rust
pub fn default() -> Self {
    Self {
        desc_model: "qwen2.5:7b".to_string(),        // upgraded
        embed_model: "nomic-embed-text".to_string(),
        prompt_version: "v2".to_string(),            // bump when prompt changes
        min_similarity: 0.75,                        // wider edges
        ..
    }
}
```

On the next startup (or `notify_one()`), `next_stale()` returns every row whose stored config no longer matches. The worker re-runs them in oldest-first order.

## Prompt versioning

The prompt template is matched on `prompt_version` in `ml/ollama.rs::render_prompt`. Old versions remain compilable — bumping the version adds a new match arm rather than overwriting the old one, so descriptions produced by `v1` can always be reproduced.
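The staleness rules from the cache-invalidation section can be expressed as a single predicate. This is a minimal sketch, not the real implementation — the `FeatureRow`/`MlConfig` shapes and the `is_stale` name are hypothetical stand-ins for the query behind `next_stale()`:

```rust
use std::time::{Duration, SystemTime};

// Hypothetical row and config shapes for illustration; the real ones
// live in ml/features.rs and ml/config.rs.
struct FeatureRow {
    status: String, // "pending" | "failed" | "ready"
    updated_at: SystemTime,
    desc_model: String,
    embed_model: String,
    prompt_version: String,
}

struct MlConfig {
    desc_model: String,
    embed_model: String,
    prompt_version: String,
}

const RETRY_BACKOFF: Duration = Duration::from_secs(30);

// Mirrors the bullet list above: pending rows are always stale, failed
// rows become stale after the 30 s backoff, and ready rows go stale when
// any of the stored model/prompt identifiers diverges from the config.
fn is_stale(row: &FeatureRow, cfg: &MlConfig, now: SystemTime) -> bool {
    match row.status.as_str() {
        "pending" => true,
        "failed" => now
            .duration_since(row.updated_at)
            .map(|age| age > RETRY_BACKOFF)
            .unwrap_or(false),
        "ready" => {
            row.desc_model != cfg.desc_model
                || row.embed_model != cfg.embed_model
                || row.prompt_version != cfg.prompt_version
        }
        _ => false,
    }
}
```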
```rust
fn render_prompt(prompt_version: &str, task_title: &str) -> String {
    match prompt_version {
        "v2" => format!("…new prompt…{task_title}…"),
        _ => format!("…v1 prompt…{task_title}…"), // "v1" and legacy
    }
}
```

## Embedding storage

Embeddings are stored as raw little-endian f32 bytes in a `BLOB` column:

```
[f32 LE] [f32 LE] [f32 LE] …   (768 floats for nomic-embed-text = 3072 bytes)
```

`encode_embedding` / `decode_embedding` in `ml/features.rs` handle the conversion. The `embed_dim` column records the dimension so readers don't have to hard-code the model's output size.
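The BLOB round-trip and the edge score are both a few lines of pure std. A sketch under assumed signatures — the real `encode_embedding`/`decode_embedding` live in `ml/features.rs` and may differ, and `cosine_sim` here stands in for whatever `recompute_for_task` uses:

```rust
// Serialize an embedding as raw little-endian f32 bytes (4 bytes per float).
fn encode_embedding(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|f| f.to_le_bytes()).collect()
}

// Inverse: reinterpret the BLOB as a sequence of little-endian f32s.
fn decode_embedding(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

// Cosine similarity, compared against min_similarity when inserting
// task_edges. Zero vectors score 0 rather than dividing by zero.
fn cosine_sim(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

Storing raw LE bytes keeps the column compact (exactly `embed_dim × 4` bytes) and makes the round-trip lossless, at the cost of being opaque to SQL.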