feat(clustering): LLM-enrichment before embedding for better semantic clustering #129

Closed
opened 2026-05-12 14:19:16 +00:00 by alvis · 0 comments

Background

Taskpile ran a controlled experiment comparing raw-title embeddings vs. LLM-expanded descriptions before embedding (qwen2.5:1.5b → nomic-embed-text). Results on the exact same task corpus:

| Dataset | Method | ARI | AUROC |
|---------|--------|-----|-------|
| Synthetic (8 clusters, terse titles) | RAW | 0.22 | 0.76 |
| Synthetic | LLM | 0.77 | 0.91 |
| Real Todoist tasks | RAW | -0.01 | 0.54 |
| Real Todoist tasks | LLM | 0.22 | 0.63 |

The gain is largest for short, noisy, context-poor titles — exactly what oO gets from Todoist.

What to port

  • Before embedding a task title, call the LiteLLM tip-generator alias with the same prompt used in taskpile (v1): expand the title into a 3-sentence description of what the task involves, what context/skills it needs, and why it might matter.
  • Prefix every embed input with "clustering: " (the nomic-embed-text task prefix).
  • Cache expansions by content_hash so each unique title is expanded only once per agent compute cycle (the cache is not persisted across calls; in-memory is fine for now).
  • Route all LLM and embed calls through LiteLLM (the tip-generator and embedder aliases).
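
The steps above could be wired together roughly as follows. This is a minimal sketch, not the taskpile or oO implementation: `EnrichingEmbedder`, `PROMPT_TEMPLATE`, and the SHA-256 definition of `content_hash` are hypothetical names and assumptions, and the prompt only paraphrases this issue rather than reproducing the actual `v1` prompt. The `expand` and `embed` callables stand in for calls routed through the LiteLLM `tip-generator` and `embedder` aliases.

```python
import hashlib
from typing import Callable, Dict, List

# Task prefix for nomic-embed-text, as named in this issue.
EMBED_PREFIX = "clustering: "

# Hypothetical prompt paraphrasing the issue text; NOT the actual taskpile v1 prompt.
PROMPT_TEMPLATE = (
    "Expand this task title into a 3-sentence description of what the task "
    "involves, what context/skills it needs, and why it might matter.\n"
    "Title: {title}"
)


def content_hash(title: str) -> str:
    """Assumed hashing scheme: SHA-256 of the whitespace-stripped title."""
    return hashlib.sha256(title.strip().encode("utf-8")).hexdigest()


class EnrichingEmbedder:
    """Expands titles with an LLM, caches expansions in memory, then embeds them.

    Create one instance per agent compute cycle so the cache is not
    carried across cycles.
    """

    def __init__(
        self,
        expand: Callable[[str], str],
        embed: Callable[[List[str]], List[List[float]]],
    ) -> None:
        self._expand = expand  # stand-in for the tip-generator alias
        self._embed = embed    # stand-in for the embedder alias
        self._cache: Dict[str, str] = {}

    def expand_title(self, title: str) -> str:
        """Return the cached expansion, calling the LLM only on a cache miss."""
        key = content_hash(title)
        if key not in self._cache:
            self._cache[key] = self._expand(PROMPT_TEMPLATE.format(title=title))
        return self._cache[key]

    def embed_titles(self, titles: List[str]) -> List[List[float]]:
        """Embed the prefixed expansions, one vector per input title."""
        inputs = [EMBED_PREFIX + self.expand_title(t) for t in titles]
        return self._embed(inputs)
```

Injecting the two callables keeps the cache and prefix logic testable without a running LiteLLM proxy; in the real port they would presumably be thin wrappers around the LiteLLM completion and embedding calls.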

Reference

/home/alvis/taskpile/experiments/clustering_eval/run.py — full experiment with metrics.
/home/alvis/taskpile/experiments/clustering_eval/REPORT.md — results report.

alvis added this to the M2 — AI tips + multi-source signals milestone 2026-05-12 14:19:16 +00:00
alvis closed this issue 2026-05-12 14:22:43 +00:00

Reference: alvis/oO#129