feat(clustering): LLM-enrichment before embedding for better semantic clustering #129

alvis · 2026-05-12T14:19:16Z

Background

Taskpile ran a controlled experiment comparing raw-title embeddings vs. LLM-expanded descriptions before embedding (qwen2.5:1.5b → nomic-embed-text). Results on the exact same task corpus:

| Dataset | Method | ARI | AUROC |
|---------|--------|-----|-------|
| Synthetic (8 clusters, terse titles) | RAW | 0.22 | 0.76 |
| Synthetic | LLM | 0.77 | 0.91 |
| Real Todoist tasks | RAW | -0.01 | 0.54 |
| Real Todoist tasks | LLM | 0.22 | 0.63 |

The gain is largest for short, noisy, context-poor titles — exactly what oO gets from Todoist.
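
For reference when reading the ARI column: the adjusted Rand index scores agreement between predicted and reference cluster labels. It is 1.0 for identical partitions and near 0 (possibly slightly negative) for chance-level agreement. The sketch below is a minimal pure-Python version of the standard formula. The function name is illustrative, and it is not taken from the experiment's run.py, which may compute the metric differently (for example via a library).

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Pairs of items that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs of items that share a cluster in each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)
```

Label names do not matter, only the grouping: `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while `adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])` is -0.5. That is why the RAW score of -0.01 on real Todoist tasks reads as "no better than chance".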

What to port

- Before embedding a task title, call the LiteLLM tip-generator alias with the same prompt used in taskpile (v1): expand the title into a 3-sentence description of what the task involves, what context/skills it needs, and why it might matter.
- Prefix every embed input with "clustering: " (the nomic-embed-text task prefix).
- Cache expansions by content_hash so that each unique title is expanded only once per agent compute cycle (not persisted across calls; in-memory is fine for now).
- Route all LLM and embed calls through LiteLLM (the tip-generator and embedder aliases).
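
The steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the taskpile code: `PROMPT_V1` is a paraphrase of the prompt as described in this issue (the real v1 wording lives in taskpile), `content_hash` is assumed to be a SHA-256 of the title text, and the names `Enricher`, `embed_titles`, `expand_fn`, and `embed_fn` are hypothetical. The two callables are injected so the sketch stays independent of how the LiteLLM aliases are actually wired up.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

# nomic-embed-text task prefix, as stated in the issue.
EMBED_PREFIX = "clustering: "

# Assumption: paraphrase of the v1 prompt; the real text lives in taskpile.
PROMPT_V1 = (
    "Expand this task title into a 3-sentence description of what the task "
    "involves, what context/skills it needs, and why it might matter.\n"
    "Title: {title}"
)


def content_hash(title: str) -> str:
    """Assumed definition: SHA-256 hex digest of the raw title text."""
    return hashlib.sha256(title.encode("utf-8")).hexdigest()


class Enricher:
    """Expands titles via an LLM, caching results in memory by content_hash.

    Create one instance per agent compute cycle so the cache is not
    persisted across calls.
    """

    def __init__(self, expand_fn: Callable[[str], str]) -> None:
        self._expand_fn = expand_fn  # takes a prompt, returns the LLM's text
        self._cache: Dict[str, str] = {}

    def expand(self, title: str) -> str:
        key = content_hash(title)
        if key not in self._cache:
            self._cache[key] = self._expand_fn(PROMPT_V1.format(title=title))
        return self._cache[key]


def embed_titles(
    titles: Sequence[str],
    enricher: Enricher,
    embed_fn: Callable[[List[str]], List[List[float]]],
) -> List[List[float]]:
    """Expand each title, add the task prefix, then embed the batch."""
    inputs = [EMBED_PREFIX + enricher.expand(t) for t in titles]
    return embed_fn(inputs)
```

In a real integration, `expand_fn` would presumably wrap a chat-completion call to the tip-generator alias and `embed_fn` an embedding call to the embedder alias, both through LiteLLM. Injecting them also makes the caching and prefixing logic testable with stubs, without a running model.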

Reference

/home/alvis/taskpile/experiments/clustering_eval/run.py — full experiment with metrics.
/home/alvis/taskpile/experiments/clustering_eval/REPORT.md — results report.

alvis added this to the M2 — AI tips + multi-source signals milestone 2026-05-12 14:19:16 +00:00

alvis closed this issue

2026-05-12 14:22:43 +00:00

alvis referenced this issue from a commit

2026-05-12 15:10:11 +00:00

feat(clustering): LLM-enrichment before embedding (port from taskpile #129)

alvis referenced this issue from a commit

2026-05-12 15:10:11 +00:00

feat(clustering): persistent enrichment cache in task_enrichments table
