feat(clustering): LLM-enrichment before embedding for better semantic clustering #129
Background
Taskpile ran a controlled experiment comparing raw-title embeddings against LLM-expanded descriptions generated before embedding (qwen2.5:1.5b → nomic-embed-text). Both variants were evaluated on the exact same task corpus. The gain from expansion is largest for short, noisy, context-poor titles, which is exactly what oO gets from Todoist.
What to port
- Expand each title via the `tip-generator` alias with the same prompt used in taskpile (v1): expand the title into a 3-sentence description of what the task involves, what context/skills it needs, and why it might matter.
- Prefix the text to embed with `"clustering: "` (the nomic-embed-text task prefix).
- Cache expansions by `content_hash` so each unique title is only expanded once per agent compute cycle (not persisted across calls; in-memory is fine for now).
- ... (`tip-generator` and `embedder` aliases).

Reference
- `/home/alvis/taskpile/experiments/clustering_eval/run.py`: full experiment with metrics.
- `/home/alvis/taskpile/experiments/clustering_eval/REPORT.md`: results report.
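
The steps under "What to port" could be sketched roughly as follows. This is a minimal illustration, not the taskpile implementation: `enrich_and_embed`, `content_hash`, and the `expand` / `embed` callables are hypothetical names, the prompt template is paraphrased from the text above, and hashing with SHA-256 is an assumption. The two callables stand in for whatever client ends up calling the `tip-generator` and `embedder` aliases.

```python
import hashlib
from typing import Callable, Dict, List

# Prompt wording paraphrased from this issue; the exact taskpile (v1) template may differ.
EXPAND_PROMPT = (
    "Expand the title into a 3-sentence description of what the task involves, "
    "what context/skills it needs, and why it might matter.\n\nTitle: {title}"
)

# Task prefix that nomic-embed-text expects for clustering workloads.
CLUSTERING_PREFIX = "clustering: "


def content_hash(text: str) -> str:
    """Cache key for a title (assumption: SHA-256 of the stripped UTF-8 text)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def enrich_and_embed(
    titles: List[str],
    expand: Callable[[str], str],                    # hypothetical: prompt -> LLM text
    embed: Callable[[List[str]], List[List[float]]], # hypothetical: texts -> vectors
) -> List[List[float]]:
    """Expand each unique title once, add the clustering prefix, then embed."""
    cache: Dict[str, str] = {}  # lives only for this call, so nothing is persisted
    texts: List[str] = []
    for title in titles:
        key = content_hash(title)
        if key not in cache:
            cache[key] = expand(EXPAND_PROMPT.format(title=title))
        texts.append(CLUSTERING_PREFIX + cache[key])
    return embed(texts)
```

Because the cache is created inside the function, duplicate titles within one compute cycle trigger a single expansion, and everything is dropped once the call returns, which matches the "in-memory is fine for now" note.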