oO/ml/agents/tests/test_clustering.py at 08d08ad7b0ed1d8da628a4cb038f00da9b15e579

alvis/oO

alvis 08d08ad7b0 feat(clustering): LLM-enrichment before embedding (port from taskpile #129)

Ported from taskpile experiments/clustering_eval (prompt v1, qwen2.5:1.5b).
The experiment showed ARI 0.22→0.77 and AUROC 0.76→0.91 on synthetic tasks
when embedding LLM-expanded descriptions instead of raw titles.

- Expand each task title via the LiteLLM tip-generator before embedding
- Prefix the text to embed with "clustering: " (the nomic-embed-text task-instruction prefix)
- Cache expansions in memory, keyed by content hash, within a compute cycle
- Fall back to the raw title if enrichment fails; existing fallback behaviour is unchanged
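A minimal sketch of the enrichment step described in the bullets above, assuming the LLM call can be injected as a plain function. `enrich_titles`, `expand` and `CLUSTERING_PREFIX` are hypothetical names for illustration, not identifiers from this repository, and the sketch assumes the "clustering: " prefix is applied to the fallback title as well as to successful expansions.

```python
import hashlib
from typing import Callable, Dict, List

# Task-instruction prefix named in the commit message (nomic-embed-text).
CLUSTERING_PREFIX = "clustering: "


def _content_hash(text: str) -> str:
    """Stable cache key for a title's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def enrich_titles(titles: List[str], expand: Callable[[str], str]) -> List[str]:
    """Return the texts to embed for one compute cycle.

    `expand` is a hypothetical stand-in for the LiteLLM tip-generator call.
    The cache is local to this call, so it lives for one compute cycle only.
    """
    cache: Dict[str, str] = {}
    texts: List[str] = []
    for title in titles:
        key = _content_hash(title)
        if key not in cache:
            try:
                expanded = expand(title).strip()
            except Exception:
                # Enrichment failed (e.g. LLM unreachable); fall back below.
                expanded = ""
            # Use the raw title when enrichment failed or returned nothing.
            # The fallback is cached too, so duplicates are not retried.
            cache[key] = expanded or title
        texts.append(CLUSTERING_PREFIX + cache[key])
    return texts
```

Injecting `expand` keeps a test like this one offline and deterministic: a fake expander can count calls to confirm that duplicate titles hit the cache, and can raise to exercise the raw-title fallback.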

Fixes #129

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-12 14:20:48 +00:00

7.8 KiB