Benchmark: light tier over-classified as medium (tech definition queries) #8
Problem
Queries that should be `light` (static tech definitions) are routed to `medium`.
Root Cause
The `_LIGHT_PATTERNS` regex only matches exact greetings/acks. Everything else falls through to the semantic embedder. The embedder centroids overlap for short Russian tech definitions: they embed similarly to medium-tier queries because both are short question forms.
Also, `_LIGHT_PATTERNS` uses `^...$` anchoring, but the text includes trailing punctuation (`?`, `!`) which may not always match the `[\s!.?]*$` suffix.
Fix
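- Extend `_LIGHT_PATTERNS` to catch `что такое <term>` ("what is <term>") and `что означает <term>` ("what does <term> mean") patterns directly
- Extend `_LIGHT_UTTERANCES` to pull the light centroid closer
- Add `сколько <unit> в <unit>` ("how many <unit> in <unit>") patterns
Impact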
~10 light queries misclassified as medium in latest run
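The regex part of the fix could look roughly like the sketch below. The real `_LIGHT_PATTERNS` is not shown in this issue, so the greeting alternatives, the `is_light` helper, and the three-word cap on `<term>` are all assumptions made for illustration. Only the `что такое` / `что означает` / `сколько <unit> в <unit>` forms and the `[\s!.?]*$` suffix come from the description above.

```python
import re

# Hypothetical stand-in for the existing greeting/ack alternatives;
# the actual list in the repository may differ.
_GREETINGS = r"(?:привет|спасибо|ок|hi|hello|thanks|ok)"

# Proposed: definition questions. The cap of three words per <term> is an
# assumption, meant to keep longer compound questions out of the light tier.
_DEFINITIONS = r"(?:что\s+такое|что\s+означает)\s+\S+?(?:\s+\S+?){0,2}"

# Proposed: unit-conversion questions of the form "сколько <unit> в <unit>".
_CONVERSIONS = r"сколько\s+\S+\s+в\s+\S+?"

# Trailing whitespace and punctuation are absorbed by the suffix before `$`.
_LIGHT_PATTERNS = re.compile(
    rf"^\s*(?:{_GREETINGS}|{_DEFINITIONS}|{_CONVERSIONS})[\s!.?]*$",
    re.IGNORECASE,
)


def is_light(text: str) -> bool:
    """Return True when the query matches a light-tier pattern directly."""
    return _LIGHT_PATTERNS.match(text) is not None
```

Queries that match never reach the semantic embedder, so the centroid overlap no longer affects them. Anything that does not match still falls through to the embedder as before.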