Benchmark: complex tier never triggered — 0% accuracy (40 queries) #10
Problem
All 40 complex-tier queries in the benchmark returned either `medium` or `?` (timeout). None of them was routed to `complex`. All complex benchmark queries used `dry_run=true`.
Root Cause
1. dry_run log format mismatch
In `agent.py` (`_run_agent_pipeline`), when `dry_run=True` and `tier=complex`, the pipeline logs the tier in a dry-run-specific format. The `extract_tier_from_logs` regex is `tier=(\w+(?:\s*\(dry-run\))?)`. It should match `complex (dry-run)` and then normalise it to `complex` via `.split()[0]`. However, if the log line differs even slightly from that format, or falls outside the tail window being scanned, extraction fails.
2. _COMPLEX_PATTERNS regex not matching
The complex regex requires `(?:^|\s)` before the keyword, and the example queries failed to match it. The regex uses `re.search()`, so anchoring is not the issue. Most likely, `изучи лучши` (roughly "study the best…", without "все", "all") is not in `_COMPLEX_PATTERNS`.
Fix
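Root cause 2 can be reproduced with a minimal sketch. It assumes a hypothetical stand-in for `_COMPLEX_PATTERNS` whose existing entry requires "все" (the real list is not reproduced in this issue). `OLD_COMPLEX_PATTERNS`, `NEW_COMPLEX_PATTERNS` and `is_complex` are illustrative names only. The sketch shows that, with `re.search()`, the `(?:^|\s)` prefix only requires the keyword to start the query or follow whitespace. It also shows that a query phrased without "все" is missed until the standalone trigger is added.

```python
import re

# Hypothetical stand-in for _COMPLEX_PATTERNS in agent.py; the real list is
# not shown in this issue. We assume the existing entry requires "все" ("all").
OLD_COMPLEX_PATTERNS = [
    re.compile(r"(?:^|\s)изучи все лучши", re.IGNORECASE),
]

# Proposed fix: also accept "изучи лучши" ("study the best...") without "все".
NEW_COMPLEX_PATTERNS = OLD_COMPLEX_PATTERNS + [
    re.compile(r"(?:^|\s)изучи лучши", re.IGNORECASE),
]


def is_complex(query: str, patterns: list) -> bool:
    """Return True if any pattern matches anywhere in the query.

    re.search() scans the whole string, so the (?:^|whitespace) prefix does
    not anchor the match to the start; it only requires the keyword to begin
    the query or follow a whitespace character.
    """
    return any(p.search(query) for p in patterns)


if __name__ == "__main__":
    queries = [
        "изучи все лучшие практики деплоя",     # "study all the best deployment practices"
        "изучи лучшие практики деплоя",         # same request without "все"
        "пожалуйста, изучи лучшие библиотеки",  # "please, study the best libraries"
    ]
    for q in queries:
        # Columns: old list, new list, query text.
        print(is_complex(q, OLD_COMPLEX_PATTERNS), is_complex(q, NEW_COMPLEX_PATTERNS), q)
```

With the old list, only the first query is classified as complex. With the new list, all three are. A word such as "переизучи" still does not match, because "изучи" there is not preceded by whitespace.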
- Add `изучи лучши` (without "все") as a standalone complex trigger.
- Log `tier=complex` on a separate line so extraction is unambiguous.
- Emit `[benchmark] session={id} tier={tier}` for reliable grep.
Impact
0/40 complex queries were correctly classified: the entire complex tier is effectively broken in the current routing.
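The logging fixes listed under Fix can be sketched as follows. `LEGACY_TIER_RE` is the `extract_tier_from_logs` regex quoted in root cause 1. Everything else (`extract_tier_legacy` and its `tail_lines` window, `format_benchmark_line`, `extract_tier_benchmark`) is a hypothetical stand-in, not the real function signature in `agent.py`. The sketch shows how a fixed-size tail window can miss the tier line. It also shows how a structured per-session line avoids both the `(dry-run)` normalisation and the dependence on that window.

```python
import re
from typing import Optional

# Regex quoted in the issue for extract_tier_from_logs.
LEGACY_TIER_RE = re.compile(r"tier=(\w+(?:\s*\(dry-run\))?)")

# Proposed structured line: "[benchmark] session={id} tier={tier}".
BENCHMARK_TIER_RE = re.compile(r"^\[benchmark\] session=(\S+) tier=(\w+)$", re.MULTILINE)


def format_benchmark_line(session_id: str, tier: str) -> str:
    """Emit the tier on its own line, with no dry-run suffix to strip."""
    return f"[benchmark] session={session_id} tier={tier}"


def extract_tier_legacy(log_text: str, tail_lines: int = 50) -> Optional[str]:
    """Sketch of the current approach: scan only the last tail_lines lines.

    tail_lines is a hypothetical stand-in for whatever window the real
    function uses; a tier line that falls outside it is silently missed.
    """
    tail = "\n".join(log_text.splitlines()[-tail_lines:])
    matches = LEGACY_TIER_RE.findall(tail)
    if not matches:
        return None
    # Normalise "complex (dry-run)" to "complex" via .split()[0].
    return matches[-1].split()[0]


def extract_tier_benchmark(log_text: str, session_id: str) -> Optional[str]:
    """Find the structured line for one session anywhere in the log."""
    for sid, tier in BENCHMARK_TIER_RE.findall(log_text):
        if sid == session_id:
            return tier
    return None
```

Take a log with one `tier=complex (dry-run)` line followed by 60 lines of other output. In that case, `extract_tier_legacy(log)` returns `None`, while `extract_tier_legacy(log, tail_lines=100)` returns `"complex"`. The structured line is found by session id regardless of where it appears.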