QUERY_TIMEOUT=1s — classification and routing must complete within
1 second or the query is recorded as 'timeout'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests routing accuracy for all tiers with no_inference=True hardcoded.
Fast (QUERY_TIMEOUT=30s), no GPU check, shares benchmark.json dataset.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>