Tests routing accuracy for all tiers with no_inference=True hardcoded.
Fast (QUERY_TIMEOUT=30s), no GPU check, shares benchmark.json dataset.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove reversed() from extract_tier_from_logs: first match = routing decision
(dry-run complex logs tier=complex early, then overwrites with tier=medium at done)
- Increase log tail from 80→300 to handle concurrent log activity
Fixes#7, #10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>