feat: rename --dry-run to --no-inference in run_benchmark.py
feat: add run_routing_benchmark.py — routing-only benchmark
feat: rename dry_run to no_inference for all tiers
Fix actual_tier never updated from "unknown" in run_agent_task
Benchmark: smart home commands (medium) mis-routed to light
Fix reply_text[:200] truncation breaking bench keyword matching
Benchmark: complex tier never triggered — 0% accuracy (40 queries)
Benchmark: light tier over-classified as medium (tech definition queries)
Fix routing: add Russian tech def patterns to light, strengthen medium smart home
Remove Bifrost: replace test 4 with LiteLLM health check
Remove or replace Bifrost test in test_memory.py
Fix tier logging: capture actual_tier, fix parse_run_block regex, remove reply_text truncation