2026-01-10 - 2026-04-10
Overview
7 Pull requests merged by 1 user
Merged
#17 feat: rename dry_run to no_inference for all tiers
Merged
#18 feat: rename --dry-run to --no-inference in run_benchmark.py
Merged
#19 feat: add run_routing_benchmark.py — routing-only benchmark
Merged
#13 Fix routing: add Russian tech def patterns to light, strengthen medium smart home
Merged
#14 Remove Bifrost: replace test 4 with LiteLLM health check
Merged
#11 Fix tier logging: capture actual_tier, fix parse_run_block regex, remove reply_text truncation
Merged
#12 Fix benchmark log extraction: first tier match, increase log tail to 300
10 Issues closed by 1 user
Closed
#9 Benchmark: smart home commands (medium) mis-routed to light
Closed
#3 Fix reply_text[:200] truncation breaking bench keyword matching
Closed
#4 Fix actual_tier never updated from "unknown" in run_agent_task
Closed
#10 Benchmark: complex tier never triggered — 0% accuracy (40 queries)
Closed
#8 Benchmark: light tier over-classified as medium (tech definition queries)
Closed
#5 Remove or replace Bifrost test in test_memory.py
Closed
#2 Verify [agent] running: log anchor still emitted in new agent.py
Closed
#6 Verify POST /chat still accepts {message, chat_id} after agent.py refactor
Closed
#1 Fix parse_run_block regex to match new log format
Closed
#7 Benchmark: ~50 queries return "?" due to tier= log extraction timeout
10 Issues created by 0 users
Opened
#2 Verify [agent] running: log anchor still emitted in new agent.py
Opened
#6 Verify POST /chat still accepts {message, chat_id} after agent.py refactor
Opened
#5 Remove or replace Bifrost test in test_memory.py
Opened
#1 Fix parse_run_block regex to match new log format
Opened
#3 Fix reply_text[:200] truncation breaking bench keyword matching
Opened
#4 Fix actual_tier never updated from "unknown" in run_agent_task
Opened
#7 Benchmark: ~50 queries return "?" due to tier= log extraction timeout
Opened
#8 Benchmark: light tier over-classified as medium (tech definition queries)
Opened
#9 Benchmark: smart home commands (medium) mis-routed to light
Opened
#10 Benchmark: complex tier never triggered — 0% accuracy (40 queries)