No GPU inference happens in this mode, so serialization is not needed. Without this, timed-out routing benchmark queries hold the semaphore and cascade-block all subsequent queries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
31 KiB
31 KiB