CLAUDE.md: 178→25 lines — commands + @ARCHITECTURE.md import only Rules split into .claude/rules/ (load at startup, topic-scoped): llm-inference.md — Bifrost-only, semaphore, model name format, timeouts agent-pipeline.md — tier rules, no tools in medium, memory outside loop fast-tools.md — extension guide (path-scoped: fast_tools.py + agent.py) secrets.md — .env keys, Vaultwarden, no hardcoding Path-scoped rule: fast-tools.md only loads when editing fast_tools.py or agent.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1.2 KiB
1.2 KiB
Agent Pipeline Rules
Tiers
- Complex tier requires
/thinkprefix. Any LLM classification of "complex" is downgraded to medium. Do not change this. - Medium is the default tier. Light is only for trivial static-knowledge queries matched by regex or LLM.
- Light tier upgrade to medium is automatic when URL content is pre-fetched or a fast tool matches.
Medium agent
_DirectModelmakes a singleainvoke()call with no tool schema. Do not add tools to the medium agent.qwen3:4bbehaves unreliably when a tool array is present in the request — inject context via system prompt instead.
Memory
add_memoryandsearch_memoryare called directly inrun_agent_task(), outside the agent loop.- Never add memory tools to any agent's tool list.
- Memory storage (
_store_memory) runs as an asyncio background task after the semaphore is released.
Fast tools
FastToolRunner.run_matching()runs in the pre-flightasyncio.gatheralongside URL fetch and memory retrieval.- Fast tool results are injected as a system prompt block, not returned to the user directly.
- When
any_matches()is true, the router forces medium tier before LLM classification.