Infrastructure: - docker-compose.yml: replace bifrost container with LiteLLM proxy (host.docker.internal:4000); complex model → deepseek-r1:free via OpenRouter; add Matrix URL env var; mount logs volume - bifrost-config.json: add auth_config + postgres config_store (archived) Routing: - router.py: full semantic 3-tier classifier rewrite — nomic-embed-text centroids for light/medium/complex; regex pre-classifiers for all tiers; Russian utterance sets expanded - agent.py: wire LiteLLM URL; add dry_run support; add Matrix channel Channels: - channels.py: add Matrix adapter (_matrix_send via mx- session prefix) Rules / docs: - agent-pipeline.md: remove /think prefix requirement; document automatic complex tier classification - llm-inference.md: update BIFROST_URL → LITELLM_URL references; add remote model note for complex tier - ARCHITECTURE.md: deleted (superseded by README.md) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1.4 KiB
1.4 KiB
Agent Pipeline Rules
Tiers
- Routing is fully automatic: router classifies into light/medium/complex via 3-way embedding similarity.
- Complex tier is reached automatically for deep research queries — no prefix required.
- Medium is the default tier. Light is only for trivial static-knowledge queries matched by regex or embedding.
- Light tier upgrade to medium is automatic when URL content is pre-fetched or a fast tool matches.
tier_overrideAPI parameter still allows callers to force a specific tier (e.g.adolf-deepmodel → complex).
Medium agent
_DirectModelmakes a singleainvoke()call with no tool schema. Do not add tools to the medium agent.qwen3:4bbehaves unreliably when a tool array is present in the request — inject context via system prompt instead.
Memory
add_memoryandsearch_memoryare called directly inrun_agent_task(), outside the agent loop.- Never add memory tools to any agent's tool list.
- Memory storage (
_store_memory) runs as an asyncio background task after the semaphore is released.
Fast tools
FastToolRunner.run_matching()runs in the pre-flightasyncio.gatheralongside URL fetch and memory retrieval.- Fast tool results are injected as a system prompt block, not returned to the user directly.
- When
any_matches()is true, the router forces medium tier before LLM classification.