Files

Alvis 1f5e272600 Switch from Bifrost to LiteLLM; add Matrix channel; update rules

Infrastructure:
- docker-compose.yml: replace bifrost container with LiteLLM proxy
  (host.docker.internal:4000); complex model → deepseek-r1:free via
  OpenRouter; add Matrix URL env var; mount logs volume
- bifrost-config.json: add auth_config + postgres config_store (archived)

Routing:
- router.py: full semantic 3-tier classifier rewrite — nomic-embed-text
  centroids for light/medium/complex; regex pre-classifiers for all tiers;
  Russian utterance sets expanded
- agent.py: wire LiteLLM URL; add dry_run support; add Matrix channel

Channels:
- channels.py: add Matrix adapter (_matrix_send via mx- session prefix)

Rules / docs:
- agent-pipeline.md: remove /think prefix requirement; document automatic
  complex tier classification
- llm-inference.md: update BIFROST_URL → LITELLM_URL references; add
  remote model note for complex tier
- ARCHITECTURE.md: deleted (superseded by README.md)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-24 02:14:13 +00:00

1.4 KiB

Raw Blame History

Agent Pipeline Rules

Tiers

Routing is fully automatic: router classifies into light/medium/complex via 3-way embedding similarity.
Complex tier is reached automatically for deep research queries — no prefix required.
Medium is the default tier. Light is only for trivial static-knowledge queries matched by regex or embedding.
Light tier upgrade to medium is automatic when URL content is pre-fetched or a fast tool matches.
tier_override API parameter still allows callers to force a specific tier (e.g. adolf-deep model → complex).

Medium agent

_DirectModel makes a single ainvoke() call with no tool schema. Do not add tools to the medium agent.
qwen3:4b behaves unreliably when a tool array is present in the request — inject context via system prompt instead.

Memory

add_memory and search_memory are called directly in run_agent_task(), outside the agent loop.
Never add memory tools to any agent's tool list.
Memory storage (_store_memory) runs as an asyncio background task after the semaphore is released.

Fast tools

FastToolRunner.run_matching() runs in the pre-flight asyncio.gather alongside URL fetch and memory retrieval.
Fast tool results are injected as a system prompt block, not returned to the user directly.
When any_matches() is true, the router forces medium tier before LLM classification.

1.4 KiB Raw Blame History

Agent Pipeline Rules

Tiers

Medium agent

Memory

Fast tools

1.4 KiB

Raw Blame History