# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Start all services
docker compose up --build

# Interactive CLI (requires services running)
docker compose --profile tools run --rm -it cli

# Integration tests (run from tests/integration/, requires all services)
python3 test_health.py
python3 test_memory.py [--name-only|--bench-only|--dedup-only]
python3 test_routing.py [--easy-only|--medium-only|--hard-only]

# Use case tests — read the .md file and follow its steps as Claude Code
# e.g.: read tests/use_cases/weather_now.md and execute it
```

## Key Conventions

- **Models via Bifrost only** — all LLM calls use `base_url=BIFROST_URL` with the `ollama/` prefix. Never call Ollama directly for inference.
- **One inference at a time** — `_reply_semaphore` serializes GPU use. Do not bypass it.
- **No tools in medium agent** — `_DirectModel` is a plain `ainvoke()` call. Context is injected via the system prompt. `qwen3:4b` is unreliable with tool schemas.
- **Fast tools are pre-flight** — `FastToolRunner` runs before routing and before any LLM call. Results are injected as context, not returned to the user directly.
- **Memory outside agent loop** — `add_memory`/`search_memory` are called directly, never passed to agent tool lists.
- **Complex tier is opt-in** — `/think ` prefix only. An LLM classification of "complex" is always downgraded to medium.
- **`.env` is required** — `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTING_KEY`. Never commit it.

## Adding a Fast Tool

1. Subclass `FastTool` in `fast_tools.py` — implement `name`, `matches(message) → bool`, and `run(message) → str`.
2. Add an instance to the `_fast_tool_runner` list in `agent.py`.
3. The router will automatically force the medium tier when `matches()` returns true.

## Architecture

@ARCHITECTURE.md