Restructure CLAUDE.md per official Claude Code recommendations

CLAUDE.md: 178→25 lines — commands + @ARCHITECTURE.md import only

Rules split into .claude/rules/ (load at startup, topic-scoped):
  llm-inference.md  — Bifrost-only, semaphore, model name format, timeouts
  agent-pipeline.md — tier rules, no tools in medium, memory outside loop
  fast-tools.md     — extension guide (path-scoped: fast_tools.py + agent.py)
  secrets.md        — .env keys, Vaultwarden, no hardcoding

Path-scoped rule: fast-tools.md only loads when editing fast_tools.py or agent.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Alvis
Date: 2026-03-13 07:19:09 +00:00
parent 3ed47b45da
commit 957360f6ce
5 changed files with 60 additions and 18 deletions


@@ -11,31 +11,15 @@ docker compose up --build
# Interactive CLI (requires services running)
docker compose --profile tools run --rm -it cli
# Integration tests run from tests/integration/, require all services up
python3 test_health.py
python3 test_memory.py [--name-only|--bench-only|--dedup-only]
python3 test_routing.py [--easy-only|--medium-only|--hard-only]
# Use case tests — read the .md file and follow its steps as Claude Code
# example: read tests/use_cases/weather_now.md and execute it
```
## Key Conventions
- **Models via Bifrost only** — all LLM calls use `base_url=BIFROST_URL` with `ollama/<model>` prefix. Never call Ollama directly for inference.
- **One inference at a time** — `_reply_semaphore` serializes GPU use. Do not bypass it.
- **No tools in medium agent** — `_DirectModel` is a plain `ainvoke()` call. Context is injected via system prompt. `qwen3:4b` is unreliable with tool schemas.
- **Fast tools are pre-flight** — `FastToolRunner` runs before routing and before any LLM call. Results are injected as context, not returned to the user directly.
- **Memory outside agent loop** — `add_memory`/`search_memory` are called directly, never passed to agent tool lists.
- **Complex tier is opt-in** — `/think ` prefix only. LLM classification of "complex" is always downgraded to medium.
- **`.env` is required** — `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTING_KEY`. Never commit it.
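The Bifrost-only and one-inference-at-a-time conventions above can be sketched together. This is a hypothetical illustration, not the project's actual code: `BIFROST_URL`, the `reply` helper, and the stubbed network call are assumptions; only the `_reply_semaphore` name and the `ollama/<model>` prefix come from the conventions listed.

```python
# Hypothetical sketch of the Bifrost + semaphore conventions.
# BIFROST_URL and the client wiring are assumptions, not project code.
import asyncio
import os

BIFROST_URL = os.environ.get("BIFROST_URL", "http://bifrost:8080/v1")

# One inference at a time: a single semaphore serializes GPU use.
_reply_semaphore = asyncio.Semaphore(1)

async def reply(prompt: str, model: str = "ollama/qwen3:4b") -> str:
    """Route every LLM call through Bifrost, serialized by the semaphore."""
    async with _reply_semaphore:
        # Real code would call an OpenAI-compatible client with
        # base_url=BIFROST_URL; stubbed here so the sketch is runnable.
        await asyncio.sleep(0)  # placeholder for the network round-trip
        return f"[{model} via {BIFROST_URL}] {prompt}"

if __name__ == "__main__":
    print(asyncio.run(reply("ping")))
```

The point of the pattern: callers never name a direct Ollama endpoint, and bypassing `_reply_semaphore` is impossible as long as all inference funnels through the one guarded helper.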
## Adding a Fast Tool
1. Subclass `FastTool` in `fast_tools.py` — implement `name`, `matches(message) → bool`, `run(message) → str`
2. Add instance to `_fast_tool_runner` list in `agent.py`
3. The router will automatically force medium tier when `matches()` returns `True`
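The three steps above can be sketched as follows. The `FastTool` base class and `_fast_tool_runner` list here are minimal stand-ins for what `fast_tools.py` and `agent.py` actually define; `EchoTimeTool` is an invented example, not a real project tool.

```python
# Hypothetical sketch of the FastTool contract; the base class and runner
# wiring are assumptions standing in for fast_tools.py / agent.py.
from abc import ABC, abstractmethod

class FastTool(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def matches(self, message: str) -> bool: ...

    @abstractmethod
    def run(self, message: str) -> str: ...

class EchoTimeTool(FastTool):
    """Step 1: subclass with name, matches(), run(). Invented example."""

    @property
    def name(self) -> str:
        return "echo_time"

    def matches(self, message: str) -> bool:
        return "time" in message.lower()

    def run(self, message: str) -> str:
        # Result is injected as context, never returned to the user directly.
        return "context: current time would be injected here"

# Step 2: register the instance with the runner list (lives in agent.py).
_fast_tool_runner = [EchoTimeTool()]
```

Step 3 needs no code: once registered, a `True` from `matches()` is what forces the router to medium tier.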
## Architecture
@ARCHITECTURE.md