Restructure CLAUDE.md per official Claude Code recommendations

CLAUDE.md: 178 → 25 lines — commands + @ARCHITECTURE.md import only.

Rules split into .claude/rules/ (loaded at startup, topic-scoped):

- llm-inference.md — Bifrost-only, semaphore, model name format, timeouts
- agent-pipeline.md — tier rules, no tools in medium, memory outside loop
- fast-tools.md — extension guide (path-scoped: fast_tools.py + agent.py)
- secrets.md — .env keys, Vaultwarden, no hardcoding

Path-scoped rule: fast-tools.md only loads when editing fast_tools.py or agent.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

.claude/rules/agent-pipeline.md (new file, +20 lines)

# Agent Pipeline Rules

## Tiers

- Complex tier requires the `/think ` prefix. Any LLM classification of "complex" is downgraded to medium. Do not change this.
- Medium is the default tier. Light is only for trivial static-knowledge queries matched by regex or the LLM classifier.
- The light tier is upgraded to medium automatically when URL content is pre-fetched or a fast tool matches (sketched below).

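A minimal sketch of how these tier rules could be enforced in the router. The function and parameter names here are illustrative only, not the actual implementation in `agent.py`:

```python
# Illustrative sketch of the tier-selection rules above; names are hypothetical.
def select_tier(message: str, llm_label: str, url_prefetched: bool, fast_tool_matched: bool) -> str:
    # Complex tier is opt-in only, via the /think prefix.
    if message.startswith("/think "):
        return "complex"
    # Any LLM "complex" classification is downgraded to medium.
    if llm_label == "complex":
        return "medium"
    # Light auto-upgrades to medium when pre-flight work produced context.
    if llm_label == "light" and (url_prefetched or fast_tool_matched):
        return "medium"
    return llm_label if llm_label in ("light", "medium") else "medium"
```
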
## Medium agent

- `_DirectModel` makes a single `ainvoke()` call with no tool schema. Do not add tools to the medium agent.
- `qwen3:4b` behaves unreliably when a tool array is present in the request — inject context via the system prompt instead (see the sketch below).

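A minimal sketch of the no-tools invocation pattern, assuming a LangChain-style chat model behind Bifrost. `_DirectModel` itself may be implemented differently, and the `BIFROST_URL` env var name is an assumption:

```python
# Hypothetical sketch: a single ainvoke() with context injected via the system
# prompt rather than a tools array (qwen3:4b mishandles tool schemas).
import os

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

BIFROST_URL = os.environ["BIFROST_URL"]  # assumed env var; see llm-inference.md

async def direct_reply(user_message: str, context_block: str) -> str:
    model = ChatOpenAI(
        model="ollama/qwen3:4b",   # Bifrost-prefixed model name
        base_url=BIFROST_URL,      # never call Ollama directly for inference
        api_key="unused",
        timeout=180,               # medium-tier timeout
    )
    messages = [
        SystemMessage(content=f"Use this context if relevant:\n{context_block}"),
        HumanMessage(content=user_message),
    ]
    result = await model.ainvoke(messages)  # one call, no bind_tools()
    return result.content
```
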
## Memory

- `add_memory` and `search_memory` are called directly in `run_agent_task()`, outside the agent loop.
- Never add memory tools to any agent's tool list.
- Memory storage (`_store_memory`) runs as an asyncio background task after the semaphore is released, as sketched below.

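A sketch of how that ordering might look inside `run_agent_task()`. The surrounding helper names (`run_medium_agent`) and call signatures are assumptions, not the actual implementation:

```python
# Hypothetical shape of run_agent_task(): memory calls sit outside the agent
# loop, and storage is scheduled only after the GPU semaphore is released.
import asyncio

async def run_agent_task(message: str) -> str:
    memories = await search_memory(message)      # pre-flight, not an agent tool
    async with _reply_semaphore:                 # serializes GPU inference
        reply = await run_medium_agent(message, memories)  # hypothetical helper
    # Fire-and-forget storage after the semaphore is free, so it never blocks
    # the next inference request. _store_memory presumably wraps add_memory.
    asyncio.create_task(_store_memory(message, reply))
    return reply
```
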
## Fast tools

- `FastToolRunner.run_matching()` runs in the pre-flight `asyncio.gather` alongside URL fetch and memory retrieval (see the sketch below).
- Fast tool results are injected as a system prompt block, not returned to the user directly.
- When `any_matches()` is true, the router forces medium tier before LLM classification.

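A sketch of what the pre-flight gather could look like. Apart from `run_matching()` and `search_memory`, the helper names here are illustrative:

```python
# Hypothetical pre-flight phase: the three lookups run concurrently so fast-tool
# fetches, URL fetches, and memory retrieval don't add up serially before routing.
import asyncio

async def preflight(message: str, runner):
    fast_ctx, url_ctx, memories = await asyncio.gather(
        runner.run_matching(message),    # fast tool results
        fetch_urls_in(message),          # hypothetical URL pre-fetch helper
        search_memory(message),          # memory retrieval
    )
    # All three land in the system prompt block; none are exposed as agent tools.
    return fast_ctx, url_ctx, memories
```
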
.claude/rules/fast-tools.md (new file, +24 lines)

---
paths:
- "fast_tools.py"
- "agent.py"
---

# Fast Tools — Extension Guide

To add a new fast tool:

1. In `fast_tools.py`, subclass `FastTool` and implement:
   - `name` (str property) — unique identifier, used in logs
   - `matches(message: str) -> bool` — regex or simple logic; keep it cheap, since it runs on every message
   - `run(message: str) -> str` — async fetch; return a short context block or `""` on failure; never raise
2. In `agent.py`, add an instance to the `_fast_tool_runner` list (module level, after the env vars are defined).
3. The router automatically forces medium tier when `matches()` returns true — no router changes are needed.

A sketch of a new tool that follows these steps appears after the constraints below.

## Constraints

- `run()` must return in under 15s — it runs in the pre-flight gather that blocks routing.
- Return `""` or a `[tool error: ...]` string on failure — never raise exceptions.
- Keep returned context under ~1000 chars — larger contexts slow down `qwen3:4b` streaming significantly.
- The deepagents container has no direct external internet access. Use SearXNG (`host.docker.internal:11437`) or internal services.

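A minimal sketch of a new fast tool following the steps and constraints above. The example tool, the use of `httpx`, the SearXNG query path, and the use of a class attribute for `name` are assumptions; check them against the real `FastTool` base class in `fast_tools.py` before copying:

```python
# fast_tools.py — hypothetical example tool; FastTool is assumed to be defined
# in this module with the name/matches/run interface described above.
import re
import httpx

class WeatherFastTool(FastTool):
    name = "weather"  # unique id, used in logs (base class may expect a property)

    def matches(self, message: str) -> bool:
        # Cheap regex check; this runs on every incoming message.
        return bool(re.search(r"\bweather\b", message, re.IGNORECASE))

    async def run(self, message: str) -> str:
        try:
            # Stay well under the 15s pre-flight budget.
            async with httpx.AsyncClient(timeout=10) as client:
                resp = await client.get(
                    "http://host.docker.internal:11437/search",  # SearXNG; no direct internet
                    params={"q": message, "format": "json"},
                )
                resp.raise_for_status()
                top = resp.json().get("results", [])[:2]
                text = " ".join(r.get("content", "") for r in top)
                return text[:1000]            # keep the context block small
        except Exception as exc:
            return f"[tool error: {exc}]"     # never raise
```

Registration (step 2) would then be a single change in `agent.py`: adding a `WeatherFastTool()` instance to the module-level `_fast_tool_runner` list.
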
.claude/rules/llm-inference.md (new file, +7 lines)

# LLM Inference Rules

- All LLM calls must use `base_url=BIFROST_URL` with model name `ollama/<model>`. Never call Ollama directly for inference (see the sketch below).
- `_reply_semaphore` (`asyncio.Semaphore(1)`) serializes all GPU inference. Never bypass it or add a second semaphore.
- Model names in code always use the `ollama/` prefix: `ollama/qwen3:4b`, `ollama/qwen3:8b`, `ollama/qwen2.5:1.5b`.
- Timeout values: router = 30s, medium = 180s, complex = 600s. Do not reduce them — GPU inference under load is slow.
- `VRAMManager` is the only component that contacts Ollama directly (for flush/prewarm/poll). This is intentional — Bifrost cannot manage VRAM.

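A sketch of a call path that satisfies these rules, assuming Bifrost's OpenAI-compatible endpoint and the `openai` async client; the env var name and helper are assumptions rather than the project's actual wrapper:

```python
# Hypothetical helper enforcing the rules above: Bifrost base URL, the ollama/
# prefix, per-tier timeouts, and the single GPU semaphore around every call.
import asyncio
import os

from openai import AsyncOpenAI

BIFROST_URL = os.environ["BIFROST_URL"]          # assumed env var name
TIER_TIMEOUTS = {"router": 30, "medium": 180, "complex": 600}
_reply_semaphore = asyncio.Semaphore(1)          # one GPU job at a time

client = AsyncOpenAI(base_url=BIFROST_URL, api_key="unused")

async def infer(tier: str, model: str, prompt: str) -> str:
    assert model.startswith("ollama/"), "model names must use the ollama/ prefix"
    async with _reply_semaphore:                 # never bypass or duplicate this
        resp = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            timeout=TIER_TIMEOUTS[tier],
        )
    return resp.choices[0].message.content
```
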
.claude/rules/secrets.md (new file, +7 lines)

# Secrets and Environment

- `.env` is required at project root and must never be committed. It is in `.gitignore`.
- Required keys: `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTING_KEY` (a startup check is sketched below).
- `ROUTECHECK_TOKEN` is a shared secret between the `deepagents` and `routecheck` containers — generate it once with `python3 -c "import uuid; print(uuid.uuid4())"`.
- All tokens are stored in Vaultwarden (AI collection). Fetch them with `bw get password "<NAME>"` — see `~/.claude/CLAUDE.md` for the full procedure.
- Do not hardcode tokens, URLs, or credentials anywhere in source code.

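A sketch of a startup guard that fails fast when a required key is missing from `.env`; `python-dotenv` is an assumption about how the project loads its environment:

```python
# Hypothetical startup check: load .env and verify required keys are present,
# so missing secrets fail loudly at boot instead of mid-request.
import os

from dotenv import load_dotenv

REQUIRED_KEYS = ("TELEGRAM_BOT_TOKEN", "ROUTECHECK_TOKEN", "YANDEX_ROUTING_KEY")

load_dotenv()  # reads .env from the project root
missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    raise RuntimeError(f"Missing required .env keys: {', '.join(missing)}")
```
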