Restructure CLAUDE.md per official Claude Code recommendations

CLAUDE.md: 178→25 lines — commands + @ARCHITECTURE.md import only

Rules split into .claude/rules/ (load at startup, topic-scoped):
  llm-inference.md  — Bifrost-only, semaphore, model name format, timeouts
  agent-pipeline.md — tier rules, no tools in medium, memory outside loop
  fast-tools.md     — extension guide (path-scoped: fast_tools.py + agent.py)
  secrets.md        — .env keys, Vaultwarden, no hardcoding

Path-scoped rule: fast-tools.md only loads when editing fast_tools.py or agent.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alvis
2026-03-13 07:19:09 +00:00
parent 3ed47b45da
commit 957360f6ce
5 changed files with 60 additions and 18 deletions


@@ -0,0 +1,24 @@
---
paths:
- "fast_tools.py"
- "agent.py"
---
# Fast Tools — Extension Guide
To add a new fast tool:
1. In `fast_tools.py`, subclass `FastTool` and implement:
- `name` (str property): unique identifier, used in logs
- `matches(message: str) -> bool`: regex or other cheap logic; it runs on every message
- `async run(message: str) -> str`: async fetch; return a short context block, or `""` or a `[tool error: ...]` string on failure; never raise
2. In `agent.py`, add an instance to the `_fast_tool_runner` list (module level, after env vars are defined).
3. The router will automatically force medium tier when `matches()` returns true — no router changes needed.
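The steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `FastTool` base class here is a stand-in whose shape is assumed from the list above (with `run` taken to be a coroutine), and `WeatherTool` and its injected `fetch` callable are hypothetical names.

```python
import asyncio
import re
from abc import ABC, abstractmethod
from typing import Awaitable, Callable


class FastTool(ABC):
    """Stand-in for the base class in fast_tools.py (assumed shape)."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def matches(self, message: str) -> bool: ...

    @abstractmethod
    async def run(self, message: str) -> str: ...


class WeatherTool(FastTool):
    """Hypothetical tool: adds a weather snippet when the message mentions weather."""

    # Compiled once at class level so matches() stays cheap on every message.
    _PATTERN = re.compile(r"\bweather\b", re.IGNORECASE)

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        # `fetch` is a hypothetical injected async callable that does the lookup.
        self._fetch = fetch

    @property
    def name(self) -> str:
        return "weather"

    def matches(self, message: str) -> bool:
        return bool(self._PATTERN.search(message))

    async def run(self, message: str) -> str:
        # Never raise: convert any failure into a short error string.
        try:
            snippet = await self._fetch(message)
        except Exception as exc:
            return f"[tool error: {exc}]"
        # An empty result means "no context to add".
        return f"[weather] {snippet}" if snippet else ""
```

In `agent.py`, an instance such as `WeatherTool(fetch=...)` would then be added to the `_fast_tool_runner` list, as described in step 2.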
## Constraints
- `run()` must return in under 15s — it runs in the pre-flight gather that blocks routing.
- Return `""` or a `[tool error: ...]` string on failure — never raise exceptions.
- Keep returned context under ~1000 chars — larger contexts slow down `qwen3:4b` streaming significantly.
- The deepagents container has no direct external internet access. Use SearXNG (`host.docker.internal:11437`) or internal services instead.
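One way to honor the time, error, and size constraints together is a small guard around the fetch, sketched below. The helper name `run_guarded` and the two constants are hypothetical; only the 15s and ~1000 char limits come from the list above.

```python
import asyncio
from typing import Awaitable, Callable

RUN_TIMEOUT_S = 15.0      # upper bound for run(), per the constraints above
MAX_CONTEXT_CHARS = 1000  # approximate cap on returned context


async def run_guarded(
    run: Callable[[str], Awaitable[str]],
    message: str,
    timeout: float = RUN_TIMEOUT_S,
) -> str:
    """Await run(message) with a timeout; never raise; cap the output length."""
    try:
        result = await asyncio.wait_for(run(message), timeout=timeout)
    except asyncio.TimeoutError:
        return f"[tool error: timed out after {timeout:g}s]"
    except Exception as exc:
        return f"[tool error: {exc}]"
    # Truncate rather than fail so the model still gets some context.
    return result[:MAX_CONTEXT_CHARS]
```

A tool could call this inside its own `run()`, for example `return await run_guarded(self._fetch, message)`, so a slow or failing upstream never blocks routing past the limit.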