Files

Alvis 957360f6ce Restructure CLAUDE.md per official Claude Code recommendations

CLAUDE.md: 178→25 lines — commands + @ARCHITECTURE.md import only

Rules split into .claude/rules/ (load at startup, topic-scoped):
  llm-inference.md  — Bifrost-only, semaphore, model name format, timeouts
  agent-pipeline.md — tier rules, no tools in medium, memory outside loop
  fast-tools.md     — extension guide (path-scoped: fast_tools.py + agent.py)
  secrets.md        — .env keys, Vaultwarden, no hardcoding

Path-scoped rule: fast-tools.md only loads when editing fast_tools.py or agent.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-13 07:19:09 +00:00

1.2 KiB

Raw Blame History

Agent Pipeline Rules

Tiers

Complex tier requires /think prefix. Any LLM classification of "complex" is downgraded to medium. Do not change this.
Medium is the default tier. Light is only for trivial static-knowledge queries matched by regex or LLM.
Light tier upgrade to medium is automatic when URL content is pre-fetched or a fast tool matches.

Medium agent

_DirectModel makes a single ainvoke() call with no tool schema. Do not add tools to the medium agent.
qwen3:4b behaves unreliably when a tool array is present in the request — inject context via system prompt instead.

Memory

add_memory and search_memory are called directly in run_agent_task(), outside the agent loop.
Never add memory tools to any agent's tool list.
Memory storage (_store_memory) runs as an asyncio background task after the semaphore is released.

Fast tools

FastToolRunner.run_matching() runs in the pre-flight asyncio.gather alongside URL fetch and memory retrieval.
Fast tool results are injected as a system prompt block, not returned to the user directly.
When any_matches() is true, the router forces medium tier before LLM classification.

1.2 KiB Raw Blame History

Agent Pipeline Rules

Tiers

Medium agent

Memory

Fast tools

1.2 KiB

Raw Blame History