adolf/CLAUDE.md
Alvis eba805f787 Update docs: fast tools, routecheck service, commute tool
- Request flow: add fast_tool_runner.run_matching() to pre-flight gather
- New Fast Tools section: WeatherTool + CommuteTool table, extension guide
- New routecheck section: captcha UI, internal API, proxy requirements
- Services table: add routecheck:8090
- Files tree: add fast_tools.py, routecheck/, updated .env note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 07:10:30 +00:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

Start all services:

docker compose up --build

Interactive CLI (Docker container, requires gateway running):

docker compose --profile tools run --rm -it cli
# or with options:
docker compose --profile tools run --rm -it cli python3 cli.py --url http://deepagents:8000 --session cli-alvis

Run integration tests (from tests/integration/, require all Docker services running):

python3 test_health.py                          # service health: deepagents, bifrost, Ollama, Qdrant, SearXNG

python3 test_memory.py                          # name store/recall + memory benchmark + dedup
python3 test_memory.py --name-only              # only name store/recall pipeline
python3 test_memory.py --bench-only             # only 5-fact store + 10-question recall
python3 test_memory.py --dedup-only             # only deduplication test

python3 test_routing.py                         # all routing benchmarks (easy + medium + hard)
python3 test_routing.py --easy-only             # light-tier routing benchmark
python3 test_routing.py --medium-only           # medium-tier routing benchmark
python3 test_routing.py --hard-only             # complex-tier + VRAM flush benchmark

Shared config and helpers are in tests/integration/common.py.

Use case tests (tests/use_cases/) are markdown skill files executed by Claude Code, which acts as mock user and quality evaluator. To run one, read the .md file and follow its steps with tools (Bash, WebFetch, etc.).

Architecture

Adolf is a multi-channel personal assistant. All LLM inference is routed through Bifrost, an open-source Go-based LLM gateway that adds retry logic, failover, and observability in front of Ollama.

Request flow

Channel adapter → POST /message {text, session_id, channel, user_id}
                → 202 Accepted (immediate)
                → background: run_agent_task()
                    → asyncio.gather(
                        _fetch_urls_from_message()          ← Crawl4AI, concurrent
                        _retrieve_memories()                 ← openmemory search, concurrent
                        _fast_tool_runner.run_matching()     ← FastTools (weather, commute), concurrent
                      )
                    → router.route() → tier decision (light/medium/complex)
                        fast tool match → force medium
                        if URL content fetched → upgrade light→medium
                    → invoke agent for tier via Bifrost (url_context + memories in system prompt)
                        deepagents:8000 → bifrost:8080/v1 → ollama:11436
                    → _push_stream_chunk() per token (medium streaming) / full reply (light, complex)
                        → _stream_queues[session_id] asyncio.Queue
                    → _end_stream() sends [DONE] sentinel
                    → channels.deliver(session_id, channel, reply)
                        → channel-specific callback (Telegram POST)
                    → _store_memory() background task (openmemory)
CLI streaming    → GET /stream/{session_id}  (SSE, per-token for medium, single-chunk for others)
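
The pre-flight step in the flow above can be sketched as follows. This is a minimal illustration, not the code in agent.py: the three coroutines are hypothetical stand-ins for _fetch_urls_from_message(), _retrieve_memories(), and _fast_tool_runner.run_matching(), whose real signatures are not shown in this document.

```python
import asyncio

# Hypothetical stand-ins for the three pre-flight steps; the real
# coroutines live in agent.py and may take different arguments.
async def fetch_urls(text: str) -> str:
    await asyncio.sleep(0.01)  # simulates a Crawl4AI round trip
    return "url context" if "http" in text else ""

async def retrieve_memories(text: str) -> str:
    await asyncio.sleep(0.01)  # simulates an openmemory search
    return "memories"

async def run_fast_tools(text: str) -> str:
    await asyncio.sleep(0.01)  # simulates a fast tool lookup
    return "weather context" if "weather" in text.lower() else ""

async def preflight(text: str) -> dict:
    # gather() runs all three concurrently and returns results
    # in the same order as its arguments.
    url_context, memories, fast_context = await asyncio.gather(
        fetch_urls(text), retrieve_memories(text), run_fast_tools(text)
    )
    return {
        "url_context": url_context,
        "memories": memories,
        "fast_context": fast_context,
    }

result = asyncio.run(preflight("What is the weather today?"))
```

Because the three steps are independent, total pre-flight latency is roughly that of the slowest step rather than the sum of all three.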

Bifrost integration

Bifrost (bifrost-config.json) is configured with the ollama provider pointing to the GPU Ollama instance on host port 11436. It exposes an OpenAI-compatible API at http://bifrost:8080/v1.

agent.py uses langchain_openai.ChatOpenAI with base_url=BIFROST_URL. Model names use the provider/model format that Bifrost expects: ollama/qwen3:4b, ollama/qwen3:8b, ollama/qwen2.5:1.5b. Bifrost strips the ollama/ prefix before forwarding to Ollama.
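
As a rough illustration of the provider/model naming convention, the helper below splits a name on its first slash. The function name split_model_name is hypothetical and is not part of Bifrost or this repository; it only shows what "stripping the ollama/ prefix" means for names that themselves contain a colon.

```python
def split_model_name(name: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model) on the first slash."""
    provider, sep, model = name.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {name!r}")
    return provider, model

provider, model = split_model_name("ollama/qwen3:4b")
```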

VRAMManager bypasses Bifrost and talks directly to Ollama via OLLAMA_BASE_URL (host:11436) for flush/poll/prewarm operations — Bifrost cannot manage GPU VRAM.

Three-tier routing (router.py, agent.py)

| Tier | Model (env var) | Trigger |
|------|-----------------|---------|
| light | qwen2.5:1.5b (DEEPAGENTS_ROUTER_MODEL) | Regex pre-match or LLM classifies "light"; answered by the router model directly, no agent invoked |
| medium | qwen3:4b (DEEPAGENTS_MODEL) | Default for tool-requiring queries |
| complex | qwen3:8b (DEEPAGENTS_COMPLEX_MODEL) | /think prefix only |

The router does regex pre-classification first, then LLM classification. Complex tier is blocked unless the message starts with /think — any LLM classification of "complex" is downgraded to medium.
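
The gating rules can be sketched as a single function. This is an assumption-laden illustration, not the code in router.py: the _LIGHT_RE pattern, the decide_tier name, and the order in which the rules are applied are all hypothetical, and the LLM label is passed in as a precomputed argument rather than obtained from a model call.

```python
import re

# Hypothetical pattern; the real regex pre-classifier is not shown here.
_LIGHT_RE = re.compile(r"^\s*(hi|hello|thanks|thank you)\b", re.IGNORECASE)

def decide_tier(text: str, llm_label: str,
                fast_tool_match: bool = False,
                url_content: bool = False) -> str:
    # Only an explicit /think prefix unlocks the complex tier.
    if text.lstrip().startswith("/think"):
        return "complex"
    # A matching fast tool forces the medium tier.
    if fast_tool_match:
        return "medium"
    # Regex pre-classification first, then the LLM's label.
    tier = "light" if _LIGHT_RE.search(text) else llm_label
    # Any LLM classification of "complex" is downgraded to medium.
    if tier == "complex":
        tier = "medium"
    # The light model cannot process fetched page content.
    if tier == "light" and url_content:
        tier = "medium"
    return tier
```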

A global asyncio.Semaphore(1) (_reply_semaphore) serializes all LLM inference — one request at a time.
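
The effect of a single-slot semaphore can be demonstrated in isolation. The sketch below uses a hypothetical infer() coroutine in place of a real model call and records the peak number of concurrent holders, which stays at one.

```python
import asyncio

async def main():
    sem = asyncio.Semaphore(1)  # one inference slot, as with _reply_semaphore
    active = 0
    peak = 0

    async def infer(i: int) -> int:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for the LLM call
            active -= 1
            return i

    results = await asyncio.gather(*(infer(i) for i in range(3)))
    return results, peak

results, peak = asyncio.run(main())
```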

Thinking mode and streaming

qwen3 models produce chain-of-thought <think>...</think> tokens. Handling differs by tier:

  • Medium (qwen3:4b): streams via astream(). A state machine (in_think flag) filters <think> blocks in real time — only non-think tokens are pushed to _stream_queues and displayed to the user.
  • Complex (qwen3:8b): create_deep_agent returns a complete reply; _strip_think() filters think blocks before the reply is pushed as a single chunk.
  • Router/light (qwen2.5:1.5b): no thinking support; _strip_think() used defensively.

_strip_think() in agent.py and router.py strips any <think> blocks from non-streaming output.
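
Both filtering styles can be sketched as below. This is an illustrative version under stated assumptions, not the repository's implementation: strip_think and ThinkFilter are hypothetical names, and the streaming filter assumes a tag may be split across chunk boundaries, so it holds back any tail that could be the start of a tag.

```python
import re

OPEN, CLOSE = "<think>", "</think>"

def strip_think(text: str) -> str:
    """Remove complete <think>...</think> blocks from a finished reply."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

class ThinkFilter:
    """Streaming filter: feed() returns only the text outside think blocks."""

    def __init__(self) -> None:
        self.in_think = False
        self.buf = ""

    def feed(self, chunk: str) -> str:
        self.buf += chunk
        out = []
        while self.buf:
            tag = CLOSE if self.in_think else OPEN
            idx = self.buf.find(tag)
            if idx != -1:
                # Full tag found: emit text before an opening tag, drop
                # text before a closing tag, then flip the state.
                if not self.in_think:
                    out.append(self.buf[:idx])
                self.buf = self.buf[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # No full tag: hold back a tail that could begin one.
            keep = 0
            for n in range(min(len(tag) - 1, len(self.buf)), 0, -1):
                if tag.startswith(self.buf[-n:]):
                    keep = n
                    break
            cut = len(self.buf) - keep
            if not self.in_think:
                out.append(self.buf[:cut])
            self.buf = self.buf[cut:]
            break
        return "".join(out)

    def flush(self) -> str:
        """Release held-back text at end of stream (empty inside a think block)."""
        tail = "" if self.in_think else self.buf
        self.buf = ""
        return tail

f = ThinkFilter()
chunks = ["<thi", "nk>plan", "</think>Hel", "lo"]
visible = "".join(f.feed(c) for c in chunks) + f.flush()
```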

VRAM management (vram_manager.py)

Hardware: GTX 1070 (8 GB). Before the 8b model runs, the medium models are flushed via Ollama keep_alive=0, then /api/ps is polled (15s timeout) to confirm eviction. If the poll times out, the request falls back to the medium tier. After a complex reply, the 8b model is flushed and the medium models are pre-warmed as a background task.
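
The poll-until-evicted step might look like the sketch below. It is not the code in vram_manager.py: wait_for_eviction is a hypothetical name, and the list_loaded callable is injected so the logic can be shown without a network call. In real use it would presumably query Ollama's /api/ps endpoint and return the names of the currently loaded models.

```python
import time
from typing import Callable, Iterable

def wait_for_eviction(
    list_loaded: Callable[[], Iterable[str]],
    models: Iterable[str],
    timeout: float = 15.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once none of `models` is loaded, False on timeout."""
    targets = set(models)
    deadline = clock() + timeout
    while True:
        if not targets & set(list_loaded()):
            return True   # all target models evicted
        if clock() >= deadline:
            return False  # caller falls back to the medium tier
        sleep(interval)

# Simulated /api/ps responses: loaded twice, then gone.
responses = iter([["qwen3:4b"], ["qwen3:4b"], []])
evicted = wait_for_eviction(lambda: next(responses), ["qwen3:4b"],
                            sleep=lambda s: None)
```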

Channel adapters (channels.py)

  • Telegram: Grammy Node.js bot (grammy/bot.mjs) long-polls Telegram → POST /message; replies delivered via POST grammy:3001/send
  • CLI: cli.py (Docker container, profiles: [tools]) posts to /message, then streams from GET /stream/{session_id} SSE with Rich Live display and final Markdown render.
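
On the client side, consuming the stream reduces to reading `data:` lines until the [DONE] sentinel. The sketch below assumes each chunk arrives as a single `data:` line, which this document does not confirm; iter_sse_data is a hypothetical helper, not a function from cli.py.

```python
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each data: line until the [DONE] sentinel."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE drops one leading space after the colon
        if payload == "[DONE]":
            return
        yield payload

stream = ["data: Hel\n", "\n", "data: lo\n", "\n", "data: [DONE]\n", "data: ignored\n"]
reply = "".join(iter_sse_data(stream))
```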

Session IDs: tg-<chat_id> for Telegram, cli-<username> for CLI. Conversation history: 5-turn buffer per session.

Services (docker-compose.yml)

| Service | Port | Role |
|---------|------|------|
| bifrost | 8080 | LLM gateway: retries, failover, observability; config from bifrost-config.json |
| deepagents | 8000 | FastAPI gateway + agent core |
| openmemory | 8765 | FastMCP server + mem0 memory tools (Qdrant-backed) |
| grammy | 3001 | grammY Telegram bot + /send HTTP endpoint |
| crawl4ai | 11235 | JS-rendered page fetching |
| routecheck | 8090 | Local routing web service: image captcha UI + Yandex Routing API backend |
| cli | | Interactive CLI container (profiles: [tools]), Rich streaming display |

External (from openai/ stack, host ports):

  • Ollama GPU: 11436 — all reply inference (via Bifrost) + VRAM management (direct)
  • Ollama CPU: 11435 — nomic-embed-text embeddings for openmemory
  • Qdrant: 6333 — vector store for memories
  • SearXNG: 11437 — web search

Bifrost config (bifrost-config.json)

The file is mounted into the bifrost container at /app/data/config.json. It declares one Ollama provider key pointing to host.docker.internal:11436 with 2 retries and 300s timeout. To add fallback providers or adjust weights, edit this file and restart the bifrost container.

Crawl4AI integration

Crawl4AI is used at three points in the pipeline:

  • Pre-routing (all tiers): _fetch_urls_from_message() detects URLs in any message via _URL_RE, fetches up to 3 URLs concurrently with _crawl4ai_fetch_async() (async httpx). URL content is injected as a system context block into enriched history before routing, and into the system prompt for medium/complex agents.
  • Tier upgrade: if URL content is successfully fetched, light tier is upgraded to medium (light model cannot process page content).
  • Complex agent tools: web_search (SearXNG + Crawl4AI auto-fetch of top 2 results) and fetch_url (single-URL Crawl4AI fetch) remain available for the complex agent's agentic loop. Complex tier also receives the pre-fetched content in system prompt to avoid redundant re-fetching.
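
URL detection with a cap of three can be sketched as follows. The pattern here is a simplified, hypothetical stand-in for the real _URL_RE in agent.py, and both the de-duplication and the trimming of trailing punctuation are assumptions added for the example.

```python
import re

# Simplified stand-in for the real pattern, which is not shown here.
_URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")
MAX_URLS = 3  # the pre-routing step fetches at most three URLs

def extract_urls(text: str) -> list:
    """Return up to MAX_URLS distinct URLs in order of appearance."""
    found = []
    for match in _URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:!?")  # drop sentence punctuation
        if url not in found:
            found.append(url)
        if len(found) == MAX_URLS:
            break
    return found

urls = extract_urls(
    "see https://a.example/x, https://b.example and https://a.example/x "
    "then https://c.example. https://d.example"
)
```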

MCP tools from openmemory (add_memory, search_memory, get_all_memories) are excluded from agent tools — memory management is handled outside the agent loop.

Fast Tools (fast_tools.py)

Pre-flight tools that run before the LLM in the asyncio.gather alongside URL fetch and memory retrieval. Each tool has a regex matches() classifier and an async run() that returns a context string injected into the system prompt. The router uses FastToolRunner.any_matches() to force medium tier when a tool matches.

| Tool | Trigger | Data source |
|------|---------|-------------|
| WeatherTool | weather/forecast/temperature keywords | SearXNG query "погода Балашиха сейчас" ("weather Balashikha now"); Russian sources return °C |
| CommuteTool | commute/traffic/arrival time keywords | routecheck:8090/api/route (Yandex Routing API, Balashikha→Moscow center) |

To add a new fast tool: subclass FastTool in fast_tools.py, add an instance to _fast_tool_runner in agent.py.
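
A new tool might look like the sketch below. The FastTool and FastToolRunner classes are redeclared here only so the example is self-contained; their real definitions in fast_tools.py may differ, and the method signatures are assumptions based on the names mentioned above. ClockTool is a hypothetical example tool.

```python
import asyncio
import re
from abc import ABC, abstractmethod

class FastTool(ABC):
    """Assumed shape of the base class: a classifier plus an async runner."""
    name: str

    @abstractmethod
    def matches(self, text: str) -> bool: ...

    @abstractmethod
    async def run(self, text: str) -> str: ...

class ClockTool(FastTool):
    """Hypothetical tool that triggers on questions about the time."""
    name = "clock"
    _RE = re.compile(r"\bwhat time\b|\bcurrent time\b", re.IGNORECASE)

    def matches(self, text: str) -> bool:
        return bool(self._RE.search(text))

    async def run(self, text: str) -> str:
        # A real tool would query a data source here.
        return "[clock] time lookup result would go here"

class FastToolRunner:
    """Assumed shape of the runner used by the router and the gather."""

    def __init__(self, tools):
        self.tools = list(tools)

    def any_matches(self, text: str) -> bool:
        return any(t.matches(text) for t in self.tools)

    async def run_matching(self, text: str) -> str:
        matched = [t for t in self.tools if t.matches(text)]
        results = await asyncio.gather(*(t.run(text) for t in matched))
        return "\n".join(r for r in results if r)

runner = FastToolRunner([ClockTool()])
context = asyncio.run(runner.run_matching("What time is it?"))
```

Keeping matches() a cheap regex check means the router can call any_matches() synchronously without waiting on any data source.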

routecheck service (routecheck/app.py)

Local web service that exposes Yandex Routing API behind an image captcha. Two access paths:

  • Web UI (localhost:8090): solve PIL-generated arithmetic captcha → query any two lat/lon points
  • Internal API: GET /api/route?from=lat,lon&to=lat,lon&token=ROUTECHECK_TOKEN — bypasses captcha, used by CommuteTool

Requires .env: YANDEX_ROUTING_KEY (free tier from developer.tech.yandex.ru) and ROUTECHECK_TOKEN. The container routes Yandex API calls through the host HTTPS proxy (host.docker.internal:56928).
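
A caller of the internal API could build the request URL as in the sketch below. The build_route_url helper is hypothetical, and the coordinates and token are placeholder values, not ones taken from the repository.

```python
from urllib.parse import urlencode

def build_route_url(base: str, origin: tuple, dest: tuple, token: str) -> str:
    """Build GET /api/route?from=lat,lon&to=lat,lon&token=..."""
    params = {
        "from": f"{origin[0]},{origin[1]}",
        "to": f"{dest[0]},{dest[1]}",
        "token": token,
    }
    # safe="," keeps the comma in lat,lon pairs unescaped.
    return f"{base}/api/route?{urlencode(params, safe=',')}"

# Placeholder coordinates and token, for illustration only.
url = build_route_url("http://routecheck:8090", (55.8, 37.95), (55.75, 37.62), "secret")
```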

Medium vs Complex agent

| Agent | Builder | Speed | Use case |
|-------|---------|-------|----------|
| medium | _DirectModel (single LLM call, no tools) | ~3s | General questions, conversation |
| complex | create_deep_agent (deepagents) | Slow (multi-step planner) | Deep research via /think prefix |

Key files

  • agent.py — FastAPI app, lifespan wiring, run_agent_task(), Crawl4AI pre-fetch, fast tools, memory pipeline, all endpoints
  • fast_tools.py — FastTool base class, FastToolRunner, WeatherTool, CommuteTool
  • routecheck/app.py — captcha UI + /api/route Yandex proxy
  • bifrost-config.json — Bifrost provider config (Ollama GPU, retries, timeouts)
  • channels.py — channel registry and deliver() dispatcher
  • router.py — Router class: regex + LLM classification, light-tier reply generation
  • vram_manager.py — VRAMManager: flush/poll/prewarm Ollama VRAM directly
  • agent_factory.py — build_medium_agent (_DirectModel, single call) / build_complex_agent (create_deep_agent)
  • openmemory/server.py — FastMCP + mem0 config with custom extraction/dedup prompts
  • wiki_research.py — batch research pipeline using /message + SSE polling
  • grammy/bot.mjs — Telegram long-poll + HTTP /send endpoint