alvis edited this page 2026-03-24 02:13:06 +00:00

Adolf

Autonomous personal assistant reachable via Telegram and CLI. Three-tier model routing with GPU VRAM management and long-term memory.

Architecture

Telegram / CLI
     ↕
[grammy] Node.js — port 3001      [cli] Python Rich REPL
  grammY long-poll → POST /message    POST /message + GET /stream SSE
     ↓
[deepagents] Python FastAPI — port 8000
     ↓
Pre-flight (asyncio.gather — all parallel):
  - URL fetch (Crawl4AI)
  - Memory retrieval (openmemory)
  - Fast tools (WeatherTool, CommuteTool)
     ↓
Fast tool matched? → deliver reply directly (no LLM)
     ↓ (if no fast tool)
Router (qwen2.5:1.5b + nomic-embed-text)
  - light:   simple/conversational → router answers directly (~2–4s)
  - medium:  default → qwen3:4b single call (~10–20s)
  - complex: deep research queries → remote model + web_search + fetch_url (~60–120s)
     ↓
channels.deliver() → Telegram / CLI SSE stream
     ↓
asyncio.create_task(_store_memory()) — background
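The pre-flight fan-out and fast-tool short-circuit in the diagram can be sketched as follows. This is a minimal sketch: `fetch_urls`, `search_memory`, and `run_fast_tools` are illustrative stand-ins, not the real function names in agent.py.

```python
import asyncio

# Illustrative stubs for the three pre-flight coroutines (assumed names).
async def fetch_urls(text):
    return None  # real system: Crawl4AI fetches any URLs found in the message

async def search_memory(text):
    return ["user commutes from Khimki"]  # real system: openmemory retrieval

async def run_fast_tools(text):
    # Real system: WeatherTool / CommuteTool regex gates.
    return "Sunny, +5°C" if "weather" in text else None

async def preflight(text):
    # All three steps run concurrently, as in the diagram (asyncio.gather).
    pages, memories, fast_reply = await asyncio.gather(
        fetch_urls(text), search_memory(text), run_fast_tools(text)
    )
    if fast_reply is not None:
        return fast_reply  # short-circuit: deliver directly, no LLM call
    return None  # fall through to the router and model tiers

print(asyncio.run(preflight("what's the weather today?")))  # → Sunny, +5°C
```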

Three-Tier Model Routing

Tier Model Trigger Latency
Fast — (no LLM) Fast tool matched (weather, commute) ~1s
Light qwen2.5:1.5b (router) Regex or embedding classifies "light" ~2–4s
Medium qwen3:4b Default ~10–20s
Complex deepseek-r1 (remote via LiteLLM) Regex pre-classifier or embedding similarity ~60–120s

Routing is automatic — no prefix needed. The complex tier triggers on Russian research keywords (исследуй "research", изучи все "study everything", напиши подробный "write a detailed …", etc.) and on embedding similarity. Force the complex tier via the adolf-deep model name in the OpenAI-compatible API.
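The two-stage routing decision (cheap regex pre-classifiers first, embedding similarity as fallback) can be sketched like this. The keyword lists, anchor vectors, and function names are illustrative; the real logic lives in router.py and uses nomic-embed-text.

```python
import re

# Illustrative keyword gates (assumed patterns, not the real router.py lists).
COMPLEX_RE = re.compile(r"исследуй|изучи все|напиши подробный|deep research", re.I)
LIGHT_RE = re.compile(r"^(привет|hi|hello|thanks|спасибо)\b", re.I)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def route(text, embed=None, anchors=None):
    # Stage 1: regex pre-classifiers decide the obvious cases for free.
    if COMPLEX_RE.search(text):
        return "complex"
    if LIGHT_RE.search(text):
        return "light"
    # Stage 2: compare the query embedding against per-tier anchor
    # embeddings by cosine similarity (3-way in the real router).
    if embed and anchors:
        vec = embed(text)
        return max(anchors, key=lambda tier: cosine(vec, anchors[tier]))
    return "medium"  # default tier

print(route("исследуй рынок GPU"))  # → complex
```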

Fast Tools

Pre-flight tools run concurrently before any LLM call. If matched, the result is delivered directly — no LLM involved.

Tool Pattern Source Latency
WeatherTool weather/forecast/temperature/... SearXNG → Russian weather sites ~1s
CommuteTool commute/traffic/пробки ("traffic jams")/... routecheck:8090 → Yandex Routing API ~1–2s
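The match-then-fetch shape of a fast tool might look like the sketch below. The class names follow the table, but the regex and return value are placeholders, not the real fast_tools.py implementation.

```python
import re

class FastTool:
    """Sketch of a pre-flight tool: a regex gate plus an async fetcher."""
    pattern: re.Pattern

    def match(self, text: str) -> bool:
        # Cheap gate: only run the fetcher when the regex fires.
        return bool(self.pattern.search(text))

class WeatherTool(FastTool):
    # Illustrative pattern; the real tool covers more trigger words.
    pattern = re.compile(r"weather|forecast|temperature|погода", re.I)

    async def run(self, text: str) -> str:
        # Real tool: SearXNG query → scrape Russian weather sites.
        return "Moscow: -3°C, light snow"  # placeholder result
```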

Memory Pipeline

openmemory (FastMCP + mem0 + Qdrant + nomic-embed-text):

  • Before routing: search_memory retrieves relevant context injected into system prompt
  • After reply: _store_memory() runs as background task — extraction via qwen2.5:1.5b
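The retrieve-before / store-after flow can be sketched like this. `search_memory` and `_store_memory` mirror the names above, but their bodies are stubs; the real calls go through openmemory, and extraction runs on qwen2.5:1.5b.

```python
import asyncio

async def search_memory(query):
    # Stub for the openmemory search_memory call.
    return ["user commutes from Khimki"]

async def _store_memory(user_msg, reply):
    # Stub for background fact extraction (qwen2.5:1.5b in the real system).
    await asyncio.sleep(0)

async def handle(user_msg):
    # Before routing: retrieved memories are injected into the system prompt.
    context = await search_memory(user_msg)
    system_prompt = "Known facts:\n" + "\n".join(context)
    reply = f"[answer using: {system_prompt!r}]"
    # After reply: fire-and-forget, so storage never adds user-facing latency.
    asyncio.create_task(_store_memory(user_msg, reply))
    return reply
```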

VRAM Management

GTX 1070 (8 GB VRAM). qwen3:4b must be flushed before qwen3:8b is loaded for the complex tier.

  1. Flush medium + router (keep_alive=0)
  2. Poll /api/ps until evicted (15s timeout)
  3. Fallback to medium on timeout
  4. After complex reply: flush 8b, pre-warm medium + router
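Step 2 (poll until evicted, with a fallback on timeout) can be sketched as a small helper. `list_loaded` stands in for an HTTP call to Ollama's /api/ps, so the names and defaults here are assumptions, not the real vram_manager.py API.

```python
import time

def wait_until_evicted(list_loaded, model, timeout=15.0, interval=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll a /api/ps-style listing until `model` is unloaded.

    Returns True on eviction, False on timeout (the caller then falls
    back to the medium tier). list_loaded/clock/sleep are injected so
    the sketch stays testable without a running Ollama instance.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if model not in list_loaded():
            return True  # VRAM is free; safe to load the bigger model
        sleep(interval)
    return False  # step 3: fall back to medium on timeout
```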

Benchmarking

Routing accuracy benchmark: benchmarks/run_benchmark.py

120 queries across 3 tiers and 10 categories (Russian + English). The harness sends each query to /message, waits for the SSE [DONE] marker, extracts tier= from docker logs deepagents, and compares it against the expected tier.

cd ~/adolf/benchmarks
python3 run_benchmark.py                            # full run
python3 run_benchmark.py --tier light               # light only
python3 run_benchmark.py --tier complex --dry-run   # complex routing, no API cost
python3 run_benchmark.py --list-categories
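The log-comparison step can be sketched as follows. The tier= log format matches the description above, but the `score` function itself is illustrative, not part of run_benchmark.py.

```python
import re
from collections import Counter

def score(results):
    """results: list of (expected_tier, log_line) pairs.

    The real harness pulls log_line from `docker logs deepagents`
    after the SSE [DONE] marker arrives.
    """
    hits, totals = Counter(), Counter()
    for expected, line in results:
        totals[expected] += 1
        m = re.search(r"tier=(\w+)", line)  # extract the routed tier
        if m and m.group(1) == expected:
            hits[expected] += 1
    # Per-tier (correct, total) pairs, e.g. Light 11/30.
    return {t: (hits[t], totals[t]) for t in totals}

print(score([("light", "routed tier=light in 2.1s"),
             ("light", "routed tier=medium in 11s")]))
# → {'light': (1, 2)}
```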

Latest known results (open issues #7–#10 in Gitea):

  • Light: 11/30 (37%) — tech definition queries mis-routed to medium
  • Medium: 13/50 (26%) — smart home commands mis-routed to light; many timeouts
  • Complex: 0/40 (0%) — log extraction failures + pattern gaps

Dataset (benchmark.json) and results (results_latest.json) are gitignored.

SearXNG

Port 11437. Used by web_search tool in complex tier.

Compose Stack

Repo: ~/adolf (remote: http://localhost:3000/alvis/adolf)

cd ~/adolf
docker compose up --build -d                        # start all services
docker compose --profile tools run --rm -it cli     # interactive CLI

Requires ~/adolf/.env: TELEGRAM_BOT_TOKEN, ROUTECHECK_TOKEN, YANDEX_ROUTING_KEY.
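A minimal `.env` sketch with the three required keys (the values shown are placeholders, not real tokens):

```shell
# ~/adolf/.env — placeholder values only
TELEGRAM_BOT_TOKEN=123456789:replace-with-your-bot-token
ROUTECHECK_TOKEN=replace-with-routecheck-token
YANDEX_ROUTING_KEY=replace-with-yandex-api-key
```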

Files

~/adolf/
├── docker-compose.yml      Services: bifrost, deepagents, openmemory, grammy, crawl4ai, routecheck, cli
├── agent.py                FastAPI gateway, run_agent_task, fast tool short-circuit, memory pipeline
├── fast_tools.py           WeatherTool, CommuteTool, FastToolRunner
├── router.py               Router — regex pre-classifiers + nomic-embed-text 3-way cosine similarity
├── channels.py             Channel registry + deliver()
├── vram_manager.py         VRAMManager — flush/poll/prewarm Ollama VRAM
├── agent_factory.py        _DirectModel (medium) / create_deep_agent (complex)
├── cli.py                  Rich Live streaming REPL
├── benchmarks/
│   ├── run_benchmark.py    Routing accuracy benchmark (120 queries, 3 tiers)
│   └── run_voice_benchmark.py  Voice path benchmark
├── routecheck/             Yandex Routing API proxy (port 8090)
├── openmemory/             FastMCP + mem0 MCP server (port 8765)
└── grammy/                 grammY Telegram bot (port 3001)