# Adolf
Autonomous personal assistant reachable via Telegram and CLI. Three-tier model routing with GPU VRAM management and long-term memory.
## Architecture

```
              Telegram / CLI
                    ↕
[grammy] Node.js, port 3001 — grammY long-poll → POST /message
[cli] Python Rich REPL — POST /message + GET /stream SSE
                    ↓
[deepagents] Python FastAPI — port 8000
                    ↓
Pre-flight (asyncio.gather — all parallel):
  - URL fetch (Crawl4AI)
  - Memory retrieval (openmemory)
  - Fast tools (WeatherTool, CommuteTool)
                    ↓
Fast tool matched? → deliver reply directly (no LLM)
                    ↓ (if no fast tool)
Router (qwen2.5:1.5b + nomic-embed-text)
  - light:   simple/conversational → router answers directly (~2–4s)
  - medium:  default → qwen3:4b single call (~10–20s)
  - complex: deep research queries → remote model + web_search + fetch_url (~60–120s)
                    ↓
channels.deliver() → Telegram / CLI SSE stream
                    ↓
asyncio.create_task(_store_memory()) — background
```
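The pre-flight stage above can be sketched as a single `asyncio.gather` over the three concurrent steps. This is a minimal illustration, not the real `agent.py` code: the helper names and return shapes are assumptions, and the bodies are stubs standing in for the Crawl4AI, openmemory, and fast-tool calls.

```python
import asyncio

# Stubs standing in for the real pre-flight steps (names are assumptions).
async def fetch_urls(text: str) -> list:
    await asyncio.sleep(0)   # placeholder for the Crawl4AI fetch
    return []

async def retrieve_memories(text: str) -> list:
    await asyncio.sleep(0)   # placeholder for the openmemory search
    return []

async def run_fast_tools(text: str):
    await asyncio.sleep(0)   # placeholder for WeatherTool / CommuteTool
    return None              # None means "no fast tool matched"

async def preflight(text: str):
    # All three run concurrently; total latency is the slowest step,
    # not the sum of the three.
    return await asyncio.gather(
        fetch_urls(text), retrieve_memories(text), run_fast_tools(text)
    )

urls, memories, fast = asyncio.run(preflight("weather in Moscow?"))
```

If `fast` is not `None`, the result is delivered immediately and no LLM call happens; otherwise the router takes over.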
## Three-Tier Model Routing
| Tier | Model | Trigger | Latency |
|---|---|---|---|
| Fast | — (no LLM) | Fast tool matched (weather, commute) | ~1s |
| Light | qwen2.5:1.5b (router) | Regex or embedding classifies "light" | ~2–4s |
| Medium | qwen3:4b | Default | ~10–20s |
| Complex | deepseek-r1 (remote via LiteLLM) | Regex pre-classifier or embedding similarity | ~60–120s |
Routing is automatic — no prefix needed. The complex tier triggers on Russian research keywords (исследуй "research", изучи все "study everything", напиши подробный "write a detailed", etc.) and on embedding similarity. Force the complex tier by requesting the `adolf-deep` model name via the OpenAI-compatible API.
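The two-stage decision (regex pre-classifier first, embedding similarity as fallback) can be sketched as follows. The keyword lists come from this README; the anchor vectors are toy two-dimensional stand-ins for real nomic-embed-text embeddings, and `embed` is an assumed callable, so treat this as an illustration of the routing shape rather than `router.py` itself.

```python
import math
import re

# Pre-classifiers: cheap regexes that skip the embedding call entirely.
COMPLEX_RE = re.compile(r"исследуй|изучи все|напиши подробный", re.IGNORECASE)
LIGHT_RE = re.compile(r"^(hi|hello|привет|thanks|спасибо)\b", re.IGNORECASE)

# Toy anchors; the real router embeds tier exemplars with nomic-embed-text.
ANCHORS = {"light": [1.0, 0.0], "medium": [0.0, 1.0], "complex": [0.7, 0.7]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def route(text, embed):
    # 1. Regex pre-classifiers short-circuit before any model call.
    if COMPLEX_RE.search(text):
        return "complex"
    if LIGHT_RE.search(text):
        return "light"
    # 2. Otherwise pick the tier whose anchor is nearest in cosine space.
    vec = embed(text)
    return max(ANCHORS, key=lambda tier: cosine(vec, ANCHORS[tier]))
```

For example, `route("исследуй тему X", None)` returns `"complex"` without touching the embedder at all.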
## Fast Tools

Pre-flight tools run concurrently before any LLM call. If one matches, its result is delivered directly — no LLM involved.

| Tool | Pattern | Source | Latency |
|---|---|---|---|
| WeatherTool | weather/forecast/temperature/... | SearXNG → Russian weather sites | ~1s |
| CommuteTool | commute/traffic/пробки/... | routecheck:8090 → Yandex Routing API | ~1–2s |
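The short-circuit is a first-match scan over the trigger patterns in the table above. A minimal sketch, with the tool bodies replaced by stub lambdas standing in for the SearXNG and routecheck calls:

```python
import re

# Trigger patterns from the table; the result lambdas are stubs, not the
# real SearXNG / routecheck integrations.
TOOLS = [
    ("weather", re.compile(r"weather|forecast|temperature", re.IGNORECASE),
     lambda text: "Moscow: -3°C, light snow"),
    ("commute", re.compile(r"commute|traffic|пробки", re.IGNORECASE),
     lambda text: "Home → office: 34 min"),
]

def try_fast_tools(text):
    # First matching tool wins and its result is delivered with no LLM call.
    for name, pattern, run in TOOLS:
        if pattern.search(text):
            return name, run(text)
    return None  # no match: fall through to the router
```

A query like "what's the weather like?" resolves here in about a second; anything unmatched pays the normal routing cost.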
## Memory Pipeline

openmemory (FastMCP + mem0 + Qdrant + nomic-embed-text):

- Before routing: `search_memory` retrieves relevant context, which is injected into the system prompt
- After reply: `_store_memory()` runs as a background task — extraction via `qwen2.5:1.5b`
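The fire-and-forget write can be sketched with `asyncio.create_task`: the reply is delivered first and the memory extraction continues in the background. The function bodies here are stubs (the real `_store_memory` calls qwen2.5:1.5b via openmemory), and `handle`/`demo` are hypothetical names for illustration.

```python
import asyncio

async def _store_memory(user_text: str, reply: str) -> str:
    # Stub for the real extraction call (qwen2.5:1.5b via openmemory).
    await asyncio.sleep(0)
    return f"stored: {user_text}"

async def handle(user_text: str):
    reply = "(reply from the routed model)"  # stand-in for the LLM answer
    # Fire-and-forget: the reply returns immediately while the memory
    # write runs in the background. Keep a reference to the task so it
    # is not garbage-collected mid-flight.
    task = asyncio.create_task(_store_memory(user_text, reply))
    return reply, task

async def demo():
    reply, task = await handle("remember I like green tea")
    stored = await task  # awaited here only so the demo can inspect it
    return reply, stored

reply, stored = asyncio.run(demo())
```

In production the hot path never awaits the task, which is why memory writes add no user-visible latency.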
## VRAM Management

GTX 1070 (8 GB). qwen3:4b must be flushed before loading qwen3:8b for the complex tier.

- Flush medium + router (`keep_alive=0`)
- Poll `/api/ps` until evicted (15 s timeout)
- Fall back to medium on timeout
- After the complex reply: flush 8b, pre-warm medium + router
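The poll-until-evicted step can be sketched as a small helper. `list_loaded` abstracts Ollama's `GET /api/ps` (which lists currently loaded models) so the logic stays testable; in the real manager, flushing means issuing a generate/chat request with `keep_alive: 0` and then polling `/api/ps` until the model disappears. The function name and signature are assumptions, not `vram_manager.py`'s API.

```python
import time

def wait_for_eviction(list_loaded, model: str,
                      timeout: float = 15.0, interval: float = 0.5) -> bool:
    """Poll until `model` no longer appears in the loaded set.

    Returns True once the model is evicted, False when the timeout
    expires (the caller then falls back to the medium tier).
    """
    deadline = time.monotonic() + timeout
    while True:
        if model not in list_loaded():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a fake poller that reports the model gone on the third check, `wait_for_eviction(..., "qwen3:4b")` returns `True`; with a model that never unloads and `timeout=0`, it returns `False`.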
## Benchmarking

Routing accuracy benchmark: `benchmarks/run_benchmark.py`

120 queries across 3 tiers and 10 categories (Russian + English). Sends each query to `/message`, waits for the SSE `[DONE]` sentinel, extracts `tier=` from `docker logs deepagents`, and compares against the expected tier.

```bash
cd ~/adolf/benchmarks
python3 run_benchmark.py                            # full run
python3 run_benchmark.py --tier light               # light only
python3 run_benchmark.py --tier complex --dry-run   # complex routing, no API cost
python3 run_benchmark.py --list-categories
```
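The "wait for SSE `[DONE]`" step amounts to accumulating `data:` payloads until the sentinel arrives. A minimal parser sketch (the function name is hypothetical, and real streams arrive over HTTP rather than from a list):

```python
def read_sse_until_done(lines) -> str:
    # Collect `data:` payloads from an SSE stream until the [DONE]
    # sentinel, mirroring how the benchmark detects a finished reply.
    chunks = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, `event:` lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(payload)
    return "".join(chunks)

stream = ["data: Hel", "data: lo", "", "data: [DONE]", "data: ignored"]
reply = read_sse_until_done(stream)
```

Anything after the sentinel is ignored, so a late log line or reconnect artifact cannot corrupt the measured reply.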
Latest known results (open issues #7–#10 in Gitea):
- Light: 11/30 (37%) — tech definition queries mis-routed to medium
- Medium: 13/50 (26%) — smart home commands mis-routed to light; many timeouts
- Complex: 0/40 (0%) — log extraction failures + pattern gaps
Dataset (benchmark.json) and results (results_latest.json) are gitignored.
## SearXNG

Port 11437. Used by the `web_search` tool in the complex tier.
## Compose Stack

Repo: `~/adolf/` — http://localhost:3000/alvis/adolf

```bash
cd ~/adolf
docker compose up --build -d                        # start all services
docker compose --profile tools run --rm -it cli     # interactive CLI
```

Requires `~/adolf/.env` with `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTING_KEY`.
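A minimal `.env` skeleton with the three required variables (values are placeholders, not real credentials):

```
# ~/adolf/.env
TELEGRAM_BOT_TOKEN=123456789:replace-with-botfather-token
ROUTECHECK_TOKEN=replace-me
YANDEX_ROUTING_KEY=replace-me
```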
## Files

```
~/adolf/
├── docker-compose.yml    Services: bifrost, deepagents, openmemory, grammy, crawl4ai, routecheck, cli
├── agent.py              FastAPI gateway, run_agent_task, fast tool short-circuit, memory pipeline
├── fast_tools.py         WeatherTool, CommuteTool, FastToolRunner
├── router.py             Router — regex pre-classifiers + nomic-embed-text 3-way cosine similarity
├── channels.py           Channel registry + deliver()
├── vram_manager.py       VRAMManager — flush/poll/prewarm Ollama VRAM
├── agent_factory.py      _DirectModel (medium) / create_deep_agent (complex)
├── cli.py                Rich Live streaming REPL
├── benchmarks/
│   ├── run_benchmark.py        Routing accuracy benchmark (120 queries, 3 tiers)
│   └── run_voice_benchmark.py  Voice path benchmark
├── routecheck/           Yandex Routing API proxy (port 8090)
├── openmemory/           FastMCP + mem0 MCP server (port 8765)
└── grammy/               grammY Telegram bot (port 3001)
```