alvis edited this page 2026-03-24 02:13:06 +00:00

Adolf

Autonomous personal assistant reachable via Telegram and CLI. Three-tier model routing with GPU VRAM management and long-term memory.

Architecture

Telegram / CLI
     ↕
[grammy] Node.js — port 3001      [cli] Python Rich REPL
  grammY long-poll → POST /message    POST /message + GET /stream SSE
     ↓
[deepagents] Python FastAPI — port 8000
     ↓
Pre-flight (asyncio.gather — all parallel):
  - URL fetch (Crawl4AI)
  - Memory retrieval (openmemory)
  - Fast tools (WeatherTool, CommuteTool)
     ↓
Fast tool matched? → deliver reply directly (no LLM)
     ↓ (if no fast tool)
Router (qwen2.5:1.5b + nomic-embed-text)
  - light:   simple/conversational → router answers directly (~2–4s)
  - medium:  default → qwen3:4b single call (~10–20s)
  - complex: deep research queries → remote model + web_search + fetch_url (~60–120s)
     ↓
channels.deliver() → Telegram / CLI SSE stream
     ↓
asyncio.create_task(_store_memory()) — background
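The pre-flight fan-out and fast-tool short-circuit in the diagram can be sketched as follows. This is a minimal sketch: `fetch_urls`, `search_memory`, and `run_fast_tools` are illustrative stand-ins, not the real function names in agent.py.

```python
import asyncio

# Illustrative stubs for the three pre-flight coroutines (assumed names).
async def fetch_urls(text):
    return None  # real system: Crawl4AI fetches any URLs found in the message

async def search_memory(text):
    return ["user commutes from Khimki"]  # real system: openmemory retrieval

async def run_fast_tools(text):
    # Real system: WeatherTool / CommuteTool regex gates.
    return "Sunny, +5°C" if "weather" in text else None

async def preflight(text):
    # All three steps run concurrently, as in the diagram (asyncio.gather).
    pages, memories, fast_reply = await asyncio.gather(
        fetch_urls(text), search_memory(text), run_fast_tools(text)
    )
    if fast_reply is not None:
        return fast_reply  # short-circuit: deliver directly, no LLM call
    return None  # fall through to the router and model tiers

print(asyncio.run(preflight("what's the weather today?")))  # → Sunny, +5°C
```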

Three-Tier Model Routing

Tier Model Trigger Latency
Fast — (no LLM) Fast tool matched (weather, commute) ~1s
Light qwen2.5:1.5b (router) Regex or embedding classifies "light" ~2–4s
Medium qwen3:4b Default ~10–20s
Complex deepseek-r1 (remote via LiteLLM) Regex pre-classifier or embedding similarity ~60–120s

Routing is automatic — no prefix needed. The complex tier triggers on Russian research keywords (исследуй "research", изучи все "study everything", напиши подробный "write a detailed …", etc.) and on embedding similarity. Force the complex tier via the adolf-deep model name in the OpenAI-compatible API.
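The two-stage routing decision (cheap regex pre-classifiers first, embedding similarity as fallback) can be sketched like this. The keyword lists, anchor vectors, and function names are illustrative; the real logic lives in router.py and uses nomic-embed-text.

```python
import re

# Illustrative keyword gates (assumed patterns, not the real router.py lists).
COMPLEX_RE = re.compile(r"исследуй|изучи все|напиши подробный|deep research", re.I)
LIGHT_RE = re.compile(r"^(привет|hi|hello|thanks|спасибо)\b", re.I)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def route(text, embed=None, anchors=None):
    # Stage 1: regex pre-classifiers decide the obvious cases for free.
    if COMPLEX_RE.search(text):
        return "complex"
    if LIGHT_RE.search(text):
        return "light"
    # Stage 2: compare the query embedding against per-tier anchor
    # embeddings by cosine similarity (3-way in the real router).
    if embed and anchors:
        vec = embed(text)
        return max(anchors, key=lambda tier: cosine(vec, anchors[tier]))
    return "medium"  # default tier

print(route("исследуй рынок GPU"))  # → complex
```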

Fast Tools

Pre-flight tools run concurrently before any LLM call. If matched, the result is delivered directly — no LLM involved.

Tool Pattern Source Latency
WeatherTool weather/forecast/temperature/... SearXNG → Russian weather sites ~1s
CommuteTool commute/traffic/пробки ("traffic jams")/... routecheck:8090 → Yandex Routing API ~1–2s
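The match-then-fetch shape of a fast tool might look like the sketch below. The class names follow the table, but the regex and return value are placeholders, not the real fast_tools.py implementation.

```python
import re

class FastTool:
    """Sketch of a pre-flight tool: a regex gate plus an async fetcher."""
    pattern: re.Pattern

    def match(self, text: str) -> bool:
        # Cheap gate: only run the fetcher when the regex fires.
        return bool(self.pattern.search(text))

class WeatherTool(FastTool):
    # Illustrative pattern; the real tool covers more trigger words.
    pattern = re.compile(r"weather|forecast|temperature|погода", re.I)

    async def run(self, text: str) -> str:
        # Real tool: SearXNG query → scrape Russian weather sites.
        return "Moscow: -3°C, light snow"  # placeholder result
```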

Memory Pipeline

openmemory (FastMCP + mem0 + Qdrant + nomic-embed-text):

  • Before routing: search_memory retrieves relevant context injected into system prompt
  • After reply: _store_memory() runs as background task — extraction via qwen2.5:1.5b
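The retrieve-before / store-after flow can be sketched like this. `search_memory` and `_store_memory` mirror the names above, but their bodies are stubs; the real calls go through openmemory, and extraction runs on qwen2.5:1.5b.

```python
import asyncio

async def search_memory(query):
    # Stub for the openmemory search_memory call.
    return ["user commutes from Khimki"]

async def _store_memory(user_msg, reply):
    # Stub for background fact extraction (qwen2.5:1.5b in the real system).
    await asyncio.sleep(0)

async def handle(user_msg):
    # Before routing: retrieved memories are injected into the system prompt.
    context = await search_memory(user_msg)
    system_prompt = "Known facts:\n" + "\n".join(context)
    reply = f"[answer using: {system_prompt!r}]"
    # After reply: fire-and-forget, so storage never adds user-facing latency.
    asyncio.create_task(_store_memory(user_msg, reply))
    return reply
```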

VRAM Management

GTX 1070 (8 GB VRAM). qwen3:4b must be flushed before qwen3:8b is loaded for the complex tier.

  1. Flush medium + router (keep_alive=0)
  2. Poll /api/ps until evicted (15s timeout)
  3. Fallback to medium on timeout
  4. After complex reply: flush 8b, pre-warm medium + router
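Step 2 (poll until evicted, with a fallback on timeout) can be sketched as a small helper. `list_loaded` stands in for an HTTP call to Ollama's /api/ps, so the names and defaults here are assumptions, not the real vram_manager.py API.

```python
import time

def wait_until_evicted(list_loaded, model, timeout=15.0, interval=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll a /api/ps-style listing until `model` is unloaded.

    Returns True on eviction, False on timeout (the caller then falls
    back to the medium tier). list_loaded/clock/sleep are injected so
    the sketch stays testable without a running Ollama instance.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if model not in list_loaded():
            return True  # VRAM is free; safe to load the bigger model
        sleep(interval)
    return False  # step 3: fall back to medium on timeout
```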

Benchmarking

Routing accuracy benchmark: benchmarks/run_benchmark.py

120 queries across 3 tiers and 10 categories (Russian + English). The harness sends each query to /message, waits for the SSE [DONE] marker, extracts tier= from docker logs deepagents, and compares it against the expected tier.

cd ~/adolf/benchmarks
python3 run_benchmark.py                            # full run
python3 run_benchmark.py --tier light               # light only
python3 run_benchmark.py --tier complex --dry-run   # complex routing, no API cost
python3 run_benchmark.py --list-categories
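The log-comparison step can be sketched as follows. The tier= log format matches the description above, but the `score` function itself is illustrative, not part of run_benchmark.py.

```python
import re
from collections import Counter

def score(results):
    """results: list of (expected_tier, log_line) pairs.

    The real harness pulls log_line from `docker logs deepagents`
    after the SSE [DONE] marker arrives.
    """
    hits, totals = Counter(), Counter()
    for expected, line in results:
        totals[expected] += 1
        m = re.search(r"tier=(\w+)", line)  # extract the routed tier
        if m and m.group(1) == expected:
            hits[expected] += 1
    # Per-tier (correct, total) pairs, e.g. Light 11/30.
    return {t: (hits[t], totals[t]) for t in totals}

print(score([("light", "routed tier=light in 2.1s"),
             ("light", "routed tier=medium in 11s")]))
# → {'light': (1, 2)}
```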

Latest known results (open issues #7–#10 in Gitea):

  • Light: 11/30 (37%) — tech definition queries mis-routed to medium
  • Medium: 13/50 (26%) — smart home commands mis-routed to light; many timeouts
  • Complex: 0/40 (0%) — log extraction failures + pattern gaps

Dataset (benchmark.json) and results (results_latest.json) are gitignored.

SearXNG

Port 11437. Used by web_search tool in complex tier.

Compose Stack

Repo: ~/adolf (remote: http://localhost:3000/alvis/adolf)

cd ~/adolf
docker compose up --build -d                        # start all services
docker compose --profile tools run --rm -it cli     # interactive CLI

Requires ~/adolf/.env: TELEGRAM_BOT_TOKEN, ROUTECHECK_TOKEN, YANDEX_ROUTING_KEY.
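A minimal `.env` sketch with the three required keys (the values shown are placeholders, not real tokens):

```shell
# ~/adolf/.env — placeholder values only
TELEGRAM_BOT_TOKEN=123456789:replace-with-your-bot-token
ROUTECHECK_TOKEN=replace-with-routecheck-token
YANDEX_ROUTING_KEY=replace-with-yandex-api-key
```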

Files

~/adolf/
├── docker-compose.yml      Services: bifrost, deepagents, openmemory, grammy, crawl4ai, routecheck, cli
├── agent.py                FastAPI gateway, run_agent_task, fast tool short-circuit, memory pipeline
├── fast_tools.py           WeatherTool, CommuteTool, FastToolRunner
├── router.py               Router — regex pre-classifiers + nomic-embed-text 3-way cosine similarity
├── channels.py             Channel registry + deliver()
├── vram_manager.py         VRAMManager — flush/poll/prewarm Ollama VRAM
├── agent_factory.py        _DirectModel (medium) / create_deep_agent (complex)
├── cli.py                  Rich Live streaming REPL
├── benchmarks/
│   ├── run_benchmark.py    Routing accuracy benchmark (120 queries, 3 tiers)
│   └── run_voice_benchmark.py  Voice path benchmark
├── routecheck/             Yandex Routing API proxy (port 8090)
├── openmemory/             FastMCP + mem0 MCP server (port 8765)
└── grammy/                 grammY Telegram bot (port 3001)