Update Adolf wiki: benchmarking section, fix complex tier docs, add benchmarks/ files

2026-03-24 02:13:06 +00:00
parent 9a7c64d902
commit e3242f4ec2

@@ -19,10 +19,10 @@ Pre-flight (asyncio.gather — all parallel):
Fast tool matched? → deliver reply directly (no LLM)
↓ (if no fast tool)
Router (qwen2.5:1.5b)
Router (qwen2.5:1.5b + nomic-embed-text)
- light: simple/conversational → router answers directly (~2–4s)
- medium: default → qwen3:4b single call (~10–20s)
- complex: /think prefix → qwen3:8b + web_search + fetch_url (~60–120s)
- complex: deep research queries → remote model + web_search + fetch_url (~60–120s)
channels.deliver() → Telegram / CLI SSE stream
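The pre-flight short-circuit above can be sketched as follows; the tool and function names here are illustrative stand-ins, not the real `agent.py` API:

```python
import asyncio

async def check_weather(text: str):
    # Illustrative fast tool: answers only when its trigger pattern matches.
    return "Sunny, 21C" if "weather" in text.lower() else None

async def check_commute(text: str):
    return "Traffic is light" if "commute" in text.lower() else None

async def route_to_llm(text: str):
    # Stand-in for the light/medium/complex routing path.
    return f"[routed] {text}"

async def handle_message(text: str) -> str:
    # Pre-flight: every fast tool runs concurrently (asyncio.gather).
    replies = await asyncio.gather(check_weather(text), check_commute(text))
    for reply in replies:
        if reply is not None:          # fast tool matched: deliver directly, no LLM
            return reply
    return await route_to_llm(text)    # no match: fall through to the router
```

If any fast tool returns a reply, no model is ever invoked for that turn, which is what keeps the Fast tier at ~1s.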
@@ -34,11 +34,11 @@ asyncio.create_task(_store_memory()) — background
| Tier | Model | Trigger | Latency |
|------|-------|---------|---------|
| Fast | — (no LLM) | Fast tool matched (weather, commute) | ~1s |
| Light | qwen2.5:1.5b (router) | Regex or LLM classifies "light" | ~2–4s |
| Light | qwen2.5:1.5b (router) | Regex or embedding classifies "light" | ~2–4s |
| Medium | qwen3:4b | Default | ~10–20s |
| Complex | qwen3:8b | `/think` prefix only | ~60–120s |
| Complex | deepseek-r1 (remote via LiteLLM) | Regex pre-classifier or embedding similarity | ~60–120s |
Complex tier is locked behind `/think` — LLM classification of "complex" is downgraded to medium.
Routing is automatic — no prefix needed. Complex tier triggers on Russian research keywords (`исследуй` "research", `изучи все` "study everything", `напиши подробный` "write a detailed", etc.) and embedding similarity. Force complex tier via the `adolf-deep` model name in the OpenAI-compatible API.
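A minimal sketch of that two-stage decision: the regex pre-classifier fires on explicit research phrasing, and everything else is routed by 3-way cosine similarity against per-tier anchor embeddings. The `embed` callable stands in for nomic-embed-text, and the anchor vectors are assumptions:

```python
import math
import re

# Regex pre-classifier for explicit deep-research phrasing (the same
# keywords the wiki lists above).
COMPLEX_RE = re.compile(r"исследуй|изучи все|напиши подробный", re.IGNORECASE)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def route(text, embed, anchors):
    # anchors: tier name -> reference embedding for that tier.
    if COMPLEX_RE.search(text):
        return "complex"              # regex pre-classifier short-circuits
    vec = embed(text)                 # nomic-embed-text in the real router
    return max(anchors, key=lambda tier: cosine(vec, anchors[tier]))
```

Embedding similarity catches research-style queries the keyword list misses, while the regex keeps obvious cases cheap and deterministic.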
## Fast Tools
@@ -46,7 +46,7 @@ Pre-flight tools run concurrently before any LLM call. If matched, the result is
| Tool | Pattern | Source | Latency |
|------|---------|--------|---------|
| `WeatherTool` | weather/forecast/temperature/... | open-meteo.com API (Balashikha, no key) | ~200ms |
| `WeatherTool` | weather/forecast/temperature/... | SearXNG → Russian weather sites | ~1s |
| `CommuteTool` | commute/traffic/пробки/... | routecheck:8090 → Yandex Routing API | ~1–2s |
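The dispatch behind this table could look roughly like the following; class and method names are assumptions rather than the real `fast_tools.py` interface:

```python
import re

class WeatherTool:
    # Trigger pattern from the table above.
    pattern = re.compile(r"weather|forecast|temperature", re.I)
    def run(self, text):
        return "weather: SearXNG lookup"

class CommuteTool:
    pattern = re.compile(r"commute|traffic|пробки", re.I)
    def run(self, text):
        return "commute: routecheck lookup"

def match_fast_tool(text):
    # First tool whose trigger pattern matches handles the message.
    for tool in (WeatherTool(), CommuteTool()):
        if tool.pattern.search(text):
            return tool.run(text)
    return None   # nothing matched: continue to the router
```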
## Memory Pipeline
@@ -64,15 +64,31 @@ GTX 1070 (8 GB). Flush qwen3:4b before loading qwen3:8b for complex tier.
3. Fallback to medium on timeout
4. After complex reply: flush 8b, pre-warm medium + router
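The four steps above, sketched against Ollama's `keep_alive` knob (a request with `keep_alive: 0` unloads the model immediately; any generate call loads one). The HTTP poster is injected so the sketch stays network-free; the function names are hypothetical:

```python
def flush(post, model):
    # keep_alive=0 asks Ollama to unload the model now, freeing VRAM.
    post("/api/generate", {"model": model, "keep_alive": 0})

def prewarm(post, model):
    # A bare generate call loads the model into VRAM without producing output.
    post("/api/generate", {"model": model})

def complex_turn(post, ask):
    flush(post, "qwen3:4b")        # 1. free VRAM held by the medium model
    prewarm(post, "qwen3:8b")      # 2. load the complex model
    reply = ask("qwen3:8b")        # 3. (timeout fallback to medium elided)
    flush(post, "qwen3:8b")        # 4. after the reply: flush 8b,
    prewarm(post, "qwen3:4b")      #    then pre-warm medium + router
    prewarm(post, "qwen2.5:1.5b")
    return reply
```

In production `post` would be `requests.post` against the Ollama endpoint.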
## Benchmarking
Routing accuracy benchmark: `benchmarks/run_benchmark.py`
120 queries across 3 tiers and 10 categories (Russian + English). Sends each query to `/message`, waits for SSE `[DONE]`, extracts `tier=` from `docker logs deepagents`, compares to expected tier.
```bash
cd ~/adolf/benchmarks
python3 run_benchmark.py # full run
python3 run_benchmark.py --tier light # light only
python3 run_benchmark.py --tier complex --dry-run # complex routing, no API cost
python3 run_benchmark.py --list-categories
```
Latest known results (open issues #7–#10 in Gitea):
- Light: 11/30 (37%) — tech definition queries mis-routed to medium
- Medium: 13/50 (26%) — smart home commands mis-routed to light; many timeouts
- Complex: 0/40 (0%) — log extraction failures + pattern gaps
Dataset (`benchmark.json`) and results (`results_latest.json`) are gitignored.
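The log-scraping step described above might reduce to something like this; only the `tier=<name>` log format comes from the text, everything else is illustrative:

```python
import re

def extract_tier(log_text):
    # Router logs lines containing "tier=<name>"; take the newest match,
    # since older routing decisions may still be in the log window.
    hits = re.findall(r"tier=(\w+)", log_text)
    return hits[-1] if hits else None

def score(cases, routed_tier):
    # cases: list of (query, expected_tier); routed_tier(query) -> actual tier.
    correct = sum(1 for query, want in cases if routed_tier(query) == want)
    return correct, len(cases)
```

Returning `None` when no `tier=` line is found makes log-extraction failures visible as mismatches, which is the failure mode the complex-tier results above point at.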
## SearXNG
Port 11437. Used by `web_search` tool in complex tier.
Disabled slow/broken engines: **startpage** (3s timeout), **google news** (timeout), **qwant news/images/videos** (access denied).
Fast enabled engines: bing, duckduckgo, brave, google, yahoo (~300–1000ms).
Config: `/mnt/ssd/ai/searxng/config/settings.yml`
## Compose Stack
Repo: `~/adolf/` → `http://localhost:3000/alvis/adolf`
@@ -91,12 +107,15 @@ Requires `~/adolf/.env`: `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTI
~/adolf/
├── docker-compose.yml Services: bifrost, deepagents, openmemory, grammy, crawl4ai, routecheck, cli
├── agent.py FastAPI gateway, run_agent_task, fast tool short-circuit, memory pipeline
├── fast_tools.py WeatherTool (open-meteo), CommuteTool (routecheck), FastToolRunner
├── router.py Router — regex + qwen2.5:1.5b classification
├── fast_tools.py WeatherTool, CommuteTool, FastToolRunner
├── router.py Router — regex pre-classifiers + nomic-embed-text 3-way cosine similarity
├── channels.py Channel registry + deliver()
├── vram_manager.py VRAMManager — flush/poll/prewarm Ollama VRAM
├── agent_factory.py _DirectModel (medium) / create_deep_agent (complex)
├── cli.py Rich Live streaming REPL
├── benchmarks/
│ ├── run_benchmark.py Routing accuracy benchmark (120 queries, 3 tiers)
│ └── run_voice_benchmark.py Voice path benchmark
├── routecheck/ Yandex Routing API proxy (port 8090)
├── openmemory/ FastMCP + mem0 MCP server (port 8765)
└── grammy/ grammY Telegram bot (port 3001)