Update Adolf wiki: benchmarking section, fix complex tier docs, add benchmarks/ files
@@ -19,10 +19,10 @@ Pre-flight (asyncio.gather — all parallel):
↓
Fast tool matched? → deliver reply directly (no LLM)
↓ (if no fast tool)
-Router (qwen2.5:1.5b)
+Router (qwen2.5:1.5b + nomic-embed-text)
- light: simple/conversational → router answers directly (~2–4s)
- medium: default → qwen3:4b single call (~10–20s)
-- complex: /think prefix → qwen3:8b + web_search + fetch_url (~60–120s)
+- complex: deep research queries → remote model + web_search + fetch_url (~60–120s)
↓
channels.deliver() → Telegram / CLI SSE stream
↓
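The pre-flight short-circuit in the diagram above can be sketched in Python. This is a hypothetical illustration: the class names, patterns, and reply strings are stand-ins, not the repo's actual fast_tools.py code.

```python
# Hypothetical sketch of the pre-flight fast-tool short-circuit;
# names and reply strings are illustrative, not the repo's code.
import asyncio
import re

class WeatherTool:
    pattern = re.compile(r"weather|forecast|temperature", re.I)

    async def run(self, text):
        if not self.pattern.search(text):
            return None
        return "Weather: +2°C, cloudy"  # stand-in for the real lookup

class CommuteTool:
    pattern = re.compile(r"commute|traffic|пробки", re.I)

    async def run(self, text):
        if not self.pattern.search(text):
            return None
        return "Commute: 40 min"  # stand-in for the routecheck call

async def preflight(text, tools):
    # All tools run in parallel (asyncio.gather); the first non-None
    # result is delivered directly, skipping every LLM tier.
    results = await asyncio.gather(*(t.run(text) for t in tools))
    return next((r for r in results if r is not None), None)

print(asyncio.run(preflight("What's the weather today?", [WeatherTool(), CommuteTool()])))
```

If no tool matches, `preflight` returns `None` and the message falls through to the router.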
@@ -34,11 +34,11 @@ asyncio.create_task(_store_memory()) — background
| Tier | Model | Trigger | Latency |
|------|-------|---------|---------|
| Fast | — (no LLM) | Fast tool matched (weather, commute) | ~1s |
-| Light | qwen2.5:1.5b (router) | Regex or LLM classifies "light" | ~2–4s |
+| Light | qwen2.5:1.5b (router) | Regex or embedding classifies "light" | ~2–4s |
| Medium | qwen3:4b | Default | ~10–20s |
-| Complex | qwen3:8b | `/think` prefix only | ~60–120s |
+| Complex | deepseek-r1 (remote via LiteLLM) | Regex pre-classifier or embedding similarity | ~60–120s |

-Complex tier is locked behind `/think` — LLM classification of "complex" is downgraded to medium.
+Routing is automatic — no prefix needed. The complex tier triggers on Russian research keywords (`исследуй` "research", `изучи все` "study everything", `напиши подробный` "write a detailed ...", etc.) and on embedding similarity. Force the complex tier by requesting the `adolf-deep` model name through the OpenAI-compatible API.

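A minimal sketch of the regex pre-classifier half of this routing, assuming the keyword list above; the real pattern set in router.py is likely broader.

```python
# Sketch of the complex-tier regex pre-classifier, assuming the keyword
# triggers named above; router.py's real pattern set may differ.
import re

COMPLEX_PATTERNS = [
    re.compile(r"\bисследуй\b", re.I),           # "research / investigate"
    re.compile(r"\bизучи\s+вс[её]\b", re.I),     # "study everything"
    re.compile(r"\bнапиши\s+подробн\w*", re.I),  # "write a detailed ..."
]

def pre_classify(text):
    """Return 'complex' on a keyword hit, else None (fall through to embeddings)."""
    if any(p.search(text) for p in COMPLEX_PATTERNS):
        return "complex"
    return None

print(pre_classify("исследуй историю рунета"))  # complex
print(pre_classify("какая погода?"))            # None
```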
## Fast Tools

@@ -46,7 +46,7 @@ Pre-flight tools run concurrently before any LLM call. If matched, the result is

| Tool | Pattern | Source | Latency |
|------|---------|--------|---------|
-| `WeatherTool` | weather/forecast/temperature/... | open-meteo.com API (Balashikha, no key) | ~200ms |
+| `WeatherTool` | weather/forecast/temperature/... | SearXNG → Russian weather sites | ~1s |
| `CommuteTool` | commute/traffic/пробки/... | routecheck:8090 → Yandex Routing API | ~1–2s |

## Memory Pipeline
@@ -64,15 +64,31 @@ GTX 1070 (8 GB). Flush qwen3:4b before loading qwen3:8b for complex tier.
3. Fallback to medium on timeout
4. After complex reply: flush 8b, pre-warm medium + router

+## Benchmarking
+
+Routing accuracy benchmark: `benchmarks/run_benchmark.py`
+
+120 queries across 3 tiers and 10 categories (Russian + English). The harness sends each query to `/message`, waits for the SSE `[DONE]` marker, extracts `tier=` from `docker logs deepagents`, and compares it to the expected tier.
+
+```bash
+cd ~/adolf/benchmarks
+python3 run_benchmark.py                           # full run
+python3 run_benchmark.py --tier light              # light only
+python3 run_benchmark.py --tier complex --dry-run  # complex routing, no API cost
+python3 run_benchmark.py --list-categories
+```
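The `tier=` extraction step might look roughly like this; the log line format and helper names are assumptions, not the benchmark's actual code.

```python
# Hedged sketch of the tier-extraction step; log format and helper
# names are assumptions, not run_benchmark.py's actual code.
import re
import subprocess

TIER_RE = re.compile(r"tier=(\w+)")

def extract_tier(log_text):
    """Return the tier from the most recent 'tier=' entry, or None."""
    hits = TIER_RE.findall(log_text)
    return hits[-1] if hits else None

def last_routed_tier(container="deepagents"):
    # Scan only the recent tail; docker logs may interleave stdout/stderr.
    proc = subprocess.run(["docker", "logs", "--tail", "200", container],
                          capture_output=True, text=True)
    return extract_tier(proc.stdout + proc.stderr)

sample = 'INFO router: query="погода" tier=light score=0.81'
print(extract_tier(sample))  # light
```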
+
+Latest known results (open issues #7–#10 in Gitea):
+- Light: 11/30 (37%) — tech-definition queries mis-routed to medium
+- Medium: 13/50 (26%) — smart-home commands mis-routed to light; many timeouts
+- Complex: 0/40 (0%) — log-extraction failures plus pattern gaps
+
+Dataset (`benchmark.json`) and results (`results_latest.json`) are gitignored.

## SearXNG

Port 11437. Used by the `web_search` tool in the complex tier.

Disabled slow/broken engines: **startpage** (3s timeout), **google news** (timeout), **qwant news/images/videos** (access denied).
Fast enabled engines: bing, duckduckgo, brave, google, yahoo (~300–1000ms).

Config: `/mnt/ssd/ai/searxng/config/settings.yml`
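For reference, a query against this instance's JSON API could look like the sketch below. The `/search?format=json` endpoint and `engines` parameter are standard SearXNG, though `json` must be listed under `search.formats` in settings.yml; the helper itself is illustrative.

```python
# Illustrative sketch of querying the local SearXNG JSON search API;
# the port comes from this page, the endpoint shape from SearXNG docs.
from urllib.parse import urlencode

SEARX_URL = "http://localhost:11437/search"

def build_search_url(query, engines=None):
    params = {"q": query, "format": "json"}
    if engines:
        # optionally restrict to the fast engines listed above
        params["engines"] = ",".join(engines)
    return f"{SEARX_URL}?{urlencode(params)}"

url = build_search_url("погода Балашиха сегодня", ["bing", "duckduckgo", "brave"])
print(url)
# then fetch with e.g.: requests.get(url, timeout=5).json()["results"]
```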

## Compose Stack

Repo: `~/adolf/` — `http://localhost:3000/alvis/adolf`
@@ -91,12 +107,15 @@ Requires `~/adolf/.env`: `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTI
~/adolf/
├── docker-compose.yml          Services: bifrost, deepagents, openmemory, grammy, crawl4ai, routecheck, cli
├── agent.py                    FastAPI gateway, run_agent_task, fast tool short-circuit, memory pipeline
-├── fast_tools.py               WeatherTool (open-meteo), CommuteTool (routecheck), FastToolRunner
-├── router.py                   Router — regex + qwen2.5:1.5b classification
+├── fast_tools.py               WeatherTool, CommuteTool, FastToolRunner
+├── router.py                   Router — regex pre-classifiers + nomic-embed-text 3-way cosine similarity
├── channels.py                 Channel registry + deliver()
├── vram_manager.py             VRAMManager — flush/poll/prewarm Ollama VRAM
├── agent_factory.py            _DirectModel (medium) / create_deep_agent (complex)
├── cli.py                      Rich Live streaming REPL
+├── benchmarks/
+│   ├── run_benchmark.py        Routing accuracy benchmark (120 queries, 3 tiers)
+│   └── run_voice_benchmark.py  Voice path benchmark
├── routecheck/                 Yandex Routing API proxy (port 8090)
├── openmemory/                 FastMCP + mem0 MCP server (port 8765)
└── grammy/                     grammY Telegram bot (port 3001)
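The 3-way cosine-similarity routing attributed to router.py can be sketched with toy vectors; the 3-d prototypes stand in for real 768-d nomic-embed-text embeddings (which the router would fetch from Ollama), and all names are illustrative.

```python
# Toy sketch of 3-way tier routing by cosine similarity; the 3-d vectors
# stand in for real nomic-embed-text embeddings, names are illustrative.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def route(query_vec, prototypes):
    """Pick the tier whose prototype embedding is closest to the query."""
    return max(prototypes, key=lambda tier: cosine(query_vec, prototypes[tier]))

prototypes = {"light": [1.0, 0.0, 0.0], "medium": [0.0, 1.0, 0.0], "complex": [0.0, 0.0, 1.0]}
print(route([0.9, 0.1, 0.2], prototypes))  # light
```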