diff --git a/Adolf.md b/unnamed.md
similarity index 61%
rename from Adolf.md
rename to unnamed.md
index e718d76..0d8cbb5 100644
--- a/Adolf.md
+++ b/unnamed.md
@@ -19,10 +19,10 @@ Pre-flight (asyncio.gather — all parallel):
   ↓
 Fast tool matched? → deliver reply directly (no LLM)
   ↓ (if no fast tool)
-Router (qwen2.5:1.5b)
+Router (qwen2.5:1.5b + nomic-embed-text)
   - light: simple/conversational → router answers directly (~2–4s)
   - medium: default → qwen3:4b single call (~10–20s)
-  - complex: /think prefix → qwen3:8b + web_search + fetch_url (~60–120s)
+  - complex: deep research queries → remote model + web_search + fetch_url (~60–120s)
   ↓
 channels.deliver() → Telegram / CLI SSE stream
   ↓
@@ -34,11 +34,11 @@ asyncio.create_task(_store_memory()) — background
 | Tier | Model | Trigger | Latency |
 |------|-------|---------|---------|
 | Fast | — (no LLM) | Fast tool matched (weather, commute) | ~1s |
-| Light | qwen2.5:1.5b (router) | Regex or LLM classifies "light" | ~2–4s |
+| Light | qwen2.5:1.5b (router) | Regex or embedding classifies "light" | ~2–4s |
 | Medium | qwen3:4b | Default | ~10–20s |
-| Complex | qwen3:8b | `/think` prefix only | ~60–120s |
+| Complex | deepseek-r1 (remote via LiteLLM) | Regex pre-classifier or embedding similarity | ~60–120s |
 
-Complex tier is locked behind `/think` — LLM classification of "complex" is downgraded to medium.
+Routing is automatic — no prefix needed. Complex tier triggers on Russian research keywords (`исследуй`, `изучи все`, `напиши подробный`, etc.) and embedding similarity. Force complex tier via `adolf-deep` model name in OpenAI-compatible API.
 
 ## Fast Tools
 
@@ -46,7 +46,7 @@ Pre-flight tools run concurrently before any LLM call. If matched, the result is
 
 | Tool | Pattern | Source | Latency |
 |------|---------|--------|---------|
-| `WeatherTool` | weather/forecast/temperature/... | open-meteo.com API (Balashikha, no key) | ~200ms |
+| `WeatherTool` | weather/forecast/temperature/... | SearXNG → Russian weather sites | ~1s |
 | `CommuteTool` | commute/traffic/пробки/... | routecheck:8090 → Yandex Routing API | ~1–2s |
 
 ## Memory Pipeline
@@ -64,15 +64,31 @@ GTX 1070 (8 GB). Flush qwen3:4b before loading qwen3:8b for complex tier.
 3. Fallback to medium on timeout
 4. After complex reply: flush 8b, pre-warm medium + router
 
+## Benchmarking
+
+Routing accuracy benchmark: `benchmarks/run_benchmark.py`
+
+120 queries across 3 tiers and 10 categories (Russian + English). Sends each query to `/message`, waits for SSE `[DONE]`, extracts `tier=` from `docker logs deepagents`, compares to expected tier.
+
+```bash
+cd ~/adolf/benchmarks
+python3 run_benchmark.py                            # full run
+python3 run_benchmark.py --tier light               # light only
+python3 run_benchmark.py --tier complex --dry-run   # complex routing, no API cost
+python3 run_benchmark.py --list-categories
+```
+
+Latest known results (open issues #7–#10 in Gitea):
+- Light: 11/30 (37%) — tech definition queries mis-routed to medium
+- Medium: 13/50 (26%) — smart home commands mis-routed to light; many timeouts
+- Complex: 0/40 (0%) — log extraction failures + pattern gaps
+
+Dataset (`benchmark.json`) and results (`results_latest.json`) are gitignored.
+
 ## SearXNG
 
 Port 11437. Used by `web_search` tool in complex tier.
 
-Disabled slow/broken engines: **startpage** (3s timeout), **google news** (timeout), **qwant news/images/videos** (access denied).
-Fast enabled engines: bing, duckduckgo, brave, google, yahoo (~300–1000ms).
-
-Config: `/mnt/ssd/ai/searxng/config/settings.yml`
-
 ## Compose Stack
 
 Repo: `~/adolf/` — `http://localhost:3000/alvis/adolf`
@@ -91,13 +107,16 @@ Requires `~/adolf/.env`: `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTI
 ~/adolf/
 ├── docker-compose.yml       Services: bifrost, deepagents, openmemory, grammy, crawl4ai, routecheck, cli
 ├── agent.py                 FastAPI gateway, run_agent_task, fast tool short-circuit, memory pipeline
-├── fast_tools.py            WeatherTool (open-meteo), CommuteTool (routecheck), FastToolRunner
-├── router.py                Router — regex + qwen2.5:1.5b classification
+├── fast_tools.py            WeatherTool, CommuteTool, FastToolRunner
+├── router.py                Router — regex pre-classifiers + nomic-embed-text 3-way cosine similarity
 ├── channels.py              Channel registry + deliver()
 ├── vram_manager.py          VRAMManager — flush/poll/prewarm Ollama VRAM
 ├── agent_factory.py         _DirectModel (medium) / create_deep_agent (complex)
 ├── cli.py                   Rich Live streaming REPL
+├── benchmarks/
+│   ├── run_benchmark.py         Routing accuracy benchmark (120 queries, 3 tiers)
+│   └── run_voice_benchmark.py   Voice path benchmark
 ├── routecheck/              Yandex Routing API proxy (port 8090)
 ├── openmemory/              FastMCP + mem0 MCP server (port 8765)
 └── grammy/                  grammY Telegram bot (port 3001)
-```
+```
\ No newline at end of file
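
The diff above replaces the `/think`-prefixed complex tier with automatic routing: regex pre-classifiers for Russian research keywords, plus nomic-embed-text 3-way cosine similarity. A minimal sketch of that scheme, under stated assumptions — the keyword list is the one quoted in the diff, the tiny vectors are hand-made stand-ins for real nomic-embed-text embeddings, and all names (`TIER_CENTROIDS`, `route`, etc.) are hypothetical, not taken from `router.py`:

```python
import math
import re

# Regex pre-classifier: research keywords from the diff force the complex tier.
COMPLEX_KEYWORDS = re.compile(r"исследуй|изучи все|напиши подробный", re.IGNORECASE)

# One prototype centroid per tier. Hand-made 3-d stand-ins for illustration;
# the real system would embed example queries with nomic-embed-text.
TIER_CENTROIDS = {
    "light":   [0.9, 0.1, 0.0],  # simple / conversational
    "medium":  [0.1, 0.9, 0.1],  # default single-call tier
    "complex": [0.0, 0.1, 0.9],  # deep research
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def route(text, embedding):
    # Regex pre-classifier wins outright; otherwise the nearest centroid
    # by cosine similarity decides among the three tiers.
    if COMPLEX_KEYWORDS.search(text):
        return "complex"
    return max(TIER_CENTROIDS, key=lambda t: cosine(embedding, TIER_CENTROIDS[t]))

print(route("исследуй рынок GPU", [0.5, 0.5, 0.5]))  # regex hit → complex
print(route("привет", [0.85, 0.2, 0.05]))            # nearest centroid → light
```

This mirrors the diff's two-stage design: the cheap regex check costs microseconds and avoids an embedding call for obvious research queries, while the embedding fallback handles phrasing the patterns miss.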