Compare commits

21 Commits: ec45d255f0 ... fix/benchm

| Author | SHA1 | Date |
|---|---|---|
|  | 98095679be |  |
|  | 1f5e272600 |  |
|  | 54cb940279 |  |
|  | bd951f943f |  |
|  | ab68bba935 |  |
|  | 3ae1cefbd4 |  |
|  | 957360f6ce |  |
|  | 3ed47b45da |  |
|  | eba805f787 |  |
|  | 32089ed596 |  |
|  | d2ca1926f8 |  |
|  | af181ba7ec |  |
|  | f5fc2e9bfb |  |
|  | 436299f7e2 |  |
|  | 8cd41940f0 |  |
|  | b04e8a0925 |  |
|  | edc9a96f7a |  |
|  | a35ba83db7 |  |
|  | 021104f510 |  |
|  | 50097d6092 |  |
|  | f9618a9bbf |  |
22
.claude/rules/agent-pipeline.md
Normal file
@@ -0,0 +1,22 @@
# Agent Pipeline Rules

## Tiers

- Routing is fully automatic: the router classifies into light/medium/complex via 3-way embedding similarity.
- The complex tier is reached automatically for deep research queries — no prefix required.
- Medium is the default tier. Light is only for trivial static-knowledge queries matched by regex or embedding.
- A light-tier query is upgraded to medium automatically when URL content is pre-fetched or a fast tool matches.
- The `tier_override` API parameter still allows callers to force a specific tier (e.g. `adolf-deep` model → complex).

## Medium agent

- `_DirectModel` makes a single `ainvoke()` call with no tool schema. Do not add tools to the medium agent.
- `qwen3:4b` behaves unreliably when a tool array is present in the request — inject context via the system prompt instead.

## Memory

- `add_memory` and `search_memory` are called directly in `run_agent_task()`, outside the agent loop.
- Never add memory tools to any agent's tool list.
- Memory storage (`_store_memory`) runs as an asyncio background task after the semaphore is released.

## Fast tools

- `FastToolRunner.run_matching()` runs in the pre-flight `asyncio.gather` alongside URL fetch and memory retrieval.
- Fast tool results are injected as a system prompt block, not returned to the user directly.
- When `any_matches()` is true, the router forces medium tier before LLM classification.
24
.claude/rules/fast-tools.md
Normal file
@@ -0,0 +1,24 @@
---
paths:
- "fast_tools.py"
- "agent.py"
---

# Fast Tools — Extension Guide

To add a new fast tool:

1. In `fast_tools.py`, subclass `FastTool` and implement:
   - `name` (str property) — unique identifier, used in logs
   - `matches(message: str) -> bool` — regex or logic; keep it cheap, it runs on every message
   - `run(message: str) -> str` — async fetch; return a short context block or `""` on failure; never raise
2. In `agent.py`, add an instance to the `_fast_tool_runner` list (module level, after the env vars are defined).
3. The router automatically forces medium tier when `matches()` returns true — no router changes needed.

## Constraints

- `run()` must return in under 15s — it runs in the pre-flight gather that blocks routing.
- Return `""` or a `[tool error: ...]` string on failure — never raise exceptions.
- Keep returned context under ~1000 chars — larger contexts slow down `qwen3:4b` streaming significantly.
- The deepagents container has no direct external internet access. Use SearXNG (`host.docker.internal:11437`) or internal services.
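Following the recipe and constraints above, a hypothetical tool might look like this sketch (the `CurrencyTool` name, its pattern, and the stubbed `_fetch_rate` are invented for illustration; the `FastTool` base shown is an assumed minimal shape, not the real class from `fast_tools.py`):

```python
import asyncio
import re


class FastTool:
    """Assumed minimal base shape (the real base class lives in fast_tools.py)."""

    name = "base"

    def matches(self, message: str) -> bool:
        raise NotImplementedError

    async def run(self, message: str) -> str:
        raise NotImplementedError


class CurrencyTool(FastTool):
    """Hypothetical example tool reacting to exchange-rate questions."""

    name = "currency"
    _PATTERN = re.compile(r"\b(курс|exchange rate|usd|eur)\b", re.IGNORECASE)

    def matches(self, message: str) -> bool:
        # Cheap regex check: this runs on every message.
        return bool(self._PATTERN.search(message))

    async def run(self, message: str) -> str:
        try:
            # A real tool would fetch from SearXNG or an internal service here.
            rate = await self._fetch_rate()
            return f"[currency] USD/RUB ≈ {rate}"  # keep the block short (<1000 chars)
        except Exception as e:
            # Constraint: never raise; return "" or an error marker instead.
            return f"[tool error: {e}]"

    async def _fetch_rate(self) -> float:
        return 90.0  # placeholder instead of a network call
```

The instance would then be appended to `_fast_tool_runner` in `agent.py`, and the router would force medium tier whenever `matches()` fires.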
8
.claude/rules/llm-inference.md
Normal file
@@ -0,0 +1,8 @@
# LLM Inference Rules

- All LLM calls must use `base_url=LITELLM_URL` (points to LiteLLM at `host.docker.internal:4000/v1`). Never call Ollama directly for inference.
- `_reply_semaphore` (`asyncio.Semaphore(1)`) serializes all GPU inference. Never bypass it or add a second semaphore.
- Local Ollama models use the `ollama/` prefix: `ollama/qwen3:4b`, `ollama/qwen2.5:1.5b`. Remote models (e.g. OpenRouter) use their full LiteLLM name: `openrouter/deepseek-r1`.
- Timeout values: router=30s, medium=180s, complex=600s. Do not reduce them.
- `VRAMManager` is the only component that contacts Ollama directly (for flush/prewarm/poll). This is intentional — LiteLLM cannot manage VRAM.
- The complex tier uses a remote model (`DEEPAGENTS_COMPLEX_MODEL`) — no VRAM management is needed for it.
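A tiny illustrative helper capturing the naming and timeout rules above (the function itself is hypothetical; `agent.py` builds these names inline with f-strings):

```python
# Illustrative constants mirroring the rules above; not the actual agent.py code.
LITELLM_URL = "http://host.docker.internal:4000/v1"

TIER_TIMEOUTS = {"router": 30, "medium": 180, "complex": 600}


def litellm_model_name(model: str, remote: bool = False) -> str:
    """Local Ollama models get the 'ollama/' prefix; remote models keep their full LiteLLM name."""
    return model if remote else f"ollama/{model}"
```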
7
.claude/rules/secrets.md
Normal file
@@ -0,0 +1,7 @@
# Secrets and Environment

- `.env` is required at the project root and must never be committed. It is in `.gitignore`.
- Required keys: `TELEGRAM_BOT_TOKEN`, `ROUTECHECK_TOKEN`, `YANDEX_ROUTING_KEY`.
- `ROUTECHECK_TOKEN` is a shared secret between the `deepagents` and `routecheck` containers — generate it once with `python3 -c "import uuid; print(uuid.uuid4())"`.
- All tokens are stored in Vaultwarden (AI collection). Fetch with `bw get password "<NAME>"` — see `~/.claude/CLAUDE.md` for the full procedure.
- Do not hardcode tokens, URLs, or credentials anywhere in source code.
8
.gitignore
vendored
Normal file
@@ -0,0 +1,8 @@
__pycache__/
*.pyc
logs/*.jsonl
adolf_tuning_data/voice_audio/
benchmarks/benchmark.json
benchmarks/results_latest.json
benchmarks/voice_results*.json
benchmarks/voice_audio/
118
ARCHITECTURE.md
@@ -1,118 +0,0 @@
# Adolf

Autonomous personal assistant with a multi-channel gateway. Three-tier model routing with GPU VRAM management.

## Architecture

```
┌─────────────────────────────────────────────────────┐
│ CHANNEL ADAPTERS                                    │
│                                                     │
│ [Telegram/Grammy]   [CLI]   [Voice — future]        │
│        ↕              ↕          ↕                  │
│        └──────────────┴──────────┘                  │
│                   ↕                                 │
│      ┌─────────────────────────┐                    │
│      │ GATEWAY (agent.py)      │                    │
│      │ FastAPI :8000           │                    │
│      │                         │                    │
│      │ POST /message           │ ← all inbound      │
│      │ POST /chat (legacy)     │                    │
│      │ GET /reply/{id} SSE     │ ← CLI polling      │
│      │ GET /health             │                    │
│      │                         │                    │
│      │ channels.py registry    │                    │
│      │ conversation buffers    │                    │
│      └──────────┬──────────────┘                    │
│                 ↓                                   │
│      ┌──────────────────────┐                       │
│      │ AGENT CORE           │                       │
│      │ three-tier routing   │                       │
│      │ VRAM management      │                       │
│      └──────────────────────┘                       │
│                 ↓                                   │
│ channels.deliver(session_id, channel, text)         │
│        ↓                      ↓                     │
│ telegram → POST grammy/send   cli → SSE queue       │
└─────────────────────────────────────────────────────┘
```

## Channel Adapters

| Channel | session_id | Inbound | Outbound |
|---------|-----------|---------|---------|
| Telegram | `tg-<chat_id>` | Grammy long-poll → POST /message | channels.py → POST grammy:3001/send |
| CLI | `cli-<user>` | POST /message directly | GET /reply/{id} SSE stream |
| Voice | `voice-<device>` | (future) | (future) |

## Unified Message Flow

```
1. Channel adapter receives message
2. POST /message {text, session_id, channel, user_id}
3. 202 Accepted immediately
4. Background: run_agent_task(message, session_id, channel)
5. Route → run agent tier → get reply text
6. channels.deliver(session_id, channel, reply_text)
   - always puts reply in pending_replies[session_id] queue (for SSE)
   - calls channel-specific send callback
7. GET /reply/{session_id} SSE clients receive the reply
```

## Three-Tier Model Routing

| Tier | Model | VRAM | Trigger | Latency |
|------|-------|------|---------|---------|
| Light | qwen2.5:1.5b (router answers) | ~1.2 GB | Router classifies as light | ~2–4s |
| Medium | qwen3:4b | ~2.5 GB | Default | ~20–40s |
| Complex | qwen3:8b | ~6.0 GB | `/think` prefix | ~60–120s |

**`/think` prefix**: forces the complex tier; it is stripped before the message is sent to the agent.

## VRAM Management

GTX 1070 — 8 GB. Ollama must be restarted if CUDA init fails (the model then loads on CPU).

1. Flush explicitly before loading qwen3:8b (`keep_alive=0`)
2. Verify eviction via `/api/ps` poll (15s timeout) before proceeding
3. Fallback: on timeout, run the medium agent instead
4. Post-complex: flush 8b, pre-warm 4b + router

## Session ID Convention

- Telegram: `tg-<chat_id>` (e.g. `tg-346967270`)
- CLI: `cli-<username>` (e.g. `cli-alvis`)

Conversation history is keyed by session_id (5-turn buffer).

## Files

```
adolf/
├── docker-compose.yml    Services: deepagents, openmemory, grammy
├── Dockerfile            deepagents container (Python 3.12)
├── agent.py              FastAPI gateway + three-tier routing
├── channels.py           Channel registry + deliver() + pending_replies
├── router.py             Router class — qwen2.5:1.5b routing
├── vram_manager.py       VRAMManager — flush/prewarm/poll Ollama VRAM
├── agent_factory.py      build_medium_agent / build_complex_agent
├── cli.py                Interactive CLI REPL client
├── wiki_research.py      Batch wiki research pipeline (uses /message + SSE)
├── .env                  TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│   ├── server.py         FastMCP + mem0 MCP tools
│   └── Dockerfile
└── grammy/
    ├── bot.mjs           grammY Telegram bot + POST /send HTTP endpoint
    ├── package.json
    └── Dockerfile
```

## External Services (from openai/ stack)

| Service | Host Port | Role |
|---------|-----------|------|
| Ollama GPU | 11436 | All reply inference |
| Ollama CPU | 11435 | Memory embedding (nomic-embed-text) |
| Qdrant | 6333 | Vector store for memories |
| SearXNG | 11437 | Web search |
41
CLAUDE.md
Normal file
@@ -0,0 +1,41 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Start all services
docker compose up --build

# Interactive CLI (requires services running)
docker compose --profile tools run --rm -it cli

# Integration tests — run from tests/integration/; they require all services to be up
python3 test_health.py
python3 test_memory.py [--name-only|--bench-only|--dedup-only]
python3 test_routing.py [--easy-only|--medium-only|--hard-only]

# Use case tests — read the .md file and follow its steps as Claude Code
# example: read tests/use_cases/weather_now.md and execute it

# Routing benchmark — measures tier classification accuracy across 120 queries
# Run from benchmarks/ — Adolf must be running. DO NOT run during active use (holds the GPU).
cd benchmarks
python3 run_benchmark.py                            # full run (120 queries)
python3 run_benchmark.py --tier light               # light tier only (30 queries)
python3 run_benchmark.py --tier medium              # medium tier only (50 queries)
python3 run_benchmark.py --tier complex --dry-run   # complex tier, medium model (no API cost)
python3 run_benchmark.py --category smart_home_control
python3 run_benchmark.py --ids 1,2,3
python3 run_benchmark.py --list-categories

# Voice benchmark
python3 run_voice_benchmark.py

# benchmark.json (dataset) and results_latest.json are gitignored — not committed
```

## Architecture

@README.md
@@ -2,9 +2,9 @@ FROM python:3.12-slim

 WORKDIR /app

-RUN pip install --no-cache-dir deepagents langchain-ollama langgraph \
+RUN pip install --no-cache-dir deepagents langchain-openai langgraph \
     fastapi uvicorn langchain-mcp-adapters langchain-community httpx

-COPY agent.py channels.py vram_manager.py router.py agent_factory.py hello_world.py .
+COPY agent.py channels.py vram_manager.py router.py agent_factory.py fast_tools.py hello_world.py ./

 CMD ["uvicorn", "agent:app", "--host", "0.0.0.0", "--port", "8000"]
9
Dockerfile.cli
Normal file
@@ -0,0 +1,9 @@
FROM python:3.12-slim

WORKDIR /app

RUN pip install --no-cache-dir rich

COPY cli.py .

CMD ["python3", "cli.py"]
208
README.md
Normal file
@@ -0,0 +1,208 @@
# Adolf

Autonomous personal assistant with a multi-channel gateway. Three-tier model routing with GPU VRAM management.

## Architecture

```
┌─────────────────────────────────────────────────────┐
│ CHANNEL ADAPTERS                                    │
│                                                     │
│ [Telegram/Grammy]   [CLI]   [Voice — future]        │
│        ↕              ↕          ↕                  │
│        └──────────────┴──────────┘                  │
│                   ↕                                 │
│      ┌─────────────────────────┐                    │
│      │ GATEWAY (agent.py)      │                    │
│      │ FastAPI :8000           │                    │
│      │                         │                    │
│      │ POST /message           │ ← all inbound      │
│      │ POST /chat (legacy)     │                    │
│      │ GET /stream/{id} SSE    │ ← token stream     │
│      │ GET /reply/{id} SSE     │ ← legacy poll      │
│      │ GET /health             │                    │
│      │                         │                    │
│      │ channels.py registry    │                    │
│      │ conversation buffers    │                    │
│      └──────────┬──────────────┘                    │
│                 ↓                                   │
│      ┌──────────────────────┐                       │
│      │ AGENT CORE           │                       │
│      │ three-tier routing   │                       │
│      │ VRAM management      │                       │
│      └──────────────────────┘                       │
│                 ↓                                   │
│ channels.deliver(session_id, channel, text)         │
│        ↓                      ↓                     │
│ telegram → POST grammy/send   cli → SSE queue       │
└─────────────────────────────────────────────────────┘
```

## Channel Adapters

| Channel | session_id | Inbound | Outbound |
|---------|-----------|---------|---------|
| Telegram | `tg-<chat_id>` | Grammy long-poll → POST /message | channels.py → POST grammy:3001/send |
| CLI | `cli-<user>` | POST /message directly | GET /stream/{id} SSE — Rich Live streaming |
| Voice | `voice-<device>` | (future) | (future) |

## Unified Message Flow

```
1. Channel adapter receives message
2. POST /message {text, session_id, channel, user_id}
3. 202 Accepted immediately
4. Background: run_agent_task(message, session_id, channel)
5. Parallel IO (asyncio.gather):
   a. _fetch_urls_from_message() — Crawl4AI fetches any URLs in the message
   b. _retrieve_memories() — openmemory semantic search for context
   c. _fast_tool_runner.run_matching() — FastTools (weather, commute) if a pattern matches
6. router.route() with enriched history (url_context + fast_context + memories)
   - fast tool match → force medium (real-time data, no point routing to light)
   - if URL content was fetched and tier=light → upgrade to medium
7. Invoke the agent for the tier with url_context + memories in the system prompt
8. Token streaming:
   - medium: astream() pushes per-token chunks to _stream_queues[session_id]; <think> blocks filtered in real time
   - light/complex: full reply pushed as a single chunk after completion
   - _end_stream() sends the [DONE] sentinel
9. channels.deliver(session_id, channel, reply_text) — Telegram callback
10. _store_memory() background task — stores the turn in openmemory
11. GET /stream/{session_id} SSE clients receive chunks; the CLI renders with Rich Live + a final Markdown pass
```
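Step 5 can be sketched as a self-contained example; the three coroutines below are stand-ins for the real `agent.py` functions and just return canned values:

```python
import asyncio


# Stand-ins for the three pre-flight IO tasks in step 5 (the real ones live in agent.py).
async def _fetch_urls_from_message(message: str) -> str:
    return ""  # no URLs in this message


async def _retrieve_memories(message: str) -> str:
    return "User lives in Balashikha."  # shape of a semantic-search hit


async def _run_fast_tools(message: str) -> str:
    return ""  # no weather/commute pattern matched


async def preflight(message: str) -> list[str]:
    # All three run concurrently; routing waits only for the slowest one.
    return await asyncio.gather(
        _fetch_urls_from_message(message),
        _retrieve_memories(message),
        _run_fast_tools(message),
    )


url_ctx, memories, fast_ctx = asyncio.run(preflight("what's my city?"))
```

The three results are then folded into the enriched history that `router.route()` sees in step 6.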
## Tool Handling

Adolf uses LangChain's tool interface, but only the complex agent actually invokes tools at runtime.

**Complex agent:** `web_search` and `fetch_url` are defined as `langchain_core.tools.Tool` objects and passed to `create_deep_agent()`. The deepagents library runs an agentic loop (LangGraph `create_react_agent` under the hood) that sends the tool schema to the model via the OpenAI function-calling format and handles tool dispatch.

**Medium agent (default):** `_DirectModel` makes a single `model.ainvoke(messages)` call with no tool schema. Context (memories, fetched URL content) is injected via the system prompt instead. This is intentional — `qwen3:4b` behaves unreliably when a tool array is present.

**Memory tools (out-of-loop):** `add_memory` and `search_memory` are LangChain MCP tool objects (via `langchain_mcp_adapters`) but are excluded from both agents' tool lists. They are called directly — `await _memory_add_tool.ainvoke(...)` — outside the agent loop, before and after each turn.

## Three-Tier Model Routing

| Tier | Model | Agent | Trigger | Latency |
|------|-------|-------|---------|---------|
| Light | `qwen2.5:1.5b` (router answers directly) | — | Regex pre-match or 3-way embedding classifies "light" | ~2–4s |
| Medium | `qwen3:4b` (`DEEPAGENTS_MODEL`) | `_DirectModel` — single LLM call, no tools | Default; also forced when the message contains URLs | ~10–20s |
| Complex | `deepseek/deepseek-r1:free` via LiteLLM (`DEEPAGENTS_COMPLEX_MODEL`) | `create_deep_agent` — agentic loop with tools | Auto-classified by embedding similarity | ~30–90s |

Routing is fully automatic via 3-way cosine similarity over pre-embedded utterance centroids (light / medium / complex). Deep research queries — `исследуй` ("research"), `изучи все` ("study everything"), `напиши подробный` ("write a detailed..."), etc. — reach the complex tier via the regex pre-classifier and embedding similarity. No prefix is required; use the `adolf-deep` model name to force the complex tier via the API.

## Fast Tools (`fast_tools.py`)

Pre-flight tools that run concurrently with URL fetch and memory retrieval before any LLM call. Each tool has two methods:

- `matches(message) → bool` — regex classifier; also used by `Router` to force medium tier
- `run(message) → str` — async fetch returning a context block injected into the system prompt

`FastToolRunner` holds all tools. `any_matches()` is called by the Router at step 0a; `run_matching()` is called in the pre-flight `asyncio.gather` in `run_agent_task()`.

| Tool | Pattern | Source | Context returned |
|------|---------|--------|-----------------|
| `WeatherTool` | weather/forecast/temperature/snow/rain | SearXNG `"погода Балашиха сейчас"` | Current conditions in °C from Russian weather sites |
| `CommuteTool` | commute/traffic/arrival/пробки | `routecheck:8090/api/route` (Yandex Routing API) | Drive time with/without traffic, Balashikha→Moscow |

**To add a new fast tool:** subclass `FastTool` in `fast_tools.py`, implement `name`/`matches`/`run`, and add an instance to `_fast_tool_runner` in `agent.py`.

## routecheck Service (`routecheck/`)

Local web service on port 8090. It exists because the Yandex Routing API free tier requires a web UI that uses the API.

**Web UI** (`http://localhost:8090`): PIL-generated arithmetic captcha → lat/lon form → travel time result.

**Internal API**: `GET /api/route?from=lat,lon&to=lat,lon&token=ROUTECHECK_TOKEN` — bypasses the captcha; used by `CommuteTool`. The `ROUTECHECK_TOKEN` shared secret is set in `.env` and passed to both the `routecheck` and `deepagents` containers.

Yandex API calls are routed through the host HTTPS proxy (`host.docker.internal:56928`) since the container has no direct external internet access.

**Requires** `.env`: `YANDEX_ROUTING_KEY` (free from `developer.tech.yandex.ru`) + `ROUTECHECK_TOKEN`.
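For illustration, the request URL implied by the internal API line above can be assembled like this (the helper is hypothetical; only the `from`/`to`/`token` query parameters come from this README):

```python
from urllib.parse import urlencode

ROUTECHECK_URL = "http://routecheck:8090"


def route_api_url(frm: tuple[float, float], to: tuple[float, float], token: str) -> str:
    """Build the internal /api/route URL; the 'from'/'to'/'token' params are documented above."""
    qs = urlencode({
        "from": f"{frm[0]},{frm[1]}",
        "to": f"{to[0]},{to[1]}",
        "token": token,
    })
    return f"{ROUTECHECK_URL}/api/route?{qs}"
```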
## Crawl4AI Integration

Crawl4AI runs as a Docker service (`crawl4ai:11235`) providing JS-rendered, bot-bypassing page fetching.

**Pre-routing fetch (all tiers):**
- `_URL_RE` detects `https?://` URLs in any incoming message
- `_crawl4ai_fetch_async()` uses `httpx.AsyncClient` to POST `{urls: [...]}` to `/crawl`
- Up to 3 URLs are fetched concurrently via `asyncio.gather`
- Fetched content (up to 3000 chars/URL) is injected as a system context block into the enriched history before routing and into the medium/complex system prompts
- If a fetch succeeds and the router returns light → the tier is upgraded to medium

**Complex agent tools:**
- `web_search`: SearXNG query + Crawl4AI auto-fetch of the top 2 result URLs → combined snippet + page text
- `fetch_url`: Crawl4AI single-URL fetch for any specific URL

## Memory Pipeline

openmemory runs as a FastMCP server (`openmemory:8765`) backed by mem0 + Qdrant + nomic-embed-text.

**Retrieval (before routing):** `_retrieve_memories()` calls the `search_memory` MCP tool with the user message as the query. Results (threshold ≥ 0.5) are prepended to the enriched history so all tiers benefit.

**Storage (after reply):** `_store_memory()` runs as an asyncio background task, calling `add_memory` with `"User: ...\nAssistant: ..."`. The extraction LLM (`qwen2.5:1.5b` on GPU Ollama) pulls out facts; dedup is handled by mem0's update prompt.

Memory tools (`add_memory`, `search_memory`, `get_all_memories`) are excluded from the agent tool lists — memory management happens outside the agent loop.
## VRAM Management

GTX 1070 — 8 GB. Ollama must be restarted if CUDA init fails (the model then loads on CPU).

1. Flush explicitly before loading qwen3:8b (`keep_alive=0`)
2. Verify eviction via an `/api/ps` poll (15s timeout) before proceeding
3. Fallback: on timeout, run the medium agent instead
4. Post-complex: flush 8b, pre-warm medium + router
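The eviction check in step 2 reduces to a bounded poll. A minimal sketch, with a `list_loaded` callable standing in for a real `/api/ps` query:

```python
import time
from typing import Callable


def wait_for_eviction(
    list_loaded: Callable[[], list[str]],
    model: str,
    timeout_s: float = 15.0,
    poll_s: float = 0.1,
) -> bool:
    """Poll until `model` is no longer reported as loaded; return False on timeout
    (the caller then falls back to the medium agent, as in step 3)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if model not in list_loaded():
            return True  # VRAM is free; safe to load the next model
        time.sleep(poll_s)
    return False
```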
## Session ID Convention

- Telegram: `tg-<chat_id>` (e.g. `tg-346967270`)
- CLI: `cli-<username>` (e.g. `cli-alvis`)

Conversation history is keyed by session_id (5-turn buffer).

## Files

```
adolf/
├── docker-compose.yml         Services: deepagents, openmemory, grammy, crawl4ai, routecheck, cli
├── Dockerfile                 deepagents container (Python 3.12)
├── Dockerfile.cli             CLI container (python:3.12-slim + rich)
├── agent.py                   FastAPI gateway, run_agent_task, Crawl4AI pre-fetch, fast tools, memory pipeline
├── fast_tools.py              FastTool base, FastToolRunner, WeatherTool, CommuteTool
├── channels.py                Channel registry + deliver() + pending_replies
├── router.py                  Router class — regex + LLM tier classification, FastToolRunner integration
├── vram_manager.py            VRAMManager — flush/prewarm/poll Ollama VRAM
├── agent_factory.py           _DirectModel (medium) / create_deep_agent (complex)
├── cli.py                     Interactive CLI REPL — Rich Live streaming + Markdown render
├── wiki_research.py           Batch wiki research pipeline (uses /message + SSE)
├── benchmarks/
│   ├── run_benchmark.py           Routing accuracy benchmark — 120 queries across 3 tiers
│   ├── run_voice_benchmark.py     Voice path benchmark
│   ├── benchmark.json             Query dataset (gitignored)
│   └── results_latest.json        Last run results (gitignored)
├── .env                       TELEGRAM_BOT_TOKEN, ROUTECHECK_TOKEN, YANDEX_ROUTING_KEY (not committed)
├── routecheck/
│   ├── app.py                 FastAPI: image captcha + /api/route Yandex proxy
│   └── Dockerfile
├── tests/
│   ├── integration/           Standalone integration test scripts (common.py + test_*.py)
│   └── use_cases/             Claude Code skill markdown files — Claude acts as user + evaluator
├── openmemory/
│   ├── server.py              FastMCP + mem0: add_memory, search_memory, get_all_memories
│   └── Dockerfile
└── grammy/
    ├── bot.mjs                grammY Telegram bot + POST /send HTTP endpoint
    ├── package.json
    └── Dockerfile
```

## External Services (host ports, from openai/ stack)

| Service | Host Port | Role |
|---------|-----------|------|
| LiteLLM | 4000 | LLM proxy — all inference goes through here (`LITELLM_URL` env var) |
| Ollama GPU | 11436 | GPU inference backend + VRAM management (direct) + memory extraction |
| Ollama CPU | 11435 | nomic-embed-text embeddings for openmemory |
| Langfuse | 3200 | LLM observability — traces all requests via LiteLLM callbacks |
| Qdrant | 6333 | Vector store for memories |
| SearXNG | 11437 | Web search (used by the `web_search` tool) |
559
agent.py
@@ -1,7 +1,9 @@
 import asyncio
+import json as _json_module
 import os
 import time
 from contextlib import asynccontextmanager
+from pathlib import Path

 from fastapi import FastAPI, BackgroundTasks, Request
 from fastapi.responses import JSONResponse, StreamingResponse
@@ -10,7 +12,14 @@ from pydantic import BaseModel
 import re as _re
 import httpx as _httpx

-from langchain_ollama import ChatOllama
+_URL_RE = _re.compile(r'https?://[^\s<>"\']+')
+
+
+def _extract_urls(text: str) -> list[str]:
+    return _URL_RE.findall(text)
+
+from openai import AsyncOpenAI
+from langchain_openai import ChatOpenAI
 from langchain_mcp_adapters.client import MultiServerMCPClient
 from langchain_community.utilities import SearxSearchWrapper
 from langchain_core.tools import Tool
@@ -18,23 +27,120 @@ from langchain_core.tools import Tool
 from vram_manager import VRAMManager
 from router import Router
 from agent_factory import build_medium_agent, build_complex_agent
+from fast_tools import FastToolRunner, WeatherTool, CommuteTool
 import channels

+# LiteLLM proxy — all LLM inference goes through here
+LITELLM_URL = os.getenv("LITELLM_URL", "http://host.docker.internal:4000/v1")
+LITELLM_API_KEY = os.getenv("LITELLM_API_KEY", "dummy")
+# Direct Ollama URL — used only by VRAMManager for flush/prewarm/poll
 OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
-ROUTER_MODEL = os.getenv("DEEPAGENTS_ROUTER_MODEL", "qwen2.5:0.5b")
+ROUTER_MODEL = os.getenv("DEEPAGENTS_ROUTER_MODEL", "qwen2.5:1.5b")
 MEDIUM_MODEL = os.getenv("DEEPAGENTS_MODEL", "qwen3:4b")
 COMPLEX_MODEL = os.getenv("DEEPAGENTS_COMPLEX_MODEL", "qwen3:8b")
 SEARXNG_URL = os.getenv("SEARXNG_URL", "http://host.docker.internal:11437")
 OPENMEMORY_URL = os.getenv("OPENMEMORY_URL", "http://openmemory:8765")
+CRAWL4AI_URL = os.getenv("CRAWL4AI_URL", "http://crawl4ai:11235")
+ROUTECHECK_URL = os.getenv("ROUTECHECK_URL", "http://routecheck:8090")
+ROUTECHECK_TOKEN = os.getenv("ROUTECHECK_TOKEN", "")

 MAX_HISTORY_TURNS = 5
 _conversation_buffers: dict[str, list] = {}

+# ── Interaction logging (RLHF data collection) ─────────────────────────────────
+_LOG_DIR = Path(os.getenv("ADOLF_LOG_DIR", "/app/logs"))
+_INTERACTIONS_LOG = _LOG_DIR / "interactions.jsonl"
+
+
+def _ensure_log_dir() -> None:
+    try:
+        _LOG_DIR.mkdir(parents=True, exist_ok=True)
+    except Exception as e:
+        print(f"[log] cannot create log dir {_LOG_DIR}: {e}", flush=True)
+
+
+async def _log_interaction(
+    session_id: str,
+    channel: str,
+    tier: str,
+    input_text: str,
+    response_text: str | None,
+    latency_ms: int,
+    metadata: dict | None = None,
+) -> None:
+    """Append one interaction record to the JSONL log for future RLHF/finetuning."""
+    record = {
+        "ts": time.time(),
+        "session_id": session_id,
+        "channel": channel,
+        "tier": tier,
+        "input": input_text,
+        "output": response_text or "",
+        "latency_ms": latency_ms,
+    }
+    if metadata:
+        record["metadata"] = metadata
+    try:
+        _ensure_log_dir()
+        with open(_INTERACTIONS_LOG, "a", encoding="utf-8") as f:
+            f.write(_json_module.dumps(record, ensure_ascii=False) + "\n")
+    except Exception as e:
+        print(f"[log] write error: {e}", flush=True)
+
+
+# Per-session streaming queues — filled during inference, read by /stream/{session_id}
+_stream_queues: dict[str, asyncio.Queue] = {}
+
+
+async def _push_stream_chunk(session_id: str, chunk: str) -> None:
+    q = _stream_queues.setdefault(session_id, asyncio.Queue())
+    await q.put(chunk)
+
+
+async def _end_stream(session_id: str) -> None:
+    q = _stream_queues.setdefault(session_id, asyncio.Queue())
+    await q.put("[DONE]")
+
+
+async def _crawl4ai_fetch_async(url: str) -> str:
+    """Async fetch via Crawl4AI — JS-rendered, bot-bypass, returns clean markdown."""
+    try:
+        async with _httpx.AsyncClient(timeout=60) as client:
+            r = await client.post(f"{CRAWL4AI_URL}/crawl", json={"urls": [url]})
+            r.raise_for_status()
+            results = r.json().get("results", [])
+            if not results or not results[0].get("success"):
+                return ""
+            md_obj = results[0].get("markdown") or {}
+            md = md_obj.get("raw_markdown") if isinstance(md_obj, dict) else str(md_obj)
+            return (md or "")[:5000]
+    except Exception as e:
+        return f"[fetch error: {e}]"
+
+
+async def _fetch_urls_from_message(message: str) -> str:
+    """If message contains URLs, fetch their content concurrently via Crawl4AI.
+    Returns a formatted context block, or '' if no URLs or all fetches fail."""
+    urls = _extract_urls(message)
+    if not urls:
+        return ""
+    # Fetch up to 3 URLs concurrently
+    results = await asyncio.gather(*[_crawl4ai_fetch_async(u) for u in urls[:3]])
+    parts = []
+    for url, content in zip(urls[:3], results):
+        if content and not content.startswith("[fetch error"):
+            parts.append(f"### {url}\n{content[:3000]}")
+    if not parts:
+        return ""
+    return "User's message contains URLs. Fetched content:\n\n" + "\n\n".join(parts)
+

 # /no_think at the start of the system prompt disables qwen3 chain-of-thought.
 # create_deep_agent prepends our system_prompt before BASE_AGENT_PROMPT, so
 # /no_think lands at position 0 and is respected by qwen3 models via Ollama.
 MEDIUM_SYSTEM_PROMPT = (
-    "You are a helpful AI assistant. "
-    "Use web_search for questions about current events or facts you don't know. "
-    "Reply concisely."
+    "You are a helpful AI assistant. Reply concisely. "
+    "If asked to remember a fact or name, simply confirm: 'Got it, I'll remember that.'"
 )

 COMPLEX_SYSTEM_PROMPT = (
@@ -49,11 +155,20 @@ COMPLEX_SYSTEM_PROMPT = (
|
||||
"NEVER invent URLs. End with: **Sources checked: N**"
|
||||
)
|
||||
|
||||
medium_model = None
|
||||
medium_agent = None
|
||||
complex_agent = None
|
||||
router: Router = None
|
||||
vram_manager: VRAMManager = None
|
||||
mcp_client = None
|
||||
_memory_add_tool = None
|
||||
_memory_search_tool = None
|
||||
|
||||
# Fast tools run before the LLM — classifier + context enricher
|
||||
_fast_tool_runner = FastToolRunner([
|
||||
WeatherTool(),
|
||||
CommuteTool(routecheck_url=ROUTECHECK_URL, internal_token=ROUTECHECK_TOKEN),
|
||||
])
|
||||
|
||||
# GPU mutex: one LLM inference at a time
|
||||
_reply_semaphore = asyncio.Semaphore(1)
|
||||
@@ -61,25 +176,37 @@ _reply_semaphore = asyncio.Semaphore(1)

 @asynccontextmanager
 async def lifespan(app: FastAPI):
-    global medium_agent, complex_agent, router, vram_manager, mcp_client
+    global medium_model, medium_agent, complex_agent, router, vram_manager, mcp_client, \
+        _memory_add_tool, _memory_search_tool

     # Register channel adapters
     channels.register_defaults()

-    # Three model instances
-    router_model = ChatOllama(
-        model=ROUTER_MODEL, base_url=OLLAMA_BASE_URL, think=False, num_ctx=4096,
+    # All three models route through Bifrost → Ollama GPU.
+    router_model = ChatOpenAI(
+        model=f"ollama/{ROUTER_MODEL}",
+        base_url=LITELLM_URL,
+        api_key=LITELLM_API_KEY,
         temperature=0,
         timeout=30,
     )
-    medium_model = ChatOllama(
-        model=MEDIUM_MODEL, base_url=OLLAMA_BASE_URL, think=False, num_ctx=8192
+    embedder = AsyncOpenAI(base_url=LITELLM_URL, api_key=LITELLM_API_KEY)
+    medium_model = ChatOpenAI(
+        model=f"ollama/{MEDIUM_MODEL}",
+        base_url=LITELLM_URL,
+        api_key=LITELLM_API_KEY,
+        timeout=180,
     )
-    complex_model = ChatOllama(
-        model=COMPLEX_MODEL, base_url=OLLAMA_BASE_URL, think=True, num_ctx=16384
+    complex_model = ChatOpenAI(
+        model=COMPLEX_MODEL,  # full model name — may be remote (OpenRouter) or local ollama/*
+        base_url=LITELLM_URL,
+        api_key=LITELLM_API_KEY,
+        timeout=600,
     )

     vram_manager = VRAMManager(base_url=OLLAMA_BASE_URL)
-    router = Router(model=router_model)
+    router = Router(model=router_model, embedder=embedder, fast_tool_runner=_fast_tool_runner)
+    await router.initialize()

     mcp_connections = {
         "openmemory": {"transport": "sse", "url": f"{OPENMEMORY_URL}/sse"},
@@ -97,6 +224,13 @@ async def lifespan(app: FastAPI):

     agent_tools = [t for t in mcp_tools if t.name not in ("add_memory", "search_memory", "get_all_memories")]

+    # Expose memory tools directly so run_agent_task can call them outside the agent loop
+    for t in mcp_tools:
+        if t.name == "add_memory":
+            _memory_add_tool = t
+        elif t.name == "search_memory":
+            _memory_search_tool = t

     searx = SearxSearchWrapper(searx_host=SEARXNG_URL)

     def _crawl4ai_fetch(url: str) -> str:
@@ -187,13 +321,15 @@ async def lifespan(app: FastAPI):
     )

     print(
-        f"[agent] three-tier: router={ROUTER_MODEL} | medium={MEDIUM_MODEL} | complex={COMPLEX_MODEL}",
+        f"[agent] litellm={LITELLM_URL} | router=semantic(ollama/{ROUTER_MODEL}+nomic-embed-text) | "
+        f"medium=ollama/{MEDIUM_MODEL} | complex={COMPLEX_MODEL}",
         flush=True,
     )
     print(f"[agent] agent tools: {[t.name for t in agent_tools]}", flush=True)

     yield

+    medium_model = None
     medium_agent = None
     complex_agent = None
     router = None
@@ -222,13 +358,19 @@ class ChatRequest(BaseModel):

 # ── helpers ────────────────────────────────────────────────────────────────────

+def _strip_think(text: str) -> str:
+    """Strip qwen3 chain-of-thought blocks that appear inline in content
+    when using Ollama's OpenAI-compatible endpoint (/v1/chat/completions)."""
+    return _re.sub(r"<think>.*?</think>", "", text, flags=_re.DOTALL).strip()
+
+
 def _extract_final_text(result) -> str | None:
     msgs = result.get("messages", [])
     for m in reversed(msgs):
         if type(m).__name__ == "AIMessage" and getattr(m, "content", ""):
-            return m.content
+            return _strip_think(m.content)
     if isinstance(result, dict) and result.get("output"):
-        return result["output"]
+        return _strip_think(result["output"])
     return None
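The non-greedy `re.sub` in `_strip_think` is the whole trick: it removes each reasoning block individually instead of everything between the first `<think>` and the last `</think>`. A standalone sketch (sample strings invented for the demo, not taken from the service):

```python
import re

def strip_think(text: str) -> str:
    # Remove every <think>…</think> block, including newlines inside it,
    # then trim surrounding whitespace. Mirrors _strip_think above.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>step 1\nstep 2</think>The answer is 4."))
print(strip_think("Plain reply, no reasoning block."))
```

Without `re.DOTALL`, `.` would not match newlines and a multi-line reasoning block would survive intact.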
@@ -244,60 +386,172 @@ def _log_messages(result):
             print(f"[agent] {role} → {tc['name']}({tc['args']})", flush=True)


-# ── core task ──────────────────────────────────────────────────────────────────
+# ── memory helpers ─────────────────────────────────────────────────────────────

-async def run_agent_task(message: str, session_id: str, channel: str = "telegram"):
-    print(f"[agent] queued: {message[:80]!r} chat={session_id}", flush=True)
+def _resolve_user_id(session_id: str) -> str:
+    """Map any session_id to a canonical user identity for openmemory.
+    All channels share the same memory pool for the single user."""
+    return "alvis"

-    force_complex = False
-    clean_message = message
-    if message.startswith("/think "):
-        force_complex = True
-        clean_message = message[len("/think "):]
-        print("[agent] /think prefix → force_complex=True", flush=True)

+async def _store_memory(session_id: str, user_msg: str, assistant_reply: str) -> None:
+    """Store a conversation turn in openmemory (runs as a background task)."""
+    if _memory_add_tool is None:
+        return
+    t0 = time.monotonic()
+    try:
+        text = f"User: {user_msg}\nAssistant: {assistant_reply}"
+        user_id = _resolve_user_id(session_id)
+        await _memory_add_tool.ainvoke({"text": text, "user_id": user_id})
+        print(f"[memory] stored in {time.monotonic() - t0:.1f}s", flush=True)
+    except Exception as e:
+        print(f"[memory] error: {e}", flush=True)
+
+
+async def _retrieve_memories(message: str, session_id: str) -> str:
+    """Search openmemory for relevant context. Returns formatted string or ''."""
+    if _memory_search_tool is None:
+        return ""
+    try:
+        user_id = _resolve_user_id(session_id)
+        result = await _memory_search_tool.ainvoke({"query": message, "user_id": user_id})
+        if result and result.strip() and result.strip() != "[]":
+            return f"Relevant memories:\n{result}"
+    except Exception:
+        pass
+    return ""
+
+
+# ── core pipeline ──────────────────────────────────────────────────────────────
+
+from typing import AsyncGenerator
+
+
+async def _run_agent_pipeline(
+    message: str,
+    history: list[dict],
+    session_id: str,
+    tier_override: str | None = None,
+    dry_run: bool = False,
+) -> AsyncGenerator[str, None]:
+    """Core pipeline: pre-flight → routing → inference. Yields text chunks.
+
+    tier_override: "light" | "medium" | "complex" | None (auto-route)
+    dry_run: if True and tier=complex, log tier=complex but use medium model (avoids API cost)
+    Caller is responsible for scheduling _store_memory after consuming all chunks.
+    """
     async with _reply_semaphore:
         t0 = time.monotonic()
-        history = _conversation_buffers.get(session_id, [])
+        clean_message = message
         print(f"[agent] running: {clean_message[:80]!r}", flush=True)

-        tier, light_reply = await router.route(clean_message, history, force_complex)
-        print(f"[agent] tier={tier} message={clean_message[:60]!r}", flush=True)
+        # Fetch URL content, memories, and fast-tool context concurrently
+        url_context, memories, fast_context = await asyncio.gather(
+            _fetch_urls_from_message(clean_message),
+            _retrieve_memories(clean_message, session_id),
+            _fast_tool_runner.run_matching(clean_message),
+        )
+        if url_context:
+            print(f"[agent] crawl4ai: {len(url_context)} chars fetched", flush=True)
+        if fast_context:
+            names = _fast_tool_runner.matching_names(clean_message)
+            print(f"[agent] fast_tools={names}: {len(fast_context)} chars injected", flush=True)
+
+        # Build enriched history
+        enriched_history = list(history)
+        if url_context:
+            enriched_history = [{"role": "system", "content": url_context}] + enriched_history
+        if fast_context:
+            enriched_history = [{"role": "system", "content": fast_context}] + enriched_history
+        if memories:
+            enriched_history = [{"role": "system", "content": memories}] + enriched_history

         final_text = None
         llm_elapsed = 0.0

         try:
+            # Short-circuit: fast tool already has the answer
+            if fast_context and tier_override is None and not url_context:
+                tier = "fast"
+                final_text = fast_context
+                llm_elapsed = time.monotonic() - t0
+                names = _fast_tool_runner.matching_names(clean_message)
+                print(f"[agent] tier=fast tools={names} — delivering directly", flush=True)
+                yield final_text
+
+            else:
+                # Determine tier
+                if tier_override in ("light", "medium", "complex"):
+                    tier = tier_override
+                    light_reply = None
+                    if tier_override == "light":
+                        tier, light_reply = await router.route(clean_message, enriched_history)
+                        tier = "light"
+                else:
+                    tier, light_reply = await router.route(clean_message, enriched_history)
+                    if url_context and tier == "light":
+                        tier = "medium"
+                        light_reply = None
+                        print("[agent] URL in message → upgraded light→medium", flush=True)
+
+                # Dry-run: log as complex but infer with medium (no remote API call)
+                effective_tier = tier
+                if dry_run and tier == "complex":
+                    effective_tier = "medium"
+                    print(f"[agent] tier=complex (dry-run) → using medium model, message={clean_message[:60]!r}", flush=True)
+                else:
+                    print(f"[agent] tier={tier} message={clean_message[:60]!r}", flush=True)
+                tier = effective_tier

                 if tier == "light":
                     final_text = light_reply
                     llm_elapsed = time.monotonic() - t0
-                    print(f"[agent] light path: answered by router", flush=True)
+                    print("[agent] light path: answered by router", flush=True)
+                    yield final_text

                 elif tier == "medium":
+                    system_prompt = MEDIUM_SYSTEM_PROMPT
-                    result = await medium_agent.ainvoke({
-                        "messages": [
+                    if memories:
+                        system_prompt += "\n\n" + memories
+                    if url_context:
+                        system_prompt += "\n\n" + url_context
+                    if fast_context:
+                        system_prompt += "\n\nLive web search results (use these to answer):\n\n" + fast_context
+
+                    in_think = False
+                    response_parts = []
+                    async for chunk in medium_model.astream([
                         {"role": "system", "content": system_prompt},
                         *history,
                         {"role": "user", "content": clean_message},
-                        ]
-                    })
-                    llm_elapsed = time.monotonic() - t0
-                    _log_messages(result)
-                    final_text = _extract_final_text(result)
-
-                else:  # complex
-                    ok = await vram_manager.enter_complex_mode()
-                    if not ok:
-                        print("[agent] complex→medium fallback (eviction timeout)", flush=True)
-                        tier = "medium"
-                        result = await medium_agent.ainvoke({
-                            "messages": [
-                                {"role": "system", "content": MEDIUM_SYSTEM_PROMPT},
-                                *history,
-                                {"role": "user", "content": clean_message},
-                            ]
-                        })
+                    ]):
+                        token = chunk.content or ""
+                        if not token:
+                            continue
+                        if in_think:
+                            if "</think>" in token:
+                                in_think = False
+                                after = token.split("</think>", 1)[1]
+                                if after:
+                                    yield after
+                                    response_parts.append(after)
+                        else:
+                            if "<think>" in token:
+                                in_think = True
+                                before = token.split("<think>", 1)[0]
+                                if before:
+                                    yield before
+                                    response_parts.append(before)
+                            else:
+                                yield token
+                                response_parts.append(token)
+
+                    llm_elapsed = time.monotonic() - t0
+                    final_text = "".join(response_parts).strip() or None

+                else:  # complex — remote model, no VRAM management needed
                     system_prompt = COMPLEX_SYSTEM_PROMPT.format(user_id=session_id)
+                    if url_context:
+                        system_prompt += "\n\n[Pre-fetched URL content from user's message:]\n" + url_context
                     result = await complex_agent.ainvoke({
                         "messages": [
                             {"role": "system", "content": system_prompt},
@@ -305,39 +559,91 @@ async def run_agent_task(message: str, session_id: str, channel: str = "telegram
                             {"role": "user", "content": clean_message},
                         ]
                     })
-                    asyncio.create_task(vram_manager.exit_complex_mode())

                     llm_elapsed = time.monotonic() - t0
                     _log_messages(result)
                     final_text = _extract_final_text(result)
+                    if final_text:
+                        yield final_text

         except Exception as e:
             import traceback
             llm_elapsed = time.monotonic() - t0
-            print(f"[agent] error after {llm_elapsed:.1f}s for chat {session_id}: {e}", flush=True)
+            print(f"[agent] error after {llm_elapsed:.1f}s for {session_id}: {e}", flush=True)
             traceback.print_exc()

-        # Deliver reply through the originating channel
+        print(f"[agent] pipeline done in {time.monotonic() - t0:.1f}s tier={tier if 'tier' in dir() else '?'}", flush=True)

+        # Store memory as side-effect (non-blocking, best-effort)
         if final_text:
-            t1 = time.monotonic()
-            await channels.deliver(session_id, channel, final_text)
-            send_elapsed = time.monotonic() - t1
-            print(
-                f"[agent] replied in {time.monotonic() - t0:.1f}s "
-                f"(llm={llm_elapsed:.1f}s, send={send_elapsed:.1f}s) tier={tier}",
-                flush=True,
-            )
-            print(f"[agent] reply_text: {final_text}", flush=True)
+            asyncio.create_task(_store_memory(session_id, clean_message, final_text))
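The token loop in the medium branch is a small state machine: it suppresses everything between `<think>` and `</think>` while streaming the rest. A synchronous sketch of the same logic, under the same assumption the loop above makes (each tag arrives whole inside a single token):

```python
def filter_think(tokens):
    """Yield only text outside <think>…</think>, splitting tokens that
    contain a tag boundary. Same state machine as the astream loop."""
    in_think = False
    for token in tokens:
        if not token:
            continue
        if in_think:
            if "</think>" in token:
                in_think = False
                after = token.split("</think>", 1)[1]
                if after:
                    yield after  # text trailing the closing tag
        else:
            if "<think>" in token:
                in_think = True
                before = token.split("<think>", 1)[0]
                if before:
                    yield before  # text preceding the opening tag
            else:
                yield token

print("".join(filter_think(["Hi ", "<think>plan", " steps</think>", "there"])))
```

A tag split across two tokens would slip through this filter; the service tolerates that because qwen3 emits the tags as single tokens in practice, and `_strip_think` still cleans the joined final text.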
+# ── core task (Telegram / Matrix / CLI wrapper) ─────────────────────────────────
+
+async def run_agent_task(
+    message: str,
+    session_id: str,
+    channel: str = "telegram",
+    metadata: dict | None = None,
+):
+    print(f"[agent] queued: {message[:80]!r} chat={session_id}", flush=True)
+    t0 = time.monotonic()
+
+    meta = metadata or {}
+    dry_run = bool(meta.get("dry_run", False))
+    is_benchmark = bool(meta.get("benchmark", False))
+
+    history = _conversation_buffers.get(session_id, [])
+    final_text = None
+    actual_tier = "unknown"
+
+    # Patch pipeline to capture tier for logging
+    # We read it from logs post-hoc; capture via a wrapper
+    async for chunk in _run_agent_pipeline(message, history, session_id, dry_run=dry_run):
+        await _push_stream_chunk(session_id, chunk)
+        if final_text is None:
+            final_text = chunk
         else:
-            print("[agent] warning: no text reply from agent", flush=True)
+            final_text += chunk
+
+    await _end_stream(session_id)
+
+    elapsed_ms = int((time.monotonic() - t0) * 1000)
+
+    if final_text:
+        final_text = final_text.strip()
+
+        # Skip channel delivery for benchmark sessions (no Telegram spam)
+        if not is_benchmark:
+            try:
+                await channels.deliver(session_id, channel, final_text)
+            except Exception as e:
+                print(f"[agent] delivery error (non-fatal): {e}", flush=True)
+
+        print(f"[agent] replied in {elapsed_ms / 1000:.1f}s", flush=True)
+        print(f"[agent] reply_text: {final_text[:200]}", flush=True)
+
+        # Update conversation buffer
+        if final_text:
+            buf = _conversation_buffers.get(session_id, [])
-            buf.append({"role": "user", "content": clean_message})
+            buf.append({"role": "user", "content": message})
+            buf.append({"role": "assistant", "content": final_text})
+            _conversation_buffers[session_id] = buf[-(MAX_HISTORY_TURNS * 2):]
+
+        # Log interaction for RLHF data collection (skip benchmark sessions to avoid noise)
+        if not is_benchmark:
+            asyncio.create_task(_log_interaction(
+                session_id=session_id,
+                channel=channel,
+                tier=actual_tier,
+                input_text=message,
+                response_text=final_text,
+                latency_ms=elapsed_ms,
+                metadata=meta if meta else None,
+            ))
+    else:
+        print("[agent] warning: no text reply from agent", flush=True)
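The buffer update above caps the rolling history with a negative slice. A quick sketch of that trimming rule, with an assumed `MAX_HISTORY_TURNS = 3` for illustration (the real value lives in the service config):

```python
MAX_HISTORY_TURNS = 3  # assumed value for illustration only

def trim_buffer(buf: list[dict]) -> list[dict]:
    # Keep the last N conversational turns; each turn is a user plus an
    # assistant message, hence the factor of 2. Same slice as run_agent_task.
    return buf[-(MAX_HISTORY_TURNS * 2):]

buf = [{"role": "user", "content": f"q{i}"} if i % 2 == 0
       else {"role": "assistant", "content": f"a{i}"} for i in range(10)]
print(len(trim_buffer(buf)))
```

Because a negative slice on a shorter list returns the whole list, no special casing is needed while the buffer is still filling up.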
# ── endpoints ──────────────────────────────────────────────────────────────────

@@ -348,7 +654,7 @@ async def message(request: InboundMessage, background_tasks: BackgroundTasks):
         return JSONResponse(status_code=503, content={"error": "Agent not ready"})
     session_id = request.session_id
     channel = request.channel
-    background_tasks.add_task(run_agent_task, request.text, session_id, channel)
+    background_tasks.add_task(run_agent_task, request.text, session_id, channel, request.metadata)
     return JSONResponse(status_code=202, content={"status": "accepted"})
@@ -374,13 +680,132 @@ async def reply_stream(session_id: str):
         try:
             text = await asyncio.wait_for(q.get(), timeout=900)
             # Escape newlines so entire reply fits in one SSE data line
-            yield f"data: {text.replace(chr(10), '\\n').replace(chr(13), '')}\n\n"
+            yield f"data: {text.replace(chr(10), chr(92) + 'n').replace(chr(13), '')}\n\n"
         except asyncio.TimeoutError:
             yield "data: [timeout]\n\n"

     return StreamingResponse(event_generator(), media_type="text/event-stream")


+@app.get("/stream/{session_id}")
+async def stream_reply(session_id: str):
+    """
+    SSE endpoint — streams reply tokens as they are generated.
+    Each chunk: data: <token>\\n\\n
+    Signals completion: data: [DONE]\\n\\n
+
+    Medium tier: real token-by-token streaming (think blocks filtered out).
+    Light and complex tiers: full reply delivered as one chunk then [DONE].
+    """
+    q = _stream_queues.setdefault(session_id, asyncio.Queue())
+
+    async def event_generator():
+        try:
+            while True:
+                chunk = await asyncio.wait_for(q.get(), timeout=900)
+                escaped = chunk.replace("\n", "\\n").replace("\r", "")
+                yield f"data: {escaped}\n\n"
+                if chunk == "[DONE]":
+                    break
+        except asyncio.TimeoutError:
+            yield "data: [DONE]\n\n"
+
+    return StreamingResponse(event_generator(), media_type="text/event-stream")


 @app.get("/health")
 async def health():
     return {"status": "ok", "agent_ready": medium_agent is not None}
+# ── OpenAI-compatible API (for OpenWebUI and other clients) ────────────────────
+
+_TIER_MAP = {
+    "adolf": None,
+    "adolf-light": "light",
+    "adolf-medium": "medium",
+    "adolf-deep": "complex",
+}
+
+
+@app.get("/v1/models")
+async def list_models():
+    return {
+        "object": "list",
+        "data": [
+            {"id": "adolf", "object": "model", "owned_by": "adolf"},
+            {"id": "adolf-light", "object": "model", "owned_by": "adolf"},
+            {"id": "adolf-medium", "object": "model", "owned_by": "adolf"},
+            {"id": "adolf-deep", "object": "model", "owned_by": "adolf"},
+        ],
+    }
+
+
+@app.post("/v1/chat/completions")
+async def chat_completions(request: Request):
+    if medium_agent is None:
+        return JSONResponse(status_code=503, content={"error": {"message": "Agent not ready", "type": "server_error"}})
+
+    body = await request.json()
+    model = body.get("model", "adolf")
+    messages = body.get("messages", [])
+    stream = body.get("stream", True)
+
+    # Extract current user message and history
+    user_messages = [m for m in messages if m.get("role") == "user"]
+    if not user_messages:
+        return JSONResponse(status_code=400, content={"error": {"message": "No user message", "type": "invalid_request_error"}})
+
+    current_message = user_messages[-1]["content"]
+    # History = everything before the last user message (excluding system messages from OpenWebUI)
+    last_user_idx = len(messages) - 1 - next(
+        i for i, m in enumerate(reversed(messages)) if m.get("role") == "user"
+    )
+    history = [m for m in messages[:last_user_idx] if m.get("role") in ("user", "assistant")]
+
+    session_id = request.headers.get("X-Session-Id", "owui-default")
+    tier_override = _TIER_MAP.get(model)
+
+    import json as _json
+    import uuid as _uuid
+
+    response_id = f"chatcmpl-{_uuid.uuid4().hex[:12]}"
+
+    if stream:
+        async def event_stream():
+            # Opening chunk with role
+            opening = {
+                "id": response_id, "object": "chat.completion.chunk",
+                "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]
+            }
+            yield f"data: {_json.dumps(opening)}\n\n"
+
+            async for chunk in _run_agent_pipeline(current_message, history, session_id, tier_override):
+                data = {
+                    "id": response_id, "object": "chat.completion.chunk",
+                    "choices": [{"index": 0, "delta": {"content": chunk}, "finish_reason": None}]
+                }
+                yield f"data: {_json.dumps(data)}\n\n"
+
+            # Final chunk
+            final = {
+                "id": response_id, "object": "chat.completion.chunk",
+                "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]
+            }
+            yield f"data: {_json.dumps(final)}\n\n"
+            yield "data: [DONE]\n\n"
+
+        return StreamingResponse(event_stream(), media_type="text/event-stream")
+
+    else:
+        # Non-streaming: collect all chunks
+        parts = []
+        async for chunk in _run_agent_pipeline(current_message, history, session_id, tier_override):
+            if chunk:
+                parts.append(chunk)
+        full_text = "".join(parts).strip()
+        return {
+            "id": response_id, "object": "chat.completion",
+            "choices": [{"index": 0, "message": {"role": "assistant", "content": full_text}, "finish_reason": "stop"}],
+            "model": model,
+        }
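The history extraction in `/v1/chat/completions` walks the message list backwards to locate the last user message, then keeps only the user/assistant turns before it. The same index arithmetic, isolated on an invented sample message list:

```python
messages = [
    {"role": "system", "content": "sys"},
    {"role": "user", "content": "first"},
    {"role": "assistant", "content": "reply"},
    {"role": "user", "content": "second"},
]

# Front-relative index of the last "user" message: same reversed-enumerate
# arithmetic as in chat_completions.
last_user_idx = len(messages) - 1 - next(
    i for i, m in enumerate(reversed(messages)) if m.get("role") == "user"
)

# History = user/assistant messages strictly before that index.
history = [m for m in messages[:last_user_idx] if m.get("role") in ("user", "assistant")]
print(last_user_idx, [m["content"] for m in history])
```

The `next(...)` raises `StopIteration` if no user message exists, which is why the endpoint checks `user_messages` and returns 400 before reaching this point.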
@@ -1,13 +1,21 @@
 from deepagents import create_deep_agent


+class _DirectModel:
+    """Thin wrapper: single LLM call, no tools, same ainvoke interface as a graph."""
+
+    def __init__(self, model):
+        self._model = model
+
+    async def ainvoke(self, input_dict: dict) -> dict:
+        messages = input_dict["messages"]
+        response = await self._model.ainvoke(messages)
+        return {"messages": list(messages) + [response]}
+
+
 def build_medium_agent(model, agent_tools: list, system_prompt: str):
-    """Medium agent: create_deep_agent with TodoList planning, no subagents."""
-    return create_deep_agent(
-        model=model,
-        tools=agent_tools,
-        system_prompt=system_prompt,
-    )
+    """Medium agent: single LLM call, no tools — fast ~3s response."""
+    return _DirectModel(model)


 def build_complex_agent(model, agent_tools: list, system_prompt: str):
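`_DirectModel` works by duck typing: the caller only ever touches `.ainvoke({"messages": [...]})`, so any object exposing that coroutine is interchangeable with a compiled agent graph. A minimal self-contained sketch with a stubbed model (the `EchoModel` stand-in is invented for the demo):

```python
import asyncio

class _DirectModel:
    """Single LLM call, no tools; same ainvoke shape as an agent graph."""
    def __init__(self, model):
        self._model = model

    async def ainvoke(self, input_dict: dict) -> dict:
        messages = input_dict["messages"]
        response = await self._model.ainvoke(messages)
        return {"messages": list(messages) + [response]}

class EchoModel:
    # Stand-in for the real chat model: echoes the last message's content.
    async def ainvoke(self, messages):
        return {"role": "assistant", "content": messages[-1]["content"]}

result = asyncio.run(_DirectModel(EchoModel()).ainvoke(
    {"messages": [{"role": "user", "content": "ping"}]}
))
print(result["messages"][-1])
```

Returning the input messages plus the response keeps the `{"messages": [...]}` contract that `_extract_final_text` expects, which is why the swap from `create_deep_agent` needed no caller changes.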
benchmarks/run_benchmark.py (new file, 318 lines)
@@ -0,0 +1,318 @@
#!/usr/bin/env python3
"""
Adolf routing benchmark.

Sends each query to Adolf's /message endpoint, waits briefly for the routing
decision to appear in docker logs, then records the actual tier.

Usage:
    python3 run_benchmark.py [options]
    python3 run_benchmark.py --tier light|medium|complex
    python3 run_benchmark.py --category <name>
    python3 run_benchmark.py --ids 1,2,3
    python3 run_benchmark.py --list-categories
    python3 run_benchmark.py --dry-run   # complex queries use medium model (no API cost)

IMPORTANT: Always check GPU is free before running. This script does it automatically.

Adolf must be running at http://localhost:8000.
"""

import argparse
import asyncio
import json
import re
import subprocess
import sys
import time
from pathlib import Path

import httpx

ADOLF_URL = "http://localhost:8000"
OLLAMA_URL = "http://localhost:11436"  # GPU Ollama
DATASET = Path(__file__).parent / "benchmark.json"
RESULTS = Path(__file__).parent / "results_latest.json"

# Max time to wait for each query to fully complete via SSE stream
QUERY_TIMEOUT = 300  # seconds — generous to handle GPU semaphore waits

# Memory thresholds
MIN_FREE_RAM_MB = 1500  # abort if less than this is free
MIN_FREE_VRAM_MB = 500  # warn if less than this is free on GPU
# ── Pre-flight checks ──────────────────────────────────────────────────────────

def check_ram() -> tuple[bool, str]:
    """Check available system RAM. Returns (ok, message)."""
    try:
        with open("/proc/meminfo") as f:
            info = {}
            for line in f:
                parts = line.split()
                if len(parts) >= 2:
                    info[parts[0].rstrip(":")] = int(parts[1])
        free_mb = info.get("MemAvailable", 0) // 1024
        total_mb = info.get("MemTotal", 0) // 1024
        msg = f"RAM: {free_mb} MB free / {total_mb} MB total"
        if free_mb < MIN_FREE_RAM_MB:
            return False, f"CRITICAL: {msg} — need at least {MIN_FREE_RAM_MB} MB free"
        return True, msg
    except Exception as e:
        return True, f"RAM check failed (non-fatal): {e}"


def check_gpu() -> tuple[bool, str]:
    """Check GPU VRAM via Ollama /api/ps. Returns (ok, message)."""
    try:
        r = httpx.get(f"{OLLAMA_URL}/api/ps", timeout=5)
        r.raise_for_status()
        data = r.json()
        models = data.get("models", [])
        if models:
            names = [m.get("name", "?") for m in models]
            sizes_mb = [m.get("size_vram", 0) // (1024 * 1024) for m in models]
            loaded = ", ".join(f"{n} ({s}MB)" for n, s in zip(names, sizes_mb))
            total_vram = sum(sizes_mb)
            if total_vram > 7000:
                return False, f"GPU BUSY: models loaded = {loaded} — total VRAM used {total_vram}MB. Wait for models to unload."
            return True, f"GPU: models loaded = {loaded} (total {total_vram}MB VRAM)"
        return True, "GPU: idle (no models loaded)"
    except httpx.ConnectError:
        return True, "GPU check skipped (Ollama not reachable at localhost:11436)"
    except Exception as e:
        return True, f"GPU check failed (non-fatal): {e}"


def preflight_checks(skip_gpu_check: bool = False) -> bool:
    """Run all pre-flight checks. Returns True if safe to proceed."""
    print("\n── Pre-flight checks ──────────────────────────────────────────")

    ram_ok, ram_msg = check_ram()
    print(f"  {'✓' if ram_ok else '✗'} {ram_msg}")
    if not ram_ok:
        print("\nABORTING: not enough RAM. Free up memory before running benchmark.")
        return False

    if not skip_gpu_check:
        gpu_ok, gpu_msg = check_gpu()
        print(f"  {'✓' if gpu_ok else '✗'} {gpu_msg}")
        if not gpu_ok:
            print("\nABORTING: GPU is busy. Wait for current inference to finish, then retry.")
            return False

    print("  All checks passed.\n")
    return True
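The parsing inside `check_ram` can be exercised without touching `/proc` by feeding it sample lines. The sketch below factors the parse into a function with the same split-and-strip logic; the figures in the sample are invented:

```python
def parse_meminfo(lines):
    # /proc/meminfo lines look like "MemAvailable:  8123456 kB";
    # keep the key (colon stripped) and the kB figure, as check_ram does.
    info = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            info[parts[0].rstrip(":")] = int(parts[1])
    return info

sample = ["MemTotal:       16000000 kB", "MemAvailable:    8192000 kB"]
info = parse_meminfo(sample)
print(info["MemAvailable"] // 1024, "MB free")
```

Integer floor division by 1024 converts the kB figures to MB, matching the thresholds (`MIN_FREE_RAM_MB`) the pre-flight check compares against.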
# ── Log helpers ────────────────────────────────────────────────────────────────

def get_log_tail(n: int = 50) -> str:
    result = subprocess.run(
        ["docker", "logs", "deepagents", "--tail", str(n)],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr


def extract_tier_from_logs(logs_before: str, logs_after: str) -> str | None:
    """Find new tier= lines that appeared after we sent the query."""
    before_lines = set(logs_before.splitlines())
    new_lines = [l for l in logs_after.splitlines() if l not in before_lines]
    for line in new_lines:
        m = re.search(r"tier=(\w+(?:\s*\(dry-run\))?)", line)
        if m:
            tier_raw = m.group(1)
            # Normalise: "complex (dry-run)" → "complex"
            return tier_raw.split()[0]
    return None
# ── Request helpers ────────────────────────────────────────────────────────────

async def post_message(
    client: httpx.AsyncClient,
    query_id: int,
    query: str,
    dry_run: bool = False,
) -> bool:
    payload = {
        "text": query,
        "session_id": f"benchmark-{query_id}",
        "channel": "cli",
        "user_id": "benchmark",
        "metadata": {"dry_run": dry_run, "benchmark": True},
    }
    try:
        r = await client.post(f"{ADOLF_URL}/message", json=payload, timeout=10)
        r.raise_for_status()
        return True
    except Exception as e:
        print(f"  POST_ERROR: {e}", end="")
        return False
# ── Dataset ────────────────────────────────────────────────────────────────────

def load_dataset() -> list[dict]:
    with open(DATASET) as f:
        return json.load(f)["queries"]


def filter_queries(queries, tier, category, ids):
    if tier:
        queries = [q for q in queries if q["tier"] == tier]
    if category:
        queries = [q for q in queries if q["category"] == category]
    if ids:
        queries = [q for q in queries if q["id"] in ids]
    return queries
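`filter_queries` applies the three CLI filters as a cascade, so combining flags narrows the selection. A quick check against a toy dataset (the entries are invented; the real ones live in benchmark.json):

```python
def filter_queries(queries, tier, category, ids):
    # Same cascading filters as in run_benchmark.py: each active filter
    # further narrows the list produced by the previous one.
    if tier:
        queries = [q for q in queries if q["tier"] == tier]
    if category:
        queries = [q for q in queries if q["category"] == category]
    if ids:
        queries = [q for q in queries if q["id"] in ids]
    return queries

toy = [
    {"id": 1, "tier": "light", "category": "greeting", "query": "hi"},
    {"id": 2, "tier": "medium", "category": "news", "query": "latest news?"},
    {"id": 3, "tier": "medium", "category": "weather", "query": "rain today?"},
]
print([q["id"] for q in filter_queries(toy, "medium", None, [3])])
```

Passing `None` (or an empty list) for a filter leaves it inactive, which is what the argparse defaults produce when a flag is omitted.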
# ── Main run ───────────────────────────────────────────────────────────────────

async def run(queries: list[dict], dry_run: bool = False) -> list[dict]:
    results = []

    async with httpx.AsyncClient() as client:
        try:
            r = await client.get(f"{ADOLF_URL}/health", timeout=5)
            r.raise_for_status()
        except Exception as e:
            print(f"ERROR: Adolf not reachable: {e}", file=sys.stderr)
            sys.exit(1)

        total = len(queries)
        correct = 0

        dry_label = " [DRY-RUN: complex→medium]" if dry_run else ""
        print(f"\nRunning {total} queries{dry_label}\n")
        print(f"{'ID':>3} {'EXPECTED':8} {'ACTUAL':8} {'OK':3} {'TIME':6} {'CATEGORY':22} QUERY")
        print("─" * 110)

        for q in queries:
            qid = q["id"]
            expected = q["tier"]
            category = q["category"]
            query_text = q["query"]

            # In dry-run, complex queries still use complex classification (logged), but medium infers
            send_dry = dry_run and expected == "complex"
            session_id = f"benchmark-{qid}"

            print(f"{qid:>3} {expected:8} ", end="", flush=True)

            logs_before = get_log_tail(300)
            t0 = time.monotonic()

            ok_post = await post_message(client, qid, query_text, dry_run=send_dry)
            if not ok_post:
                print(f"{'?':8} {'ERR':3} {'?':6} {category:22} {query_text[:40]}")
                results.append({"id": qid, "expected": expected, "actual": None, "ok": False})
                continue

            # Wait for query to complete via SSE stream (handles GPU semaphore waits)
            try:
                async with client.stream(
                    "GET", f"{ADOLF_URL}/stream/{session_id}", timeout=QUERY_TIMEOUT
                ) as sse:
                    async for line in sse.aiter_lines():
                        if "data: [DONE]" in line:
                            break
            except Exception:
                pass  # timeout or connection issue — check logs anyway

            # Now the query is done — check logs for tier
            await asyncio.sleep(0.3)
            logs_after = get_log_tail(300)
            actual = extract_tier_from_logs(logs_before, logs_after)

            elapsed = time.monotonic() - t0
            match = actual == expected or (actual == "fast" and expected == "medium")
            if match:
                correct += 1

            mark = "✓" if match else "✗"
            actual_str = actual or "?"
            print(f"{actual_str:8} {mark:3} {elapsed:5.1f}s {category:22} {query_text[:40]}")

            results.append({
                "id": qid,
                "expected": expected,
                "actual": actual_str,
                "ok": match,
                "elapsed": round(elapsed, 1),
                "category": category,
                "query": query_text,
                "dry_run": send_dry,
            })

    print("─" * 110)
    accuracy = correct / total * 100 if total else 0
    print(f"\nAccuracy: {correct}/{total} ({accuracy:.0f}%)")

    for tier_name in ["light", "medium", "complex"]:
        tier_qs = [r for r in results if r["expected"] == tier_name]
        if tier_qs:
            tier_ok = sum(1 for r in tier_qs if r["ok"])
            print(f"  {tier_name:8}: {tier_ok}/{len(tier_qs)}")

    wrong = [r for r in results if not r["ok"]]
    if wrong:
        print(f"\nMisclassified ({len(wrong)}):")
        for r in wrong:
            print(f"  id={r['id']:3} expected={r['expected']:8} actual={r['actual']:8} {r['query'][:60]}")

    with open(RESULTS, "w") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)
    print(f"\nResults saved to {RESULTS}")

    return results
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Adolf routing benchmark",
|
||||
epilog="IMPORTANT: Always check GPU is free before running. This is done automatically."
|
||||
)
|
||||
parser.add_argument("--tier", choices=["light", "medium", "complex"])
|
||||
parser.add_argument("--category")
|
||||
parser.add_argument("--ids", help="Comma-separated IDs")
|
||||
parser.add_argument("--list-categories", action="store_true")
|
||||
parser.add_argument(
|
||||
"--dry-run",
|
||||
action="store_true",
|
||||
help="For complex queries: route classification is tested but medium model is used for inference (no API cost)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--skip-gpu-check",
|
||||
action="store_true",
|
||||
help="Skip GPU availability check (use only if you know GPU is free)",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
queries = load_dataset()
|
||||
|
||||
if args.list_categories:
|
||||
cats = sorted(set(q["category"] for q in queries))
|
||||
tiers = {t: sum(1 for q in queries if q["tier"] == t) for t in ["light", "medium", "complex"]}
|
||||
print(f"Total: {len(queries)} | Tiers: {tiers}")
|
||||
print(f"Categories: {cats}")
|
||||
return
|
||||
|
||||
# ALWAYS check GPU and RAM before running
|
||||
if not preflight_checks(skip_gpu_check=args.skip_gpu_check):
|
||||
sys.exit(1)
|
||||
|
||||
ids = [int(i) for i in args.ids.split(",")] if args.ids else None
|
||||
queries = filter_queries(queries, args.tier, args.category, ids)
|
||||
if not queries:
|
||||
print("No queries match filters.")
|
||||
sys.exit(1)
|
||||
|
||||
asyncio.run(run(queries, dry_run=args.dry_run))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
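Both benchmarks wait on the gateway's SSE stream by scanning `data:` lines for a `[DONE]` sentinel. That framing can be sketched as a pure parser (the event lines below are hypothetical, no server needed):

```python
def collect_sse_data(lines):
    # Accumulate SSE "data:" payloads until the [DONE] sentinel.
    out = []
    for line in lines:
        if not line.startswith("data:"):
            continue
        chunk = line[5:].strip()
        if chunk == "[DONE]":
            break
        out.append(chunk)
    return out

events = ["event: ping", "data: hello", "data: world", "data: [DONE]", "data: ignored"]
print(collect_sse_data(events))  # → ['hello', 'world']
```

Non-`data:` lines (comments, `event:` fields) are skipped, and anything after `[DONE]` is never consumed — the same behaviour the benchmark relies on when a stream stays open.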
426
benchmarks/run_voice_benchmark.py
Normal file
@@ -0,0 +1,426 @@
#!/usr/bin/env python3
"""
Adolf voice benchmark.

Pipeline for each query:
1. Synthesize query text → WAV via Silero TTS (localhost:8881)
2. Transcribe WAV → text via faster-whisper STT (localhost:8880)
3. Send transcription to Adolf → check routing tier
4. Report: WER per query, routing accuracy vs text baseline

Usage:
    python3 run_voice_benchmark.py [options]
    python3 run_voice_benchmark.py --tier light|medium|complex
    python3 run_voice_benchmark.py --ids 1,2,3
    python3 run_voice_benchmark.py --dry-run   # complex queries use medium model

IMPORTANT: Always check GPU is free before running. Done automatically.

Services required:
- Adolf: http://localhost:8000
- Silero TTS: http://localhost:8881 (openai/silero-tts container)
- faster-whisper: http://localhost:8880 (faster-whisper container)
"""

import argparse
import asyncio
import io
import json
import re
import subprocess
import sys
import tempfile
import time
import unicodedata
from pathlib import Path

import httpx

ADOLF_URL = "http://localhost:8000"
OLLAMA_URL = "http://localhost:11436"
TTS_URL = "http://localhost:8881"  # Silero TTS — OpenAI-compatible /v1/audio/speech
STT_URL = "http://localhost:8880"  # faster-whisper — OpenAI-compatible /v1/audio/transcriptions

DATASET = Path(__file__).parent / "benchmark.json"
RESULTS_DIR = Path(__file__).parent

TIER_WAIT = 15  # seconds to wait for tier= in docker logs
MIN_FREE_RAM_MB = 1500
MIN_FREE_VRAM_MB = 500


# ── Pre-flight ─────────────────────────────────────────────────────────────────

def check_ram() -> tuple[bool, str]:
    try:
        with open("/proc/meminfo") as f:
            info = {}
            for line in f:
                parts = line.split()
                if len(parts) >= 2:
                    info[parts[0].rstrip(":")] = int(parts[1])
        free_mb = info.get("MemAvailable", 0) // 1024
        total_mb = info.get("MemTotal", 0) // 1024
        msg = f"RAM: {free_mb} MB free / {total_mb} MB total"
        if free_mb < MIN_FREE_RAM_MB:
            return False, f"CRITICAL: {msg} — need at least {MIN_FREE_RAM_MB} MB free"
        return True, msg
    except Exception as e:
        return True, f"RAM check failed (non-fatal): {e}"


def check_gpu() -> tuple[bool, str]:
    try:
        r = httpx.get(f"{OLLAMA_URL}/api/ps", timeout=5)
        r.raise_for_status()
        data = r.json()
        models = data.get("models", [])
        if models:
            names = [m.get("name", "?") for m in models]
            sizes_mb = [m.get("size_vram", 0) // (1024 * 1024) for m in models]
            loaded = ", ".join(f"{n} ({s}MB)" for n, s in zip(names, sizes_mb))
            total_vram = sum(sizes_mb)
            if total_vram > 7000:
                return False, f"GPU BUSY: {loaded} — {total_vram}MB VRAM used. Wait for models to unload."
            return True, f"GPU: {loaded} ({total_vram}MB VRAM)"
        return True, "GPU: idle"
    except httpx.ConnectError:
        return True, "GPU check skipped (Ollama not reachable)"
    except Exception as e:
        return True, f"GPU check failed (non-fatal): {e}"


def check_services() -> tuple[bool, str]:
    """Check TTS and STT are reachable."""
    msgs = []
    ok = True
    for name, url, path in [("TTS", TTS_URL, "/"), ("STT", STT_URL, "/")]:
        try:
            r = httpx.get(url + path, timeout=5)
            msgs.append(f"{name}: reachable (HTTP {r.status_code})")
        except Exception as e:
            msgs.append(f"{name}: NOT REACHABLE — {e}")
            ok = False
    return ok, " | ".join(msgs)


def preflight_checks(skip_gpu_check: bool = False) -> bool:
    print("\n── Pre-flight checks ──────────────────────────────────────────")
    ram_ok, ram_msg = check_ram()
    print(f"  {'✓' if ram_ok else '✗'} {ram_msg}")
    if not ram_ok:
        print("\nABORTING: not enough RAM.")
        return False

    if not skip_gpu_check:
        gpu_ok, gpu_msg = check_gpu()
        print(f"  {'✓' if gpu_ok else '✗'} {gpu_msg}")
        if not gpu_ok:
            print("\nABORTING: GPU is busy.")
            return False

    svc_ok, svc_msg = check_services()
    print(f"  {'✓' if svc_ok else '✗'} {svc_msg}")
    if not svc_ok:
        print("\nABORTING: required voice services not running.")
        print("Start them with: cd /home/alvis/agap_git/openai && docker compose up -d faster-whisper silero-tts")
        return False

    print("  All checks passed.\n")
    return True


# ── TTS ────────────────────────────────────────────────────────────────────────

async def synthesize(client: httpx.AsyncClient, text: str) -> bytes | None:
    """Synthesize text to WAV via Silero TTS (OpenAI-compatible /v1/audio/speech)."""
    try:
        r = await client.post(
            f"{TTS_URL}/v1/audio/speech",
            json={"model": "tts-1", "input": text, "voice": "alloy", "response_format": "wav"},
            timeout=30,
        )
        r.raise_for_status()
        return r.content
    except Exception as e:
        print(f"\n  [TTS error: {e}]", end="")
        return None


# ── STT ────────────────────────────────────────────────────────────────────────

async def transcribe(client: httpx.AsyncClient, wav_bytes: bytes) -> str | None:
    """Transcribe WAV to text via faster-whisper (OpenAI-compatible /v1/audio/transcriptions)."""
    try:
        files = {"file": ("audio.wav", wav_bytes, "audio/wav")}
        data = {"model": "whisper-1", "language": "ru", "response_format": "json"}
        r = await client.post(
            f"{STT_URL}/v1/audio/transcriptions",
            files=files,
            data=data,
            timeout=60,
        )
        r.raise_for_status()
        result = r.json()
        return result.get("text", "").strip()
    except Exception as e:
        print(f"\n  [STT error: {e}]", end="")
        return None


# ── WER ────────────────────────────────────────────────────────────────────────

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, normalize unicode for WER calculation."""
    text = unicodedata.normalize("NFC", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER between reference and hypothesis."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Dynamic programming edit distance
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(ref)][len(hyp)] / len(ref)


# ── Adolf interaction ──────────────────────────────────────────────────────────

def get_log_tail(n: int = 60) -> str:
    result = subprocess.run(
        ["docker", "logs", "deepagents", "--tail", str(n)],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr


def extract_tier_from_logs(logs_before: str, logs_after: str) -> str | None:
    before_lines = set(logs_before.splitlines())
    new_lines = [l for l in logs_after.splitlines() if l not in before_lines]
    for line in reversed(new_lines):
        m = re.search(r"tier=(\w+(?:\s*\(dry-run\))?)", line)
        if m:
            return m.group(1).split()[0]
    return None


async def post_to_adolf(
    client: httpx.AsyncClient,
    query_id: int,
    text: str,
    dry_run: bool = False,
) -> bool:
    payload = {
        "text": text,
        "session_id": f"voice-bench-{query_id}",
        "channel": "cli",
        "user_id": "benchmark",
        "metadata": {"dry_run": dry_run, "benchmark": True, "voice": True},
    }
    try:
        r = await client.post(f"{ADOLF_URL}/message", json=payload, timeout=10)
        r.raise_for_status()
        return True
    except Exception as e:
        print(f"\n  [Adolf error: {e}]", end="")
        return False


# ── Dataset ────────────────────────────────────────────────────────────────────

def load_dataset() -> list[dict]:
    with open(DATASET) as f:
        return json.load(f)["queries"]


def filter_queries(queries, tier, category, ids):
    if tier:
        queries = [q for q in queries if q["tier"] == tier]
    if category:
        queries = [q for q in queries if q["category"] == category]
    if ids:
        queries = [q for q in queries if q["id"] in ids]
    return queries


# ── Main run ───────────────────────────────────────────────────────────────────

async def run(queries: list[dict], dry_run: bool = False, save_audio: bool = False) -> None:
    async with httpx.AsyncClient() as client:
        # Check Adolf
        try:
            r = await client.get(f"{ADOLF_URL}/health", timeout=5)
            r.raise_for_status()
        except Exception as e:
            print(f"ERROR: Adolf not reachable: {e}", file=sys.stderr)
            sys.exit(1)

        total = len(queries)
        results = []

        dry_label = " [DRY-RUN]" if dry_run else ""
        print(f"Voice benchmark: {total} queries{dry_label}\n")
        print(f"{'ID':>3} {'EXP':8} {'ACT':8} {'OK':3} {'WER':5} {'TRANSCRIPT'}")
        print("─" * 100)

        total_wer = 0.0
        wer_count = 0
        correct = 0

        for q in queries:
            qid = q["id"]
            expected = q["tier"]
            original = q["query"]
            print(f"{qid:>3} {expected:8} ", end="", flush=True)

            # Step 1: TTS
            wav = await synthesize(client, original)
            if wav is None:
                print(f"{'?':8} {'ERR':3} {'?':5} [TTS failed]")
                results.append({"id": qid, "expected": expected, "actual": None, "ok": False, "wer": None, "error": "tts"})
                continue

            if save_audio:
                audio_path = RESULTS_DIR / "voice_audio" / f"{qid}.wav"
                audio_path.parent.mkdir(exist_ok=True)
                audio_path.write_bytes(wav)

            # Step 2: STT
            transcript = await transcribe(client, wav)
            if transcript is None:
                print(f"{'?':8} {'ERR':3} {'?':5} [STT failed]")
                results.append({"id": qid, "expected": expected, "actual": None, "ok": False, "wer": None, "error": "stt"})
                continue

            # Calculate WER
            wer = word_error_rate(original, transcript)
            total_wer += wer
            wer_count += 1

            # Step 3: Send to Adolf
            send_dry = dry_run and expected == "complex"
            logs_before = get_log_tail(60)
            t0 = time.monotonic()

            ok_post = await post_to_adolf(client, qid, transcript, dry_run=send_dry)
            if not ok_post:
                print(f"{'?':8} {'ERR':3} {wer:4.2f} {transcript[:50]}")
                results.append({"id": qid, "expected": expected, "actual": None, "ok": False, "wer": wer, "transcript": transcript})
                continue

            # Step 4: Wait for routing decision
            actual = None
            for _ in range(TIER_WAIT * 2):
                await asyncio.sleep(0.5)
                logs_after = get_log_tail(60)
                actual = extract_tier_from_logs(logs_before, logs_after)
                if actual and actual in ("light", "medium", "complex", "fast"):
                    break

            elapsed = time.monotonic() - t0
            match = actual == expected or (actual == "fast" and expected == "medium")
            if match:
                correct += 1

            mark = "✓" if match else "✗"
            actual_str = actual or "?"
            print(f"{actual_str:8} {mark:3} {wer:4.2f} {transcript[:60]}")

            results.append({
                "id": qid,
                "expected": expected,
                "actual": actual_str,
                "ok": match,
                "wer": round(wer, 3),
                "original": original,
                "transcript": transcript,
                "elapsed": round(elapsed, 1),
                "dry_run": send_dry,
            })

            await asyncio.sleep(0.5)

        print("─" * 100)

        # Summary
        accuracy = correct / total * 100 if total else 0
        avg_wer = total_wer / wer_count * 100 if wer_count else 0
        print(f"\nRouting accuracy: {correct}/{total} ({accuracy:.0f}%)")
        print(f"Average WER: {avg_wer:.1f}% (lower is better; 0% = perfect transcription)")

        for tier_name in ["light", "medium", "complex"]:
            tier_qs = [r for r in results if r["expected"] == tier_name]
            if tier_qs:
                tier_ok = sum(1 for r in tier_qs if r["ok"])
                tier_wers = [r["wer"] for r in tier_qs if r.get("wer") is not None]
                avg = sum(tier_wers) / len(tier_wers) * 100 if tier_wers else 0
                print(f"  {tier_name:8}: routing {tier_ok}/{len(tier_qs)}  avg WER {avg:.1f}%")

        wrong = [r for r in results if not r["ok"]]
        if wrong:
            print(f"\nMisclassified after voice ({len(wrong)}):")
            for r in wrong:
                print(f"  id={r['id']:3} expected={r.get('expected','?'):8} actual={r.get('actual','?'):8} transcript={r.get('transcript','')[:50]}")

        high_wer = [r for r in results if r.get("wer") and r["wer"] > 0.3]
        if high_wer:
            print("\nHigh WER queries (>30%) — transcription quality issues:")
            for r in high_wer:
                print(f"  id={r['id']:3} WER={r['wer']*100:.0f}%  original:   {r.get('original','')[:50]}")
                print(f"              transcript: {r.get('transcript','')[:50]}")

        # Save results
        ts = int(time.time())
        out_path = RESULTS_DIR / f"voice_results_{ts}.json"
        latest_path = RESULTS_DIR / "voice_results_latest.json"
        payload = {"summary": {"accuracy": accuracy, "avg_wer": avg_wer, "total": total}, "results": results}
        with open(out_path, "w") as f:
            json.dump(payload, f, indent=2, ensure_ascii=False)
        with open(latest_path, "w") as f:
            json.dump(payload, f, indent=2, ensure_ascii=False)
        print(f"\nResults saved to {latest_path}")


def main():
    parser = argparse.ArgumentParser(
        description="Adolf voice benchmark — TTS→STT→routing pipeline",
        epilog="Requires: Silero TTS (port 8881) and faster-whisper (port 8880) running."
    )
    parser.add_argument("--tier", choices=["light", "medium", "complex"])
    parser.add_argument("--category")
    parser.add_argument("--ids", help="Comma-separated IDs")
    parser.add_argument("--dry-run", action="store_true",
                        help="Complex queries use medium model for inference (no API cost)")
    parser.add_argument("--save-audio", action="store_true",
                        help="Save synthesized WAV files to voice_audio/ directory")
    parser.add_argument("--skip-gpu-check", action="store_true")
    args = parser.parse_args()

    if not preflight_checks(skip_gpu_check=args.skip_gpu_check):
        sys.exit(1)

    queries = load_dataset()
    ids = [int(i) for i in args.ids.split(",")] if args.ids else None
    queries = filter_queries(queries, args.tier, args.category, ids)
    if not queries:
        print("No queries match filters.")
        sys.exit(1)

    asyncio.run(run(queries, dry_run=args.dry_run, save_audio=args.save_audio))


if __name__ == "__main__":
    main()
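The WER metric above is plain word-level Levenshtein distance divided by the reference length. A self-contained sketch, re-stating `normalize` for illustration (the example strings are made up):

```python
import re
import unicodedata

def normalize(text: str) -> str:
    # Lowercase, strip punctuation, collapse whitespace (mirrors the benchmark helper).
    text = unicodedata.normalize("NFC", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # DP edit distance over words (substitution/insertion/deletion all cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference words → WER 0.25
print(word_error_rate("a b c d", "a x c d"))  # → 0.25
```

Because punctuation is stripped before comparison, an STT output that differs only in punctuation or case scores a WER of 0, which is the right behaviour for routing-oriented transcription.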
75
bifrost-config.json
Normal file
@@ -0,0 +1,75 @@
{
  "auth_config": {
    "is_enabled": true,
    "admin_username": "admin",
    "admin_password": "env.BIFROST_ADMIN_PASSWORD"
  },
  "config_store": {
    "enabled": true,
    "type": "postgres",
    "config": {
      "host": "bifrost-db",
      "port": "5432",
      "user": "bifrost",
      "password": "bifrost",
      "db_name": "bifrost",
      "ssl_mode": "disable"
    }
  },
  "client": {
    "drop_excess_requests": false
  },
  "providers": {
    "ollama": {
      "keys": [
        {
          "name": "ollama-gpu",
          "value": "dummy",
          "models": [
            "qwen2.5:0.5b",
            "qwen2.5:1.5b",
            "qwen3:4b",
            "gemma3:4b",
            "qwen3:8b"
          ],
          "weight": 1.0
        }
      ],
      "network_config": {
        "base_url": "http://host.docker.internal:11436",
        "default_request_timeout_in_seconds": 300,
        "max_retries": 2,
        "retry_backoff_initial_ms": 500,
        "retry_backoff_max_ms": 10000
      }
    },
    "ollama-cpu": {
      "keys": [
        {
          "name": "ollama-cpu-key",
          "value": "dummy",
          "models": [
            "gemma3:1b",
            "qwen2.5:1.5b",
            "qwen2.5:3b"
          ],
          "weight": 1.0
        }
      ],
      "network_config": {
        "base_url": "http://host.docker.internal:11435",
        "default_request_timeout_in_seconds": 120,
        "max_retries": 2,
        "retry_backoff_initial_ms": 500,
        "retry_backoff_max_ms": 10000
      },
      "custom_provider_config": {
        "base_provider_type": "openai",
        "allowed_requests": {
          "chat_completion": true,
          "chat_completion_stream": true
        }
      }
    }
  }
}
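As a quick sanity check of the provider layout above, a sketch that flattens a config dict into models-per-provider (the dict inlines an excerpt of the file; key names match the JSON):

```python
config = {
    "providers": {
        "ollama": {"keys": [{"models": ["qwen3:4b", "qwen3:8b"], "weight": 1.0}]},
        "ollama-cpu": {"keys": [{"models": ["gemma3:1b", "qwen2.5:3b"], "weight": 1.0}]},
    }
}

def models_by_provider(cfg: dict) -> dict[str, list[str]]:
    # Flatten every key's model list under its provider name.
    return {
        name: [m for key in p.get("keys", []) for m in key.get("models", [])]
        for name, p in cfg.get("providers", {}).items()
    }

print(models_by_provider(config))
```

With the full file loaded via `json.load`, the same helper shows at a glance which models route to the GPU instance (port 11436) versus the CPU one (port 11435).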
17
channels.py
@@ -49,6 +49,7 @@ async def deliver(session_id: str, channel: str, text: str) -> None:
 # ── built-in channel adapters ─────────────────────────────────────────────────

 GRAMMY_URL = os.getenv("GRAMMY_URL", "http://grammy:3001")
+MATRIX_URL = os.getenv("MATRIX_URL", "http://matrix:3002")


 async def _telegram_send(session_id: str, text: str) -> None:
@@ -64,12 +65,26 @@ async def _telegram_send(session_id: str, text: str) -> None:
     )


+async def _matrix_send(session_id: str, text: str) -> None:
+    """Send reply to Matrix via the matrix adapter POST /send endpoint."""
+    room_id = session_id.removeprefix("mx-")
+    MAX_MTX = 4000
+    chunks = [text[i:i + MAX_MTX] for i in range(0, len(text), MAX_MTX)]
+    async with httpx.AsyncClient(timeout=15) as client:
+        for chunk in chunks:
+            await client.post(
+                f"{MATRIX_URL}/send",
+                json={"room_id": room_id, "text": chunk},
+            )
+
+
 async def _cli_send(session_id: str, text: str) -> None:
     """CLI replies are handled entirely through the pending_replies queue — no-op here."""
     pass


 def register_defaults() -> None:
-    """Register the built-in Telegram and CLI channel adapters."""
+    """Register the built-in Telegram, Matrix, and CLI channel adapters."""
     register("telegram", _telegram_send)
+    register("matrix", _matrix_send)
     register("cli", _cli_send)
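The fixed-width chunking used by `_matrix_send` can be sketched on its own (4000 is the per-message budget the adapter assumes for Matrix):

```python
def chunk_text(text: str, limit: int = 4000) -> list[str]:
    # Split into fixed-width pieces; only the last piece may be shorter.
    return [text[i:i + limit] for i in range(0, len(text), limit)]

pieces = chunk_text("x" * 9001)
print([len(p) for p in pieces])  # → [4000, 4000, 1001]
```

The slice-by-range idiom guarantees the pieces concatenate back to the original text, and an empty input yields an empty list, so the send loop simply does nothing.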
57
cli.py
@@ -1,9 +1,9 @@
 #!/usr/bin/env python3
 """
-Adolf CLI — interactive REPL for the multi-channel gateway.
+Adolf CLI — interactive REPL with Rich streaming display.

 Usage:
-    python3 cli.py [--url http://localhost:8000] [--session cli-alvis]
+    python3 cli.py [--url http://deepagents:8000] [--session cli-alvis]
 """

 import argparse
@@ -12,7 +12,13 @@ import os
 import sys
 import urllib.request

-GATEWAY = "http://localhost:8000"
+from rich.console import Console
+from rich.live import Live
+from rich.markdown import Markdown
+from rich.text import Text
+
+GATEWAY = "http://deepagents:8000"
+console = Console()


 def post_message(gateway: str, text: str, session_id: str) -> None:
@@ -20,7 +26,7 @@ def post_message(gateway: str, text: str, session_id: str) -> None:
         "text": text,
         "session_id": session_id,
         "channel": "cli",
-        "user_id": os.getlogin(),
+        "user_id": os.getenv("USER", "user"),
     }).encode()
     req = urllib.request.Request(
         f"{gateway}/message",
@@ -30,33 +36,49 @@
     )
     with urllib.request.urlopen(req, timeout=10) as r:
         if r.status != 202:
-            print(f"[error] gateway returned {r.status}", file=sys.stderr)
+            console.print(f"[red][error] gateway returned {r.status}[/red]")
             sys.exit(1)


-def wait_for_reply(gateway: str, session_id: str, timeout: int = 400) -> str:
-    """Open SSE stream and return first data event."""
+def stream_reply(gateway: str, session_id: str, timeout: int = 400) -> str:
+    """
+    Open the /stream/{session_id} SSE endpoint and display tokens live with
+    Rich. Returns the full assembled reply text.
+    """
     req = urllib.request.Request(
-        f"{gateway}/reply/{session_id}",
+        f"{gateway}/stream/{session_id}",
         headers={"Accept": "text/event-stream"},
     )
+    buffer = ""
     with urllib.request.urlopen(req, timeout=timeout + 5) as r:
-        for raw_line in r:
-            line = raw_line.decode("utf-8").rstrip("\n")
-            if line.startswith("data:"):
-                return line[5:].strip().replace("\\n", "\n")
-    return ""
+        with Live(Text(""), console=console, refresh_per_second=20, transient=True) as live:
+            for raw_line in r:
+                line = raw_line.decode("utf-8").rstrip("\n")
+                if not line.startswith("data:"):
+                    continue
+                chunk = line[5:].strip()
+                if chunk == "[DONE]":
+                    break
+                chunk = chunk.replace("\\n", "\n")
+                buffer += chunk
+                live.update(Text(buffer))
+
+    # Render the complete reply as Markdown once streaming is done
+    console.print(Markdown(buffer))
+    return buffer


 def main():
     parser = argparse.ArgumentParser(description="Adolf CLI")
     parser.add_argument("--url", default=GATEWAY, help="Gateway URL")
-    parser.add_argument("--session", default=f"cli-{os.getlogin()}", help="Session ID")
+    parser.add_argument("--session", default=f"cli-{os.getenv('USER', 'user')}",
+                        help="Session ID")
     parser.add_argument("--timeout", type=int, default=400, help="Reply timeout (seconds)")
     args = parser.parse_args()

-    print(f"Adolf CLI (session={args.session}, gateway={args.url})")
-    print("Type your message and press Enter. Ctrl+C or Ctrl+D to exit.\n")
+    console.print(f"[bold]Adolf CLI[/bold] (session=[cyan]{args.session}[/cyan], "
+                  f"gateway=[cyan]{args.url}[/cyan])")
+    console.print("Type your message and press Enter. Ctrl+C or Ctrl+D to exit.\n")

     try:
         while True:
@@ -68,12 +90,11 @@ def main():
                 continue

             post_message(args.url, text, args.session)
-            print("...", end="", flush=True)
-            reply = wait_for_reply(args.url, args.session, timeout=args.timeout)
-            print(f"\r{reply}\n")
+            stream_reply(args.url, args.session, timeout=args.timeout)
+            console.print()

     except KeyboardInterrupt:
-        print("\nbye")
+        console.print("\n[dim]bye[/dim]")


 if __name__ == "__main__":
@@ -6,19 +6,29 @@ services:
       - "8000:8000"
     environment:
       - PYTHONUNBUFFERED=1
+      # LiteLLM proxy — all LLM inference goes through here
+      - LITELLM_URL=http://host.docker.internal:4000/v1
+      - LITELLM_API_KEY=sk-fjQC1BxAiGFSMs
+      # Direct Ollama GPU URL — used only by VRAMManager for flush/prewarm
       - OLLAMA_BASE_URL=http://host.docker.internal:11436
       - DEEPAGENTS_MODEL=qwen3:4b
-      - DEEPAGENTS_COMPLEX_MODEL=qwen3:8b
+      - DEEPAGENTS_COMPLEX_MODEL=deepseek/deepseek-r1:free
       - DEEPAGENTS_ROUTER_MODEL=qwen2.5:1.5b
       - SEARXNG_URL=http://host.docker.internal:11437
       - GRAMMY_URL=http://grammy:3001
+      - MATRIX_URL=http://host.docker.internal:3002
       - CRAWL4AI_URL=http://crawl4ai:11235
+      - ROUTECHECK_URL=http://routecheck:8090
+      - ROUTECHECK_TOKEN=${ROUTECHECK_TOKEN}
+    volumes:
+      - ./logs:/app/logs
     extra_hosts:
       - "host.docker.internal:host-gateway"
     depends_on:
       - openmemory
       - grammy
       - crawl4ai
+      - routecheck
     restart: unless-stopped

   openmemory:
@@ -27,8 +37,9 @@ services:
     ports:
       - "8765:8765"
     environment:
-      # Extraction LLM (qwen2.5:1.5b) runs on GPU after reply — fast 2-5s extraction
+      # Extraction LLM runs on GPU — qwen2.5:1.5b for speed (~3s)
       - OLLAMA_GPU_URL=http://host.docker.internal:11436
+      - OLLAMA_EXTRACTION_MODEL=qwen2.5:1.5b
       # Embedding (nomic-embed-text) runs on CPU — fast enough for search (50-150ms)
       - OLLAMA_CPU_URL=http://host.docker.internal:11435
     extra_hosts:
@@ -45,6 +56,33 @@ services:
       - DEEPAGENTS_URL=http://deepagents:8000
     restart: unless-stopped

+  cli:
+    build:
+      context: .
+      dockerfile: Dockerfile.cli
+    container_name: cli
+    environment:
+      - DEEPAGENTS_URL=http://deepagents:8000
+    depends_on:
+      - deepagents
+    stdin_open: true
+    tty: true
+    profiles:
+      - tools
+
+  routecheck:
+    build: ./routecheck
+    container_name: routecheck
+    ports:
+      - "8090:8090"
+    environment:
+      - YANDEX_ROUTING_KEY=${YANDEX_ROUTING_KEY}
+      - INTERNAL_TOKEN=${ROUTECHECK_TOKEN}
+      - HTTPS_PROXY=http://host.docker.internal:56928
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    restart: unless-stopped
+
   crawl4ai:
     image: unclecode/crawl4ai:latest
     container_name: crawl4ai
188
fast_tools.py
Normal file
@@ -0,0 +1,188 @@
"""
|
||||
Fast Tools — pre-flight tools invoked by a classifier before the main LLM call.
|
||||
|
||||
Each FastTool has:
|
||||
- matches(message) → bool : regex classifier that determines if this tool applies
|
||||
- run(message) → str : async fetch that returns enrichment context
|
||||
|
||||
FastToolRunner holds a list of FastTools. The Router uses any_matches() to force
|
||||
the tier to medium before LLM classification. run_agent_task() calls run_matching()
|
||||
to build extra context that is injected into the system prompt.
|
||||
|
||||
To add a new fast tool:
|
||||
1. Subclass FastTool, implement name/matches/run
|
||||
2. Add an instance to the list passed to FastToolRunner in agent.py
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import re
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
import httpx
|
||||
|
||||
|
||||
class FastTool(ABC):
|
||||
"""Base class for all pre-flight fast tools."""
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def name(self) -> str: ...
|
||||
|
||||
@abstractmethod
|
||||
def matches(self, message: str) -> bool: ...
|
||||
|
||||
@abstractmethod
|
||||
async def run(self, message: str) -> str: ...
|
||||
|
||||
|
||||
_WMO_CODES = {
|
||||
    0: "clear sky", 1: "mainly clear", 2: "partly cloudy", 3: "overcast",
    45: "fog", 48: "icy fog",
    51: "light drizzle", 53: "drizzle", 55: "heavy drizzle",
    61: "light rain", 63: "rain", 65: "heavy rain",
    71: "light snow", 73: "snow", 75: "heavy snow", 77: "snow grains",
    80: "light showers", 81: "showers", 82: "heavy showers",
    85: "snow showers", 86: "heavy snow showers",
    95: "thunderstorm", 96: "thunderstorm with hail", 99: "thunderstorm with heavy hail",
}


class WeatherTool(FastTool):
    """
    Fetches current weather for Balashikha, Moscow region directly from open-meteo.com.
    No API key required. Returns a ready-to-deliver reply — no LLM reformatting needed.
    """

    _PATTERN = re.compile(
        r"\b(weather|forecast|temperature|rain(ing)?|snow(ing)?|humidity|wind\s*speed"
        r"|холодно|тепло|погода|прогноз погоды"
        r"|how (hot|cold|warm) is it|what.?s the (weather|temp)|dress for the weather)\b",
        re.IGNORECASE,
    )

    _URL = (
        "https://api.open-meteo.com/v1/forecast"
        "?latitude=55.7963&longitude=37.9382"
        "&current=temperature_2m,apparent_temperature,relative_humidity_2m"
        ",wind_speed_10m,weather_code"
        "&wind_speed_unit=ms"
    )

    @property
    def name(self) -> str:
        return "weather"

    def matches(self, message: str) -> bool:
        return bool(self._PATTERN.search(message))

    async def run(self, message: str) -> str:
        try:
            async with httpx.AsyncClient(timeout=10) as client:
                r = await client.get(self._URL)
                r.raise_for_status()
                c = r.json()["current"]
        except Exception as e:
            return f"[weather error: {e}]"

        temp = c["temperature_2m"]
        feels = c["apparent_temperature"]
        humidity = c["relative_humidity_2m"]
        wind = c["wind_speed_10m"]
        condition = _WMO_CODES.get(c.get("weather_code", 0), "unknown")

        return (
            f"Balashikha: {condition}, {temp:+.0f}°C (feels like {feels:+.0f}°C), "
            f"wind {wind:.1f} m/s, humidity {humidity}%."
        )


class CommuteTool(FastTool):
    """
    Returns real-time driving time from home (Balashikha) to a destination
    using Yandex traffic data via the local routecheck service.

    Triggered by queries about commute time, arrival, or road traffic.
    The routecheck service handles Yandex API auth and the HTTPS proxy.
    """

    _PATTERN = re.compile(
        r"\b(commute|arrival time|how long.{0,20}(drive|get|travel|reach)"
        r"|сколько.{0,20}(ехать|добираться|минут)"
        r"|пробки|traffic|road.{0,10}now|drive to (work|office|center|москва|moscow)"
        r"|when (will i|do i) (arrive|get there|reach))\b",
        re.IGNORECASE,
    )

    # Home: Balashikha. Default destination: Moscow city center.
    _HOME = "55.7963,37.9382"
    _DEST = "55.7558,37.6173"

    def __init__(self, routecheck_url: str, internal_token: str):
        self._url = routecheck_url.rstrip("/")
        self._token = internal_token

    @property
    def name(self) -> str:
        return "commute"

    def matches(self, message: str) -> bool:
        return bool(self._PATTERN.search(message))

    async def run(self, message: str) -> str:
        if not self._token:
            return "[commute: ROUTECHECK_TOKEN not configured]"
        try:
            async with httpx.AsyncClient(timeout=15) as client:
                r = await client.get(
                    f"{self._url}/api/route",
                    params={"from": self._HOME, "to": self._DEST, "token": self._token},
                )
                r.raise_for_status()
                d = r.json()
        except Exception as e:
            return f"[commute error: {e}]"

        traffic = d["duration_traffic_min"]
        normal = d["duration_min"]
        dist = d["distance_km"]
        delay = traffic - normal

        lines = [
            "Current drive time from Balashikha to Moscow center:",
            f" With traffic: {traffic} min",
            f" Without traffic: {normal} min",
            f" Distance: {dist} km",
        ]
        if delay > 5:
            lines.append(f" Traffic delay: +{delay} min")
        return "\n".join(lines)


class FastToolRunner:
    """
    Classifier + executor for fast tools.

    Used in two places:
    - Router.route(): any_matches() forces medium tier before LLM classification
    - run_agent_task(): run_matching() builds enrichment context in the pre-flight gather
    """

    def __init__(self, tools: list[FastTool]):
        self._tools = tools

    def any_matches(self, message: str) -> bool:
        """True if any fast tool applies to this message."""
        return any(t.matches(message) for t in self._tools)

    def matching_names(self, message: str) -> list[str]:
        """Names of tools that match this message (for logging)."""
        return [t.name for t in self._tools if t.matches(message)]

    async def run_matching(self, message: str) -> str:
        """Run all matching tools concurrently and return combined context."""
        matching = [t for t in self._tools if t.matches(message)]
        if not matching:
            return ""
        results = await asyncio.gather(*[t.run(message) for t in matching])
        parts = [r for r in results if r and not r.startswith("[")]
        return "\n\n".join(parts)
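Wiring the runner up end to end looks roughly like the sketch below. A stand-in tool replaces the real `WeatherTool` so the example runs offline; only the `FastToolRunner` logic mirrors this file, the rest is illustrative:

```python
import asyncio
import re


class EchoWeather:
    """Stand-in for a FastTool: same name/matches()/run() shape, no network."""

    name = "weather"
    _PATTERN = re.compile(r"\bweather\b", re.IGNORECASE)

    def matches(self, message: str) -> bool:
        return bool(self._PATTERN.search(message))

    async def run(self, message: str) -> str:
        return "Balashikha: overcast, -2°C"


class FastToolRunner:
    def __init__(self, tools):
        self._tools = tools

    def any_matches(self, message: str) -> bool:
        return any(t.matches(message) for t in self._tools)

    async def run_matching(self, message: str) -> str:
        matching = [t for t in self._tools if t.matches(message)]
        if not matching:
            return ""
        results = await asyncio.gather(*[t.run(message) for t in matching])
        # "[...]"-prefixed results are tool errors and are dropped from the context
        return "\n\n".join(r for r in results if r and not r.startswith("["))


runner = FastToolRunner([EchoWeather()])
print(runner.any_matches("what's the weather?"))                # router: force medium tier
print(asyncio.run(runner.run_matching("what's the weather?")))  # system-prompt context block
```

The error-prefix filter is the reason tools return `[weather error: ...]` strings instead of raising: a failed tool silently contributes nothing to the enrichment context.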
26  openmemory/CLAUDE.md  Normal file
@@ -0,0 +1,26 @@
# openmemory

FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.

## Tools exposed (MCP)

- `add_memory(text, user_id)` — extract facts from a conversation turn and store in Qdrant
- `search_memory(query, user_id)` — semantic search, returns results with score ≥ 0.5
- `get_all_memories(user_id)` — dump all stored memories for a session

These are called directly by `agent.py` (outside the agent loop), never exposed to the LLM as tools.
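The direct-call pattern can be sketched as follows. In-memory stubs stand in for the real MCP calls and the agent; every name below is illustrative, not this repo's API:

```python
import asyncio

# In-memory stand-ins for the openmemory MCP tools (illustrative only)
_store: dict[str, list[str]] = {}


async def add_memory(text: str, user_id: str) -> None:
    _store.setdefault(user_id, []).append(text)


async def search_memory(query: str, user_id: str) -> list[str]:
    # Toy word-overlap match; the real server does embedding search with score >= 0.5
    words = set(query.lower().split())
    return [m for m in _store.get(user_id, []) if words & set(m.lower().split())]


async def agent_reply(text: str, context: list[str]) -> str:
    return f"reply({text}; {len(context)} memories)"


async def run_agent_task(session_id: str, user_text: str) -> str:
    # Pre-flight: memory retrieval happens here, not via an agent tool
    memories = await search_memory(user_text, user_id=session_id)
    reply = await agent_reply(user_text, context=memories)
    # Post-flight: storage is scheduled as a background task once the reply exists
    asyncio.create_task(add_memory(user_text, user_id=session_id))
    return reply
```

The point of the shape: the LLM never sees memory tools; retrieval and storage bracket the agent call from the outside.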
## Two Ollama instances

- **GPU** (`OLLAMA_GPU_URL`, port 11436) — extraction model (`qwen2.5:1.5b`): pulls facts from conversation text
- **CPU** (`OLLAMA_CPU_URL`, port 11435) — embedding model (`nomic-embed-text`): 50–150 ms per query

## Prompts

Custom `EXTRACTION_PROMPT` starts with `/no_think` to suppress qwen3 chain-of-thought and get clean JSON output. Custom `UPDATE_MEMORY_PROMPT` handles deduplication — mem0 merges new facts with existing ones rather than creating duplicates.

## Notes

- Qdrant collection is created automatically on first use
- Memory is keyed by `user_id` which equals `session_id` in `agent.py`
- Extraction runs after the reply is sent (background task) — GPU contention with the medium model is avoided since the semaphore is released before `_store_memory()` is scheduled
@@ -6,6 +6,7 @@ from mem0 import Memory
# Extraction LLM — GPU Ollama (qwen3:4b, same model as medium agent)
# Runs after reply when GPU is idle; spin-wait in agent.py prevents contention
OLLAMA_GPU_URL = os.getenv("OLLAMA_GPU_URL", "http://host.docker.internal:11436")
EXTRACTION_MODEL = os.getenv("OLLAMA_EXTRACTION_MODEL", "qwen2.5:1.5b")

# Embedding — CPU Ollama (nomic-embed-text, 137 MB RAM)
# Used for both search (50-150ms, acceptable) and store-time embedding
@@ -94,7 +95,7 @@ config = {
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "qwen3:4b",
            "model": EXTRACTION_MODEL,
            "ollama_base_url": OLLAMA_GPU_URL,
            "temperature": 0.1,  # consistent JSON output
        },
4  pytest.ini  Normal file
@@ -0,0 +1,4 @@
[pytest]
testpaths = tests/unit
pythonpath = .
asyncio_mode = auto
25  routecheck/CLAUDE.md  Normal file
@@ -0,0 +1,25 @@
# routecheck

FastAPI service providing a Yandex Routing API proxy behind an image captcha.

## Purpose

Yandex Routing API free tier requires a website that uses the API. This service is that website.
It also exposes an internal endpoint (`/api/route`) used by `CommuteTool` in `fast_tools.py`.

## Two access paths

- **Web UI** (`/`): solve PIL arithmetic captcha → get a token → query any two lat/lon points
- **Internal API**: `GET /api/route?from=lat,lon&to=lat,lon&token=$ROUTECHECK_TOKEN` — no captcha

## Key env vars

- `YANDEX_ROUTING_KEY` — from developer.tech.yandex.ru, Router API, free tier
- `INTERNAL_TOKEN` — equals `ROUTECHECK_TOKEN` from root `.env`; shared with deepagents
- `HTTPS_PROXY` — set to `http://host.docker.internal:56928`; container has no direct external internet

## Notes

- Captchas expire after 5 min, route tokens after 1 hour; both are stored in-memory (restart clears them)
- Yandex API expects `lon,lat` order (not `lat,lon`) — `app.py` swaps automatically
- Captcha image endpoint: `GET /captcha/image/{id}` — regenerates on each call with random noise
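The coordinate-order swap noted above is easy to get wrong; a minimal sketch of what it has to do (the function name is illustrative, the real logic lives in `app.py`):

```python
def to_yandex_waypoints(from_coords: str, to_coords: str) -> str:
    """Turn two 'lat,lon' strings into the 'lon,lat|lon,lat' waypoints
    string the Yandex Routing API expects."""
    from_lat, from_lon = map(float, from_coords.split(","))
    to_lat, to_lon = map(float, to_coords.split(","))
    return f"{from_lon},{from_lat}|{to_lon},{to_lat}"
```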
6  routecheck/Dockerfile  Normal file
@@ -0,0 +1,6 @@
FROM python:3.12-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends fonts-dejavu-core && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir fastapi uvicorn pillow httpx
COPY app.py .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8090"]
377  routecheck/app.py  Normal file
@@ -0,0 +1,377 @@
"""
RouteCheck — local routing web service with image captcha.

Endpoints:
    GET  /                    — web UI
    GET  /captcha/image/{id}  — PNG captcha image
    POST /api/captcha/new     — create captcha, return {id}
    POST /api/captcha/solve   — {id, answer} → {token} or 400
    GET  /api/route           — ?from=lat,lon&to=lat,lon&token=...
                                token = solved captcha token OR INTERNAL_TOKEN env var
"""

import io
import math
import os
import random
import string
import time
import uuid
from typing import Optional

import httpx
from fastapi import FastAPI, HTTPException, Query
from fastapi.responses import HTMLResponse, JSONResponse, StreamingResponse
from PIL import Image, ImageDraw, ImageFilter, ImageFont
from pydantic import BaseModel

app = FastAPI(title="RouteCheck")

# ── Config ─────────────────────────────────────────────────────────────────────
YANDEX_KEY = os.getenv("YANDEX_ROUTING_KEY", "")
INTERNAL_TOKEN = os.getenv("INTERNAL_TOKEN", "")
HTTPS_PROXY = os.getenv("HTTPS_PROXY", "")
CAPTCHA_TTL = 300   # seconds a captcha is valid
TOKEN_TTL = 3600    # seconds a solved token is valid

# ── In-memory captcha store ────────────────────────────────────────────────────
_captchas: dict[str, dict] = {}  # id → {problem, answer, expires}
_tokens: dict[str, float] = {}   # token → expires


def _purge():
    now = time.time()
    for k in list(_captchas.keys()):
        if _captchas[k]["expires"] < now:
            del _captchas[k]
    for k in list(_tokens.keys()):
        if _tokens[k] < now:
            del _tokens[k]


# ── Captcha image generation ───────────────────────────────────────────────────

def _rand_color(dark=False):
    if dark:
        return tuple(random.randint(0, 100) for _ in range(3))
    return tuple(random.randint(140, 255) for _ in range(3))


def _make_captcha_image(text: str) -> bytes:
    W, H = 220, 80
    img = Image.new("RGB", (W, H), color=_rand_color())
    draw = ImageDraw.Draw(img)

    # Background noise: random lines
    for _ in range(8):
        x1, y1 = random.randint(0, W), random.randint(0, H)
        x2, y2 = random.randint(0, W), random.randint(0, H)
        draw.line([(x1, y1), (x2, y2)], fill=_rand_color(dark=True), width=2)

    # Background noise: random dots
    for _ in range(300):
        x, y = random.randint(0, W), random.randint(0, H)
        draw.point((x, y), fill=_rand_color(dark=True))

    # Draw each character with a slight random offset
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 36)
    except Exception:
        font = ImageFont.load_default()

    char_w = W // (len(text) + 2)
    for i, ch in enumerate(text):
        x = char_w + i * char_w + random.randint(-4, 4)
        y = (H - 40) // 2 + random.randint(-6, 6)
        # Draw shadow
        draw.text((x + 2, y + 2), ch, font=font, fill=_rand_color(dark=True))
        draw.text((x, y), ch, font=font, fill=_rand_color(dark=True))

    # Wavy distortion via pixel manipulation
    pixels = img.load()
    for x in range(W):
        shift = int(4 * math.sin(x / 15.0))
        col = [pixels[x, y] for y in range(H)]
        for y in range(H):
            pixels[x, y] = col[(y - shift) % H]

    img = img.filter(ImageFilter.SMOOTH)

    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()

def _generate_problem() -> tuple[str, int]:
    """Return (display_text, answer)."""
    ops = [
        lambda a, b: (f"{a} + {b} = ?", a + b),
        lambda a, b: (f"{a} × {b} = ?", a * b),
        lambda a, b: (f"{max(a, b)} − {min(a, b)} = ?", max(a, b) - min(a, b)),
    ]
    op = random.choice(ops)
    a, b = random.randint(2, 9), random.randint(2, 9)
    text, answer = op(a, b)
    return text, answer


# ── Routes ─────────────────────────────────────────────────────────────────────

@app.get("/", response_class=HTMLResponse)
async def index():
    return HTML_PAGE


@app.get("/captcha/image/{captcha_id}")
async def captcha_image(captcha_id: str):
    _purge()
    entry = _captchas.get(captcha_id)
    if not entry:
        raise HTTPException(404, "Captcha not found or expired")
    png = _make_captcha_image(entry["problem"])
    return StreamingResponse(io.BytesIO(png), media_type="image/png",
                             headers={"Cache-Control": "no-store"})


class CaptchaNewResponse(BaseModel):
    id: str


@app.post("/api/captcha/new")
async def captcha_new():
    _purge()
    problem_text, answer = _generate_problem()
    cid = str(uuid.uuid4())
    _captchas[cid] = {
        "problem": problem_text,
        "answer": answer,
        "expires": time.time() + CAPTCHA_TTL,
    }
    return {"id": cid}


class SolveRequest(BaseModel):
    id: str
    answer: int


@app.post("/api/captcha/solve")
async def captcha_solve(req: SolveRequest):
    _purge()
    entry = _captchas.get(req.id)
    if not entry:
        raise HTTPException(400, "Captcha expired or not found")
    if entry["answer"] != req.answer:
        raise HTTPException(400, "Wrong answer")
    token = str(uuid.uuid4())
    _tokens[token] = time.time() + TOKEN_TTL
    del _captchas[req.id]
    return {"token": token}


@app.get("/api/route")
async def route(
    from_coords: str = Query(..., alias="from", description="lat,lon"),
    to_coords: str = Query(..., alias="to", description="lat,lon"),
    token: str = Query(...),
):
    _purge()

    # Auth: internal service token or valid captcha token.
    # Guard against INTERNAL_TOKEN being unset/empty — an empty token must not pass.
    if not (INTERNAL_TOKEN and token == INTERNAL_TOKEN):
        if token not in _tokens:
            raise HTTPException(401, "Invalid or expired token — solve captcha first")

    if not YANDEX_KEY:
        raise HTTPException(503, "YANDEX_ROUTING_KEY not configured")

    # Parse coords
    try:
        from_lat, from_lon = map(float, from_coords.split(","))
        to_lat, to_lon = map(float, to_coords.split(","))
    except ValueError:
        raise HTTPException(400, "coords must be lat,lon")

    # Yandex Routing API expects lon,lat order
    waypoints = f"{from_lon},{from_lat}|{to_lon},{to_lat}"

    transport = httpx.AsyncHTTPTransport(proxy=HTTPS_PROXY) if HTTPS_PROXY else None
    async with httpx.AsyncClient(timeout=15, transport=transport) as client:
        try:
            r = await client.get(
                "https://api.routing.yandex.net/v2/route",
                params={"apikey": YANDEX_KEY, "waypoints": waypoints, "mode": "driving"},
            )
        except Exception as e:
            raise HTTPException(502, f"Yandex API unreachable: {e}")

    if r.status_code != 200:
        raise HTTPException(502, f"Yandex API error {r.status_code}: {r.text[:200]}")

    data = r.json()
    try:
        leg = data["route"]["legs"][0]
        duration_s = leg["duration"]
        duration_traffic_s = leg.get("duration_in_traffic", duration_s)
        distance_m = leg["distance"]
    except (KeyError, IndexError) as e:
        raise HTTPException(502, f"Unexpected Yandex response: {e} — {str(data)[:200]}")

    return {
        "duration_min": round(duration_s / 60),
        "duration_traffic_min": round(duration_traffic_s / 60),
        "distance_km": round(distance_m / 1000, 1),
    }

# ── HTML ───────────────────────────────────────────────────────────────────────

HTML_PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>RouteCheck</title>
<style>
  * { box-sizing: border-box; margin: 0; padding: 0; }
  body { font-family: system-ui, sans-serif; background: #0f172a; color: #e2e8f0; min-height: 100vh;
         display: flex; align-items: center; justify-content: center; }
  .card { background: #1e293b; border-radius: 12px; padding: 2rem; width: 420px;
          box-shadow: 0 20px 60px rgba(0,0,0,.5); }
  h1 { font-size: 1.4rem; font-weight: 700; color: #38bdf8; margin-bottom: .3rem; }
  .sub { color: #94a3b8; font-size: .85rem; margin-bottom: 1.5rem; }
  label { display: block; font-size: .8rem; color: #94a3b8; margin-bottom: .3rem; margin-top: 1rem; }
  input { width: 100%; background: #0f172a; border: 1px solid #334155; border-radius: 6px;
          color: #e2e8f0; padding: .55rem .75rem; font-size: .95rem; outline: none; }
  input:focus { border-color: #38bdf8; }
  button { width: 100%; margin-top: 1.2rem; padding: .7rem; background: #0ea5e9;
           border: none; border-radius: 6px; color: #fff; font-size: 1rem;
           font-weight: 600; cursor: pointer; transition: background .2s; }
  button:hover { background: #0284c7; }
  button:disabled { background: #334155; cursor: default; }
  .captcha-row { display: flex; gap: .75rem; align-items: center; margin-top: 1rem; }
  .captcha-row img { border-radius: 6px; border: 1px solid #334155; cursor: pointer; }
  .captcha-row input { flex: 1; }
  .result { margin-top: 1.2rem; background: #0f172a; border-radius: 8px; padding: 1rem;
            border-left: 3px solid #38bdf8; display: none; }
  .result .big { font-size: 1.6rem; font-weight: 700; color: #38bdf8; }
  .result .label { font-size: .8rem; color: #64748b; margin-top: .2rem; }
  .result .row { display: flex; gap: 1.5rem; margin-top: .8rem; }
  .result .metric { flex: 1; }
  .result .metric .val { font-size: 1.1rem; font-weight: 600; }
  .error { color: #f87171; margin-top: .8rem; font-size: .85rem; display: none; }
  .step { display: none; }
  .step.active { display: block; }
  a.refresh { font-size: .75rem; color: #38bdf8; text-decoration: none; display: block;
              margin-top: .4rem; }
  a.refresh:hover { text-decoration: underline; }
</style>
</head>
<body>
<div class="card">
  <h1>RouteCheck</h1>
  <p class="sub">Real-time driving time with Yandex traffic data</p>

  <!-- Step 1: captcha -->
  <div class="step active" id="step-captcha">
    <label>Prove you are human</label>
    <div class="captcha-row">
      <img id="captcha-img" src="" alt="captcha" width="160" height="60"
           title="Click to refresh" onclick="loadCaptcha()">
      <input id="captcha-ans" type="number" placeholder="Answer" min="0" max="999">
    </div>
    <a class="refresh" href="#" onclick="loadCaptcha();return false;">↻ New challenge</a>
    <div class="error" id="captcha-err">Wrong answer, try again.</div>
    <button id="captcha-btn" onclick="solveCaptcha()">Verify →</button>
  </div>

  <!-- Step 2: route query -->
  <div class="step" id="step-route">
    <label>From (lat, lon)</label>
    <input id="from" type="text" placeholder="55.7963, 37.9382" value="55.7963, 37.9382">
    <label>To (lat, lon)</label>
    <input id="to" type="text" placeholder="55.7558, 37.6173" value="55.7558, 37.6173">
    <button id="route-btn" onclick="queryRoute()">Get travel time</button>
    <div class="error" id="route-err"></div>
    <div class="result" id="result">
      <div class="big" id="res-traffic"></div>
      <div class="label">with current traffic</div>
      <div class="row">
        <div class="metric"><div class="val" id="res-normal"></div>
          <div class="label">without traffic</div></div>
        <div class="metric"><div class="val" id="res-dist"></div>
          <div class="label">distance</div></div>
      </div>
    </div>
  </div>
</div>

<script>
let captchaId = null;
let routeToken = null;

async function loadCaptcha() {
  const r = await fetch('/api/captcha/new', {method: 'POST'});
  const d = await r.json();
  captchaId = d.id;
  document.getElementById('captcha-img').src = '/captcha/image/' + captchaId + '?t=' + Date.now();
  document.getElementById('captcha-ans').value = '';
  document.getElementById('captcha-err').style.display = 'none';
}

async function solveCaptcha() {
  const ans = parseInt(document.getElementById('captcha-ans').value);
  if (isNaN(ans)) return;
  const btn = document.getElementById('captcha-btn');
  btn.disabled = true;
  const r = await fetch('/api/captcha/solve', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({id: captchaId, answer: ans})
  });
  if (r.ok) {
    const d = await r.json();
    routeToken = d.token;
    document.getElementById('step-captcha').classList.remove('active');
    document.getElementById('step-route').classList.add('active');
  } else {
    document.getElementById('captcha-err').style.display = 'block';
    loadCaptcha();
  }
  btn.disabled = false;
}

async function queryRoute() {
  const from = document.getElementById('from').value.trim();
  const to = document.getElementById('to').value.trim();
  const btn = document.getElementById('route-btn');
  const err = document.getElementById('route-err');
  err.style.display = 'none';
  document.getElementById('result').style.display = 'none';
  btn.disabled = true;
  btn.textContent = 'Fetching…';
  const r = await fetch(`/api/route?from=${encodeURIComponent(from)}&to=${encodeURIComponent(to)}&token=${routeToken}`);
  btn.disabled = false;
  btn.textContent = 'Get travel time';
  if (!r.ok) {
    const d = await r.json();
    err.textContent = d.detail || 'Error';
    err.style.display = 'block';
    return;
  }
  const d = await r.json();
  document.getElementById('res-traffic').textContent = d.duration_traffic_min + ' min';
  document.getElementById('res-normal').textContent = d.duration_min + ' min';
  document.getElementById('res-dist').textContent = d.distance_km + ' km';
  document.getElementById('result').style.display = 'block';
}

loadCaptcha();

document.getElementById('captcha-ans').addEventListener('keydown', e => {
  if (e.key === 'Enter') solveCaptcha();
});
</script>
</body>
</html>
"""
442  router.py
@@ -1,10 +1,38 @@
import asyncio
import re
import math
from typing import Optional
from openai import AsyncOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from fast_tools import FastToolRunner

# ── Regex pre-classifier ──────────────────────────────────────────────────────
# Catches obvious light-tier patterns before calling the LLM.
# Keyed by regex → compiled pattern.
# ── Regex pre-classifiers ─────────────────────────────────────────────────────

# Complex: keyword triggers that reliably signal deep multi-source research
_COMPLEX_PATTERNS = re.compile(
    r"(?:^|\s)("
    r"research|investigate|deep.dive|think carefully"
    r"|write a (?:detailed|comprehensive|full|thorough|complete)"
    r"|compare all|find and (?:compare|summarize|analyze)"
    r"|in[- ]depth analysis|comprehensive guide"
    r"|detailed (?:report|analysis|comparison|breakdown|overview)"
    r"|everything about|all (?:major|available|self-hosted|open.source)"
    r"|pros and cons|with (?:sources|citations|references)"
    # Russian complex research keywords (no trailing \b — stems like подробн match подробное/подробный)
    r"|исследуй|изучи все|сравни все|найди и сравни|найди и опиши"
    r"|напиши подробн|напиши детальн|напиши полн"
    r"|подробный отчет|детальн\w+ (?:анализ|сравнение|отчет)"
    r"|подробное (?:руководство|сравнение)|полное руководство"
    r"|все варианты|все способы|все доступные|все самохостируемые|все платформы"
    r"|лучшие практики|все инструменты|все решения|все протоколы"
    r"|найди детальн|найди и кратко опиши"
    r"|изучи свежие|изучи лучши|изучи все"
    r"|сравни все\b"
    r")",
    re.IGNORECASE,
)

# Light: trivial queries that need no tools or memory
_LIGHT_PATTERNS = re.compile(
    r"^("
    # Greetings / farewells
@@ -14,35 +42,301 @@ _LIGHT_PATTERNS = re.compile(
    r"|thanks?|thank you|thx|ty|ok|okay|k|cool|great|awesome|perfect|sounds good|got it|nice|sure"
    r"|how are you|how are you\?|how are you doing(\s+today)?[?!.]*"
    r"|what.?s up"
    # Calendar facts: "what day comes after X?" / "what comes after X?"
    # Calendar facts
    r"|what\s+day\s+(comes\s+after|follows|is\s+after)\s+\w+[?!.]*"
    r"|what\s+comes\s+after\s+\w+[?!.]*"
    # Acronym expansions: "what does X stand for?"
    # Acronym expansions
    r"|what\s+does\s+\w+\s+stand\s+for[?!.]*"
    # Russian greetings / farewells / acknowledgements
    r"|привет|пока|спасибо|здравствуй|здравствуйте|добрый день|добрый вечер|доброе утро"
    r"|окей|хорошо|отлично|понятно|ок|ладно|договорились|спс|благодарю"
    r"|пожалуйста|не за что|всё понятно|ясно"
    r"|как дела|как ты|как жизнь|всё хорошо|всё ок"
    r")[\s!.?]*$",
    re.IGNORECASE,
)
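For reference, the centroid-vs-cosine decision that the semantic-router comments below describe reduces to a few lines. This is a pure-Python sketch with toy vectors; in production the utterances and the incoming message would be embedded by a real embedding model, and the function names here are illustrative:

```python
import math


def _cos(a: list[float], b: list[float]) -> float:
    # Cosine similarity; 0.0 for degenerate zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def _centroid(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(msg_vec: list[float], tier_vecs: dict[str, list[list[float]]]) -> str:
    """Pick the tier whose utterance centroid is closest by cosine similarity."""
    centroids = {tier: _centroid(vs) for tier, vs in tier_vecs.items()}
    return max(centroids, key=lambda t: _cos(msg_vec, centroids[t]))
```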

# ── LLM classification prompt ─────────────────────────────────────────────────
CLASSIFY_PROMPT = """Classify the message. Output ONLY one word: light, medium, or complex.
# ── Semantic router utterances ────────────────────────────────────────────────
# These are embedded at startup. New messages are classified by cosine
# similarity — whichever tier's centroid is closest wins.
_LIGHT_UTTERANCES = [
    # General facts (English)
    "what is 2+2",
    "what is the capital of France",
    "name the three primary colors",
    "tell me a short joke",
    "is the sky blue",
    "is water wet",
    "how many days in a week",
    "what is the speed of light",
    "what is the boiling point of water",
    "spell the word beautiful",
    "what color is the ocean",
    "how many inches in a foot",
    "who wrote hamlet",
    "what is pi",
    "what year did world war two end",
    "what is the largest planet",
    "how many continents are there",
    "what does DNA stand for",
    "what language do they speak in Brazil",
    "what is the square root of 144",
    # Tech definitions — static knowledge (English)
    "what is Docker",
    "what is a VPN",
    "what is SSH",
    "what is a reverse proxy",
    "what is an API",
    "what is a firewall",
    "what is a container",
    "what is DNS",
    "what is HTTPS",
    "what is a load balancer",
    "what is Kubernetes",
    "what is Git",
    "what is a network port",
    "what is an IP address",
    "what is a subnet mask",
    "what is the OSI model",
    "how many bits in a byte",
    "how many bytes in a gigabyte",
    "what is TCP",
    "what is a REST API",
    # Russian — static facts and definitions
    "что такое IP-адрес",
    "что такое VPN",
    "что такое Docker",
    "что такое DNS",
    "что такое SSH",
    "что означает API",
    "сколько байт в гигабайте",
    "сколько бит в байте",
    "что такое Zigbee",
    "что такое Z-Wave",
    "что такое брандмауэр",
    "что такое виртуальная машина",
    "что такое обратный прокси",
    "привет",
    "пока",
    "спасибо",
    "как дела",
    "что такое Matter протокол",
    "сколько планет в солнечной системе",
    "чему равно число Пи",
    # Russian — more static definitions
    "что такое TCP/IP",
    "что такое подсеть",
    "скорость света",
    "сколько дней в году",
    "что такое Kubernetes",
    "что такое Git",
    "что такое REST API",
    "что такое TCP",
    "что такое UDP",
    "что такое VLAN",
    "сколько мегабайт в гигабайте",
    "что такое процессор",
    "что такое оперативная память",
    "что такое виртуализация",
    "что такое Linux",
    "что такое умный дом",
    "что такое Home Assistant",
    "что такое Matter",
]
LIGHT = answerable from general knowledge, no internet needed:
what is 2+2 / what is the capital of France / name the three primary colors
tell me a short joke / is the sky blue / is water wet

_MEDIUM_UTTERANCES = [
    # English — current data, memory, actions
    "what is the weather today",
    "what is the bitcoin price right now",
    "what are the latest news",
    "what did we talk about last time",
    "what is my name",
    "where do I live",
    "what do you know about me",
    "what did I tell you before",
    "what is the current temperature outside",
    "remind me what I said about my project",
    "search for the latest iPhone release",
    "find me a restaurant nearby",
    "turn on the lights in the living room",
    "turn off all lights",
    "set temperature to 22 degrees",
    "what is the current traffic to Moscow",
    "check if anyone is home",
    "what devices are currently on",
    "look up my public IP address",
    "show me recent news about Proxmox",
    # Russian — weather and commute
    "какая сегодня погода в Балашихе",
    "пойдет ли сегодня дождь",
    "какая температура на улице сейчас",
    "погода на завтра",
    "будет ли снег сегодня",
    "сколько ехать до Москвы сейчас",
    "какие пробки на дороге до Москвы",
    "время в пути на работу",
    "есть ли пробки сейчас",
    "стоит ли брать зонтик",
    # Russian — smart home control
    "включи свет в гостиной",
    "выключи свет на кухне",
    "какая температура дома",
    "установи температуру 22 градуса",
    "выключи все лампочки",
    "какие устройства сейчас включены",
    "включи ночной режим",
    "открой шторы в гостиной",
    "включи свет в спальне на 50 процентов",
    "выключи свет во всём доме",
    "включи вентилятор в детской",
    "закрыты ли все окна",
    "выключи телевизор",
    "какое потребление электричества сегодня",
    "включи кофемашину",
    "сколько у нас датчиков движения",
    "состояние всех дверных замков",
    "есть ли кто-нибудь дома",
    "установи будильник на 7 утра",
    # Russian — personal memory
    "как меня зовут",
    "где я живу",
    "что мы обсуждали в прошлый раз",
    "что ты знаешь о моем домашнем сервере",
    "напомни, какие сервисы я запускаю",
    "что я просил тебя запомнить",
    "что я говорил о своей сети",
    # Russian — current info lookups requiring network/tools
    "какой сейчас курс биткоина",
    "курс доллара к рублю сейчас",
    "какая последняя версия Docker",
    "как перезапустить Docker контейнер",
    "как посмотреть логи Docker контейнера",
    "какие новые функции в Home Assistant 2024",
    "есть ли проблемы у Cloudflare сегодня",
    "какие новые Zigbee устройства вышли в 2024 году",
    "найди хороший опенсорс менеджер фотографий",
    "последние новости Proxmox",
    "напиши bash команду для поиска больших файлов",
    "как вывести список всех запущенных контейнеров",
    "как проверить использование диска в Linux",
]
|
||||
MEDIUM = requires web search or the user's stored memories:
  current weather / today's news / Bitcoin price / what did we talk about
  what is my name / where do I live / what is my job / do I have any pets
  what do you know about me / what are my preferences / what did I tell you

_COMPLEX_UTTERANCES = [
    # English
    "research everything about Elon Musk's recent projects and investments",
    "write a detailed report on climate change solutions with sources",
    "investigate the history and current state of quantum computing",
    "find and summarize the latest academic papers on transformer architectures",
    "analyze in depth the pros and cons of nuclear energy with citations",
    "research the background and controversies around this person",
    "compare all major cloud providers with detailed pricing and features",
    "write a comprehensive biography of this historical figure",
    "investigate what caused the 2008 financial crisis with multiple sources",
    "research the best programming languages in 2024 with detailed comparison",
    "find everything published about this medical condition and treatments",
    "do a deep dive into the latest developments in artificial general intelligence",
    "research and compare all options for starting a business in Europe",
    "investigate recent news and controversies around this company",
    "write a thorough analysis of geopolitical tensions in the Middle East",
    "find detailed information on the side effects and studies for this medication",
    "research the top 10 JavaScript frameworks with benchmarks and community data",
    "investigate who is funding AI research and what their goals are",
    "write a detailed market analysis for the electric vehicle industry",
    "research everything you can find about this startup or technology",
    # Russian — deep research
    "исследуй и сравни все варианты умного домашнего освещения",
    "напиши подробный отчет о протоколах умного дома",
    "изучи все самохостируемые медиасерверы и сравни их",
    "исследуй лучшие практики безопасности домашнего сервера",
    "сравни все системы резервного копирования для Linux",
    "напиши детальное сравнение WireGuard и OpenVPN",
    "исследуй все варианты голосового управления на русском языке",
    "изучи все опенсорс альтернативы Google сервисам",
    "напиши подробный анализ локальных языковых моделей",
    "исследуй лучшие инструменты мониторинга для домашнего сервера",
    # Russian — more deep research queries matching benchmark
    "исследуй и сравни Proxmox, Unraid и TrueNAS для домашней лаборатории",
    "напиши подробное руководство по безопасности домашнего сервера",
    "исследуй все доступные дашборды для самохостинга и сравни их",
    "найди детальные бенчмарки ARM одноплатных компьютеров для домашней лаборатории",
    "исследуй лучший стек мониторинга для самохостинга в 2024 году",
    "исследуй и сравни WireGuard, OpenVPN и Tailscale для домашней сети",
    "исследуй лучшие практики сегментации домашней сети с VLAN",
    "изучи все самохостируемые DNS решения и их возможности",
    "исследуй и сравни все платформы умного дома: Home Assistant и другие",
    "изучи лучшие Zigbee координаторы и их совместимость с Home Assistant",
    "напиши детальный отчет о поддержке протокола Matter и совместимости устройств",
    "исследуй все способы интеграции умных ламп с Home Assistant",
    "найди и сравни все варианты датчиков движения для умного дома",
    "исследуй и сравни все самохостируемые решения для хранения фотографий",
    "изучи лучшие самохостируемые медиасерверы: Jellyfin, Plex и Emby",
    "исследуй последние достижения в локальном LLM инференсе и обзор моделей",
    "изучи лучшие опенсорс альтернативы Google сервисов для приватности",
    "найди и кратко опиши все крупные самохостируемые менеджеры паролей",
    "напиши детальный анализ текущего состояния AI ассистентов для самохостинга",
    "исследуй и сравни все инструменты оркестрации контейнеров для домашней лаборатории",
    "изучи лучшие подходы к автоматическому резервному копированию в Linux",
    "исследуй и сравни все самохостируемые инструменты личных финансов",
    "изучи свежие CVE и уязвимости в популярном самохостируемом ПО",
    "напиши подробное руководство по настройке автоматизаций в Home Assistant",
    "исследуй все варианты голосового управления умным домом на русском языке",
    "сравни все системы резервного копирования для Linux: Restic, BorgBackup и другие",
    "исследуй лучшие самохостируемые системы мониторинга сети: Zabbix, Grafana",
    "изучи все варианты локального запуска языковых моделей на видеокарте",
    "напиши подробный отчет о технологиях синтеза речи с открытым исходным кодом",
    "исследуй все способы интеграции умных розеток с мониторингом потребления",
    "напиши полное руководство по настройке обратного прокси Caddy",
    "исследуй лучшие практики написания Docker Compose файлов для продакшена",
    "сравни все самохостируемые облачные хранилища: Nextcloud, Seafile и другие",
    "изучи все доступные локальные ассистенты с голосовым управлением",
    "исследуй все самохостируемые решения для блокировки рекламы: Pi-hole, AdGuard",
    "напиши детальное сравнение систем управления конфигурацией: Ansible, Puppet",
    "исследуй все протоколы умного дома и их плюсы и минусы: Zigbee, Z-Wave, Matter",
    "найди и сравни все фреймворки для создания локальных AI ассистентов",
    "исследуй лучшие решения для автоматического управления медиатекой",
    "изучи все варианты самохостируемых систем учёта расходов с возможностью импорта",
    "напиши сравнение всех вариантов самохостинга для хранения и синхронизации файлов",
    "исследуй все открытые протоколы для умного дома и их экосистемы",
    "изучи лучшие инструменты для автоматизации домашней инфраструктуры",
]

COMPLEX = /think prefix only:
  /think compare frameworks / /think plan a trip

Message: {message}
Output (one word only — light, medium, or complex):"""


# Medium: queries that require tools, actions, or real-time data (not static knowledge)
_MEDIUM_PATTERNS = re.compile(
    r"(?:"
    # Russian smart home commands — always need HA integration
    r"(?:включи|выключи|открой|закрой|установи|поставь|убавь|прибавь|переключи)\s"
    r"|(?:какая|какой|какое|каково)\s+(?:температура|влажность|потребление|состояние|статус)\s"
    r"|(?:сколько|есть ли)\s.*(?:датчик|устройств|замк)"
    # Russian memory queries
    r"|как меня зовут|где я живу|что мы обсуждали|что я говорил|что я просил"
    r"|напомни\b|что ты знаешь обо мне"
    # Russian current info
    r"|курс (?:доллара|биткоина|евро|рубл)"
    r"|(?:последние |свежие )?новости\b"
    r"|(?:погода|температура)\s+(?:на завтра|на неделю)"
    r")",
    re.IGNORECASE,
)
LIGHT_REPLY_PROMPT = """You are a helpful Telegram assistant. Answer briefly and naturally (1-3 sentences). Be friendly."""

_EMBED_MODEL = "ollama/nomic-embed-text"


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def _centroid(embeddings: list[list[float]]) -> list[float]:
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(embeddings[i][d] for i in range(n)) / n for d in range(dim)]
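Together these two helpers implement a nearest-centroid classifier: each tier's utterances are averaged into one centroid, and a query goes to the tier whose centroid it is most cosine-similar to. A toy 2-D sketch of the idea, with standalone copies of the helpers and made-up vectors (the real ones are 768-d nomic-embed-text embeddings):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def centroid(vecs):
    n = len(vecs)
    return [sum(v[d] for v in vecs) / n for d in range(len(vecs[0]))]

# Two tiers, two example "embeddings" each; the query sits near the
# medium cluster, so the medium centroid wins.
light_c = centroid([[1.0, 0.0], [0.9, 0.1]])
medium_c = centroid([[0.0, 1.0], [0.1, 0.9]])
query = [0.2, 0.8]
scores = {"light": cosine(query, light_c), "medium": cosine(query, medium_c)}
assert max(scores, key=scores.get) == "medium"
```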
def _format_history(history: list[dict]) -> str:
    if not history:
@@ -55,63 +349,93 @@ def _format_history(history: list[dict]) -> str:
    return "\n".join(lines)
def _parse_tier(text: str) -> str:
    """Extract tier from raw model output. Default to medium."""
    t = text.strip().lower()
    snippet = t[:60]
    if "complex" in snippet:
        return "complex"
    if "medium" in snippet:
        return "medium"
    if "light" in snippet:
        return "light"
    # Model invented a descriptive category (e.g. "simplefact", "trivial", "basic") →
    # treat as light since it recognised the question doesn't need tools
    if any(w in snippet for w in ("simple", "fact", "trivial", "basic", "easy", "general")):
        return "light"
    return "medium"  # safe default
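The fallback order matters here: explicit tier names win, invented descriptive categories degrade to light, and anything unrecognised lands on the medium default. A condensed standalone re-implementation, just to illustrate that ordering:

```python
def parse_tier(text: str) -> str:
    # Same logic as _parse_tier above, compacted for illustration.
    t = text.strip().lower()[:60]
    for tier in ("complex", "medium", "light"):
        if tier in t:
            return tier
    if any(w in t for w in ("simple", "fact", "trivial", "basic", "easy", "general")):
        return "light"
    return "medium"

assert parse_tier("  Light\n") == "light"                        # explicit tier name
assert parse_tier("this is a simple factual question") == "light"  # invented category
assert parse_tier("I cannot classify this") == "medium"            # safe default
```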
class Router:
    def __init__(self, model):
        self.model = model
    def __init__(
        self,
        model,
        embedder: AsyncOpenAI,
        fast_tool_runner: FastToolRunner | None = None,
    ):
        self.model = model  # qwen2.5:1.5b — used only for generating light replies
        self._embedder = embedder
        self._fast_tool_runner = fast_tool_runner
        self._light_centroid: list[float] | None = None
        self._medium_centroid: list[float] | None = None
        self._complex_centroid: list[float] | None = None
    async def initialize(self) -> None:
        """Pre-compute utterance embeddings. Call once at startup. Retries until LiteLLM is ready."""
        print("[router] embedding utterances for semantic classifier...", flush=True)
        texts = _LIGHT_UTTERANCES + _MEDIUM_UTTERANCES + _COMPLEX_UTTERANCES
        for attempt in range(10):
            try:
                resp = await self._embedder.embeddings.create(model=_EMBED_MODEL, input=texts)
                embeddings = [item.embedding for item in resp.data]
                n_light = len(_LIGHT_UTTERANCES)
                n_medium = len(_MEDIUM_UTTERANCES)
                self._light_centroid = _centroid(embeddings[:n_light])
                self._medium_centroid = _centroid(embeddings[n_light:n_light + n_medium])
                self._complex_centroid = _centroid(embeddings[n_light + n_medium:])
                print("[router] semantic classifier ready (3-tier)", flush=True)
                return
            except Exception as e:
                print(f"[router] embedding attempt {attempt+1}/10 failed: {e}", flush=True)
                await asyncio.sleep(3)
        print("[router] WARNING: could not initialize semantic classifier — will default to medium", flush=True)
    async def _classify_by_embedding(self, message: str) -> str:
        """Embed message and return 'light', 'medium', or 'complex' based on centroid similarity."""
        if self._light_centroid is None or self._medium_centroid is None or self._complex_centroid is None:
            return "medium"
        try:
            resp = await self._embedder.embeddings.create(model=_EMBED_MODEL, input=[message])
            emb = resp.data[0].embedding
            score_light = _cosine(emb, self._light_centroid)
            score_medium = _cosine(emb, self._medium_centroid)
            score_complex = _cosine(emb, self._complex_centroid)
            tier = max(
                [("light", score_light), ("medium", score_medium), ("complex", score_complex)],
                key=lambda x: x[1],
            )[0]
            print(
                f"[router] semantic: light={score_light:.3f} medium={score_medium:.3f} "
                f"complex={score_complex:.3f} → {tier}",
                flush=True,
            )
            return tier
        except Exception as e:
            print(f"[router] embedding classify error, defaulting to medium: {e}", flush=True)
            return "medium"
    async def route(
        self,
        message: str,
        history: list[dict],
        force_complex: bool = False,
    ) -> tuple[str, Optional[str]]:
        """
        Returns (tier, reply_or_None).
        For light tier: also generates the reply with a second call.
        For light tier: also generates the reply inline.
        For medium/complex: reply is None.
        """
        if force_complex:
            return "complex", None
        if self._fast_tool_runner and self._fast_tool_runner.any_matches(message.strip()):
            names = self._fast_tool_runner.matching_names(message.strip())
            print(f"[router] fast_tool_match={names} → medium", flush=True)
            return "medium", None

        # Step 0: regex pre-classification for obvious light patterns
        if _LIGHT_PATTERNS.match(message.strip()):
            print(f"[router] regex→light", flush=True)
            print("[router] regex→light", flush=True)
            return await self._generate_light_reply(message, history)

        # Step 1: LLM classification with raw text output
        try:
            classify_response = await self.model.ainvoke([
                HumanMessage(content=CLASSIFY_PROMPT.format(message=message)),
            ])
            raw = classify_response.content or ""
            raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
            tier = _parse_tier(raw)
        if _COMPLEX_PATTERNS.search(message.strip()):
            print("[router] regex→complex", flush=True)
            return "complex", None

            if tier == "complex" and not message.startswith("/think"):
                tier = "medium"

            print(f"[router] raw={raw[:30]!r} → tier={tier}", flush=True)
        except Exception as e:
            print(f"[router] classify error, defaulting to medium: {e}", flush=True)
        if _MEDIUM_PATTERNS.search(message.strip()):
            print("[router] regex→medium", flush=True)
            return "medium", None

        tier = await self._classify_by_embedding(message)

        if tier != "light":
            return tier, None

@@ -120,7 +444,7 @@ class Router:
    async def _generate_light_reply(
        self, message: str, history: list[dict]
    ) -> tuple[str, Optional[str]]:
        """Generate a short reply using the router model for light-tier messages."""
        """Generate a short reply using qwen2.5:1.5b for light-tier messages."""
        history_text = _format_history(history)
        context = f"\nConversation history:\n{history_text}" if history else ""
        try:
1172  test_pipeline.py  (file diff suppressed because it is too large)
   0  tests/__init__.py  (new file)
   0  tests/integration/__init__.py  (new file)
 273  tests/integration/common.py  (new file)
@@ -0,0 +1,273 @@
"""
Shared config, helpers, and utilities for Adolf integration tests.
"""

import http.client
import json
import re
import subprocess
import time
import urllib.request

# ── config ────────────────────────────────────────────────────────────────────
DEEPAGENTS = "http://localhost:8000"
BIFROST = "http://localhost:8080"
OPENMEMORY = "http://localhost:8765"
GRAMMY_HOST = "localhost"
GRAMMY_PORT = 3001
OLLAMA_GPU = "http://localhost:11436"
OLLAMA_CPU = "http://localhost:11435"
QDRANT = "http://localhost:6333"
SEARXNG = "http://localhost:11437"
COMPOSE_FILE = "/home/alvis/adolf/docker-compose.yml"
DEFAULT_CHAT_ID = "346967270"

NAMES = [
    "Maximilian", "Cornelius", "Zephyr", "Archibald", "Balthazar",
    "Ignatius", "Lysander", "Octavian", "Reginald", "Sylvester",
]

BENCHMARK = {
    "easy": [
        "hi",
        "what is 2+2?",
        "what is the capital of France?",
        "tell me a short joke",
        "how are you doing today?",
        "thanks!",
        "what day comes after Wednesday?",
        "name the three primary colors",
        "is the sky blue?",
        "what does CPU stand for?",
    ],
    "medium": [
        "what is the current weather in Berlin?",
        "find the latest news about artificial intelligence",
        "what is the current price of Bitcoin?",
        "search for a good pasta carbonara recipe",
        "what movies are in theaters this week?",
        "find Python tutorials for beginners",
        "who won the last FIFA World Cup?",
        "do you remember what we talked about before?",
        "search for the best coffee shops in Tokyo",
        "what is happening in the tech industry this week?",
        "what's the weather like today?",
    ],
    "hard": [
        "/think compare the top 3 Python web frameworks (Django, FastAPI, Flask) and recommend one for a production REST API",
        "/think research the history of artificial intelligence and create a timeline of key milestones",
        "/think plan a 7-day trip to Japan with daily itinerary, accommodation suggestions, and budget breakdown",
        "/think analyze microservices vs monolithic architecture: pros, cons, and when to choose each",
        "/think write a Python script that reads a CSV file, cleans the data, and generates summary statistics",
        "/think research quantum computing: explain the key concepts and how it differs from classical computing",
        "/think compare PostgreSQL, MongoDB, and Redis — when to use each and what are the trade-offs?",
        "/think create a comprehensive Docker deployment guide covering best practices for production",
        "/think research climate change: summarize the latest IPCC findings and key data points",
        "/think design a REST API with authentication, rate limiting, and proper error handling — provide architecture and code outline",
    ],
}
# ── terminal colours ──────────────────────────────────────────────────────────
PASS = "\033[32mPASS\033[0m"
FAIL = "\033[31mFAIL\033[0m"
INFO = "\033[36mINFO\033[0m"
WARN = "\033[33mWARN\033[0m"


# ── result helpers ────────────────────────────────────────────────────────────

def report(results: list, name: str, ok: bool, detail: str = ""):
    tag = PASS if ok else FAIL
    print(f"  [{tag}] {name}" + (f" — {detail}" if detail else ""))
    results.append((name, ok))


def print_summary(results: list):
    print(f"\n{'─'*55}")
    total = len(results)
    passed = sum(1 for _, ok in results if ok)
    failed = total - passed
    print(f"Results: {passed}/{total} passed", end="")
    if failed:
        print(f" ({failed} failed)\n")
        print("Failed checks:")
        for name, ok in results:
            if not ok:
                print(f"  - {name}")
    else:
        print(" — all good")
    print()


def tf(v):
    """Format timing value."""
    return f"{v:6.2f}s" if v is not None else "   n/a"
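The report/print_summary pair accumulates `(name, ok)` tuples and prints each check as it runs. A minimal standalone run of the same pattern, with the ANSI colour escapes replaced by plain tags so it renders anywhere:

```python
# Stand-alone copy of the report() pattern above, colours stripped.
def report(results, name, ok, detail=""):
    tag = "PASS" if ok else "FAIL"
    print(f"  [{tag}] {name}" + (f" — {detail}" if detail else ""))
    results.append((name, ok))

results = []
report(results, "service reachable", True, "HTTP 200")
report(results, "latency < 5s", False, "7.31s")

passed = sum(1 for _, ok in results if ok)
assert (passed, len(results)) == (1, 2)
```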
# ── HTTP helpers ──────────────────────────────────────────────────────────────

def get(url, timeout=5):
    with urllib.request.urlopen(urllib.request.Request(url), timeout=timeout) as r:
        return r.status, r.read().decode()


def post_json(url, payload, timeout=10):
    data = json.dumps(payload).encode()
    req = urllib.request.Request(
        url, data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as r:
        return r.status, json.loads(r.read().decode())


def check_sse(host, port, path):
    try:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path, headers={"Accept": "text/event-stream"})
        r = conn.getresponse()
        conn.close()
        return r.status == 200, f"HTTP {r.status}"
    except Exception as e:
        return False, str(e)


def qdrant_count():
    try:
        _, body = get(f"{QDRANT}/collections/adolf_memories")
        return json.loads(body).get("result", {}).get("points_count", 0)
    except Exception:
        return 0


# ── log helpers ───────────────────────────────────────────────────────────────

def fetch_logs(since_s=600):
    """Return deepagents log lines from the last since_s seconds."""
    try:
        r = subprocess.run(
            ["docker", "compose", "-f", COMPOSE_FILE, "logs", "deepagents",
             f"--since={int(since_s)}s", "--no-log-prefix"],
            capture_output=True, text=True, timeout=15,
        )
        return r.stdout.splitlines()
    except Exception:
        return []


def fetch_bifrost_logs(since_s=120):
    """Return bifrost container log lines from the last since_s seconds."""
    try:
        r = subprocess.run(
            ["docker", "compose", "-f", COMPOSE_FILE, "logs", "bifrost",
             f"--since={int(since_s)}s", "--no-log-prefix"],
            capture_output=True, text=True, timeout=10,
        )
        return r.stdout.splitlines()
    except Exception:
        return []
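`qdrant_count()` above defaults to 0 on any failure and otherwise chains `.get()` calls through the Qdrant collection-info response. The JSON path it relies on, exercised against a hand-written sample payload (not live Qdrant output):

```python
import json

# Approximate shape of GET /collections/<name> that qdrant_count() unpacks.
body = '{"result": {"points_count": 42, "status": "green"}}'
count = json.loads(body).get("result", {}).get("points_count", 0)
assert count == 42

# Chained .get() with dict defaults means a malformed body degrades to 0
# instead of raising KeyError.
assert json.loads("{}").get("result", {}).get("points_count", 0) == 0
```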
def parse_run_block(lines, msg_prefix):
    """
    Scan log lines for the LAST '[agent] running: <msg_prefix>' block.
    Extracts reply timing, tier, and memory timing from that block.

    Returns dict or None if the reply has not appeared in logs yet.
    Dict keys:
      reply_total, llm, send, tier, reply_text — from "[agent] replied in ..."
      memory_s — from "[memory] stored in ..."
      memory_error — True if "[memory] error" found
    """
    search = msg_prefix[:50]
    start_idx = None
    for i, line in enumerate(lines):
        if "[agent] running:" in line and search in line:
            start_idx = i  # keep updating — we want the LAST occurrence

    if start_idx is None:
        return None

    block = lines[start_idx:]
    last_ai_text = None
    reply_data = None

    for j, line in enumerate(block):
        if "AIMessage:" in line and "→" not in line:
            txt = line.split("AIMessage:", 1)[-1].strip()
            if txt:
                last_ai_text = txt

        m = re.search(r"replied in ([\d.]+)s \(llm=([\d.]+)s, send=([\d.]+)s\)", line)
        if m:
            tier_m = re.search(r"\btier=(\w+)", line)
            tier = tier_m.group(1) if tier_m else "unknown"
            reply_data = {
                "reply_total": float(m.group(1)),
                "llm": float(m.group(2)),
                "send": float(m.group(3)),
                "tier": tier,
                "reply_text": last_ai_text,
                "memory_s": None,
                "memory_error": False,
                "_j": j,
            }
            break

    if reply_data is not None:
        next_lines = block[reply_data["_j"] + 1: reply_data["_j"] + 3]
        for line in next_lines:
            if line.startswith("[agent] reply_text:"):
                reply_data["reply_text"] = line[len("[agent] reply_text:"):].strip()
                break

    if reply_data is None:
        return None

    for line in block[reply_data["_j"] + 1:]:
        mm = re.search(r"\[memory\] stored in ([\d.]+)s", line)
        if mm:
            reply_data["memory_s"] = float(mm.group(1))
            break
        if "[memory] error" in line:
            reply_data["memory_error"] = True
            break

    return reply_data
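`parse_run_block` keys on the `[agent] replied in ...` log line. A standalone check of the same two regexes against a hypothetical log line — the exact field layout here is an assumption inferred from the regexes, not a captured log sample:

```python
import re

# Hypothetical log line in the shape parse_run_block() expects.
line = "[agent] replied in 4.21s (llm=3.90s, send=0.31s) tier=medium"

m = re.search(r"replied in ([\d.]+)s \(llm=([\d.]+)s, send=([\d.]+)s\)", line)
tier = re.search(r"\btier=(\w+)", line).group(1)

assert (float(m.group(1)), float(m.group(2)), float(m.group(3))) == (4.21, 3.90, 0.31)
assert tier == "medium"
```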
def wait_for(label, msg_prefix, timeout_s=200, need_memory=True):
    """
    Poll deepagents logs until the message is fully processed.
    Shows a live progress line. Returns timing dict or None on timeout.
    """
    t_start = time.monotonic()
    deadline = t_start + timeout_s
    tick = 0
    last_result = None

    while time.monotonic() < deadline:
        since = int(time.monotonic() - t_start) + 90
        lines = fetch_logs(since_s=since)
        result = parse_run_block(lines, msg_prefix)

        if result:
            last_result = result
            has_mem = result["memory_s"] is not None or result["memory_error"]
            if (not need_memory) or has_mem:
                elapsed = time.monotonic() - t_start
                print(f"\r  [{label}] done after {elapsed:.0f}s{' ' * 30}")
                return result

        time.sleep(4)
        tick += 1
        rem = int(deadline - time.monotonic())
        if last_result:
            phase = "waiting for memory..." if need_memory else "done"
        else:
            phase = "waiting for LLM reply..."
        print(f"\r  [{label}] {tick*4}s elapsed, {rem}s left — {phase}   ", end="", flush=True)

    print(f"\r  [{label}] TIMEOUT after {timeout_s}s{' ' * 30}")
    return None
 214  tests/integration/test_health.py  (new file)
@@ -0,0 +1,214 @@
#!/usr/bin/env python3
|
||||
"""
|
||||
Adolf service health integration tests.
|
||||
|
||||
Checks:
|
||||
1. deepagents /health — agent_ready
|
||||
1b. openmemory /sse reachable
|
||||
1c. grammy /sse reachable
|
||||
2. Bifrost /health, /v1/models, direct inference, deepagents startup log
|
||||
3. GPU Ollama — reachable, qwen3:8b present
|
||||
4. CPU Ollama — reachable, nomic-embed-text present
|
||||
5. Qdrant — reachable, adolf_memories collection, vector dims=768
|
||||
6. SearXNG — reachable, JSON results, latency < 5s
|
||||
|
||||
Usage:
|
||||
python3 test_health.py
|
||||
"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
import time
|
||||
import urllib.request
|
||||
|
||||
from common import (
|
||||
DEEPAGENTS, BIFROST, GRAMMY_HOST, GRAMMY_PORT,
|
||||
OLLAMA_GPU, OLLAMA_CPU, QDRANT, SEARXNG, COMPOSE_FILE,
|
||||
INFO, FAIL,
|
||||
report, print_summary, tf,
|
||||
get, post_json, check_sse, fetch_logs,
|
||||
)
|
||||
|
||||
results = []
|
||||
timings = {}
|
||||
|
||||
|
||||
# ── 1. Service health ─────────────────────────────────────────────────────────
|
||||
print(f"\n[{INFO}] 1. Service health")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{DEEPAGENTS}/health")
|
||||
data = json.loads(body)
|
||||
ok = status == 200 and data.get("agent_ready") is True
|
||||
report(results, "deepagents /health — agent_ready", ok,
|
||||
f"agent_ready={data.get('agent_ready')}")
|
||||
except Exception as e:
|
||||
report(results, "deepagents /health", False, str(e))
|
||||
|
||||
ok, detail = check_sse("localhost", 8765, "/sse")
|
||||
report(results, "openmemory /sse reachable", ok, detail)
|
||||
|
||||
ok, detail = check_sse(GRAMMY_HOST, GRAMMY_PORT, "/sse")
|
||||
report(results, "grammy /sse reachable", ok, detail)
|
||||
|
||||
timings["health_check"] = time.monotonic() - t0
|
||||
|
||||
|
||||
# ── 2. Bifrost gateway ────────────────────────────────────────────────────────
|
||||
print(f"\n[{INFO}] 2. Bifrost gateway (port 8080)")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{BIFROST}/health", timeout=5)
|
||||
report(results, "Bifrost /health reachable", status == 200, f"HTTP {status}")
|
||||
except Exception as e:
|
||||
report(results, "Bifrost /health reachable", False, str(e))
|
||||
|
||||
try:
|
||||
status, body = get(f"{BIFROST}/v1/models", timeout=5)
|
||||
data = json.loads(body)
|
||||
model_ids = [m.get("id", "") for m in data.get("data", [])]
|
||||
gpu_models = [m for m in model_ids if m.startswith("ollama/")]
|
||||
report(results, "Bifrost lists ollama GPU models", len(gpu_models) > 0,
|
||||
f"found: {gpu_models}")
|
||||
for expected in ["ollama/qwen3:4b", "ollama/qwen3:8b", "ollama/qwen2.5:1.5b"]:
|
||||
report(results, f" model {expected} listed", expected in model_ids)
|
||||
except Exception as e:
|
||||
report(results, "Bifrost /v1/models", False, str(e))
|
||||
|
||||
print(f" [bifrost-infer] POST /v1/chat/completions → ollama/qwen2.5:0.5b ...")
|
||||
t_infer = time.monotonic()
|
||||
try:
|
||||
infer_payload = {
|
||||
"model": "ollama/qwen2.5:0.5b",
|
||||
"messages": [{"role": "user", "content": "Reply with exactly one word: pong"}],
|
||||
"max_tokens": 16,
|
||||
}
|
||||
data = json.dumps(infer_payload).encode()
|
||||
req = urllib.request.Request(
|
||||
f"{BIFROST}/v1/chat/completions",
|
||||
data=data,
|
||||
headers={"Content-Type": "application/json"},
|
||||
method="POST",
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=60) as r:
|
||||
infer_status = r.status
|
||||
infer_body = json.loads(r.read().decode())
|
||||
infer_elapsed = time.monotonic() - t_infer
|
||||
reply_content = infer_body.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
used_model = infer_body.get("model", "")
|
||||
report(results, "Bifrost → Ollama GPU inference succeeds",
|
||||
infer_status == 200 and bool(reply_content),
|
||||
f"{infer_elapsed:.1f}s model={used_model!r} reply={reply_content[:60]!r}")
|
||||
timings["bifrost_direct_infer"] = infer_elapsed
|
||||
except Exception as e:
|
||||
report(results, "Bifrost → Ollama GPU inference succeeds", False, str(e))
|
||||
timings["bifrost_direct_infer"] = None
|
||||
|
||||
try:
|
||||
import subprocess
|
||||
r = subprocess.run(
|
||||
["docker", "compose", "-f", COMPOSE_FILE, "logs", "deepagents",
|
||||
"--since=3600s", "--no-log-prefix"],
|
||||
capture_output=True, text=True, timeout=10,
|
||||
)
|
||||
log_lines = r.stdout.splitlines()
|
||||
bifrost_line = next(
|
||||
(l for l in log_lines if "[agent] bifrost=" in l and "bifrost:8080" in l),
|
||||
None,
|
||||
)
|
||||
report(results, "deepagents startup log confirms bifrost URL",
|
||||
bifrost_line is not None,
|
||||
bifrost_line.strip() if bifrost_line else "line not found in logs")
|
||||
if bifrost_line:
|
||||
has_prefix = "router=ollama/" in bifrost_line and "medium=ollama/" in bifrost_line
|
||||
report(results, "deepagents model names use ollama/ prefix", has_prefix,
|
||||
bifrost_line.strip())
|
||||
except Exception as e:
|
||||
report(results, "deepagents startup log check", False, str(e))
|
||||
|
||||
timings["bifrost_check"] = time.monotonic() - t0
|
||||
|
||||
|
||||
# ── 3. GPU Ollama ─────────────────────────────────────────────────────────────
|
||||
print(f"\n[{INFO}] 3. GPU Ollama (port 11436)")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{OLLAMA_GPU}/api/tags")
|
||||
models = [m["name"] for m in json.loads(body).get("models", [])]
|
||||
has_qwen = any("qwen3" in m for m in models)
|
||||
report(results, "GPU Ollama reachable", True, f"models: {models}")
|
||||
    report(results, "qwen3:8b present", has_qwen)
except Exception as e:
    report(results, "GPU Ollama reachable", False, str(e))
    report(results, "qwen3:8b present", False, "skipped")

timings["gpu_ollama_ping"] = time.monotonic() - t0


# ── 4. CPU Ollama ─────────────────────────────────────────────────────────────
print(f"\n[{INFO}] 4. CPU Ollama (port 11435)")
t0 = time.monotonic()

try:
    status, body = get(f"{OLLAMA_CPU}/api/tags")
    models = [m["name"] for m in json.loads(body).get("models", [])]
    has_embed = any("nomic-embed-text" in m for m in models)
    report(results, "CPU Ollama reachable", True, f"models: {models}")
    report(results, "nomic-embed-text present", has_embed)
except Exception as e:
    report(results, "CPU Ollama reachable", False, str(e))
    report(results, "nomic-embed-text present", False, "skipped")

timings["cpu_ollama_ping"] = time.monotonic() - t0


# ── 5. Qdrant ─────────────────────────────────────────────────────────────────
print(f"\n[{INFO}] 5. Qdrant (port 6333)")
t0 = time.monotonic()

try:
    status, body = get(f"{QDRANT}/collections")
    cols = [c["name"] for c in json.loads(body).get("result", {}).get("collections", [])]
    report(results, "Qdrant reachable", True, f"collections: {cols}")
    report(results, "adolf_memories collection exists", "adolf_memories" in cols)
except Exception as e:
    report(results, "Qdrant reachable", False, str(e))
    report(results, "adolf_memories collection exists", False, "skipped")

try:
    status, body = get(f"{QDRANT}/collections/adolf_memories")
    info = json.loads(body).get("result", {})
    dims = info.get("config", {}).get("params", {}).get("vectors", {}).get("size")
    report(results, "vector dims = 768", dims == 768, f"got {dims}")
except Exception as e:
    report(results, "adolf_memories collection info", False, str(e))

timings["qdrant_ping"] = time.monotonic() - t0


# ── 6. SearXNG ────────────────────────────────────────────────────────────────
print(f"\n[{INFO}] 6. SearXNG (port 11437)")
t0 = time.monotonic()

try:
    status, body = get(f"{SEARXNG}/search?q=test&format=json", timeout=15)
    elapsed = time.monotonic() - t0
    n = len(json.loads(body).get("results", []))
    report(results, "SearXNG reachable + JSON results", status == 200 and n > 0,
           f"{n} results in {elapsed:.1f}s")
    report(results, "SearXNG response < 5s", elapsed < 5, f"{elapsed:.2f}s")
    timings["searxng_latency"] = elapsed
except Exception as e:
    report(results, "SearXNG reachable", False, str(e))
    report(results, "SearXNG response < 5s", False, "skipped")
    timings["searxng_latency"] = None

timings["searxng_check"] = time.monotonic() - t0


# ── summary ───────────────────────────────────────────────────────────────────
print_summary(results)
sys.exit(0 if all(ok for _, ok in results) else 1)
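The checks above lean on a handful of helpers imported from `common` (`report`, `print_summary`, `get`, ...). Their implementation is not part of this diff; the following is a hypothetical minimal sketch of the `report` / `print_summary` pair, reconstructed only from how the scripts call them here:

```python
# Hypothetical sketch of common.report / common.print_summary — the real
# common.py is not shown in this diff; names and behavior are inferred
# from the call sites above.
PASS, FAIL = "PASS", "FAIL"

def report(results, name, ok, detail=""):
    # Record a (name, ok) pair and print a one-line verdict immediately.
    results.append((name, ok))
    tag = PASS if ok else FAIL
    suffix = f" — {detail}" if detail else ""
    print(f"  [{tag}] {name}{suffix}")

def print_summary(results):
    # Final tally; callers exit non-zero when any check failed.
    passed = sum(1 for _, ok in results if ok)
    print(f"\n{passed}/{len(results)} checks passed")
```

This matches the exit convention used throughout: `sys.exit(0 if all(ok for _, ok in results) else 1)`.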
|
||||
438  tests/integration/test_memory.py  Normal file
@@ -0,0 +1,438 @@
#!/usr/bin/env python3
"""
Adolf memory integration tests.

Tests:
  1. Name store     — POST "remember that your name is <RandomName>"
  2. Qdrant point   — verifies a new vector was written after store
  3. Name recall    — POST "what is your name?" → reply must contain <RandomName>
  4. Bifrost        — verifies store/recall requests passed through Bifrost
  5. Timing profile — breakdown of store and recall latencies
  6. Memory benchmark — store 5 personal facts, recall with 10 questions
  7. Dedup test     — same fact stored twice must not grow Qdrant by 2 points

Usage:
    python3 test_memory.py [--chat-id CHAT_ID] [--name-only] [--bench-only] [--dedup-only]
"""

import argparse
import json
import random
import subprocess
import sys
import time
import urllib.request

from common import (
    DEEPAGENTS, QDRANT, COMPOSE_FILE, DEFAULT_CHAT_ID,
    NAMES,
    INFO, PASS, FAIL, WARN,
    report, print_summary, tf,
    get, post_json, qdrant_count, fetch_logs, fetch_bifrost_logs,
    parse_run_block, wait_for,
)

# ── args ──────────────────────────────────────────────────────────────────────
parser = argparse.ArgumentParser(description="Adolf memory integration tests")
parser.add_argument("--chat-id", default=DEFAULT_CHAT_ID)
parser.add_argument("--name-only", action="store_true", help="Run only the name store/recall test")
parser.add_argument("--bench-only", action="store_true", help="Run only the memory benchmark")
parser.add_argument("--dedup-only", action="store_true", help="Run only the deduplication test")
args = parser.parse_args()

CHAT_ID = args.chat_id
_only = args.name_only or args.bench_only or args.dedup_only
_run_name = not _only or args.name_only
_run_bench = not _only or args.bench_only
_run_dedup = not _only or args.dedup_only

results = []
timings = {}

random_name = random.choice(NAMES)
TEST_CHAT_ID = f"{CHAT_ID}-{random_name.lower()}"

if _run_name:
    print(f"\n  Test name : \033[1m{random_name}\033[0m")
    print(f"  Chat ID   : {TEST_CHAT_ID}")


# ── 1–4. Name store / recall pipeline ────────────────────────────────────────
if _run_name:
    print(f"\n[{INFO}] 1. Name store / recall pipeline")

    store_msg = f"remember that your name is {random_name}"
    recall_msg = "what is your name?"

    # Clear memories so each run starts clean
    try:
        post_json(f"{QDRANT}/collections/adolf_memories/points/delete",
                  {"filter": {}}, timeout=5)
    except Exception:
        pass

    pts_before = qdrant_count()
    print(f"  Qdrant points before: {pts_before}")

    # ── 1. Store ──────────────────────────────────────────────────────────
    print(f"\n  [store] '{store_msg}'")
    t_store = time.monotonic()

    try:
        status, _ = post_json(f"{DEEPAGENTS}/chat",
                              {"message": store_msg, "chat_id": TEST_CHAT_ID}, timeout=5)
        t_accept = time.monotonic() - t_store
        report(results, "POST /chat (store) returns 202 immediately",
               status == 202 and t_accept < 1, f"status={status}, t={t_accept:.3f}s")
        timings["store_http_accept"] = t_accept
    except Exception as e:
        report(results, "POST /chat (store)", False, str(e))
        print_summary(results)
        sys.exit(1)

    store = wait_for("store", store_msg, timeout_s=220, need_memory=True)

    if store:
        timings.update({
            "store_llm": store["llm"],
            "store_send": store["send"],
            "store_reply": store["reply_total"],
            "store_memory": store["memory_s"],
        })
        report(results, "Agent replied to store message", True,
               f"{store['reply_total']:.1f}s total llm={store['llm']:.1f}s "
               f"send={store['send']:.1f}s tier={store['tier']}")
        if store["memory_s"] is not None:
            report(results, "Memory stored without error", True, f"{store['memory_s']:.1f}s")
        elif store["memory_error"]:
            report(results, "Memory stored without error", False, "error in [memory] log")
        else:
            report(results, "Memory stored without error", False, "not found in logs")
        print(f"  Store reply: {store['reply_text']!r}")
    else:
        report(results, "Agent replied to store message", False, "timeout")
        report(results, "Memory stored without error", False, "timeout")
        print_summary(results)
        sys.exit(1)

    # ── 2. Qdrant point check ─────────────────────────────────────────────
    pts_after = qdrant_count()
    new_pts = pts_after - pts_before
    report(results, "New memory point(s) added to Qdrant", new_pts > 0,
           f"{pts_before} → {pts_after} (+{new_pts})")
    timings["qdrant_new_points"] = new_pts

    # ── 3. Recall ─────────────────────────────────────────────────────────
    print(f"\n  [recall] '{recall_msg}'")
    t_recall = time.monotonic()

    try:
        status, _ = post_json(f"{DEEPAGENTS}/chat",
                              {"message": recall_msg, "chat_id": TEST_CHAT_ID}, timeout=5)
        t_accept2 = time.monotonic() - t_recall
        report(results, "POST /chat (recall) returns 202 immediately",
               status == 202 and t_accept2 < 1, f"status={status}, t={t_accept2:.3f}s")
        timings["recall_http_accept"] = t_accept2
    except Exception as e:
        report(results, "POST /chat (recall)", False, str(e))

    recall = wait_for("recall", recall_msg, timeout_s=160, need_memory=False)

    if recall:
        timings.update({
            "recall_llm": recall["llm"],
            "recall_send": recall["send"],
            "recall_reply": recall["reply_total"],
        })
        report(results, "Agent replied to recall message", True,
               f"{recall['reply_total']:.1f}s total llm={recall['llm']:.1f}s "
               f"send={recall['send']:.1f}s tier={recall['tier']}")
        reply_text = recall["reply_text"] or ""
        name_in_reply = random_name.lower() in reply_text.lower()
        report(results, f"Reply contains '{random_name}'", name_in_reply,
               f"reply: {reply_text[:120]!r}")
    else:
        report(results, "Agent replied to recall message", False, "timeout")
        report(results, f"Reply contains '{random_name}'", False, "no reply")

    # ── 4. Bifrost pass-through check ─────────────────────────────────────
    bifrost_lines = fetch_bifrost_logs(since_s=300)
    report(results, "Bifrost container has log output (requests forwarded)",
           len(bifrost_lines) > 0, f"{len(bifrost_lines)} lines in bifrost logs")
    bifrost_raw = "\n".join(bifrost_lines)
    report(results, "Bifrost log shows AsyncOpenAI agent requests",
           "AsyncOpenAI" in bifrost_raw,
           f"{'found' if 'AsyncOpenAI' in bifrost_raw else 'NOT found'} in bifrost logs")

    # ── 5. Timing profile ─────────────────────────────────────────────────
    print(f"\n[{INFO}] 5. Timing profile")
    W = 36
    print(f"\n  {'Stage':<{W}} {'Time':>8}")
    print(f"  {'─'*W} {'─'*8}")

    for label, key in [
        ("[GPU] HTTP accept — store turn", "store_http_accept"),
        ("[GPU] qwen3:Xb inference — store turn", "store_llm"),
        ("[GPU] Telegram send — store turn", "store_send"),
        ("[GPU] Total reply latency — store", "store_reply"),
        ("[GPU] qwen2.5:1.5b+embed — async mem", "store_memory"),
    ]:
        print(f"  {label:<{W}} {tf(timings.get(key)):>8}")

    print(f"  {'─'*W} {'─'*8}")

    for label, key in [
        ("[GPU] HTTP accept — recall turn", "recall_http_accept"),
        ("[GPU] qwen3:Xb inference — recall", "recall_llm"),
        ("[GPU] Telegram send — recall turn", "recall_send"),
        ("[GPU] Total reply latency — recall", "recall_reply"),
    ]:
        print(f"  {label:<{W}} {tf(timings.get(key)):>8}")

    print(f"\n  Bottleneck analysis (each █ ≈ 5s):")
    print(f"  {'─'*(W+12)}")
    candidates = [
        ("[GPU] qwen3:Xb — store reply ", timings.get("store_llm") or 0),
        ("[GPU] qwen3:Xb — recall reply", timings.get("recall_llm") or 0),
        ("[GPU] qwen2.5:1.5b+embed (async)", timings.get("store_memory") or 0),
    ]
    candidates.sort(key=lambda x: x[1], reverse=True)
    total_pipeline = (timings.get("store_reply") or 0) + (timings.get("store_memory") or 0)
    for label, t in candidates:
        bar = "█" * min(int(t / 5), 24)
        pct = f" {t/total_pipeline*100:4.0f}%" if total_pipeline > 0 else ""
        print(f"  {label} {t:6.1f}s {bar}{pct}")
    print()


# ── 6. Memory benchmark ───────────────────────────────────────────────────────
if _run_bench:
    _mem_name = random.choice(["Alice", "Bruno", "Camille", "Diego", "Elena",
                               "Farid", "Greta", "Hiroshi", "Irina", "Jonas"])
    _mem_city = random.choice(["Tokyo", "Berlin", "Cairo", "Sydney", "Oslo",
                               "Nairobi", "Lisbon", "Seoul", "Montreal", "Bangkok"])
    _mem_allergy = random.choice(["nuts", "gluten", "dairy", "shellfish", "eggs"])
    _mem_job = random.choice([
        ("software engineer", "startup"),
        ("data scientist", "research lab"),
        ("product manager", "tech company"),
        ("DevOps engineer", "cloud provider"),
    ])
    _mem_lang = random.choice(["Python", "Rust", "Go", "TypeScript", "Kotlin"])
    _mem_pet_name = random.choice(["Whiskers", "Biscuit", "Mango", "Pebble", "Shadow",
                                   "Noodle", "Cheddar", "Cosmo", "Pippin", "Ziggy"])

    print(f"\n[{INFO}] 6. Memory benchmark")
    print(f"  name={_mem_name} city={_mem_city} allergy={_mem_allergy} "
          f"job={_mem_job[0]}@{_mem_job[1]} lang={_mem_lang} pet={_mem_pet_name}")
    print(f"  Storing 5 facts, then querying with 10 recall questions")
    print(f"  Chat ID: {CHAT_ID}")
    print()

    # Wipe collection and restart openmemory for a clean slate
    try:
        req = urllib.request.Request(f"{QDRANT}/collections/adolf_memories", method="DELETE")
        with urllib.request.urlopen(req, timeout=5):
            pass
        print(f"  [{INFO}] Wiped adolf_memories collection")
    except Exception as e:
        print(f"  [{WARN}] Could not wipe collection: {e}")

    try:
        subprocess.run(
            ["docker", "compose", "-f", COMPOSE_FILE, "restart", "openmemory"],
            capture_output=True, timeout=30,
        )
        time.sleep(6)
        print(f"  [{INFO}] Restarted openmemory — fresh collection ready")
    except Exception as e:
        print(f"  [{WARN}] Could not restart openmemory: {e}")

    MEMORY_FACTS = [
        f"My name is {_mem_name} and I live in {_mem_city}",
        f"I prefer vegetarian food and I'm allergic to {_mem_allergy}",
        f"I work as a {_mem_job[0]} at a {_mem_job[1]}",
        f"My favorite programming language is {_mem_lang}",
        f"I have a cat named {_mem_pet_name}",
    ]

    MEMORY_RECALLS = [
        ("What is my name?", [_mem_name.lower()]),
        ("Where do I live?", [_mem_city.lower()]),
        ("Do I have any food allergies?", [_mem_allergy.lower()]),
        ("What is my job?", [_mem_job[0].split()[0].lower()]),
        ("What programming language do I prefer?", [_mem_lang.lower()]),
        ("Do I have any pets?", [_mem_pet_name.lower()]),
        ("Am I vegetarian or do I eat meat?", ["vegetarian"]),
        ("What city am I in?", [_mem_city.lower()]),
        ("Tell me what you know about me", [_mem_name.lower(), _mem_city.lower()]),
        ("What's the name of my pet?", [_mem_pet_name.lower()]),
    ]

    STORE_TIMEOUT = 180
    RECALL_TIMEOUT = 180

    print(f"  Storing {len(MEMORY_FACTS)} facts...")
    store_ok = 0
    for i, fact in enumerate(MEMORY_FACTS, 1):
        print(f"  [mem-store-{i:02d}] {fact!r}")
        try:
            status, _ = post_json(f"{DEEPAGENTS}/chat",
                                  {"message": fact, "chat_id": CHAT_ID}, timeout=5)
            if status != 202:
                print(f"    → [{FAIL}] POST returned {status}")
                continue
        except Exception as e:
            print(f"    → [{FAIL}] POST error: {e}")
            continue

        found = wait_for(f"mem-store-{i:02d}", fact, timeout_s=STORE_TIMEOUT, need_memory=True)
        if found:
            store_ok += 1
            print(f"    → [{PASS}] stored tier={found['tier']} mem={found['memory_s']}s")
        else:
            print(f"    → [{FAIL}] timeout")

    report(results, f"All memory facts stored ({store_ok}/{len(MEMORY_FACTS)})",
           store_ok == len(MEMORY_FACTS))

    # Wait for async extraction to settle
    print(f"\n  Waiting for memory extraction to settle (up to 60s)...")
    _prev_count = -1
    _stable_ticks = 0
    _cur_count = 0
    for _ in range(30):
        time.sleep(2)
        try:
            _, body = get(f"{QDRANT}/collections/adolf_memories")
            _cur_count = json.loads(body).get("result", {}).get("points_count", 0)
        except Exception:
            _cur_count = _prev_count
        if _cur_count == _prev_count:
            _stable_ticks += 1
            if _stable_ticks >= 3:
                break
        else:
            _stable_ticks = 0
        _prev_count = _cur_count
    print(f"  Memory settled: {_cur_count} points in Qdrant")

    print(f"\n  Querying with {len(MEMORY_RECALLS)} recall questions...")
    recall_results = []

    for i, (question, keywords) in enumerate(MEMORY_RECALLS, 1):
        print(f"  [mem-recall-{i:02d}] {question!r}")
        try:
            status, _ = post_json(f"{DEEPAGENTS}/chat",
                                  {"message": question, "chat_id": CHAT_ID}, timeout=5)
            if status != 202:
                print(f"    → [{FAIL}] POST returned {status}")
                recall_results.append((question, keywords, None, False))
                continue
        except Exception as e:
            print(f"    → [{FAIL}] POST error: {e}")
            recall_results.append((question, keywords, None, False))
            continue

        t_start = time.monotonic()
        found = None
        while time.monotonic() - t_start < RECALL_TIMEOUT:
            since = int(time.monotonic() - t_start) + 30
            lines = fetch_logs(since_s=since)
            found = parse_run_block(lines, question)
            if found:
                break
            time.sleep(2)

        if not found:
            print(f"    → [{FAIL}] timeout")
            recall_results.append((question, keywords, None, False))
            continue

        reply_text = (found.get("reply_text") or "").lower()
        hit_keywords = [kw for kw in keywords if kw.lower() in reply_text]
        passed = len(hit_keywords) == len(keywords)
        tag_str = PASS if passed else WARN
        missing = [kw for kw in keywords if kw.lower() not in reply_text]
        detail = f"tier={found['tier']} lat={found['reply_total']:.1f}s"
        if missing:
            detail += f" missing keywords: {missing}"
        print(f"    → [{tag_str}] {detail}")
        recall_results.append((question, keywords, found.get("reply_text"), passed))
        time.sleep(1)

    print(f"\n  {'#':<4} {'Pass':<5} {'Question':<45} {'Keywords'}")
    print(f"  {'─'*4} {'─'*5} {'─'*45} {'─'*30}")
    for idx, (q, kws, reply, ok) in enumerate(recall_results, 1):
        ok_str = "✓" if ok else "✗"
        print(f"  {ok_str} {idx:<3} {'yes' if ok else 'no':<5} {q[:45]:<45} {kws}")

    recall_pass = sum(1 for _, _, _, ok in recall_results if ok)
    total_recall = len(recall_results)
    print(f"\n  Memory recall score: {recall_pass}/{total_recall}")
    report(results, f"Memory recall ({recall_pass}/{total_recall} keywords found)",
           recall_pass == total_recall,
           f"{recall_pass}/{total_recall} questions had all expected keywords in reply")


# ── 7. Deduplication test ─────────────────────────────────────────────────────
if _run_dedup:
    print(f"\n[{INFO}] 7. Memory deduplication test")
    print(f"  Sends the same fact twice — Qdrant point count must not increase by 2")
    print(f"  Chat ID: {CHAT_ID}")
    print()

    DEDUP_TIMEOUT = 120
    _dedup_fact = f"My lucky number is {random.randint(1000, 9999)}"
    print(f"  Fact: {_dedup_fact!r}")

    pts_before = qdrant_count()
    print(f"  Qdrant points before: {pts_before}")

    print(f"  [dedup-1] sending fact (first time)")
    found1 = None
    try:
        status, _ = post_json(f"{DEEPAGENTS}/chat",
                              {"message": _dedup_fact, "chat_id": CHAT_ID}, timeout=5)
        if status != 202:
            report(results, "Dedup: first POST accepted", False, f"status={status}")
        else:
            found1 = wait_for("dedup-1", _dedup_fact, timeout_s=DEDUP_TIMEOUT, need_memory=True)
            if found1:
                print(f"  [dedup-1] stored tier={found1['tier']} mem={found1['memory_s']}s")
            else:
                print(f"  [dedup-1] timeout")
    except Exception as e:
        report(results, "Dedup: first POST accepted", False, str(e))

    pts_after_first = qdrant_count()
    new_first = pts_after_first - pts_before
    print(f"  Qdrant after first send: {pts_before} → {pts_after_first} (+{new_first})")

    print(f"  [dedup-2] sending same fact (second time)")
    try:
        status, _ = post_json(f"{DEEPAGENTS}/chat",
                              {"message": _dedup_fact, "chat_id": CHAT_ID}, timeout=5)
        if status != 202:
            report(results, "Dedup: second POST accepted", False, f"status={status}")
        else:
            found2 = wait_for("dedup-2", _dedup_fact, timeout_s=DEDUP_TIMEOUT, need_memory=True)
            if found2:
                print(f"  [dedup-2] stored tier={found2['tier']} mem={found2['memory_s']}s")
            else:
                print(f"  [dedup-2] timeout")
    except Exception as e:
        report(results, "Dedup: second POST accepted", False, str(e))

    pts_after_second = qdrant_count()
    new_second = pts_after_second - pts_after_first
    print(f"  Qdrant after second send: {pts_after_first} → {pts_after_second} (+{new_second})")

    dedup_ok = new_second <= new_first
    report(results, "Deduplication: second identical fact not added to Qdrant", dedup_ok,
           f"first send: +{new_first} pts, second send: +{new_second} pts (want second ≤ first)")


# ── summary ───────────────────────────────────────────────────────────────────
print_summary(results)
sys.exit(0 if all(ok for _, ok in results) else 1)
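Every turn in these tests follows the same pattern: fire the message, then call `wait_for(tag, message, timeout_s, need_memory)` to poll the container logs until a parsed run block appears. `wait_for` lives in `common` and is not shown in this diff; under the assumption that it is a timed loop over `fetch_logs` + `parse_run_block` (both also from `common`), a sketch with those two collaborators injected for testability:

```python
import time

def wait_for(tag, message, timeout_s, need_memory,
             fetch_logs=None, parse_run_block=None):
    # Hypothetical reconstruction of common.wait_for: poll logs until a run
    # block matching `message` shows up, optionally requiring that the async
    # memory write has been logged; return None on timeout.
    t0 = time.monotonic()
    while time.monotonic() - t0 < timeout_s:
        lines = fetch_logs(since_s=int(time.monotonic() - t0) + 30)
        block = parse_run_block(lines, message)
        if block and (not need_memory
                      or block.get("memory_s") is not None
                      or block.get("memory_error")):
            return block
        time.sleep(2)
    return None
```

The `need_memory=True` variant is what lets the store-side tests distinguish "agent replied" from "memory actually persisted".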
317  tests/integration/test_routing.py  Normal file
@@ -0,0 +1,317 @@
#!/usr/bin/env python3
"""
Adolf tier routing benchmark.

Tests:
  easy   — 10 questions that must route to 'light' tier
  medium — 11 questions that must route to 'medium' (light acceptable for some; complex = fail)
  hard   — 10 /think questions that must route to 'complex' (medium fallback acceptable)

Usage:
    python3 test_routing.py [--chat-id CHAT_ID]
                            [--easy-only]    # only easy benchmark
                            [--medium-only]  # only medium benchmark
                            [--hard-only]    # only hard benchmark
"""

import argparse
import sys
import time

from common import (
    DEEPAGENTS, COMPOSE_FILE, DEFAULT_CHAT_ID,
    BENCHMARK,
    INFO, PASS, FAIL, WARN,
    report, print_summary,
    post_json, fetch_logs,
    parse_run_block,
)

# ── args ──────────────────────────────────────────────────────────────────────
parser = argparse.ArgumentParser(description="Adolf routing benchmark")
parser.add_argument("--chat-id", default=DEFAULT_CHAT_ID)
parser.add_argument("--easy-only", action="store_true")
parser.add_argument("--medium-only", action="store_true")
parser.add_argument("--hard-only", action="store_true")
args = parser.parse_args()

CHAT_ID = args.chat_id
_only = args.easy_only or args.medium_only or args.hard_only
_run_easy = not _only or args.easy_only
_run_medium = not _only or args.medium_only
_run_hard = not _only or args.hard_only

results = []


# ── easy benchmark ────────────────────────────────────────────────────────────
if _run_easy:
    print(f"\n[{INFO}] Easy routing benchmark")
    print(f"  {len(BENCHMARK['easy'])} questions — all must route to 'light'")
    print(f"  Chat ID: {CHAT_ID}")
    print()

    bench_results = []
    LIGHT_TIMEOUT = 60

    for i, question in enumerate(BENCHMARK["easy"], 1):
        tag = f"easy-{i:02d}"
        print(f"  [{tag}] {question[:55]!r}")

        try:
            status, _ = post_json(f"{DEEPAGENTS}/chat",
                                  {"message": question, "chat_id": CHAT_ID}, timeout=5)
            if status != 202:
                print(f"    → [{FAIL}] POST returned {status}")
                bench_results.append((question, "?", None, False))
                continue
        except Exception as e:
            print(f"    → [{FAIL}] POST error: {e}")
            bench_results.append((question, "?", None, False))
            continue

        t_start = time.monotonic()
        found = None
        while time.monotonic() - t_start < LIGHT_TIMEOUT:
            since = int(time.monotonic() - t_start) + 30
            lines = fetch_logs(since_s=since)
            found = parse_run_block(lines, question)
            if found:
                break
            time.sleep(1)

        if not found:
            print(f"    → [{FAIL}] no reply within {LIGHT_TIMEOUT}s")
            bench_results.append((question, "timeout", None, False))
            continue

        tier = found.get("tier", "unknown")
        is_light = (tier == "light")
        tag_str = PASS if is_light else FAIL
        print(f"    → [{tag_str}] tier={tier} latency={found['reply_total']:.1f}s llm={found['llm']:.1f}s")
        bench_results.append((question, tier, found["reply_total"], is_light))
        time.sleep(1)

    print(f"\n  {'#':<4} {'Tier':<8} {'Latency':>8} {'Question'}")
    print(f"  {'─'*4} {'─'*8} {'─'*8} {'─'*50}")
    for idx, (q, tier, lat, ok) in enumerate(bench_results, 1):
        lat_str = f"{lat:.1f}s" if lat is not None else "timeout"
        ok_str = "✓" if ok else "✗"
        print(f"  {ok_str} {idx:<3} {tier:<8} {lat_str:>8} {q[:50]!r}")

    light_count = sum(1 for _, _, _, ok in bench_results if ok)
    total_bench = len(bench_results)
    lats = [lat for _, _, lat, ok in bench_results if ok and lat is not None]
    avg_lat = sum(lats) / len(lats) if lats else 0

    print(f"\n  Light-path score: {light_count}/{total_bench}")
    if lats:
        print(f"  Avg latency (light): {avg_lat:.1f}s min={min(lats):.1f}s max={max(lats):.1f}s")

    report(results, f"All easy questions routed to light ({light_count}/{total_bench})",
           light_count == total_bench,
           f"{light_count}/{total_bench} via light path, avg {avg_lat:.1f}s")


# ── medium benchmark ──────────────────────────────────────────────────────────
if _run_medium:
    print(f"\n[{INFO}] Medium routing benchmark")
    print(f"  {len(BENCHMARK['medium'])} questions — must route to medium (light ok for some; complex = fail)")
    print(f"  Chat ID: {CHAT_ID}")
    print()

    LIGHT_ACCEPTABLE = {
        "who won the last FIFA World Cup?",
        "search for a good pasta carbonara recipe",
        "find Python tutorials for beginners",
        "search for the best coffee shops in Tokyo",
    }

    med_results = []
    MEDIUM_TIMEOUT = 120

    for i, question in enumerate(BENCHMARK["medium"], 1):
        tag = f"med-{i:02d}"
        print(f"  [{tag}] {question[:60]!r}")

        try:
            status, _ = post_json(f"{DEEPAGENTS}/chat",
                                  {"message": question, "chat_id": CHAT_ID}, timeout=5)
            if status != 202:
                print(f"    → [{FAIL}] POST returned {status}")
                med_results.append((question, "?", None, False))
                continue
        except Exception as e:
            print(f"    → [{FAIL}] POST error: {e}")
            med_results.append((question, "?", None, False))
            continue

        t_start = time.monotonic()
        found = None
        while time.monotonic() - t_start < MEDIUM_TIMEOUT:
            since = int(time.monotonic() - t_start) + 60
            lines = fetch_logs(since_s=since)
            found = parse_run_block(lines, question)
            if found:
                break
            time.sleep(3)

        if not found:
            print(f"    → [{FAIL}] no reply within {MEDIUM_TIMEOUT}s")
            med_results.append((question, "timeout", None, False))
            continue

        tier = found.get("tier", "unknown")
        light_ok = question in LIGHT_ACCEPTABLE

        if tier == "medium":
            correct, label, note = True, PASS, "medium ✓"
        elif tier == "light":
            correct = light_ok
            label = PASS if light_ok else WARN
            note = "light (acceptable)" if light_ok else "light (should be medium)"
        elif tier == "complex":
            correct, label, note = False, FAIL, "complex — wrong escalation"
        else:
            correct, label, note = False, FAIL, f"unknown tier {tier!r}"

        print(f"    → [{label}] {note} latency={found['reply_total']:.1f}s llm={found['llm']:.1f}s")
        med_results.append((question, tier, found["reply_total"], correct))
        time.sleep(1)

    print(f"\n  {'#':<4} {'Tier':<8} {'Latency':>8} {'Question'}")
    print(f"  {'─'*4} {'─'*8} {'─'*8} {'─'*55}")
    for idx, (q, tier, lat, ok) in enumerate(med_results, 1):
        lat_str = f"{lat:.1f}s" if lat is not None else "timeout"
        ok_str = "✓" if ok else ("~" if tier == "light" else "✗")
        print(f"  {ok_str} {idx:<3} {tier:<8} {lat_str:>8} {q[:55]!r}")

    total_med = len(med_results)
    medium_count = sum(1 for _, tier, _, _ in med_results if tier == "medium")
    light_count = sum(1 for _, tier, _, _ in med_results if tier == "light")
    complex_count = sum(1 for _, tier, _, _ in med_results if tier == "complex")
    timeout_count = sum(1 for _, tier, _, _ in med_results if tier == "timeout")
    light_misroute = sum(1 for q, tier, _, _ in med_results
                         if tier == "light" and q not in LIGHT_ACCEPTABLE)
    lats = [lat for _, _, lat, _ in med_results if lat is not None]

    print(f"\n  Breakdown: medium={medium_count} light={light_count} "
          f"complex={complex_count} timeout={timeout_count}")
    if light_misroute:
        print(f"  [{WARN}] {light_misroute} question(s) answered via light when medium expected")
    if lats:
        print(f"  Avg latency: {sum(lats)/len(lats):.1f}s min={min(lats):.1f}s max={max(lats):.1f}s")

    report(results,
           f"Medium questions: no complex escalation ({medium_count + light_count}/{total_med} routed)",
           complex_count == 0,
           f"medium={medium_count} light={light_count} complex={complex_count} timeout={timeout_count}")
    if timeout_count:
        report(results, f"Medium questions: all completed within {MEDIUM_TIMEOUT}s", False,
               f"{timeout_count} question(s) timed out")


# ── hard benchmark ────────────────────────────────────────────────────────────
if _run_hard:
    print(f"\n[{INFO}] Hard routing benchmark")
    print(f"  {len(BENCHMARK['hard'])} /think questions — must route to 'complex'")
    print(f"  Acceptable fallback: 'medium' if VRAM eviction timed out")
    print(f"  Fail condition: tier=light or timeout")
    print(f"  Chat ID: {CHAT_ID}")
    print()

    hard_results = []
    COMPLEX_TIMEOUT = 300
    _VRAM_ENTER = "[vram] enter_complex_mode"
    _VRAM_EXIT = "[vram] exit_complex_mode"

    for i, question in enumerate(BENCHMARK["hard"], 1):
        tag = f"hard-{i:02d}"
        short_q = question[len("/think "):].strip()[:60]
        print(f"  [{tag}] /think {short_q!r}")

        t_send = time.monotonic()
        try:
            status, _ = post_json(f"{DEEPAGENTS}/chat",
                                  {"message": question, "chat_id": CHAT_ID}, timeout=5)
            if status != 202:
                print(f"    → [{FAIL}] POST returned {status}")
                hard_results.append((question, "?", None, False))
                continue
        except Exception as e:
            print(f"    → [{FAIL}] POST error: {e}")
            hard_results.append((question, "?", None, False))
            continue

        t_start = time.monotonic()
        found = None
        while time.monotonic() - t_start < COMPLEX_TIMEOUT:
            since = int(time.monotonic() - t_start) + 90
            lines = fetch_logs(since_s=since)
            found = parse_run_block(lines, question[len("/think "):].strip())
            if found:
                break
            time.sleep(5)

        elapsed = time.monotonic() - t_send

        if not found:
            print(f"    → [{FAIL}] no reply within {COMPLEX_TIMEOUT}s")
            hard_results.append((question, "timeout", None, False))
            continue

        tier = found.get("tier", "unknown")

        if tier == "complex":
            ok, label, note = True, PASS, "complex ✓"
        elif tier == "medium":
            ok, label, note = True, WARN, "medium (VRAM fallback — check [vram] logs)"
        else:
            ok, label, note = False, FAIL, f"tier={tier} — unexpected"

        lines_block = fetch_logs(since_s=int(elapsed) + 120)
        recent = "\n".join(lines_block[-200:])
        vram_enter_seen = _VRAM_ENTER in recent
        vram_note = ""
        if tier == "complex":
            vram_note = " [vram:flush✓]" if vram_enter_seen else f" [{WARN}:no vram flush log]"

        print(f"    → [{label}] {note} latency={found['reply_total']:.1f}s llm={found['llm']:.1f}s{vram_note}")
        hard_results.append((question, tier, found["reply_total"], ok))
        time.sleep(5)

    print(f"\n  {'#':<4} {'Tier':<8} {'Latency':>8} {'Question (/think ...)'}")
    print(f"  {'─'*4} {'─'*8} {'─'*8} {'─'*55}")
    for idx, (q, tier, lat, ok) in enumerate(hard_results, 1):
        lat_str = f"{lat:.1f}s" if lat is not None else "timeout"
        ok_str = "✓" if tier == "complex" else ("~" if tier == "medium" else "✗")
        short = q[len("/think "):].strip()[:55]
        print(f"  {ok_str} {idx:<3} {tier:<8} {lat_str:>8} {short!r}")

    total_hard = len(hard_results)
    complex_count = sum(1 for _, t, _, _ in hard_results if t == "complex")
    medium_fb = sum(1 for _, t, _, _ in hard_results if t == "medium")
    light_count = sum(1 for _, t, _, _ in hard_results if t == "light")
    timeout_count = sum(1 for _, t, _, _ in hard_results if t == "timeout")
    lats = [lat for _, _, lat, _ in hard_results if lat is not None]

    print(f"\n  Breakdown: complex={complex_count} medium(fallback)={medium_fb} "
          f"light={light_count} timeout={timeout_count}")
    if medium_fb:
        print(f"  [{WARN}] {medium_fb} question(s) fell back to medium (VRAM eviction timeout)")
    if light_count:
        print(f"  [{FAIL}] {light_count} question(s) routed to light — /think prefix not detected")
    if lats:
        print(f"  Avg latency: {sum(lats)/len(lats):.1f}s min={min(lats):.1f}s max={max(lats):.1f}s")

    report(results,
           f"Hard questions routed to complex (not light) ({complex_count + medium_fb}/{total_hard})",
           light_count == 0 and timeout_count == 0,
           f"complex={complex_count} medium_fallback={medium_fb} light={light_count} timeout={timeout_count}")


# ── summary ───────────────────────────────────────────────────────────────────
print_summary(results)
sys.exit(0 if all(ok for _, ok in results) else 1)
2 tests/requirements.txt Normal file
@@ -0,0 +1,2 @@
pytest>=8.0
pytest-asyncio>=0.23
0 tests/unit/__init__.py Normal file
80 tests/unit/conftest.py Normal file
@@ -0,0 +1,80 @@
"""
|
||||
Stub out all third-party packages that Adolf's source modules import.
|
||||
This lets the unit tests run without a virtualenv or Docker environment.
|
||||
Stubs are installed into sys.modules before any test file is collected.
|
||||
"""
|
||||
|
||||
import sys
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
|
||||
# ── helpers ────────────────────────────────────────────────────────────────────
|
||||
|
||||
def _mock(name: str) -> MagicMock:
|
||||
m = MagicMock(name=name)
|
||||
sys.modules[name] = m
|
||||
return m
|
||||
|
||||
|
||||
# ── pydantic: BaseModel must be a real class so `class Foo(BaseModel)` works ──
|
||||
|
||||
class _FakeBaseModel:
|
||||
model_fields: dict = {}
|
||||
|
||||
def __init_subclass__(cls, **kwargs):
|
||||
pass
|
||||
|
||||
def __init__(self, **data):
|
||||
for k, v in data.items():
|
||||
setattr(self, k, v)
|
||||
|
||||
|
||||
_pydantic = _mock("pydantic")
|
||||
_pydantic.BaseModel = _FakeBaseModel
|
||||
|
||||
# ── httpx: used by channels.py, vram_manager.py, agent.py ────────────────────
|
||||
|
||||
_mock("httpx")
|
||||
|
||||
# ── fastapi ───────────────────────────────────────────────────────────────────
|
||||
|
||||
_fastapi = _mock("fastapi")
|
||||
_mock("fastapi.responses")
|
||||
|
||||
# ── langchain stack ───────────────────────────────────────────────────────────
|
||||
|
||||
_mock("langchain_openai")
|
||||
|
||||
_lc_core = _mock("langchain_core")
|
||||
_lc_msgs = _mock("langchain_core.messages")
|
||||
_mock("langchain_core.tools")
|
||||
|
||||
# Provide real-ish message classes so router.py can instantiate them
|
||||
class _FakeMsg:
|
||||
def __init__(self, content=""):
|
||||
self.content = content
|
||||
|
||||
class SystemMessage(_FakeMsg):
|
||||
pass
|
||||
|
||||
class HumanMessage(_FakeMsg):
|
||||
pass
|
||||
|
||||
class AIMessage(_FakeMsg):
|
||||
def __init__(self, content="", tool_calls=None):
|
||||
super().__init__(content)
|
||||
self.tool_calls = tool_calls or []
|
||||
|
||||
_lc_msgs.SystemMessage = SystemMessage
|
||||
_lc_msgs.HumanMessage = HumanMessage
|
||||
_lc_msgs.AIMessage = AIMessage
|
||||
|
||||
_mock("langchain_mcp_adapters")
|
||||
_mock("langchain_mcp_adapters.client")
|
||||
_mock("langchain_community")
|
||||
_mock("langchain_community.utilities")
|
||||
|
||||
# ── deepagents (agent_factory.py) ─────────────────────────────────────────────
|
||||
|
||||
_mock("deepagents")
|
||||
|
||||
198 tests/unit/test_agent_helpers.py Normal file
@@ -0,0 +1,198 @@
"""
|
||||
Unit tests for agent.py helper functions:
|
||||
- _strip_think(text)
|
||||
- _extract_final_text(result)
|
||||
|
||||
agent.py has heavy FastAPI/LangChain imports; conftest.py stubs them out so
|
||||
these pure functions can be imported and tested in isolation.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
|
||||
# conftest.py has already installed all stubs into sys.modules.
|
||||
# The FastAPI app is instantiated at module level in agent.py —
|
||||
# with the mocked fastapi, that just creates a MagicMock() object
|
||||
# and the route decorators are no-ops.
|
||||
from agent import _strip_think, _extract_final_text, _extract_urls
|
||||
|
||||
|
||||
# ── _strip_think ───────────────────────────────────────────────────────────────
|
||||
|
||||
class TestStripThink:
|
||||
def test_removes_single_think_block(self):
|
||||
text = "<think>internal reasoning</think>Final answer."
|
||||
assert _strip_think(text) == "Final answer."
|
||||
|
||||
def test_removes_multiline_think_block(self):
|
||||
text = "<think>\nLine one.\nLine two.\n</think>\nResult here."
|
||||
assert _strip_think(text) == "Result here."
|
||||
|
||||
def test_no_think_block_unchanged(self):
|
||||
text = "This is a plain answer with no think block."
|
||||
assert _strip_think(text) == text
|
||||
|
||||
def test_removes_multiple_think_blocks(self):
|
||||
text = "<think>step 1</think>middle<think>step 2</think>end"
|
||||
assert _strip_think(text) == "middleend"
|
||||
|
||||
def test_strips_surrounding_whitespace(self):
|
||||
text = " <think>stuff</think> answer "
|
||||
assert _strip_think(text) == "answer"
|
||||
|
||||
def test_empty_think_block(self):
|
||||
text = "<think></think>Hello."
|
||||
assert _strip_think(text) == "Hello."
|
||||
|
||||
def test_empty_string(self):
|
||||
assert _strip_think("") == ""
|
||||
|
||||
def test_only_think_block_returns_empty(self):
|
||||
text = "<think>nothing useful</think>"
|
||||
assert _strip_think(text) == ""
|
||||
|
||||
def test_think_block_with_nested_tags(self):
|
||||
text = "<think>I should use <b>bold</b> here</think>Done."
|
||||
assert _strip_think(text) == "Done."
|
||||
|
||||
def test_preserves_markdown(self):
|
||||
text = "<think>plan</think>## Report\n\n- Point one\n- Point two"
|
||||
result = _strip_think(text)
|
||||
assert result == "## Report\n\n- Point one\n- Point two"
|
||||
|
||||
|
||||
# ── _extract_final_text ────────────────────────────────────────────────────────
|
||||
|
||||
class TestExtractFinalText:
|
||||
def _ai_msg(self, content: str, tool_calls=None):
|
||||
"""Create a minimal AIMessage-like object."""
|
||||
class AIMessage:
|
||||
pass
|
||||
m = AIMessage()
|
||||
m.content = content
|
||||
m.tool_calls = tool_calls or []
|
||||
return m
|
||||
|
||||
def _human_msg(self, content: str):
|
||||
class HumanMessage:
|
||||
pass
|
||||
m = HumanMessage()
|
||||
m.content = content
|
||||
return m
|
||||
|
||||
def test_returns_last_ai_message_content(self):
|
||||
result = {
|
||||
"messages": [
|
||||
self._human_msg("what is 2+2"),
|
||||
self._ai_msg("The answer is 4."),
|
||||
]
|
||||
}
|
||||
assert _extract_final_text(result) == "The answer is 4."
|
||||
|
||||
def test_returns_last_of_multiple_ai_messages(self):
|
||||
result = {
|
||||
"messages": [
|
||||
self._ai_msg("First response."),
|
||||
self._human_msg("follow-up"),
|
||||
self._ai_msg("Final response."),
|
||||
]
|
||||
}
|
||||
assert _extract_final_text(result) == "Final response."
|
||||
|
||||
def test_skips_empty_ai_messages(self):
|
||||
result = {
|
||||
"messages": [
|
||||
self._ai_msg("Real answer."),
|
||||
self._ai_msg(""), # empty — should be skipped
|
||||
]
|
||||
}
|
||||
assert _extract_final_text(result) == "Real answer."
|
||||
|
||||
def test_strips_think_tags_from_ai_message(self):
|
||||
result = {
|
||||
"messages": [
|
||||
self._ai_msg("<think>reasoning here</think>Clean reply."),
|
||||
]
|
||||
}
|
||||
assert _extract_final_text(result) == "Clean reply."
|
||||
|
||||
def test_falls_back_to_output_field(self):
|
||||
result = {
|
||||
"messages": [],
|
||||
"output": "Fallback output.",
|
||||
}
|
||||
assert _extract_final_text(result) == "Fallback output."
|
||||
|
||||
def test_strips_think_from_output_field(self):
|
||||
result = {
|
||||
"messages": [],
|
||||
"output": "<think>thoughts</think>Actual output.",
|
||||
}
|
||||
assert _extract_final_text(result) == "Actual output."
|
||||
|
||||
def test_returns_none_when_no_content(self):
|
||||
result = {"messages": []}
|
||||
assert _extract_final_text(result) is None
|
||||
|
||||
def test_returns_none_when_no_messages_and_no_output(self):
|
||||
result = {"messages": [], "output": ""}
|
||||
# output is falsy → returns None
|
||||
assert _extract_final_text(result) is None
|
||||
|
||||
def test_skips_non_ai_messages(self):
|
||||
result = {
|
||||
"messages": [
|
||||
self._human_msg("user question"),
|
||||
]
|
||||
}
|
||||
assert _extract_final_text(result) is None
|
||||
|
||||
def test_handles_ai_message_with_tool_calls_but_no_content(self):
|
||||
"""AIMessage that only has tool_calls (no content) should be skipped."""
|
||||
msg = self._ai_msg("", tool_calls=[{"name": "web_search", "args": {}}])
|
||||
result = {"messages": [msg]}
|
||||
assert _extract_final_text(result) is None
|
||||
|
||||
def test_multiline_think_stripped_correctly(self):
|
||||
result = {
|
||||
"messages": [
|
||||
self._ai_msg("<think>\nLong\nreasoning\nblock\n</think>\n## Report\n\nSome content."),
|
||||
]
|
||||
}
|
||||
assert _extract_final_text(result) == "## Report\n\nSome content."
|
||||
|
||||
|
||||
# ── _extract_urls ──────────────────────────────────────────────────────────────
|
||||
|
||||
class TestExtractUrls:
|
||||
def test_single_url(self):
|
||||
assert _extract_urls("check this out https://example.com please") == ["https://example.com"]
|
||||
|
||||
def test_multiple_urls(self):
|
||||
urls = _extract_urls("see https://foo.com and https://bar.org/path?q=1")
|
||||
assert urls == ["https://foo.com", "https://bar.org/path?q=1"]
|
||||
|
||||
def test_no_urls(self):
|
||||
assert _extract_urls("no links here at all") == []
|
||||
|
||||
def test_http_and_https(self):
|
||||
urls = _extract_urls("http://old.site and https://new.site")
|
||||
assert "http://old.site" in urls
|
||||
assert "https://new.site" in urls
|
||||
|
||||
def test_url_at_start_of_message(self):
|
||||
assert _extract_urls("https://example.com is interesting") == ["https://example.com"]
|
||||
|
||||
def test_url_only(self):
|
||||
assert _extract_urls("https://example.com/page") == ["https://example.com/page"]
|
||||
|
||||
def test_url_with_path_and_query(self):
|
||||
url = "https://example.com/articles/123?ref=home&page=2"
|
||||
assert _extract_urls(url) == [url]
|
||||
|
||||
def test_empty_string(self):
|
||||
assert _extract_urls("") == []
|
||||
|
||||
def test_does_not_include_surrounding_quotes(self):
|
||||
# URLs inside quotes should not include the quote character
|
||||
urls = _extract_urls('visit "https://example.com" today')
|
||||
assert urls == ["https://example.com"]
|
||||
125 tests/unit/test_channels.py Normal file
@@ -0,0 +1,125 @@
"""Unit tests for channels.py — register, deliver, pending_replies queue."""
|
||||
|
||||
import asyncio
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import channels
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_channels_state():
|
||||
"""Clear module-level state before and after every test."""
|
||||
channels._callbacks.clear()
|
||||
channels.pending_replies.clear()
|
||||
yield
|
||||
channels._callbacks.clear()
|
||||
channels.pending_replies.clear()
|
||||
|
||||
|
||||
# ── register ───────────────────────────────────────────────────────────────────
|
||||
|
||||
class TestRegister:
|
||||
def test_register_stores_callback(self):
|
||||
cb = AsyncMock()
|
||||
channels.register("test_channel", cb)
|
||||
assert channels._callbacks["test_channel"] is cb
|
||||
|
||||
def test_register_overwrites_existing(self):
|
||||
cb1 = AsyncMock()
|
||||
cb2 = AsyncMock()
|
||||
channels.register("ch", cb1)
|
||||
channels.register("ch", cb2)
|
||||
assert channels._callbacks["ch"] is cb2
|
||||
|
||||
def test_register_multiple_channels(self):
|
||||
cb_a = AsyncMock()
|
||||
cb_b = AsyncMock()
|
||||
channels.register("a", cb_a)
|
||||
channels.register("b", cb_b)
|
||||
assert channels._callbacks["a"] is cb_a
|
||||
assert channels._callbacks["b"] is cb_b
|
||||
|
||||
|
||||
# ── deliver ────────────────────────────────────────────────────────────────────
|
||||
|
||||
class TestDeliver:
|
||||
async def test_deliver_enqueues_reply(self):
|
||||
channels.register("cli", AsyncMock())
|
||||
await channels.deliver("cli-alvis", "cli", "hello world")
|
||||
q = channels.pending_replies["cli-alvis"]
|
||||
assert not q.empty()
|
||||
assert await q.get() == "hello world"
|
||||
|
||||
async def test_deliver_calls_channel_callback(self):
|
||||
cb = AsyncMock()
|
||||
channels.register("telegram", cb)
|
||||
await channels.deliver("tg-123", "telegram", "reply text")
|
||||
cb.assert_awaited_once_with("tg-123", "reply text")
|
||||
|
||||
async def test_deliver_unknown_channel_still_enqueues(self):
|
||||
"""No registered callback for channel → reply still goes to the queue."""
|
||||
await channels.deliver("cli-bob", "nonexistent", "fallback reply")
|
||||
q = channels.pending_replies["cli-bob"]
|
||||
assert await q.get() == "fallback reply"
|
||||
|
||||
async def test_deliver_unknown_channel_does_not_raise(self):
|
||||
"""Missing callback must not raise an exception."""
|
||||
await channels.deliver("cli-x", "ghost_channel", "msg")
|
||||
|
||||
async def test_deliver_creates_queue_if_absent(self):
|
||||
channels.register("cli", AsyncMock())
|
||||
assert "cli-new" not in channels.pending_replies
|
||||
await channels.deliver("cli-new", "cli", "hi")
|
||||
assert "cli-new" in channels.pending_replies
|
||||
|
||||
async def test_deliver_reuses_existing_queue(self):
|
||||
"""Second deliver to the same session appends to the same queue."""
|
||||
channels.register("cli", AsyncMock())
|
||||
await channels.deliver("cli-alvis", "cli", "first")
|
||||
await channels.deliver("cli-alvis", "cli", "second")
|
||||
q = channels.pending_replies["cli-alvis"]
|
||||
assert await q.get() == "first"
|
||||
assert await q.get() == "second"
|
||||
|
||||
async def test_deliver_telegram_sends_to_callback(self):
|
||||
sent = []
|
||||
|
||||
async def fake_tg(session_id, text):
|
||||
sent.append((session_id, text))
|
||||
|
||||
channels.register("telegram", fake_tg)
|
||||
await channels.deliver("tg-999", "telegram", "test message")
|
||||
assert sent == [("tg-999", "test message")]
|
||||
|
||||
|
||||
# ── register_defaults ──────────────────────────────────────────────────────────
|
||||
|
||||
class TestRegisterDefaults:
|
||||
def test_registers_telegram_and_cli(self):
|
||||
channels.register_defaults()
|
||||
assert "telegram" in channels._callbacks
|
||||
assert "cli" in channels._callbacks
|
||||
|
||||
async def test_cli_callback_is_noop(self):
|
||||
"""CLI send callback does nothing (replies are handled via SSE queue)."""
|
||||
channels.register_defaults()
|
||||
cb = channels._callbacks["cli"]
|
||||
# Should not raise and should return None
|
||||
result = await cb("cli-alvis", "some reply")
|
||||
assert result is None
|
||||
|
||||
async def test_telegram_callback_chunks_long_messages(self):
|
||||
"""Telegram callback splits messages > 4000 chars into chunks."""
|
||||
channels.register_defaults()
|
||||
cb = channels._callbacks["telegram"]
|
||||
long_text = "x" * 9000 # > 4000 chars → should produce 3 chunks
|
||||
with patch("channels.httpx.AsyncClient") as mock_client_cls:
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post = AsyncMock()
|
||||
mock_client_cls.return_value = mock_client
|
||||
await cb("tg-123", long_text)
|
||||
# 9000 chars / 4000 per chunk = 3 POST calls
|
||||
assert mock_client.post.await_count == 3
|
||||
200 tests/unit/test_router.py Normal file
@@ -0,0 +1,200 @@
"""Unit tests for router.py — Router, _parse_tier, _format_history, _LIGHT_PATTERNS."""
|
||||
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
from router import Router, _parse_tier, _format_history, _LIGHT_PATTERNS
|
||||
|
||||
|
||||
# ── _LIGHT_PATTERNS regex ──────────────────────────────────────────────────────
|
||||
|
||||
class TestLightPatterns:
|
||||
@pytest.mark.parametrize("text", [
|
||||
"hi", "Hi", "HI",
|
||||
"hello", "hey", "yo", "sup",
|
||||
"good morning", "good evening", "good night", "good afternoon",
|
||||
"bye", "goodbye", "see you", "cya", "later", "ttyl",
|
||||
"thanks", "thank you", "thx", "ty",
|
||||
"ok", "okay", "k", "cool", "great", "awesome", "perfect",
|
||||
"sounds good", "got it", "nice", "sure",
|
||||
"how are you", "how are you?", "how are you doing today?",
|
||||
"what's up",
|
||||
"what day comes after Monday?",
|
||||
"what day follows Friday?",
|
||||
"what comes after summer?",
|
||||
"what does NASA stand for?",
|
||||
"what does AI stand for?",
|
||||
# with trailing punctuation
|
||||
"hi!", "hello.", "thanks!",
|
||||
])
|
||||
def test_matches(self, text):
|
||||
assert _LIGHT_PATTERNS.match(text.strip()), f"Expected light match for: {text!r}"
|
||||
|
||||
@pytest.mark.parametrize("text", [
|
||||
"what is the capital of France",
|
||||
"tell me about bitcoin",
|
||||
"what is 2+2",
|
||||
"write me a poem",
|
||||
"search for news about the election",
|
||||
"what did we talk about last time",
|
||||
"what is my name",
|
||||
"/think compare these frameworks",
|
||||
"how do I install Python",
|
||||
"explain machine learning",
|
||||
"", # empty string doesn't match the pattern
|
||||
])
|
||||
def test_no_match(self, text):
|
||||
assert not _LIGHT_PATTERNS.match(text.strip()), f"Expected NO light match for: {text!r}"
|
||||
|
||||
|
||||
# ── _parse_tier ────────────────────────────────────────────────────────────────
|
||||
|
||||
class TestParseTier:
|
||||
@pytest.mark.parametrize("raw,expected", [
|
||||
("light", "light"),
|
||||
("Light", "light"),
|
||||
("LIGHT\n", "light"),
|
||||
("medium", "medium"),
|
||||
("Medium.", "medium"),
|
||||
("complex", "complex"),
|
||||
("Complex!", "complex"),
|
||||
# descriptive words → light
|
||||
("simplefact", "light"),
|
||||
("trivial question", "light"),
|
||||
("basic", "light"),
|
||||
("easy answer", "light"),
|
||||
("general knowledge", "light"),
|
||||
# unknown → medium
|
||||
("unknown_category", "medium"),
|
||||
("", "medium"),
|
||||
("I don't know", "medium"),
|
||||
# complex only if 'complex' appears in first 60 chars
|
||||
("this is a complex query requiring search", "complex"),
|
||||
# _parse_tier checks "complex" before "medium", so complex wins even if medium appears first
|
||||
("medium complexity, not complex", "complex"),
|
||||
])
|
||||
def test_parse_tier(self, raw, expected):
|
||||
assert _parse_tier(raw) == expected
|
||||
|
||||
|
||||
# ── _format_history ────────────────────────────────────────────────────────────
|
||||
|
||||
class TestFormatHistory:
|
||||
def test_empty(self):
|
||||
assert _format_history([]) == "(none)"
|
||||
|
||||
def test_single_user_message(self):
|
||||
history = [{"role": "user", "content": "hello there"}]
|
||||
result = _format_history(history)
|
||||
assert "user: hello there" in result
|
||||
|
||||
def test_multiple_turns(self):
|
||||
history = [
|
||||
{"role": "user", "content": "What is Python?"},
|
||||
{"role": "assistant", "content": "Python is a programming language."},
|
||||
]
|
||||
result = _format_history(history)
|
||||
assert "user: What is Python?" in result
|
||||
assert "assistant: Python is a programming language." in result
|
||||
|
||||
def test_truncates_long_content(self):
|
||||
long_content = "x" * 300
|
||||
history = [{"role": "user", "content": long_content}]
|
||||
result = _format_history(history)
|
||||
# content is truncated to 200 chars in _format_history
|
||||
assert len(result) < 250
|
||||
|
||||
def test_missing_keys_handled(self):
|
||||
# Should not raise — uses .get() with defaults
|
||||
history = [{"role": "user"}] # no content key
|
||||
result = _format_history(history)
|
||||
assert "user:" in result
|
||||
|
||||
|
||||
# ── Router.route() ─────────────────────────────────────────────────────────────
|
||||
|
||||
class TestRouterRoute:
|
||||
def _make_router(self, classify_response: str, reply_response: str = "Sure!") -> Router:
|
||||
"""Return a Router with a mock model that returns given classification and reply."""
|
||||
model = MagicMock()
|
||||
classify_msg = MagicMock()
|
||||
classify_msg.content = classify_response
|
||||
reply_msg = MagicMock()
|
||||
reply_msg.content = reply_response
|
||||
# First ainvoke call → classification; second → reply
|
||||
model.ainvoke = AsyncMock(side_effect=[classify_msg, reply_msg])
|
||||
return Router(model=model)
|
||||
|
||||
async def test_force_complex_bypasses_classification(self):
|
||||
router = self._make_router("medium")
|
||||
tier, reply = await router.route("some question", [], force_complex=True)
|
||||
assert tier == "complex"
|
||||
assert reply is None
|
||||
# Model should NOT have been called
|
||||
router.model.ainvoke.assert_not_called()
|
||||
|
||||
async def test_regex_light_skips_llm_classification(self):
|
||||
# Regex match bypasses classification entirely; the only ainvoke call is the reply.
|
||||
model = MagicMock()
|
||||
reply_msg = MagicMock()
|
||||
reply_msg.content = "I'm doing great!"
|
||||
model.ainvoke = AsyncMock(return_value=reply_msg)
|
||||
router = Router(model=model)
|
||||
tier, reply = await router.route("how are you", [], force_complex=False)
|
||||
assert tier == "light"
|
||||
assert reply == "I'm doing great!"
|
||||
# Exactly one model call — no classification step
|
||||
assert router.model.ainvoke.call_count == 1
|
||||
|
||||
async def test_llm_classifies_medium(self):
|
||||
router = self._make_router("medium")
|
||||
tier, reply = await router.route("what is the bitcoin price?", [], force_complex=False)
|
||||
assert tier == "medium"
|
||||
assert reply is None
|
||||
|
||||
async def test_llm_classifies_light_generates_reply(self):
|
||||
router = self._make_router("light", "Paris is the capital of France.")
|
||||
tier, reply = await router.route("what is the capital of France?", [], force_complex=False)
|
||||
assert tier == "light"
|
||||
assert reply == "Paris is the capital of France."
|
||||
|
||||
async def test_llm_classifies_complex_downgraded_to_medium(self):
|
||||
# Without /think prefix, complex classification → downgraded to medium
|
||||
router = self._make_router("complex")
|
||||
tier, reply = await router.route("compare React and Vue", [], force_complex=False)
|
||||
assert tier == "medium"
|
||||
assert reply is None
|
||||
|
||||
async def test_llm_error_falls_back_to_medium(self):
|
||||
model = MagicMock()
|
||||
model.ainvoke = AsyncMock(side_effect=Exception("connection error"))
|
||||
router = Router(model=model)
|
||||
tier, reply = await router.route("some question", [], force_complex=False)
|
||||
assert tier == "medium"
|
||||
assert reply is None
|
||||
|
||||
async def test_light_reply_empty_falls_back_to_medium(self):
|
||||
"""If the light reply comes back empty, router returns medium instead."""
|
||||
router = self._make_router("light", "") # empty reply
|
||||
tier, reply = await router.route("what is 2+2", [], force_complex=False)
|
||||
assert tier == "medium"
|
||||
assert reply is None
|
||||
|
||||
async def test_strips_think_tags_from_classification(self):
|
||||
"""Router strips <think>...</think> from model output before parsing tier."""
|
||||
model = MagicMock()
|
||||
classify_msg = MagicMock()
|
||||
classify_msg.content = "<think>Hmm let me think...</think>medium"
|
||||
reply_msg = MagicMock()
|
||||
reply_msg.content = "I'm fine!"
|
||||
model.ainvoke = AsyncMock(side_effect=[classify_msg, reply_msg])
|
||||
router = Router(model=model)
|
||||
tier, _ = await router.route("what is the news?", [], force_complex=False)
|
||||
assert tier == "medium"
|
||||
|
||||
async def test_think_prefix_forces_complex(self):
|
||||
"""/think prefix is already stripped by agent.py; force_complex=True is passed."""
|
||||
router = self._make_router("medium")
|
||||
tier, reply = await router.route("analyse this", [], force_complex=True)
|
||||
assert tier == "complex"
|
||||
assert reply is None
|
||||
164 tests/unit/test_vram_manager.py Normal file
@@ -0,0 +1,164 @@
"""Unit tests for vram_manager.py — VRAMManager flush/poll/prewarm logic."""
|
||||
|
||||
import asyncio
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
from vram_manager import VRAMManager
|
||||
|
||||
|
||||
BASE_URL = "http://localhost:11434"
|
||||
|
||||
|
||||
def _make_manager() -> VRAMManager:
|
||||
return VRAMManager(base_url=BASE_URL)
|
||||
|
||||
|
||||
def _mock_client(get_response=None, post_response=None):
|
||||
"""Return a context-manager mock for httpx.AsyncClient."""
|
||||
client = AsyncMock()
|
||||
client.__aenter__ = AsyncMock(return_value=client)
|
||||
client.__aexit__ = AsyncMock(return_value=False)
|
||||
if get_response is not None:
|
||||
client.get = AsyncMock(return_value=get_response)
|
||||
if post_response is not None:
|
||||
client.post = AsyncMock(return_value=post_response)
|
||||
return client
|
||||
|
||||
|
||||
# ── _flush ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
class TestFlush:
|
||||
async def test_sends_keep_alive_zero(self):
|
||||
client = _mock_client(post_response=MagicMock())
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
await mgr._flush("qwen3:4b")
|
||||
client.post.assert_awaited_once()
|
||||
_, kwargs = client.post.await_args
|
||||
body = kwargs.get("json") or client.post.call_args[1].get("json") or client.post.call_args[0][1]
|
||||
assert body["model"] == "qwen3:4b"
|
||||
assert body["keep_alive"] == 0
|
||||
|
||||
async def test_posts_to_correct_endpoint(self):
|
||||
client = _mock_client(post_response=MagicMock())
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
await mgr._flush("qwen3:8b")
|
||||
url = client.post.call_args[0][0]
|
||||
assert url == f"{BASE_URL}/api/generate"
|
||||
|
||||
async def test_ignores_exceptions_silently(self):
|
||||
client = AsyncMock()
|
||||
client.__aenter__ = AsyncMock(return_value=client)
|
||||
client.__aexit__ = AsyncMock(return_value=False)
|
||||
client.post = AsyncMock(side_effect=Exception("connection refused"))
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
# Should not raise
|
||||
await mgr._flush("qwen3:4b")
|
||||
|
||||
|
||||
# ── _prewarm ───────────────────────────────────────────────────────────────────
|
||||
|
||||
class TestPrewarm:
|
||||
async def test_sends_keep_alive_300(self):
|
||||
client = _mock_client(post_response=MagicMock())
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
await mgr._prewarm("qwen3:4b")
|
||||
_, kwargs = client.post.await_args
|
||||
body = kwargs.get("json") or client.post.call_args[1].get("json") or client.post.call_args[0][1]
|
||||
assert body["keep_alive"] == 300
|
||||
assert body["model"] == "qwen3:4b"
|
||||
|
||||
async def test_ignores_exceptions_silently(self):
|
||||
client = AsyncMock()
|
||||
client.__aenter__ = AsyncMock(return_value=client)
|
||||
client.__aexit__ = AsyncMock(return_value=False)
|
||||
client.post = AsyncMock(side_effect=Exception("timeout"))
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
await mgr._prewarm("qwen3:4b")
|
||||
|
||||
|
||||
# ── _poll_evicted ──────────────────────────────────────────────────────────────
|
||||
|
||||
class TestPollEvicted:
|
||||
async def test_returns_true_when_models_absent(self):
|
||||
resp = MagicMock()
|
||||
resp.json.return_value = {"models": [{"name": "some_other_model"}]}
|
||||
client = _mock_client(get_response=resp)
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
result = await mgr._poll_evicted(["qwen3:4b", "qwen2.5:1.5b"], timeout=5)
|
||||
assert result is True
|
||||
|
||||
async def test_returns_false_on_timeout_when_model_still_loaded(self):
|
||||
resp = MagicMock()
|
||||
resp.json.return_value = {"models": [{"name": "qwen3:4b"}]}
|
||||
client = _mock_client(get_response=resp)
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
result = await mgr._poll_evicted(["qwen3:4b"], timeout=0.1)
|
||||
assert result is False
|
||||
|
||||
async def test_returns_true_immediately_if_already_empty(self):
|
||||
resp = MagicMock()
|
||||
resp.json.return_value = {"models": []}
|
||||
client = _mock_client(get_response=resp)
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
result = await mgr._poll_evicted(["qwen3:4b"], timeout=5)
|
||||
assert result is True
|
||||
|
||||
async def test_handles_poll_error_and_continues(self):
|
||||
"""If /api/ps errors, polling continues until timeout."""
|
||||
client = AsyncMock()
|
||||
client.__aenter__ = AsyncMock(return_value=client)
|
||||
client.__aexit__ = AsyncMock(return_value=False)
|
||||
client.get = AsyncMock(side_effect=Exception("network error"))
|
||||
with patch("vram_manager.httpx.AsyncClient", return_value=client):
|
||||
mgr = _make_manager()
|
||||
result = await mgr._poll_evicted(["qwen3:4b"], timeout=0.2)
|
||||
assert result is False
|
||||
|
||||
|
||||
# ── enter_complex_mode / exit_complex_mode ─────────────────────────────────────
|
||||
|
||||
class TestComplexMode:
|
||||
async def test_enter_complex_mode_returns_true_on_success(self):
|
||||
mgr = _make_manager()
|
||||
mgr._flush = AsyncMock()
|
||||
mgr._poll_evicted = AsyncMock(return_value=True)
|
||||
result = await mgr.enter_complex_mode()
|
||||
assert result is True
|
||||
|
||||
async def test_enter_complex_mode_flushes_medium_models(self):
|
||||
mgr = _make_manager()
|
||||
mgr._flush = AsyncMock()
|
||||
mgr._poll_evicted = AsyncMock(return_value=True)
|
||||
await mgr.enter_complex_mode()
|
||||
flushed = {call.args[0] for call in mgr._flush.call_args_list}
|
||||
assert "qwen3:4b" in flushed
|
||||
assert "qwen2.5:1.5b" in flushed
|
||||
|
||||
async def test_enter_complex_mode_returns_false_on_eviction_timeout(self):
|
||||
mgr = _make_manager()
|
||||
mgr._flush = AsyncMock()
|
||||
mgr._poll_evicted = AsyncMock(return_value=False)
|
||||
result = await mgr.enter_complex_mode()
|
||||
assert result is False
|
||||
|
||||
async def test_exit_complex_mode_flushes_complex_and_prewarms_medium(self):
|
||||
mgr = _make_manager()
|
||||
mgr._flush = AsyncMock()
|
||||
mgr._prewarm = AsyncMock()
|
||||
await mgr.exit_complex_mode()
|
||||
# Must flush 8b
|
||||
flushed = {call.args[0] for call in mgr._flush.call_args_list}
|
||||
assert "qwen3:8b" in flushed
|
||||
# Must prewarm medium models
|
||||
prewarmed = {call.args[0] for call in mgr._prewarm.call_args_list}
|
||||
assert "qwen3:4b" in prewarmed
|
||||
assert "qwen2.5:1.5b" in prewarmed
|
||||
41 tests/use_cases/apple_pie_research.md Normal file
@@ -0,0 +1,41 @@
# Use Case: Apple Pie Research

Verify that a deep research query triggers the complex tier, uses web search and
page fetching, and produces a substantive, well-sourced recipe response.

## Steps

**1. Send the research query** (the `/think` prefix forces complex tier):

```bash
curl -s -X POST http://localhost:8000/message \
  -H "Content-Type: application/json" \
  -d '{"text": "/think what is the best recipe for an apple pie?", "session_id": "use-case-apple-pie", "channel": "cli", "user_id": "claude"}'
```

**2. Wait for the streaming reply** (complex tier can take up to 5 minutes):

```bash
curl -s -N --max-time 300 "http://localhost:8000/stream/use-case-apple-pie"
```

**3. Confirm tier and tool usage in agent logs:**

```bash
docker compose -f /home/alvis/adolf/docker-compose.yml logs deepagents \
  --since=600s | grep -E "tier=complex|web_search|fetch_url|crawl4ai"
```

## Evaluate (use your judgment)

Check each of the following:

- **Tier**: logs show `tier=complex` for this session
- **Tool use**: logs show `web_search` or `fetch_url` calls during the request
- **Ingredients**: response lists specific apple pie ingredients (apples, flour, butter, sugar, etc.)
- **Method**: response includes preparation or baking steps
- **Sources**: response cites real URLs it fetched, not invented links
- **Quality**: response is structured and practical — not a refusal, stub, or generic placeholder

Report PASS only if all six criteria are met. For any failure, state which criterion
failed and quote the relevant part of the response or logs.
18 tests/use_cases/cli_startup.md Normal file
@@ -0,0 +1,18 @@
# Use Case: CLI Startup

Verify the Adolf CLI container starts cleanly, shows the welcome banner,
and exits without error when the user closes input.

## Steps

```bash
echo "" | docker compose --profile tools run --rm -T cli \
  python3 cli.py --url http://deepagents:8000 --session use-case-cli-startup
echo "exit code: $?"
```

## Pass if

- Output contains `Adolf CLI`
- Output contains the session name and gateway URL
- Exit code is 0
40 tests/use_cases/weather_now.md Normal file
@@ -0,0 +1,40 @@
# Use Case: Current Weather Query

Verify how Adolf handles a real-time information request ("what's the weather now?").
This question requires live data that an LLM cannot answer from training alone.

## Steps

**1. Send the weather query:**

```bash
curl -s -X POST http://localhost:8000/message \
  -H "Content-Type: application/json" \
  -d '{"text": "whats the weather right now?", "session_id": "use-case-weather", "channel": "cli", "user_id": "claude"}'
```

**2. Stream the reply** (medium tier should respond within 30s):

```bash
curl -s -N --max-time 60 "http://localhost:8000/stream/use-case-weather"
```

**3. Check routing tier and any tool usage in logs:**

```bash
docker compose -f /home/alvis/adolf/docker-compose.yml logs deepagents \
  --since=120s | grep -E "tier=|web_search|fetch_url|crawl4ai"
```

## Evaluate (use your judgment)

Check each of the following:

- **Routing**: which tier was selected? Was it appropriate for a real-time query?
- **Tool use**: did the agent use web_search or any external data source?
- **Accuracy**: does the response contain actual current weather data (temperature, conditions) or is it a guess/refusal?
- **Honesty**: if the agent cannot fetch weather, does it say so — or does it hallucinate fake data?
- **Helpfulness**: does the response suggest how the user could get weather info (e.g. check a website, use /think)?

Report PASS only if the response is both honest and helpful. A hallucinated weather
report is a FAIL. An honest "I can't check weather" with guidance is a PASS.