# Adolf
Autonomous personal assistant with a multi-channel gateway. Three-tier model routing with GPU VRAM management.
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│                     CHANNEL ADAPTERS                     │
│                                                          │
│  [Telegram/Grammy]      [CLI]      [Voice — future]      │
│          ↕                ↕               ↕              │
│          └────────────────┴───────────────┘              │
│                           ↕                              │
│            ┌─────────────────────────────┐               │
│            │     GATEWAY (agent.py)      │               │
│            │        FastAPI :8000        │               │
│            │                             │               │
│            │  POST /message              │ ← all inbound │
│            │  POST /chat (legacy)        │               │
│            │  GET  /reply/{id} SSE       │ ← CLI polling │
│            │  GET  /health               │               │
│            │                             │               │
│            │  channels.py registry       │               │
│            │  conversation buffers       │               │
│            └──────────────┬──────────────┘               │
│                           ↓                              │
│            ┌─────────────────────────────┐               │
│            │         AGENT CORE          │               │
│            │    three-tier routing       │               │
│            │    VRAM management          │               │
│            └─────────────────────────────┘               │
│                           ↓                              │
│       channels.deliver(session_id, channel, text)        │
│      ↓                                ↓                  │
│   telegram → POST grammy/send        cli → SSE queue     │
└──────────────────────────────────────────────────────────┘
```
## Channel Adapters
| Channel | session_id | Inbound | Outbound |
|---------|-----------|---------|---------|
| Telegram | `tg-<chat_id>` | Grammy long-poll → POST /message | channels.py → POST grammy:3001/send |
| CLI | `cli-<user>` | POST /message directly | GET /reply/{id} SSE stream |
| Voice | `voice-<device>` | (future) | (future) |
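Both adapters converge on the same delivery contract. The sketch below illustrates that contract, assuming only what is documented here (`deliver()`, the `pending_replies` queues behind the SSE endpoint, and per-channel send callbacks); all other names are illustrative, not the real `channels.py` implementation.

```python
# Hypothetical sketch of the channels.py registry and deliver() contract.
# Only deliver(), pending_replies, and the session-id scheme come from this
# document; send_callbacks and its wiring are illustrative assumptions.
import asyncio
from typing import Awaitable, Callable, Dict

# Per-session reply queues backing the GET /reply/{id} SSE stream.
pending_replies: Dict[str, asyncio.Queue] = {}

# Channel name -> async send callback (e.g. POST to grammy:3001/send).
send_callbacks: Dict[str, Callable[[str, str], Awaitable[None]]] = {}

async def deliver(session_id: str, channel: str, text: str) -> None:
    """Queue the reply for SSE consumers, then invoke the channel's sender."""
    queue = pending_replies.setdefault(session_id, asyncio.Queue())
    await queue.put(text)               # always available via SSE
    callback = send_callbacks.get(channel)
    if callback is not None:            # e.g. telegram -> POST grammy/send
        await callback(session_id, text)
```

Always queueing before calling the channel callback is what lets tools like `wiki_research.py` consume Telegram-originated replies over SSE.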
## Unified Message Flow
```
1. Channel adapter receives message
2. POST /message {text, session_id, channel, user_id}
3. 202 Accepted immediately
4. Background: run_agent_task(message, session_id, channel)
5. Route → run agent tier → get reply text
6. channels.deliver(session_id, channel, reply_text)
- always puts reply in pending_replies[session_id] queue (for SSE)
- calls channel-specific send callback
7. GET /reply/{session_id} SSE clients receive the reply
```
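Steps 2 and 7 are the only parts a client sees. A minimal client-side sketch, assuming the `/message` body fields and the `data:`-line SSE framing described above (both helper names are hypothetical):

```python
# Client-side sketch of steps 2 (build the POST /message body) and
# 7 (extract replies from the raw SSE stream). Helper names are
# illustrative; the endpoints and field names come from this document.
import json

def build_message(text: str, session_id: str, channel: str, user_id: str) -> str:
    """JSON body for POST /message (step 2)."""
    return json.dumps({"text": text, "session_id": session_id,
                       "channel": channel, "user_id": user_id})

def parse_sse_data(stream: str) -> list[str]:
    """Collect `data:` payloads from a raw SSE stream (step 7)."""
    events = []
    for line in stream.splitlines():
        if line.startswith("data:"):
            events.append(line[len("data:"):].strip())
    return events
```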
## Three-Tier Model Routing
| Tier | Model | VRAM | Trigger | Latency |
|------|-------|------|---------|---------|
| Light | qwen2.5:1.5b (router answers) | ~1.2 GB | Router classifies as light | ~2–4s |
| Medium | qwen3:4b | ~2.5 GB | Default | ~20–40s |
| Complex | qwen3:8b | ~6.0 GB | `/think` prefix | ~60–120s |
**`/think` prefix**: forces the complex tier; the prefix is stripped before the message is passed to the agent.
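The tier-selection rule above can be sketched as follows. The `classify_light()` stub stands in for the qwen2.5:1.5b router call (the real classification is an LLM query, not this heuristic):

```python
# Hedged sketch of tier selection: /think forces the complex tier and the
# prefix is stripped before routing; otherwise the router model decides
# between light and the medium default.
def classify_light(message: str) -> bool:
    """Placeholder for the qwen2.5:1.5b router query; heuristic for illustration only."""
    return len(message.split()) <= 3

def route(message: str) -> tuple[str, str]:
    """Return (tier, message with any /think prefix stripped)."""
    if message.startswith("/think"):
        return "complex", message[len("/think"):].lstrip()
    if classify_light(message):
        return "light", message
    return "medium", message
```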
## VRAM Management
GTX 1070 with 8 GB VRAM. If CUDA initialization fails, models load on the CPU and Ollama must be restarted.
1. Flush explicitly before loading qwen3:8b (`keep_alive=0`)
2. Verify eviction via `/api/ps` poll (15s timeout) before proceeding
3. Fallback: timeout → run medium agent instead
4. Post-complex: flush 8b, pre-warm 4b + router
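The eviction wait in step 2 can be sketched like this. It assumes the Ollama `/api/ps` response shape `{"models": [{"name": ...}, ...]}`; the `get_ps` callable is injected so the polling logic is shown without a live Ollama instance, and the function name is hypothetical:

```python
# Sketch of step 2: poll /api/ps until the model is evicted, with the
# 15-second timeout whose expiry triggers the fallback-to-medium path.
# get_ps is an injected callable returning the parsed /api/ps JSON.
import time
from typing import Callable

def wait_for_eviction(get_ps: Callable[[], dict], model: str,
                      timeout: float = 15.0, interval: float = 0.5) -> bool:
    """Return True once `model` is unloaded; False on timeout (caller falls back)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        loaded = {m["name"] for m in get_ps().get("models", [])}
        if model not in loaded:
            return True
        time.sleep(interval)
    return False
```

In the real `vram_manager.py`, `get_ps` would be an HTTP GET against the GPU Ollama instance on port 11436.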
## Session ID Convention
- Telegram: `tg-<chat_id>` (e.g. `tg-346967270`)
- CLI: `cli-<username>` (e.g. `cli-alvis`)
Conversation history is keyed by session_id (5-turn buffer).
## Files
```
adolf/
├── docker-compose.yml     Services: deepagents, openmemory, grammy
├── Dockerfile             deepagents container (Python 3.12)
├── agent.py               FastAPI gateway + three-tier routing
├── channels.py            Channel registry + deliver() + pending_replies
├── router.py              Router class — qwen2.5:1.5b routing
├── vram_manager.py        VRAMManager — flush/prewarm/poll Ollama VRAM
├── agent_factory.py       build_medium_agent / build_complex_agent
├── cli.py                 Interactive CLI REPL client
├── wiki_research.py       Batch wiki research pipeline (uses /message + SSE)
├── .env                   TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│   ├── server.py          FastMCP + mem0 MCP tools
│   └── Dockerfile
└── grammy/
    ├── bot.mjs            grammY Telegram bot + POST /send HTTP endpoint
    ├── package.json
    └── Dockerfile
```
## External Services (from openai/ stack)
| Service | Host Port | Role |
|---------|-----------|------|
| Ollama GPU | 11436 | All reply inference |
| Ollama CPU | 11435 | Memory embedding (nomic-embed-text) |
| Qdrant | 6333 | Vector store for memories |
| SearXNG | 11437 | Web search |