Adolf
Autonomous personal assistant with a multi-channel gateway. Three-tier model routing with GPU VRAM management.
Architecture
┌──────────────────────────────────────────────────────┐
│                   CHANNEL ADAPTERS                   │
│                                                      │
│  [Telegram/Grammy]     [CLI]      [Voice — future]   │
│          ↕               ↕               ↕           │
│          └───────────────┴───────────────┘           │
│                          ↕                           │
│             ┌─────────────────────────┐              │
│             │   GATEWAY (agent.py)    │              │
│             │      FastAPI :8000      │              │
│             │                         │              │
│             │ POST /message           │ ← all inbound│
│             │ POST /chat (legacy)     │              │
│             │ GET /reply/{id} SSE     │ ← CLI polling│
│             │ GET /health             │              │
│             │                         │              │
│             │ channels.py registry    │              │
│             │ conversation buffers    │              │
│             └──────────┬──────────────┘              │
│                        ↓                             │
│             ┌──────────────────────┐                 │
│             │      AGENT CORE      │                 │
│             │  three-tier routing  │                 │
│             │   VRAM management    │                 │
│             └──────────────────────┘                 │
│                        ↓                             │
│     channels.deliver(session_id, channel, text)      │
│          ↓                             ↓             │
│  telegram → POST grammy/send    cli → SSE queue      │
└──────────────────────────────────────────────────────┘
Channel Adapters
| Channel | session_id | Inbound | Outbound |
|---|---|---|---|
| Telegram | `tg-<chat_id>` | Grammy long-poll → `POST /message` | channels.py → `POST grammy:3001/send` |
| CLI | `cli-<user>` | `POST /message` directly | `GET /reply/{id}` SSE stream |
| Voice | `voice-<device>` | (future) | (future) |
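The CLI's outbound path reads a text/event-stream response. A minimal sketch of parsing such a stream per the SSE wire format (blank-line-separated events, `data:` fields); the exact payload the gateway emits is an assumption:

```python
def parse_sse(raw: str) -> list[str]:
    """Extract data payloads from a raw text/event-stream body.

    Events are separated by blank lines; lines beginning with
    "data:" carry the payload (multiple data lines are joined).
    """
    events = []
    for block in raw.split("\n\n"):
        data = [line[len("data:"):].strip()
                for line in block.splitlines()
                if line.startswith("data:")]
        if data:
            events.append("\n".join(data))
    return events

print(parse_sse("data: hello\n\ndata: part 1\ndata: part 2\n\n"))
# ['hello', 'part 1\npart 2']
```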
Unified Message Flow
1. Channel adapter receives message
2. POST /message {text, session_id, channel, user_id}
3. 202 Accepted immediately
4. Background: run_agent_task(message, session_id, channel)
5. Route → run agent tier → get reply text
6. channels.deliver(session_id, channel, reply_text)
- always puts reply in pending_replies[session_id] queue (for SSE)
- calls channel-specific send callback
7. GET /reply/{session_id} SSE clients receive the reply
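Step 6 can be sketched as a registry of per-session queues plus optional per-channel send callbacks. This is a hypothetical reconstruction of channels.py (the names pending_replies and deliver come from the flow above; register and the callback signature are assumptions):

```python
import queue
from collections import defaultdict
from typing import Callable, Optional

# One queue per session so SSE pollers can always pick up replies.
pending_replies: defaultdict[str, queue.Queue] = defaultdict(queue.Queue)

# Optional push callbacks per channel (e.g. telegram -> POST grammy/send).
_send_callbacks: dict[str, Callable[[str, str], None]] = {}

def register(channel: str,
             send: Optional[Callable[[str, str], None]] = None) -> None:
    """Register a channel; send(session_id, text) pushes outbound text."""
    if send is not None:
        _send_callbacks[channel] = send

def deliver(session_id: str, channel: str, text: str) -> None:
    # Always enqueue for SSE consumers of GET /reply/{session_id} ...
    pending_replies[session_id].put(text)
    # ... then push through the channel-specific callback, if any.
    cb = _send_callbacks.get(channel)
    if cb is not None:
        cb(session_id, text)
```

A session on a callback-less channel still gets its reply via the queue, which is why SSE polling works for every channel.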
Three-Tier Model Routing
| Tier | Model | VRAM | Trigger | Latency |
|---|---|---|---|---|
| Light | qwen2.5:1.5b (router answers) | ~1.2 GB | Router classifies as light | ~2–4s |
| Medium | qwen3:4b | ~2.5 GB | Default | ~20–40s |
| Complex | qwen3:8b | ~6.0 GB | /think prefix | ~60–120s |
/think prefix: forces complex tier, stripped before sending to agent.
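A sketch of the prefix handling (tier names come from the table above; the function shape is illustrative, not the actual agent.py code):

```python
def route(message: str, router_tier: str = "medium") -> tuple[str, str]:
    """Return (tier, cleaned_message).

    /think forces the complex tier and is stripped before the text
    reaches the agent; otherwise the router's classification wins.
    """
    if message.startswith("/think"):
        return "complex", message[len("/think"):].lstrip()
    return router_tier, message

print(route("/think plan my week"))      # ('complex', 'plan my week')
print(route("hi", router_tier="light"))  # ('light', 'hi')
```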
VRAM Management
GTX 1070 (8 GB). If CUDA init fails, models silently load on the CPU; Ollama must be restarted to recover.
- Flush explicitly before loading qwen3:8b (`keep_alive=0`)
- Verify eviction via `/api/ps` poll (15s timeout) before proceeding
- Fallback: timeout → run medium agent instead
- Post-complex: flush 8b, pre-warm 4b + router
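The eviction check can poll Ollama's /api/ps endpoint, which lists currently loaded models. A sketch under those assumptions (the port comes from the services table below; the helper names and the startswith match are illustrative):

```python
import json
import time
import urllib.request

OLLAMA = "http://localhost:11436"  # Ollama GPU port from the services table

def model_loaded(ps_response: dict, model: str) -> bool:
    """True if `model` still appears in an /api/ps response body."""
    return any(m.get("name", "").startswith(model)
               for m in ps_response.get("models", []))

def wait_for_eviction(model: str, timeout: float = 15.0,
                      interval: float = 1.0) -> bool:
    """Poll /api/ps until `model` is unloaded or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{OLLAMA}/api/ps") as resp:
            ps = json.load(resp)
        if not model_loaded(ps, model):
            return True
        time.sleep(interval)
    return False  # caller falls back to the medium agent
```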
Session ID Convention
- Telegram: `tg-<chat_id>` (e.g. `tg-346967270`)
- CLI: `cli-<username>` (e.g. `cli-alvis`)
Conversation history is keyed by session_id (5-turn buffer).
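The buffer could be a bounded deque per session_id; assuming a turn means one user/assistant pair, 5 turns is 10 entries (that pairing is an assumption):

```python
from collections import defaultdict, deque

TURNS = 5  # per-session history depth from the text above

# One bounded buffer per session_id holding (role, text) entries,
# so 5 user/assistant turns = 10 entries.
history: defaultdict[str, deque] = defaultdict(lambda: deque(maxlen=2 * TURNS))

def remember(session_id: str, role: str, text: str) -> None:
    history[session_id].append((role, text))

# Older turns fall off the front automatically:
for i in range(7):
    remember("tg-346967270", "user", f"msg {i}")
    remember("tg-346967270", "assistant", f"reply {i}")
print(len(history["tg-346967270"]))  # 10
print(history["tg-346967270"][0])    # ('user', 'msg 2')
```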
Files
adolf/
├── docker-compose.yml Services: deepagents, openmemory, grammy
├── Dockerfile deepagents container (Python 3.12)
├── agent.py FastAPI gateway + three-tier routing
├── channels.py Channel registry + deliver() + pending_replies
├── router.py Router class — qwen2.5:1.5b routing
├── vram_manager.py VRAMManager — flush/prewarm/poll Ollama VRAM
├── agent_factory.py build_medium_agent / build_complex_agent
├── cli.py Interactive CLI REPL client
├── wiki_research.py Batch wiki research pipeline (uses /message + SSE)
├── .env TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│ ├── server.py FastMCP + mem0 MCP tools
│ └── Dockerfile
└── grammy/
├── bot.mjs grammY Telegram bot + POST /send HTTP endpoint
├── package.json
└── Dockerfile
External Services (from openai/ stack)
| Service | Host Port | Role |
|---|---|---|
| Ollama GPU | 11436 | All reply inference |
| Ollama CPU | 11435 | Memory embedding (nomic-embed-text) |
| Qdrant | 6333 | Vector store for memories |
| SearXNG | 11437 | Web search |