# Adolf

Autonomous personal assistant with a multi-channel gateway. Three-tier model routing with GPU VRAM management.

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                 CHANNEL ADAPTERS                    │
│                                                     │
│  [Telegram/Grammy]   [CLI]   [Voice — future]       │
│       ↕                ↕            ↕               │
│       └────────────────┴────────────┘               │
│                        ↕                            │
│          ┌─────────────────────────┐                │
│          │   GATEWAY  (agent.py)   │                │
│          │   FastAPI  :8000        │                │
│          │                         │                │
│          │  POST /message          │  ← all inbound │
│          │  POST /chat  (legacy)   │                │
│          │  GET  /reply/{id}  SSE  │  ← CLI polling │
│          │  GET  /health           │                │
│          │                         │                │
│          │  channels.py registry   │                │
│          │  conversation buffers   │                │
│          └──────────┬──────────────┘                │
│                     ↓                               │
│          ┌──────────────────────┐                   │
│          │    AGENT CORE        │                   │
│          │  three-tier routing  │                   │
│          │  VRAM management     │                   │
│          └──────────────────────┘                   │
│                     ↓                               │
│          channels.deliver(session_id, channel, text)│
│               ↓                    ↓                │
│    telegram → POST grammy/send   cli → SSE queue    │
└─────────────────────────────────────────────────────┘
```

## Channel Adapters

| Channel  | session_id        | Inbound                            | Outbound                            |
|----------|-------------------|------------------------------------|-------------------------------------|
| Telegram | `tg-<chat_id>`    | Grammy long-poll → `POST /message` | channels.py → `POST grammy:3001/send` |
| CLI      | `cli-<user>`      | `POST /message` directly           | `GET /reply/{id}` SSE stream        |
| Voice    | `voice-<device>`  | (future)                           | (future)                            |
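The registry that backs the outbound column can be sketched as follows. This is a hypothetical reconstruction, not the actual channels.py: the `register_channel` helper and `send_callbacks` dict are invented names, but the dual path (always enqueue for SSE, then call the channel-specific sender) matches the behavior described in this document.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# session_id -> queue of pending reply texts, consumed by GET /reply/{id} (SSE)
pending_replies: Dict[str, asyncio.Queue] = {}

# channel name -> async send callback, e.g. "telegram" -> POST grammy:3001/send
send_callbacks: Dict[str, Callable[[str, str], Awaitable[None]]] = {}

def register_channel(name: str, send_cb: Callable[[str, str], Awaitable[None]]) -> None:
    """Each adapter registers its outbound sender at startup."""
    send_callbacks[name] = send_cb

async def deliver(session_id: str, channel: str, text: str) -> None:
    # 1. Always enqueue the reply for SSE pollers, regardless of channel.
    pending_replies.setdefault(session_id, asyncio.Queue()).put_nowait(text)
    # 2. Then invoke the channel-specific sender, if one is registered.
    cb = send_callbacks.get(channel)
    if cb is not None:
        await cb(session_id, text)
```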

## Unified Message Flow

1. Channel adapter receives message
2. POST /message {text, session_id, channel, user_id}
3. 202 Accepted immediately
4. Background: run_agent_task(message, session_id, channel)
5. Route → run agent tier → get reply text
6. channels.deliver(session_id, channel, reply_text)
   - always puts reply in pending_replies[session_id] queue (for SSE)
   - calls channel-specific send callback
7. GET /reply/{session_id} SSE clients receive the reply
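The steps above can be sketched framework-free; the real gateway uses FastAPI (with a background task behind the 202), but the shape of the asynchronous handoff is the same. All function bodies here are stand-ins: `run_agent_task` just echoes instead of routing to a model tier, and `get_reply` blocks on a queue in place of the SSE stream.

```python
import asyncio

pending_replies: dict[str, asyncio.Queue] = {}

async def deliver(session_id: str, channel: str, text: str) -> None:
    # Step 6: reply always lands in the per-session queue (SSE consumers).
    pending_replies.setdefault(session_id, asyncio.Queue()).put_nowait(text)

async def run_agent_task(message: str, session_id: str, channel: str) -> None:
    # Step 5 stand-in: route -> run agent tier -> get reply text.
    reply = f"echo: {message}"
    await deliver(session_id, channel, reply)

async def post_message(payload: dict) -> int:
    # Steps 2-4: schedule the agent in the background, return 202 immediately.
    asyncio.get_running_loop().create_task(
        run_agent_task(payload["text"], payload["session_id"], payload["channel"])
    )
    return 202

async def get_reply(session_id: str) -> str:
    # Step 7 stand-in: block until a reply is queued for this session.
    q = pending_replies.setdefault(session_id, asyncio.Queue())
    return await q.get()
```

The caller gets the 202 before any inference starts; the reply arrives on the queue whenever the background task finishes.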

## Three-Tier Model Routing

| Tier    | Model                          | VRAM    | Trigger                    | Latency    |
|---------|--------------------------------|---------|----------------------------|------------|
| Light   | qwen2.5:1.5b (router answers)  | ~1.2 GB | Router classifies as light | ~2–4 s     |
| Medium  | qwen3:4b                       | ~2.5 GB | Default                    | ~20–40 s   |
| Complex | qwen3:8b                       | ~6.0 GB | `/think` prefix            | ~60–120 s  |

The `/think` prefix forces the complex tier; it is stripped from the message before the text is sent to the agent.
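A hypothetical sketch of the tier decision (`choose_tier` and `classify_with_router` are invented names): the `/think` prefix short-circuits to the complex tier with the prefix stripped; otherwise the small router model decides between light and medium. The classifier body here is a trivial heuristic standing in for the qwen2.5:1.5b call.

```python
def classify_with_router(text: str) -> str:
    # Placeholder heuristic; the real system asks qwen2.5:1.5b via Ollama.
    return "light" if len(text.split()) <= 4 else "medium"

def choose_tier(text: str) -> tuple[str, str]:
    """Return (tier, cleaned_text)."""
    if text.startswith("/think"):
        # Force complex tier; strip the prefix before handing off to the agent.
        return "complex", text.removeprefix("/think").lstrip()
    label = classify_with_router(text)
    return ("light", text) if label == "light" else ("medium", text)
```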

## VRAM Management

GTX 1070 — 8 GB VRAM. If CUDA init fails, the model loads on CPU and Ollama must be restarted.

  1. Flush explicitly before loading qwen3:8b (keep_alive=0)
  2. Verify eviction via /api/ps poll (15s timeout) before proceeding
  3. Fallback: timeout → run medium agent instead
  4. Post-complex: flush 8b, pre-warm 4b + router
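Steps 1–3 can be sketched with the Ollama HTTP calls abstracted behind callables so the control flow is visible; this is not the real `VRAMManager`. The actual flush sends `keep_alive: 0` for the model and the eviction check polls `/api/ps`; here `flush` and `loaded_models` are injected stand-ins.

```python
import time

def ensure_vram_for_complex(flush, loaded_models,
                            timeout_s: float = 15.0,
                            poll_interval_s: float = 0.5) -> str:
    """Return 'complex' if eviction succeeded in time, else 'medium' (fallback)."""
    flush("qwen3:4b")                          # step 1: keep_alive=0 -> evict
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if "qwen3:4b" not in loaded_models():  # step 2: /api/ps no longer lists it
            return "complex"                   # safe to load qwen3:8b
        time.sleep(poll_interval_s)
    return "medium"                            # step 3: timeout -> run medium agent
```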

## Session ID Convention

- Telegram: `tg-<chat_id>` (e.g. `tg-346967270`)
- CLI: `cli-<username>` (e.g. `cli-alvis`)

Conversation history is keyed by session_id (5-turn buffer).
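A minimal sketch of such a buffer, assuming one "turn" means one (user, assistant) exchange; the names `history` and `record_turn` are invented. A `deque` with `maxlen=5` drops the oldest turn automatically.

```python
from collections import defaultdict, deque

MAX_TURNS = 5  # per-session rolling window

# session_id -> deque of (user_msg, assistant_reply) pairs
history: dict[str, deque] = defaultdict(lambda: deque(maxlen=MAX_TURNS))

def record_turn(session_id: str, user_msg: str, reply: str) -> None:
    history[session_id].append((user_msg, reply))

def context_for(session_id: str) -> list[tuple[str, str]]:
    """Turns to prepend to the next prompt, oldest first."""
    return list(history[session_id])
```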

## Files

```
adolf/
├── docker-compose.yml      Services: deepagents, openmemory, grammy
├── Dockerfile              deepagents container (Python 3.12)
├── agent.py                FastAPI gateway + three-tier routing
├── channels.py             Channel registry + deliver() + pending_replies
├── router.py               Router class — qwen2.5:1.5b routing
├── vram_manager.py         VRAMManager — flush/prewarm/poll Ollama VRAM
├── agent_factory.py        build_medium_agent / build_complex_agent
├── cli.py                  Interactive CLI REPL client
├── wiki_research.py        Batch wiki research pipeline (uses /message + SSE)
├── .env                    TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│   ├── server.py           FastMCP + mem0 MCP tools
│   └── Dockerfile
└── grammy/
    ├── bot.mjs             grammY Telegram bot + POST /send HTTP endpoint
    ├── package.json
    └── Dockerfile
```

## External Services (from openai/ stack)

| Service    | Host port | Role                                 |
|------------|-----------|--------------------------------------|
| Ollama GPU | 11436     | All reply inference                  |
| Ollama CPU | 11435     | Memory embedding (nomic-embed-text)  |
| Qdrant     | 6333      | Vector store for memories            |
| SearXNG    | 11437     | Web search                           |