# Adolf

Persistent AI assistant reachable via Telegram. GPU-accelerated inference with long-term memory and web search.

## Architecture

```
Telegram user
     ↕ (long-polling)
[grammy] Node.js — port 3001
  - grammY bot polls Telegram
  - on message: fire-and-forget POST /chat to deepagents
  - exposes MCP SSE server: tool send_telegram_message(chat_id, text)
     ↕ fire-and-forget HTTP          ↕ MCP SSE tool call
[deepagents] Python FastAPI — port 8000
  - POST /chat → 202 Accepted immediately
  - background task: run LangGraph react agent
  - LLM: qwen3:8b via Ollama GPU (host port 11436)
  - tools: search_memory, get_all_memories, web_search
  - after reply: async fire-and-forget → store memory on CPU
     ↕ MCP SSE                       ↕ HTTP (SearXNG)
[openmemory] Python + mem0 — port 8765     [SearXNG — port 11437]
  - MCP tools: add_memory, search_memory, get_all_memories
  - mem0 backend: Qdrant (port 6333) + CPU Ollama (port 11435)
  - embedder: nomic-embed-text (768 dims)
  - extractor: gemma3:1b
  - collection: adolf_memories
```

## Queuing and Concurrency

Two semaphores prevent resource contention:

| Semaphore | Guards | Notes |
|-----------|--------|-------|
| `_reply_semaphore(1)` | GPU Ollama (qwen3:8b) | One LLM inference at a time |
| `_memory_semaphore(1)` | CPU Ollama (gemma3:1b) | One memory store at a time |

**Reply-first pipeline:**

1. User message arrives via Telegram → Grammy forwards to deepagents (fire-and-forget)
2. Deepagents queues behind `_reply_semaphore`, runs agent, sends reply via Grammy MCP tool
3. After reply is sent, `asyncio.create_task` fires `store_memory_async` in background
4. Memory task queues behind `_memory_semaphore`, calls `add_memory` on openmemory
5. openmemory uses CPU Ollama: embedding (~0.3s) + extraction (~1.6s) → stored in Qdrant

Reply latency: ~10–18s (GPU qwen3:8b inference + tool calls).
Memory latency: ~5–16s (runs async, never blocks replies).
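
The reply-first pipeline above can be sketched with plain `asyncio`. This is a minimal illustration, not the code in `agent.py`: the `Pipeline` class, `run_agent`, `handle_chat`, `demo`, and the `events` log are hypothetical stand-ins, and the sleeps replace the real Ollama, MCP, and openmemory calls. Only the two semaphore names, `store_memory_async`, and the use of `asyncio.create_task` come from the description above.

```python
import asyncio


class Pipeline:
    """Toy model of the reply-first pipeline (hypothetical helper class)."""

    def __init__(self) -> None:
        # One permit each: a single GPU inference and a single memory store at a time.
        self._reply_semaphore = asyncio.Semaphore(1)
        self._memory_semaphore = asyncio.Semaphore(1)
        self.events: list[str] = []
        # Keep references so background tasks are not garbage-collected early.
        self._background: set[asyncio.Task] = set()

    async def run_agent(self, text: str) -> str:
        # Stand-in for the LangGraph agent call against GPU Ollama.
        await asyncio.sleep(0.01)
        return f"echo: {text}"

    async def store_memory_async(self, chat_id: int, text: str) -> None:
        # Memory stores queue here and never hold the reply semaphore.
        async with self._memory_semaphore:
            await asyncio.sleep(0.05)  # stand-in for add_memory on openmemory
            self.events.append(f"stored:{chat_id}")

    async def handle_chat(self, chat_id: int, text: str) -> str:
        # Replies queue behind the GPU semaphore.
        async with self._reply_semaphore:
            reply = await self.run_agent(text)
            self.events.append(f"replied:{chat_id}")
        # Only after the reply is produced do we schedule the memory store.
        task = asyncio.create_task(self.store_memory_async(chat_id, text))
        self._background.add(task)
        task.add_done_callback(self._background.discard)
        return reply

    async def drain(self) -> None:
        # Wait for outstanding memory tasks (useful in tests and on shutdown).
        await asyncio.gather(*self._background)


async def demo() -> tuple[list[str], list[str]]:
    pipeline = Pipeline()
    replies = await asyncio.gather(
        pipeline.handle_chat(1, "hi"),
        pipeline.handle_chat(2, "yo"),
    )
    await pipeline.drain()
    return list(replies), pipeline.events
```

Running `asyncio.run(demo())` returns both replies plus an event log in which each `replied:<id>` entry precedes the matching `stored:<id>` entry. Because the memory task is created outside the `async with self._reply_semaphore` block, a slow memory store delays only later memory stores, never the next reply.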
## External Services (from openai/ stack)

| Service | Host Port | Role |
|---------|-----------|------|
| Ollama GPU | 11436 | Main LLM (qwen3:8b) |
| Ollama CPU | 11435 | Memory embedding + extraction |
| Qdrant | 6333 | Vector store for memories |
| SearXNG | 11437 | Web search |

## Compose Stack

Config: `agap_git/adolf/docker-compose.yml`

```bash
cd agap_git/adolf
docker compose up -d
```

Requires `TELEGRAM_BOT_TOKEN` in `adolf/.env`.

## Memory

- Stored per `chat_id` (Telegram user ID) as `user_id` in mem0
- Semantic search via Qdrant (cosine similarity, 768-dim nomic-embed-text vectors)
- mem0 uses gemma3:1b to extract structured facts before embedding
- Collection: `adolf_memories` in Qdrant

## Files

```
adolf/
├── docker-compose.yml      Services: deepagents, openmemory, grammy
├── Dockerfile              deepagents container (Python 3.12)
├── agent.py                FastAPI + LangGraph react agent
├── .env                    TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│   ├── server.py           FastMCP + mem0 MCP tools
│   ├── requirements.txt
│   └── Dockerfile
└── grammy/
    ├── bot.mjs             grammY bot + MCP SSE server
    ├── package.json
    └── Dockerfile
```
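
To illustrate the per-user semantic search described in the Memory section, here is a pure-Python sketch of cosine-similarity ranking scoped by `user_id`. It does not show how mem0 or Qdrant are actually called in `openmemory/server.py`: `cosine_similarity`, `search_memories`, and the in-memory `store` list are hypothetical, and the toy vectors have 3 dimensions rather than the 768 produced by nomic-embed-text.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_memories(store: list[dict], user_id: str,
                    query_vec: list[float], limit: int = 3) -> list[str]:
    """Rank one user's memories by similarity to the query vector."""
    scored = [
        (cosine_similarity(query_vec, m["vector"]), m["text"])
        for m in store
        if m["user_id"] == user_id  # memories are isolated per chat_id / user_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]


# Toy data: in the real stack the vectors would be 768-dim embeddings.
store = [
    {"user_id": "42", "vector": [1.0, 0.0, 0.0], "text": "likes tea"},
    {"user_id": "42", "vector": [0.0, 1.0, 0.0], "text": "lives in Oslo"},
    {"user_id": "7", "vector": [1.0, 0.0, 0.0], "text": "other user's fact"},
]

top = search_memories(store, "42", [0.9, 0.1, 0.0], limit=1)
```

Here `top` holds only the closest memory for user `"42"`; the entry belonging to user `"7"` is never considered, even though its vector matches the query equally well.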