# Adolf

Persistent AI assistant reachable via Telegram. GPU-accelerated inference with long-term memory and web search.
## Architecture

```
Telegram user
      ↕ (long-polling)
[grammy] Node.js — port 3001
  - grammY bot polls Telegram
  - on message: fire-and-forget POST /chat to deepagents
  - exposes MCP SSE server: tool send_telegram_message(chat_id, text)
      ↕ fire-and-forget HTTP          ↕ MCP SSE tool call
[deepagents] Python FastAPI — port 8000
  - POST /chat → 202 Accepted immediately
  - background task: run LangGraph react agent
  - LLM: qwen3:8b via Ollama GPU (host port 11436)
  - tools: search_memory, get_all_memories, web_search
  - after reply: async fire-and-forget → store memory on CPU
      ↕ MCP SSE                       ↕ HTTP (SearXNG)
[openmemory] Python + mem0 — port 8765    [SearXNG — port 11437]
  - MCP tools: add_memory, search_memory, get_all_memories
  - mem0 backend: Qdrant (port 6333) + CPU Ollama (port 11435)
  - embedder: nomic-embed-text (768 dims)
  - extractor: gemma3:1b
  - collection: adolf_memories
```
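The "202 Accepted, then run in background" pattern that deepagents uses for POST /chat can be sketched in plain asyncio. This is a minimal illustration, not the real handler: `run_agent` and `post_chat` are hypothetical stand-ins for the FastAPI endpoint and the LangGraph invocation.

```python
import asyncio

# Sketch of the fire-and-forget /chat pattern: the handler schedules the
# agent as a background task and returns immediately, without awaiting it.
results = []

async def run_agent(chat_id: str, text: str) -> None:
    # Stand-in for the LangGraph react-agent call (qwen3:8b via Ollama).
    await asyncio.sleep(0.01)
    results.append((chat_id, f"reply to: {text}"))

async def post_chat(chat_id: str, text: str) -> dict:
    # Schedule the agent; respond "202" before it has run.
    asyncio.get_running_loop().create_task(run_agent(chat_id, text))
    return {"status": 202}

async def main() -> None:
    resp = await post_chat("42", "hello")
    assert resp["status"] == 202   # returned before the agent ran
    assert results == []           # background task has not finished yet
    await asyncio.sleep(0.05)      # give the background task time to finish
    assert results == [("42", "reply to: hello")]

asyncio.run(main())
```

In the real service the reply is delivered out-of-band (via the Grammy MCP tool), which is why the HTTP response can be empty apart from the 202 status.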
## Queuing and Concurrency

Two semaphores prevent resource contention:

| Semaphore | Guards | Notes |
|---|---|---|
| `_reply_semaphore(1)` | GPU Ollama (qwen3:8b) | One LLM inference at a time |
| `_memory_semaphore(1)` | CPU Ollama (gemma3:1b) | One memory store at a time |
Reply-first pipeline:

- User message arrives via Telegram → Grammy forwards to deepagents (fire-and-forget)
- Deepagents queues behind `_reply_semaphore`, runs the agent, sends the reply via the Grammy MCP tool
- After the reply is sent, `asyncio.create_task` fires `store_memory_async` in the background
- The memory task queues behind `_memory_semaphore` and calls `add_memory` on openmemory
- openmemory uses CPU Ollama: embedding (~0.3s) + extraction (~1.6s) → stored in Qdrant
Reply latency: ~10–18s (GPU qwen3:8b inference + tool calls). Memory latency: ~5–16s (runs async, never blocks replies).
## External Services (from openai/ stack)
| Service | Host Port | Role |
|---|---|---|
| Ollama GPU | 11436 | Main LLM (qwen3:8b) |
| Ollama CPU | 11435 | Memory embedding + extraction |
| Qdrant | 6333 | Vector store for memories |
| SearXNG | 11437 | Web search |
## Compose Stack

Config: `agap_git/adolf/docker-compose.yml`

```sh
cd agap_git/adolf
docker compose up -d
```

Requires `TELEGRAM_BOT_TOKEN` in `adolf/.env`.
## Memory

- Stored per `chat_id` (Telegram user ID) as `user_id` in mem0
- Semantic search via Qdrant (cosine similarity, 768-dim nomic-embed-text vectors)
- mem0 uses gemma3:1b to extract structured facts before embedding
- Collection: `adolf_memories` in Qdrant
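As a rough sketch, the mem0 backend described above corresponds to a configuration along these lines. Key names follow my understanding of mem0's `Memory.from_config` schema and should be checked against `openmemory/server.py`; hosts and base URLs are assumptions.

```python
# Hedged sketch of the mem0 config implied by this setup: Qdrant for
# vectors, CPU Ollama for both the 768-dim nomic-embed-text embedder and
# the gemma3:1b fact extractor. Not copied from the codebase.
MEM0_CONFIG = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "adolf_memories",
            "host": "localhost",            # assumed
            "port": 6333,
            "embedding_model_dims": 768,
        },
    },
    "llm": {  # fact extractor run before embedding
        "provider": "ollama",
        "config": {
            "model": "gemma3:1b",
            "ollama_base_url": "http://localhost:11435",  # assumed
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "ollama_base_url": "http://localhost:11435",  # assumed
        },
    },
}
# Usage sketch: m = Memory.from_config(MEM0_CONFIG)
#               m.add(text, user_id=chat_id)   # chat_id doubles as user_id
```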
## Files

```
adolf/
├── docker-compose.yml     Services: deepagents, openmemory, grammy
├── Dockerfile             deepagents container (Python 3.12)
├── agent.py               FastAPI + LangGraph react agent
├── .env                   TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│   ├── server.py          FastMCP + mem0 MCP tools
│   ├── requirements.txt
│   └── Dockerfile
└── grammy/
    ├── bot.mjs            grammY bot + MCP SSE server
    ├── package.json
    └── Dockerfile
```