From fb4bea12d4a47bbfee0dc8ebc95c4f93144674d7 Mon Sep 17 00:00:00 2001
From: alvis
Date: Mon, 23 Feb 2026 04:42:01 +0000
Subject: [PATCH] Add Adolf page: Telegram AI assistant with GPU inference and async memory

---
 Adolf.md | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Hello.md |  1 +
 Home.md  |  3 +-
 3 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 Adolf.md

diff --git a/Adolf.md b/Adolf.md
new file mode 100644
index 0000000..43f31cc
--- /dev/null
+++ b/Adolf.md
@@ -0,0 +1,92 @@
+# Adolf
+
+Persistent AI assistant reachable via Telegram. GPU-accelerated inference with long-term memory and web search.
+
+## Architecture
+
+```
+Telegram user
+  ↕ (long-polling)
+[grammy] Node.js — port 3001
+  - grammY bot polls Telegram
+  - on message: fire-and-forget POST /chat to deepagents
+  - exposes MCP SSE server: tool send_telegram_message(chat_id, text)
+  ↕ fire-and-forget HTTP        ↕ MCP SSE tool call
+[deepagents] Python FastAPI — port 8000
+  - POST /chat → 202 Accepted immediately
+  - background task: run LangGraph ReAct agent
+  - LLM: qwen3:8b via Ollama GPU (host port 11436)
+  - tools: search_memory, get_all_memories, web_search
+  - after reply: async fire-and-forget → store memory on CPU
+  ↕ MCP SSE                     ↕ HTTP (SearXNG)
+[openmemory] Python + mem0 — port 8765    [SearXNG — port 11437]
+  - MCP tools: add_memory, search_memory, get_all_memories
+  - mem0 backend: Qdrant (port 6333) + CPU Ollama (port 11435)
+  - embedder: nomic-embed-text (768 dims)
+  - extractor: gemma3:1b
+  - collection: adolf_memories
+```
+
+## Queuing and Concurrency
+
+Two semaphores prevent resource contention:
+
+| Semaphore | Guards | Notes |
+|-----------|--------|-------|
+| `_reply_semaphore(1)` | GPU Ollama (qwen3:8b) | One LLM inference at a time |
+| `_memory_semaphore(1)` | CPU Ollama (gemma3:1b) | One memory store at a time |
+
+**Reply-first pipeline:**
+1. User message arrives via Telegram → grammY forwards it to deepagents (fire-and-forget)
+2. deepagents queues behind `_reply_semaphore`, runs the agent, and sends the reply via the grammY MCP tool
+3. After the reply is sent, `asyncio.create_task` fires `store_memory_async` in the background
+4. The memory task queues behind `_memory_semaphore` and calls `add_memory` on openmemory
+5. openmemory uses CPU Ollama: embedding (~0.3s) + extraction (~1.6s) → stored in Qdrant
+
+Reply latency: ~10–18s (GPU qwen3:8b inference + tool calls).
+Memory latency: ~5–16s (runs async, never blocks replies).
+
+## External Services (from openai/ stack)
+
+| Service | Host Port | Role |
+|---------|-----------|------|
+| Ollama GPU | 11436 | Main LLM (qwen3:8b) |
+| Ollama CPU | 11435 | Memory embedding + extraction |
+| Qdrant | 6333 | Vector store for memories |
+| SearXNG | 11437 | Web search |
+
+## Compose Stack
+
+Config: `agap_git/adolf/docker-compose.yml`
+
+```bash
+cd agap_git/adolf
+docker compose up -d
+```
+
+Requires `TELEGRAM_BOT_TOKEN` in `adolf/.env`.
+
+## Memory
+
+- Stored per `chat_id` (Telegram user ID) as `user_id` in mem0
+- Semantic search via Qdrant (cosine similarity, 768-dim nomic-embed-text vectors)
+- mem0 uses gemma3:1b to extract structured facts before embedding
+- Collection: `adolf_memories` in Qdrant
+
+## Files
+
+```
+adolf/
+├── docker-compose.yml   Services: deepagents, openmemory, grammy
+├── Dockerfile           deepagents container (Python 3.12)
+├── agent.py             FastAPI + LangGraph ReAct agent
+├── .env                 TELEGRAM_BOT_TOKEN (not committed)
+├── openmemory/
+│   ├── server.py        FastMCP + mem0 MCP tools
+│   ├── requirements.txt
+│   └── Dockerfile
+└── grammy/
+    ├── bot.mjs          grammY bot + MCP SSE server
+    ├── package.json
+    └── Dockerfile
+```

diff --git a/Hello.md b/Hello.md
index bb616ac..35a1f19 100644
--- a/Hello.md
+++ b/Hello.md
@@ -16,6 +16,7 @@ This repository contains Docker Compose files, configuration templates, and depl
 | Zabbix | Monitoring (server + host agents) |
 | Home Assistant | Home automation |
 | 3X-UI | VPN / proxy |
+| Adolf | Persistent AI assistant via Telegram (GPU inference, long-term memory) |
 
 ## Stack

diff --git a/Home.md b/Home.md
index 5c227ee..cfa59ac 100644
--- a/Home.md
+++ b/Home.md
@@ -12,7 +12,8 @@
 - [[Open-WebUI]] — AI chat interface
 - [[Home-Assistant]] — KVM virtual machine
 - [[3X-UI]] — VPN proxy
-- [Zabbix](Zabbix) — Monitoring (Zabbix 7.4, PostgreSQL, Apache)
+- [[Zabbix]] — Monitoring (Zabbix 7.4, PostgreSQL, Apache)
+- [[Adolf]] — Persistent AI assistant (Telegram, GPU, memory)
 
 ## Quick Start