Compare commits
15 Commits: 09a93c661e...main

| Author | SHA1 | Date |
|---|---|---|
| | e04f9059ae | |
| | 002f9863b0 | |
| | 77c7cd09aa | |
| | b66a74df06 | |
| | b8db06cd21 | |
| | 7e889d8530 | |
| | 73ba559593 | |
| | 10cb24b7e5 | |
| | 20c318b3c1 | |
| | 8873e441c2 | |
| | d72fd95dfd | |
| | 87eb4fb765 | |
| | e2e15009e2 | |
| | 5017827af2 | |
| | a30936f120 | |
2
.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
adolf/.env
seafile/.env
100
CLAUDE.md
@@ -13,6 +13,7 @@ This repository manages Docker Compose configurations for the **Agap** self-host
| `immich-app/` | Immich (photo management) | 2283 | Main compose via root `docker-compose.yml` |
| `gitea/` | Gitea (git hosting) + Postgres | 3000, 222 | Standalone compose |
| `openai/` | Open WebUI + Ollama (AI chat) | 3125 | Requires NVIDIA GPU |
| `vaultwarden/` | Vaultwarden (password manager) | 8041 | Backup script in `vaultwarden/backup.sh` |

## Common Commands

@@ -90,6 +91,8 @@ When changes are made to infrastructure (services, config, setup), update the re
| Home-Assistant | KVM-based Home Assistant setup |
| 3X-UI | VPN proxy panel |
| Gitea | Git hosting Docker service |
| Vaultwarden | Password manager, CLI setup, backup |
| Seafile | File sync, document editing, OnlyOffice, WebDAV |

### Read Wiki Pages (API)

@@ -125,3 +128,100 @@ git push http://alvis:$GITEA_TOKEN@localhost:3000/alvis/AgapHost.wiki.git main
- Remove outdated or redundant content when updating
- Create a new page if a topic doesn't exist yet
- Wiki files are Markdown, named `<PageTitle>.md`

## Home Assistant API

**Instance**: `https://haos.alogins.net`
**Token**: Read from the `$HA_TOKEN` environment variable; never hardcode it
**Base URL**: `https://haos.alogins.net/api/`
**Auth header**: `Authorization: Bearer <token>`

### Common Endpoints

```bash
# Health check
curl -s -H "Authorization: Bearer $HA_TOKEN" \
  https://haos.alogins.net/api/

# Get all entity states
curl -s -H "Authorization: Bearer $HA_TOKEN" \
  https://haos.alogins.net/api/states

# Get specific entity
curl -s -H "Authorization: Bearer $HA_TOKEN" \
  https://haos.alogins.net/api/states/<entity_id>

# Call service (e.g., turn on light)
curl -s -X POST \
  -H "Authorization: Bearer $HA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity_id":"light.example"}' \
  https://haos.alogins.net/api/services/<domain>/<service>
```

**Note**: Status 401 = token invalid/expired
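
The same calls can be prepared from Python. The following is a minimal sketch, not part of the repository: `build_ha_request` is a hypothetical helper name, and `light/turn_on` stands in for the `<domain>/<service>` placeholder above. It only builds the request object, so nothing is sent until the caller passes it to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request

HA_BASE_URL = "https://haos.alogins.net/api"  # base URL from the section above


def build_ha_request(path, token, payload=None):
    """Build (but do not send) an authenticated Home Assistant REST request."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(f"{HA_BASE_URL}/{path.lstrip('/')}", data=data)
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        # A request body makes urllib issue a POST instead of a GET.
        request.add_header("Content-Type", "application/json")
    return request


# Example: call a service with the placeholder entity used above.
req = build_ha_request(
    "services/light/turn_on",
    os.environ.get("HA_TOKEN", ""),  # token comes from the environment, never hardcoded
    {"entity_id": "light.example"},
)
print(req.get_method(), req.full_url)
```

When the request is actually sent, an invalid or expired token shows up as `urllib.error.HTTPError` with `code == 401`.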

## HA → Zabbix Alerting

Home Assistant automations push alerts to Zabbix through the `history.push` API method (Zabbix 7.4 trapper items). No middleware is needed.

### Architecture

```
[HA sensor ON] → [HA automation] → [rest_command: HTTP POST] → [Zabbix history.push] → [trapper item] → [trigger] → [Telegram]
```

### Water Leak Sensors

Three HOBEIAN ZG-222Z moisture sensors raise a Disaster-level Zabbix alert that includes the room name.

| HA Entity | Room |
|-----------|------|
| `binary_sensor.hobeian_zg_222z` | Kitchen |
| `binary_sensor.hobeian_zg_222z_2` | Bathroom |
| `binary_sensor.hobeian_zg_222z_3` | Laundry |

**Zabbix side** (host "HA Agap", hostid 10780):
- Trapper item: `water.leak` (text type), receives the room name or "ok"
- Trigger: `last(/HA Agap/water.leak)<>"ok"`, Disaster (severity 5), manual close
- Trigger name uses `{ITEM.LASTVALUE}` to show the room in the notification

**HA side** (`configuration.yaml`):
- `rest_command.zabbix_water_leak`: POSTs to Zabbix `history.push`, accepts a `{{ room }}` template variable
- `rest_command.zabbix_water_leak_clear`: pushes "ok" to clear
- Automation "Water Leak Alert": any sensor ON → sends the room name to Zabbix
- Automation "Water Leak Clear": all sensors OFF → sends "ok"
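
To make the payload concrete, here is a rough sketch of the JSON-RPC body such a `rest_command` would POST. It assumes the host-plus-key addressing form of `history.push`; the actual `configuration.yaml` is not shown in this diff, and `build_history_push` is a hypothetical helper name.

```python
import json


def build_history_push(host, key, value, request_id=1):
    """Return a JSON-RPC body that pushes one value to a trapper item."""
    return {
        "jsonrpc": "2.0",
        "method": "history.push",
        "params": [{"host": host, "key": key, "value": value}],
        "id": request_id,
    }


# Alert: a sensor reports a leak, so the room name becomes the item value.
alert_body = build_history_push("HA Agap", "water.leak", "Kitchen")
# Clear: every sensor is dry again, so the item goes back to "ok".
clear_body = build_history_push("HA Agap", "water.leak", "ok")

print(json.dumps(alert_body, indent=2))
```

Because the trigger fires on any value other than "ok", the room name doubles as both the alert signal and the text shown by `{ITEM.LASTVALUE}`.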

### Adding a New HA → Zabbix Alert

1. **Zabbix**: Create trapper item (type 2) on "HA Agap" via `item.create` API. Create trigger via `trigger.create`.
2. **HA config**: Add `rest_command` entry in `configuration.yaml` with `history.push` payload. Restart HA.
3. **HA automation**: Create via `POST /api/config/automation/config/<id>` with trigger on sensor state and action calling the rest_command.
4. **Test**: Call `rest_command` via HA API, verify Zabbix problem appears.
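
Step 1 can be sketched as two JSON-RPC bodies. This is an illustration under assumptions rather than the exact calls used for the existing alert: the helper names and the item and trigger names are hypothetical, while the numeric codes follow the Zabbix API conventions (`type` 2 = trapper, `value_type` 4 = text, `priority` 5 = Disaster, `manual_close` 1 = allowed).

```python
def build_trapper_item(hostid, name, key, request_id=1):
    """JSON-RPC body for item.create: a text-valued trapper item."""
    return {
        "jsonrpc": "2.0",
        "method": "item.create",
        "params": {
            "hostid": hostid,
            "name": name,
            "key_": key,
            "type": 2,        # Zabbix trapper
            "value_type": 4,  # text
        },
        "id": request_id,
    }


def build_trigger(description, expression, priority=5, request_id=2):
    """JSON-RPC body for trigger.create with manual close enabled."""
    return {
        "jsonrpc": "2.0",
        "method": "trigger.create",
        "params": {
            "description": description,
            "expression": expression,
            "priority": priority,  # 5 = Disaster
            "manual_close": 1,
        },
        "id": request_id,
    }


# Values mirror the water-leak alert described above; the names are illustrative.
item_body = build_trapper_item("10780", "Water leak", "water.leak")
trigger_body = build_trigger(
    "Water leak: {ITEM.LASTVALUE}",
    'last(/HA Agap/water.leak)<>"ok"',
)
```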

## Zabbix API

**Instance**: `http://localhost:81` (local), `https://zb.alogins.net` (external)
**Endpoint**: `http://localhost:81/api_jsonrpc.php`
**Token**: Read from the `$ZABBIX_TOKEN` environment variable; never hardcode it
**Auth header**: `Authorization: Bearer <token>`

### Common Requests

```bash
# Check API version (unauthenticated method: send it without the Authorization header)
curl -s -X POST http://localhost:81/api_jsonrpc.php \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"apiinfo.version","params":{},"id":1}'

# Get all hosts
curl -s -X POST http://localhost:81/api_jsonrpc.php \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ZABBIX_TOKEN" \
  -d '{"jsonrpc":"2.0","method":"host.get","params":{"output":"extend"},"id":1}'

# Get problems/issues
curl -s -X POST http://localhost:81/api_jsonrpc.php \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ZABBIX_TOKEN" \
  -d '{"jsonrpc":"2.0","method":"problem.get","params":{"output":"extend"},"id":1}'
```
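
For scripted use, the same requests can be wrapped in a small helper. The sketch below is an assumption-laden illustration (`build_zabbix_request` and `unwrap_result` are hypothetical names); it builds the request without sending it and shows how a JSON-RPC reply separates `result` from `error`.

```python
import json
import urllib.request

ZABBIX_ENDPOINT = "http://localhost:81/api_jsonrpc.php"  # endpoint from the section above


def build_zabbix_request(method, params, token, request_id=1):
    """Build (but do not send) an authenticated Zabbix JSON-RPC request."""
    body = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    request = urllib.request.Request(ZABBIX_ENDPOINT, data=json.dumps(body).encode("utf-8"))
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", f"Bearer {token}")
    return request


def unwrap_result(raw_response):
    """Return the "result" member, or raise if the reply carries an "error"."""
    reply = json.loads(raw_response)
    if "error" in reply:
        error = reply["error"]
        raise RuntimeError(f"Zabbix API error {error.get('code')}: {error.get('data')}")
    return reply["result"]


hosts_request = build_zabbix_request("host.get", {"output": "extend"}, "example-token")
print(hosts_request.get_method(), hosts_request.full_url)
```

Zabbix reports most failures inside the JSON body with an HTTP 200 status, so checking for the `error` member matters more than checking the status code.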
122
Caddyfile
Normal file
@@ -0,0 +1,122 @@
haos.alogins.net {
    reverse_proxy http://192.168.1.141:8123 {

        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
}


vi.alogins.net {
    reverse_proxy localhost:2283
}

doc.alogins.net {
    reverse_proxy localhost:11001
}

zb.alogins.net {
    reverse_proxy localhost:81
}

wiki.alogins.net {
    reverse_proxy localhost:8083 {
        header_up Host {http.request.host}
        header_up X-Forwarded-Proto {scheme}
        header_up X-Real-IP {remote_host}
    }
}

nn.alogins.net {
    reverse_proxy localhost:5678
}

git.alogins.net {
    reverse_proxy localhost:3000
}

ds.alogins.net {
    reverse_proxy localhost:3974
}

ai.alogins.net {
    reverse_proxy localhost:3125
}

openpi.alogins.net {
    root * /home/alvis/tmp/files/pi05_droid
    file_server browse

}


vui3.alogins.net {
    @xhttp {
        path /VLSpdG9k/xht*
    }
    handle @xhttp {
        reverse_proxy http://localhost:8445 {
            flush_interval -1
            header_up X-Real-IP {remote_host}
            transport http {
                read_timeout 0
                write_timeout 0
                dial_timeout 10s
            }
        }
    }
    reverse_proxy /gnYCNq4EbYukS5qtOe/* localhost:58959
    respond 401
}

vui4.alogins.net {
    reverse_proxy localhost:58959
}

ntfy.alogins.net {
    reverse_proxy localhost:8840
}

docs.alogins.net {
    reverse_proxy localhost:8078
}

office.alogins.net {
    reverse_proxy localhost:6233
}

vw.alogins.net {
    reverse_proxy localhost:8041
}

mtx.alogins.net {
    handle /.well-known/matrix/client {
        header Content-Type application/json
        header Access-Control-Allow-Origin *
        respond `{"m.homeserver":{"base_url":"https://mtx.alogins.net"},"org.matrix.msc4143.rtc_foci":[{"type":"livekit","livekit_service_url":"https://lkjwt.alogins.net"}]}`
    }
    handle /.well-known/matrix/server {
        header Content-Type application/json
        header Access-Control-Allow-Origin *
        respond `{"m.server":"mtx.alogins.net:443"}`
    }
    handle /_matrix/client/unstable/org.matrix.msc4143/rtc/transports {
        header Content-Type application/json
        header Access-Control-Allow-Origin *
        respond `{"foci":[{"type":"livekit","livekit_service_url":"https://lkjwt.alogins.net"}]}`
    }
    reverse_proxy localhost:8008
}

lkjwt.alogins.net {
    reverse_proxy localhost:8009
}

lk.alogins.net {
    reverse_proxy localhost:7880
}

localhost:8042 {
    reverse_proxy localhost:8041
    tls internal
}
@@ -1,144 +0,0 @@
# Adolf

Persistent AI assistant reachable via Telegram. Three-tier model routing with GPU VRAM management.

## Architecture

```
Telegram user
     ↕ (long-polling)
[grammy] Node.js, port 3001
  - grammY bot polls Telegram
  - on message: fire-and-forget POST /chat to deepagents
  - exposes MCP SSE server: tool send_telegram_message(chat_id, text)
     ↓ POST /chat → 202 Accepted immediately
[deepagents] Python FastAPI, port 8000
     ↓
  Pre-check: starts with /think? → force_complex=True, strip prefix
     ↓
  Router (qwen2.5:1.5b, ~1-2s, always warm in VRAM)
    Structured output: {tier: light|medium|complex, confidence: 0.0-1.0, reply?: str}
    - light: simple conversational → router answers directly, ~1-2s
    - medium: needs memory/web search → qwen3:4b + deepagents tools
    - complex: multi-step research, planning, code → qwen3:8b + subagents
    force_complex always overrides to complex
    complex only if confidence >= 0.85 (else downgraded to medium)
     ↓
  ├── light ─────────── router reply used directly (no extra LLM call)
  ├── medium ────────── deepagents qwen3:4b + TodoList + tools
  └── complex ───────── VRAM flush → deepagents qwen3:8b + TodoList + subagents
        └→ background: exit_complex_mode (flush 8b, prewarm 4b+router)
     ↓
  send_telegram_message via grammy MCP
     ↓
  asyncio.create_task(store_memory_async): spin-wait GPU idle → openmemory add_memory
     ↕ MCP SSE                               ↕ HTTP
[openmemory] Python + mem0, port 8765    [SearXNG, port 11437]
  - add_memory, search_memory, get_all_memories
  - extractor: qwen2.5:1.5b on GPU Ollama (11436), 2–5s
  - embedder: nomic-embed-text on CPU Ollama (11435), 50–150ms
  - vector store: Qdrant (port 6333), 768 dims
```
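
The override rules drawn in the diagram can be written down in a few lines. This is only a sketch of the rules as the diagram states them: `router.py` is not part of this diff, so the function names here are hypothetical and the real router may differ.

```python
COMPLEX_CONFIDENCE_THRESHOLD = 0.85  # threshold quoted in the diagram


def strip_think_prefix(message):
    """Return (clean_message, force_complex) for the /think pre-check."""
    prefix = "/think "
    if message.startswith(prefix):
        return message[len(prefix):], True
    return message, False


def resolve_tier(tier, confidence, force_complex=False):
    """Apply the diagram's overrides to the router's raw classification."""
    if force_complex:
        return "complex"
    if tier == "complex" and confidence < COMPLEX_CONFIDENCE_THRESHOLD:
        return "medium"  # not confident enough to pay for a VRAM flush
    return tier


clean, forced = strip_think_prefix("/think plan my week")
print(clean, resolve_tier("light", 0.4, forced))
```

Downgrading low-confidence "complex" results keeps the expensive flush-and-reload path for requests that clearly need it.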

## Three-Tier Model Routing

| Tier | Model | VRAM | Trigger | Latency |
|------|-------|------|---------|---------|
| Light | qwen2.5:1.5b (router answers) | ~1.2 GB (shared with extraction) | Router classifies as light | ~2–4s |
| Medium | qwen3:4b | ~2.5 GB | Default; router classifies medium | ~20–40s |
| Complex | qwen3:8b | ~5.5 GB | `/think` prefix | ~60–120s |

**Normal VRAM** (light + medium): router/extraction(1.2, shared) + medium(2.5) = ~3.7 GB
**Complex VRAM**: 8b alone = ~5.5 GB — must flush others first
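
The budget arithmetic is easy to check. The sketch below uses the approximate per-model figures from the table and the 8 GB card size mentioned under VRAM Management; it ignores context (KV cache) overhead, so treat the totals as lower bounds.

```python
GPU_VRAM_GB = 8.0  # card size quoted under VRAM Management

# Approximate per-model footprints from the table above.
MODEL_VRAM_GB = {
    "qwen2.5:1.5b": 1.2,  # router + memory extraction (shared)
    "qwen3:4b": 2.5,      # medium tier
    "qwen3:8b": 5.5,      # complex tier
}


def total_vram_gb(models):
    """Sum the footprints of the given models, rounded to 0.1 GB."""
    return round(sum(MODEL_VRAM_GB[name] for name in models), 1)


normal_gb = total_vram_gb(["qwen2.5:1.5b", "qwen3:4b"])
all_resident_gb = total_vram_gb(MODEL_VRAM_GB)

print(f"normal mode: {normal_gb} GB")
print(f"all three resident: {all_resident_gb} GB (card has {GPU_VRAM_GB} GB)")
```

Keeping all three models resident would need more than the card offers, which is why the complex tier requires unloading the others first.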

### Router model: qwen2.5:1.5b (not 0.5b)

qwen2.5:0.5b is too small for reliable classification: it tends to output "medium" for everything
or produces nonsensical output. qwen2.5:1.5b is already loaded in VRAM for memory extraction,
so switching adds zero net VRAM overhead while dramatically improving accuracy.

Router uses **raw text generation** (not structured output/JSON schema):
- Ask model to output one word: `light`, `medium`, or `complex`
- Parse with simple keyword matching (fallback: `medium`)
- For `light` tier: a second call generates the reply text
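
A keyword parser of the kind described can look like the sketch below. `parse_tier` is a hypothetical name, and picking the earliest keyword when several appear is an assumption; the actual `router.py` is not included in this diff.

```python
VALID_TIERS = ("light", "medium", "complex")


def parse_tier(raw_output, default="medium"):
    """Map free-form model output to a tier by keyword, falling back to `default`."""
    text = raw_output.strip().lower()
    # Record where each tier keyword first appears, ignoring absent ones.
    positions = {tier: text.find(tier) for tier in VALID_TIERS if tier in text}
    if not positions:
        return default
    # If the model rambles, trust the keyword it mentioned first.
    return min(positions, key=positions.get)


for sample in ("Medium.", "I think this is COMPLEX", "banana"):
    print(repr(sample), "->", parse_tier(sample))
```

Falling back to `medium` means a confused router still produces a usable answer with tools, at the cost of some latency.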

## VRAM Management

GTX 1070 has 8 GB VRAM. Ollama's auto-eviction can spill models to CPU RAM permanently
(all subsequent loads stay on CPU). To prevent this:

1. **Always flush explicitly** before loading qwen3:8b (`keep_alive=0`)
2. **Verify eviction** via `/api/ps` poll (15s timeout) before proceeding
3. **Fallback**: timeout → log warning, run medium agent instead
4. **Post-complex**: flush 8b immediately, pre-warm 4b + router

```python
# Flush (force immediate unload):
POST /api/generate {"model": "qwen3:4b", "prompt": "", "keep_alive": 0}

# Pre-warm (load into VRAM for 5 min):
POST /api/generate {"model": "qwen3:4b", "prompt": "", "keep_alive": 300}
```
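
The flush-then-verify steps can be sketched without touching a live Ollama instance. The helper names below are hypothetical (the real `vram_manager.py` is not shown in this diff), and `fetch_ps` stands for whatever function returns the parsed JSON of `GET /api/ps`.

```python
import time


def flush_payload(model):
    """Body for POST /api/generate that unloads the model immediately."""
    return {"model": model, "prompt": "", "keep_alive": 0}


def prewarm_payload(model, seconds=300):
    """Body for POST /api/generate that loads the model and keeps it resident."""
    return {"model": model, "prompt": "", "keep_alive": seconds}


def is_evicted(ps_response, model):
    """True when `model` is absent from a parsed /api/ps response."""
    loaded = {entry.get("name") for entry in ps_response.get("models", [])}
    return model not in loaded


def wait_for_eviction(fetch_ps, model, timeout=15.0, interval=0.5,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the model is unloaded; return False if the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_evicted(fetch_ps(), model):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with a fake /api/ps reader: the model disappears on the third poll.
responses = iter([
    {"models": [{"name": "qwen3:4b"}]},
    {"models": [{"name": "qwen3:4b"}]},
    {"models": []},
])
evicted = wait_for_eviction(lambda: next(responses), "qwen3:4b", sleep=lambda _: None)
print(evicted)
```

A `False` return corresponds to step 3 above: log a warning and run the medium agent instead of loading qwen3:8b.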

## Agents

**Medium agent** (`build_medium_agent`):
- `create_deep_agent` with TodoListMiddleware (auto-included)
- Tools: `search_memory`, `get_all_memories`, `web_search`
- No subagents

**Complex agent** (`build_complex_agent`):
- `create_deep_agent` with TodoListMiddleware + SubAgentMiddleware
- Tools: all agent tools
- Subagents:
  - `research`: web_search only, for thorough multi-query web research
  - `memory`: search_memory + get_all_memories, for comprehensive context retrieval

## Concurrency

| Semaphore | Guards | Notes |
|-----------|--------|-------|
| `_reply_semaphore(1)` | GPU Ollama (all tiers) | One LLM reply inference at a time |
| `_memory_semaphore(1)` | GPU Ollama (qwen2.5:1.5b extraction) | One memory extraction at a time |

Light path holds `_reply_semaphore` briefly (no GPU inference).
Memory extraction spin-waits until `_reply_semaphore` is free (60s timeout).
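
The interplay of the two semaphores can be demonstrated in isolation. The following is a simplified sketch, not the code from `agent.py`: timings are shortened, the "work" is a sleep, and `wait_until_free` is a hypothetical helper that polls `Semaphore.locked()` as the spin-wait does.

```python
import asyncio


async def wait_until_free(semaphore, timeout=60.0, poll=0.5):
    """Spin-wait until the semaphore is unlocked; return False on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while semaphore.locked():
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(poll)
    return True


async def demo():
    reply_semaphore = asyncio.Semaphore(1)   # one reply inference at a time
    memory_semaphore = asyncio.Semaphore(1)  # one extraction at a time
    events = []

    async def reply():
        async with reply_semaphore:
            events.append("reply start")
            await asyncio.sleep(0.05)  # stands in for LLM inference
            events.append("reply done")

    async def extract():
        # Extraction defers to replies, then serializes on its own semaphore.
        await wait_until_free(reply_semaphore, timeout=1.0, poll=0.01)
        async with memory_semaphore:
            events.append("memory stored")

    reply_task = asyncio.create_task(reply())
    await asyncio.sleep(0)  # let the reply task acquire the semaphore first
    await asyncio.gather(reply_task, extract())
    return events


print(asyncio.run(demo()))
```

Polling rather than acquiring `_reply_semaphore` means extraction never queues ahead of a pending reply; it only checks that the GPU looks idle before starting.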

## Pipeline

1. User message → Grammy → `POST /chat` → 202 Accepted
2. Background: acquire `_reply_semaphore` → route → run agent tier → send reply
3. `asyncio.create_task(store_memory_async)`: spin-waits until the GPU is free, then extracts memories
4. For complex: `asyncio.create_task(exit_complex_mode)` flushes 8b and pre-warms 4b + router

## External Services (from openai/ stack)

| Service | Host Port | Role |
|---------|-----------|------|
| Ollama GPU | 11436 | All reply inference + extraction (qwen2.5:1.5b) |
| Ollama CPU | 11435 | Memory embedding (nomic-embed-text) |
| Qdrant | 6333 | Vector store for memories |
| SearXNG | 11437 | Web search |

GPU Ollama config: `OLLAMA_MAX_LOADED_MODELS=2`, `OLLAMA_NUM_PARALLEL=1`.

## Files

```
adolf/
├── docker-compose.yml      Services: deepagents, openmemory, grammy
├── Dockerfile              deepagents container (Python 3.12)
├── agent.py                FastAPI + three-tier routing + run_agent_task
├── router.py               Router class: qwen2.5:1.5b tier routing
├── vram_manager.py         VRAMManager: flush/prewarm/poll Ollama VRAM
├── agent_factory.py        build_medium_agent / build_complex_agent (deepagents)
├── .env                    TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│   ├── server.py           FastMCP + mem0 MCP tools
│   ├── requirements.txt
│   └── Dockerfile
└── grammy/
    ├── bot.mjs             grammY bot + MCP SSE server
    ├── package.json
    └── Dockerfile
```
@@ -1,10 +0,0 @@
FROM python:3.12-slim

WORKDIR /app

RUN pip install --no-cache-dir deepagents langchain-ollama langgraph \
    fastapi uvicorn langchain-mcp-adapters langchain-community httpx

COPY agent.py vram_manager.py router.py agent_factory.py hello_world.py .

CMD ["uvicorn", "agent:app", "--host", "0.0.0.0", "--port", "8000"]
309
adolf/agent.py
@@ -1,309 +0,0 @@
import asyncio
import os
import time
from contextlib import asynccontextmanager

from fastapi import FastAPI, BackgroundTasks
from fastapi.responses import JSONResponse
from pydantic import BaseModel

from langchain_ollama import ChatOllama
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_community.utilities import SearxSearchWrapper
from langchain_core.tools import Tool

from vram_manager import VRAMManager
from router import Router
from agent_factory import build_medium_agent, build_complex_agent

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
ROUTER_MODEL = os.getenv("DEEPAGENTS_ROUTER_MODEL", "qwen2.5:0.5b")
MEDIUM_MODEL = os.getenv("DEEPAGENTS_MODEL", "qwen3:4b")
COMPLEX_MODEL = os.getenv("DEEPAGENTS_COMPLEX_MODEL", "qwen3:8b")
SEARXNG_URL = os.getenv("SEARXNG_URL", "http://host.docker.internal:11437")
OPENMEMORY_URL = os.getenv("OPENMEMORY_URL", "http://openmemory:8765")
GRAMMY_URL = os.getenv("GRAMMY_URL", "http://grammy:3001")

MAX_HISTORY_TURNS = 5
_conversation_buffers: dict[str, list] = {}

MEDIUM_SYSTEM_PROMPT = (
    "You are a helpful AI assistant talking to a user via Telegram. "
    "The user's ID is {user_id}. "
    "IMPORTANT: When calling any memory tool (search_memory, get_all_memories), "
    "always use user_id=\"{user_id}\". "
"Every conversation is automatically saved to memory after you reply — "
    "you do NOT need to explicitly store anything. "
    "NEVER tell the user you cannot remember or store information. "
    "If the user asks you to remember something, acknowledge it and confirm it will be remembered. "
    "Use search_memory when context from past conversations may be relevant. "
    "Use web_search for questions about current events or facts you don't know. "
    "Reply concisely."
)

COMPLEX_SYSTEM_PROMPT = (
    "You are a capable AI assistant tackling a complex, multi-step task for a Telegram user. "
    "The user's ID is {user_id}. "
    "IMPORTANT: When calling any memory tool (search_memory, get_all_memories), "
    "always use user_id=\"{user_id}\". "
    "Plan your work using write_todos before diving in. "
    "Delegate: use the 'research' subagent for thorough web research across multiple queries, "
    "and the 'memory' subagent to gather comprehensive context from past conversations. "
"Every conversation is automatically saved to memory — you do NOT need to store anything. "
    "NEVER tell the user you cannot remember or store information. "
    "Produce a thorough, well-structured reply."
)

medium_agent = None
complex_agent = None
router: Router = None
vram_manager: VRAMManager = None
mcp_client = None
send_tool = None
add_memory_tool = None

# GPU mutex: one LLM inference at a time
_reply_semaphore = asyncio.Semaphore(1)
# Memory semaphore: one async extraction at a time
_memory_semaphore = asyncio.Semaphore(1)


@asynccontextmanager
async def lifespan(app: FastAPI):
    global medium_agent, complex_agent, router, vram_manager
    global mcp_client, send_tool, add_memory_tool

    # Three model instances
    router_model = ChatOllama(
        model=ROUTER_MODEL, base_url=OLLAMA_BASE_URL, think=False, num_ctx=4096,
        temperature=0,  # deterministic classification
    )
    medium_model = ChatOllama(
        model=MEDIUM_MODEL, base_url=OLLAMA_BASE_URL, think=False, num_ctx=8192
    )
    complex_model = ChatOllama(
        model=COMPLEX_MODEL, base_url=OLLAMA_BASE_URL, think=True, num_ctx=16384
    )

    vram_manager = VRAMManager(base_url=OLLAMA_BASE_URL)
    router = Router(model=router_model)

    mcp_connections = {
        "openmemory": {"transport": "sse", "url": f"{OPENMEMORY_URL}/sse"},
        "grammy": {"transport": "sse", "url": f"{GRAMMY_URL}/sse"},
    }
    mcp_client = MultiServerMCPClient(mcp_connections)
    for attempt in range(12):
        try:
            mcp_tools = await mcp_client.get_tools()
            break
        except Exception as e:
            if attempt == 11:
                raise
            print(f"[agent] MCP not ready (attempt {attempt + 1}/12): {e}. Retrying in 5s...")
            await asyncio.sleep(5)

    send_tool = next((t for t in mcp_tools if t.name == "send_telegram_message"), None)
    add_memory_tool = next((t for t in mcp_tools if t.name == "add_memory"), None)
    agent_tools = [t for t in mcp_tools if t.name not in ("send_telegram_message", "add_memory")]

    searx = SearxSearchWrapper(searx_host=SEARXNG_URL)
    agent_tools.append(Tool(
        name="web_search",
        func=searx.run,
        description="Search the web for current information",
    ))

    # Build agents (system_prompt filled per-request with user_id)
    medium_agent = build_medium_agent(
        model=medium_model,
        agent_tools=agent_tools,
        system_prompt=MEDIUM_SYSTEM_PROMPT.format(user_id="{user_id}"),
    )
    complex_agent = build_complex_agent(
        model=complex_model,
        agent_tools=agent_tools,
        system_prompt=COMPLEX_SYSTEM_PROMPT.format(user_id="{user_id}"),
    )

    print(
        f"[agent] three-tier: router={ROUTER_MODEL} | medium={MEDIUM_MODEL} | complex={COMPLEX_MODEL}",
        flush=True,
    )
    print(f"[agent] agent tools: {[t.name for t in agent_tools]}", flush=True)

    yield

    medium_agent = None
    complex_agent = None
    router = None
    vram_manager = None
    mcp_client = None
    send_tool = None
    add_memory_tool = None


app = FastAPI(lifespan=lifespan)


class ChatRequest(BaseModel):
    message: str
    chat_id: str


async def store_memory_async(conversation: str, user_id: str):
    """Fire-and-forget: extract and store memories after GPU is free."""
    t_wait = time.monotonic()
    while _reply_semaphore.locked():
        if time.monotonic() - t_wait > 60:
            print(f"[memory] spin-wait timeout 60s, proceeding for user {user_id}", flush=True)
            break
        await asyncio.sleep(0.5)
    async with _memory_semaphore:
        t0 = time.monotonic()
        try:
            await add_memory_tool.ainvoke({"text": conversation, "user_id": user_id})
            print(f"[memory] stored in {time.monotonic() - t0:.1f}s for user {user_id}", flush=True)
        except Exception as e:
            print(f"[memory] error after {time.monotonic() - t0:.1f}s: {e}", flush=True)


def _extract_final_text(result) -> str | None:
    """Extract last AIMessage content from agent result."""
    msgs = result.get("messages", [])
    for m in reversed(msgs):
        if type(m).__name__ == "AIMessage" and getattr(m, "content", ""):
            return m.content
    # deepagents may return output differently
    if isinstance(result, dict) and result.get("output"):
        return result["output"]
    return None


def _log_messages(result):
    msgs = result.get("messages", [])
    for m in msgs:
        role = type(m).__name__
        content = getattr(m, "content", "")
        tool_calls = getattr(m, "tool_calls", [])
        if content:
            print(f"[agent] {role}: {str(content)[:150]}", flush=True)
        for tc in tool_calls:
            print(f"[agent] {role} → {tc['name']}({tc['args']})", flush=True)


async def run_agent_task(message: str, chat_id: str):
    print(f"[agent] queued: {message[:80]!r} chat={chat_id}", flush=True)

    # Pre-check: /think prefix forces complex tier
    force_complex = False
    clean_message = message
    if message.startswith("/think "):
        force_complex = True
        clean_message = message[len("/think "):]
        print("[agent] /think prefix → force_complex=True", flush=True)

    async with _reply_semaphore:
        t0 = time.monotonic()
        history = _conversation_buffers.get(chat_id, [])
        print(f"[agent] running: {clean_message[:80]!r}", flush=True)

        # Route the message
        tier, light_reply = await router.route(clean_message, history, force_complex)
        print(f"[agent] tier={tier} message={clean_message[:60]!r}", flush=True)

        final_text = None
        try:
            if tier == "light":
                final_text = light_reply
                llm_elapsed = time.monotonic() - t0
                print(f"[agent] light path: answered by router", flush=True)

            elif tier == "medium":
                system_prompt = MEDIUM_SYSTEM_PROMPT.format(user_id=chat_id)
                result = await medium_agent.ainvoke({
                    "messages": [
                        {"role": "system", "content": system_prompt},
                        *history,
                        {"role": "user", "content": clean_message},
                    ]
                })
                llm_elapsed = time.monotonic() - t0
                _log_messages(result)
                final_text = _extract_final_text(result)

            else:  # complex
                ok = await vram_manager.enter_complex_mode()
                if not ok:
                    print("[agent] complex→medium fallback (eviction timeout)", flush=True)
                    tier = "medium"
                    system_prompt = MEDIUM_SYSTEM_PROMPT.format(user_id=chat_id)
                    result = await medium_agent.ainvoke({
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            *history,
                            {"role": "user", "content": clean_message},
                        ]
                    })
                else:
                    system_prompt = COMPLEX_SYSTEM_PROMPT.format(user_id=chat_id)
                    result = await complex_agent.ainvoke({
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            *history,
                            {"role": "user", "content": clean_message},
                        ]
                    })
                    asyncio.create_task(vram_manager.exit_complex_mode())

                llm_elapsed = time.monotonic() - t0
                _log_messages(result)
                final_text = _extract_final_text(result)

        except Exception as e:
            import traceback
            llm_elapsed = time.monotonic() - t0
            print(f"[agent] error after {llm_elapsed:.1f}s for chat {chat_id}: {e}", flush=True)
            traceback.print_exc()

        # Send reply via grammy MCP (split if > Telegram's 4096-char limit)
        if final_text and send_tool:
            t1 = time.monotonic()
            MAX_TG = 4000  # leave headroom below the 4096 hard limit
            chunks = [final_text[i:i + MAX_TG] for i in range(0, len(final_text), MAX_TG)]
            for chunk in chunks:
                await send_tool.ainvoke({"chat_id": chat_id, "text": chunk})
            send_elapsed = time.monotonic() - t1
            # Log in format compatible with test_pipeline.py parser
            print(
                f"[agent] replied in {time.monotonic() - t0:.1f}s "
                f"(llm={llm_elapsed:.1f}s, send={send_elapsed:.1f}s) tier={tier}",
                flush=True,
            )
        elif not final_text:
            print("[agent] warning: no text reply from agent", flush=True)

        # Update conversation buffer
        if final_text:
            buf = _conversation_buffers.get(chat_id, [])
            buf.append({"role": "user", "content": clean_message})
            buf.append({"role": "assistant", "content": final_text})
            _conversation_buffers[chat_id] = buf[-(MAX_HISTORY_TURNS * 2):]

        # Async memory storage (fire-and-forget)
        if add_memory_tool and final_text:
            conversation = f"User: {clean_message}\nAssistant: {final_text}"
            asyncio.create_task(store_memory_async(conversation, chat_id))


@app.post("/chat")
async def chat(request: ChatRequest, background_tasks: BackgroundTasks):
    if medium_agent is None:
        return JSONResponse(status_code=503, content={"error": "Agent not ready"})
    background_tasks.add_task(run_agent_task, request.message, request.chat_id)
    return JSONResponse(status_code=202, content={"status": "accepted"})


@app.get("/health")
async def health():
    return {"status": "ok", "agent_ready": medium_agent is not None}
@@ -1,54 +0,0 @@
from deepagents import create_deep_agent, SubAgent


def build_medium_agent(model, agent_tools: list, system_prompt: str):
    """Medium agent: create_deep_agent with TodoList planning, no subagents."""
    return create_deep_agent(
        model=model,
        tools=agent_tools,
        system_prompt=system_prompt,
    )


def build_complex_agent(model, agent_tools: list, system_prompt: str):
    """Complex agent: create_deep_agent with TodoList planning + research/memory subagents."""
    web_tools = [t for t in agent_tools if getattr(t, "name", "") == "web_search"]
    memory_tools = [
        t for t in agent_tools
        if getattr(t, "name", "") in ("search_memory", "get_all_memories")
    ]

    research_sub: SubAgent = {
        "name": "research",
        "description": (
            "Runs multiple web searches in parallel and synthesizes findings. "
            "Use for thorough research tasks requiring several queries."
        ),
        "system_prompt": (
            "You are a research specialist. Search the web thoroughly using multiple queries. "
            "Cite sources and synthesize information into a clear summary."
        ),
        "tools": web_tools,
        "model": model,
    }

    memory_sub: SubAgent = {
        "name": "memory",
        "description": (
            "Searches and retrieves all relevant memories about the user comprehensively. "
            "Use to gather full context from past conversations."
        ),
        "system_prompt": (
            "You are a memory specialist. Search broadly using multiple queries. "
            "Return all relevant facts and context you find."
        ),
        "tools": memory_tools,
        "model": model,
    }

    return create_deep_agent(
        model=model,
        tools=agent_tools,
        system_prompt=system_prompt,
        subagents=[research_sub, memory_sub],
    )
@@ -1,43 +0,0 @@
services:
  deepagents:
    build: .
    container_name: deepagents
    ports:
      - "8000:8000"
    environment:
      - PYTHONUNBUFFERED=1
      - OLLAMA_BASE_URL=http://host.docker.internal:11436
      - DEEPAGENTS_MODEL=qwen3:4b
      - DEEPAGENTS_COMPLEX_MODEL=qwen3:8b
      - DEEPAGENTS_ROUTER_MODEL=qwen2.5:1.5b
      - SEARXNG_URL=http://host.docker.internal:11437
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      - openmemory
      - grammy
    restart: unless-stopped

  openmemory:
    build: ./openmemory
    container_name: openmemory
    ports:
      - "8765:8765"
    environment:
      # Extraction LLM (qwen2.5:1.5b) runs on GPU after reply: fast 2-5s extraction
      - OLLAMA_GPU_URL=http://host.docker.internal:11436
      # Embedding (nomic-embed-text) runs on CPU: fast enough for search (50-150ms)
      - OLLAMA_CPU_URL=http://host.docker.internal:11435
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  grammy:
    build: ./grammy
    container_name: grammy
    ports:
      - "3001:3001"
    environment:
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
      - DEEPAGENTS_URL=http://deepagents:8000
    restart: unless-stopped
@@ -1,62 +0,0 @@
import os
from mcp.server.fastmcp import FastMCP
from mem0 import Memory

OLLAMA_CPU_URL = os.getenv("OLLAMA_CPU_URL", "http://host.docker.internal:11435")
QDRANT_HOST = os.getenv("QDRANT_HOST", "host.docker.internal")
QDRANT_PORT = int(os.getenv("QDRANT_PORT", "6333"))

config = {
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "qwen2.5:1.5b",
            "ollama_base_url": OLLAMA_CPU_URL,
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "ollama_base_url": OLLAMA_CPU_URL,
        },
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "adolf_memories",
            "embedding_model_dims": 768,
            "host": QDRANT_HOST,
            "port": QDRANT_PORT,
        },
    },
}

memory = Memory.from_config(config)

mcp = FastMCP("openmemory", host="0.0.0.0", port=8765)


@mcp.tool()
def add_memory(text: str, user_id: str = "default") -> str:
    """Store a memory for a user."""
    result = memory.add(text, user_id=user_id)
    return str(result)


@mcp.tool()
def search_memory(query: str, user_id: str = "default") -> str:
    """Search memories for a user using semantic similarity."""
    results = memory.search(query, user_id=user_id)
    return str(results)


@mcp.tool()
def get_all_memories(user_id: str = "default") -> str:
    """Get all stored memories for a user."""
    results = memory.get_all(user_id=user_id)
    return str(results)


if __name__ == "__main__":
    mcp.run(transport="sse")
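The compose file passes both `OLLAMA_GPU_URL` and `OLLAMA_CPU_URL` to the openmemory container, while the server reads only `OLLAMA_CPU_URL`. The sketch below shows one way the config could send extraction and embedding to separate endpoints, assuming that split is the intent. `build_mem0_config` is a hypothetical helper that is not part of the removed file; it only builds the dict, so it needs no mem0 import.

```python
import os


def build_mem0_config(env=None):
    """Build a mem0-style config dict from environment variables.

    Assumption: extraction should use OLLAMA_GPU_URL when it is set and fall
    back to OLLAMA_CPU_URL otherwise. The removed server used the CPU URL
    for both the LLM and the embedder.
    """
    env = os.environ if env is None else env
    cpu_url = env.get("OLLAMA_CPU_URL", "http://host.docker.internal:11435")
    gpu_url = env.get("OLLAMA_GPU_URL", cpu_url)
    return {
        "llm": {
            "provider": "ollama",
            "config": {"model": "qwen2.5:1.5b", "ollama_base_url": gpu_url},
        },
        "embedder": {
            "provider": "ollama",
            "config": {"model": "nomic-embed-text", "ollama_base_url": cpu_url},
        },
        "vector_store": {
            "provider": "qdrant",
            "config": {
                "collection_name": "adolf_memories",
                "embedding_model_dims": 768,
                "host": env.get("QDRANT_HOST", "host.docker.internal"),
                "port": int(env.get("QDRANT_PORT", "6333")),
            },
        },
    }
```

Passing a plain dict instead of `os.environ` keeps the helper easy to check without touching the real environment.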
@@ -1,13 +0,0 @@
# Potential Directions

## CPU Extraction Model Candidates (mem0 / openmemory)

Replacing `gemma3:1b`: documented JSON/structured output failures make it unreliable for mem0's extraction pipeline.

| Rank | Model | Size | CPU speed | JSON reliability | Notes |
|------|-------|------|-----------|-----------------|-------|
| 1 | `qwen2.5:1.5b` | ~934 MB | 25–40 tok/s | Excellent | Best fit: fast + structured output, 18T token training |
| 2 | `qwen2.5:3b` | ~1.9 GB | 15–25 tok/s | Excellent | Quality upgrade, same family |
| 3 | `llama3.2:3b` | ~2 GB | 15–25 tok/s | Good | Highest IFEval score (77.4) in class |
| 4 | `smollm2:1.7b` | ~1.1 GB | 25–35 tok/s | Moderate | Use temp=0; NuExtract-1.5-smol is fine-tuned variant |
| 5 | `phi4-mini` | ~2.5 GB | 10–17 tok/s | Good | Function calling support, borderline CPU speed |
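To put the CPU speed column in perspective, generation time is roughly the number of output tokens divided by tokens per second. A small sketch, assuming a 150-token extraction response (an illustrative figure, not a measurement from this repository) and ignoring prompt processing time:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough decode time: tokens to generate divided by throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Throughput ranges copied from the candidate table (tok/s on CPU).
CANDIDATES = {
    "qwen2.5:1.5b": (25, 40),
    "qwen2.5:3b": (15, 25),
    "phi4-mini": (10, 17),
}

ASSUMED_OUTPUT_TOKENS = 150  # hypothetical size of one extraction response

for name, (slow, fast) in CANDIDATES.items():
    worst = generation_seconds(ASSUMED_OUTPUT_TOKENS, slow)
    best = generation_seconds(ASSUMED_OUTPUT_TOKENS, fast)
    print(f"{name}: {best:.1f}s to {worst:.1f}s")
```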
138
adolf/router.py
@@ -1,138 +0,0 @@
import re
from typing import Optional
from langchain_core.messages import SystemMessage, HumanMessage

# ── Regex pre-classifier ──────────────────────────────────────────────────────
# Catches obvious light-tier patterns before calling the LLM.
# One compiled alternation, anchored to the whole (stripped) message.
_LIGHT_PATTERNS = re.compile(
    r"^("
    # Greetings / farewells
    r"hi|hello|hey|yo|sup|howdy|good morning|good evening|good night|good afternoon"
    r"|bye|goodbye|see you|cya|later|ttyl"
    # Acknowledgements / small talk
    r"|thanks?|thank you|thx|ty|ok|okay|k|cool|great|awesome|perfect|sounds good|got it|nice|sure"
    r"|how are you|how are you\?|how are you doing(\s+today)?[?!.]*"
    r"|what.?s up"
    # Calendar facts: "what day comes after X?" / "what comes after X?"
    r"|what\s+day\s+(comes\s+after|follows|is\s+after)\s+\w+[?!.]*"
    r"|what\s+comes\s+after\s+\w+[?!.]*"
    # Acronym expansions: "what does X stand for?"
    r"|what\s+does\s+\w+\s+stand\s+for[?!.]*"
    r")[\s!.?]*$",
    re.IGNORECASE,
)

# ── LLM classification prompt ─────────────────────────────────────────────────
CLASSIFY_PROMPT = """Classify the message. Output ONLY one word: light, medium, or complex.

LIGHT = answerable from general knowledge, no internet needed:
what is 2+2 / what is the capital of France / name the three primary colors
tell me a short joke / is the sky blue / is water wet

MEDIUM = requires web search or the user's stored memories:
current weather / today's news / Bitcoin price / what did we talk about

COMPLEX = /think prefix only:
/think compare frameworks / /think plan a trip

Message: {message}
Output (one word only — light, medium, or complex):"""

LIGHT_REPLY_PROMPT = """You are a helpful Telegram assistant. Answer briefly and naturally (1-3 sentences). Be friendly."""


def _format_history(history: list[dict]) -> str:
    if not history:
        return "(none)"
    lines = []
    for msg in history:
        role = msg.get("role", "?")
        content = str(msg.get("content", ""))[:200]
        lines.append(f"{role}: {content}")
    return "\n".join(lines)


def _parse_tier(text: str) -> str:
    """Extract tier from raw model output. Default to medium."""
    t = text.strip().lower()
    snippet = t[:60]
    if "complex" in snippet:
        return "complex"
    if "medium" in snippet:
        return "medium"
    if "light" in snippet:
        return "light"
    # Model invented a descriptive category (e.g. "simplefact", "trivial", "basic") →
    # treat as light since it recognised the question doesn't need tools
    if any(w in snippet for w in ("simple", "fact", "trivial", "basic", "easy", "general")):
        return "light"
    return "medium"  # safe default


class Router:
    def __init__(self, model):
        self.model = model

    async def route(
        self,
        message: str,
        history: list[dict],
        force_complex: bool = False,
    ) -> tuple[str, Optional[str]]:
        """
        Returns (tier, reply_or_None).
        For light tier: also generates the reply with a second call.
        For medium/complex: reply is None.
        """
        if force_complex:
            return "complex", None

        # Step 0: regex pre-classification for obvious light patterns
        if _LIGHT_PATTERNS.match(message.strip()):
            print("[router] regex→light", flush=True)
            return await self._generate_light_reply(message, history)

        # Step 1: LLM classification with raw text output
        try:
            classify_response = await self.model.ainvoke([
                HumanMessage(content=CLASSIFY_PROMPT.format(message=message)),
            ])
            raw = classify_response.content or ""
            raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
            tier = _parse_tier(raw)

            if tier == "complex" and not message.startswith("/think"):
                tier = "medium"

            print(f"[router] raw={raw[:30]!r} → tier={tier}", flush=True)
        except Exception as e:
            print(f"[router] classify error, defaulting to medium: {e}", flush=True)
            return "medium", None

        if tier != "light":
            return tier, None

        return await self._generate_light_reply(message, history)

    async def _generate_light_reply(
        self, message: str, history: list[dict]
    ) -> tuple[str, Optional[str]]:
        """Generate a short reply using the router model for light-tier messages."""
        history_text = _format_history(history)
        context = f"\nConversation history:\n{history_text}" if history else ""
        try:
            reply_response = await self.model.ainvoke([
                SystemMessage(content=LIGHT_REPLY_PROMPT + context),
                HumanMessage(content=message),
            ])
            reply_text = reply_response.content or ""
            reply_text = re.sub(r"<think>.*?</think>", "", reply_text, flags=re.DOTALL).strip()
            if not reply_text:
                print("[router] light reply empty, falling back to medium", flush=True)
                return "medium", None
            print(f"[router] light reply: {len(reply_text)} chars", flush=True)
            return "light", reply_text
        except Exception as e:
            print(f"[router] light reply error, falling back to medium: {e}", flush=True)
            return "medium", None
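The decision order in `Router.route` (forced complex, then the regex shortcut, then model classification with the `/think` guard) can be tried without a model. The sketch below is a simplified stand-in, not the removed implementation: `LIGHT_RE` is a much shorter pattern than `_LIGHT_PATTERNS`, `parse_tier` omits the fallback for invented category words, and `fake_classifier` is a hypothetical stub that replaces the LLM call.

```python
import asyncio
import re

# Hypothetical, much smaller stand-in for _LIGHT_PATTERNS.
LIGHT_RE = re.compile(r"^(hi|hello|thanks?|ok)[\s!.?]*$", re.IGNORECASE)


def parse_tier(raw: str) -> str:
    """Check for complex, then medium, then light; default to medium."""
    snippet = raw.strip().lower()[:60]
    for tier in ("complex", "medium", "light"):
        if tier in snippet:
            return tier
    return "medium"


async def fake_classifier(message: str) -> str:
    """Stand-in for the router model (assumption: it returns free-form text)."""
    if message.startswith("/think") or "compare" in message:
        return "<think>needs planning</think>complex"
    if "weather" in message:
        return "Medium."
    return "light"


async def route_tier(message: str, force_complex: bool = False) -> str:
    if force_complex:
        return "complex"
    if LIGHT_RE.match(message.strip()):
        return "light"  # regex shortcut, no model call
    raw = await fake_classifier(message)
    # Drop any reasoning block before looking for the tier word.
    raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
    tier = parse_tier(raw)
    # Only an explicit /think prefix may reach the complex tier.
    if tier == "complex" and not message.startswith("/think"):
        tier = "medium"
    return tier


for text in ("Thanks!", "compare Django and Flask", "/think compare Django and Flask"):
    print(text, "->", asyncio.run(route_tier(text)))
```

The guard matters because a small classifier may answer "complex" for any long question; downgrading to medium keeps the larger model reserved for explicit `/think` requests.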
@@ -1,905 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Adolf pipeline integration test with end-to-end timing profiling.
|
||||
|
||||
Tests:
|
||||
1. Service health (deepagents, openmemory, grammy MCP SSE)
|
||||
2. GPU Ollama models
|
||||
3. CPU Ollama models
|
||||
4. Qdrant collection + vector dims
|
||||
5. SearXNG
|
||||
6. Name store — "remember that your name is <RandomName>"
|
||||
7. Qdrant point added after store
|
||||
8. Name recall — "what is your name?" → reply contains <RandomName>
|
||||
9. Timing profile + bottleneck report
|
||||
10. Easy benchmark — 10 easy questions → all must route to light
|
||||
11. Medium benchmark — 10 medium questions → must route to medium (or light, never complex)
|
||||
12. Hard benchmark — 10 /think questions → all must route to complex; VRAM flush verified
|
||||
|
||||
Usage:
|
||||
python3 test_pipeline.py [--chat-id CHAT_ID]
|
||||
[--bench-only] skip sections 1-9, run 10+11+12
|
||||
[--easy-only] skip 1-9 and 11+12, run only section 10
|
||||
[--medium-only] skip 1-9 and 10+12, run only section 11
|
||||
[--hard-only] skip 1-9 and 10+11, run only section 12
|
||||
[--no-bench] skip sections 10-12
|
||||
|
||||
Timing is extracted from deepagents container logs, not estimated from sleeps.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import http.client
|
||||
import json
|
||||
import random
|
||||
import re
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
import urllib.request
|
||||
|
||||
# ── config ────────────────────────────────────────────────────────────────────
|
||||
DEEPAGENTS = "http://localhost:8000"
|
||||
OPENMEMORY = "http://localhost:8765"
|
||||
GRAMMY_HOST = "localhost"
|
||||
GRAMMY_PORT = 3001
|
||||
OLLAMA_GPU = "http://localhost:11436"
|
||||
OLLAMA_CPU = "http://localhost:11435"
|
||||
QDRANT = "http://localhost:6333"
|
||||
SEARXNG = "http://localhost:11437"
|
||||
COMPOSE_FILE = "/home/alvis/agap_git/adolf/docker-compose.yml"
|
||||
DEFAULT_CHAT_ID = "346967270"
|
||||
|
||||
NAMES = [
|
||||
"Maximilian", "Cornelius", "Zephyr", "Archibald", "Balthazar",
|
||||
"Ignatius", "Lysander", "Octavian", "Reginald", "Sylvester",
|
||||
]
|
||||
|
||||
# ── benchmark questions ───────────────────────────────────────────────────────
|
||||
BENCHMARK = {
|
||||
"easy": [
|
||||
"hi",
|
||||
"what is 2+2?",
|
||||
"what is the capital of France?",
|
||||
"tell me a short joke",
|
||||
"how are you doing today?",
|
||||
"thanks!",
|
||||
"what day comes after Wednesday?",
|
||||
"name the three primary colors",
|
||||
"is the sky blue?",
|
||||
"what does CPU stand for?",
|
||||
],
|
||||
"medium": [
|
||||
"what is the current weather in Berlin?",
|
||||
"find the latest news about artificial intelligence",
|
||||
"what is the current price of Bitcoin?",
|
||||
"search for a good pasta carbonara recipe",
|
||||
"what movies are in theaters this week?",
|
||||
"find Python tutorials for beginners",
|
||||
"who won the last FIFA World Cup?",
|
||||
"do you remember what we talked about before?",
|
||||
"search for the best coffee shops in Tokyo",
|
||||
"what is happening in the tech industry this week?",
|
||||
],
|
||||
"hard": [
|
||||
"/think compare the top 3 Python web frameworks (Django, FastAPI, Flask) and recommend one for a production REST API",
|
||||
"/think research the history of artificial intelligence and create a timeline of key milestones",
|
||||
"/think plan a 7-day trip to Japan with daily itinerary, accommodation suggestions, and budget breakdown",
|
||||
"/think analyze microservices vs monolithic architecture: pros, cons, and when to choose each",
|
||||
"/think write a Python script that reads a CSV file, cleans the data, and generates summary statistics",
|
||||
"/think research quantum computing: explain the key concepts and how it differs from classical computing",
|
||||
"/think compare PostgreSQL, MongoDB, and Redis — when to use each and what are the trade-offs?",
|
||||
"/think create a comprehensive Docker deployment guide covering best practices for production",
|
||||
"/think research climate change: summarize the latest IPCC findings and key data points",
|
||||
"/think design a REST API with authentication, rate limiting, and proper error handling — provide architecture and code outline",
|
||||
],
|
||||
}
|
||||
|
||||
PASS = "\033[32mPASS\033[0m"
|
||||
FAIL = "\033[31mFAIL\033[0m"
|
||||
INFO = "\033[36mINFO\033[0m"
|
||||
WARN = "\033[33mWARN\033[0m"
|
||||
|
||||
results = []
|
||||
timings = {} # label → float seconds | None
|
||||
|
||||
|
||||
# ── helpers ───────────────────────────────────────────────────────────────────
|
||||
|
||||
def report(name, ok, detail=""):
|
||||
tag = PASS if ok else FAIL
|
||||
print(f" [{tag}] {name}" + (f" — {detail}" if detail else ""))
|
||||
results.append((name, ok))
|
||||
|
||||
|
||||
def tf(v):
|
||||
"""Format timing value."""
|
||||
return f"{v:6.2f}s" if v is not None else " n/a"
|
||||
|
||||
|
||||
def get(url, timeout=5):
|
||||
with urllib.request.urlopen(urllib.request.Request(url), timeout=timeout) as r:
|
||||
return r.status, r.read().decode()
|
||||
|
||||
|
||||
def post_json(url, payload, timeout=10):
|
||||
data = json.dumps(payload).encode()
|
||||
req = urllib.request.Request(url, data=data,
|
||||
headers={"Content-Type": "application/json"},
|
||||
method="POST")
|
||||
with urllib.request.urlopen(req, timeout=timeout) as r:
|
||||
return r.status, json.loads(r.read().decode())
|
||||
|
||||
|
||||
def check_sse(host, port, path):
|
||||
try:
|
||||
conn = http.client.HTTPConnection(host, port, timeout=5)
|
||||
conn.request("GET", path, headers={"Accept": "text/event-stream"})
|
||||
r = conn.getresponse()
|
||||
conn.close()
|
||||
return r.status == 200, f"HTTP {r.status}"
|
||||
except Exception as e:
|
||||
return False, str(e)
|
||||
|
||||
|
||||
def qdrant_count():
|
||||
try:
|
||||
_, body = get(f"{QDRANT}/collections/adolf_memories")
|
||||
return json.loads(body).get("result", {}).get("points_count", 0)
|
||||
except Exception:
|
||||
return 0
|
||||
|
||||
|
||||
def fetch_logs(since_s=600):
|
||||
"""Return deepagents log lines from the last since_s seconds."""
|
||||
try:
|
||||
r = subprocess.run(
|
||||
["docker", "compose", "-f", COMPOSE_FILE, "logs", "deepagents",
|
||||
f"--since={int(since_s)}s", "--no-log-prefix"],
|
||||
capture_output=True, text=True, timeout=15,
|
||||
)
|
||||
return r.stdout.splitlines()
|
||||
except Exception:
|
||||
return []
|
||||
|
||||
|
||||
def parse_run_block(lines, msg_prefix):
|
||||
"""
|
||||
Scan log lines for the LAST '[agent] running: <msg_prefix>' block.
|
||||
Extracts reply timing, tier, and memory timing from that block.
|
||||
|
||||
Returns dict or None if the reply has not appeared in logs yet.
|
||||
Dict keys:
|
||||
reply_total, llm, send, tier, reply_text — from "[agent] replied in ..."
|
||||
memory_s — from "[memory] stored in ..."
|
||||
memory_error — True if "[memory] error" found
|
||||
"""
|
||||
search = msg_prefix[:50]
|
||||
start_idx = None
|
||||
for i, line in enumerate(lines):
|
||||
if "[agent] running:" in line and search in line:
|
||||
start_idx = i # keep updating — we want the LAST occurrence
|
||||
|
||||
if start_idx is None:
|
||||
return None
|
||||
|
||||
block = lines[start_idx:]
|
||||
last_ai_text = None
|
||||
reply_data = None
|
||||
|
||||
for j, line in enumerate(block):
|
||||
# Track last non-tool AIMessage (the final reply)
|
||||
if "AIMessage:" in line and "→" not in line:
|
||||
txt = line.split("AIMessage:", 1)[-1].strip()
|
||||
if txt:
|
||||
last_ai_text = txt
|
||||
|
||||
# For light tier: router reply is stored in _conversation_buffers directly
|
||||
# so there may be no AIMessage log — grab from tier=light line
|
||||
if "[agent] tier=light" in line and "message=" in line:
|
||||
# Extract preview text logged elsewhere if available
|
||||
pass
|
||||
|
||||
m = re.search(r"replied in ([\d.]+)s \(llm=([\d.]+)s, send=([\d.]+)s\)", line)
|
||||
if m:
|
||||
# Extract optional tier tag at end of line
|
||||
tier_m = re.search(r"\btier=(\w+)", line)
|
||||
tier = tier_m.group(1) if tier_m else "unknown"
|
||||
reply_data = {
|
||||
"reply_total": float(m.group(1)),
|
||||
"llm": float(m.group(2)),
|
||||
"send": float(m.group(3)),
|
||||
"tier": tier,
|
||||
"reply_text": last_ai_text,
|
||||
"memory_s": None,
|
||||
"memory_error": False,
|
||||
"_j": j,
|
||||
}
|
||||
break
|
||||
|
||||
if reply_data is None:
|
||||
return None # reply not in logs yet
|
||||
|
||||
# Memory line can appear after the next "[agent] running:" — no stop condition
|
||||
for line in block[reply_data["_j"] + 1:]:
|
||||
mm = re.search(r"\[memory\] stored in ([\d.]+)s", line)
|
||||
if mm:
|
||||
reply_data["memory_s"] = float(mm.group(1))
|
||||
break
|
||||
if "[memory] error" in line:
|
||||
reply_data["memory_error"] = True
|
||||
break
|
||||
|
||||
return reply_data
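The reply-line parsing inside `parse_run_block` can be exercised on its own. The sketch below reuses the two regular expressions from the function; the sample log line is hypothetical (the exact deepagents log format is not shown in this diff), and `parse_reply_line` is an illustrative helper rather than part of the test script.

```python
import re

REPLY_RE = re.compile(r"replied in ([\d.]+)s \(llm=([\d.]+)s, send=([\d.]+)s\)")
TIER_RE = re.compile(r"\btier=(\w+)")


def parse_reply_line(line: str):
    """Return timing fields from one reply log line, or None if it does not match."""
    m = REPLY_RE.search(line)
    if m is None:
        return None
    tier_m = TIER_RE.search(line)
    return {
        "reply_total": float(m.group(1)),
        "llm": float(m.group(2)),
        "send": float(m.group(3)),
        "tier": tier_m.group(1) if tier_m else "unknown",
    }


# Hypothetical log line; the real deepagents output may carry extra fields.
sample = "[agent] replied in 3.42s (llm=3.10s, send=0.32s) tier=light"
print(parse_reply_line(sample))
```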
|
||||
|
||||
|
||||
def wait_for(label, msg_prefix, timeout_s=200, need_memory=True):
|
||||
"""
|
||||
Poll deepagents logs until the message is fully processed.
|
||||
Shows a live progress line.
|
||||
Returns timing dict or None on timeout.
|
||||
"""
|
||||
t_start = time.monotonic()
|
||||
deadline = t_start + timeout_s
|
||||
tick = 0
|
||||
last_result = None
|
||||
|
||||
while time.monotonic() < deadline:
|
||||
# Window grows with elapsed time — never miss a line that appeared late
|
||||
since = int(time.monotonic() - t_start) + 90
|
||||
lines = fetch_logs(since_s=since)
|
||||
result = parse_run_block(lines, msg_prefix)
|
||||
|
||||
if result:
|
||||
last_result = result
|
||||
has_mem = result["memory_s"] is not None or result["memory_error"]
|
||||
if (not need_memory) or has_mem:
|
||||
elapsed = time.monotonic() - t_start
|
||||
print(f"\r [{label}] done after {elapsed:.0f}s{' ' * 30}")
|
||||
return result
|
||||
|
||||
time.sleep(4)
|
||||
tick += 1
|
||||
rem = int(deadline - time.monotonic())
|
||||
if last_result:
|
||||
phase = "waiting for memory..." if need_memory else "done"
|
||||
else:
|
||||
phase = "waiting for LLM reply..."
|
||||
print(f"\r [{label}] {tick*4}s elapsed, {rem}s left — {phase} ", end="", flush=True)
|
||||
|
||||
print(f"\r [{label}] TIMEOUT after {timeout_s}s{' ' * 30}")
|
||||
return None
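`wait_for` is an instance of a poll-until-deadline loop. A stripped-down sketch of the same pattern, with the log fetching replaced by an arbitrary callable; `poll_until` and `ready_on_third_call` are hypothetical names used only for illustration:

```python
import time


def poll_until(check, timeout_s: float, interval_s: float = 0.01):
    """Call check() until it returns a truthy value or the deadline passes.

    Returns the truthy value, or None on timeout. time.monotonic() is used so
    wall-clock adjustments cannot shorten or extend the wait.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = check()
        if result:
            return result
        time.sleep(interval_s)
    return None


attempts = {"n": 0}


def ready_on_third_call():
    """Simulate a reply that shows up in the logs on the third poll."""
    attempts["n"] += 1
    return {"tier": "light"} if attempts["n"] >= 3 else None


print(poll_until(ready_on_third_call, timeout_s=1.0))
```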
|
||||
|
||||
|
||||
# ── args ──────────────────────────────────────────────────────────────────────
|
||||
parser = argparse.ArgumentParser(description="Adolf pipeline test")
|
||||
parser.add_argument("--chat-id", default=DEFAULT_CHAT_ID)
|
||||
parser.add_argument("--bench-only", action="store_true",
|
||||
                    help="Skip sections 1-9, run sections 10-12 (all benchmarks)")
|
||||
parser.add_argument("--easy-only", action="store_true",
|
||||
                    help="Skip sections 1-9 and 11+12, run only section 10 (easy benchmark)")
|
||||
parser.add_argument("--medium-only", action="store_true",
|
||||
                    help="Skip sections 1-9 and 10+12, run only section 11 (medium benchmark)")
|
||||
parser.add_argument("--hard-only", action="store_true",
|
||||
help="Skip sections 1-9 and 10+11, run only section 12 (hard benchmark)")
|
||||
parser.add_argument("--no-bench", action="store_true",
|
||||
help="Skip sections 10-12 (all benchmarks)")
|
||||
args = parser.parse_args()
|
||||
CHAT_ID = args.chat_id
|
||||
|
||||
# Derived flags for readability
|
||||
_skip_pipeline = args.bench_only or args.easy_only or args.medium_only or args.hard_only
|
||||
_run_easy = not args.no_bench and not args.medium_only and not args.hard_only
|
||||
_run_medium = not args.no_bench and not args.easy_only and not args.hard_only
|
||||
_run_hard = not args.no_bench and not args.easy_only and not args.medium_only
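The derived flags are easier to reason about as a pure function of the command-line switches. A minimal sketch that mirrors the four expressions above; `select_sections` is a hypothetical helper, not part of the script:

```python
def select_sections(bench_only=False, easy_only=False, medium_only=False,
                    hard_only=False, no_bench=False):
    """Mirror the derived flags: which groups of sections would run."""
    skip_pipeline = bench_only or easy_only or medium_only or hard_only
    return {
        "pipeline": not skip_pipeline,  # sections 1-9
        "easy": not no_bench and not medium_only and not hard_only,    # section 10
        "medium": not no_bench and not easy_only and not hard_only,    # section 11
        "hard": not no_bench and not easy_only and not medium_only,    # section 12
    }


print(select_sections(easy_only=True))
print(select_sections(easy_only=True, hard_only=True))
```

One consequence of this logic is that combining two `--*-only` switches runs no benchmark at all, since each one disables the other two.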
|
||||
|
||||
random_name = random.choice(NAMES)
|
||||
|
||||
if not _skip_pipeline:
|
||||
print(f"\n Test name : \033[1m{random_name}\033[0m")
|
||||
print(f" Chat ID : {CHAT_ID}")
|
||||
|
||||
|
||||
# ── 1. service health ─────────────────────────────────────────────────────────
|
||||
if not _skip_pipeline:
|
||||
print(f"\n[{INFO}] 1. Service health")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{DEEPAGENTS}/health")
|
||||
data = json.loads(body)
|
||||
ok = status == 200 and data.get("agent_ready") is True
|
||||
report("deepagents /health — agent_ready", ok, f"agent_ready={data.get('agent_ready')}")
|
||||
except Exception as e:
|
||||
report("deepagents /health", False, str(e))
|
||||
|
||||
ok, detail = check_sse("localhost", 8765, "/sse")
|
||||
report("openmemory /sse reachable", ok, detail)
|
||||
|
||||
ok, detail = check_sse(GRAMMY_HOST, GRAMMY_PORT, "/sse")
|
||||
report("grammy /sse reachable", ok, detail)
|
||||
|
||||
timings["health_check"] = time.monotonic() - t0
|
||||
|
||||
|
||||
# ── 2. GPU Ollama ─────────────────────────────────────────────────────────────
|
||||
if not _skip_pipeline:
|
||||
print(f"\n[{INFO}] 2. GPU Ollama (port 11436)")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{OLLAMA_GPU}/api/tags")
|
||||
models = [m["name"] for m in json.loads(body).get("models", [])]
|
||||
        has_qwen = any("qwen3:8b" in m for m in models)
|
||||
report("GPU Ollama reachable", True, f"models: {models}")
|
||||
report("qwen3:8b present", has_qwen)
|
||||
except Exception as e:
|
||||
report("GPU Ollama reachable", False, str(e))
|
||||
report("qwen3:8b present", False, "skipped")
|
||||
|
||||
timings["gpu_ollama_ping"] = time.monotonic() - t0
|
||||
|
||||
|
||||
# ── 3. CPU Ollama ─────────────────────────────────────────────────────────────
|
||||
if not _skip_pipeline:
|
||||
print(f"\n[{INFO}] 3. CPU Ollama (port 11435)")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{OLLAMA_CPU}/api/tags")
|
||||
models = [m["name"] for m in json.loads(body).get("models", [])]
|
||||
has_embed = any("nomic-embed-text" in m for m in models)
|
||||
report("CPU Ollama reachable", True, f"models: {models}")
|
||||
report("nomic-embed-text present", has_embed)
|
||||
except Exception as e:
|
||||
report("CPU Ollama reachable", False, str(e))
|
||||
report("nomic-embed-text present", False, "skipped")
|
||||
|
||||
timings["cpu_ollama_ping"] = time.monotonic() - t0
|
||||
|
||||
|
||||
# ── 4. Qdrant ─────────────────────────────────────────────────────────────────
|
||||
if not _skip_pipeline:
|
||||
print(f"\n[{INFO}] 4. Qdrant (port 6333)")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{QDRANT}/collections")
|
||||
cols = [c["name"] for c in json.loads(body).get("result", {}).get("collections", [])]
|
||||
report("Qdrant reachable", True, f"collections: {cols}")
|
||||
report("adolf_memories collection exists", "adolf_memories" in cols)
|
||||
except Exception as e:
|
||||
report("Qdrant reachable", False, str(e))
|
||||
report("adolf_memories collection exists", False, "skipped")
|
||||
|
||||
try:
|
||||
status, body = get(f"{QDRANT}/collections/adolf_memories")
|
||||
info = json.loads(body).get("result", {})
|
||||
dims = info.get("config", {}).get("params", {}).get("vectors", {}).get("size")
|
||||
report("vector dims = 768", dims == 768, f"got {dims}")
|
||||
except Exception as e:
|
||||
report("adolf_memories collection info", False, str(e))
|
||||
|
||||
timings["qdrant_ping"] = time.monotonic() - t0
|
||||
|
||||
|
||||
# ── 5. SearXNG ────────────────────────────────────────────────────────────────
|
||||
if not _skip_pipeline:
|
||||
print(f"\n[{INFO}] 5. SearXNG (port 11437)")
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
status, body = get(f"{SEARXNG}/search?q=test&format=json", timeout=15)
|
||||
elapsed = time.monotonic() - t0
|
||||
n = len(json.loads(body).get("results", []))
|
||||
report("SearXNG reachable + JSON results", status == 200 and n > 0, f"{n} results in {elapsed:.1f}s")
|
||||
report("SearXNG response < 5s", elapsed < 5, f"{elapsed:.2f}s")
|
||||
timings["searxng_latency"] = elapsed
|
||||
except Exception as e:
|
||||
report("SearXNG reachable", False, str(e))
|
||||
report("SearXNG response < 5s", False, "skipped")
|
||||
timings["searxng_latency"] = None
|
||||
|
||||
timings["searxng_check"] = time.monotonic() - t0
|
||||
|
||||
|
||||
# ── 6–8. Name memory pipeline ─────────────────────────────────────────────────
|
||||
if not _skip_pipeline:
|
||||
print(f"\n[{INFO}] 6–8. Name memory pipeline")
|
||||
print(f" chat_id={CHAT_ID} name={random_name}")
|
||||
|
||||
store_msg = f"remember that your name is {random_name}"
|
||||
recall_msg = "what is your name?"
|
||||
|
||||
pts_before = qdrant_count()
|
||||
print(f" Qdrant points before: {pts_before}")
|
||||
|
||||
# ── 6. Send store message ─────────────────────────────────────────────────────
|
||||
print(f"\n [store] '{store_msg}'")
|
||||
t_store = time.monotonic()
|
||||
|
||||
try:
|
||||
status, _ = post_json(f"{DEEPAGENTS}/chat",
|
||||
{"message": store_msg, "chat_id": CHAT_ID}, timeout=5)
|
||||
t_accept = time.monotonic() - t_store
|
||||
report("POST /chat (store) returns 202 immediately",
|
||||
status == 202 and t_accept < 1, f"status={status}, t={t_accept:.3f}s")
|
||||
timings["store_http_accept"] = t_accept
|
||||
except Exception as e:
|
||||
report("POST /chat (store)", False, str(e))
|
||||
sys.exit(1)
|
||||
|
||||
store = wait_for("store", store_msg, timeout_s=220, need_memory=True)
|
||||
|
||||
if store:
|
||||
timings["store_llm"] = store["llm"]
|
||||
timings["store_send"] = store["send"]
|
||||
timings["store_reply"] = store["reply_total"]
|
||||
timings["store_memory"] = store["memory_s"]
|
||||
report("Agent replied to store message", True,
|
||||
f"{store['reply_total']:.1f}s total llm={store['llm']:.1f}s send={store['send']:.1f}s tier={store['tier']}")
|
||||
if store["memory_s"] is not None:
|
||||
report("Memory stored without error", True, f"{store['memory_s']:.1f}s")
|
||||
elif store["memory_error"]:
|
||||
report("Memory stored without error", False, "error in [memory] log")
|
||||
else:
|
||||
report("Memory stored without error", False, "not found in logs (still running?)")
|
||||
print(f" Store reply: {store['reply_text']!r}")
|
||||
else:
|
||||
report("Agent replied to store message", False, "timeout")
|
||||
report("Memory stored without error", False, "timeout")
|
||||
sys.exit(1)
|
||||
|
||||
# ── 7. Verify Qdrant ──────────────────────────────────────────────────────────
|
||||
pts_after = qdrant_count()
|
||||
new_pts = pts_after - pts_before
|
||||
report("New memory point(s) added to Qdrant", new_pts > 0,
|
||||
f"{pts_before} → {pts_after} (+{new_pts})")
|
||||
timings["qdrant_new_points"] = new_pts
|
||||
|
||||
# ── 8. Send recall message ────────────────────────────────────────────────────
|
||||
print(f"\n [recall] '{recall_msg}'")
|
||||
t_recall = time.monotonic()
|
||||
|
||||
try:
|
||||
status, _ = post_json(f"{DEEPAGENTS}/chat",
|
||||
{"message": recall_msg, "chat_id": CHAT_ID}, timeout=5)
|
||||
t_accept2 = time.monotonic() - t_recall
|
||||
report("POST /chat (recall) returns 202 immediately",
|
||||
status == 202 and t_accept2 < 1, f"status={status}, t={t_accept2:.3f}s")
|
||||
timings["recall_http_accept"] = t_accept2
|
||||
except Exception as e:
|
||||
report("POST /chat (recall)", False, str(e))
|
||||
|
||||
recall = wait_for("recall", recall_msg, timeout_s=160, need_memory=False)
|
||||
|
||||
if recall:
|
||||
timings["recall_llm"] = recall["llm"]
|
||||
timings["recall_send"] = recall["send"]
|
||||
timings["recall_reply"] = recall["reply_total"]
|
||||
report("Agent replied to recall message", True,
|
||||
f"{recall['reply_total']:.1f}s total llm={recall['llm']:.1f}s send={recall['send']:.1f}s tier={recall['tier']}")
|
||||
reply_text = recall["reply_text"] or ""
|
||||
name_in_reply = random_name.lower() in reply_text.lower()
|
||||
report(f"Reply contains '{random_name}'", name_in_reply,
|
||||
f"reply: {reply_text[:120]!r}")
|
||||
else:
|
||||
report("Agent replied to recall message", False, "timeout")
|
||||
report(f"Reply contains '{random_name}'", False, "no reply")
|
||||
|
||||
|
||||
# ── 9. Timing profile ─────────────────────────────────────────────────────────
|
||||
if not _skip_pipeline:
|
||||
print(f"\n[{INFO}] 9. Timing profile")
|
||||
|
||||
W = 36
|
||||
|
||||
print(f"\n {'Stage':<{W}} {'Time':>8}")
|
||||
print(f" {'─'*W} {'─'*8}")
|
||||
|
||||
rows_store = [
|
||||
("[GPU] HTTP accept — store turn", "store_http_accept"),
|
||||
("[GPU] qwen3:Xb inference — store turn","store_llm"),
|
||||
("[GPU] Telegram send — store turn", "store_send"),
|
||||
("[GPU] Total reply latency — store", "store_reply"),
|
||||
("[GPU] qwen2.5:1.5b+embed — async mem", "store_memory"),
|
||||
]
|
||||
rows_recall = [
|
||||
("[GPU] HTTP accept — recall turn", "recall_http_accept"),
|
||||
("[GPU] qwen3:Xb inference — recall", "recall_llm"),
|
||||
("[GPU] Telegram send — recall turn", "recall_send"),
|
||||
("[GPU] Total reply latency — recall", "recall_reply"),
|
||||
]
|
||||
|
||||
for label, key in rows_store:
|
||||
v = timings.get(key)
|
||||
print(f" {label:<{W}} {tf(v):>8}")
|
||||
|
||||
print(f" {'─'*W} {'─'*8}")
|
||||
|
||||
for label, key in rows_recall:
|
||||
v = timings.get(key)
|
||||
print(f" {label:<{W}} {tf(v):>8}")
|
||||
|
||||
# Bottleneck bar chart
|
||||
print(f"\n Bottleneck analysis (each █ ≈ 5s):")
|
||||
print(f" {'─'*(W+12)}")
|
||||
|
||||
candidates = [
|
||||
("[GPU] qwen3:Xb — store reply ", timings.get("store_llm") or 0),
|
||||
("[GPU] qwen3:Xb — recall reply", timings.get("recall_llm") or 0),
|
||||
("[GPU] qwen2.5:1.5b+embed (async)", timings.get("store_memory") or 0),
|
||||
("[net] SearXNG ", timings.get("searxng_latency") or 0),
|
||||
]
|
||||
candidates.sort(key=lambda x: x[1], reverse=True)
|
||||
|
||||
for label, t in candidates:
|
||||
bar = "█" * min(int(t / 5), 24)
|
||||
pct = ""
|
||||
total_pipeline = (timings.get("store_reply") or 0) + (timings.get("store_memory") or 0)
|
||||
if total_pipeline > 0:
|
||||
pct = f" {t/total_pipeline*100:4.0f}%"
|
||||
print(f" {label} {t:6.1f}s {bar}{pct}")
|
||||
|
||||
print()
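The bar in the bottleneck chart uses one block per full 5 seconds, capped at 24 blocks, and each percentage is relative to the store reply time plus the memory time, so the percentages need not sum to 100. A small sketch of the bar formula; `bottleneck_bar` is a hypothetical helper name:

```python
def bottleneck_bar(seconds: float, seconds_per_block: float = 5.0,
                   max_blocks: int = 24) -> str:
    """One block per full seconds_per_block, capped at max_blocks."""
    return "█" * min(int(seconds / seconds_per_block), max_blocks)


for t in (4.9, 12.4, 500.0):
    print(f"{t:6.1f}s {bottleneck_bar(t)}")
```

Because `int()` truncates, anything under 5 seconds renders as an empty bar, which is worth keeping in mind when reading the chart for fast stages.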
|
||||
|
||||
|
||||
# ── 10. Tier routing benchmark — easy questions → light path ──────────────────
|
||||
if _run_easy:
|
||||
print(f"\n[{INFO}] 10. Tier routing benchmark")
|
||||
print(f" Sending {len(BENCHMARK['easy'])} easy questions — all must route to 'light'")
|
||||
print(f" Chat ID: {CHAT_ID}")
|
||||
print()
|
||||
|
||||
bench_results = [] # list of (question, tier, latency_s, ok)
|
||||
LIGHT_TIMEOUT = 60 # seconds — light is fast but may queue behind prior messages
|
||||
|
||||
for i, question in enumerate(BENCHMARK["easy"], 1):
|
||||
tag = f"easy-{i:02d}"
|
||||
short_q = question[:55]
|
||||
print(f" [{tag}] {short_q!r}")
|
||||
|
||||
# Send
|
||||
t_send = time.monotonic()
|
||||
try:
|
||||
status, _ = post_json(f"{DEEPAGENTS}/chat",
|
||||
{"message": question, "chat_id": CHAT_ID}, timeout=5)
|
||||
if status != 202:
|
||||
print(f" → [{FAIL}] POST returned {status}")
|
||||
bench_results.append((question, "?", None, False))
|
||||
continue
|
||||
except Exception as e:
|
||||
print(f" → [{FAIL}] POST error: {e}")
|
||||
bench_results.append((question, "?", None, False))
|
||||
continue
|
||||
|
||||
# Poll for reply
|
||||
t_start = time.monotonic()
|
||||
found = None
|
||||
while time.monotonic() - t_start < LIGHT_TIMEOUT:
|
||||
since = int(time.monotonic() - t_start) + 30
|
||||
lines = fetch_logs(since_s=since)
|
||||
found = parse_run_block(lines, question)
|
||||
if found:
|
||||
break
|
||||
time.sleep(1)
|
||||
|
||||
elapsed = time.monotonic() - t_send
|
||||
|
||||
if not found:
|
||||
print(f" → [{FAIL}] no reply within {LIGHT_TIMEOUT}s")
|
||||
bench_results.append((question, "timeout", None, False))
|
||||
continue
|
||||
|
||||
tier = found.get("tier", "unknown")
|
||||
is_light = (tier == "light")
|
||||
tag_str = PASS if is_light else FAIL
|
||||
print(f" → [{tag_str}] tier={tier} latency={found['reply_total']:.1f}s llm={found['llm']:.1f}s")
|
||||
bench_results.append((question, tier, found["reply_total"], is_light))
|
||||
|
||||
# Brief pause between questions to keep logs clean
|
||||
time.sleep(1)
|
||||
|
||||
# Summary table
|
||||
print(f"\n {'#':<4} {'Tier':<8} {'Latency':>8} {'Question'}")
|
||||
print(f" {'─'*4} {'─'*8} {'─'*8} {'─'*50}")
|
||||
for idx, (q, tier, lat, ok) in enumerate(bench_results, 1):
|
||||
lat_str = f"{lat:.1f}s" if lat is not None else "timeout"
|
||||
ok_str = "✓" if ok else "✗"
|
||||
print(f" {ok_str} {idx:<3} {tier:<8} {lat_str:>8} {q[:50]!r}")
|
||||
|
||||
light_count = sum(1 for _, _, _, ok in bench_results if ok)
|
||||
total_bench = len(bench_results)
|
||||
lats = [lat for _, _, lat, ok in bench_results if ok and lat is not None]
|
||||
avg_lat = sum(lats) / len(lats) if lats else 0
|
||||
|
||||
print(f"\n Light-path score: {light_count}/{total_bench}")
|
||||
if lats:
|
||||
print(f" Avg latency (light): {avg_lat:.1f}s "
|
||||
f"min={min(lats):.1f}s max={max(lats):.1f}s")
|
||||
|
||||
report(f"All easy questions routed to light ({light_count}/{total_bench})",
|
||||
light_count == total_bench,
|
||||
f"{light_count}/{total_bench} via light path, avg {avg_lat:.1f}s")
|
||||
|
||||
|
||||
# ── 11. Medium benchmark — medium questions → medium or light, never complex ──
|
||||
if _run_medium:
|
||||
print(f"\n[{INFO}] 11. Medium routing benchmark")
|
||||
print(f" Sending {len(BENCHMARK['medium'])} medium questions")
|
||||
print(f" Expected: tier=medium (needs tools). Light is acceptable for factual questions.")
|
||||
print(f" Fail condition: tier=complex or timeout.")
|
||||
print(f" Chat ID: {CHAT_ID}")
|
||||
print()
|
||||
|
||||
# Questions where light is a valid alternative (model may know from training data)
|
||||
LIGHT_ACCEPTABLE = {
|
||||
"who won the last FIFA World Cup?",
|
||||
"search for a good pasta carbonara recipe",
|
||||
"find Python tutorials for beginners",
|
||||
"search for the best coffee shops in Tokyo",
|
||||
}
|
||||
|
||||
med_results = [] # list of (question, tier, latency_s, correct)
|
||||
MEDIUM_TIMEOUT = 120 # seconds — medium takes 20-100s, allow for queue buildup
|
||||
|
||||
for i, question in enumerate(BENCHMARK["medium"], 1):
|
||||
tag = f"med-{i:02d}"
|
||||
short_q = question[:60]
|
||||
print(f" [{tag}] {short_q!r}")
|
||||
|
||||
# Send
|
||||
t_send = time.monotonic()
|
||||
try:
|
||||
status, _ = post_json(f"{DEEPAGENTS}/chat",
|
||||
{"message": question, "chat_id": CHAT_ID}, timeout=5)
|
||||
if status != 202:
|
||||
print(f" → [{FAIL}] POST returned {status}")
|
||||
med_results.append((question, "?", None, False))
|
||||
continue
|
||||
except Exception as e:
|
||||
print(f" → [{FAIL}] POST error: {e}")
|
||||
med_results.append((question, "?", None, False))
|
||||
continue
|
||||
|
||||
# Poll for reply
|
||||
t_start = time.monotonic()
|
||||
found = None
|
||||
while time.monotonic() - t_start < MEDIUM_TIMEOUT:
|
||||
since = int(time.monotonic() - t_start) + 60
|
||||
lines = fetch_logs(since_s=since)
|
||||
found = parse_run_block(lines, question)
|
||||
if found:
|
||||
break
|
||||
time.sleep(3)
|
||||
|
||||
elapsed = time.monotonic() - t_send
|
||||
|
||||
if not found:
|
||||
print(f" → [{FAIL}] no reply within {MEDIUM_TIMEOUT}s")
|
||||
med_results.append((question, "timeout", None, False))
|
||||
continue
|
||||
|
||||
tier = found.get("tier", "unknown")
|
||||
light_ok = question in LIGHT_ACCEPTABLE
|
||||
|
||||
if tier == "medium":
|
||||
correct = True
|
||||
label = PASS
|
||||
note = "medium ✓"
|
||||
elif tier == "light":
|
||||
correct = light_ok # light is only acceptable for certain questions
|
||||
label = WARN if not light_ok else PASS
|
||||
note = "light (acceptable)" if light_ok else "light (should be medium)"
|
||||
elif tier == "complex":
|
||||
correct = False
|
||||
label = FAIL
|
||||
note = "complex — wrong escalation"
|
||||
else:
|
||||
correct = False
|
||||
label = FAIL
|
||||
note = f"unknown tier {tier!r}"
|
||||
|
||||
print(f" → [{label}] {note} latency={found['reply_total']:.1f}s llm={found['llm']:.1f}s")
|
||||
med_results.append((question, tier, found["reply_total"], correct))
|
||||
|
||||
# Brief pause between questions
|
||||
time.sleep(1)
|
||||
|
||||
# Summary table
|
||||
print(f"\n {'#':<4} {'Tier':<8} {'Latency':>8} {'Question'}")
|
||||
print(f" {'─'*4} {'─'*8} {'─'*8} {'─'*55}")
|
||||
for idx, (q, tier, lat, ok) in enumerate(med_results, 1):
|
||||
lat_str = f"{lat:.1f}s" if lat is not None else "timeout"
|
||||
ok_str = "✓" if ok else ("~" if tier == "light" else "✗")
|
||||
print(f" {ok_str} {idx:<3} {tier:<8} {lat_str:>8} {q[:55]!r}")
|
||||
|
||||
total_med = len(med_results)
|
||||
medium_count = sum(1 for _, tier, _, _ in med_results if tier == "medium")
|
||||
light_count = sum(1 for _, tier, _, _ in med_results if tier == "light")
|
||||
complex_count = sum(1 for _, tier, _, _ in med_results if tier == "complex")
|
||||
timeout_count = sum(1 for _, tier, _, _ in med_results if tier == "timeout")
|
||||
light_misroute = sum(
|
||||
1 for q, tier, _, _ in med_results
|
||||
if tier == "light" and q not in LIGHT_ACCEPTABLE
|
||||
)
|
||||
lats = [lat for _, _, lat, _ in med_results if lat is not None]
|
||||
correct_count = medium_count + (light_count - light_misroute)
|
||||
|
||||
print(f"\n Breakdown: medium={medium_count} light={light_count} complex={complex_count} timeout={timeout_count}")
|
||||
if light_misroute:
|
||||
print(f" [{WARN}] {light_misroute} question(s) answered via light when medium expected (check reply quality)")
|
||||
if lats:
|
||||
print(f" Avg latency: {sum(lats)/len(lats):.1f}s min={min(lats):.1f}s max={max(lats):.1f}s")
|
||||
|
||||
no_complex = complex_count == 0
|
||||
no_timeout = timeout_count == 0
|
||||
all_ok = no_complex and no_timeout
|
||||
|
||||
report(
|
||||
f"Medium questions: no complex escalation ({medium_count + light_count}/{total_med} routed)",
|
||||
no_complex,
|
||||
f"medium={medium_count} light={light_count} complex={complex_count} timeout={timeout_count}",
|
||||
)
|
||||
if not no_timeout:
|
||||
report(
|
||||
f"Medium questions: all completed within {MEDIUM_TIMEOUT}s",
|
||||
False,
|
||||
f"{timeout_count} question(s) timed out (increase MEDIUM_TIMEOUT or check agent logs)",
|
||||
)
|
||||
|
||||
|
||||
# ── 12. Hard benchmark — /think questions → complex tier + VRAM flush verified ─
if _run_hard:
    print(f"\n[{INFO}] 12. Hard routing benchmark")
    print(f" Sending {len(BENCHMARK['hard'])} /think questions — all must route to 'complex'")
    print(f" Verifies: /think prefix → force_complex=True → VRAM flush → qwen3:8b inference")
    print(f" Acceptable fallback: 'medium' if VRAM eviction timed out (logged warning)")
    print(f" Fail condition: tier=light or timeout")
    print(f" Chat ID: {CHAT_ID}")
    print()

    hard_results = [] # list of (question, tier, latency_s, ok)
    COMPLEX_TIMEOUT = 300 # seconds — complex takes 60-180s + VRAM flush overhead

    # Log markers we expect to see for complex path
    _VRAM_ENTER = "[vram] enter_complex_mode"
    _VRAM_EXIT = "[vram] exit_complex_mode"

    for i, question in enumerate(BENCHMARK["hard"], 1):
        tag = f"hard-{i:02d}"
        # Strip /think prefix for display
        short_q = question[len("/think "):].strip()[:60]
        print(f" [{tag}] /think {short_q!r}")

        # Snapshot log window start time
        t_send = time.monotonic()
        try:
            status, _ = post_json(f"{DEEPAGENTS}/chat",
                                  {"message": question, "chat_id": CHAT_ID}, timeout=5)
            if status != 202:
                print(f" → [{FAIL}] POST returned {status}")
                hard_results.append((question, "?", None, False))
                continue
        except Exception as e:
            print(f" → [{FAIL}] POST error: {e}")
            hard_results.append((question, "?", None, False))
            continue

        # Poll for reply
        t_start = time.monotonic()
        found = None
        while time.monotonic() - t_start < COMPLEX_TIMEOUT:
            since = int(time.monotonic() - t_start) + 90
            lines = fetch_logs(since_s=since)
            found = parse_run_block(lines, question[len("/think "):].strip())
            if found:
                break
            time.sleep(5)

        elapsed = time.monotonic() - t_send

        if not found:
            print(f" → [{FAIL}] no reply within {COMPLEX_TIMEOUT}s")
            hard_results.append((question, "timeout", None, False))
            continue

        tier = found.get("tier", "unknown")

        if tier == "complex":
            ok = True
            label = PASS
            note = "complex ✓"
        elif tier == "medium":
            # Acceptable fallback if VRAM eviction timed out
            ok = True
            label = WARN
            note = "medium (VRAM fallback — check [vram] logs)"
        else:
            ok = False
            label = FAIL
            note = f"tier={tier} — unexpected"

        # Check the recent log window for the VRAM enter/exit markers
        lines_block = fetch_logs(since_s=int(elapsed) + 120)
        recent = "\n".join(lines_block[-200:])
        vram_enter_seen = _VRAM_ENTER in recent
        vram_exit_seen = _VRAM_EXIT in recent

        vram_note = ""
        if tier == "complex":
            if vram_enter_seen:
                vram_note = " [vram:flush✓]"
            else:
                vram_note = f" [{WARN}:no vram flush log]"

        print(f" → [{label}] {note} latency={found['reply_total']:.1f}s llm={found['llm']:.1f}s{vram_note}")
        hard_results.append((question, tier, found["reply_total"], ok))

        # Pause to let exit_complex_mode background task complete before next question
        # (flushes qwen3:8b and pre-warms 4b+router — avoids VRAM conflict on next enter)
        time.sleep(5)

    # Summary table
    print(f"\n {'#':<4} {'Tier':<8} {'Latency':>8} {'Question (/think ...)'}")
    print(f" {'─'*4} {'─'*8} {'─'*8} {'─'*55}")
    for idx, (q, tier, lat, ok) in enumerate(hard_results, 1):
        lat_str = f"{lat:.1f}s" if lat is not None else "timeout"
        ok_str = "✓" if tier == "complex" else ("~" if tier == "medium" else "✗")
        short = q[len("/think "):].strip()[:55]
        print(f" {ok_str} {idx:<3} {tier:<8} {lat_str:>8} {short!r}")

    total_hard = len(hard_results)
    complex_count = sum(1 for _, t, _, _ in hard_results if t == "complex")
    medium_fb = sum(1 for _, t, _, _ in hard_results if t == "medium")
    light_count = sum(1 for _, t, _, _ in hard_results if t == "light")
    timeout_count = sum(1 for _, t, _, _ in hard_results if t == "timeout")
    lats = [lat for _, _, lat, _ in hard_results if lat is not None]

    print(f"\n Breakdown: complex={complex_count} medium(fallback)={medium_fb} light={light_count} timeout={timeout_count}")
    if medium_fb:
        print(f" [{WARN}] {medium_fb} question(s) fell back to medium (VRAM eviction timeout)")
    if light_count:
        print(f" [{FAIL}] {light_count} question(s) routed to light — /think prefix not detected")
    if lats:
        print(f" Avg latency: {sum(lats)/len(lats):.1f}s min={min(lats):.1f}s max={max(lats):.1f}s")

    no_light = light_count == 0
    no_timeout = timeout_count == 0

    report(
        f"Hard questions routed to complex (not light) ({complex_count + medium_fb}/{total_hard})",
        no_light and no_timeout,
        f"complex={complex_count} medium_fallback={medium_fb} light={light_count} timeout={timeout_count}",
    )
|
||||
|
||||
|
||||
# ── summary ───────────────────────────────────────────────────────────────────
|
||||
print(f"\n{'─'*55}")
|
||||
total = len(results)
|
||||
passed = sum(1 for _, ok in results if ok)
|
||||
failed = total - passed
|
||||
print(f"Results: {passed}/{total} passed", end="")
|
||||
if failed:
|
||||
print(f" ({failed} failed)\n")
|
||||
print("Failed checks:")
|
||||
for name, ok in results:
|
||||
if not ok:
|
||||
print(f" - {name}")
|
||||
else:
|
||||
print(" — all good")
|
||||
print()
|
||||
|
||||
# Print benchmark reference
|
||||
print(f"[{INFO}] Benchmark questions reference:")
|
||||
for tier_name, questions in BENCHMARK.items():
|
||||
print(f"\n {tier_name.upper()} ({len(questions)} questions):")
|
||||
for j, q in enumerate(questions, 1):
|
||||
print(f" {j:2d}. {q}")
|
||||
print()
|
||||
@@ -1,71 +0,0 @@
|
||||
import asyncio
import os
import httpx

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")


class VRAMManager:
    MEDIUM_MODELS = ["qwen3:4b", "qwen2.5:1.5b"]
    COMPLEX_MODEL = "qwen3:8b"

    def __init__(self, base_url: str = OLLAMA_BASE_URL):
        self.base_url = base_url

    async def enter_complex_mode(self) -> bool:
        """Flush medium models before loading 8b. Returns False if eviction timed out."""
        print("[vram] enter_complex_mode: flushing medium models", flush=True)
        await asyncio.gather(*[self._flush(m) for m in self.MEDIUM_MODELS])
        ok = await self._poll_evicted(self.MEDIUM_MODELS, timeout=15)
        if ok:
            print("[vram] enter_complex_mode: eviction confirmed, loading qwen3:8b", flush=True)
        else:
            print("[vram] enter_complex_mode: eviction timeout — falling back to medium", flush=True)
        return ok

    async def exit_complex_mode(self):
        """Flush 8b and pre-warm medium models. Run as background task after complex reply."""
        print("[vram] exit_complex_mode: flushing qwen3:8b", flush=True)
        await self._flush(self.COMPLEX_MODEL)
        print("[vram] exit_complex_mode: pre-warming medium models", flush=True)
        await asyncio.gather(*[self._prewarm(m) for m in self.MEDIUM_MODELS])
        print("[vram] exit_complex_mode: done", flush=True)

    async def _flush(self, model: str):
        """Send keep_alive=0 to force immediate unload from VRAM."""
        try:
            async with httpx.AsyncClient(timeout=10.0) as client:
                await client.post(
                    f"{self.base_url}/api/generate",
                    json={"model": model, "prompt": "", "keep_alive": 0},
                )
        except Exception as e:
            print(f"[vram] flush {model} error: {e}", flush=True)

    async def _poll_evicted(self, models: list[str], timeout: float) -> bool:
        """Poll /api/ps until none of the given models appear (or timeout)."""
        deadline = asyncio.get_event_loop().time() + timeout
        while asyncio.get_event_loop().time() < deadline:
            try:
                async with httpx.AsyncClient(timeout=5.0) as client:
                    resp = await client.get(f"{self.base_url}/api/ps")
                    data = resp.json()
                    loaded = {m.get("name", "") for m in data.get("models", [])}
                    if not any(m in loaded for m in models):
                        return True
            except Exception as e:
                print(f"[vram] poll_evicted error: {e}", flush=True)
            await asyncio.sleep(0.5)
        return False

    async def _prewarm(self, model: str):
        """Load model into VRAM with keep_alive=300 (5 min)."""
        try:
            async with httpx.AsyncClient(timeout=60.0) as client:
                await client.post(
                    f"{self.base_url}/api/generate",
                    json={"model": model, "prompt": "", "keep_alive": 300},
                )
                print(f"[vram] pre-warmed {model}", flush=True)
        except Exception as e:
            print(f"[vram] prewarm {model} error: {e}", flush=True)
|
||||
191
haos/CLAUDE.md
Normal file
@@ -0,0 +1,191 @@
|
||||
# Home Assistant REST API
|
||||
|
||||
## Connection
|
||||
|
||||
- **Base URL**: `http://<HA_IP>:8123/api/`
|
||||
- **Auth header**: `Authorization: Bearer <TOKEN>`
|
||||
- **Token**: Generate at `http://<HA_IP>:8123/profile` → Long-Lived Access Tokens
|
||||
- **Response format**: JSON (except `/api/error_log` which is plaintext)
|
||||
|
||||
Store token in env var, never hardcode:
|
||||
```bash
export HA_TOKEN="your_token_here"
export HA_URL="http://<HA_IP>:8123"
```
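
A minimal Python sketch of the same rule, assuming the `HA_TOKEN` and `HA_URL` variables exported above. `load_ha_settings` is a hypothetical helper name used only for illustration; it is not part of Home Assistant.

```python
import os


def load_ha_settings(env=None):
    """Return (api_base_url, headers) built from HA_URL and HA_TOKEN.

    Hypothetical helper for illustration; not a Home Assistant API.
    """
    env = os.environ if env is None else env
    token = env.get("HA_TOKEN")
    url = env.get("HA_URL")
    if not token or not url:
        # Fail early rather than falling back to a hardcoded value.
        raise RuntimeError("HA_TOKEN and HA_URL must be set in the environment")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return url.rstrip("/") + "/api/", headers
```

Passing a plain dict as `env` keeps the helper easy to exercise without touching the real environment.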
|
||||
|
||||
## Status Codes
|
||||
|
||||
| Code | Meaning |
|------|---------|
| 200 | Success (existing resource) |
| 201 | Created (new resource) |
| 400 | Bad request |
| 401 | Unauthorized |
| 404 | Not found |
| 405 | Method not allowed |
|
||||
|
||||
## GET Endpoints
|
||||
|
||||
```bash
# Health check
GET /api/

# Current HA configuration
GET /api/config

# Loaded components
GET /api/components

# All entity states
GET /api/states

# Specific entity state
GET /api/states/<entity_id>

# Available services
GET /api/services

# Available events
GET /api/events

# Error log (plaintext)
GET /api/error_log

# Camera image
GET /api/camera_proxy/<camera_entity_id>

# All calendar entities
GET /api/calendars

# Calendar events (start and end are required ISO timestamps)
GET /api/calendars/<calendar_entity_id>?start=<ISO>&end=<ISO>

# Historical state changes
GET /api/history/period/<ISO_timestamp>?filter_entity_id=<entity_id>
# Optional params: end_time, minimal_response, no_attributes, significant_changes_only

# Logbook entries
GET /api/logbook/<ISO_timestamp>
# Optional params: entity=<entity_id>, end_time=<ISO>
```
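
The history endpoint takes a timestamp in the path and filters in the query string, which is easy to get wrong when the timestamp carries a `+hh:mm` offset. The sketch below shows one way to build such a URL with the standard library; `history_url` is a hypothetical helper, and the host name in the usage line is a placeholder.

```python
from urllib.parse import quote, urlencode


def history_url(base_url, start_iso, entity_id, end_iso=None):
    """Build a /api/history/period URL (hypothetical helper, for illustration)."""
    # Percent-encode the timestamp so a "+" in the UTC offset is not read as a space.
    path = f"/api/history/period/{quote(start_iso, safe=':')}"
    params = {"filter_entity_id": entity_id}
    if end_iso is not None:
        params["end_time"] = end_iso
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


url = history_url("http://ha.local:8123", "2024-01-01T00:00:00+00:00", "sensor.temperature")
```

The same pattern should carry over to `/api/logbook/<ISO_timestamp>`, with `entity` in place of `filter_entity_id`.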
|
||||
|
||||
## POST Endpoints
|
||||
|
||||
```bash
# Create or update entity state (virtual, not device)
POST /api/states/<entity_id>
{"state": "on", "attributes": {"brightness": 255}}

# Fire an event
POST /api/events/<event_type>
{"optional": "event_data"}

# Call a service
POST /api/services/<domain>/<service>
{"entity_id": "light.living_room"}

# Call service and get its response
POST /api/services/<domain>/<service>?return_response
{"entity_id": "..."}

# Render a Jinja2 template
POST /api/template
{"template": "{{ states('sensor.temperature') }}"}

# Validate configuration
POST /api/config/core/check_config

# Handle an intent
POST /api/intent/handle
{"name": "HassTurnOn", "data": {"name": "lights"}}
```
|
||||
|
||||
## DELETE Endpoints
|
||||
|
||||
```bash
# Remove an entity
DELETE /api/states/<entity_id>
```
|
||||
|
||||
## Example curl Usage
|
||||
|
||||
```bash
# Health check
curl -s -H "Authorization: Bearer $HA_TOKEN" $HA_URL/api/

# Get all states
curl -s -H "Authorization: Bearer $HA_TOKEN" $HA_URL/api/states | jq .

# Get specific entity
curl -s -H "Authorization: Bearer $HA_TOKEN" $HA_URL/api/states/light.living_room

# Turn on a light
curl -s -X POST \
  -H "Authorization: Bearer $HA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity_id": "light.living_room"}' \
  $HA_URL/api/services/light/turn_on

# Render template
curl -s -X POST \
  -H "Authorization: Bearer $HA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"template": "{{ states(\"sensor.temperature\") }}"}' \
  $HA_URL/api/template
```
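
For scripts, the "Turn on a light" call can be expressed with Python's standard library instead of curl. This is a sketch under assumptions: `service_request` is a hypothetical helper, the host and token are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def service_request(base_url, token, domain, service, data):
    """Build (but do not send) a POST to /api/services/<domain>/<service>."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token; in practice read them from HA_URL / HA_TOKEN.
req = service_request("http://ha.local:8123", "example-token", "light", "turn_on",
                      {"entity_id": "light.living_room"})
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before anything reaches a real device.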
|
||||
|
||||
## Devices
|
||||
|
||||
### Lights
|
||||
4x Zigbee Tuya lights (TZ3210 TS0505B):
|
||||
- `light.tz3210_r5afgmkl_ts0505b` (G2)
|
||||
- `light.tz3210_r5afgmkl_ts0505b_g2` (G22)
|
||||
- `light.tz3210_r5afgmkl_ts0505b_2`
|
||||
- `light.tz3210_r5afgmkl_ts0505b_3`
|
||||
|
||||
Support: color_temp (2000-6535K), xy color mode, brightness (0-254)
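
As an illustration of the ranges above, the sketch below builds service data for `light.turn_on` and clamps values to what these bulbs report. `light_turn_on_payload` is a hypothetical helper, and it assumes the installed Home Assistant version accepts the `color_temp_kelvin` field.

```python
def light_turn_on_payload(entity_id, brightness=None, color_temp_kelvin=None):
    """Build light.turn_on service data, clamped to the ranges listed above.

    Hypothetical helper; assumes `color_temp_kelvin` is accepted by light.turn_on.
    """
    payload = {"entity_id": entity_id}
    if brightness is not None:
        # Brightness range from this document: 0-254.
        payload["brightness"] = max(0, min(254, int(brightness)))
    if color_temp_kelvin is not None:
        # Color temperature range from this document: 2000-6535 K.
        payload["color_temp_kelvin"] = max(2000, min(6535, int(color_temp_kelvin)))
    return payload


payload = light_turn_on_payload("light.tz3210_r5afgmkl_ts0505b",
                                brightness=300, color_temp_kelvin=1500)
```

The resulting dict is meant to be sent as the JSON body of `POST /api/services/light/turn_on`.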
|
||||
|
||||
### Vacuum Cleaner
|
||||
**Entity**: `vacuum.xiaomi_ru_1173505785_ov71gl` (Петя Петя)
|
||||
**Status**: Docked
|
||||
**Type**: Xiaomi robot vacuum with mop
|
||||
|
||||
**Rooms** (from `sensor.xiaomi_ru_1173505785_ov71gl_room_information_p_2_16`):
|
||||
- ID 4: Спальня (Bedroom)
|
||||
- ID 3: Гостиная (Living Room)
|
||||
- ID 5: Кухня (Kitchen)
|
||||
- ID 6: Прихожая (Hallway)
|
||||
- ID 7: Ванная комната (Bathroom)
|
||||
|
||||
**Services**:
|
||||
- `vacuum.start` — Start cleaning
|
||||
- `vacuum.pause` — Pause
|
||||
- `vacuum.stop` — Stop
|
||||
- `vacuum.return_to_base` — Dock
|
||||
- `vacuum.clean_spot` — Clean spot
|
||||
- `vacuum.set_fan_speed` — Set fan (param: `fan_speed`)
|
||||
- `vacuum.send_command` — Raw command (params: `command`, `params`)
|
||||
- Room-aware: `start_vacuum_room_sweep`, `start_zone_sweep`, `get_room_configs`, `set_room_clean_configs`
|
||||
|
||||
**Key entities**:
|
||||
- `sensor.xiaomi_ru_1173505785_ov71gl_room_information_p_2_16` — Room data (JSON)
|
||||
- `sensor.xiaomi_ru_1173505785_ov71gl_zone_ids_p_2_12` — Zone IDs
|
||||
- `button.xiaomi_ru_1173505785_ov71gl_auto_room_partition_a_10_5` — Auto-detect room boundaries
|
||||
|
||||
### Water Leak Sensors
|
||||
3x HOBEIAN ZG-222Z Zigbee moisture sensors:
|
||||
- `binary_sensor.hobeian_zg_222z` — Kitchen
|
||||
- `binary_sensor.hobeian_zg_222z_2` — Bathroom
|
||||
- `binary_sensor.hobeian_zg_222z_3` — Laundry
|
||||
|
||||
Battery sensors: `sensor.hobeian_zg_222z_battery`, `_2`, `_3`
|
||||
|
||||
**Automations** (push to Zabbix via `rest_command`):
|
||||
- "Water Leak Alert" (`water_leak_alert`) — any sensor ON → `rest_command.zabbix_water_leak` with room name
|
||||
- "Water Leak Clear" (`water_leak_clear`) — all sensors OFF → `rest_command.zabbix_water_leak_clear`
|
||||
|
||||
## Notes
|
||||
|
||||
- `POST /api/states/<entity_id>` creates a virtual state representation only — it does NOT control physical devices. Use `POST /api/services/...` for actual device control.
|
||||
- Timestamp format: `YYYY-MM-DDThh:mm:ssTZD` (ISO 8601)
|
||||
- Using `?return_response` on a service that doesn't support it returns a 400 error
|
||||
30
immich-app/backup.sh
Executable file
@@ -0,0 +1,30 @@
|
||||
#!/usr/bin/env bash
set -euo pipefail

BACKUP_DIR=/mnt/backups/media
DB_BACKUP_DIR="$BACKUP_DIR/backups"
LOG="$BACKUP_DIR/backup.log"
RETAIN_DAYS=14

mkdir -p "$DB_BACKUP_DIR"

echo "[$(date)] Starting Immich backup" >> "$LOG"

# 1. Database dump (must come before file sync)
DUMP_FILE="$DB_BACKUP_DIR/immich-db-$(date +%Y%m%dT%H%M%S).sql.gz"
docker exec immich_postgres pg_dump --clean --if-exists \
  --dbname=immich --username=postgres | gzip > "$DUMP_FILE"
echo "[$(date)] DB dump: $DUMP_FILE" >> "$LOG"

# 2. Rsync critical asset folders (skip thumbs and encoded-video — regeneratable)
for DIR in library upload profile; do
  rsync -a --delete /mnt/media/upload/$DIR/ "$BACKUP_DIR/$DIR/" >> "$LOG" 2>&1
  echo "[$(date)] Synced $DIR" >> "$LOG"
done

# 3. Remove old DB dumps
find "$DB_BACKUP_DIR" -name "immich-db-*.sql.gz" -mtime +$RETAIN_DAYS -delete
echo "[$(date)] Cleaned dumps older than ${RETAIN_DAYS}d" >> "$LOG"

touch "$BACKUP_DIR/.last_sync"
echo "[$(date)] Immich backup complete" >> "$LOG"
|
||||
7
matrix/.env
Normal file
@@ -0,0 +1,7 @@
|
||||
SYNAPSE_DATA=./data/synapse
|
||||
POSTGRES_DATA=./data/postgres
|
||||
POSTGRES_USER=synapse
|
||||
POSTGRES_PASSWORD=OimW4JUSXhZBCtLHE1kFnZ7cWVbESsxynapnJ+PSw/4=
|
||||
POSTGRES_DB=synapse
|
||||
LIVEKIT_KEY=devkey
|
||||
LIVEKIT_SECRET=ef3ef4b903ca8469b09b2dd7ab6af529c4d2f3c95668f53832fc351cf67777a9
|
||||
1
matrix/.gitignore
vendored
Normal file
@@ -0,0 +1 @@
|
||||
data/
|
||||
105
matrix/README.md
Normal file
@@ -0,0 +1,105 @@
|
||||
# Matrix Home Server
|
||||
|
||||
Self-hosted Matrix homeserver running on `mtx.alogins.net`.
|
||||
|
||||
## Stack
|
||||
|
||||
| Service | Purpose |
|---------|---------|
| Synapse | Matrix homeserver |
| PostgreSQL | Synapse database |
| LiveKit | MatrixRTC media server (calls) |
| lk-jwt-service | LiveKit JWT auth for Matrix users |
| coturn | TURN/STUN server (ICE fallback) |
|
||||
|
||||
## Clients
|
||||
|
||||
- **Element X** (Android/iOS) — recommended, full call support
|
||||
- **FluffyChat** — messaging only, calls not supported
|
||||
|
||||
Connect clients to: `https://mtx.alogins.net`
|
||||
|
||||
## Users
|
||||
|
||||
| Username | Admin |
|----------|-------|
| admin | yes |
| elizaveta | no |
| aleksandra | no |
|
||||
|
||||
## Managing Users
|
||||
|
||||
```bash
# Add user
docker exec synapse register_new_matrix_user \
  -c /data/homeserver.yaml \
  -u <username> -p <password> --no-admin \
  http://localhost:8008

# Add admin
docker exec synapse register_new_matrix_user \
  -c /data/homeserver.yaml \
  -u <username> -p <password> -a \
  http://localhost:8008
```
|
||||
|
||||
## Start / Stop
|
||||
|
||||
```bash
cd /home/alvis/agap_git/matrix

docker compose up -d      # start all
docker compose down       # stop all
docker compose restart    # restart all
docker compose ps         # status
docker compose logs -f    # logs
```
|
||||
|
||||
## Caddy
|
||||
|
||||
Entries in `/home/alvis/agap_git/Caddyfile`:
|
||||
|
||||
| Domain | Purpose |
|--------|---------|
| `mtx.alogins.net` | Synapse + well-known |
| `lk.alogins.net` | LiveKit SFU |
| `lkjwt.alogins.net` | LiveKit JWT service |
|
||||
|
||||
Deploy Caddyfile changes:
|
||||
```bash
|
||||
sudo cp /home/alvis/agap_git/Caddyfile /etc/caddy/Caddyfile && sudo systemctl reload caddy
|
||||
```
|
||||
|
||||
## Firewall Ports Required
|
||||
|
||||
| Port | Protocol | Service |
|------|----------|---------|
| 443 | TCP | Caddy (HTTPS) |
| 3478 | UDP+TCP | coturn TURN |
| 5349 | UDP+TCP | coturn TURNS |
| 7881 | TCP | LiveKit |
| 49152-65535 | UDP | coturn relay |
| 50100-50200 | UDP | LiveKit media |
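
To confirm the TCP entries in this table are reachable from outside, a quick probe along these lines may help. It is a sketch only: `tcp_port_open` and `check_host` are hypothetical helpers, and the UDP ranges cannot be verified this way because UDP has no connection handshake.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# TCP ports taken from the table above.
TCP_PORTS = {443: "Caddy (HTTPS)", 3478: "coturn TURN", 5349: "coturn TURNS", 7881: "LiveKit"}


def check_host(host):
    """Map each TCP port from the table to its reachability on the given host."""
    return {port: tcp_port_open(host, port) for port in TCP_PORTS}
```

Running `check_host("mtx.alogins.net")` from a machine outside the LAN would exercise the firewall rules rather than local routing.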
|
||||
|
||||
## Data Locations
|
||||
|
||||
| Data | Path |
|------|------|
| Synapse config & media | `./data/synapse/` |
| PostgreSQL data | `./data/postgres/` |
| LiveKit config | `./livekit/livekit.yaml` |
| coturn config | `./coturn/turnserver.conf` |
|
||||
|
||||
## First-Time Setup (reference)
|
||||
|
||||
```bash
# Generate Synapse config
docker run --rm \
  -v ./data/synapse:/data \
  -e SYNAPSE_SERVER_NAME=mtx.alogins.net \
  -e SYNAPSE_REPORT_STATS=no \
  matrixdotorg/synapse:latest generate

# Edit database section in data/synapse/homeserver.yaml, then:
docker compose up -d
```
|
||||
18
matrix/coturn/turnserver.conf
Normal file
@@ -0,0 +1,18 @@
|
||||
listening-port=3478
|
||||
tls-listening-port=5349
|
||||
|
||||
external-ip=83.99.190.32/192.168.1.3
|
||||
|
||||
realm=mtx.alogins.net
|
||||
server-name=mtx.alogins.net
|
||||
|
||||
use-auth-secret
|
||||
static-auth-secret=144152cc09030796a4fd0109437dfc2089db2d5181b848d38d20c646c1d7a14b
|
||||
|
||||
no-multicast-peers
|
||||
denied-peer-ip=10.0.0.0-10.255.255.255
|
||||
denied-peer-ip=172.16.0.0-172.31.255.255
|
||||
denied-peer-ip=192.168.0.0-192.168.255.255
|
||||
|
||||
log-file=stdout
|
||||
no-software-attribute
|
||||
73
matrix/docker-compose.yml
Normal file
@@ -0,0 +1,73 @@
|
||||
services:
  synapse:
    image: matrixdotorg/synapse:latest
    container_name: synapse
    restart: unless-stopped
    volumes:
      - ${SYNAPSE_DATA}:/data
      - /etc/localtime:/etc/localtime:ro
    environment:
      - SYNAPSE_CONFIG_PATH=/data/homeserver.yaml
    ports:
      - "127.0.0.1:8008:8008"
    depends_on:
      - db
    networks:
      - matrix
      - frontend

  db:
    image: postgres:16-alpine
    container_name: synapse-db
    restart: unless-stopped
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_INITDB_ARGS=--encoding=UTF-8 --lc-collate=C --lc-ctype=C
    volumes:
      - ${POSTGRES_DATA}:/var/lib/postgresql/data
      - /etc/localtime:/etc/localtime:ro
    networks:
      - matrix

  lk-jwt-service:
    image: ghcr.io/element-hq/lk-jwt-service:latest
    container_name: lk-jwt-service
    restart: unless-stopped
    ports:
      - "127.0.0.1:8009:8080"
    environment:
      - LIVEKIT_JWT_BIND=:8080
      - LIVEKIT_URL=wss://lk.alogins.net
      - LIVEKIT_KEY=${LIVEKIT_KEY}
      - LIVEKIT_SECRET=${LIVEKIT_SECRET}
      - LIVEKIT_FULL_ACCESS_HOMESERVERS=mtx.alogins.net
    extra_hosts:
      - "mtx.alogins.net:host-gateway"
      - "lk.alogins.net:host-gateway"

  livekit:
    image: livekit/livekit-server:latest
    container_name: livekit
    restart: unless-stopped
    network_mode: host
    volumes:
      - ./livekit/livekit.yaml:/etc/livekit.yaml:ro
    command: --config /etc/livekit.yaml

  coturn:
    image: coturn/coturn:latest
    container_name: coturn
    restart: unless-stopped
    network_mode: host
    volumes:
      - ./coturn/turnserver.conf:/etc/coturn/turnserver.conf:ro
      - /etc/localtime:/etc/localtime:ro

networks:
  matrix:
    driver: bridge
    internal: true
  frontend:
    driver: bridge
|
||||
15
matrix/livekit/livekit.yaml
Normal file
@@ -0,0 +1,15 @@
|
||||
port: 7880
|
||||
rtc:
|
||||
tcp_port: 7881
|
||||
port_range_start: 50100
|
||||
port_range_end: 50200
|
||||
use_external_ip: true
|
||||
|
||||
keys:
|
||||
devkey: ef3ef4b903ca8469b09b2dd7ab6af529c4d2f3c95668f53832fc351cf67777a9
|
||||
|
||||
room:
|
||||
auto_create: false
|
||||
|
||||
logging:
|
||||
level: info
|
||||
16
ntfy/docker-compose.yml
Normal file
@@ -0,0 +1,16 @@
|
||||
services:
  ntfy:
    image: binwiederhier/ntfy
    container_name: ntfy
    command: serve
    environment:
      - NTFY_BASE_URL=https://ntfy.alogins.net
      - NTFY_CACHE_FILE=/var/lib/ntfy/cache.db
      - NTFY_AUTH_FILE=/var/lib/ntfy/auth.db
      - NTFY_AUTH_DEFAULT_ACCESS=deny-all
      - NTFY_BEHIND_PROXY=true
    volumes:
      - /mnt/misc/ntfy:/var/lib/ntfy
    ports:
      - "8840:80"
    restart: unless-stopped
|
||||
@@ -1,12 +1,42 @@
|
||||
services:
|
||||
ollama:
|
||||
image: ollama/ollama
|
||||
container_name: ollama
|
||||
ports:
|
||||
- "11436:11434"
|
||||
volumes:
|
||||
- /mnt/ssd/ai/ollama:/root/.ollama
|
||||
- /mnt/ssd/ai/open-webui:/app/backend/data
|
||||
restart: always
|
||||
environment:
|
||||
# Allow qwen3:8b + qwen2.5:1.5b to coexist in VRAM (~6.7-7.7 GB on 8 GB GPU)
|
||||
- OLLAMA_MAX_LOADED_MODELS=2
|
||||
# One GPU inference at a time — prevents compute contention between models
|
||||
- OLLAMA_NUM_PARALLEL=1
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
count: all
|
||||
capabilities: [gpu]
|
||||
|
||||
ollama-cpu:
|
||||
image: ollama/ollama
|
||||
container_name: ollama-cpu
|
||||
ports:
|
||||
- "11435:11434"
|
||||
volumes:
|
||||
- /mnt/ssd/ai/ollama-cpu:/root/.ollama
|
||||
restart: always
|
||||
|
||||
open-webui:
|
||||
image: ghcr.io/open-webui/open-webui:ollama
|
||||
image: ghcr.io/open-webui/open-webui:main
|
||||
container_name: open-webui
|
||||
ports:
|
||||
- "3125:8080"
|
||||
volumes:
|
||||
- ollama:/root/.ollama
|
||||
- open-webui:/app/backend/data
|
||||
- /mnt/ssd/ai/open-webui:/app/backend/data
|
||||
restart: always
|
||||
deploy:
|
||||
resources:
|
||||
@@ -18,6 +48,22 @@ services:
|
||||
environment:
|
||||
- ANTHROPIC_API_KEY=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
|
||||
|
||||
volumes:
|
||||
ollama:
|
||||
open-webui:
|
||||
searxng:
|
||||
image: docker.io/searxng/searxng:latest
|
||||
container_name: searxng
|
||||
volumes:
|
||||
- /mnt/ssd/ai/searxng/config/:/etc/searxng/
|
||||
- /mnt/ssd/ai/searxng/data/:/var/cache/searxng/
|
||||
restart: always
|
||||
ports:
|
||||
- "11437:8080"
|
||||
|
||||
qdrant:
|
||||
image: qdrant/qdrant
|
||||
container_name: qdrant
|
||||
ports:
|
||||
- "6333:6333"
|
||||
- "6334:6334"
|
||||
restart: always
|
||||
volumes:
|
||||
- /mnt/ssd/dbs/qdrant:/qdrant/storage:z
|
||||
|
||||
9
otter/docker-compose.yml
Normal file
@@ -0,0 +1,9 @@
services:
  otterwiki:
    image: redimp/otterwiki:2
    restart: unless-stopped
    ports:
      - 8083:80
    volumes:
      - /mnt/ssd/dbs/otter/app-data:/app-data
58
pihole/docker-compose.yaml
Normal file
@@ -0,0 +1,58 @@
|
||||
|
||||
networks:
|
||||
macvlan-br0:
|
||||
driver: macvlan
|
||||
driver_opts:
|
||||
parent: br0
|
||||
ipam:
|
||||
config:
|
||||
- subnet: 192.168.1.0/24
|
||||
gateway: 192.168.1.1
|
||||
# ip_range: 192.168.1.192/27
|
||||
|
||||
services:
|
||||
pihole:
|
||||
container_name: pihole
|
||||
image: pihole/pihole:latest
|
||||
#ports:
|
||||
# DNS Ports
|
||||
#- "53:53/tcp"
|
||||
#- "53:53/udp"
|
||||
# Default HTTP Port
|
||||
#- "80:80/tcp"
|
||||
# Default HTTPs Port. FTL will generate a self-signed certificate
|
||||
#- "443:443/tcp"
|
||||
# Uncomment the below if using Pi-hole as your DHCP Server
|
||||
#- "67:67/udp"
|
||||
# Uncomment the line below if you are using Pi-hole as your NTP server
|
||||
#- "123:123/udp"
|
||||
|
||||
dns:
|
||||
- 8.8.8.8
|
||||
- 1.1.1.1
|
||||
networks:
|
||||
macvlan-br0:
|
||||
ipv4_address: 192.168.1.2
|
||||
environment:
|
||||
# Set the appropriate timezone for your location from
|
||||
# https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, e.g:
|
||||
TZ: 'Europe/Moscow'
|
||||
# Set a password to access the web interface. Not setting one will result in a random password being assigned
|
||||
FTLCONF_webserver_api_password: 'correct horse 123'
|
||||
# If using Docker's default `bridge` network setting the dns listening mode should be set to 'ALL'
|
||||
FTLCONF_dns_listeningMode: 'ALL'
|
||||
# Volumes store your data between container upgrades
|
||||
volumes:
|
||||
# For persisting Pi-hole's databases and common configuration file
|
||||
- '/mnt/ssd/dbs/pihole:/etc/pihole'
|
||||
# Uncomment the below if you have custom dnsmasq config files that you want to persist. Not needed for most starting fresh with Pi-hole v6. If you're upgrading from v5 and have used this directory before, you should keep it enabled for the first v6 container start to allow for a complete migration. It can be removed afterwards. Needs environment variable FTLCONF_misc_etc_dnsmasq_d: 'true'
|
||||
#- './etc-dnsmasq.d:/etc/dnsmasq.d'
|
||||
cap_add:
|
||||
# See https://github.com/pi-hole/docker-pi-hole#note-on-capabilities
|
||||
# Required if you are using Pi-hole as your DHCP server, else not needed
|
||||
- NET_ADMIN
|
||||
# Required if you are using Pi-hole as your NTP client to be able to set the host's system time
|
||||
- SYS_TIME
|
||||
# Optional, if Pi-hole should get some more processing time
|
||||
- SYS_NICE
|
||||
restart: unless-stopped
|
||||
44
seafile/backup.sh
Executable file
@@ -0,0 +1,44 @@
|
||||
#!/bin/bash
|
||||
# Seafile backup script.
|
||||
# Backs up MySQL databases and seafile data directory.
|
||||
# Runs every 3 days via root crontab. Keeps last 5 backups.
|
||||
# Notifies Zabbix (item seafile.backup.ts, id 70369 on AgapHost) after success.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
BACKUP_DIR="/mnt/backups/seafile"
|
||||
DATA_DIR="/mnt/misc/seafile"
|
||||
DATE=$(date '+%Y%m%d-%H%M')
|
||||
DEST="$BACKUP_DIR/$DATE"
|
||||
|
||||
mkdir -p "$DEST"
|
||||
|
||||
# Dump all three Seafile databases from the running container
|
||||
for DB in ccnet_db seafile_db seahub_db; do
|
||||
docker exec seafile-mysql mysqldump \
|
||||
-u seafile -pFWsYYeZa15ro6x \
|
||||
--single-transaction "$DB" > "$DEST/${DB}.sql"
|
||||
echo "Dumped: $DB"
|
||||
done
|
||||
|
||||
# Copy seafile data (libraries, config — excludes mysql and caddy dirs)
|
||||
rsync -a --delete \
|
||||
--exclude='seafile-mysql/' \
|
||||
--exclude='seafile-caddy/' \
|
||||
"$DATA_DIR/" "$DEST/data/"
|
||||
|
||||
echo "$(date): Backup complete: $DEST"
|
||||
ls "$DEST/"
|
||||
|
||||
# Notify Zabbix
|
||||
if [[ -f /root/.zabbix_token ]]; then
|
||||
ZABBIX_TOKEN=$(cat /root/.zabbix_token)
|
||||
curl -s -X POST http://localhost:81/api_jsonrpc.php \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "Authorization: Bearer $ZABBIX_TOKEN" \
|
||||
-d "{\"jsonrpc\":\"2.0\",\"method\":\"history.push\",\"id\":1,\"params\":{\"itemid\":\"70369\",\"value\":\"$(date '+%Y-%m-%d %H:%M')\"}}" > /dev/null \
|
||||
&& echo "Zabbix notified."
|
||||
fi
|
||||
|
||||
# Rotate: keep last 5 backups
|
||||
ls -1dt "$BACKUP_DIR"/[0-9]*-[0-9]* 2>/dev/null | tail -n +6 | xargs -r rm -rf
|
||||
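The rotation step that closes the script can be exercised on its own. The following is a minimal sketch, assuming GNU `ls`, `tail`, and `xargs`; `rotate_backups` is a hypothetical helper name that does not appear in the repository. Like the one-liner in the script, it orders directories by modification time rather than by the timestamp in their names.

```shell
# Hypothetical helper (not part of the repository): keep only the newest
# N timestamped backup directories under a backup root.
rotate_backups() {
    local dir="$1" keep="${2:-5}"
    # ls -t lists newest first; tail skips the first $keep entries,
    # and xargs -r does nothing when there is nothing left to delete.
    ls -1dt "$dir"/[0-9]*-[0-9]* 2>/dev/null \
        | tail -n +"$((keep + 1))" \
        | xargs -r rm -rf
}
```

Called as `rotate_backups /mnt/backups/seafile 5`, this would behave like the last line of the script. Because ordering depends on mtime, touching an old backup directory would make it look recent.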
26
seafile/caddy.yml
Normal file
@@ -0,0 +1,26 @@
services:

  caddy:
    image: ${SEAFILE_CADDY_IMAGE:-lucaslorentz/caddy-docker-proxy:2.9-alpine}
    restart: unless-stopped
    container_name: seafile-caddy
    ports:
      - 8077:80
      - 4433:443
    environment:
      - CADDY_INGRESS_NETWORKS=seafile-net
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ${SEAFILE_CADDY_VOLUME:-/opt/seafile-caddy}:/data/caddy
    networks:
      - seafile-net
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:2019/metrics || exit 1"]
      start_period: 20s
      interval: 20s
      timeout: 5s
      retries: 3

networks:
  seafile-net:
    name: seafile-net
20
seafile/onlyoffice.yml
Normal file
@@ -0,0 +1,20 @@
services:
  onlyoffice:
    image: ${ONLYOFFICE_IMAGE:-onlyoffice/documentserver:8.1.0.1}
    container_name: seafile-onlyoffice
    restart: unless-stopped
    environment:
      - JWT_ENABLED=true
      - JWT_SECRET=${ONLYOFFICE_JWT_SECRET:?Variable is not set or empty}
    volumes:
      - "${ONLYOFFICE_VOLUME:-/opt/onlyoffice}:/var/lib/onlyoffice"
    ports:
      - "127.0.0.1:6233:80"
    extra_hosts:
      - "docs.alogins.net:host-gateway"
    networks:
      - seafile-net

networks:
  seafile-net:
    name: seafile-net
40
seafile/seadoc.yml
Normal file
@@ -0,0 +1,40 @@
services:

  seadoc:
    image: ${SEADOC_IMAGE:-seafileltd/sdoc-server:2.0-latest}
    container_name: seadoc
    restart: unless-stopped
    volumes:
      - ${SEADOC_VOLUME:-/opt/seadoc-data/}:/shared
    # ports:
    # - "80:80"
    environment:
      - DB_HOST=${SEAFILE_MYSQL_DB_HOST:-db}
      - DB_PORT=${SEAFILE_MYSQL_DB_PORT:-3306}
      - DB_USER=${SEAFILE_MYSQL_DB_USER:-seafile}
      - DB_PASSWORD=${SEAFILE_MYSQL_DB_PASSWORD:?Variable is not set or empty}
      - DB_NAME=${SEADOC_MYSQL_DB_NAME:-${SEAFILE_MYSQL_DB_SEAHUB_DB_NAME:-seahub_db}}
      - TIME_ZONE=${TIME_ZONE:-Etc/UTC}
      - JWT_PRIVATE_KEY=${JWT_PRIVATE_KEY:?Variable is not set or empty}
      - NON_ROOT=${NON_ROOT:-false}
      - SEAHUB_SERVICE_URL=${SEAFILE_SERVICE_URL:-http://seafile}
    labels:
      caddy: ${SEAFILE_SERVER_PROTOCOL:-http}://${SEAFILE_SERVER_HOSTNAME:?Variable is not set or empty}
      caddy.@ws.0_header: "Connection *Upgrade*"
      caddy.@ws.1_header: "Upgrade websocket"
      caddy.0_reverse_proxy: "@ws {{upstreams 80}}"
      caddy.1_handle_path: "/socket.io/*"
      caddy.1_handle_path.0_rewrite: "* /socket.io{uri}"
      caddy.1_handle_path.1_reverse_proxy: "{{upstreams 80}}"
      caddy.2_handle_path: "/sdoc-server/*"
      caddy.2_handle_path.0_rewrite: "* {uri}"
      caddy.2_handle_path.1_reverse_proxy: "{{upstreams 80}}"
    depends_on:
      db:
        condition: service_healthy
    networks:
      - seafile-net

networks:
  seafile-net:
    name: seafile-net
103
seafile/seafile-server.yml
Normal file
@@ -0,0 +1,103 @@
services:
  db:
    image: ${SEAFILE_DB_IMAGE:-mariadb:10.11}
    container_name: seafile-mysql
    restart: unless-stopped
    environment:
      - MYSQL_ROOT_PASSWORD=${INIT_SEAFILE_MYSQL_ROOT_PASSWORD:-}
      - MYSQL_LOG_CONSOLE=true
      - MARIADB_AUTO_UPGRADE=1
    volumes:
      - "${SEAFILE_MYSQL_VOLUME:-/opt/seafile-mysql/db}:/var/lib/mysql"
    networks:
      - seafile-net
    healthcheck:
      test:
        [
          "CMD",
          "/usr/local/bin/healthcheck.sh",
          "--connect",
          "--mariadbupgrade",
          "--innodb_initialized",
        ]
      interval: 20s
      start_period: 30s
      timeout: 5s
      retries: 10

  redis:
    image: ${SEAFILE_REDIS_IMAGE:-redis}
    container_name: seafile-redis
    restart: unless-stopped
    command:
      - /bin/sh
      - -c
      - redis-server --requirepass "$$REDIS_PASSWORD"
    environment:
      - REDIS_PASSWORD=${REDIS_PASSWORD:-}
    networks:
      - seafile-net

  seafile:
    image: ${SEAFILE_IMAGE:-seafileltd/seafile-mc:13.0-latest}
    container_name: seafile
    restart: unless-stopped
    ports:
      - "127.0.0.1:8078:80"
    volumes:
      - ${SEAFILE_VOLUME:-/opt/seafile-data}:/shared
    environment:
      - SEAFILE_MYSQL_DB_HOST=${SEAFILE_MYSQL_DB_HOST:-db}
      - SEAFILE_MYSQL_DB_PORT=${SEAFILE_MYSQL_DB_PORT:-3306}
      - SEAFILE_MYSQL_DB_USER=${SEAFILE_MYSQL_DB_USER:-seafile}
      - SEAFILE_MYSQL_DB_PASSWORD=${SEAFILE_MYSQL_DB_PASSWORD:?Variable is not set or empty}
      - INIT_SEAFILE_MYSQL_ROOT_PASSWORD=${INIT_SEAFILE_MYSQL_ROOT_PASSWORD:-}
      - SEAFILE_MYSQL_DB_CCNET_DB_NAME=${SEAFILE_MYSQL_DB_CCNET_DB_NAME:-ccnet_db}
      - SEAFILE_MYSQL_DB_SEAFILE_DB_NAME=${SEAFILE_MYSQL_DB_SEAFILE_DB_NAME:-seafile_db}
      - SEAFILE_MYSQL_DB_SEAHUB_DB_NAME=${SEAFILE_MYSQL_DB_SEAHUB_DB_NAME:-seahub_db}
      - TIME_ZONE=${TIME_ZONE:-Etc/UTC}
      - INIT_SEAFILE_ADMIN_EMAIL=${INIT_SEAFILE_ADMIN_EMAIL:-me@example.com}
      - INIT_SEAFILE_ADMIN_PASSWORD=${INIT_SEAFILE_ADMIN_PASSWORD:-asecret}
      - SEAFILE_SERVER_HOSTNAME=${SEAFILE_SERVER_HOSTNAME:?Variable is not set or empty}
      - SEAFILE_SERVER_PROTOCOL=${SEAFILE_SERVER_PROTOCOL:-http}
      - SITE_ROOT=${SITE_ROOT:-/}
      - NON_ROOT=${NON_ROOT:-false}
      - JWT_PRIVATE_KEY=${JWT_PRIVATE_KEY:?Variable is not set or empty}
      - SEAFILE_LOG_TO_STDOUT=${SEAFILE_LOG_TO_STDOUT:-false}
      - ENABLE_GO_FILESERVER=${ENABLE_GO_FILESERVER:-true}
      - ENABLE_SEADOC=${ENABLE_SEADOC:-true}
      - SEADOC_SERVER_URL=${SEAFILE_SERVER_PROTOCOL:-http}://${SEAFILE_SERVER_HOSTNAME:?Variable is not set or empty}/sdoc-server
      - CACHE_PROVIDER=${CACHE_PROVIDER:-redis}
      - REDIS_HOST=${REDIS_HOST:-redis}
      - REDIS_PORT=${REDIS_PORT:-6379}
      - REDIS_PASSWORD=${REDIS_PASSWORD:-}
      - MEMCACHED_HOST=${MEMCACHED_HOST:-memcached}
      - MEMCACHED_PORT=${MEMCACHED_PORT:-11211}
      - ENABLE_NOTIFICATION_SERVER=${ENABLE_NOTIFICATION_SERVER:-false}
      - INNER_NOTIFICATION_SERVER_URL=${INNER_NOTIFICATION_SERVER_URL:-http://notification-server:8083}
      - NOTIFICATION_SERVER_URL=${NOTIFICATION_SERVER_URL:-${SEAFILE_SERVER_PROTOCOL:-http}://${SEAFILE_SERVER_HOSTNAME:?Variable is not set or empty}/notification}
      - ENABLE_SEAFILE_AI=${ENABLE_SEAFILE_AI:-false}
      - ENABLE_FACE_RECOGNITION=${ENABLE_FACE_RECOGNITION:-false}
      - SEAFILE_AI_SERVER_URL=${SEAFILE_AI_SERVER_URL:-http://seafile-ai:8888}
      - SEAFILE_AI_SECRET_KEY=${JWT_PRIVATE_KEY:?Variable is not set or empty}
      - MD_FILE_COUNT_LIMIT=${MD_FILE_COUNT_LIMIT:-100000}
    labels:
      caddy: ${SEAFILE_SERVER_PROTOCOL:-http}://${SEAFILE_SERVER_HOSTNAME:?Variable is not set or empty}
      caddy.reverse_proxy: "{{upstreams 80}}"
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:80 || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    networks:
      - seafile-net

networks:
  seafile-net:
    name: seafile-net
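The `${VAR:?Variable is not set or empty}` entries in the Seafile compose files make Compose refuse to start when a required variable is missing or empty. A pre-flight check can report all missing variables at once instead of one per run. The sketch below assumes the variables are already present in the calling shell (for example after sourcing `seafile/.env`); `check_required_env` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper (not part of the repository): report every required
# variable that is unset or empty, and return non-zero if any is missing.
check_required_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        # Only pass trusted, literal variable names to this function.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing or empty: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_required_env SEAFILE_MYSQL_DB_PASSWORD SEAFILE_SERVER_HOSTNAME JWT_PRIVATE_KEY` covers the variables marked `:?` in `seafile-server.yml` and `seadoc.yml`. `onlyoffice.yml` additionally requires `ONLYOFFICE_JWT_SECRET`.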
25
users-backup.sh
Executable file
@@ -0,0 +1,25 @@
#!/bin/bash
# Backup /mnt/misc/alvis and /mnt/misc/liza to /mnt/backups/users/
# Runs every 3 days via root crontab.
# Notifies Zabbix (item users.backup.ts, id 70379 on AgapHost) after success.

set -euo pipefail

DEST=/mnt/backups/users

mkdir -p "$DEST/alvis" "$DEST/liza"

rsync -a --delete /mnt/misc/alvis/ "$DEST/alvis/"
rsync -a --delete /mnt/misc/liza/ "$DEST/liza/"

echo "$(date): Backup complete."

# Notify Zabbix (token stored in /root/.zabbix_token)
if [[ -f /root/.zabbix_token ]]; then
    ZABBIX_TOKEN=$(cat /root/.zabbix_token)
    curl -s -X POST http://localhost:81/api_jsonrpc.php \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $ZABBIX_TOKEN" \
        -d "{\"jsonrpc\":\"2.0\",\"method\":\"history.push\",\"id\":1,\"params\":{\"itemid\":\"70379\",\"value\":\"$(date '+%Y-%m-%d %H:%M')\"}}" > /dev/null \
        && echo "Zabbix notified."
fi
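The three backup scripts send the same `history.push` request and differ only in the item id. One way to avoid the escaped inline JSON is a small payload builder. This is a sketch only: `zabbix_push_payload` is a hypothetical name, and it assumes the item id and value contain no characters that need JSON escaping, which holds for the numeric ids and timestamps used here.

```shell
# Hypothetical helper (not part of the repository): build the JSON-RPC
# body for a Zabbix history.push call from an item id and a value.
zabbix_push_payload() {
    local itemid="$1" value="$2"
    printf '{"jsonrpc":"2.0","method":"history.push","id":1,"params":{"itemid":"%s","value":"%s"}}' \
        "$itemid" "$value"
}
```

In the scripts, the `-d` argument to `curl` could then be written as `-d "$(zabbix_push_payload 70379 "$(date '+%Y-%m-%d %H:%M')")"`.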
41
vaultwarden/backup.sh
Executable file
@@ -0,0 +1,41 @@
#!/bin/bash
# Vaultwarden backup — uses built-in container backup command (safe with live DB).
# Runs every 3 days via root crontab. Keeps last 5 backups.
# Notifies Zabbix (item vaultwarden.backup.ts, id 70368 on AgapHost) after success.

set -euo pipefail

BACKUP_DIR="/mnt/backups/vaultwarden"
DATA_DIR="/mnt/ssd/dbs/vw-data"
DATE=$(date '+%Y%m%d-%H%M')
DEST="$BACKUP_DIR/$DATE"

mkdir -p "$DEST"

# Run built-in backup inside container — writes db_<timestamp>.sqlite3 to /data/ on the host
docker exec vaultwarden /vaultwarden backup 2>&1

# Move the newly created sqlite3 backup file out of the data dir
find "$DATA_DIR" -maxdepth 1 -name 'db_*.sqlite3' -newer "$DATA_DIR/db.sqlite3" | xargs -r mv -t "$DEST/"

# Copy config and RSA keys
cp "$DATA_DIR/config.json" "$DEST/"
cp "$DATA_DIR"/rsa_key* "$DEST/"
[ -d "$DATA_DIR/attachments" ] && cp -r "$DATA_DIR/attachments" "$DEST/"
[ -d "$DATA_DIR/sends" ] && cp -r "$DATA_DIR/sends" "$DEST/"

echo "$(date): Backup complete: $DEST"
ls "$DEST/"

# Notify Zabbix (token stored in /root/.zabbix_token)
if [[ -f /root/.zabbix_token ]]; then
    ZABBIX_TOKEN=$(cat /root/.zabbix_token)
    curl -s -X POST http://localhost:81/api_jsonrpc.php \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $ZABBIX_TOKEN" \
        -d "{\"jsonrpc\":\"2.0\",\"method\":\"history.push\",\"id\":1,\"params\":{\"itemid\":\"70368\",\"value\":\"$(date '+%Y-%m-%d %H:%M')\"}}" > /dev/null \
        && echo "Zabbix notified."
fi

# Rotate: keep last 5 backups
ls -1dt "$BACKUP_DIR"/[0-9]*-[0-9]* 2>/dev/null | tail -n +6 | xargs -r rm -rf
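The script picks up the new dump by comparing its modification time with that of `db.sqlite3`. If the live database file happens to be modified between the backup command and the `find`, the dump would no longer count as newer and would stay in the data directory. A possible alternative, sketched here with GNU `find`, `xargs`, and `mv` assumed, compares against a marker file created just before the backup runs. `move_new_dumps` and the marker file are hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): move SQLite dumps
# whose mtime is newer than a marker file into a destination directory.
move_new_dumps() {
    local data_dir="$1" marker="$2" dest="$3"
    # -print0 / -0 keep file names with unusual characters intact;
    # -r skips mv entirely when nothing matched.
    find "$data_dir" -maxdepth 1 -name 'db_*.sqlite3' -newer "$marker" -print0 \
        | xargs -0 -r mv -t "$dest/"
}
```

In the script this would mean creating the marker with `mktemp` before `docker exec vaultwarden /vaultwarden backup`, calling `move_new_dumps "$DATA_DIR" "$marker" "$DEST"` afterwards, and then removing the marker.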
12
vaultwarden/docker-compose.yml
Normal file
@@ -0,0 +1,12 @@
|
||||
services:
|
||||
vaultwarden:
|
||||
image: vaultwarden/server:latest
|
||||
container_name: vaultwarden
|
||||
restart: unless-stopped
|
||||
environment:
|
||||
DOMAIN: "https://vw.alogins.net"
|
||||
ADMIN_TOKEN: $$argon2id$$v=19$$m=65540,t=3,p=4$$bkE5Y1grLzF4czZiUk9tcWR6WTlGNC9CQmxGeHg0R1JUMFBrY2l0SVZocz0$$hn0snCmQkzDTEBzPYGQxFNmHxTgpxQ+O8OvzOhR3/a0
|
||||
volumes:
|
||||
- /mnt/ssd/dbs/vw-data/:/data/
|
||||
ports:
|
||||
- 127.0.0.1:8041:80
|
||||