Server (agent.py):
- _stream_queues: per-session asyncio.Queue for token chunks
- _push_stream_chunk() / _end_stream() helpers
- Medium tier: astream() with <think> block filtering — real token streaming
- Light tier: full reply pushed as single chunk then [DONE]
- Complex tier: full reply pushed after agent completes then [DONE]
- GET /stream/{session_id} SSE endpoint (data: <chunk>\n\n, data: [DONE]\n\n)
- medium_model promoted to module-level global for astream() access; the whole streaming path is sketched after this list
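
Taken together, the server-side plumbing looks roughly like the sketch below. This is a minimal reconstruction, not the actual agent.py: it assumes a FastAPI app, a LangChain-style medium_model whose astream() yields chunks with a .content string, and <think>/</think> tags arriving as whole tokens — all assumptions beyond the summary above.

```python
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

medium_model = None  # set at startup; module-level so the medium tier can reach astream()

# One queue per active session; the reply paths push token chunks into it.
_stream_queues: dict[str, asyncio.Queue] = {}

def _push_stream_chunk(session_id: str, chunk: str) -> None:
    """Queue a token chunk for the session's SSE consumer, if one is attached."""
    q = _stream_queues.get(session_id)
    if q is not None:
        q.put_nowait(chunk)

def _end_stream(session_id: str) -> None:
    """Push the end-of-stream sentinel; the endpoint turns None into [DONE]."""
    q = _stream_queues.get(session_id)
    if q is not None:
        q.put_nowait(None)

async def _stream_medium_reply(session_id: str, prompt: str) -> str:
    """Medium tier: forward tokens as they arrive, dropping <think>...</think> content.

    Light and complex tiers skip this path entirely: they call
    _push_stream_chunk() once with the finished reply, then _end_stream().
    """
    inside_think = False
    visible: list[str] = []
    async for event in medium_model.astream(prompt):
        token = event.content
        # Naive filter: assumes each <think> tag arrives as its own whole token.
        if "<think>" in token:
            inside_think = True
        elif "</think>" in token:
            inside_think = False
        elif not inside_think:
            visible.append(token)
            _push_stream_chunk(session_id, token)
    _end_stream(session_id)
    return "".join(visible)

@app.get("/stream/{session_id}")
async def stream(session_id: str):
    queue: asyncio.Queue = _stream_queues.setdefault(session_id, asyncio.Queue())

    async def event_source():
        try:
            while True:
                chunk = await queue.get()
                if chunk is None:  # sentinel from _end_stream()
                    yield "data: [DONE]\n\n"
                    break
                yield f"data: {chunk}\n\n"
        finally:
            _stream_queues.pop(session_id, None)

    return StreamingResponse(event_source(), media_type="text/event-stream")
```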
CLI (cli.py):
- stream_reply(): reads /stream/ SSE, renders tokens live with Rich Live (transient); sketched after this list
- Final reply rendered as Markdown after stream completes
- os.getlogin() replaced with os.getenv("USER") for container compatibility (os.getlogin() needs a controlling terminal and raises OSError inside containers)
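
A matching sketch of the consuming side, assuming httpx for the SSE request and Rich for rendering (the real stream_reply() signature in cli.py may differ):

```python
import os

import httpx
from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown
from rich.text import Text

console = Console()
DEEPAGENTS_URL = os.getenv("DEEPAGENTS_URL", "http://localhost:8000")

def stream_reply(session_id: str) -> str:
    """Read the /stream/ SSE feed and render tokens live; return the full reply."""
    buffer = ""
    # transient=True erases the live token view on exit, so only the final
    # Markdown render below remains in the terminal scrollback.
    with Live(Text(""), console=console, transient=True) as live:
        with httpx.stream("GET", f"{DEEPAGENTS_URL}/stream/{session_id}", timeout=None) as resp:
            for line in resp.iter_lines():
                if not line.startswith("data: "):
                    continue  # skip blank SSE separators / keep-alives
                chunk = line[len("data: "):]
                if chunk == "[DONE]":
                    break
                buffer += chunk
                live.update(Text(buffer))
    console.print(Markdown(buffer))
    return buffer
```

Rendering the accumulated buffer as Markdown only after [DONE] avoids re-parsing half-finished markup on every token.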
Dockerfile.cli + docker-compose cli service (profiles: tools):
- Run: docker compose --profile tools run --rm -it cli (the tools profile keeps this interactive service out of a plain docker compose up)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docker-compose.yml (YAML, 91 lines, 2.3 KiB):
```yaml
services:
  bifrost:
    image: maximhq/bifrost
    container_name: bifrost
    ports:
      - "8080:8080"
    volumes:
      - ./bifrost-config.json:/app/data/config.json:ro
    environment:
      - APP_DIR=/app/data
      - APP_PORT=8080
      - LOG_LEVEL=info
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  deepagents:
    build: .
    container_name: deepagents
    ports:
      - "8000:8000"
    environment:
      - PYTHONUNBUFFERED=1
      # Bifrost gateway — all LLM inference goes through here
      - BIFROST_URL=http://bifrost:8080/v1
      # Direct Ollama GPU URL — used only by VRAMManager for flush/prewarm
      - OLLAMA_BASE_URL=http://host.docker.internal:11436
      - DEEPAGENTS_MODEL=qwen3:4b
      - DEEPAGENTS_COMPLEX_MODEL=qwen3:8b
      - DEEPAGENTS_ROUTER_MODEL=qwen2.5:1.5b
      - SEARXNG_URL=http://host.docker.internal:11437
      - GRAMMY_URL=http://grammy:3001
      - CRAWL4AI_URL=http://crawl4ai:11235
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      - openmemory
      - grammy
      - crawl4ai
      - bifrost
    restart: unless-stopped

  openmemory:
    build: ./openmemory
    container_name: openmemory
    ports:
      - "8765:8765"
    environment:
      # Extraction LLM runs on GPU — qwen2.5:1.5b for speed (~3s)
      - OLLAMA_GPU_URL=http://host.docker.internal:11436
      - OLLAMA_EXTRACTION_MODEL=qwen2.5:1.5b
      # Embedding (nomic-embed-text) runs on CPU — fast enough for search (50-150ms)
      - OLLAMA_CPU_URL=http://host.docker.internal:11435
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  grammy:
    build: ./grammy
    container_name: grammy
    ports:
      - "3001:3001"
    environment:
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
      - DEEPAGENTS_URL=http://deepagents:8000
    restart: unless-stopped

  cli:
    build:
      context: .
      dockerfile: Dockerfile.cli
    container_name: cli
    environment:
      - DEEPAGENTS_URL=http://deepagents:8000
    depends_on:
      - deepagents
    stdin_open: true
    tty: true
    profiles:
      - tools

  crawl4ai:
    image: unclecode/crawl4ai:latest
    container_name: crawl4ai
    ports:
      - "11235:11235"
    environment:
      - CRAWL4AI_LOG_LEVEL=WARNING
    shm_size: "1g"
    restart: unless-stopped
```