Update docs: streaming, CLI container, use_cases tests

- /stream/{session_id} SSE endpoint replaces /reply/ for CLI
- Medium tier streams per-token via astream() with in_think filtering
- CLI now runs as a Docker container (Dockerfile.cli, profile:tools)
- Correct medium model to qwen3:4b with real-time think-block filtering
- Add use_cases/ test category to commands section
- Update files tree and services table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alvis, 2026-03-12 17:31:36 +00:00
parent b04e8a0925
commit 8cd41940f0
2 changed files with 36 additions and 22 deletions
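As a rough sketch of the headline change, the new endpoint could be wired up as below, assuming one asyncio.Queue per session in _stream_queues and the [DONE] sentinel described in the flow further down. The names _stream_queues, _end_stream, and [DONE] come from this diff; the endpoint body, framing, and everything else are assumptions, not the actual agent.py code.

```python
# Hypothetical sketch of the /stream/{session_id} SSE endpoint.
# _stream_queues and the [DONE] sentinel appear in the diff below;
# the endpoint body itself is assumed, not copied from agent.py.
import asyncio
from collections import defaultdict

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
_stream_queues: dict[str, asyncio.Queue] = defaultdict(asyncio.Queue)

@app.get("/stream/{session_id}")
async def stream(session_id: str) -> StreamingResponse:
    queue = _stream_queues[session_id]

    async def event_gen():
        while True:
            chunk = await queue.get()
            yield f"data: {chunk}\n\n"   # SSE wire format
            if chunk == "[DONE]":        # sentinel pushed by _end_stream()
                break

    return StreamingResponse(event_gen(), media_type="text/event-stream")
```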

@@ -18,7 +18,8 @@ Autonomous personal assistant with a multi-channel gateway. Three-tier model rou
│ │ │ │
│ │ POST /message │ ← all inbound │
│ │ POST /chat (legacy) │ │
│ │ GET /reply/{id} SSE │ ← CLI polling
│ │ GET /stream/{id} SSE │ ← token stream
│ │ GET /reply/{id} SSE │ ← legacy poll │
│ │ GET /health │ │
│ │ │ │
│ │ channels.py registry │ │
@@ -42,7 +43,7 @@ Autonomous personal assistant with a multi-channel gateway. Three-tier model rou
| Channel | session_id | Inbound | Outbound |
|---------|-----------|---------|---------|
| Telegram | `tg-<chat_id>` | Grammy long-poll → POST /message | channels.py → POST grammy:3001/send |
| CLI | `cli-<user>` | POST /message directly | GET /reply/{id} SSE stream |
| CLI | `cli-<user>` | POST /message directly | GET /stream/{id} SSE — Rich Live streaming |
| Voice | `voice-<device>` | (future) | (future) |
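For the CLI row, a minimal consumer sketch, assuming the SSE data: framing and [DONE] sentinel from the sketch above; httpx, the port, and the exact Rich Live usage are illustrative guesses, not lifted from cli.py.

```python
# Hypothetical CLI consumer for GET /stream/{id} with Rich Live rendering.
# The endpoint URL and SSE framing match the sketch above; the port and
# client details are assumptions.
import httpx
from rich.live import Live
from rich.markdown import Markdown

def stream_reply(session_id: str, base_url: str = "http://localhost:8000") -> None:
    text = ""
    with httpx.stream("GET", f"{base_url}/stream/{session_id}", timeout=None) as resp:
        with Live(refresh_per_second=8) as live:
            for line in resp.iter_lines():
                if not line.startswith("data: "):
                    continue
                chunk = line[len("data: "):]
                if chunk == "[DONE]":
                    break
                text += chunk
                live.update(Markdown(text))  # re-render the accumulated reply
```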
## Unified Message Flow
@@ -58,11 +59,13 @@ Autonomous personal assistant with a multi-channel gateway. Three-tier model rou
6. router.route() with enriched history (url_context + memories as system msgs)
- if URL content fetched and tier=light → upgrade to medium
7. Invoke agent for tier with url_context + memories in system prompt
8. channels.deliver(session_id, channel, reply_text)
- always puts reply in pending_replies[session_id] queue (for SSE)
- calls channel-specific send callback
9. _store_memory() background task — stores turn in openmemory
10. GET /reply/{session_id} SSE clients receive the reply
8. Token streaming:
- medium: astream() pushes per-token chunks to _stream_queues[session_id]; <think> blocks filtered in real time
- light/complex: full reply pushed as single chunk after completion
- _end_stream() sends [DONE] sentinel
9. channels.deliver(session_id, channel, reply_text) — Telegram callback
10. _store_memory() background task — stores turn in openmemory
11. GET /stream/{session_id} SSE clients receive chunks; CLI renders with Rich Live + final Markdown
```
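Step 8's real-time <think> filtering implies a small state machine over the token stream. A minimal sketch, assuming chunks arrive as plain strings from astream() and that tags are never split across chunks (the real filter may buffer partial tags):

```python
# Hypothetical in_think filter: drops <think>...</think> spans from a token
# stream in real time. Assumes each tag arrives unsplit within one chunk,
# which may not hold for the real tokenizer; agent.py may buffer instead.
from collections.abc import AsyncIterator

async def filter_think(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    in_think = False
    async for chunk in chunks:
        while chunk:
            if in_think:
                end = chunk.find("</think>")
                if end == -1:
                    chunk = ""                    # still inside the block
                else:
                    in_think = False
                    chunk = chunk[end + len("</think>"):]
            else:
                start = chunk.find("<think>")
                if start == -1:
                    yield chunk                   # plain tokens pass through
                    chunk = ""
                else:
                    if start:
                        yield chunk[:start]       # emit text before the tag
                    in_think = True
                    chunk = chunk[start + len("<think>"):]
```

run_agent_task could then drive this with something like `async for tok in filter_think(model.astream(messages)): _stream_queues[session_id].put_nowait(tok)` before _end_stream() appends the [DONE] sentinel; again, names other than those in the diff are hypothetical.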
## Tool Handling
@@ -132,15 +135,19 @@ Conversation history is keyed by session_id (5-turn buffer).
```
adolf/
├── docker-compose.yml Services: bifrost, deepagents, openmemory, grammy, crawl4ai
├── docker-compose.yml Services: bifrost, deepagents, openmemory, grammy, crawl4ai, cli (profile:tools)
├── Dockerfile deepagents container (Python 3.12)
├── agent.py FastAPI gateway, run_agent_task, Crawl4AI pre-fetch, memory pipeline
├── Dockerfile.cli CLI container (python:3.12-slim + rich)
├── agent.py FastAPI gateway, run_agent_task, Crawl4AI pre-fetch, memory pipeline, /stream/ SSE
├── channels.py Channel registry + deliver() + pending_replies
├── router.py Router class — regex + LLM tier classification
├── vram_manager.py VRAMManager — flush/prewarm/poll Ollama VRAM
├── agent_factory.py _DirectModel (medium) / create_deep_agent (complex)
├── cli.py Interactive CLI REPL client
├── cli.py Interactive CLI REPL — Rich Live streaming + Markdown render
├── wiki_research.py Batch wiki research pipeline (uses /message + SSE)
├── tests/
│ ├── integration/ Standalone integration test scripts (common.py + test_*.py)
│ └── use_cases/ Claude Code skill markdown files — Claude acts as user + evaluator
├── .env TELEGRAM_BOT_TOKEN (not committed)
├── openmemory/
│ ├── server.py FastMCP + mem0: add_memory, search_memory, get_all_memories