Server (agent.py):
- _stream_queues: per-session asyncio.Queue for token chunks
- _push_stream_chunk() / _end_stream() helpers
- Medium tier: astream() with <think> block filtering — real token streaming
- Light tier: full reply pushed as single chunk then [DONE]
- Complex tier: full reply pushed after agent completes then [DONE]
- GET /stream/{session_id} SSE endpoint (data: <chunk>\n\n, data: [DONE]\n\n)
- medium_model promoted to module-level global for astream() access
CLI (cli.py):
- stream_reply(): reads /stream/ SSE, renders tokens live with Rich Live (transient)
- Final reply rendered as Markdown after stream completes
- os.getlogin() replaced with os.getenv("USER") for container compatibility
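On the client side, reading the stream comes down to picking the payload out of each data: line and stopping at [DONE]. The sketch below shows only that parsing step over an iterable of text lines. iter_sse_chunks is a hypothetical name rather than the function in cli.py, and the HTTP request and the Rich rendering are deliberately left out.

```python
from typing import Iterable, Iterator

DONE = "[DONE]"  # end-of-stream marker sent by the server


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each "data:" line, stopping at the [DONE] marker."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separator lines, comments, other SSE fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE drops a single leading space
        if payload == DONE:
            return
        yield payload
```

Each yielded chunk can be appended to a buffer for the live display, and the joined buffer is what would be rendered as Markdown once the generator finishes.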
Dockerfile.cli + docker-compose cli service (profiles: tools):
- Run: docker compose --profile tools run --rm -it cli
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use Case: Apple Pie Research
Verify that a deep research query triggers the complex tier, uses web search and page fetching, and produces a substantive, well-sourced recipe response.
Steps
1. Send the research query (the /think prefix forces complex tier):
curl -s -X POST http://localhost:8000/message \
-H "Content-Type: application/json" \
-d '{"text": "/think what is the best recipe for an apple pie?", "session_id": "use-case-apple-pie", "channel": "cli", "user_id": "claude"}'
2. Wait for the streaming reply (complex tier can take up to 5 minutes):
curl -s -N --max-time 300 "http://localhost:8000/stream/use-case-apple-pie"
3. Confirm tier and tool usage in agent logs:
docker compose -f /home/alvis/adolf/docker-compose.yml logs deepagents \
--since=600s | grep -E "tier=complex|web_search|fetch_url|crawl4ai"
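The log check in step 3 can also be done in Python when the log output is already captured as lines. This is a minimal sketch that reuses the same pattern as the grep command. The function names and result keys are hypothetical, and the real log line format is not shown in this document, so the code only relies on the four substrings appearing somewhere in a line.

```python
import re
from typing import Dict, Iterable, List

# Same alternation as the grep -E pattern in step 3.
PATTERN = re.compile(r"tier=complex|web_search|fetch_url|crawl4ai")


def matching_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that grep -E would have printed."""
    return [line for line in log_lines if PATTERN.search(line)]


def summarize(log_lines: Iterable[str]) -> Dict[str, bool]:
    """Report whether the tier marker and a search/fetch call were seen."""
    hits = matching_lines(log_lines)
    return {
        "tier_complex": any("tier=complex" in line for line in hits),
        "tool_use": any(
            "web_search" in line or "fetch_url" in line for line in hits
        ),
    }
```

Note that summarize does not tie a match to a particular session; confirming that the lines belong to use-case-apple-pie still requires reading them.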
Evaluate (use your judgment)
Check each of the following:
- Tier: logs show tier=complex for this session
- Tool use: logs show web_search or fetch_url calls during the request
- Ingredients: response lists specific apple pie ingredients (apples, flour, butter, sugar, etc.)
- Method: response includes preparation or baking steps
- Sources: response cites real URLs it fetched, not invented links
- Quality: response is structured and practical — not a refusal, stub, or generic placeholder
Report PASS only if all six criteria are met. For any failure, state which criterion failed and quote the relevant part of the response or logs.
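The all-or-nothing rule can be written down as a small helper. This is a sketch only: the criterion names are taken from the checklist above, while the verdict function and the "FAIL: ..." wording are assumptions, and the per-criterion judgments and supporting quotes still come from the evaluator.

```python
from typing import Dict

CRITERIA = ("Tier", "Tool use", "Ingredients", "Method", "Sources", "Quality")


def verdict(results: Dict[str, bool]) -> str:
    """Return "PASS" only if all six criteria are met, else name the failures."""
    missing = [name for name in CRITERIA if name not in results]
    if missing:
        raise ValueError(f"unevaluated criteria: {', '.join(missing)}")
    failed = [name for name in CRITERIA if not results[name]]
    if not failed:
        return "PASS"
    return "FAIL: " + ", ".join(failed)
```

For example, a run where every check succeeds except source citation would yield "FAIL: Sources", to be followed by the quoted part of the response that shows the problem.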