Use cases are markdown files that Claude Code reads, executes step by step using its tools, and evaluates with its own judgment, not assertion scripts.

- cli_startup.md: pipe EOF into cli.py, verify banner and exit code 0
- apple_pie_research.md: /think query → complex tier → web_search + fetch → evaluate recipe quality, sources, and structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Use Case: Apple Pie Research

Verify that a deep research query triggers the complex tier, uses web search and
page fetching, and produces a substantive, well-sourced recipe response.

## Steps

**1. Send the research query** (the `/think` prefix forces the complex tier):

```bash
curl -s -X POST http://localhost:8000/message \
  -H "Content-Type: application/json" \
  -d '{"text": "/think what is the best recipe for an apple pie?", "session_id": "use-case-apple-pie", "channel": "cli", "user_id": "claude"}'
```
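
The request body in step 1 repeats the session ID that step 2 needs again in its URL. As a minimal sketch, the body could be generated from variables so both steps share one value. `build_payload` and `SESSION_ID` are hypothetical names, not part of the project, and the helper does no JSON escaping. Its output could be piped into the same `curl` call with `-d @-`, which makes `curl` read the body from standard input.

```bash
# build_payload TEXT SESSION_ID
# Hypothetical helper: prints the JSON body for POST /message.
# It does no JSON escaping, so TEXT and SESSION_ID must not contain
# double quotes or backslashes.
build_payload() {
  printf '{"text": "%s", "session_id": "%s", "channel": "cli", "user_id": "claude"}\n' "$1" "$2"
}

SESSION_ID="use-case-apple-pie"
build_payload "/think what is the best recipe for an apple pie?" "$SESSION_ID"
```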

**2. Wait for the reply** via SSE (complex tier can take up to 5 minutes):

```bash
curl -s -N --max-time 300 "http://localhost:8000/reply/use-case-apple-pie"
```
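
The raw stream from step 2 may be easier to evaluate once the SSE framing is stripped. The sketch below assumes the endpoint emits standard Server-Sent Events lines of the form `data: ...`; the source does not show the actual payload format, so treat that as an assumption. `sse_data` is a hypothetical helper, and the step 2 `curl` output could be piped through it.

```bash
# sse_data: hypothetical helper. Reads an SSE stream on stdin and prints only
# the payload of each "data:" field, dropping event names and blank lines.
sse_data() {
  sed -n 's/^data: \{0,1\}//p'
}

# Demo on a hand-written stream (not real output from the service):
printf 'event: message\ndata: hello\n\n' | sse_data
```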

**3. Confirm tier and tool usage in agent logs:**

```bash
docker compose -f /home/alvis/adolf/docker-compose.yml logs deepagents \
  --since=600s --no-log-prefix | grep -E "tier=complex|web_search|fetch_url|crawl4ai"
```
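
The `grep` in step 3 matches if any one of its patterns appears, so a hit does not by itself show that both the tier and a tool call were logged. A rough sketch of checking the two conditions separately follows. `check_logs` is a hypothetical helper, the sample lines in the demo are made up rather than real agent output, and the function does not filter by session, so that part still needs a manual look.

```bash
# check_logs: hypothetical helper. Reads log text on stdin and succeeds only
# if a tier=complex marker and at least one tool marker both appear.
check_logs() {
  logs=$(cat)
  printf '%s\n' "$logs" | grep -q 'tier=complex' \
    || { echo "FAIL: tier"; return 1; }
  printf '%s\n' "$logs" | grep -Eq 'web_search|fetch_url' \
    || { echo "FAIL: tool use"; return 1; }
  echo "PASS: tier and tool use"
}

# Demo on made-up log lines (the real log format is not shown in this file):
printf 'router tier=complex\ntool web_search called\n' | check_logs
```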

## Evaluate (use your judgment)

Check each of the following:

- **Tier**: logs show `tier=complex` for this session
- **Tool use**: logs show `web_search` or `fetch_url` calls during the request
- **Ingredients**: response lists specific apple pie ingredients (apples, flour, butter, sugar, etc.)
- **Method**: response includes preparation or baking steps
- **Sources**: response cites real URLs it fetched, not invented links
- **Quality**: response is structured and practical, not a refusal, stub, or generic placeholder

Report PASS only if all six criteria are met. For any failure, state which criterion
failed and quote the relevant part of the response or logs.
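
For the Sources criterion, a mechanical cross-check can support the judgment call without replacing it. The sketch below assumes the reply and the log output have each been saved to a file, and that fetched URLs appear verbatim in the logs; neither is confirmed by this file. `uncited_urls` is a hypothetical helper. Any URL it prints is worth a closer look, and an empty result does not by itself prove the sources are sound.

```bash
# uncited_urls RESPONSE_FILE LOG_FILE
# Hypothetical helper: prints every URL found in the response that never
# appears in the logs. Trailing sentence punctuation is stripped first.
uncited_urls() {
  grep -Eo 'https?://[^ )>"]+' "$1" \
    | sed 's/[.,;:]*$//' \
    | sort -u \
    | while IFS= read -r url; do
        grep -qF -- "$url" "$2" || printf '%s\n' "$url"
      done
}
```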