Add use_cases test category as Claude Code skill instructions

Use cases are markdown files that Claude Code reads, executes step by step using its tools, and evaluates with its own judgment — not assertion scripts. - cli_startup.md: pipe EOF into cli.py, verify banner and exit code 0 - apple_pie_research.md: /think query → complex tier → web_search + fetch → evaluate recipe quality, sources, and structure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 17:01:13 +00:00
parent a35ba83db7
commit edc9a96f7a
3 changed files with 59 additions and 46 deletions
--- a/tests/use_cases/apple_pie_research.md
+++ b/tests/use_cases/apple_pie_research.md
@@ -0,0 +1,41 @@
 # Use Case: Apple Pie Research
 Verify that a deep research query triggers the complex tier, uses web search and
 page fetching, and produces a substantive, well-sourced recipe response.
 ## Steps
 **1. Send the research query** (the `/think` prefix forces complex tier):
 ```bash
 curl -s -X POST http://localhost:8000/message \
  -H "Content-Type: application/json" \
  -d '{"text": "/think what is the best recipe for an apple pie?", "session_id": "use-case-apple-pie", "channel": "cli", "user_id": "claude"}'
 ```
 **2. Wait for the reply** via SSE (complex tier can take up to 5 minutes):
 ```bash
 curl -s -N --max-time 300 "http://localhost:8000/reply/use-case-apple-pie"
 ```
 **3. Confirm tier and tool usage in agent logs:**
 ```bash
 docker compose -f /home/alvis/adolf/docker-compose.yml logs deepagents \
  --since=600s --no-log-prefix | grep -E "tier=complex|web_search|fetch_url|crawl4ai"
 ```
 ## Evaluate (use your judgment)
 Check each of the following:
 - **Tier**: logs show `tier=complex` for this session
 - **Tool use**: logs show `web_search` or `fetch_url` calls during the request
 - **Ingredients**: response lists specific apple pie ingredients (apples, flour, butter, sugar, etc.)
 - **Method**: response includes preparation or baking steps
 - **Sources**: response cites real URLs it fetched, not invented links
 - **Quality**: response is structured and practical — not a refusal, stub, or generic placeholder
 Report PASS only if all six criteria are met. For any failure, state which criterion
 failed and quote the relevant part of the response or logs.
--- a/tests/use_cases/cli_startup.md
+++ b/tests/use_cases/cli_startup.md
@@ -0,0 +1,18 @@
 # Use Case: CLI Startup
 Verify the Adolf CLI starts cleanly and exits without error when the user closes input.
 ## Steps
 Run the CLI with empty stdin (simulates user pressing Ctrl+D immediately):
 ```bash
 echo "" | python3 /home/alvis/adolf/cli.py --session use-case-cli-startup
 echo "exit code: $?"
 ```
 ## Pass if
 - Output contains `Adolf CLI`
 - Output contains the session name and gateway URL
 - Exit code is 0
--- a/tests/use_cases/test_cli_startup.py
+++ b/tests/use_cases/test_cli_startup.py
@@ -1,46 +0,0 @@
 #!/usr/bin/env python3
 """
 Use case: CLI startup and clean exit.
 Starts the Adolf CLI, reads its welcome banner, then closes it by sending
 EOF (simulating Ctrl+D). Prints a structured transcript for the Claude Code
 agent to evaluate.
 Expected:
  - Banner line contains "Adolf CLI"
  - Prompt "> " appears
  - Process exits with code 0 after EOF
 """
 import os
 import subprocess
 import sys
 import time
 CLI = os.path.join(os.path.dirname(__file__), "../../cli.py")
 proc = subprocess.Popen(
    [sys.executable, CLI, "--session", "use-case-cli-startup"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
 )
 # Give the process time to print its banner before closing stdin
 time.sleep(0.3)
 try:
    stdout, stderr = proc.communicate(input="", timeout=10)
 except subprocess.TimeoutExpired:
    proc.kill()
    stdout, stderr = proc.communicate()
    print("RESULT: TIMEOUT — process did not exit within 10s after EOF")
    sys.exit(1)
 print("=== stdout ===")
 print(stdout)
 if stderr.strip():
    print("=== stderr ===")
    print(stderr)
 print(f"=== exit code: {proc.returncode} ===")