# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Commands
```bash
# Start all services
docker compose up --build

# Interactive CLI (requires services running)
docker compose --profile tools run --rm -it cli

# Integration tests — run from tests/integration/, require all services up
python3 test_health.py
python3 test_memory.py [--name-only|--bench-only|--dedup-only]
python3 test_routing.py [--easy-only|--medium-only|--hard-only]

# Use case tests — read the .md file and follow its steps as Claude Code
# example: read tests/use_cases/weather_now.md and execute it

# Routing benchmark — measures tier classification accuracy across 120 queries
# Run from benchmarks/ — Adolf must be running. DO NOT run during active use (holds GPU).
cd benchmarks
python3 run_benchmark.py                           # full run (120 queries)
python3 run_benchmark.py --tier light              # light tier only (30 queries)
python3 run_benchmark.py --tier medium             # medium tier only (50 queries)
python3 run_benchmark.py --tier complex --dry-run  # complex tier, medium model (no API cost)
python3 run_benchmark.py --category smart_home_control
python3 run_benchmark.py --ids 1,2,3
python3 run_benchmark.py --list-categories

# Voice benchmark
python3 run_voice_benchmark.py

# benchmark.json (dataset) and results_latest.json are gitignored — not committed
```
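The gitignore note above can be made concrete with entries like the following. Only the two filenames are stated in this file; the `benchmarks/` prefix and exact pattern placement are an assumption about how the repo's `.gitignore` is laid out.

```gitignore
# Hypothetical .gitignore entries — filenames from the comment above,
# paths assumed relative to the repo root
benchmarks/benchmark.json
benchmarks/results_latest.json
```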
## Architecture
@README.md