Split monolithic test_pipeline.py into focused integration test scripts

- common.py: shared config, URL constants, benchmark questions, all helpers
  (get, post_json, check_sse, qdrant_count, fetch_logs, parse_run_block, wait_for, etc.)
- test_health.py: service health checks (deepagents, bifrost, GPU/CPU Ollama, Qdrant, SearXNG)
- test_memory.py: name store/recall pipeline, memory benchmark (5 facts + 10 recalls), dedup test
- test_routing.py: easy/medium/hard tier routing benchmarks with --easy-only/--medium-only/--hard-only flags
- Removed test_pipeline.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alvis
2026-03-12 16:02:57 +00:00
parent 50097d6092
commit 021104f510
6 changed files with 1255 additions and 1304 deletions


@@ -14,19 +14,23 @@ docker compose up --build
python3 cli.py [--url http://localhost:8000] [--session cli-alvis] [--timeout 400]
```
**Run integration tests:**
**Run integration tests** (from `tests/integration/`, require all Docker services running):
```bash
python3 test_pipeline.py [--chat-id CHAT_ID]
python3 test_health.py # service health: deepagents, bifrost, Ollama, Qdrant, SearXNG
# Selective sections:
python3 test_pipeline.py --bench-only # routing + memory benchmarks only (sections 10-13)
python3 test_pipeline.py --easy-only # light-tier routing benchmark
python3 test_pipeline.py --medium-only # medium-tier routing benchmark
python3 test_pipeline.py --hard-only # complex-tier + VRAM flush benchmark
python3 test_pipeline.py --memory-only # memory store/recall/dedup benchmark
python3 test_pipeline.py --no-bench # service health + single name store/recall only
python3 test_memory.py # name store/recall + memory benchmark + dedup
python3 test_memory.py --name-only # only name store/recall pipeline
python3 test_memory.py --bench-only # only 5-fact store + 10-question recall
python3 test_memory.py --dedup-only # only deduplication test
python3 test_routing.py # all routing benchmarks (easy + medium + hard)
python3 test_routing.py --easy-only # light-tier routing benchmark
python3 test_routing.py --medium-only # medium-tier routing benchmark
python3 test_routing.py --hard-only # complex-tier + VRAM flush benchmark
```
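The selective flags listed above could be wired up roughly as follows. This is only a sketch of one plausible approach using `argparse`; the function names `parse_args` and `selected_tiers` are hypothetical, and treating the three flags as mutually exclusive is an assumption, not something the commit confirms about `test_routing.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse tier-selection flags (hypothetical layout, not the real script)."""
    parser = argparse.ArgumentParser(description="Routing benchmarks (sketch)")
    # Assumption: at most one of the three flags may be given per run.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--easy-only", action="store_true",
                       help="run only the light-tier benchmark")
    group.add_argument("--medium-only", action="store_true",
                       help="run only the medium-tier benchmark")
    group.add_argument("--hard-only", action="store_true",
                       help="run only the complex-tier benchmark")
    return parser.parse_args(argv)


def selected_tiers(args):
    """Return the tiers to run; with no flag set, run all three."""
    if args.easy_only:
        return ["easy"]
    if args.medium_only:
        return ["medium"]
    if args.hard_only:
        return ["hard"]
    return ["easy", "medium", "hard"]


if __name__ == "__main__":
    print(selected_tiers(parse_args()))
```

With this layout, running the script without flags selects every tier, which matches the "all routing benchmarks" default described in the command list.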
Shared config and helpers are in `tests/integration/common.py`.
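For readers unfamiliar with this kind of helper module, a polling helper in the spirit of the `wait_for` named in the commit message might look like the sketch below. The signature, defaults, and return convention are assumptions made for illustration; the real `common.py` implementation is not shown in this commit view and may differ.

```python
import time


def wait_for(predicate, timeout=30.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses.

    Hypothetical signature: returns the truthy value on success and
    None if the deadline passes first.
    """
    # time.monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    # Example: a condition that becomes true on the third check.
    calls = []

    def ready():
        calls.append(1)
        return len(calls) >= 3

    print(wait_for(ready, timeout=2.0, interval=0.01))
```

Integration tests that depend on asynchronous side effects (for example, a memory write showing up in a vector store) typically wrap the check in a helper like this rather than sleeping for a fixed time.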
## Architecture
Adolf is a multi-channel personal assistant. All LLM inference is routed through **Bifrost**, an open-source Go-based LLM gateway that adds retry logic, failover, and observability in front of Ollama.