Split monolithic test_pipeline.py into focused integration test scripts

- common.py: shared config, URL constants, benchmark questions, and all helpers (get, post_json, check_sse, qdrant_count, fetch_logs, parse_run_block, wait_for, etc.)
- test_health.py: service health checks (deepagents, bifrost, GPU/CPU Ollama, Qdrant, SearXNG)
- test_memory.py: name store/recall pipeline, memory benchmark (5 facts + 10 recalls), dedup test
- test_routing.py: easy/medium/hard tier routing benchmarks with --easy-only/--medium-only/--hard-only flags
- Removed test_pipeline.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 16:02:57 +00:00
parent 50097d6092
commit 021104f510
6 changed files with 1255 additions and 1304 deletions
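A minimal sketch of how the tier-selection flags in `test_routing.py` could work. It assumes that running with no flag executes all three tiers (as the CLAUDE.md usage below states) and that several `-only` flags may be combined. That second behaviour is an assumption, since the diff does not show the script's internals. `run_easy`, `run_medium`, `run_hard` and `select_tiers` are hypothetical placeholders, not the script's real functions.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Hypothetical stand-ins for the tier benchmarks in test_routing.py;
# the real function names and bodies are not part of this commit.
def run_easy() -> None:
    print("light-tier routing benchmark")


def run_medium() -> None:
    print("medium-tier routing benchmark")


def run_hard() -> None:
    print("complex-tier routing benchmark + VRAM flush")


# Insertion order defines the run order: easy, medium, hard.
TIERS: Dict[str, Callable[[], None]] = {
    "easy": run_easy,
    "medium": run_medium,
    "hard": run_hard,
}


def select_tiers(argv: Optional[List[str]] = None) -> List[str]:
    """Return the flagged tiers, or every tier when no --<tier>-only flag is given."""
    parser = argparse.ArgumentParser(description="Routing benchmarks")
    for tier in TIERS:
        # argparse stores "--easy-only" under the attribute name "easy_only".
        parser.add_argument(f"--{tier}-only", action="store_true",
                            help=f"run only the {tier}-tier benchmark")
    args = parser.parse_args(argv)
    chosen = [t for t in TIERS if getattr(args, f"{t}_only")]
    return chosen or list(TIERS)


if __name__ == "__main__":
    for tier in select_tiers():
        TIERS[tier]()
```

Building the flags from a single `TIERS` table keeps the flag names, the default "run everything" case, and the run order in one place, so adding a tier needs only one new entry.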
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -14,19 +14,23 @@ docker compose up --build
 python3 cli.py [--url http://localhost:8000] [--session cli-alvis] [--timeout 400]
 ```

-**Run integration tests:**
+**Run integration tests** (from `tests/integration/`; all Docker services must be running):
 ```bash
-python3 test_pipeline.py [--chat-id CHAT_ID]
+python3 test_health.py                          # service health: deepagents, bifrost, Ollama, Qdrant, SearXNG

-# Selective sections:
-python3 test_pipeline.py --bench-only      # routing + memory benchmarks only (sections 10–13)
-python3 test_pipeline.py --easy-only       # light-tier routing benchmark
-python3 test_pipeline.py --medium-only     # medium-tier routing benchmark
-python3 test_pipeline.py --hard-only       # complex-tier + VRAM flush benchmark
-python3 test_pipeline.py --memory-only     # memory store/recall/dedup benchmark
-python3 test_pipeline.py --no-bench        # service health + single name store/recall only
+python3 test_memory.py                          # name store/recall + memory benchmark + dedup
+python3 test_memory.py --name-only              # only name store/recall pipeline
+python3 test_memory.py --bench-only             # only 5-fact store + 10-question recall
+python3 test_memory.py --dedup-only             # only deduplication test
+
+python3 test_routing.py                         # all routing benchmarks (easy + medium + hard)
+python3 test_routing.py --easy-only             # light-tier routing benchmark
+python3 test_routing.py --medium-only           # medium-tier routing benchmark
+python3 test_routing.py --hard-only             # complex-tier + VRAM flush benchmark
 ```

+Shared config and helpers are in `tests/integration/common.py`.
+
 ## Architecture

 Adolf is a multi-channel personal assistant. All LLM inference is routed through **Bifrost**, an open-source Go-based LLM gateway that adds retry logic, failover, and observability in front of Ollama.