adolf

Author	SHA1	Message	Date
Alvis	8cd41940f0	Update docs: streaming, CLI container, use_cases tests - /stream/{session_id} SSE endpoint replaces /reply/ for CLI - Medium tier streams per-token via astream() with in_think filtering - CLI now runs as Docker container (Dockerfile.cli, profile:tools) - Correct medium model to qwen3:4b with real-time think block filtering - Add use_cases/ test category to commands section - Update files tree and services table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 17:31:36 +00:00
Alvis	021104f510	Split monolithic test_pipeline.py into focused integration test scripts - common.py: shared config, URL constants, benchmark questions, all helpers (get, post_json, check_sse, qdrant_count, fetch_logs, parse_run_block, wait_for, etc.) - test_health.py: service health checks (deepagents, bifrost, GPU/CPU Ollama, Qdrant, SearXNG) - test_memory.py: name store/recall pipeline, memory benchmark (5 facts + 10 recalls), dedup test - test_routing.py: easy/medium/hard tier routing benchmarks with --easy/medium/hard-only flags - Removed test_pipeline.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 16:02:57 +00:00
Alvis	50097d6092	Embed Crawl4AI at all tiers, restore qwen3:4b medium, update docs - Pre-routing URL fetch: any message with URLs gets content fetched async (httpx.AsyncClient) before routing via _fetch_urls_from_message() - URL context and memories gathered concurrently with asyncio.gather - Light tier upgraded to medium when URL content is present - url_context injected into system prompt for medium and complex agents - Complex agent retains web_search/fetch_url tools + receives pre-fetched content - Medium model restored to qwen3:4b (was temporarily qwen2.5:1.5b) - Unit tests added for _extract_urls - ARCHITECTURE.md: added Tool Handling, Crawl4AI Integration, Memory Pipeline sections - CLAUDE.md: updated request flow and Crawl4AI integration docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 15:49:34 +00:00
Alvis	f9618a9bbf	Integrate Bifrost LLM gateway, add test suite, implement memory pipeline - Add Bifrost (maximhq/bifrost) as LLM gateway: all inference routes through bifrost:8080/v1 with retry logic and observability; VRAMManager keeps direct Ollama access for VRAM flush/prewarm operations - Switch medium model from qwen3:4b to qwen2.5:1.5b (direct call, no tools) via _DirectModel wrapper; complex keeps create_deep_agent with qwen3:8b - Implement out-of-agent memory pipeline: _retrieve_memories pre-fetches relevant context (injected into all tiers), _store_memory runs as background task after each reply writing to openmemory/Qdrant - Add tests/unit/ with 133 tests covering router, channels, vram_manager, agent helpers; move integration test to tests/integration/ - Add bifrost-config.json with GPU Ollama (qwen2.5:0.5b/1.5b, qwen3:4b/8b, gemma3:4b) and CPU Ollama providers - Integration test 28/29 pass (only grammy fails — no TELEGRAM_BOT_TOKEN) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 13:50:12 +00:00

4 Commits