adolf

Author	SHA1	Message	Date
Alvis	8cd41940f0	Update docs: streaming, CLI container, use_cases tests - /stream/{session_id} SSE endpoint replaces /reply/ for CLI - Medium tier streams per-token via astream() with in_think filtering - CLI now runs as Docker container (Dockerfile.cli, profile:tools) - Correct medium model to qwen3:4b with real-time think block filtering - Add use_cases/ test category to commands section - Update files tree and services table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 17:31:36 +00:00
Alvis	50097d6092	Embed Crawl4AI at all tiers, restore qwen3:4b medium, update docs - Pre-routing URL fetch: any message with URLs gets content fetched async (httpx.AsyncClient) before routing via _fetch_urls_from_message() - URL context and memories gathered concurrently with asyncio.gather - Light tier upgraded to medium when URL content is present - url_context injected into system prompt for medium and complex agents - Complex agent retains web_search/fetch_url tools + receives pre-fetched content - Medium model restored to qwen3:4b (was temporarily qwen2.5:1.5b) - Unit tests added for _extract_urls - ARCHITECTURE.md: added Tool Handling, Crawl4AI Integration, Memory Pipeline sections - CLAUDE.md: updated request flow and Crawl4AI integration docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 15:49:34 +00:00
Alvis	ec45d255f0	wiki search people tested pipeline	2026-03-05 11:22:34 +00:00
Alvis	ea77b2308b	Add three-tier model routing with VRAM management and benchmark suite - Three-tier routing: light (router answers directly ~3s), medium (qwen3:4b + tools ~60s), complex (/think prefix → qwen3:8b + subagents ~140s) - Router: qwen2.5:1.5b, temp=0, regex pre-classifier + raw-text LLM classify - VRAMManager: explicit flush/poll/prewarm to prevent Ollama CPU-spill bug - agent_factory: build_medium_agent and build_complex_agent using deepagents (TodoListMiddleware + SubAgentMiddleware with research/memory subagents) - Fix: split Telegram replies >4000 chars into multiple messages - Benchmark: 30 questions (easy/medium/hard) — 10/10/10 verified passing easy→light, medium→medium, hard→complex with VRAM flush confirmed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-28 17:54:51 +00:00
Alvis	66ab93aa37	Add Adolf architecture doc and integration test script - ARCHITECTURE.md: comprehensive pipeline description (copied from Gitea wiki) - test_pipeline.py: tests all services, memory, async timing, and recall Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-23 04:52:40 +00:00

5 Commits
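
Note on the routing rules in ea77b2308b and 50097d6092: the sketch below is one possible reading of the tier rules those messages state, not the repository's code. The function name `select_tier`, its parameters, and the idea that the router's verdict arrives as a plain string are all assumptions made for illustration. Only three things come from the log: the `/think` prefix selects the complex tier, a light request is upgraded to medium when fetched URL content is present, and the tiers are named light, medium, and complex.

```python
# Hypothetical sketch of the tier rules described in the commit messages.
# Names and signatures here are illustrative, not taken from the repository.

THINK_PREFIX = "/think"  # prefix that the log associates with the complex tier


def select_tier(message: str, router_verdict: str, url_context: str = "") -> str:
    """Return "light", "medium", or "complex" for one incoming message.

    router_verdict stands in for whatever the router model decided
    ("light" or "medium"); url_context stands in for pre-fetched page text.
    """
    # An explicit /think prefix wins over everything else.
    if message.lstrip().startswith(THINK_PREFIX):
        return "complex"
    # A light request that carries fetched URL content is upgraded to medium.
    if router_verdict == "light" and url_context:
        return "medium"
    # Otherwise keep the router's verdict unchanged.
    return router_verdict


if __name__ == "__main__":
    print(select_tier("/think compare A and B", "light"))
    print(select_tier("summarise this page", "light", url_context="page text"))
    print(select_tier("hi", "light"))
```

Keeping the decision in a pure function like this makes the upgrade rule easy to unit test without starting any model; whether the actual project structures it this way is not stated in the log.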