Commit Graph

5 Commits

Author SHA1 Message Date
Alvis
8cd41940f0 Update docs: streaming, CLI container, use_cases tests
- /stream/{session_id} SSE endpoint replaces /reply/ for CLI
- Medium tier streams per-token via astream() with in_think filtering
- CLI now runs as Docker container (Dockerfile.cli, profile:tools)
- Correct medium model to qwen3:4b with real-time think block filtering
- Add use_cases/ test category to commands section
- Update files tree and services table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 17:31:36 +00:00
Alvis
50097d6092 Embed Crawl4AI at all tiers, restore qwen3:4b medium, update docs
- Pre-routing URL fetch: any message with URLs gets content fetched
  async (httpx.AsyncClient) before routing via _fetch_urls_from_message()
- URL context and memories gathered concurrently with asyncio.gather
- Light tier upgraded to medium when URL content is present
- url_context injected into system prompt for medium and complex agents
- Complex agent retains web_search/fetch_url tools + receives pre-fetched content
- Medium model restored to qwen3:4b (was temporarily qwen2.5:1.5b)
- Unit tests added for _extract_urls
- ARCHITECTURE.md: added Tool Handling, Crawl4AI Integration, Memory Pipeline sections
- CLAUDE.md: updated request flow and Crawl4AI integration docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:49:34 +00:00
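The pre-routing URL step can be sketched as follows. Several parts are assumptions: the extractor is regex-based, and fetching and memory lookup are passed in as async callables. In the real code those are `httpx.AsyncClient`/Crawl4AI and the memory service; here they are stubs. `extract_urls`, `prepare_context` and `pick_tier` are hypothetical stand-ins for `_extract_urls` and `_fetch_urls_from_message`, and the URL pattern is assumed. The sketch shows three things: URL fetches and memory search run concurrently under `asyncio.gather`, the fetched pages are joined into a `url_context` string, and a light routing decision is upgraded to medium when that string is non-empty.

```python
import asyncio
import re
from typing import Awaitable, Callable, List, Tuple

# Assumed URL pattern: stops at whitespace, quotes, angle brackets and closing brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> List[str]:
    """Return unique http(s) URLs in order of first appearance (hypothetical _extract_urls)."""
    seen: dict = {}
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:!?")  # drop trailing sentence punctuation
        seen.setdefault(url, None)
    return list(seen)


async def prepare_context(
    message: str,
    fetch_url: Callable[[str], Awaitable[str]],
    search_memories: Callable[[str], Awaitable[List[str]]],
) -> Tuple[str, List[str]]:
    """Fetch all URLs and search memories concurrently, before routing."""
    urls = extract_urls(message)
    pages, memories = await asyncio.gather(
        asyncio.gather(*(fetch_url(u) for u in urls)),  # all URL fetches in parallel
        search_memories(message),
    )
    url_context = "\n\n".join(pages)
    return url_context, memories


def pick_tier(router_tier: str, url_context: str) -> str:
    """Upgrade light to medium when pre-fetched URL content is present."""
    if router_tier == "light" and url_context:
        return "medium"
    return router_tier


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:  # stub in place of an httpx request
        await asyncio.sleep(0.01)
        return f"[page {url}]"

    async def fake_memories(query: str) -> List[str]:  # stub in place of memory search
        await asyncio.sleep(0.01)
        return ["prefers short answers"]

    ctx, mems = asyncio.run(
        prepare_context("summarise https://example.com/a, please", fake_fetch, fake_memories)
    )
    print(pick_tier("light", ctx), ctx, mems)
```

Because both awaits share one `gather`, the pre-routing delay is roughly the slower of the two lookups, not their sum.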
Alvis
ec45d255f0 wiki search people tested pipeline
2026-03-05 11:22:34 +00:00
Alvis
ea77b2308b Add three-tier model routing with VRAM management and benchmark suite
- Three-tier routing: light (router answers directly ~3s), medium (qwen3:4b
  + tools ~60s), complex (/think prefix → qwen3:8b + subagents ~140s)
- Router: qwen2.5:1.5b, temp=0, regex pre-classifier + raw-text LLM classify
- VRAMManager: explicit flush/poll/prewarm to prevent Ollama CPU-spill bug
- agent_factory: build_medium_agent and build_complex_agent using deepagents
  (TodoListMiddleware + SubAgentMiddleware with research/memory subagents)
- Fix: split Telegram replies >4000 chars into multiple messages
- Benchmark: 30 questions (easy/medium/hard) — 10/10/10 verified passing
  easy→light, medium→medium, hard→complex with VRAM flush confirmed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 17:54:51 +00:00
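Two pieces of this commit lend themselves to a small sketch: the regex pre-classifier that runs before the LLM classifier, and the reply splitting for Telegram. Everything here is illustrative. `pre_classify`, `split_reply` and `LIGHT_RE` are hypothetical names, and the greeting pattern is an assumption, not the project's actual regex. The commit message confirms only the `/think` prefix routing to the complex tier and the 4000-character split threshold. Telegram's own `sendMessage` limit is 4096 characters, so 4000 leaves some headroom.

```python
import re
from typing import List, Optional

TELEGRAM_LIMIT = 4000  # threshold from the commit message; Telegram's own cap is 4096 chars

# Hypothetical pre-classifier pattern: short greetings/acknowledgements go to the light tier.
LIGHT_RE = re.compile(r"^\s*(hi|hello|hey|thanks|thank you|ok)\b[\s!.?]*$", re.IGNORECASE)


def pre_classify(message: str) -> Optional[str]:
    """Cheap regex pass run before the LLM classifier; returns a tier or None."""
    if message.lstrip().startswith("/think"):
        return "complex"  # explicit opt-in to qwen3:8b + subagents
    if LIGHT_RE.match(message):
        return "light"
    return None  # undecided: defer to the qwen2.5:1.5b classifier


def split_reply(text: str, limit: int = TELEGRAM_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` chars, preferring newline boundaries."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)  # last newline inside the first `limit` chars
        if cut <= 0:
            cut = limit  # no usable newline: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Splitting on the last newline before the limit keeps paragraphs and code lines intact where possible. A hard cut is used only when a single line exceeds the limit.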
Alvis
66ab93aa37 Add Adolf architecture doc and integration test script
- ARCHITECTURE.md: comprehensive pipeline description (copied from Gitea wiki)
- test_pipeline.py: tests all services, memory, async timing, and recall

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 04:52:40 +00:00