Commit Graph

4 Commits

Author SHA1 Message Date
Alvis
ea77b2308b Add three-tier model routing with VRAM management and benchmark suite
- Three-tier routing: light (router answers directly ~3s), medium (qwen3:4b
  + tools ~60s), complex (/think prefix → qwen3:8b + subagents ~140s)
- Router: qwen2.5:1.5b, temp=0, regex pre-classifier + raw-text LLM classify
- VRAMManager: explicit flush/poll/prewarm to prevent Ollama CPU-spill bug
- agent_factory: build_medium_agent and build_complex_agent using deepagents
  (TodoListMiddleware + SubAgentMiddleware with research/memory subagents)
- Fix: split Telegram replies >4000 chars into multiple messages
- Benchmark: 30 questions (easy/medium/hard) — 10/10/10 verified passing
  easy→light, medium→medium, hard→complex with VRAM flush confirmed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 17:54:51 +00:00
Alvis
1718d70203 Fix system prompt: agent now correctly handles memory requests
- Tell agent that memory is saved automatically after every reply
- Instruct agent to never say it cannot store information
- Instruct agent to acknowledge and confirm when user asks to remember something
- Fix misleading startup log (gemma3:1b → qwen2.5:1.5b)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 05:22:08 +00:00
Alvis
19e2c27976 Switch extraction model to qwen2.5:1.5b, fix mem0migrations dims, update tests
- openmemory: use qwen2.5:1.5b instead of gemma3:1b for fact extraction
- test_pipeline.py: check qwen2.5:1.5b, fix SSE checks, fix Qdrant payload
  parsing, relax SearXNG threshold to 5s, improve marker word test
- potential-directions.md: ranked CPU extraction model candidates
- Root cause: mem0migrations collection had stale 1536-dim vectors causing
  silent dedup failures; recreate both collections at 768 dims

All 18 pipeline tests now pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 05:11:29 +00:00
Alvis
66ab93aa37 Add Adolf architecture doc and integration test script
- ARCHITECTURE.md: comprehensive pipeline description (copied from Gitea wiki)
- test_pipeline.py: tests all services, memory, async timing, and recall

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 04:52:40 +00:00