Commit Graph

6 Commits

Author SHA1 Message Date
Alvis
3ed47b45da Split CLAUDE.md per official Claude Code recommendations
CLAUDE.md: lean — commands, key conventions, fast tool guide, @ARCHITECTURE.md import
routecheck/CLAUDE.md: purpose, access paths, env vars, gotchas
openmemory/CLAUDE.md: tools, two Ollama instances, prompts, notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 07:15:51 +00:00
Alvis
eba805f787 Update docs: fast tools, routecheck service, commute tool
- Request flow: add fast_tool_runner.run_matching() to pre-flight gather
- New Fast Tools section: WeatherTool + CommuteTool table, extension guide
- New routecheck section: captcha UI, internal API, proxy requirements
- Services table: add routecheck:8090
- Files tree: add fast_tools.py, routecheck/, updated .env note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 07:10:30 +00:00
Alvis
8cd41940f0 Update docs: streaming, CLI container, use_cases tests
- /stream/{session_id} SSE endpoint replaces /reply/ for CLI
- Medium tier streams per-token via astream() with in_think filtering
- CLI now runs as Docker container (Dockerfile.cli, profile:tools)
- Correct medium model to qwen3:4b with real-time think block filtering
- Add use_cases/ test category to commands section
- Update files tree and services table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 17:31:36 +00:00
Alvis
021104f510 Split monolithic test_pipeline.py into focused integration test scripts
- common.py: shared config, URL constants, benchmark questions, all helpers
  (get, post_json, check_sse, qdrant_count, fetch_logs, parse_run_block, wait_for, etc.)
- test_health.py: service health checks (deepagents, bifrost, GPU/CPU Ollama, Qdrant, SearXNG)
- test_memory.py: name store/recall pipeline, memory benchmark (5 facts + 10 recalls), dedup test
- test_routing.py: easy/medium/hard tier routing benchmarks with --easy/medium/hard-only flags
- Removed test_pipeline.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 16:02:57 +00:00
Alvis
50097d6092 Embed Crawl4AI at all tiers, restore qwen3:4b medium, update docs
- Pre-routing URL fetch: any message with URLs gets content fetched
  async (httpx.AsyncClient) before routing via _fetch_urls_from_message()
- URL context and memories gathered concurrently with asyncio.gather
- Light tier upgraded to medium when URL content is present
- url_context injected into system prompt for medium and complex agents
- Complex agent retains web_search/fetch_url tools + receives pre-fetched content
- Medium model restored to qwen3:4b (was temporarily qwen2.5:1.5b)
- Unit tests added for _extract_urls
- ARCHITECTURE.md: added Tool Handling, Crawl4AI Integration, Memory Pipeline sections
- CLAUDE.md: updated request flow and Crawl4AI integration docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:49:34 +00:00
Alvis
f9618a9bbf Integrate Bifrost LLM gateway, add test suite, implement memory pipeline
- Add Bifrost (maximhq/bifrost) as LLM gateway: all inference routes through
  bifrost:8080/v1 with retry logic and observability; VRAMManager keeps direct
  Ollama access for VRAM flush/prewarm operations
- Switch medium model from qwen3:4b to qwen2.5:1.5b (direct call, no tools)
  via _DirectModel wrapper; complex keeps create_deep_agent with qwen3:8b
- Implement out-of-agent memory pipeline: _retrieve_memories pre-fetches
  relevant context (injected into all tiers), _store_memory runs as background
  task after each reply writing to openmemory/Qdrant
- Add tests/unit/ with 133 tests covering router, channels, vram_manager,
  agent helpers; move integration test to tests/integration/
- Add bifrost-config.json with GPU Ollama (qwen2.5:0.5b/1.5b, qwen3:4b/8b,
  gemma3:4b) and CPU Ollama providers
- Integration test 28/29 pass (only grammy fails — no TELEGRAM_BOT_TOKEN)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:50:12 +00:00