# openmemory
FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.
## Tools exposed (MCP)

- `add_memory(text, user_id)` — extract facts from a conversation turn and store them in Qdrant
- `search_memory(query, user_id)` — semantic search; returns results with score ≥ 0.5
- `get_all_memories(user_id)` — dump all stored memories for a session
These are called directly by agent.py (outside the agent loop), never exposed to the LLM as tools.
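The score threshold on `search_memory` can be sketched as a simple post-filter over mem0's scored hits. This is an illustrative sketch, not the repo's code; `SCORE_THRESHOLD` and `filter_hits` are hypothetical names, and only the 0.5 cutoff comes from this doc.

```python
# Hypothetical post-filter for search_memory results: mem0 returns scored
# hits, and only those at or above the 0.5 threshold are kept.
SCORE_THRESHOLD = 0.5

def filter_hits(hits: list[dict]) -> list[dict]:
    """Keep only semantic-search hits with score >= SCORE_THRESHOLD."""
    return [h for h in hits if h.get("score", 0.0) >= SCORE_THRESHOLD]

hits = [
    {"memory": "prefers dark mode", "score": 0.82},
    {"memory": "asked about Qdrant once", "score": 0.31},
]
print(filter_hits(hits))  # only the 0.82 hit survives
```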
## Two Ollama instances

- GPU (`OLLAMA_GPU_URL`, port 11436) — extraction model (qwen2.5:1.5b): pulls facts from conversation text
- CPU (`OLLAMA_CPU_URL`, port 11435) — embedding model (nomic-embed-text): 50–150 ms per query
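Wiring the two instances into mem0 might look like the config below. This is a sketch based on mem0's provider-config format; the exact keys (`ollama_base_url` etc.) are assumptions, and the localhost defaults are illustrative.

```python
import os

# Sketch of a mem0-style config routing extraction to the GPU Ollama
# instance and embeddings to the CPU one. Defaults are illustrative.
GPU_URL = os.environ.get("OLLAMA_GPU_URL", "http://localhost:11436")
CPU_URL = os.environ.get("OLLAMA_CPU_URL", "http://localhost:11435")

MEM0_CONFIG = {
    "llm": {  # fact extraction runs on the GPU instance
        "provider": "ollama",
        "config": {"model": "qwen2.5:1.5b", "ollama_base_url": GPU_URL},
    },
    "embedder": {  # embeddings run on the CPU instance
        "provider": "ollama",
        "config": {"model": "nomic-embed-text", "ollama_base_url": CPU_URL},
    },
}
```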
## Prompts

The custom `EXTRACTION_PROMPT` starts with `/no_think` to suppress qwen3 chain-of-thought and get clean JSON output. The custom `UPDATE_MEMORY_PROMPT` handles deduplication: mem0 merges new facts with existing memories rather than creating duplicates.
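The shape of those prompts might look as follows. The wording is hypothetical; only the `/no_think` prefix, the JSON-output goal, and the dedup intent come from this doc.

```python
# Illustrative prompt shapes only; the actual text lives in the repo.
EXTRACTION_PROMPT = (
    "/no_think\n"  # suppresses qwen3 chain-of-thought so output is clean JSON
    "Extract discrete facts from the conversation below. "
    "Return a JSON list of short strings and nothing else.\n"
)

UPDATE_MEMORY_PROMPT = (
    "Given the existing memories and the new facts, merge duplicates: "
    "update an existing memory instead of adding a near-identical new one.\n"
)
```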
## Notes
- Qdrant collection is created automatically on first use
- Memory is keyed by `user_id`, which equals `session_id` in `agent.py`
- Extraction runs after the reply is sent (background task); GPU contention with the medium model is avoided because the semaphore is released before `_store_memory()` is scheduled
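That release-then-schedule ordering can be sketched with asyncio primitives. This is a minimal sketch under assumptions: `gpu_semaphore`, `handle_turn`, and the exact lock discipline are illustrative, not the repo's code; only `_store_memory()` and the ordering come from this doc.

```python
import asyncio

events: list[str] = []
gpu_semaphore = asyncio.Semaphore(1)  # hypothetical guard on GPU inference

async def _store_memory(text: str) -> None:
    # Extraction takes the GPU lock itself, so it can never overlap the reply.
    async with gpu_semaphore:
        events.append(f"extracted: {text}")

async def handle_turn(text: str) -> str:
    async with gpu_semaphore:  # reply generation holds the GPU lock
        reply = f"reply to: {text}"
    # Lock is already released here; schedule extraction in the background.
    asyncio.create_task(_store_memory(text))
    return reply

async def main() -> None:
    print(await handle_turn("hello"))
    await asyncio.sleep(0.05)  # let the background task finish
    print(events)

asyncio.run(main())
```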