# openmemory
FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.
## Tools exposed (MCP)
- `add_memory(text, user_id)` — extract facts from a conversation turn and store in Qdrant
- `search_memory(query, user_id)` — semantic search, returns results with score ≥ 0.5
- `get_all_memories(user_id)` — dump all stored memories for a session
These are called directly by `agent.py` (outside the agent loop), never exposed to the LLM as tools.
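Since the tools are plain functions from `agent.py`'s point of view, the call pattern is ordinary Python rather than an LLM tool invocation. A minimal sketch of the direct-call shape, with the score cutoff mentioned above (the `memory` client and field names here are illustrative stand-ins for the real mem0 objects):

```python
# Hypothetical sketch: agent.py calls the memory helpers directly,
# bypassing the agent loop. `memory` stands in for the mem0 client.

SCORE_THRESHOLD = 0.5  # search_memory drops weaker matches


def search_memory(memory, query: str, user_id: str) -> list[dict]:
    """Semantic search; keep only results with score >= 0.5."""
    results = memory.search(query, user_id=user_id)
    return [r for r in results if r.get("score", 0.0) >= SCORE_THRESHOLD]
```

The threshold filter is the only logic layered on top of the vector search; everything below it is delegated to mem0/Qdrant.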
## Two Ollama instances
- **GPU** (`OLLAMA_GPU_URL`, port 11436) — extraction model (`qwen2.5:1.5b`): pulls facts from conversation text
- **CPU** (`OLLAMA_CPU_URL`, port 11435) — embedding model (`nomic-embed-text`): 50–150 ms per query
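The split roughly follows mem0's provider config: the LLM block points at the GPU instance, the embedder block at the CPU instance. A sketch of that wiring, assuming mem0's documented Ollama/Qdrant config shape (treat exact field names as an approximation, and the Qdrant host/port as placeholders):

```python
import os

# Sketch of a mem0-style config routing extraction and embedding to the
# two Ollama instances. Defaults below mirror the ports named above.
config = {
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "qwen2.5:1.5b",  # extraction model on the GPU instance
            "ollama_base_url": os.environ.get(
                "OLLAMA_GPU_URL", "http://localhost:11436"
            ),
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",  # embeddings on the CPU instance
            "ollama_base_url": os.environ.get(
                "OLLAMA_CPU_URL", "http://localhost:11435"
            ),
        },
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333},  # placeholder address
    },
}
```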
## Prompts
Custom `EXTRACTION_PROMPT` starts with `/no_think` to suppress qwen3 chain-of-thought and get clean JSON output. Custom `UPDATE_MEMORY_PROMPT` handles deduplication — mem0 merges new facts with existing ones rather than creating duplicates.
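The actual prompt wording lives in the repo; the sketch below only shows the structural point, that the extraction prompt leads with `/no_think` and pins the model to a JSON-only output contract (the body text is invented for illustration):

```python
# Illustrative shape of the custom extraction prompt; real wording differs.
EXTRACTION_PROMPT = (
    "/no_think\n"  # suppress chain-of-thought so the model emits clean JSON
    "Extract discrete, durable facts from the conversation below.\n"
    'Respond with JSON only: {"facts": ["..."]}\n'
)
```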
## Notes
- Qdrant collection is created automatically on first use
- Memory is keyed by `user_id` which equals `session_id` in `agent.py`
- Extraction runs after the reply is sent (as a background task); GPU contention with the medium model is avoided because the semaphore is released before `_store_memory()` is scheduled
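That ordering can be sketched with `asyncio`: the semaphore guards reply generation, and the extraction task is only scheduled after the guarded block exits. Function names besides `_store_memory()` are hypothetical:

```python
import asyncio

# Hedged sketch of the scheduling order described above: the background
# extraction task is created only after the GPU semaphore is released,
# so it never overlaps the guarded reply-generation section.
gpu_semaphore = asyncio.Semaphore(1)
order: list[str] = []


async def _store_memory(text: str, user_id: str) -> None:
    order.append("store")  # stand-in for the mem0 add_memory() call


async def handle_turn(text: str, user_id: str) -> str:
    async with gpu_semaphore:
        order.append("reply")  # generate and send the reply (GPU-bound)
        reply = f"reply to {text!r}"
    # Semaphore released; now schedule extraction in the background.
    asyncio.create_task(_store_memory(text, user_id))
    return reply


async def main() -> None:
    await handle_turn("I like green tea", "sess-1")
    await asyncio.sleep(0)  # yield once so the background task runs


asyncio.run(main())
```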