# openmemory

FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.

## Tools exposed (MCP)

- `add_memory(text, user_id)` — extract facts from a conversation turn and store them in Qdrant
- `search_memory(query, user_id)` — semantic search; returns results with score ≥ 0.5
- `get_all_memories(user_id)` — dump all stored memories for a session

These are called directly by `agent.py` (outside the agent loop) and are never exposed to the LLM as tools.

## Two Ollama instances

- **GPU** (`OLLAMA_GPU_URL`, port 11436) — extraction model (`qwen2.5:1.5b`): pulls facts from conversation text
- **CPU** (`OLLAMA_CPU_URL`, port 11435) — embedding model (`nomic-embed-text`): 50–150 ms per query

## Prompts

Custom `EXTRACTION_PROMPT` starts with `/no_think` to suppress qwen3 chain-of-thought and get clean JSON output.

Custom `UPDATE_MEMORY_PROMPT` handles deduplication — mem0 merges new facts with existing ones rather than creating duplicates.

## Notes

- The Qdrant collection is created automatically on first use
- Memory is keyed by `user_id`, which equals `session_id` in `agent.py`
- Extraction runs after the reply is sent (background task) — GPU contention with the medium model is avoided because the semaphore is released before `_store_memory()` is scheduled
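The split between the two Ollama instances can be sketched as a mem0-style config dict. This is a sketch, not the project's actual config: the provider/`ollama_base_url` layout follows mem0's documented config schema, the model names and env vars come from the sections above, and the default URLs and collection name are illustrative assumptions.

```python
import os

# Env var names come from this README; localhost defaults are assumptions.
OLLAMA_GPU_URL = os.environ.get("OLLAMA_GPU_URL", "http://localhost:11436")
OLLAMA_CPU_URL = os.environ.get("OLLAMA_CPU_URL", "http://localhost:11435")

MEM0_CONFIG = {
    "llm": {  # fact-extraction model, routed to the GPU instance
        "provider": "ollama",
        "config": {
            "model": "qwen2.5:1.5b",
            "ollama_base_url": OLLAMA_GPU_URL,
        },
    },
    "embedder": {  # embedding model, routed to the CPU instance
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "ollama_base_url": OLLAMA_CPU_URL,
        },
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "openmemory",  # hypothetical name
        },
    },
}
```

Keeping the embedder on the CPU instance means search queries (50–150 ms embeds) never queue behind the GPU extraction model.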
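The score ≥ 0.5 cutoff applied by `search_memory` can be sketched as a plain post-filter over search hits. The `filter_hits` helper and the hit-dict shape (`memory`/`score` keys, as mem0 search results typically return) are illustrative, not the project's actual code:

```python
MIN_SCORE = 0.5  # threshold from this README


def filter_hits(hits: list[dict], min_score: float = MIN_SCORE) -> list[dict]:
    """Keep only hits whose similarity score meets the threshold."""
    return [h for h in hits if h.get("score", 0.0) >= min_score]


hits = [
    {"memory": "user prefers dark mode", "score": 0.82},
    {"memory": "mentioned the weather once", "score": 0.31},
]
print(filter_hits(hits))  # only the 0.82 hit survives
```

Dropping low-score hits keeps weakly related memories out of the context `agent.py` builds from search results.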