adolf/openmemory/CLAUDE.md
Alvis 3ed47b45da Split CLAUDE.md per official Claude Code recommendations
CLAUDE.md: lean — commands, key conventions, fast tool guide, @ARCHITECTURE.md import
routecheck/CLAUDE.md: purpose, access paths, env vars, gotchas
openmemory/CLAUDE.md: tools, two Ollama instances, prompts, notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 07:15:51 +00:00


openmemory

FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.

Tools exposed (MCP)

  • add_memory(text, user_id) — extract facts from a conversation turn and store in Qdrant
  • search_memory(query, user_id) — semantic search, returns results with score ≥ 0.5
  • get_all_memories(user_id) — dump all stored memories for a session

These are called directly by agent.py (outside the agent loop), never exposed to the LLM as tools.

Two Ollama instances

  • GPU (OLLAMA_GPU_URL, port 11436) — extraction model (qwen2.5:1.5b): pulls facts from conversation text
  • CPU (OLLAMA_CPU_URL, port 11435) — embedding model (nomic-embed-text): 50–150 ms per query

Prompts

Custom EXTRACTION_PROMPT starts with /no_think to suppress qwen3 chain-of-thought and get clean JSON output. Custom UPDATE_MEMORY_PROMPT handles deduplication — mem0 merges new facts with existing ones rather than creating duplicates.

Notes

  • Qdrant collection is created automatically on first use
  • Memory is keyed by user_id, which equals session_id in agent.py
  • Extraction runs after the reply is sent (background task) — GPU contention with the medium model is avoided since the GPU semaphore is released before _store_memory() is scheduled