Split CLAUDE.md per official Claude Code recommendations

CLAUDE.md: lean — commands, key conventions, fast tool guide, @ARCHITECTURE.md import
routecheck/CLAUDE.md: purpose, access paths, env vars, gotchas
openmemory/CLAUDE.md: tools, two Ollama instances, prompts, notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Alvis
Date: 2026-03-13 07:15:51 +00:00
parent eba805f787
commit 3ed47b45da
3 changed files with 74 additions and 160 deletions

openmemory/CLAUDE.md (new file, +26 lines)

@@ -0,0 +1,26 @@
# openmemory
FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.
## Tools exposed (MCP)
- `add_memory(text, user_id)` — extract facts from a conversation turn and store in Qdrant
- `search_memory(query, user_id)` — semantic search, returns results with score ≥ 0.5
- `get_all_memories(user_id)` — dump all stored memories for a session
These are called directly by `agent.py` (outside the agent loop), never exposed to the LLM as tools.
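The direct-call pattern above (tools invoked by `agent.py` outside the agent loop, never handed to the LLM) can be sketched with a stdlib-only stand-in. The class and the substring "search" below are hypothetical placeholders for illustration; the real server wraps mem0 and Qdrant, and search is semantic rather than substring-based.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """In-memory stand-in for the three MCP tools (hypothetical sketch)."""
    store: dict = field(default_factory=dict)  # user_id -> list of stored facts

    def add_memory(self, text: str, user_id: str) -> None:
        # Real server: extract facts with the GPU Ollama model, embed, upsert to Qdrant.
        self.store.setdefault(user_id, []).append(text)

    def search_memory(self, query: str, user_id: str, threshold: float = 0.5):
        # Real server: semantic search; only hits with score >= 0.5 are returned.
        hits = [(t, 1.0) for t in self.store.get(user_id, []) if query.lower() in t.lower()]
        return [(t, s) for t, s in hits if s >= threshold]

    def get_all_memories(self, user_id: str):
        return list(self.store.get(user_id, []))
```

Because `agent.py` calls these directly, there is no tool-selection step for the LLM: memory reads and writes are deterministic framework code around the model call.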
## Two Ollama instances
- **GPU** (`OLLAMA_GPU_URL`, port 11436) — extraction model (`qwen2.5:1.5b`): pulls facts from conversation text
- **CPU** (`OLLAMA_CPU_URL`, port 11435) — embedding model (`nomic-embed-text`): 50–150 ms per query
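Wiring the two endpoints into mem0 typically looks like the config sketch below. The key names follow mem0's Ollama provider conventions but should be verified against the installed mem0 version; the port defaults are the ones listed above.

```python
import os

# Sketch: route fact extraction to the GPU Ollama instance and
# embeddings to the CPU instance (config shape is an assumption;
# check mem0's provider docs for the exact keys).
MEM0_CONFIG = {
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "qwen2.5:1.5b",
            "ollama_base_url": os.environ.get("OLLAMA_GPU_URL", "http://localhost:11436"),
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "ollama_base_url": os.environ.get("OLLAMA_CPU_URL", "http://localhost:11435"),
        },
    },
    "vector_store": {"provider": "qdrant"},
}
# Would be handed to mem0 as: Memory.from_config(MEM0_CONFIG)
```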
## Prompts
- Custom `EXTRACTION_PROMPT` starts with `/no_think` to suppress qwen3 chain-of-thought and get clean JSON output
- Custom `UPDATE_MEMORY_PROMPT` handles deduplication — mem0 merges new facts with existing ones rather than creating duplicates
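The `/no_think` flow can be sketched as follows. The prompt wording and the parser name are hypothetical illustrations, not the repo's actual strings; the fixed points are the `/no_think` prefix and the JSON-only reply contract.

```python
import json

# Hypothetical extraction prompt: the /no_think prefix disables the
# model's chain-of-thought so the reply is bare JSON, not reasoning text.
EXTRACTION_PROMPT = (
    "/no_think\n"
    "Extract standalone facts from the conversation below. "
    'Reply with JSON only: {"facts": ["..."]}\n\n'
    "Conversation:\n{conversation}"
)


def parse_extraction(raw: str) -> list:
    """Parse the model's JSON reply; tolerate an empty or malformed answer."""
    try:
        return json.loads(raw).get("facts", [])
    except (json.JSONDecodeError, AttributeError):
        return []
```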
## Notes
- Qdrant collection is created automatically on first use
- Memory is keyed by `user_id` which equals `session_id` in `agent.py`
- Extraction runs after the reply is sent (background task) — GPU contention with medium model is avoided since the semaphore is released before `_store_memory()` is scheduled
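The ordering in the last note — semaphore released before `_store_memory()` is scheduled — can be sketched with asyncio. The handler and event names here are illustrative, not the repo's actual code; the point is that the GPU semaphore's `async with` block exits before the background task is created.

```python
import asyncio

# Sketch of the scheduling order: the GPU semaphore is released when the
# `async with` block exits, *then* extraction is scheduled as a background
# task, so it never contends with the medium model for the GPU.
async def handle_turn(sem: asyncio.Semaphore, events: list) -> str:
    async with sem:                      # GPU held only while producing the reply
        events.append("reply")
        reply = "..."
    # Semaphore already released here; memory extraction runs in the background.
    asyncio.create_task(_store_memory(events))
    return reply


async def _store_memory(events: list) -> None:
    events.append("extract")


async def demo() -> list:
    sem = asyncio.Semaphore(1)
    events: list = []
    await handle_turn(sem, events)
    await asyncio.sleep(0)  # yield once so the background task runs
    return events
```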