Split CLAUDE.md per official Claude Code recommendations

CLAUDE.md: lean — commands, key conventions, fast tool guide, @ARCHITECTURE.md import
routecheck/CLAUDE.md: purpose, access paths, env vars, gotchas
openmemory/CLAUDE.md: tools, two Ollama instances, prompts, notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Alvis
Date: 2026-03-13 07:15:51 +00:00
parent eba805f787
commit 3ed47b45da
3 changed files with 74 additions and 160 deletions

openmemory/CLAUDE.md (new file, +26 lines)

@@ -0,0 +1,26 @@
# openmemory
FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.
## Tools exposed (MCP)
- `add_memory(text, user_id)` — extract facts from a conversation turn and store in Qdrant
- `search_memory(query, user_id)` — semantic search, returns results with score ≥ 0.5
- `get_all_memories(user_id)` — dump all stored memories for a session
These are called directly by `agent.py` (outside the agent loop), never exposed to the LLM as tools.
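The direct-call pattern above (tools invoked by `agent.py` outside the agent loop, never handed to the LLM) can be sketched with a stdlib-only stand-in. The class and the substring "search" below are hypothetical placeholders for illustration; the real server wraps mem0 and Qdrant, and search is semantic rather than substring-based.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """In-memory stand-in for the three MCP tools (hypothetical sketch)."""
    store: dict = field(default_factory=dict)  # user_id -> list of stored facts

    def add_memory(self, text: str, user_id: str) -> None:
        # Real server: extract facts with the GPU Ollama model, embed, upsert to Qdrant.
        self.store.setdefault(user_id, []).append(text)

    def search_memory(self, query: str, user_id: str, threshold: float = 0.5):
        # Real server: semantic search; only hits with score >= 0.5 are returned.
        hits = [(t, 1.0) for t in self.store.get(user_id, []) if query.lower() in t.lower()]
        return [(t, s) for t, s in hits if s >= threshold]

    def get_all_memories(self, user_id: str):
        return list(self.store.get(user_id, []))
```

Because `agent.py` calls these directly, there is no tool-selection step for the LLM: memory reads and writes are deterministic framework code around the model call.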
## Two Ollama instances
- **GPU** (`OLLAMA_GPU_URL`, port 11436) — extraction model (`qwen2.5:1.5b`): pulls facts from conversation text
- **CPU** (`OLLAMA_CPU_URL`, port 11435) — embedding model (`nomic-embed-text`): 50–150 ms per query
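Wiring the two endpoints into mem0 typically looks like the config sketch below. The key names follow mem0's Ollama provider conventions but should be verified against the installed mem0 version; the port defaults are the ones listed above.

```python
import os

# Sketch: route fact extraction to the GPU Ollama instance and
# embeddings to the CPU instance (config shape is an assumption;
# check mem0's provider docs for the exact keys).
MEM0_CONFIG = {
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "qwen2.5:1.5b",
            "ollama_base_url": os.environ.get("OLLAMA_GPU_URL", "http://localhost:11436"),
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "ollama_base_url": os.environ.get("OLLAMA_CPU_URL", "http://localhost:11435"),
        },
    },
    "vector_store": {"provider": "qdrant"},
}
# Would be handed to mem0 as: Memory.from_config(MEM0_CONFIG)
```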
## Prompts
- Custom `EXTRACTION_PROMPT` starts with `/no_think` to suppress qwen3 chain-of-thought and get clean JSON output
- Custom `UPDATE_MEMORY_PROMPT` handles deduplication — mem0 merges new facts with existing ones rather than creating duplicates
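The `/no_think` flow can be sketched as follows. The prompt wording and the parser name are hypothetical illustrations, not the repo's actual strings; the fixed points are the `/no_think` prefix and the JSON-only reply contract.

```python
import json

# Hypothetical extraction prompt: the /no_think prefix disables the
# model's chain-of-thought so the reply is bare JSON, not reasoning text.
EXTRACTION_PROMPT = (
    "/no_think\n"
    "Extract standalone facts from the conversation below. "
    'Reply with JSON only: {"facts": ["..."]}\n\n'
    "Conversation:\n{conversation}"
)


def parse_extraction(raw: str) -> list:
    """Parse the model's JSON reply; tolerate an empty or malformed answer."""
    try:
        return json.loads(raw).get("facts", [])
    except (json.JSONDecodeError, AttributeError):
        return []
```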
## Notes
- Qdrant collection is created automatically on first use
- Memory is keyed by `user_id` which equals `session_id` in `agent.py`
- Extraction runs after the reply is sent (background task) — GPU contention with medium model is avoided since the semaphore is released before `_store_memory()` is scheduled
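The ordering in the last note — semaphore released before `_store_memory()` is scheduled — can be sketched with asyncio. The handler and event names here are illustrative, not the repo's actual code; the point is that the GPU semaphore's `async with` block exits before the background task is created.

```python
import asyncio

# Sketch of the scheduling order: the GPU semaphore is released when the
# `async with` block exits, *then* extraction is scheduled as a background
# task, so it never contends with the medium model for the GPU.
async def handle_turn(sem: asyncio.Semaphore, events: list) -> str:
    async with sem:                      # GPU held only while producing the reply
        events.append("reply")
        reply = "..."
    # Semaphore already released here; memory extraction runs in the background.
    asyncio.create_task(_store_memory(events))
    return reply


async def _store_memory(events: list) -> None:
    events.append("extract")


async def demo() -> list:
    sem = asyncio.Semaphore(1)
    events: list = []
    await handle_turn(sem, events)
    await asyncio.sleep(0)  # yield once so the background task runs
    return events
```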