# openmemory
FastMCP server wrapping mem0 for persistent per-session memory, backed by Qdrant + nomic-embed-text.
## Tools exposed (MCP)

- `add_memory(text, user_id)` — extract facts from a conversation turn and store them in Qdrant
- `search_memory(query, user_id)` — semantic search; returns results with score ≥ 0.5
- `get_all_memories(user_id)` — dump all stored memories for a session
These are called directly by agent.py (outside the agent loop), never exposed to the LLM as tools.
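The score threshold on `search_memory` can be sketched as a simple post-filter over mem0's scored hits. This is an illustrative sketch, not the repo's code; `SCORE_THRESHOLD` and `filter_hits` are hypothetical names, and only the 0.5 cutoff comes from this doc.

```python
# Hypothetical post-filter for search_memory results: mem0 returns scored
# hits, and only those at or above the 0.5 threshold are kept.
SCORE_THRESHOLD = 0.5

def filter_hits(hits: list[dict]) -> list[dict]:
    """Keep only semantic-search hits with score >= SCORE_THRESHOLD."""
    return [h for h in hits if h.get("score", 0.0) >= SCORE_THRESHOLD]

hits = [
    {"memory": "prefers dark mode", "score": 0.82},
    {"memory": "asked about Qdrant once", "score": 0.31},
]
print(filter_hits(hits))  # only the 0.82 hit survives
```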
## Two Ollama instances

- GPU (`OLLAMA_GPU_URL`, port 11436) — extraction model (qwen2.5:1.5b): pulls facts from conversation text
- CPU (`OLLAMA_CPU_URL`, port 11435) — embedding model (nomic-embed-text): 50–150 ms per query
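Wiring the two instances into mem0 might look like the config below. This is a sketch based on mem0's provider-config format; the exact keys (`ollama_base_url` etc.) are assumptions, and the localhost defaults are illustrative.

```python
import os

# Sketch of a mem0-style config routing extraction to the GPU Ollama
# instance and embeddings to the CPU one. Defaults are illustrative.
GPU_URL = os.environ.get("OLLAMA_GPU_URL", "http://localhost:11436")
CPU_URL = os.environ.get("OLLAMA_CPU_URL", "http://localhost:11435")

MEM0_CONFIG = {
    "llm": {  # fact extraction runs on the GPU instance
        "provider": "ollama",
        "config": {"model": "qwen2.5:1.5b", "ollama_base_url": GPU_URL},
    },
    "embedder": {  # embeddings run on the CPU instance
        "provider": "ollama",
        "config": {"model": "nomic-embed-text", "ollama_base_url": CPU_URL},
    },
}
```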
## Prompts

The custom `EXTRACTION_PROMPT` starts with `/no_think` to suppress qwen3 chain-of-thought and get clean JSON output. The custom `UPDATE_MEMORY_PROMPT` handles deduplication: mem0 merges new facts with existing memories rather than creating duplicates.
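The shape of those prompts might look as follows. The wording is hypothetical; only the `/no_think` prefix, the JSON-output goal, and the dedup intent come from this doc.

```python
# Illustrative prompt shapes only; the actual text lives in the repo.
EXTRACTION_PROMPT = (
    "/no_think\n"  # suppresses qwen3 chain-of-thought so output is clean JSON
    "Extract discrete facts from the conversation below. "
    "Return a JSON list of short strings and nothing else.\n"
)

UPDATE_MEMORY_PROMPT = (
    "Given the existing memories and the new facts, merge duplicates: "
    "update an existing memory instead of adding a near-identical new one.\n"
)
```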
## Notes
- Qdrant collection is created automatically on first use
- Memory is keyed by `user_id`, which equals `session_id` in `agent.py`
- Extraction runs after the reply is sent (background task); GPU contention with the medium model is avoided because the semaphore is released before `_store_memory()` is scheduled
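That release-then-schedule ordering can be sketched with asyncio primitives. This is a minimal sketch under assumptions: `gpu_semaphore`, `handle_turn`, and the exact lock discipline are illustrative, not the repo's code; only `_store_memory()` and the ordering come from this doc.

```python
import asyncio

events: list[str] = []
gpu_semaphore = asyncio.Semaphore(1)  # hypothetical guard on GPU inference

async def _store_memory(text: str) -> None:
    # Extraction takes the GPU lock itself, so it can never overlap the reply.
    async with gpu_semaphore:
        events.append(f"extracted: {text}")

async def handle_turn(text: str) -> str:
    async with gpu_semaphore:  # reply generation holds the GPU lock
        reply = f"reply to: {text}"
    # Lock is already released here; schedule extraction in the background.
    asyncio.create_task(_store_memory(text))
    return reply

async def main() -> None:
    print(await handle_turn("hello"))
    await asyncio.sleep(0.05)  # let the background task finish
    print(events)

asyncio.run(main())
```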