# Building Agent Memory Systems

Production-grade memory layers for AI agents that persist context across sessions.

## Quick Decision
| Need | Tool | Why |
|------|------|-----|
| Simple memory for any agent | Mem0 | Open-source, Python SDK, minimal setup |
| Claude Code agent memory | claude-mem | Automatic hooks, 3-layer retrieval |
| Enterprise/self-editing memory | Letta (MemGPT) | Agent-driven memory management |
| Custom memory | RAG + state management | Full control |
## Mem0: Primary Implementation

### Installation

```bash
pip install mem0ai
export OPENAI_API_KEY="your-key"
```
### Basic Usage

```python
from mem0 import Memory

m = Memory()

# Add memory from conversation
messages = [
    {"role": "user", "content": "Hi, I'm Alex. I love basketball and gaming."},
    {"role": "assistant", "content": "Hey Alex! I'll remember your interests."}
]
m.add(messages, user_id="alex")

# Search memories
results = m.search("What do you know about me?", filters={"user_id": "alex"})
# Returns: {"results": [{"memory": "Name is Alex. Enjoys basketball and gaming.", "score": 0.89}]}
```
### Default Configuration (OSS)

- LLM: OpenAI gpt-4.1-nano-2025-04-14 (fact extraction)
- Embeddings: text-embedding-3-small (1536 dims)
- Vector Store: Qdrant (on-disk at /tmp/qdrant)
- History: SQLite at ~/.mem0/history.db
### Memory Categories

| Category | Scope | Use Case |
|----------|-------|----------|
| `user_id` | Individual users | Preferences, identity, history |
| `agent_id` | Specific agents | Agent-specific context |
| `session_id` | Conversation | Temporary, conversation-scoped |
| `run_id` | Single execution | Within-session partitioning |
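The scoping rules above can be illustrated with a plain in-memory store (a sketch, not mem0's implementation): a memory is retrievable only under the scope IDs it was stored with, so the same store can hold user-, agent-, session-, and run-scoped entries side by side.

```python
# Minimal illustration of scope-based partitioning (not mem0 itself):
# each memory records whichever scope IDs it was stored under, and a
# search matches only memories carrying every requested scope value.
class ScopedStore:
    def __init__(self):
        self._memories = []

    def add(self, text, **scopes):
        # scopes: any of user_id, agent_id, session_id, run_id
        self._memories.append({"memory": text, **scopes})

    def search(self, **scopes):
        return [m for m in self._memories
                if all(m.get(k) == v for k, v in scopes.items())]

store = ScopedStore()
store.add("Prefers dark mode", user_id="alex")
store.add("Mid-task: step 2 of 5", user_id="alex", session_id="s1")
store.add("Tool budget: 3 calls", agent_id="planner")

print(len(store.search(user_id="alex")))                    # 2
print(store.search(agent_id="planner")[0]["memory"])        # Tool budget: 3 calls
```

Searching with both `user_id` and `session_id` narrows to the conversation-scoped entry only.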
### Core Operations

```python
# Add with metadata
m.add(
    messages,
    user_id="alex",
    metadata={"category": "preferences", "project": "task-api"}
)

# Search with filters
results = m.search(
    "task preferences",
    filters={
        "user_id": "alex",
        "category": "preferences"
    }
)

# Search with date filter
results = m.search(
    query="recent interactions",
    filters={
        "AND": [
            {"user_id": "alex"},
            {"created_at": {"gte": "2024-07-01"}}
        ]
    }
)

# Update memory
m.update(memory_id="mem_123abc", new_content="Updated preference")

# Delete memory
m.delete(memory_id="mem_123abc")
```
### Custom Configuration

```python
from mem0 import Memory

config = {
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-3-5-sonnet-20241022",
            "api_key": "your-anthropic-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "path": "./my_memories"  # Custom path
        }
    }
}

m = Memory.from_config(config)
```
## Memory Architecture Patterns

### Five Memory Types

```
┌────────────────────────────────────────────────────┐
│                   AGENT MEMORY                     │
├─────────────┬──────────────┬───────────────────────┤
│ Conversation│   Working    │      Long-term        │
│   Memory    │   Memory     │       Memory          │
│             │              │                       │
│  Last N     │ Current task │  ┌─────────────────┐  │
│  messages   │   context    │  │ Episodic Memory │  │
│             │              │  │    (events)     │  │
│  Volatile   │ Task-scoped  │  ├─────────────────┤  │
│             │              │  │ Semantic Memory │  │
│             │              │  │    (facts)      │  │
│             │              │  └─────────────────┘  │
└─────────────┴──────────────┴───────────────────────┘
```
| Type | Persistence | Scope | Example |
|------|-------------|-------|---------|
| Conversation | Session | Current chat | Last 10 messages |
| Working | Task | Current task | Active goal, intermediate results |
| Episodic | Long-term | Time-stamped events | "Last Tuesday you asked about X" |
| Semantic | Long-term | Facts/entities | "Alex works at Acme" |
| Procedural | Long-term | Learned patterns | How user structures tasks |
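A minimal sketch of how these five types might be held together in one container (an illustrative dataclass, not a library API): conversation and working memory are volatile, while the episodic, semantic, and procedural stores are the ones you would persist across sessions.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AgentMemory:
    """Illustrative container for the five memory types."""
    conversation: list = field(default_factory=list)  # last N messages (volatile)
    working: dict = field(default_factory=dict)       # current task context
    episodic: list = field(default_factory=list)      # time-stamped events
    semantic: dict = field(default_factory=dict)      # entity -> list of facts
    procedural: list = field(default_factory=list)    # learned patterns

    def remember_event(self, event: str):
        self.episodic.append((datetime.now(), event))

    def remember_fact(self, entity: str, fact: str):
        self.semantic.setdefault(entity, []).append(fact)

mem = AgentMemory()
mem.remember_fact("Alex", "works at Acme")
mem.remember_event("asked about project deadlines")
print(mem.semantic["Alex"])  # ['works at Acme']
```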
### Letta/MemGPT Two-Tier Architecture

```
┌─────────────────────────────────────────────────────┐
│              IN-CONTEXT (Core Memory)               │
│  ┌──────────────┐  ┌──────────────────────────┐     │
│  │ Persona Block│  │ Human Block              │     │
│  │ (agent)      │  │ (user info)              │     │
│  └──────────────┘  └──────────────────────────┘     │
│  Always visible to LLM                              │
└─────────────────────────────────────────────────────┘
                       │
                       │ (tool calls)
                       ▼
┌─────────────────────────────────────────────────────┐
│           OUT-OF-CONTEXT (External Memory)          │
│  ┌──────────────────┐  ┌────────────────────────┐   │
│  │ Archival Memory  │  │ Conversation History   │   │
│  │ (vector DB)      │  │ (full message log)     │   │
│  └──────────────────┘  └────────────────────────┘   │
│  Retrieved on-demand via tools                      │
└─────────────────────────────────────────────────────┘
```
**Self-Editing Tools (Letta pattern):**

- `memory_replace(block, old_text, new_text)` - Targeted updates
- `memory_insert(block, text)` - Add new information
- `memory_rethink(block, new_content)` - Complete rewrite
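The string semantics of these three operations can be sketched over plain named blocks (a hypothetical stand-in: Letta's real tools are invoked by the agent itself via tool calls against its core memory):

```python
# Hypothetical sketch of the three self-editing operations over named
# core-memory blocks, to show what each one does to the block text.
blocks = {"human": "Name: Alex. Likes basketball.",
          "persona": "Helpful assistant."}

def memory_replace(block, old_text, new_text):
    # Targeted in-place edit of part of a block
    blocks[block] = blocks[block].replace(old_text, new_text)

def memory_insert(block, text):
    # Append new information to a block
    blocks[block] += " " + text

def memory_rethink(block, new_content):
    # Rewrite the entire block
    blocks[block] = new_content

memory_replace("human", "basketball", "tennis")
memory_insert("human", "Works at Acme.")
print(blocks["human"])  # Name: Alex. Likes tennis. Works at Acme.
```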
## Retrieval Strategies

### Recency-Based

```python
def get_recent_memories(user_id: str, limit: int = 5):
    """Most recent memories first."""
    return m.search(
        query="",  # Empty query = recency sort
        filters={"user_id": user_id},
        top_k=limit
    )
```
### Relevance-Based (Semantic)

```python
def get_relevant_memories(user_id: str, query: str, limit: int = 5):
    """Semantic search over memories."""
    return m.search(
        query=query,
        filters={"user_id": user_id},
        top_k=limit,
        threshold=0.7  # Minimum similarity
    )
```
### Entity-Based

```python
def get_entity_memories(user_id: str, entity: str):
    """Memories mentioning a specific entity."""
    return m.search(
        query=entity,
        filters={
            "user_id": user_id,
            "categories": {"contains": "entities"}
        }
    )
```
### Hybrid Retrieval

```python
def hybrid_retrieve(user_id: str, query: str, limit: int = 10):
    """Combine recency + relevance + entity."""
    # Get semantically relevant memories
    relevant = m.search(query, filters={"user_id": user_id}, top_k=limit)

    # Extract entities from the query
    entities = extract_entities(query)
    entity_memories = []
    for entity in entities:
        entity_memories.extend(
            m.search(entity, filters={"user_id": user_id}, top_k=3)
        )

    # Deduplicate and score
    all_memories = deduplicate(relevant + entity_memories)

    # Weight: 0.5*relevance + 0.3*recency + 0.2*entity_match
    return sorted(all_memories, key=weighted_score, reverse=True)[:limit]
```
## Context Window Management

### Memory Injection Pattern

```python
def build_prompt_with_memory(user_query: str, user_id: str) -> str:
    """Inject relevant memories before the user query."""
    memories = m.search(user_query, filters={"user_id": user_id}, top_k=5)

    memory_context = "\n".join(
        f"- {mem['memory']}" for mem in memories["results"]
    )

    return f"""You know the following about this user:
{memory_context}

User query: {user_query}
"""
```
### Token Budget Management

```python
def fit_memories_to_budget(memories: list, budget_tokens: int = 2000) -> list:
    """Select memories that fit within a token budget."""
    selected = []
    current_tokens = 0
    for mem in memories:
        mem_tokens = estimate_tokens(mem['memory'])
        if current_tokens + mem_tokens <= budget_tokens:
            selected.append(mem)
            current_tokens += mem_tokens
        else:
            break
    return selected
```
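The budget function above assumes an `estimate_tokens` helper. Absent a real tokenizer, a common rough heuristic (an assumption, not an exact count) is about four characters per token:

```python
# Rough token estimate (~4 chars/token heuristic); swap in a real
# tokenizer (e.g. tiktoken) for accurate budgeting.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fit_memories_to_budget(memories, budget_tokens=2000):
    selected, used = [], 0
    for mem in memories:
        cost = estimate_tokens(mem["memory"])
        if used + cost > budget_tokens:
            break
        selected.append(mem)
        used += cost
    return selected

# Three memories of ~100 estimated tokens each; a 250-token budget
# admits the first two.
mems = [{"memory": "x" * 400}, {"memory": "y" * 400}, {"memory": "z" * 400}]
print(len(fit_memories_to_budget(mems, budget_tokens=250)))  # 2
```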
### Hierarchical Summarization

```python
def summarize_old_memories(user_id: str, older_than_days: int = 30):
    """Compress old memories into summaries."""
    old_memories = m.search(
        query="",
        filters={
            "user_id": user_id,
            "created_at": {"lte": f"{days_ago(older_than_days)}"}
        }
    )

    # Group by topic/category
    grouped = group_by_category(old_memories)

    # Generate a summary per group, then drop the detailed originals
    for category, memories in grouped.items():
        summary = llm.summarize(memories)
        m.add(
            [{"role": "system", "content": f"Summary of {category}: {summary}"}],
            user_id=user_id,
            metadata={"type": "summary", "category": category}
        )
        # Delete this group's original detailed memories
        for mem in memories:
            m.delete(mem['id'])
```
## FastAPI Integration Pattern

```python
from fastapi import FastAPI, Depends
from mem0 import Memory

app = FastAPI()
memory = Memory()

@app.post("/tasks")
async def create_task_with_memory(task: TaskCreate, user_id: str):
    # 1. Retrieve relevant memories
    preference_memories = memory.search(
        f"task preferences for {task.category}",
        filters={"user_id": user_id, "category": "preferences"}
    )
    pattern_memories = memory.search(
        f"past tasks like {task.title}",
        filters={"user_id": user_id, "category": "patterns"}
    )

    # 2. Apply memories to task creation
    task = enhance_task_with_preferences(task, preference_memories)
    task = estimate_time_from_patterns(task, pattern_memories)

    # 3. Create the task
    created_task = await create_task(task)

    # 4. Store this interaction as new memory
    memory.add([
        {"role": "user", "content": f"Created task: {task.title} in {task.category}"},
        {"role": "assistant", "content": f"Task created with estimated time: {task.estimated_hours}h"}
    ], user_id=user_id, metadata={"category": "interactions"})

    return created_task
```
## claude-mem: Claude Code Memory

### Architecture

```
┌─────────────────────────────────────────────────────┐
│                     claude-mem                      │
├─────────────────────────────────────────────────────┤
│ 5 Lifecycle Hooks:                                  │
│  • SessionStart  • PostToolUse  • SessionEnd        │
│  • UserPromptSubmit  • Stop                         │
├─────────────────────────────────────────────────────┤
│ Worker Service: HTTP API on port 37777              │
│ Web Viewer: http://localhost:37777                  │
├─────────────────────────────────────────────────────┤
│ Storage:                                            │
│  • SQLite: Sessions, observations, summaries        │
│  • Chroma: Hybrid semantic + keyword search         │
└─────────────────────────────────────────────────────┘
```
### 3-Layer Token-Efficient Retrieval

```
Layer 1: search(query, type, limit)
  → Compact index (~50-100 tokens/result)
  → Returns observation IDs for filtering

Layer 2: timeline(observation_id, context_window)
  → Chronological context around observations
  → Understand sequence of events

Layer 3: get_observations(ids)
  → Full details only for filtered IDs
  → ~500-1,000 tokens/result
```

Result: 10x token savings by filtering before fetching.
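Back-of-envelope arithmetic behind the 10x figure, using illustrative per-result costs drawn from the ranges above (the candidate and keep counts are assumptions, not claude-mem measurements):

```python
# Compare fetching full details for every candidate against scanning
# the compact index first and fetching details only for kept IDs.
INDEX_TOKENS = 50     # compact index entry (low end of ~50-100)
DETAIL_TOKENS = 1000  # full observation (high end of ~500-1,000)

candidates, kept = 100, 5

fetch_all = candidates * DETAIL_TOKENS                     # naive approach
filtered = candidates * INDEX_TOKENS + kept * DETAIL_TOKENS  # 3-layer approach

print(fetch_all, filtered)          # 100000 10000
print(fetch_all // filtered)        # 10
```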
### Privacy Controls

```
<private>
API_KEY=secret123
Sensitive project data here
</private>
```

Everything inside `<private>` tags is excluded from memory.
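A minimal sketch of such a filter (an assumption about the mechanism, not claude-mem's actual code): strip `<private>` regions from text before anything is persisted.

```python
import re

# Drop <private>...</private> regions (including the tags) before
# the remaining text is stored as memory.
PRIVATE = re.compile(r"<private>.*?</private>", re.DOTALL)

def strip_private(text: str) -> str:
    return PRIVATE.sub("", text).strip()

raw = "Deploy notes.\n<private>\nAPI_KEY=secret123\n</private>\nDone."
print(strip_private(raw))
```

The non-greedy `.*?` with `re.DOTALL` keeps each region's removal bounded at the nearest closing tag, so multiple private blocks in one document are all excluded.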
## Memory Prioritization

### Relevance Scoring Formula

```python
import math

def calculate_relevance_score(memory, query, current_time):
    """
    Score = 0.5 * semantic_similarity
          + 0.3 * recency_decay
          + 0.2 * access_frequency
    """
    semantic = cosine_similarity(embed(memory.text), embed(query))

    days_old = (current_time - memory.created_at).days
    recency = math.exp(-days_old / 30)  # Decay over 30 days

    frequency = memory.access_count / max_access_count

    return 0.5 * semantic + 0.3 * recency + 0.2 * frequency
```
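A worked instance of the formula, for a 30-day-old memory with semantic similarity 0.8 and normalized access frequency 0.5 (illustrative values):

```python
import math

semantic, days_old, frequency = 0.8, 30, 0.5
recency = math.exp(-days_old / 30)  # e^-1 ≈ 0.368 after one decay period

score = 0.5 * semantic + 0.3 * recency + 0.2 * frequency
print(round(score, 2))  # 0.61
```

At 30 days the recency term has already dropped to about a third of its weight, so an old memory needs high similarity or frequent access to stay competitive.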
### Contradiction Resolution

```python
def handle_contradiction(old_memory, new_memory):
    """Newer information wins for preferences."""
    if new_memory.created_at > old_memory.created_at:
        # Update the old memory with the new information
        m.update(old_memory.id, new_memory.content)
        return "updated"
    return "kept_old"
```
## Safety & Privacy

### NEVER Store

- Passwords, API keys, tokens
- SSN, credit card numbers
- Medical records without explicit consent
- Information the user explicitly asks to forget

### ALWAYS Implement

- User consent before memory creation
- Clear disclosure of what's stored
- Easy opt-out mechanisms (delete all)
- GDPR/CCPA compliance for deletion requests
- Encryption at rest for memory stores
- Access controls for multi-tenant systems
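Consent-before-creation can be enforced with a thin gate in front of the memory client. A minimal sketch, where `consented_users` and the fake client are illustrative stand-ins for a real user store and memory backend:

```python
# Writes are refused unless the user has opted in; the set of consented
# users would come from your user database in practice.
consented_users = set()

def add_memory_with_consent(user_id, messages, memory_client):
    if user_id not in consented_users:
        raise PermissionError(f"No memory consent recorded for {user_id}")
    memory_client.add(messages, user_id=user_id)

class FakeClient:
    """Stand-in for a real memory client; records calls."""
    def __init__(self):
        self.calls = []
    def add(self, messages, user_id):
        self.calls.append(user_id)

client = FakeClient()
consented_users.add("alex")
add_memory_with_consent("alex", [], client)
print(client.calls)  # ['alex']
```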
### Right to Forget

```python
async def forget_user(user_id: str):
    """GDPR-compliant deletion."""
    # Get all user memories
    all_memories = m.search("", filters={"user_id": user_id}, top_k=10000)

    # Delete each one
    for mem in all_memories['results']:
        m.delete(mem['id'])

    # Audit log
    log.info(f"Deleted all memories for user {user_id}")
```
Anti-Patterns
| Don't |
Do Instead |
| Store everything |
Prioritize by relevance |
| Infinite memory growth |
Regular consolidation cycles |
| No user consent |
Explicit opt-in for memory |
| Store PII freely |
Anonymize or exclude sensitive data |
| Ignore contradictions |
Timestamp-based resolution |
| Skip privacy controls |
Implement <private> patterns |
| Full retrieval every time |
Token-budget constrained retrieval |