Claude Code Plugins

Community-maintained marketplace

Feedback

building-with-agent-memory

@panaversity/agentfactory
106
0

Build persistent memory systems for AI agents using Mem0, claude-mem, or custom implementations. Use when adding conversation memory, user preferences, or contextual recall to agents. Covers memory architecture patterns, retrieval strategies, and privacy controls. NOT for RAG systems (use building-rag-systems).

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name building-with-agent-memory
description Build persistent memory systems for AI agents using Mem0, claude-mem, or custom implementations. Use when adding conversation memory, user preferences, or contextual recall to agents. Covers memory architecture patterns, retrieval strategies, and privacy controls. NOT for RAG systems (use building-rag-systems).

Building Agent Memory Systems

Production-grade memory layers for AI agents that persist context across sessions.

Quick Decision

Need Tool Why
Simple memory for any agent Mem0 Open-source, Python SDK, minimal setup
Claude Code agent memory claude-mem Automatic hooks, 3-layer retrieval
Enterprise/self-editing memory Letta (MemGPT) Agent-driven memory management
Custom memory RAG + state management Full control

Mem0: Primary Implementation

Installation

pip install mem0ai
export OPENAI_API_KEY="your-key"

Basic Usage

from mem0 import Memory

m = Memory()

# Add memory from conversation
messages = [
    {"role": "user", "content": "Hi, I'm Alex. I love basketball and gaming."},
    {"role": "assistant", "content": "Hey Alex! I'll remember your interests."}
]
m.add(messages, user_id="alex")

# Search memories
results = m.search("What do you know about me?", filters={"user_id": "alex"})
# Returns: {"results": [{"memory": "Name is Alex. Enjoys basketball and gaming.", "score": 0.89}]}

Default Configuration (OSS)

  • LLM: OpenAI gpt-4.1-nano-2025-04-14 (fact extraction)
  • Embeddings: text-embedding-3-small (1536 dims)
  • Vector Store: Qdrant (on-disk at /tmp/qdrant)
  • History: SQLite at ~/.mem0/history.db

Memory Categories

Category Scope Use Case
user_id Individual users Preferences, identity, history
agent_id Specific agents Agent-specific context
session_id Conversation Temporary, conversation-scoped
run_id Single execution Within-session partitioning

Core Operations

# Add with metadata
m.add(
    messages,
    user_id="alex",
    metadata={"category": "preferences", "project": "task-api"}
)

# Search with filters
results = m.search(
    "task preferences",
    filters={
        "user_id": "alex",
        "category": "preferences"
    }
)

# Search with date filter
results = m.search(
    query="recent interactions",
    filters={
        "AND": [
            {"user_id": "alex"},
            {"created_at": {"gte": "2024-07-01"}}
        ]
    }
)

# Update memory
m.update(memory_id="mem_123abc", new_content="Updated preference")

# Delete memory
m.delete(memory_id="mem_123abc")

Custom Configuration

from mem0 import Memory

config = {
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-3-5-sonnet-20241022",
            "api_key": "your-anthropic-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "path": "./my_memories"  # Custom path
        }
    }
}

m = Memory.from_config(config)

Memory Architecture Patterns

Five Memory Types

┌────────────────────────────────────────────────────┐
│                   AGENT MEMORY                      │
├─────────────┬──────────────┬───────────────────────┤
│ Conversation│   Working    │     Long-term         │
│   Memory    │   Memory     │      Memory           │
│             │              │                       │
│ Last N      │ Current task │ ┌─────────────────┐   │
│ messages    │ context      │ │ Episodic Memory │   │
│             │              │ │ (events)        │   │
│ Volatile    │ Task-scoped  │ ├─────────────────┤   │
│             │              │ │ Semantic Memory │   │
│             │              │ │ (facts)         │   │
│             │              │ └─────────────────┘   │
└─────────────┴──────────────┴───────────────────────┘
Type Persistence Scope Example
Conversation Session Current chat Last 10 messages
Working Task Current task Active goal, intermediate results
Episodic Long-term Time-stamped events "Last Tuesday you asked about X"
Semantic Long-term Facts/entities "Alex works at Acme"
Procedural Long-term Learned patterns How user structures tasks

Letta/MemGPT Two-Tier Architecture

┌─────────────────────────────────────────────────────┐
│              IN-CONTEXT (Core Memory)               │
│  ┌──────────────┐  ┌──────────────────────────┐    │
│  │ Persona Block│  │ Human Block              │    │
│  │ (agent)      │  │ (user info)              │    │
│  └──────────────┘  └──────────────────────────┘    │
│              Always visible to LLM                  │
└─────────────────────────────────────────────────────┘
                         │
                         │ (tool calls)
                         ▼
┌─────────────────────────────────────────────────────┐
│            OUT-OF-CONTEXT (External Memory)         │
│  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Archival Memory  │  │ Conversation History   │  │
│  │ (vector DB)      │  │ (full message log)     │  │
│  └──────────────────┘  └────────────────────────┘  │
│              Retrieved on-demand via tools          │
└─────────────────────────────────────────────────────┘

Self-Editing Tools (Letta pattern):

  • memory_replace(block, old_text, new_text) - Targeted updates
  • memory_insert(block, text) - Add new information
  • memory_rethink(block, new_content) - Complete rewrite

Retrieval Strategies

Recency-Based

def get_recent_memories(user_id: str, limit: int = 5):
    """Most recent memories first."""
    return m.search(
        query="",  # Empty query = recency sort
        filters={"user_id": user_id},
        top_k=limit
    )

Relevance-Based (Semantic)

def get_relevant_memories(user_id: str, query: str, limit: int = 5):
    """Semantic search over memories."""
    return m.search(
        query=query,
        filters={"user_id": user_id},
        top_k=limit,
        threshold=0.7  # Minimum similarity
    )

Entity-Based

def get_entity_memories(user_id: str, entity: str):
    """Memories mentioning specific entity."""
    return m.search(
        query=entity,
        filters={
            "user_id": user_id,
            "categories": {"contains": "entities"}
        }
    )

Hybrid Retrieval

def hybrid_retrieve(user_id: str, query: str, limit: int = 10):
    """Combine recency + relevance + entity."""
    # Get relevant
    relevant = m.search(query, filters={"user_id": user_id}, top_k=limit)

    # Extract entities from query
    entities = extract_entities(query)
    entity_memories = []
    for entity in entities:
        entity_memories.extend(
            m.search(entity, filters={"user_id": user_id}, top_k=3)
        )

    # Deduplicate and score
    all_memories = deduplicate(relevant + entity_memories)

    # Weight: 0.5*relevance + 0.3*recency + 0.2*entity_match
    return sorted(all_memories, key=weighted_score, reverse=True)[:limit]

Context Window Management

Memory Injection Pattern

def build_prompt_with_memory(user_query: str, user_id: str) -> str:
    """Inject relevant memories before user query."""
    memories = m.search(user_query, filters={"user_id": user_id}, top_k=5)

    memory_context = "\n".join([
        f"- {mem['memory']}" for mem in memories['results']
    ])

    return f"""You know the following about this user:
{memory_context}

User query: {user_query}
"""

Token Budget Management

def fit_memories_to_budget(memories: list, budget_tokens: int = 2000) -> list:
    """Select memories that fit within token budget."""
    selected = []
    current_tokens = 0

    for mem in memories:
        mem_tokens = estimate_tokens(mem['memory'])
        if current_tokens + mem_tokens <= budget_tokens:
            selected.append(mem)
            current_tokens += mem_tokens
        else:
            break

    return selected

Hierarchical Summarization

def summarize_old_memories(user_id: str, older_than_days: int = 30):
    """Compress old memories into summaries."""
    old_memories = m.search(
        query="",
        filters={
            "user_id": user_id,
            "created_at": {"lte": f"{days_ago(older_than_days)}"}
        }
    )

    # Group by topic/category
    grouped = group_by_category(old_memories)

    # Generate summary per group
    for category, memories in grouped.items():
        summary = llm.summarize(memories)
        m.add(
            [{"role": "system", "content": f"Summary of {category}: {summary}"}],
            user_id=user_id,
            metadata={"type": "summary", "category": category}
        )

        # Delete original detailed memories
        for mem in memories:
            m.delete(mem['id'])

FastAPI Integration Pattern

from fastapi import FastAPI, Depends
from mem0 import Memory

app = FastAPI()
memory = Memory()

@app.post("/tasks")
async def create_task_with_memory(task: TaskCreate, user_id: str):
    # 1. Retrieve relevant memories
    preference_memories = memory.search(
        f"task preferences for {task.category}",
        filters={"user_id": user_id, "category": "preferences"}
    )

    pattern_memories = memory.search(
        f"past tasks like {task.title}",
        filters={"user_id": user_id, "category": "patterns"}
    )

    # 2. Apply memories to task creation
    task = enhance_task_with_preferences(task, preference_memories)
    task = estimate_time_from_patterns(task, pattern_memories)

    # 3. Create the task
    created_task = await create_task(task)

    # 4. Store this interaction as new memory
    memory.add([
        {"role": "user", "content": f"Created task: {task.title} in {task.category}"},
        {"role": "assistant", "content": f"Task created with estimated time: {task.estimated_hours}h"}
    ], user_id=user_id, metadata={"category": "interactions"})

    return created_task

claude-mem: Claude Code Memory

Architecture

┌─────────────────────────────────────────────────────┐
│                   claude-mem                         │
├─────────────────────────────────────────────────────┤
│  5 Lifecycle Hooks:                                  │
│  • SessionStart    • PostToolUse   • SessionEnd     │
│  • UserPromptSubmit • Stop                          │
├─────────────────────────────────────────────────────┤
│  Worker Service: HTTP API on port 37777              │
│  Web Viewer: http://localhost:37777                  │
├─────────────────────────────────────────────────────┤
│  Storage:                                            │
│  • SQLite: Sessions, observations, summaries        │
│  • Chroma: Hybrid semantic + keyword search         │
└─────────────────────────────────────────────────────┘

3-Layer Token-Efficient Retrieval

Layer 1: search(query, type, limit)
→ Compact index (~50-100 tokens/result)
→ Returns observation IDs for filtering

Layer 2: timeline(observation_id, context_window)
→ Chronological context around observations
→ Understand sequence of events

Layer 3: get_observations(ids)
→ Full details only for filtered IDs
→ ~500-1,000 tokens/result

Result: 10x token savings by filtering before fetching

Privacy Controls

<private>
API_KEY=secret123
Sensitive project data here
</private>

Everything inside <private> tags is excluded from memory.

Memory Prioritization

Relevance Scoring Formula

def calculate_relevance_score(memory, query, current_time):
    """
    Score = 0.5 * semantic_similarity
          + 0.3 * recency_decay
          + 0.2 * access_frequency
    """
    semantic = cosine_similarity(embed(memory.text), embed(query))

    days_old = (current_time - memory.created_at).days
    recency = math.exp(-days_old / 30)  # Decay over 30 days

    frequency = memory.access_count / max_access_count

    return 0.5 * semantic + 0.3 * recency + 0.2 * frequency

Contradiction Resolution

def handle_contradiction(old_memory, new_memory):
    """Newer information wins for preferences."""
    if new_memory.created_at > old_memory.created_at:
        # Update old memory with new information
        m.update(old_memory.id, new_memory.content)
        return "updated"
    return "kept_old"

Safety & Privacy

NEVER Store

  • Passwords, API keys, tokens
  • SSN, credit card numbers
  • Medical records without explicit consent
  • Information user explicitly asks to forget

ALWAYS Implement

  • User consent before memory creation
  • Clear disclosure of what's stored
  • Easy opt-out mechanisms (delete all)
  • GDPR/CCPA compliance for deletion requests
  • Encryption at rest for memory stores
  • Access controls for multi-tenant systems

Right to Forget

async def forget_user(user_id: str):
    """GDPR-compliant deletion."""
    # Get all user memories
    all_memories = m.search("", filters={"user_id": user_id}, top_k=10000)

    # Delete each one
    for mem in all_memories['results']:
        m.delete(mem['id'])

    # Audit log
    log.info(f"Deleted all memories for user {user_id}")

Anti-Patterns

Don't Do Instead
Store everything Prioritize by relevance
Infinite memory growth Regular consolidation cycles
No user consent Explicit opt-in for memory
Store PII freely Anonymize or exclude sensitive data
Ignore contradictions Timestamp-based resolution
Skip privacy controls Implement <private> patterns
Full retrieval every time Token-budget constrained retrieval

References