name	langgraph-checkpoints
description	LangGraph checkpointing and persistence. Use when implementing fault-tolerant workflows, resuming interrupted executions, debugging with state history, or avoiding re-running expensive operations.

LangGraph Checkpointing

Name: langgraph-checkpoints
Author: yonatangross

Persist workflow state for recovery and debugging.

When to Use

Fault-tolerant workflows
Resume after crashes
Debug state at each step
Avoid re-running expensive LLM calls

Checkpointer Options

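from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver      # pip install langgraph-checkpoint-sqlite
from langgraph.checkpoint.postgres import PostgresSaver  # pip install langgraph-checkpoint-postgres

# Development: In-memory (state is lost when the process exits)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Local / single-process: SQLite
# In recent releases from_conn_string() is a context manager,
# so run the graph inside the with block.
with SqliteSaver.from_conn_string("checkpoints.db") as checkpointer:
    app = workflow.compile(checkpointer=checkpointer)

# Production: PostgreSQL
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()  # Creates the checkpoint tables if they are missing
    app = workflow.compile(checkpointer=checkpointer)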

Using Thread IDs

# Start new workflow
config = {"configurable": {"thread_id": "analysis-123"}}
result = app.invoke(initial_state, config=config)

# Resume interrupted workflow
config = {"configurable": {"thread_id": "analysis-123"}}
result = app.invoke(None, config=config)  # Resumes from checkpoint
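
Resuming only works if the same `thread_id` can be rebuilt after a restart. One possible approach, sketched with the standard library only, derives the ID from stable business keys. `THREAD_NAMESPACE`, `thread_config`, and the example domain are hypothetical names, not part of LangGraph.

```python
import uuid

# Hypothetical fixed namespace for this application; any constant UUID works
THREAD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "checkpoints.example.com")


def thread_config(workflow_name: str, entity_id: str) -> dict:
    """Build a config whose thread_id is stable for the same inputs."""
    thread_id = str(uuid.uuid5(THREAD_NAMESPACE, f"{workflow_name}:{entity_id}"))
    return {"configurable": {"thread_id": thread_id}}


first = thread_config("analysis", "123")
again = thread_config("analysis", "123")
other = thread_config("analysis", "124")
print(first == again, first == other)
```

The same inputs always map to the same thread, so a restarted process finds its earlier checkpoints without storing the ID anywhere.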

PostgreSQL Setup

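from contextlib import contextmanager

@contextmanager
def create_checkpointer():
    """Yield a PostgreSQL checkpointer for production."""
    # from_conn_string() is a context manager and takes no save-frequency
    # argument; a checkpoint is written after every step.
    with PostgresSaver.from_conn_string(settings.DATABASE_URL) as checkpointer:
        checkpointer.setup()  # Creates the checkpoint tables if they are missing
        yield checkpointer

# Compile with checkpointing (run the graph inside the with block)
with create_checkpointer() as checkpointer:
    app = workflow.compile(
        checkpointer=checkpointer,
        interrupt_before=["quality_gate"],  # Manual review point
    )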

Inspecting Checkpoints

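# Get all checkpoints for a workflow (an iterator, newest first)
checkpoints = app.get_state_history(config)

for checkpoint in checkpoints:
    print(f"Step: {checkpoint.metadata['step']}")
    print(f"Source: {checkpoint.metadata['source']}")  # How the checkpoint was created, not a node name
    print(f"Next: {checkpoint.next}")                  # Nodes scheduled to run next
    print(f"State: {checkpoint.values}")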

# Get current state
current = app.get_state(config)
print(current.values)

Resuming After Crash

import logging

def run_with_recovery(workflow_id: str, initial_state: dict):
    """Run workflow with automatic recovery."""
    config = {"configurable": {"thread_id": workflow_id}}

    # get_state() returns an empty snapshot when the thread has no checkpoint,
    # so no try/except is needed. snapshot.next lists nodes still waiting to run.
    state = app.get_state(config)

    if state.next:
        # Interrupted run: continue from the last checkpoint
        logging.info(f"Resuming workflow {workflow_id}")
        return app.invoke(None, config=config)

    if state.values:
        # Nothing left to run: return the stored final state
        logging.info(f"Workflow {workflow_id} already finished")
        return state.values

    # Start fresh
    logging.info(f"Starting new workflow {workflow_id}")
    return app.invoke(initial_state, config=config)

Step-by-Step Debugging

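# Execute one node at a time
# stream_mode="updates" yields {node_name: state_update} after each step
for step in app.stream(initial_state, config, stream_mode="updates"):
    for node, update in step.items():
        print(f"After {node}: {update}")
    input("Press Enter to continue...")

# Rollback to previous checkpoint
history = list(app.get_state_history(config))  # Newest first
previous_state = history[1]  # One step back
# Re-run from that checkpoint; its config carries the checkpoint_id
result = app.invoke(None, config=previous_state.config)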

Store vs Checkpointer (2026 Best Practice)

from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore

# Checkpointer = SHORT-TERM memory (thread-scoped)
# - Conversation history within a session
# - Workflow state for resume/recovery
# - Scoped to thread_id

# Store = LONG-TERM memory (cross-thread)
# - User preferences across sessions
# - Learned facts about users
# - Shared across ALL threads for a user

# Both from_conn_string() helpers are context managers
with PostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
    with PostgresStore.from_conn_string(DATABASE_URL) as store:
        checkpointer.setup()  # Create tables if they are missing
        store.setup()

        # Compile with BOTH for full memory support
        app = workflow.compile(
            checkpointer=checkpointer,  # Thread-scoped state
            store=store,                # Cross-thread memory
        )

Using Store for Cross-Thread Memory

from datetime import datetime

from langgraph.store.base import BaseStore

async def agent_with_memory(state: AgentState, *, store: BaseStore):
    """Agent that remembers across conversations."""
    user_id = state["user_id"]

    # Read cross-thread memory (user preferences)
    memories = await store.aget(namespace=("users", user_id), key="preferences")

    # Use memories in agent logic
    if memories and memories.value.get("prefers_concise"):
        state["system_prompt"] += "\nBe concise in responses."

    # Update cross-thread memory (learned facts)
    await store.aput(
        namespace=("users", user_id),
        key="last_topic",
        value={"topic": state["current_topic"], "timestamp": datetime.now().isoformat()}
    )

    return state

# Register node with store access
workflow.add_node("agent", agent_with_memory)

Memory Architecture

┌─────────────────────────────────────────────────────────────┐
│                    User: alice                               │
├─────────────────────────────────────────────────────────────┤
│  Thread 1 (chat-001)    │  Thread 2 (chat-002)              │
│  ┌─────────────────┐    │  ┌─────────────────┐              │
│  │ Checkpointer    │    │  │ Checkpointer    │              │
│  │ - msg history   │    │  │ - msg history   │              │
│  │ - workflow pos  │    │  │ - workflow pos  │              │
│  └─────────────────┘    │  └─────────────────┘              │
├─────────────────────────────────────────────────────────────┤
│                     Store (cross-thread)                     │
│  namespace=("users", "alice")                                │
│  - preferences: {prefers_concise: true}                     │
│  - last_topic: {topic: "langgraph", timestamp: "..."}       │
└─────────────────────────────────────────────────────────────┘

Key Decisions

Decision	Recommendation
Development	MemorySaver (fast, no setup)
Production	PostgresSaver (shared, durable)
Checkpoint granularity	One checkpoint per step; isolate expensive calls in their own nodes
Thread ID	Use deterministic ID (workflow_id)
Short-term memory	Checkpointer (thread-scoped)
Long-term memory	Store (cross-thread, namespaced)

Common Mistakes

No checkpointer in production (lose progress)
Random thread IDs (can't resume)
Not handling missing checkpoints
Saving too frequently (overhead)
Using only checkpointer for user preferences (lost across threads)
Not using namespaces in Store (data collisions)

Related Skills

langgraph-state - State design for checkpointing
langgraph-human-in-loop - Interrupt patterns
database-schema-designer - PostgreSQL setup
