Claude Code Plugins

Community-maintained marketplace


chunking-strategies

@jpoutrin/product-forge

Document chunking strategies for RAG systems. Use when implementing document processing pipelines to determine optimal chunking approaches based on document type and retrieval requirements.

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reviewing its instructions before using it.

SKILL.md

name: chunking-strategies
description: Document chunking strategies for RAG systems. Use when implementing document processing pipelines to determine optimal chunking approaches based on document type and retrieval requirements.

Chunking Strategies Skill

This skill provides chunking strategies for RAG document processing.

Chunking Methods

1. Fixed-Size Chunking

def fixed_size_chunk(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into fixed-size character windows with overlap."""
    if overlap >= chunk_size:
        # Otherwise start never advances and the loop runs forever.
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap  # step back so adjacent chunks share context
    return chunks
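A quick trace makes the overlap arithmetic concrete (the function is repeated here so the snippet runs on its own). Note that the final chunk can be mostly overlap residue from the fixed stride:

```python
def fixed_size_chunk(text: str, chunk_size: int = 500, overlap: int = 50):
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks

# 26 letters, chunk_size=10, overlap=2: each window starts 8 characters
# after the previous one, so consecutive chunks share 2 characters.
text = "abcdefghijklmnopqrstuvwxyz"
print(fixed_size_chunk(text, chunk_size=10, overlap=2))
# ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz', 'yz']
```

The trailing `'yz'` chunk is an artifact worth filtering out in practice, since it contains no new text beyond the overlap.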

2. Semantic Chunking

Split on natural boundaries (sentences, paragraphs).

def count_tokens(text: str) -> int:
    # Rough approximation: one token per whitespace-separated word.
    # Swap in a real tokenizer for production use.
    return len(text.split())

def semantic_chunk(text: str, max_tokens: int = 500):
    """Group paragraphs into chunks without exceeding max_tokens."""
    paragraphs = text.split("\n\n")
    chunks = []
    current_chunk = []
    current_tokens = 0

    for para in paragraphs:
        para_tokens = count_tokens(para)
        # Only flush a non-empty chunk; otherwise an oversized first
        # paragraph would append an empty string.
        if current_chunk and current_tokens + para_tokens > max_tokens:
            chunks.append("\n\n".join(current_chunk))
            current_chunk = [para]
            current_tokens = para_tokens
        else:
            current_chunk.append(para)
            current_tokens += para_tokens

    if current_chunk:
        chunks.append("\n\n".join(current_chunk))
    return chunks
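A small worked example shows how paragraphs pack into chunks until the budget is hit (the function is re-declared, with a crude word-count tokenizer as a stand-in, so the snippet runs standalone):

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: one token per word.
    return len(text.split())

def semantic_chunk(text: str, max_tokens: int = 500):
    paragraphs = text.split("\n\n")
    chunks, current_chunk, current_tokens = [], [], 0
    for para in paragraphs:
        para_tokens = count_tokens(para)
        if current_chunk and current_tokens + para_tokens > max_tokens:
            chunks.append("\n\n".join(current_chunk))
            current_chunk, current_tokens = [para], para_tokens
        else:
            current_chunk.append(para)
            current_tokens += para_tokens
    if current_chunk:
        chunks.append("\n\n".join(current_chunk))
    return chunks

# Paragraphs of 3, 2, and 4 "tokens" with a budget of 5:
doc = "one two three\n\nfour five\n\nsix seven eight nine"
print(semantic_chunk(doc, max_tokens=5))
# ['one two three\n\nfour five', 'six seven eight nine']
```

The first two paragraphs fit exactly within the budget and stay together; the third would overflow it, so it starts a new chunk.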

3. Recursive Chunking

Hierarchical splitting on multiple separators.

SEPARATORS = ["\n\n", "\n", ". ", " "]

def recursive_chunk(text: str, max_size: int, separators: list[str] = SEPARATORS):
    if len(text) <= max_size:
        return [text]

    if not separators:
        # No separators left: hard-slice so no text is lost.
        # (str.split("") would raise ValueError.)
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]

    sep = separators[0]
    chunks = []
    for part in text.split(sep):
        if len(part) <= max_size:
            chunks.append(part)
        else:
            # Recurse with the next, finer-grained separator.
            chunks.extend(recursive_chunk(part, max_size, separators[1:]))
    return chunks
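The hierarchy is easiest to see on a string where one paragraph fits as-is and the other must fall through to sentence-level splitting (the splitter is repeated, with a hard-slice fallback for the empty-separator case, so the snippet runs on its own):

```python
SEPARATORS = ["\n\n", "\n", ". ", " "]

def recursive_chunk(text: str, max_size: int, separators: list[str] = SEPARATORS):
    if len(text) <= max_size:
        return [text]
    if not separators:
        # Hard-slice fallback so no text is dropped.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    sep = separators[0]
    chunks = []
    for part in text.split(sep):
        if len(part) <= max_size:
            chunks.append(part)
        else:
            chunks.extend(recursive_chunk(part, max_size, separators[1:]))
    return chunks

text = "First paragraph here.\n\nSecond part. Third part here."
print(recursive_chunk(text, max_size=25))
# ['First paragraph here.', 'Second part', 'Third part here.']
```

The first paragraph fits under `max_size` and is kept whole; the second is too long, so it cascades down to the `". "` separator. Note that `str.split` consumes the separator, which is why the period after "Second part" disappears; a production implementation may want to re-attach separators.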

Chunking by Document Type

| Document Type  | Recommended Strategy | Chunk Size       |
| -------------- | -------------------- | ---------------- |
| Technical docs | Semantic (headers)   | 500-1000 tokens  |
| Legal documents| Semantic (sections)  | 1000-2000 tokens |
| Code           | Function/class based | 200-500 tokens   |
| Conversations  | Message boundaries   | 100-300 tokens   |
| General text   | Recursive            | 300-500 tokens   |
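The table above can be encoded as a small lookup so a pipeline picks a strategy at runtime. The dictionary keys and strategy names here are illustrative assumptions, not a fixed API:

```python
# Illustrative mapping from document type to (strategy name, token range).
CHUNKING_CONFIG = {
    "technical": ("semantic", (500, 1000)),
    "legal": ("semantic", (1000, 2000)),
    "code": ("structural", (200, 500)),
    "conversation": ("message", (100, 300)),
    "general": ("recursive", (300, 500)),
}

def pick_strategy(doc_type: str):
    # Unknown types fall back to the general-purpose recursive strategy.
    return CHUNKING_CONFIG.get(doc_type, CHUNKING_CONFIG["general"])

print(pick_strategy("legal"))    # ('semantic', (1000, 2000))
print(pick_strategy("unknown"))  # ('recursive', (300, 500))
```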

Chunk Enrichment

from dataclasses import dataclass

@dataclass
class EnrichedChunk:
    content: str
    metadata: dict
    summary: str  # LLM-generated
    keywords: list[str]
    parent_id: str  # For hierarchical retrieval
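Constructing an enriched chunk is then straightforward. The field values below are hypothetical; in practice `summary` and `keywords` would come from an LLM or keyword extractor:

```python
from dataclasses import dataclass

@dataclass
class EnrichedChunk:
    content: str
    metadata: dict
    summary: str
    keywords: list[str]
    parent_id: str

# Hypothetical example; source path and parent_id are made up.
chunk = EnrichedChunk(
    content="Fixed-size chunking splits text into equal windows.",
    metadata={"source": "guide.md", "position": 0},
    summary="Intro to fixed-size chunking.",
    keywords=["chunking", "fixed-size"],
    parent_id="guide-md-section-1",
)
print(chunk.keywords)  # ['chunking', 'fixed-size']
```

Storing `parent_id` alongside each chunk lets retrieval return a small chunk for matching but hand the LLM its larger parent for context.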

Best Practices

  • Add overlap between chunks (10-20%)
  • Preserve semantic boundaries
  • Include metadata (source, position)
  • Consider hierarchical chunking for long docs
  • Test retrieval quality with different sizes
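The last practice, testing different sizes, can start as a simple parameter sweep. This is a minimal sketch: it only counts chunks per size, and in a real evaluation the count would be replaced by a retrieval metric such as recall@k on a labeled query set:

```python
def fixed_size_chunk(text, chunk_size=500, overlap=50):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

text = "lorem " * 500  # 3000-character toy corpus

# Sweep candidate sizes with ~10% overlap each.
for size in (200, 400, 800):
    n = len(fixed_size_chunk(text, chunk_size=size, overlap=size // 10))
    print(f"chunk_size={size}: {n} chunks")
```

Smaller chunks give sharper matches but more items to index and rank; the right trade-off depends on the corpus and the queries, which is why measuring beats guessing.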