Claude Code Plugins

Community-maintained marketplace

context-engineering-framework

@doctorduke/claude-config

Optimize context windows, manage token budgets, compress context, and create effective handoff documents for long-running multi-agent workflows

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: context-engineering-framework
description: Optimize context windows, manage token budgets, compress context, and create effective handoff documents for long-running multi-agent workflows
allowed-tools: Read, Write, Edit, Grep, Glob, Bash, WebFetch

Context Engineering Framework

Purpose

Context engineering is critical for managing the finite resource of LLM context windows. This skill provides systematic approaches to:

  • Measure token usage and identify optimization opportunities
  • Compress context without losing critical information
  • Optimize token budget allocation across workflow stages
  • Hand off work between agents with complete state preservation

Token costs scale linearly with context size: a 100k-token context costs roughly 10x more to process than a 10k one. Poor context management leads to:

  • Failed completions when context overflows
  • Degraded quality from truncated information
  • Increased costs from redundant context
  • Lost work when sessions can't resume properly

Quick Start: 4-Step Process

Step 1: Measure Current Context

# Count tokens in your context (requires: pip install tiktoken)
from tiktoken import encoding_for_model

enc = encoding_for_model("gpt-4")
context_text = open("your_context.txt").read()  # load the context to measure
tokens = len(enc.encode(context_text))
print(f"Current context: {tokens:,} tokens")

Step 2: Compress Context

# Apply compression techniques
compressed = apply_compression(context_text, {
    'remove_whitespace': True,
    'deduplicate': True,
    'summarize_verbose': True,
    'extract_references': True
})
print(f"Compressed to: {len(enc.encode(compressed)):,} tokens")

Step 3: Optimize Budget Allocation

# Allocate tokens across workflow stages
budget = TokenBudget(total=100_000)
budget.allocate('system_prompt', 2_000)
budget.allocate('working_memory', 30_000)
budget.allocate('reference_docs', 40_000)
budget.allocate('conversation', 28_000)

Step 4: Create Handoff Document

# Generate handoff for next agent/session
handoff = create_handoff({
    'completed': ['task1', 'task2'],
    'current_state': working_memory,
    'next_steps': ['task3', 'task4'],
    'constraints': ['must_preserve_X', 'avoid_Y'],
    'context_summary': compressed_context
})

Core Patterns Overview

Pattern 1: Token Budget Management

Track token usage across all context components. Set alerts before limits. Enforce hard boundaries to prevent overflow.
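A minimal sketch of what such a tracker might look like, matching the `TokenBudget` usage in the Quick Start (illustrative, not a shipped API):

# Illustrative token budget tracker with alerts and hard limits
class TokenBudget:
    def __init__(self, total: int, alert_threshold: float = 0.9):
        self.total = total
        self.alert_threshold = alert_threshold
        self.allocations = {}

    @property
    def used(self) -> int:
        return sum(self.allocations.values())

    def allocate(self, component: str, tokens: int) -> None:
        # Hard boundary: refuse allocations that would overflow the budget
        if self.used + tokens > self.total:
            raise ValueError(f"'{component}' ({tokens:,} tokens) would exceed "
                             f"budget: {self.used:,}/{self.total:,} allocated")
        self.allocations[component] = tokens
        # Alert before hitting the limit
        if self.used >= self.total * self.alert_threshold:
            print(f"WARNING: {self.used / self.total:.0%} of budget allocated")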

Pattern 2: Context Compression (Lossless & Lossy)

Remove redundancy without information loss. Apply semantic compression when acceptable. Preserve critical details while reducing verbosity.
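As a sketch of the lossless end of this spectrum, a paragraph-level deduplicator might look like the following (a hypothetical helper; the skill's actual compression pipeline is described in PATTERNS.md):

# Lossless compression sketch: drop exact duplicate paragraphs,
# preserving first-occurrence order
def deduplicate_paragraphs(text: str) -> str:
    seen = set()
    unique = []
    for para in text.split("\n\n"):
        key = " ".join(para.split())  # normalize whitespace for comparison
        if key and key not in seen:
            seen.add(key)
            unique.append(para)
    return "\n\n".join(unique)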

Pattern 3: Semantic Chunking

Split context along natural boundaries. Maintain semantic coherence within chunks. Enable selective retrieval of relevant sections.
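For markdown-like context, heading boundaries are one natural split point (a sketch; a real chunker would likely also cap chunk size in tokens):

import re

# Split markdown text into chunks at heading boundaries so each chunk
# stays semantically coherent
def chunk_by_headings(text: str) -> list:
    chunks = re.split(r"(?m)^(?=#{1,6} )", text)
    return [c.strip() for c in chunks if c.strip()]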

Pattern 4: Progressive Summarization

Create multi-level summaries for different detail needs. Drill down when specifics required. Maintain high-level overview for navigation.
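One illustrative representation is a set of summary tiers, where the caller picks the most detailed tier that still fits the budget (`count_tokens` is assumed to wrap a model-specific tokenizer):

from dataclasses import dataclass

@dataclass
class SummaryTier:
    level: int   # 0 = one-line overview; higher levels carry more detail
    text: str

# Pick the most detailed tier that fits the remaining token budget;
# fall back to the coarsest overview if nothing fits
def pick_tier(tiers, token_budget, count_tokens):
    for tier in sorted(tiers, key=lambda t: t.level, reverse=True):
        if count_tokens(tier.text) <= token_budget:
            return tier
    return min(tiers, key=lambda t: t.level)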

Pattern 5: Handoff Document Generation

Capture complete state for work continuation. Include decision history and rationale. Provide clear next steps and constraints.
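A minimal structured template mirroring the Quick Start fields might look like this (a sketch; per the Best Practices below, the schema should carry an explicit version):

from dataclasses import dataclass, field

# Illustrative handoff schema: complete state capture for the next
# agent or session
@dataclass
class HandoffDocument:
    schema_version: str
    completed: list
    current_state: dict
    next_steps: list
    constraints: list
    context_summary: str
    decisions: list = field(default_factory=list)  # decision history + rationale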

Pattern 6: Context Window Optimization

Balance detail vs breadth based on task needs. Use RAG for large reference sets. Implement sliding windows for long conversations.
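A sliding window over conversation history can be as simple as keeping the newest turns that fit the budget (a sketch; `count_tokens` again assumed to wrap a model-specific tokenizer):

# Keep the most recent messages that fit within max_tokens
def sliding_window(messages, max_tokens, count_tokens):
    kept, used = [], 0
    for msg in reversed(messages):        # walk newest-first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order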

Detailed Documentation

Top 3 Gotchas

1. Information Loss During Compression

Problem: Aggressive compression removes critical details.
Solution: Always validate that compressed output preserves key information. Use tiered compression with fallback to less aggressive methods.

2. Token Counting Mismatches

Problem: Different models use different tokenizers, so counts vary.
Solution: Always use the model-specific tokenizer. Add a 10% safety margin to budgets.
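In practice this means counting with the tokenizer of the model you will actually call, then padding the result (a sketch using tiktoken for OpenAI models; other providers ship their own counters):

import tiktoken

# Count tokens with the target model's own tokenizer, plus a safety margin
def budgeted_tokens(text: str, model: str = "gpt-4", margin: float = 0.10) -> int:
    enc = tiktoken.encoding_for_model(model)
    return int(len(enc.encode(text)) * (1 + margin))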

3. Handoff Document Incompleteness

Problem: Missing context causes the next agent/session to fail or repeat work.
Solution: Use structured handoff templates. Validate that all required fields are present.
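Validation can be a simple required-field check against the template (an illustrative version of the `validate_handoff` helper referenced below; a real schema should also check types and versions):

REQUIRED_FIELDS = ("completed", "current_state", "next_steps",
                   "constraints", "context_summary")

# Return a list of problems; an empty list means the handoff is complete
def validate_handoff(handoff: dict) -> list:
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS
                if f not in handoff]
    problems += [f"empty field: {f}" for f in REQUIRED_FIELDS
                 if f in handoff and not handoff[f]]
    return problems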

Quick Reference Card

Token Budget Allocation Guide (assuming a 100k-token budget)

System Prompt:      2-5% (2-5k tokens)
Working Memory:    20-30% (20-30k tokens)
Reference Docs:    30-50% (30-50k tokens)
Conversation:      20-40% (20-40k tokens)
Safety Buffer:     10% (10k tokens)

Compression Ratios by Technique

Whitespace removal:        5-10% reduction
Deduplication:           10-30% reduction
Reference extraction:     20-40% reduction
Semantic compression:     40-60% reduction
Aggressive summarization: 70-90% reduction

Context Window Sizes (2024)

GPT-4 Turbo:     128k tokens
Claude 3:        200k tokens
Gemini 1.5 Pro:  2M tokens
GPT-3.5:         16k tokens
Most OSS models: 4-32k tokens

Quick Diagnostic Commands

# Check current token usage
print(f"Tokens used: {count_tokens(context)}")

# Find redundancy opportunities
find_duplicates(context)

# Test compression ratio
test_compression(context, method='semantic')

# Validate handoff document
validate_handoff(handoff_doc)

When to Use This Skill

  • Managing projects with 50k+ tokens of context
  • Coordinating multi-agent workflows requiring state transfer
  • Optimizing costs for high-volume LLM usage
  • Debugging context overflow errors
  • Implementing long-running conversational agents
  • Creating checkpoint/resume capabilities
  • Building RAG systems with large document sets

Related Skills

  • reverse-engineering-toolkit - Analyze existing context usage patterns
  • work-forecasting-parallelization - Estimate token requirements for workflows
  • agent-builder-framework - Design agents with efficient context usage
  • workflow-builder-framework - Orchestrate context across workflow stages

Integration with Agents

Primary User: context-manager

The context-manager agent should be refactored to use this skill instead of embedding context engineering logic directly.

# Before: embedded in the agent
class ContextManager:
    def compress_context(self, text):
        ...  # 500+ lines of compression logic

# After: delegate to the skill
class ContextManager:
    def compress_context(self, text):
        return skill('context-engineering-framework').compress(text)

Other Agent Integrations

  • agent-orchestrator - Manage context budgets across agent pods
  • prompt-engineer - Optimize prompts for token efficiency
  • long-running-assistant - Handle session continuity
  • document-processor - Chunk and compress large documents

Best Practices

DO

  • ✅ Measure before optimizing - profile actual token usage
  • ✅ Preserve information architecture during compression
  • ✅ Test compression with real workloads
  • ✅ Version handoff document schemas
  • ✅ Monitor token usage in production

DON'T

  • ❌ Compress without validating information preservation
  • ❌ Assume token counts are consistent across models
  • ❌ Hard-code context window limits
  • ❌ Skip handoff validation before agent switches
  • ❌ Ignore compression impact on downstream tasks

Validation Checklist

Before using compressed context:

  • Key information preserved?
  • Semantic coherence maintained?
  • References still resolvable?
  • Compression ratio acceptable?
  • Handoff document complete?

Performance Benchmarks

Technique               Compression Ratio   Info Retention   Speed
Whitespace removal      5-10%               100%             <1ms
Deduplication           10-30%              100%             <10ms
Reference extraction    20-40%              95%              <50ms
Semantic compression    40-60%              85%              <200ms
Aggressive summary      70-90%              60%              <500ms

Getting Started

  1. Install dependencies:

     pip install tiktoken langchain openai anthropic

  2. Run token analysis:

     from context_engineering import analyze_context
     report = analyze_context("your_context.txt")
     print(report.summary())

  3. Apply compression:

     from context_engineering import compress
     compressed = compress(context, target_ratio=0.5)

  4. Create handoff:

     from context_engineering import create_handoff
     handoff = create_handoff(state, next_agent='reviewer')

Conclusion

Effective context engineering is the difference between LLM applications that scale and those that fail under load. This framework provides battle-tested patterns for managing context windows, reducing token costs, and ensuring work continuity across agent boundaries.

Master these patterns to build robust, cost-effective, and scalable LLM systems.


For implementation details, see PATTERNS.md. For theory and research, see KNOWLEDGE.md.