---
name: context-engineering-framework
description: Optimize context windows, manage token budgets, compress context, and create effective handoff documents for long-running multi-agent workflows
allowed-tools: Read, Write, Edit, Grep, Glob, Bash, WebFetch
---
# Context Engineering Framework

## Purpose
Context engineering is critical for managing the finite resource of LLM context windows. This skill provides systematic approaches to:
- Measure token usage and identify optimization opportunities
- Compress context without losing critical information
- Optimize token budget allocation across workflow stages
- Handoff work between agents with complete state preservation
Token costs scale linearly with context size: a 100k-token context costs roughly 10x as much as a 10k-token one. Poor context management leads to:
- Failed completions when context overflows
- Degraded quality from truncated information
- Increased costs from redundant context
- Lost work when sessions can't resume properly
## Quick Start: 4-Step Process

### Step 1: Measure Current Context
```python
# Count tokens in your context (context_text is your assembled context string)
from tiktoken import encoding_for_model

enc = encoding_for_model("gpt-4")
tokens = len(enc.encode(context_text))
print(f"Current context: {tokens:,} tokens")
```
### Step 2: Compress Context
```python
# Apply compression techniques (apply_compression is this framework's helper)
compressed = apply_compression(context_text, {
    'remove_whitespace': True,
    'deduplicate': True,
    'summarize_verbose': True,
    'extract_references': True,
})
print(f"Compressed to: {len(enc.encode(compressed)):,} tokens")
```
### Step 3: Optimize Budget Allocation
```python
# Allocate tokens across workflow stages
budget = TokenBudget(total=100_000)
budget.allocate('system_prompt', 2_000)
budget.allocate('working_memory', 30_000)
budget.allocate('reference_docs', 40_000)
budget.allocate('conversation', 28_000)
```
### Step 4: Create Handoff Document
```python
# Generate handoff for next agent/session
handoff = create_handoff({
    'completed': ['task1', 'task2'],
    'current_state': working_memory,
    'next_steps': ['task3', 'task4'],
    'constraints': ['must_preserve_X', 'avoid_Y'],
    'context_summary': compressed_context,
})
```
## Core Patterns Overview

### Pattern 1: Token Budget Management
Track token usage across all context components. Set alerts before limits. Enforce hard boundaries to prevent overflow.
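A minimal sketch of such a tracker, matching the `TokenBudget` usage in Step 3; the 90% alert threshold and the exception type are illustrative assumptions, not a published API:

```python
class TokenBudget:
    """Track per-component token allocations against a hard total."""

    def __init__(self, total: int, alert_ratio: float = 0.9):
        self.total = total
        self.alert_ratio = alert_ratio      # warn when usage crosses this fraction
        self.allocations: dict[str, int] = {}

    def allocate(self, component: str, tokens: int) -> None:
        proposed = self.used() - self.allocations.get(component, 0) + tokens
        if proposed > self.total:           # hard boundary: refuse overflow
            raise ValueError(f"'{component}' would put usage at {proposed}/{self.total}")
        self.allocations[component] = tokens
        if proposed > self.alert_ratio * self.total:
            print(f"WARNING: budget at {proposed / self.total:.0%}")

    def used(self) -> int:
        return sum(self.allocations.values())

    def remaining(self) -> int:
        return self.total - self.used()
```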
### Pattern 2: Context Compression (Lossless & Lossy)
Remove redundancy without information loss. Apply semantic compression when acceptable. Preserve critical details while reducing verbosity.
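As one lossless example, exact-duplicate paragraphs can be dropped by hashing; this is a sketch, not the internals of `apply_compression`:

```python
import hashlib

def deduplicate_paragraphs(text: str) -> str:
    """Drop repeated paragraphs, preserving first-occurrence order (lossless)."""
    seen: set[str] = set()
    kept: list[str] = []
    for para in text.split("\n\n"):
        digest = hashlib.sha256(para.strip().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(para)
    return "\n\n".join(kept)
```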
### Pattern 3: Semantic Chunking
Split context along natural boundaries. Maintain semantic coherence within chunks. Enable selective retrieval of relevant sections.
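For markdown-like context, heading lines are natural boundaries. A sketch under that assumption:

```python
import re

def chunk_by_headings(markdown: str) -> list[str]:
    """Split markdown at heading lines so each chunk stays semantically coherent."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```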
### Pattern 4: Progressive Summarization
Create multi-level summaries for different detail needs. Drill down when specifics required. Maintain high-level overview for navigation.
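One simple structure: precompute summaries at several token targets, then serve the most detailed one that fits the available budget. Here `summarize` and `count_tokens` are stand-ins for your LLM call and tokenizer:

```python
def build_summary_levels(text, summarize, targets=(2000, 500, 100)):
    """Most detailed first: the original text, then progressively shorter summaries."""
    return [text] + [summarize(text, target) for target in targets]

def pick_level(levels, budget, count_tokens):
    """Return the most detailed version that fits the token budget."""
    for candidate in levels:
        if count_tokens(candidate) <= budget:
            return candidate
    return levels[-1]  # fall back to the shortest summary
```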
### Pattern 5: Handoff Document Generation
Capture complete state for work continuation. Include decision history and rationale. Provide clear next steps and constraints.
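A structured template makes completeness checkable. This sketch mirrors the fields used in Step 4; the dataclass and its `validate` method are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field, fields

@dataclass
class Handoff:
    completed: list[str]        # finished tasks
    current_state: dict         # working-memory snapshot
    next_steps: list[str]       # what the next agent should do
    constraints: list[str]      # invariants to preserve
    context_summary: str        # compressed context
    decision_log: list[str] = field(default_factory=list)  # decisions + rationale

    def validate(self) -> None:
        """Fail fast before the agent switch if a required field is empty."""
        for f in fields(self):
            if f.name != "decision_log" and not getattr(self, f.name):
                raise ValueError(f"handoff field '{f.name}' is empty")
```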
### Pattern 6: Context Window Optimization
Balance detail vs breadth based on task needs. Use RAG for large reference sets. Implement sliding windows for long conversations.
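A sliding window over conversation turns is the simplest variant: always keep the system prompt, then as many recent turns as fit. A sketch, with token counting left to the caller:

```python
def sliding_window(system_prompt, turns, budget, count_tokens):
    """Keep the system prompt plus the most recent turns that fit the budget."""
    kept = []
    used = count_tokens(system_prompt)
    for turn in reversed(turns):            # walk newest-first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system_prompt] + list(reversed(kept))  # restore chronological order
```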
## Detailed Documentation
- PATTERNS.md - Full implementation details for all 6 patterns
- KNOWLEDGE.md - Theory, research, and advanced techniques
- EXAMPLES.md - Working code for common scenarios
- GOTCHAS.md - Common pitfalls and debugging strategies
- REFERENCE.md - API documentation and performance data
## Top 3 Gotchas

### 1. Information Loss During Compression
- Problem: Aggressive compression removes critical details.
- Solution: Always validate that compressed output preserves key information. Use tiered compression with fallback to less aggressive methods.
### 2. Token Counting Mismatches
- Problem: Different models use different tokenizers, so counts vary.
- Solution: Always use the model-specific tokenizer. Add a 10% safety margin to budgets.
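To see why the margin matters, compare two real tiktoken encodings on the same text (the `gpt-4o` mapping requires a recent tiktoken release):

```python
import tiktoken

text = "Context engineering is critical for managing LLM context windows."

# Same text, two tokenizers, different counts
cl100k = tiktoken.encoding_for_model("gpt-4")    # cl100k_base
o200k = tiktoken.encoding_for_model("gpt-4o")    # o200k_base
print(len(cl100k.encode(text)), len(o200k.encode(text)))

# Budget against the target model's tokenizer, plus the 10% safety margin
budget = int(len(cl100k.encode(text)) * 1.10)
```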
### 3. Handoff Document Incompleteness
- Problem: Missing context causes the next agent/session to fail or repeat work.
- Solution: Use structured handoff templates. Validate that all required fields are present.
## Quick Reference Card

### Token Budget Allocation Guide
For a 100k-token window:

```
System Prompt:  2-5%   (2-5k tokens)
Working Memory: 20-30% (20-30k tokens)
Reference Docs: 30-50% (30-50k tokens)
Conversation:   20-40% (20-40k tokens)
Safety Buffer:  10%    (10k tokens)
```
### Compression Ratios by Technique
```
Whitespace removal:       5-10% reduction
Deduplication:            10-30% reduction
Reference extraction:     20-40% reduction
Semantic compression:     40-60% reduction
Aggressive summarization: 70-90% reduction
```
### Context Window Sizes (2024)
```
GPT-4 Turbo:     128k tokens
Claude 3:        200k tokens
Gemini 1.5 Pro:  up to 2M tokens
GPT-3.5 Turbo:   16k tokens
Most OSS models: 4-32k tokens
```
### Quick Diagnostic Commands
```python
# Check current token usage
print(f"Tokens used: {count_tokens(context)}")

# Find redundancy opportunities
find_duplicates(context)

# Test compression ratio
test_compression(context, method='semantic')

# Validate handoff document
validate_handoff(handoff_doc)
```
## When to Use This Skill
- Managing projects with 50k+ tokens of context
- Coordinating multi-agent workflows requiring state transfer
- Optimizing costs for high-volume LLM usage
- Debugging context overflow errors
- Implementing long-running conversational agents
- Creating checkpoint/resume capabilities
- Building RAG systems with large document sets
## Related Skills
- reverse-engineering-toolkit - Analyze existing context usage patterns
- work-forecasting-parallelization - Estimate token requirements for workflows
- agent-builder-framework - Design agents with efficient context usage
- workflow-builder-framework - Orchestrate context across workflow stages
## Integration with Agents

### Primary User: context-manager
The context-manager agent should be refactored to use this skill instead of embedding context engineering logic directly.
```python
# Before: compression logic embedded in the agent
class ContextManager:
    def compress_context(self, text):
        # 500+ lines of compression logic...
        ...

# After: delegate to the skill
class ContextManager:
    def compress_context(self, text):
        return skill('context-engineering-framework').compress(text)
```
### Other Agent Integrations
- agent-orchestrator - Manage context budgets across agent pods
- prompt-engineer - Optimize prompts for token efficiency
- long-running-assistant - Handle session continuity
- document-processor - Chunk and compress large documents
## Best Practices

### DO
- ✅ Measure before optimizing - profile actual token usage
- ✅ Preserve information architecture during compression
- ✅ Test compression with real workloads
- ✅ Version handoff document schemas
- ✅ Monitor token usage in production
### DON'T
- ❌ Compress without validating information preservation
- ❌ Assume token counts are consistent across models
- ❌ Hard-code context window limits
- ❌ Skip handoff validation before agent switches
- ❌ Ignore compression impact on downstream tasks
## Validation Checklist

Before using compressed context, confirm (a code sketch of these checks follows the list):
- Key information preserved?
- Semantic coherence maintained?
- References still resolvable?
- Compression ratio acceptable?
- Handoff document complete?
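Some of these checks can be automated; a sketch, where `key_facts` is a caller-supplied list of strings that must survive compression (all names here are illustrative):

```python
def validate_compression(original: str, compressed: str,
                         key_facts: list[str], max_ratio: float = 0.6) -> list[str]:
    """Return a list of problems; an empty list means the compressed context passed."""
    problems = []
    for fact in key_facts:
        if fact not in compressed:
            problems.append(f"missing key fact: {fact!r}")
    ratio = len(compressed) / max(len(original), 1)
    if ratio > max_ratio:
        problems.append(f"only compressed to {ratio:.0%} of original (target {max_ratio:.0%})")
    return problems
```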
## Performance Benchmarks

| Technique | Size Reduction | Info Retention | Speed |
|---|---|---|---|
| Whitespace removal | 5-10% | 100% | <1ms |
| Deduplication | 10-30% | 100% | <10ms |
| Reference extraction | 20-40% | 95% | <50ms |
| Semantic compression | 40-60% | 85% | <200ms |
| Aggressive summary | 70-90% | 60% | <500ms |
## Getting Started
- Install dependencies:

```bash
pip install tiktoken langchain openai anthropic
```

- Run a token analysis:

```python
from context_engineering import analyze_context

report = analyze_context("your_context.txt")
print(report.summary())
```

- Apply compression:

```python
from context_engineering import compress

compressed = compress(context, target_ratio=0.5)
```

- Create a handoff:

```python
from context_engineering import create_handoff

handoff = create_handoff(state, next_agent='reviewer')
```
## Conclusion
Effective context engineering is the difference between LLM applications that scale and those that fail under load. This framework provides battle-tested patterns for managing context windows, reducing token costs, and ensuring work continuity across agent boundaries.
Master these patterns to build robust, cost-effective, and scalable LLM systems.
For implementation details, see PATTERNS.md. For theory and research, see KNOWLEDGE.md.