---
name: context-packing-memory-management
description: Systematic context window optimization and cross-session memory management for long-running multi-agent tasks. Use when working on projects spanning multiple sessions, managing large codebases with 50+ files, conducting extended research, or coordinating context between multiple agents. Includes token allocation strategies, intelligent compaction procedures, persistent memory schemas, and agent handoff protocols.
---
# Context Packing & Memory Management

## Overview
Context window management is the foundational constraint governing multi-agent system performance. With Claude Sonnet 4.5's 200,000-token context window, efficient utilization determines whether agents can maintain project continuity across sessions, coordinate effectively, and execute complex tasks without information loss.
This skill provides quantitative, automation-ready strategies for:
- Optimal token allocation across context types
- Systematic context compaction when approaching capacity
- Cross-session memory persistence with structured schemas
- Hierarchical information organization for rapid navigation
- Multi-agent context coordination and handoff protocols
Target outcome: Maintain 95%+ critical information retention while operating within token constraints across extended sessions.
## Core Principles
- Token Budget Discipline: Treat context window as a scarce, shared resource with explicit allocation rules
- Progressive Disclosure: Load information just-in-time based on relevance scoring, not preemptively
- Compression Without Loss: Preserve architectural decisions and critical state while eliminating redundancy
- Hierarchical Navigation: Organize information in layers enabling quick jumps without full context traversal
- Persistent Memory: Extract and store session-independent knowledge for rehydration in future sessions
- Automation-First Design: Use threshold-based rules and scoring algorithms, not subjective judgment
- Multi-Agent Coordination: Explicit ownership and handoff protocols prevent context fragmentation
## Context Window Optimization

### Token Allocation Strategy
Claude Sonnet 4.5 Context Budget: 200,000 tokens
Allocate tokens according to this distribution:
| Context Type | Token Allocation | Percentage | Purpose |
|---|---|---|---|
| Knowledge Base & System Instructions | 60,000-80,000 | 30-40% | Skills, system prompts, core procedures |
| Active Task Context | 80,000-100,000 | 40-50% | Current files, recent outputs, working state |
| Session Memory | 20,000-30,000 | 10-15% | Architectural decisions, persistent state |
| Buffer/Overhead | 10,000-20,000 | 5-10% | Tool outputs, safety margin |
Rationale: Knowledge base is static and necessary. Active context is dynamic and scales with task complexity. Session memory grows slowly. Buffer prevents hard limits.
Implementation:
```python
MAX_CONTEXT = 200000
KNOWLEDGE_BASE_MAX = 80000   # 40%
ACTIVE_CONTEXT_MAX = 100000  # 50%
SESSION_MEMORY_MAX = 30000   # 15%
BUFFER_MIN = 10000           # 5% minimum safety
```
### Token Estimation Techniques
Using Claude API Tokenization (Recommended for accuracy):
```python
# Anthropic API call for exact token count
import anthropic

def count_tokens(text: str) -> int:
    client = anthropic.Anthropic()
    response = client.messages.count_tokens(
        model="claude-sonnet-4-5-20250929",
        messages=[{"role": "user", "content": text}]
    )
    return response.input_tokens
```
Cost-Benefit: API tokenization costs ~$0.0001 per call. For context management in long sessions (>4 hours), the marginal cost ($0.01-0.05 total) is justified by preventing context overflow errors that waste entire sessions.
Approximation Formulas (Use when API calls are impractical):
- Code files: 0.75 tokens/character (includes syntax, whitespace)
- Documentation: 0.65 tokens/character (prose is more compact)
- JSON/structured data: 0.85 tokens/character (brackets, quotes add overhead)
- Log files: 0.70 tokens/character (mixed content)
Validation: Test approximations against API counts for your specific content mix. Adjust formulas if error exceeds ±10%.
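As a sketch, the approximation formulas above can be wrapped in a single helper. The ratios are the ones listed; the `content_type` labels are illustrative, and the fallback ratio for unknown types is an assumption.

```python
# Tokens-per-character approximations from the list above.
# These are heuristics; validate against API counts for your content mix.
TOKENS_PER_CHAR = {
    "code": 0.75,  # includes syntax, whitespace
    "docs": 0.65,  # prose is more compact
    "json": 0.85,  # brackets, quotes add overhead
    "logs": 0.70,  # mixed content
}

def estimate_tokens(text: str, content_type: str = "docs") -> int:
    """Approximate token count without an API call."""
    ratio = TOKENS_PER_CHAR.get(content_type, 0.70)  # assume mixed content if unknown
    return round(len(text) * ratio)
```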
### File Loading Prioritization
Relevance Scoring Algorithm:
Each file receives a score from 0-100 based on:
```
FILE_SCORE = (RELEVANCE_SCORE × 0.50) +
             (RECENCY_SCORE × 0.30) +
             (DEPENDENCY_SCORE × 0.20)
```
Relevance Score (0-50 points):
- Mentioned in current task description: +25
- Modified in last 5 operations: +15
- Contains unresolved issues/TODOs: +10
- Core architectural file (config, schema): +20
- Utility/helper file: +5
Recency Score (0-30 points):
- Modified in last hour: +30
- Modified in last 4 hours: +20
- Modified in last 24 hours: +10
- Modified in last week: +5
- Older: 0
Dependency Score (0-20 points):
- Direct dependency of active file: +20
- Second-degree dependency: +10
- Imports/references active file: +15
- No relationship: 0
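One way to read the weighting, sketched below: each component is already expressed on its weighted scale (50/30/20 points), so the total is a capped sum on a 0-100 scale. The caps are an assumption needed because the itemized relevance bonuses can sum past 50.

```python
def file_score(relevance: int, recency: int, dependency: int) -> int:
    # Components are pre-scaled to their weights (50/30/20 points),
    # so the total is a simple capped sum on a 0-100 scale.
    # Caps are needed: the relevance bonuses alone can sum to 75.
    return min(relevance, 50) + min(recency, 30) + min(dependency, 20)
```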
Loading Strategy:
- Sort files by score (descending)
- Load files until reaching 70% of ACTIVE_CONTEXT_MAX
- Reserve remaining 30% for:
- Tool outputs (15%)
- Dynamic context expansion (10%)
- Safety buffer (5%)
Just-In-Time vs. Preloading Decision:
| Condition | Strategy | Rationale |
|---|---|---|
| Score ≥ 70 | Preload | High probability of need |
| Score 40-69 | Just-in-time | Moderate probability |
| Score < 40 | On-demand only | Low probability |
| File size > 10,000 tokens | Just-in-time | Large footprint |
| Total context > 80% | Just-in-time all | Capacity constraint |
### Capacity Monitoring
Warning Thresholds:
| Level | Context Used | Action Required |
|---|---|---|
| Green | 0-79% (0-158k tokens) | Normal operation |
| Yellow | 80-89% (160k-178k) | Begin planning compaction |
| Orange | 90-94% (180k-188k) | Initiate compaction immediately |
| Red | 95-100% (190k-200k) | Emergency compaction + shed low-priority |
Monitoring Implementation:
```python
def check_capacity_status(current_tokens: int) -> str:
    usage_pct = (current_tokens / MAX_CONTEXT) * 100
    if usage_pct < 80:
        return "GREEN"
    elif usage_pct < 90:
        return "YELLOW: Plan compaction within 10 operations"
    elif usage_pct < 95:
        return "ORANGE: Compact now before next major operation"
    else:
        return "RED: Emergency compaction required"
```
Capacity Alert Responses:
- Yellow: Generate compaction plan, identify candidates for removal
- Orange: Execute compaction procedure (see below), defer non-critical file loads
- Red: Aggressive compaction, shed all files with score < 30, summarize verbose outputs
## Intelligent Context Compaction

### When to Compact
Automatic Triggers:
- Context usage reaches 80% (Yellow threshold)
- Planning session completion (natural break point)
- Before loading large file set (>20,000 tokens)
- Agent handoff initiation (clean context for recipient)
- Every 50 operations (proactive maintenance)
Manual Triggers:
- User requests context summary
- Performance degradation observed (slow responses)
- Before critical decision-making operations
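The automatic triggers above combine naturally into a single predicate; a sketch, with parameter names assumed for illustration:

```python
def should_compact(usage_pct: float, ops_since_compaction: int,
                   pending_load_tokens: int, handoff_pending: bool) -> bool:
    """Combine the automatic compaction triggers listed above."""
    return (
        usage_pct >= 80                    # Yellow threshold reached
        or ops_since_compaction >= 50      # proactive maintenance interval
        or pending_load_tokens > 20_000    # about to load a large file set
        or handoff_pending                 # clean context for recipient agent
    )
```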
### Preservation Rules
MUST PRESERVE (Critical retention score: 100):
Architectural Decisions:
- Decision description with timestamp
- Rationale and alternatives considered
- Expected impact and validation criteria
- Author agent identifier (if multi-agent)
Example:

```
[2025-11-04T14:23:00Z] DECISION: Adopt microservices architecture
Rationale: Enables independent team scaling, better fault isolation
Alternatives: Monolith (rejected: scaling limits), Serverless (rejected: vendor lock-in)
Impact: 3-month migration timeline, reduced coupling by 60%
Validated: Service isolation tests pass, deployment time reduced 45%
Agent: Architecture-Planner-001
```

Active Bugs and Unresolved Issues:
- Issue ID, description, reproduction steps
- Impact severity (Critical/High/Medium/Low)
- Current investigation status
- Attempted fixes and results
Example:

```
BUG-2847 [CRITICAL]: Auth service timeout under load
Repro: 100+ concurrent requests → 30% timeout rate
Status: Root cause identified (connection pool exhaustion)
Attempted: Increased pool size (no effect), added retry logic (partial improvement)
Next: Implement connection queueing with backpressure
```

Critical Implementation Details:
- Non-obvious algorithm choices with rationale
- Performance-critical optimizations
- Security considerations and threat model assumptions
- Data integrity constraints
Example:

```
CRITICAL: User.email uses case-insensitive unique index
Rationale: Prevent bob@example.com vs Bob@example.com duplicates
Implementation: PostgreSQL LOWER(email) functional index
Query pattern: WHERE LOWER(email) = LOWER($1)
```

Recent File Modifications (Last 5 operations):
- File path, modification timestamp
- Change summary (1-2 sentences)
- Reason for change
- Related files impacted
Example:

```
[2025-11-04T15:47:00Z] Modified: src/auth/jwt_handler.py
Changes: Added refresh token rotation, increased expiry to 7 days
Reason: Support mobile offline mode per FEATURE-892
Impact: Affects src/api/auth_routes.py (refresh endpoint updated)
```

Current Task State:
- Active task description and acceptance criteria
- Completion percentage (with substep breakdown)
- Next 3 planned actions with dependencies
- Blockers and resolution strategies
Example:

```
TASK: Implement user profile API endpoints
Progress: 65% complete
✓ GET /profile (done)
✓ PUT /profile (done)
⧖ DELETE /profile (in progress - cascade logic remaining)
☐ PATCH /profile (not started)
Next actions:
1. Complete delete cascade to related tables (blocks: finalize schema)
2. Implement PATCH with partial update support
3. Add rate limiting to all endpoints
Blockers: DB migration approval needed from DBA team
```
### Discard Rules
CAN SAFELY DISCARD (Retention score: 0-30):
Redundant Tool Outputs:
- Duplicate search results with same information
- Repeated file listings showing unchanged directories
- Multiple passes of same linting output
- Successful operation confirmations without actionable data
Deduplication algorithm:
```python
import re

def normalize(text):
    # Remove timestamps and other non-semantic variations
    return re.sub(r'\d{4}-\d{2}-\d{2}T[\d:]+Z', '', text)

def is_duplicate_output(new_output, existing_outputs):
    # Exact-match check via hash, then semantic similarity
    new_hash = hash(normalize(new_output))
    for existing in existing_outputs:
        if hash(normalize(existing)) == new_hash:
            return True
        if compute_similarity(new_output, existing) > 0.85:  # 85% threshold
            return True
    return False
```

Resolved Issues with Confirmed Fixes:
- Bugs marked "RESOLVED" with passing tests
- Completed tasks with acceptance criteria validated
- Questions answered with no follow-up needed
Retention criteria: Keep resolved issues for 10 operations, then discard if no re-mention
Exploratory Attempts That Didn't Lead Anywhere:
- Dead-end implementation approaches explicitly abandoned
- Failed experiments with documented negative results
- Prototype code replaced by production implementation
Preserve as lessons learned: Extract 1-sentence summary before discarding details
Verbose Debug Logs:
- Stack traces after issue is identified and fixed
- Verbose logging output when summary captures key points
- Intermediate computation steps when only result matters
Preserve: Error message and root cause (discard trace). Preserve: Summary statistics (discard raw logs).
Successful Operation Confirmations:
- "File saved successfully" (retain only file path + timestamp)
- "Tests passed" (retain only pass count, discard individual test output)
- "Build completed" (retain only artifact location, discard build logs)
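A sketch of the "retain only the summary" rule for test output, assuming a pytest-style summary line (the format matched is an assumption, and unknown formats are kept verbatim rather than lost):

```python
import re

def summarize_test_output(raw: str) -> str:
    # Keep only the pass/fail counts from verbose test output
    # (assumes a "N passed[, M failed]" summary line somewhere in the text).
    m = re.search(r"(\d+) passed(?:, (\d+) failed)?", raw)
    if not m:
        return raw  # unknown format: keep as-is rather than lose information
    passed, failed = m.group(1), m.group(2) or "0"
    return f"Tests: {passed} passed, {failed} failed"
```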
### Deduplication Strategies
Text-Based Deduplication:
- Exact match elimination: Hash-based identification of identical content
- Semantic clustering: Group similar outputs, keep most recent representative
- Incremental diff preservation: For similar file versions, store only deltas
Implementation:
```python
from difflib import SequenceMatcher

def semantic_similarity(text1: str, text2: str) -> float:
    """Returns similarity score 0.0-1.0"""
    return SequenceMatcher(None, text1, text2).ratio()

def deduplicate_outputs(outputs: list[str]) -> list[str]:
    """Returns deduplicated list, preserving most recent unique items"""
    unique = []
    seen_hashes = set()
    for output in reversed(outputs):  # Process newest first
        output_hash = hash(output)
        if output_hash in seen_hashes:
            continue
        # Check semantic similarity against existing unique items
        is_duplicate = False
        for unique_item in unique:
            if semantic_similarity(output, unique_item) > 0.85:
                is_duplicate = True
                break
        if not is_duplicate:
            unique.append(output)
            seen_hashes.add(output_hash)
    return list(reversed(unique))  # Return in chronological order
```
Tool Output Consolidation:
Instead of:
```
[Search 1] Found 12 files matching "auth"
- auth_handler.py
- auth_routes.py
- ...
[Search 2] Found 12 files matching "auth"
- auth_handler.py
- auth_routes.py
- ...
```
Consolidate to:
```
[Searches 1-2] Found 12 files matching "auth" (checked 2x, unchanged)
- auth_handler.py
- auth_routes.py
- ...
```
### Compaction Process
Step-by-step procedure:
**Snapshot Current State** (safety first):

```python
def create_pre_compaction_snapshot():
    snapshot = {
        'timestamp': current_time(),
        'total_tokens': estimate_current_context_size(),
        'file_list': list_loaded_files(),
        'decision_count': count_architectural_decisions(),
        'issue_count': count_unresolved_issues()
    }
    save_snapshot(snapshot)
    return snapshot
```

**Score All Context Elements**:
- Apply preservation rules (score: 100)
- Apply discard rules (score: 0-30)
- Score remaining elements (score: 31-99) by:
- Recency (30 points)
- Reference count (20 points)
- Task relevance (30 points)
- Information density (20 points)
**Calculate Token Savings Target**:

```python
current_usage = estimate_current_context_size()
target_usage = MAX_CONTEXT * 0.65  # Target 65% after compaction
tokens_to_remove = current_usage - target_usage
```

**Remove Low-Score Elements** (ascending score order):

```python
removed_tokens = 0
sorted_elements = sort_by_score(context_elements)
for element in sorted_elements:
    if element.score >= 31:  # Never remove preserved content
        break
    if removed_tokens >= tokens_to_remove:
        break
    remove_from_context(element)
    removed_tokens += element.token_count
```

**Deduplicate Remaining Content**:
- Apply deduplication algorithms to tool outputs
- Consolidate similar findings
- Merge redundant sections
**Generate Compaction Summary**:

```python
tokens_before = pre_snapshot['total_tokens']
tokens_after = estimate_current_context_size()
summary = {
    'tokens_before': tokens_before,
    'tokens_after': tokens_after,
    'tokens_saved': tokens_before - tokens_after,
    'elements_removed': count_removed_elements(),
    'preservation_validation': validate_critical_content_present()
}
```

**Validate No Critical Loss**:
- Check all architectural decisions still present
- Verify unresolved issues retained
- Confirm current task state intact
- Validate recent modifications preserved
Validation Checklist:
- All decisions with timestamp in last 7 days preserved
- All CRITICAL and HIGH severity issues preserved
- Last 5 file modifications with full context preserved
- Current task state with next actions preserved
- Agent ownership information preserved (if multi-agent)
- Context reduction achieved (target: 20-35% token savings)
- No preservation-scored (100) items removed
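The machine-checkable parts of the checklist can be validated against the pre-compaction snapshot; a minimal sketch, assuming `before`/`after` are snapshot dicts with the count and token fields shown:

```python
def validate_compaction(before: dict, after: dict) -> list[str]:
    """Check post-compaction invariants; returns a list of failure descriptions.
    Assumes snapshot dicts with decision_count, issue_count, total_tokens keys."""
    failures = []
    if after["decision_count"] < before["decision_count"]:
        failures.append("architectural decisions lost")
    if after["issue_count"] < before["issue_count"]:
        failures.append("unresolved issues lost")
    savings = 1 - after["total_tokens"] / before["total_tokens"]
    if not 0.20 <= savings <= 0.35:
        failures.append(f"savings {savings:.0%} outside 20-35% target")
    return failures
```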
### Compaction Examples
Example 1: Research Session Compaction
Before Compaction (185,000 tokens - 92.5% capacity):
```
[Web Search 1] "machine learning optimization" - 15 results (4,500 tokens)
[Web Fetch 1] Article: "Gradient Descent Variants" (12,000 tokens full text)
[Web Search 2] "machine learning optimization" - 15 results (4,500 tokens - DUPLICATE)
[Web Fetch 2] Article: "Adam Optimizer Explained" (8,000 tokens full text)
[Web Search 3] "neural network architectures" - 20 results (5,500 tokens)
[Web Fetch 3] Article: "CNN Architectures Review" (15,000 tokens full text)
[Analysis 1] "Compare optimization algorithms" (3,000 tokens)
[Analysis 2] "Evaluate CNN architectures" (2,500 tokens)
[Chat History] 45 exchanges of clarifying questions (22,000 tokens)
[System Prompts & Skills] (80,000 tokens)
[Working Notes] Architectural decision log (8,000 tokens)
```
After Compaction (128,000 tokens - 64% capacity, 31% savings):
```
[Web Search - Deduplicated] Combined results on ML optimization (4,500 tokens)
[Key Findings Summary]
- Gradient Descent: Vanilla approach, slow convergence
- Adam: Adaptive learning rate, fastest convergence (85% of cases)
- RMSprop: Good for RNNs, second choice
(Extracted from 20,000 tokens of full articles → 800 tokens summary)
[Architecture Evaluation]
- CNN: Best for image tasks (preserved from 15k token article)
- RNN: Sequential data (not needed for current task - discarded)
- Transformer: NLP focus (not relevant - discarded)
(2,000 tokens preserved from 15k)
[Analyses Consolidated] Merged overlapping sections (2,500 tokens)
[Chat History] Retained last 10 exchanges + key decisions (8,000 tokens)
[System Prompts & Skills] (80,000 tokens - unchanged)
[Working Notes] (8,000 tokens - unchanged, contains decisions)
```
Token savings: 57,000 tokens (31%)
Critical information retained: 100% (decisions, current task, key findings)
Example 2: Code Development Compaction
Before Compaction (178,000 tokens - 89% capacity):
```
[Files Loaded]
- src/main.py (5,000 tokens)
- src/auth.py (8,000 tokens)
- src/database.py (12,000 tokens)
- src/utils.py (3,000 tokens)
- tests/test_auth.py (6,000 tokens)
- tests/test_database.py (8,000 tokens)
- Old version: src/auth_old_backup.py (8,000 tokens - DISCARD)
- Experimental: src/auth_prototype_v2.py (7,000 tokens - DISCARD)
[Test Outputs]
- Test run 1: All passed (2,500 tokens of verbose output)
- Test run 2: All passed (2,500 tokens - DUPLICATE)
- Test run 3: 1 failure in auth (2,500 tokens)
- Test run 4: All passed after fix (2,500 tokens)
[Linting Results]
- Run 1: 23 issues found (3,000 tokens detailed output)
- Run 2: 15 issues remain (2,500 tokens - PARTIAL DUPLICATE)
- Run 3: All clear (500 tokens)
[Git Logs] Last 50 commits (15,000 tokens)
[Documentation] API reference loaded but unused (12,000 tokens)
[System Prompts & Skills] (70,000 tokens)
```
After Compaction (115,000 tokens - 57.5% capacity, 35% savings):
```
[Files Loaded - Current Versions Only]
- src/main.py (5,000 tokens)
- src/auth.py (8,000 tokens)
- src/database.py (12,000 tokens)
- src/utils.py (3,000 tokens)
- tests/test_auth.py (6,000 tokens) [Referenced in recent work]
Removed: Old backups, prototypes (15,000 tokens saved)
Deferred: test_database.py (not modified recently, load if needed)
[Test Summary]
Latest status: All tests passing (4 runs consolidated)
Critical: Test run 3 showed auth timeout bug → Fixed in run 4
(Preserved: Bug description + fix. Discarded: Verbose passing test logs)
(2,500 tokens preserved from 10,000 tokens)
[Code Quality]
Linting: Clean (23 initial issues resolved)
Critical fixes: SQL injection prevention added to database.py
(500 tokens preserved from 6,000 tokens)
[Recent Changes Summary]
Last 5 commits relevant to current task:
- Fixed auth timeout bug (critical, preserved)
- Added rate limiting (relevant, preserved)
- Updated dependencies (not critical, summarized)
(2,000 tokens preserved from 15,000 tokens)
[System Prompts & Skills] (70,000 tokens - unchanged)
```
Token savings: 63,000 tokens (35%)
Critical information retained: 100% (bug fix, architecture, current files)
Example 3: Multi-Agent Coordination Compaction
Before Agent Handoff (172,000 tokens - 86% capacity):
```
[Agent A - Research Phase Context]
- Market analysis (15,000 tokens)
- Competitor research (12,000 tokens)
- Technology evaluation (18,000 tokens)
- 30 web searches with full results (35,000 tokens)
- Working notes and intermediate analyses (20,000 tokens)
[Agent A - Decisions Made]
- Technology stack selected: React + Node.js
- Database: PostgreSQL (rationale: ACID + JSON support)
- Architecture: Microservices (rationale: team scaling)
(2,000 tokens)
[Shared Project Context]
- Requirements document (8,000 tokens)
- Project timeline (1,000 tokens)
- Team assignments (500 tokens)
[System Prompts & Skills] (60,000 tokens)
```
After Compaction for Handoff to Agent B - Implementation (98,000 tokens - 49% capacity):
```
[Handoff Package for Agent B]
- Key Decisions Summary:
  • Tech Stack: React + Node.js (rationale: team expertise, ecosystem maturity)
  • Database: PostgreSQL (rationale: ACID compliance, JSON support for flexibility)
  • Architecture: Microservices (rationale: enables independent team scaling)
  (800 tokens - extracted from 2,000 tokens)
- Critical Constraints:
  • Must support 10k concurrent users (performance requirement)
  • 99.9% uptime SLA (reliability requirement)
  • GDPR compliance mandatory (legal requirement)
  (300 tokens - extracted from research)
- Implementation Priorities:
  1. Auth service (foundation for other services)
  2. User profile service
  3. Core business logic service
  (200 tokens)
- Resources for Agent B:
  • Requirements doc (8,000 tokens - full preservation)
  • Technology decision rationale (800 tokens)
  • Project timeline (1,000 tokens - full preservation)
  (9,800 tokens total)
[Agent A Research - Archived]
Compressed summary stored in persistent memory:
"Evaluated 5 tech stacks, 3 databases, 2 architectures.
Final selections justified by: team capabilities (60% weight),
ecosystem maturity (25% weight), scalability (15% weight).
Detailed analysis available in session archive."
(Stored externally, not loaded unless explicitly needed)
[Shared Project Context] (9,500 tokens - preserved)
[System Prompts & Skills] (60,000 tokens - unchanged)
```
Token savings: 74,000 tokens (43%)
Agent B receives: Actionable decisions, critical constraints, clear priorities
Agent A work preserved: Archived in persistent memory for future reference
## Cross-Session Memory Persistence

### Memory Schema
Abstract storage format (implementation-agnostic):
```python
class PersistentMemory:
    """
    Base schema for persistent memory objects.
    Implementation can be: JSON files, database, MCP memory server, etc.
    """
    # Required fields
    memory_id: str      # Unique identifier (UUID)
    memory_type: str    # "decision" | "issue" | "state" | "learning"
    timestamp: str      # ISO 8601 format
    project_id: str     # Links memories to specific projects
    agent_id: str       # Agent that created this memory

    # Content fields
    title: str          # Brief description (max 100 chars)
    content: str        # Full content (structured based on memory_type)
    tags: list[str]     # Searchable tags

    # Metadata
    importance: int                 # 1-10 scale (10 = critical)
    expires_at: str | None          # Optional expiration timestamp
    parent_memory_id: str | None    # Links related memories

    # State management
    is_archived: bool   # False = active, True = archived
    access_count: int   # Number of times retrieved
    last_accessed: str  # Last retrieval timestamp


class DecisionMemory(PersistentMemory):
    """Architectural and significant decisions"""
    content: {
        'decision': str,                      # What was decided
        'rationale': str,                     # Why this decision
        'alternatives_considered': list[str], # What else was evaluated
        'expected_impact': str,               # Predicted outcomes
        'validation_criteria': str,           # How to verify success
        'context': str                        # Situational factors
    }


class IssueMemory(PersistentMemory):
    """Bugs, blockers, and unresolved problems"""
    content: {
        'description': str,           # Issue description
        'severity': str,              # "critical" | "high" | "medium" | "low"
        'reproduction_steps': str,    # How to reproduce
        'investigation_status': str,  # Current understanding
        'attempted_fixes': list[str], # What's been tried
        'next_actions': str           # Planned resolution steps
    }


class StateMemory(PersistentMemory):
    """Project state and progress tracking"""
    content: {
        'task_description': str,      # Current task
        'completion_percentage': int, # 0-100
        'substeps_completed': list[str],
        'substeps_remaining': list[str],
        'blockers': list[str],
        'next_actions': list[str]
    }


class LearningMemory(PersistentMemory):
    """Lessons learned and insights"""
    content: {
        'lesson': str,        # What was learned
        'context': str,       # Situation where learned
        'applicability': str, # When to apply this lesson
        'evidence': str       # Supporting data/results
    }
```
### Session Summary Generation
Trigger: End of session (explicit user end, or after 2 hours of inactivity)
Algorithm:
```python
def generate_session_summary(session_context) -> dict:
    """
    Extract critical information for cross-session persistence.
    Target: 5,000-10,000 tokens for typical session.
    """
    summary = {
        'session_metadata': {
            'session_id': generate_uuid(),
            'start_time': session_context.start_time,
            'end_time': current_time(),
            'duration_minutes': calculate_duration(),
            'agent_ids': session_context.active_agents,
            'project_id': session_context.project_id
        },

        'decisions_made': extract_decisions(session_context),
        # Returns list of DecisionMemory objects
        # Criteria: Any explicit "DECISION:" markers, architecture changes,
        # technology selections, design pattern adoptions

        'issues_encountered': extract_issues(session_context),
        # Returns list of IssueMemory objects
        # Criteria: Unresolved bugs, blockers, questions pending answers
        # Excludes: Resolved issues unless resolution method is noteworthy

        'current_state': extract_state(session_context),
        # Returns StateMemory object
        # Captures: Task progress, file modifications, next planned actions

        'learnings': extract_learnings(session_context),
        # Returns list of LearningMemory objects
        # Criteria: Failed approaches (save future time), unexpected behaviors,
        # performance insights, useful patterns discovered

        'file_modifications': {
            'created': list_files_created(),
            'modified': list_files_modified_with_summary(),
            'deleted': list_files_deleted()
        },

        'metrics': {
            'tokens_used': session_context.total_tokens,
            'files_accessed': len(session_context.accessed_files),
            'operations_performed': session_context.operation_count,
            'compactions_executed': session_context.compaction_count
        }
    }
    return summary
```
Extraction Criteria:
Decision Extraction:
- Scan for explicit decision markers: "DECISION:", "We will", "Chose X because"
- Identify architecture changes in code structure
- Detect technology/library selections
- Score importance: 8-10 (preserve forever), 5-7 (preserve 30 days), 1-4 (preserve 7 days)
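The decision-marker scan can be sketched as a line filter; the regex patterns below mirror the markers listed above and are illustrative, not exhaustive:

```python
import re

# Patterns mirroring the explicit decision markers listed above (illustrative).
DECISION_MARKERS = re.compile(r"^(DECISION:|We will\b|Chose .+ because)")

def extract_decision_lines(transcript: str) -> list[str]:
    """Pull transcript lines that carry an explicit decision marker."""
    return [line for line in transcript.splitlines()
            if DECISION_MARKERS.match(line)]
```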
Issue Extraction:
- Identify unresolved "BUG-" or "ISSUE-" markers
- Detect error messages not followed by resolution
- Find "TODO:" comments in code with context
- Capture blockers mentioned in task status
State Extraction:
- Parse task description and completion percentage
- List last 5 file modifications with change summaries
- Extract "Next actions:" or "Next steps:" sections
- Identify blockers and dependencies
Learning Extraction:
- Find "Lesson:" or "Note:" markers
- Detect failed approaches with analysis
- Identify performance improvements with measurements
- Capture useful patterns or anti-patterns discovered
### Decision Logging Template
Usage: Apply this template whenever an architectural or significant decision is made.
```markdown
## DECISION: [Brief decision title]

**Decision ID**: DEC-[YYYY-MM-DD]-[Sequential Number]
**Timestamp**: [ISO 8601 timestamp]
**Agent**: [Agent identifier]
**Project**: [Project identifier]
**Importance**: [1-10 score]

### Decision
[Clear statement of what was decided, 2-3 sentences max]

### Rationale
[Why this decision was made, addressing:]
- Primary factors driving the decision
- Key constraints or requirements satisfied
- Expected benefits

### Alternatives Considered
1. **[Alternative 1]**: [Why rejected/not selected]
2. **[Alternative 2]**: [Why rejected/not selected]
3. **[Alternative 3]**: [Why rejected/not selected]

### Expected Impact
[Predicted outcomes, including:]
- Performance implications (quantitative if possible)
- Development timeline effects
- Team/process changes required
- Technical debt or trade-offs accepted

### Validation Criteria
[How to verify this was the right decision:]
- Metric 1: [Measurable criterion]
- Metric 2: [Measurable criterion]
- Timeframe: [When to evaluate]

### Context
[Situational factors relevant to this decision:]
- Project phase, team composition, constraints, deadlines
- Assumptions made
- Related decisions (reference Decision IDs)

### Tags
[Searchable tags]: architecture, performance, security, etc.
```
Example:
```markdown
## DECISION: Adopt PostgreSQL over MongoDB

**Decision ID**: DEC-2025-11-04-003
**Timestamp**: 2025-11-04T14:32:00Z
**Agent**: Database-Architect-002
**Project**: ecommerce-platform-v2
**Importance**: 9

### Decision
Selected PostgreSQL as the primary database for the e-commerce platform, replacing the initially proposed MongoDB solution.

### Rationale
- ACID compliance critical for order processing and payment transactions
- Complex relational queries needed for inventory management across warehouses
- JSONB support provides document-store flexibility where needed
- Team has 5 years PostgreSQL experience vs. 1 year MongoDB experience

### Alternatives Considered
1. **MongoDB**: Rejected due to lack of multi-document ACID transactions (required for order processing) and weaker consistency guarantees
2. **MySQL**: Rejected due to inferior JSON handling and less robust full-text search capabilities
3. **Hybrid (PostgreSQL + MongoDB)**: Rejected due to operational complexity and data synchronization challenges

### Expected Impact
- Development: Estimated 2 weeks faster due to team expertise
- Performance: 95th percentile query latency < 100ms (validated in prototype)
- Scalability: Vertical scaling to 10k transactions/sec before sharding needed
- Cost: $800/month for managed service (vs. $1,200/month for MongoDB Atlas equivalent tier)

### Validation Criteria
- Order processing maintains ACID guarantees under load testing (10k orders/hour)
- Complex inventory queries execute in < 200ms at 95th percentile
- Database operational overhead < 5 hours/week after 3 months
- Timeframe: Validate after 3 months in production

### Context
- Project in initial architecture phase, no existing database commitment
- Team composition: 3 developers with strong PostgreSQL background
- Requirement: Support 10k concurrent users at launch, scale to 50k within 12 months
- Compliance: GDPR and PCI-DSS requirements mandate ACID transactions
- Timeline: 6 months to production launch

### Tags
database, architecture, postgresql, acid-transactions, ecommerce
```
### Memory Rehydration Process
Trigger: Starting a new session in an existing project
Procedure:
```python
def rehydrate_memory_for_session(project_id: str, agent_id: str) -> dict:
    """
    Load relevant persistent memories into new session context.
    Target: 20,000-30,000 tokens for memory context.
    """
    # Step 1: Load active state
    active_state = load_memories(
        project_id=project_id,
        memory_type="state",
        is_archived=False,
        sort_by="timestamp",
        limit=1  # Most recent state
    )

    # Step 2: Load recent decisions (importance-weighted)
    decisions = load_memories(
        project_id=project_id,
        memory_type="decision",
        is_archived=False,
        importance_gte=7,              # High importance only
        timestamp_after=days_ago(30),  # Last 30 days
        sort_by="importance DESC, timestamp DESC",
        limit=10
    )

    # Step 3: Load unresolved issues
    issues = load_memories(
        project_id=project_id,
        memory_type="issue",
        is_archived=False,
        content_severity_in=["critical", "high"],  # Critical/high only
        sort_by="severity DESC, timestamp ASC",    # Oldest critical issues first
        limit=15
    )

    # Step 4: Load relevant learnings
    learnings = load_memories(
        project_id=project_id,
        memory_type="learning",
        is_archived=False,
        timestamp_after=days_ago(60),  # Last 60 days
        sort_by="access_count DESC",   # Most referenced first
        limit=5
    )

    # Step 5: Update access metadata
    for memory in (active_state + decisions + issues + learnings):
        increment_access_count(memory.memory_id)
        update_last_accessed(memory.memory_id, current_time())

    # Step 6: Format for context loading
    rehydrated_context = {
        'project_summary': generate_project_summary(project_id),
        'current_state': format_state_for_context(active_state),
        'key_decisions': format_decisions_for_context(decisions),
        'open_issues': format_issues_for_context(issues),
        'applicable_learnings': format_learnings_for_context(learnings),
        'session_continuity': {
            'last_session_end': get_last_session_end_time(project_id),
            'time_since_last_session': calculate_time_gap(),
            'sessions_in_project': count_total_sessions(project_id)
        }
    }
    return rehydrated_context
```
**Formatting for Context:**

```python
def format_for_context(memories: list[PersistentMemory], max_tokens: int) -> str:
    """Convert memory objects to human-readable context text."""
    output = []
    token_count = 0
    for memory in memories:
        formatted = f"""
### {memory.title}
**ID**: {memory.memory_id} | **Importance**: {memory.importance}/10 | **Date**: {memory.timestamp}

{format_memory_content(memory)}
---
"""
        tokens = estimate_tokens(formatted)
        if token_count + tokens > max_tokens:
            break
        output.append(formatted)
        token_count += tokens
    return "\n".join(output)
```
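The formatter above calls an `estimate_tokens` helper that this document does not define. A minimal sketch, assuming a characters-per-token heuristic (roughly 4 characters per token for English text is a common rule of thumb; for exact counts, use the provider's token-counting API):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count of a string.

    The 4-chars-per-token ratio is a heuristic for English prose,
    not the model's real tokenizer; treat results as estimates only.
    """
    return max(1, round(len(text) / chars_per_token))
```

With this heuristic, a 2,000-character summary is budgeted at roughly 500 tokens; an exact counter can be swapped in later without changing call sites.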
### Memory Search and Retrieval

**Query Interface:**

```python
def search_memories(
    project_id: str,
    query: str = None,            # Full-text search
    memory_type: str = None,      # Filter by type
    tags: list[str] = None,       # Tag-based filtering
    importance_gte: int = None,   # Minimum importance
    timestamp_after: str = None,  # Time-based filtering
    timestamp_before: str = None,
    is_archived: bool = False,
    sort_by: str = "relevance DESC",
    limit: int = 10
) -> list[PersistentMemory]:
    """Flexible memory search supporting multiple query patterns."""
    pass
```

**Usage Examples:**

```python
# Find all high-importance decisions about authentication
auth_decisions = search_memories(
    project_id="proj-123",
    query="authentication security",
    memory_type="decision",
    importance_gte=7,
    tags=["security", "authentication"]
)

# Find recent critical bugs
critical_bugs = search_memories(
    project_id="proj-123",
    memory_type="issue",
    timestamp_after=days_ago(7),
    sort_by="severity DESC, timestamp DESC"
)

# Find learnings related to performance
perf_learnings = search_memories(
    project_id="proj-123",
    memory_type="learning",
    tags=["performance", "optimization"],
    sort_by="access_count DESC"
)
```
**Search Ranking Algorithm:**

```python
def calculate_relevance_score(memory: PersistentMemory, query: str) -> float:
    """Score a memory's relevance to a search query. Returns a 0.0-1.0 score."""
    score = 0.0

    # Text similarity (40% weight)
    title_similarity = text_similarity(query, memory.title)
    content_similarity = text_similarity(query, str(memory.content))
    score += (title_similarity * 0.15) + (content_similarity * 0.25)

    # Importance (25% weight)
    score += (memory.importance / 10) * 0.25

    # Recency (20% weight)
    days_old = (current_time() - memory.timestamp).days
    recency_factor = 1 / (1 + days_old / 30)  # Decay over 30 days
    score += recency_factor * 0.20

    # Access frequency (15% weight)
    access_factor = min(memory.access_count / 10, 1.0)  # Cap at 10 accesses
    score += access_factor * 0.15

    return score
```
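To make the weighting concrete, here is a runnable sketch of the same formula with a simple word-overlap (Jaccard) similarity standing in for `text_similarity` — a real system would use embeddings or BM25; the Jaccard helper is an illustrative assumption:

```python
def jaccard_similarity(query: str, text: str) -> float:
    """Word-set overlap as a crude stand-in for semantic similarity."""
    a, b = set(query.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def relevance_score(query: str, title: str, content: str,
                    importance: int, days_old: int, access_count: int) -> float:
    """Same weights as above: 40% text, 25% importance, 20% recency, 15% access."""
    score = jaccard_similarity(query, title) * 0.15
    score += jaccard_similarity(query, content) * 0.25
    score += (importance / 10) * 0.25
    score += (1 / (1 + days_old / 30)) * 0.20   # recency halves at 30 days old
    score += min(access_count / 10, 1.0) * 0.15
    return score

# A fresh, important, frequently used memory outranks a stale, unrelated one
fresh = relevance_score("auth tokens", "JWT auth tokens",
                        "use RS256 for auth tokens", 9, 0, 10)
stale = relevance_score("auth tokens", "logging config",
                        "rotate logs daily", 3, 180, 0)
```

The bounded components guarantee the total stays in [0, 1], so scores from different memory types remain directly comparable.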
### Memory Lifecycle Management

**Archival Rules:**
| Memory Type | Archive Condition | Rationale |
|---|---|---|
| Decision | Importance ≤ 5 AND age > 90 days | Low-importance decisions lose relevance |
| Decision | Superseded by newer conflicting decision | Keep history but archive old decision |
| Issue | Status = "resolved" AND age > 30 days | Resolved issues rarely need review |
| State | Superseded by newer state | Only current state needs active loading |
| Learning | Access count = 0 AND age > 180 days | Unused learnings are not applicable |
**Deletion Rules** (permanent removal):
| Memory Type | Delete Condition | Rationale |
|---|---|---|
| Any | Marked for deletion by user | User override |
| Decision | Importance ≤ 3 AND archived > 365 days | Very low importance + very old |
| Issue | Resolved AND archived > 180 days | Unlikely to recur after 6 months |
| State | Superseded AND age > 90 days | Historical state not needed |
| Learning | Access count = 0 AND age > 365 days | Not useful after 1 year of no use |
**Lifecycle Automation:**

```python
def execute_memory_lifecycle_maintenance(project_id: str):
    """Run periodically (daily) to manage the memory lifecycle."""
    # Archive candidates
    archive_candidates = find_memories_matching(
        project_id=project_id,
        is_archived=False,
        conditions=[
            "memory_type='decision' AND importance<=5 AND age_days>90",
            "memory_type='issue' AND content->>'status'='resolved' AND age_days>30",
            "memory_type='learning' AND access_count=0 AND age_days>180"
        ]
    )
    for memory in archive_candidates:
        archive_memory(memory.memory_id)
        log_lifecycle_action("ARCHIVED", memory.memory_id, reason="Age and importance criteria")

    # Delete candidates
    delete_candidates = find_memories_matching(
        project_id=project_id,
        is_archived=True,
        conditions=[
            "memory_type='decision' AND importance<=3 AND archived_days>365",
            "memory_type='issue' AND archived_days>180",
            "memory_type='state' AND archived_days>90",
            "memory_type='learning' AND access_count=0 AND age_days>365"
        ]
    )
    for memory in delete_candidates:
        delete_memory(memory.memory_id)
        log_lifecycle_action("DELETED", memory.memory_id, reason="Retention period expired")
```
## Hierarchical Information Organization

### Organization Template

**Four-Level Hierarchy:**

```
Level 1: Executive Summary (Target: 500-1,000 tokens)
├─ Level 2: Section Summaries (Target: 2,000-4,000 tokens)
│  ├─ Level 3: Detailed Information (Target: 10,000-20,000 tokens)
│  │  └─ Level 4: Raw Code/Data (Variable: 20,000-80,000 tokens)
```
**Level 1 - Executive Summary:**
- Purpose: 30-second read for project orientation
- Contents:
  - Project goal (1 sentence)
  - Current status and completion percentage
  - Top 3 priorities
  - Critical blockers (if any)
  - Key architectural decisions (links to Level 2)

```markdown
# PROJECT EXECUTIVE SUMMARY
**Goal**: Build scalable e-commerce API supporting 50k concurrent users
**Status**: 45% complete - Authentication and user management done, payment processing in progress
**Priorities**:
1. Complete payment service integration [BLOCKED: pending PCI compliance review]
2. Implement order management service
3. Set up production infrastructure
**Architecture**: Microservices on Kubernetes, PostgreSQL database, Redis caching
[Details: §2.1-Architecture](#21-architecture)

**Last Updated**: 2025-11-04T15:30:00Z | **Agent**: Implementation-Lead-003
```
**Level 2 - Section Summaries:**
- Purpose: 2-minute read for context establishment
- Contents:
  - Subsystem summaries (3-5 sentences each)
  - Key decisions per subsystem with rationale
  - Current progress and next actions
  - Links to Level 3 detailed information

```markdown
## 2.1 Architecture
**Overview**: Microservices architecture with 5 core services (Auth, User, Payment, Order, Inventory) deployed on Kubernetes. API Gateway handles routing and rate limiting. PostgreSQL for transactional data, Redis for caching and sessions.

**Key Decisions**:
- Microservices over monolith for independent scaling [DEC-2025-10-15-001]
- PostgreSQL over MongoDB for ACID compliance [DEC-2025-10-20-003]
- Kubernetes for orchestration supporting multi-region deployment [DEC-2025-10-22-005]

**Status**: Architecture validated through proof-of-concept. Auth and User services deployed to staging. Payment service 60% complete.

**Next Actions**: Complete payment service, deploy order service, implement circuit breakers between services.

[Detailed Architecture Docs: §3.1](#31-architecture-details)
```
**Level 3 - Detailed Information:**
- Purpose: Deep technical context for implementation
- Contents:
  - Complete technical specifications
  - Implementation details with code snippets
  - Decision rationale with alternatives
  - Known issues and workarounds
  - References to Level 4 raw files

```markdown
## 3.1 Architecture Details

### 3.1.1 Service Communication Pattern
Services communicate via synchronous REST APIs for critical paths and asynchronous message queues (RabbitMQ) for non-blocking operations.

**Synchronous Patterns** (Request-Response):
- Auth validation: Gateway → Auth Service
- User data retrieval: Any Service → User Service
- Payment processing: Order Service → Payment Service

**Asynchronous Patterns** (Event-Driven):
- Order confirmation: Order Service → Email Service (via queue)
- Inventory updates: Order Service → Inventory Service (via queue)
- Analytics events: All Services → Analytics Pipeline (via queue)

**Rationale**: Synchronous for operations requiring an immediate response (user-facing). Asynchronous for fire-and-forget operations, improving perceived performance.

**Implementation**: See [gateway/routing_config.yaml](§4.gateway.routing) for routing rules, [services/order/queue_handlers.py](§4.services.order.queue) for message handling.

### 3.1.2 Database Schema Design
[Detailed schema documentation with ER diagrams, constraints, indexing strategies]
[See full schema: §4.database.schema](#4-database-schema)
```
**Level 4 - Raw Code/Data:**
- Purpose: Source of truth for implementation
- Contents:
  - Complete source files
  - Configuration files
  - Database schemas
  - API specifications
  - Test suites

```python
# §4.services.auth.jwt_handler
# Complete implementation file loaded from: src/services/auth/jwt_handler.py
import jwt
from datetime import datetime, timedelta
from typing import Dict, Optional

class JWTHandler:
    """
    JWT token generation and validation for the authentication service.

    Design decisions:
    - 15-minute access token expiry (security vs. UX balance)
    - 7-day refresh token expiry (mobile offline support)
    - RS256 algorithm (asymmetric for multi-service verification)
    """

    def __init__(self, private_key: str, public_key: str):
        self.private_key = private_key
        self.public_key = public_key
        self.access_token_expiry = timedelta(minutes=15)
        self.refresh_token_expiry = timedelta(days=7)

    def generate_access_token(self, user_id: str, permissions: list[str]) -> str:
        """Generate a short-lived access token carrying user permissions."""
        payload = {
            'user_id': user_id,
            'permissions': permissions,
            'token_type': 'access',
            'exp': datetime.utcnow() + self.access_token_expiry,
            'iat': datetime.utcnow()
        }
        return jwt.encode(payload, self.private_key, algorithm='RS256')

    # ... [Additional 200 lines of implementation]
```
### Navigation System

**Section Markers:**

Use consistent notation for cross-references:

```
§1.0         - Level 1 (Executive Summary)
§2.1         - Level 2 (Section Summary)
§3.1.2       - Level 3 (Detailed Information)
§4.file.path - Level 4 (Raw Files)
```
**Quick Reference Index** (insert at top of context):

```markdown
# NAVIGATION INDEX

## Executive Summary
[§1.0 - Project Overview](#10-project-overview) - Goal, status, priorities, blockers

## Section Summaries
[§2.1 - Architecture](#21-architecture) - System design, key decisions
[§2.2 - Authentication](#22-authentication) - Auth service implementation
[§2.3 - Payment](#23-payment) - Payment integration status [IN PROGRESS]
[§2.4 - Infrastructure](#24-infrastructure) - Deployment and operations

## Detailed Documentation
[§3.1 - Architecture Details](#31-architecture-details) - Deep technical specs
[§3.2 - Database Schema](#32-database-schema) - Tables, relationships, indexes
[§3.3 - API Specifications](#33-api-specifications) - Endpoint definitions

## Source Files (Level 4)
[§4.services.auth.*](#4-services-auth) - Authentication service code
[§4.services.payment.*](#4-services-payment) - Payment service code
[§4.database.migrations.*](#4-database-migrations) - DB migration scripts
[§4.tests.*](#4-tests) - Test suites
```

---
**Navigation Shortcuts:**

```markdown
<!-- Quick jump to specific information -->
Need: JWT implementation → [§4.services.auth.jwt_handler](#jwt-handler)
Need: Decision rationale for PostgreSQL → [§2.1 Architecture - Key Decisions](#21-architecture)
Need: Current blockers → [§1.0 Executive Summary](#10-project-overview)
Need: Payment service status → [§2.3 Payment](#23-payment)
```
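Because § markers are plain text, an orchestrator can detect on-demand load requests in agent output mechanically. A minimal sketch (the regex and function name are illustrative assumptions, not part of the skill's defined interface):

```python
import re

# Matches markers like §1.0, §3.1.2, and §4.services.auth.jwt_handler
SECTION_MARKER = re.compile(r"§(\d+(?:\.[\w-]+)*)")

def extract_section_refs(text: str) -> list[str]:
    """Return every §-style section reference found in a block of text."""
    return ["§" + m for m in SECTION_MARKER.findall(text)]
```

Running it over an agent reply such as "Please load §3.3.2 and §4.services.payment.api" yields exactly the markers to hydrate into the next message.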
### Summary Guidelines

**Level 1 (Executive) Summary Rules:**
- Maximum 1,000 tokens
- No technical jargon - understandable by non-technical stakeholders
- Focus on outcomes, not implementation details
- Always include: Goal, Status %, Top priorities, Blockers
- Update every major milestone (≥10% progress change)
**Level 2 (Section) Summary Rules:**
- 200-500 tokens per section
- Moderate technical detail - understandable by technical generalists
- Include: Overview, Key decisions with IDs, Current status, Next actions
- Link to Level 3 for details
- Update when section changes significantly
**Level 3 (Detailed) Summary Rules:**
- 1,000-3,000 tokens per detailed section
- Full technical depth - for implementation
- Include: Specifications, Rationale, Implementation notes, References to code
- Link to Level 4 raw files
- Update when implementation details change
**Level 4 (Raw) Summary Rules:**
- No summary - raw content only
- Include file metadata: path, last modified, size, purpose
- For large files (>5,000 tokens), provide section markers within file
- Keep synchronized with actual files
### Information Scoping Rules

**What belongs at each level:**
| Information Type | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| Project goals | ✓ | ✓ | - | - |
| Current status | ✓ | ✓ | ✓ | - |
| Critical blockers | ✓ | ✓ | ✓ | - |
| Architectural decisions | Summary only | Title + rationale | Full analysis | - |
| Implementation details | - | Brief mention | ✓ | ✓ |
| Code snippets | - | - | Key excerpts | Full files |
| Configuration | - | - | Critical settings | Full configs |
| API endpoints | Count only | List | Specifications | Implementation |
| Database schema | Mention only | Table names | Full schema | Migration files |
| Test results | Pass/fail status | Summary | Detailed results | Raw logs |
### Maintenance During Updates

**Propagation Rules:**

When updating information, propagate changes according to this matrix:
| Update Type | Update L4 | Update L3 | Update L2 | Update L1 |
|---|---|---|---|---|
| Code change | Always | If significant | If impacts status | If changes status % |
| Bug fix | Always | If notable | If critical bug | If was blocker |
| Config change | Always | If architecture | If impacts design | Rarely |
| New feature | Always | Always | Always | If major feature |
| Progress update | - | Status only | Status + % | Status + % |
| Decision made | Reference | Full details | Summary | Link only |
**Update Procedure:**

```python
def update_hierarchical_documentation(change_type: str, scope: str, details: dict):
    """Propagate documentation updates through the hierarchy."""
    # Always update Level 4 (raw files) first
    update_level_4(details['file_path'], details['changes'])

    # Determine propagation based on change type and scope
    propagate_to_level_3 = should_propagate_to_level_3(change_type, scope)
    propagate_to_level_2 = should_propagate_to_level_2(change_type, scope)
    propagate_to_level_1 = should_propagate_to_level_1(change_type, scope)

    if propagate_to_level_3:
        update_level_3_section(
            section=details['section'],
            change_summary=details['technical_summary']
        )
    if propagate_to_level_2:
        update_level_2_summary(
            section=details['section'],
            status_change=details['status_impact'],
            next_actions_change=details['next_actions']
        )
    if propagate_to_level_1:
        update_executive_summary(
            status_pct_change=details['completion_delta'],
            priority_change=details['priority_impact'],
            blocker_change=details['blocker_status']
        )

    # Update the navigation index if the structure changed
    if details.get('structure_change', False):
        regenerate_navigation_index()
```
**Consistency Checks:**

Run these validations after updates:

```python
def validate_hierarchy_consistency() -> list[str]:
    """Return a list of inconsistencies found."""
    issues = []

    # Check 1: All Level 3 references to Level 4 are valid
    for ref in extract_level_4_references_from_level_3():
        if not file_exists(ref.file_path):
            issues.append(f"Broken L3→L4 reference: {ref}")

    # Check 2: All Level 2 summaries have corresponding Level 3 details
    for summary in get_level_2_summaries():
        if not has_level_3_details(summary.section_id):
            issues.append(f"L2 summary without L3 details: {summary.section_id}")

    # Check 3: Status percentages are consistent across levels
    l1_status = get_level_1_status_percentage()
    l2_aggregated = aggregate_level_2_status_percentages()
    if abs(l1_status - l2_aggregated) > 5:  # Allow 5% tolerance
        issues.append(f"Status mismatch: L1={l1_status}%, L2 aggregate={l2_aggregated}%")

    # Check 4: Referenced decision IDs exist in memory
    for decision_ref in extract_all_decision_references():
        if not memory_exists(decision_ref):
            issues.append(f"Referenced decision not found: {decision_ref}")

    return issues
```
## Multi-Agent Context Coordination

### Agent Ownership Model

**Context Ownership Principles:**
- Primary Owner: One agent has write access to a context section at a time
- Read-Only Access: Other agents can read but not modify owned sections
- Ownership Transfer: Explicit handoff protocol required to transfer ownership
- Shared Sections: Common reference materials (Level 1, Level 2 summaries) are read-only to all
**Ownership Tracking:**

```python
class ContextOwnership:
    """Track which agent owns which context sections."""
    section_id: str              # e.g., "§2.3-payment", "§4.services.auth"
    owner_agent_id: str          # Current owner
    ownership_type: str          # "exclusive" | "shared-write" | "read-only"
    acquired_at: str             # When ownership was acquired
    expires_at: str | None       # Optional expiration for auto-release
    previous_owner: str | None   # For the audit trail
    lock_reason: str             # Why this section is owned

class ContextSection:
    """A section of context with ownership metadata."""
    section_id: str
    level: int                   # 1-4 hierarchy level
    content: str
    ownership: ContextOwnership
    last_modified_by: str
    last_modified_at: str
    modification_count: int
    agents_with_read_access: list[str]
```
### Agent Handoff Protocols

**Handoff Trigger Conditions:**
| Condition | Action | Example |
|---|---|---|
| Agent A completes assigned task | Automatic handoff to coordinator | Research agent → Implementation agent |
| Agent A encounters blocker outside expertise | Request handoff to specialist | Backend agent → Database agent for schema design |
| Agent A reaches token capacity | Compress and handoff to fresh agent | Long-running agent → Continuation agent |
| Scheduled rotation | Planned handoff at milestone | Phase 1 agent → Phase 2 agent |
| Agent A timeout/failure | Emergency handoff to recovery agent | Failed agent → Supervisor agent |
**Handoff Protocol Procedure:**

```python
def execute_agent_handoff(
    from_agent: str,
    to_agent: str,
    handoff_type: str,  # "complete" | "specialist" | "continuation" | "emergency"
    context_sections: list[str]
) -> dict:
    """
    Execute a structured handoff between agents.
    Returns the handoff package for the recipient agent.
    """
    # Step 1: Compact context for handoff
    if handoff_type == "continuation":
        # Heavy compaction for token refresh
        target_reduction = 0.40  # 40% reduction
    else:
        # Light compaction to preserve relevant details
        target_reduction = 0.20  # 20% reduction

    compacted_context = compact_context_for_handoff(
        sections=context_sections,
        reduction_target=target_reduction
    )

    # Step 2: Generate handoff package
    handoff_package = {
        'handoff_metadata': {
            'from_agent_id': from_agent,
            'to_agent_id': to_agent,
            'handoff_type': handoff_type,
            'timestamp': current_time(),
            'reason': get_handoff_reason(),
            'context_token_count': estimate_tokens(compacted_context)
        },
        'agent_expertise_match': {
            'required_skills': identify_required_skills(context_sections),
            'to_agent_capabilities': get_agent_capabilities(to_agent),
            'skill_match_score': calculate_skill_match(to_agent, context_sections)
        },
        'context_summary': {
            'executive_summary': extract_level_1_summary(),
            'critical_decisions': extract_decisions_from_context(importance_gte=7),
            'open_issues': extract_unresolved_issues(),
            'current_task_state': extract_current_state(),
            'blocking_dependencies': identify_blockers()
        },
        'work_products': {
            'completed': list_completed_items_by_agent(from_agent),
            'in_progress': list_in_progress_items(from_agent),
            'not_started': list_pending_items()
        },
        'ownership_transfers': [
            ContextOwnership(
                section_id=section,
                owner_agent_id=to_agent,
                ownership_type="exclusive",
                acquired_at=current_time(),
                previous_owner=from_agent,
                lock_reason=f"Handoff from {from_agent}"
            )
            for section in context_sections
        ],
        'next_actions': {
            'immediate': extract_immediate_next_actions(),
            'short_term': extract_short_term_goals(),
            'success_criteria': extract_acceptance_criteria()
        },
        'agent_specific_notes': {
            'from_agent_observations': collect_agent_observations(from_agent),
            'suggested_approach': get_suggested_approach(from_agent),
            'known_pitfalls': list_known_pitfalls_for_task(),
            'useful_resources': list_helpful_resources()
        },
        'compacted_context': compacted_context
    }

    # Step 3: Record handoff in persistent memory
    handoff_memory = create_handoff_memory(handoff_package)
    store_memory(handoff_memory)

    # Step 4: Update ownership records
    for transfer in handoff_package['ownership_transfers']:
        update_ownership_record(transfer)

    # Step 5: Notify the coordinator (if multi-agent orchestration)
    notify_coordinator({
        'event': 'agent_handoff',
        'from': from_agent,
        'to': to_agent,
        'sections': context_sections,
        'timestamp': current_time()
    })

    return handoff_package
```
**Handoff Package Token Budget:**
| Handoff Type | Target Token Budget | Rationale |
|---|---|---|
| Complete task | 15,000-25,000 | Full context transfer including learnings |
| Specialist consultation | 5,000-10,000 | Focused problem scope only |
| Continuation (token refresh) | 30,000-40,000 | Preserve max context for continuity |
| Emergency recovery | 10,000-15,000 | Critical state only, fast recovery |
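Budget selection can be table-driven from these ranges. A sketch (the function name and clamping policy are illustrative choices, not mandated by the protocol):

```python
# Token budget ranges per handoff type, copied from the table above.
HANDOFF_BUDGETS = {
    "complete": (15_000, 25_000),
    "specialist": (5_000, 10_000),
    "continuation": (30_000, 40_000),
    "emergency": (10_000, 15_000),
}

def handoff_token_budget(handoff_type: str, available_tokens: int) -> int:
    """Target the top of the range, clamped to what the sending agent
    can actually spare; degrade below the minimum only when forced."""
    low, high = HANDOFF_BUDGETS[handoff_type]
    if available_tokens < low:
        return available_tokens  # degraded handoff: send what we have
    return min(high, available_tokens)
```

Passing the result as the compaction target keeps the handoff package size predictable regardless of how much context the sender accumulated.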
### Context Coordination Rules

**Read Access Rules:**
- Level 1 (Executive Summary): Always readable by all agents
- Level 2 (Section Summaries): Readable by all agents in project
- Level 3 (Detailed Info): Readable by agents with task relevance
- Level 4 (Raw Files): Readable only by owner + agents with explicit access
**Write Access Rules:**
- Exclusive Ownership: Only owner can modify owned sections
- Shared Write Sections: Multiple agents can write if designated "shared-write"
- Conflict Resolution: Last-write-wins with conflict detection
- Audit Trail: All modifications logged with agent ID and timestamp
**Conflict Prevention:**

```python
def acquire_section_ownership(
    agent_id: str,
    section_id: str,
    operation: str  # "read" | "write"
) -> bool:
    """
    Attempt to acquire ownership of (or access to) a section.
    Returns True if successful, False if denied.
    """
    current_ownership = get_section_ownership(section_id)

    # Read operations: always allowed for Levels 1-2; check access for 3-4
    if operation == "read":
        if section_id.startswith("§1") or section_id.startswith("§2"):
            return True
        return agent_id in get_section_read_access_list(section_id)

    # Write operations: check ownership
    if operation == "write":
        # No current owner - acquire ownership
        if current_ownership is None:
            set_section_ownership(section_id, agent_id, "exclusive")
            return True
        # Shared-write section
        if current_ownership.ownership_type == "shared-write":
            return True
        # Exclusive ownership already held by this agent
        if current_ownership.owner_agent_id == agent_id:
            return True
        # Owned by another agent - denied
        return False
```
**Coordination State Synchronization:**

For distributed multi-agent systems, maintain synchronization:

```python
class CoordinationState:
    """Shared state for multi-agent coordination."""
    project_id: str
    active_agents: list[str]
    section_ownership_map: dict[str, ContextOwnership]
    agent_task_assignments: dict[str, list[str]]
    global_blockers: list[str]
    shared_resources: dict[str, any]
    last_sync_timestamp: str
    sync_version: int  # Optimistic locking

def synchronize_coordination_state(agent_id: str) -> CoordinationState:
    """Fetch the latest coordination state and resolve any conflicts."""
    local_state = get_local_coordination_state(agent_id)
    remote_state = fetch_remote_coordination_state()

    # Detect conflicts
    if local_state.sync_version != remote_state.sync_version:
        # Conflict: resolve using the configured strategy
        resolved_state = resolve_coordination_conflict(
            local_state,
            remote_state,
            resolution_strategy="remote-wins-on-ownership"
        )
        # Apply the resolved state locally
        apply_coordination_state(agent_id, resolved_state)
        return resolved_state

    return remote_state
```
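The `sync_version` field is a classic optimistic lock: read the version, prepare an update, and commit only if the version is unchanged. A minimal self-contained sketch of the compare-and-bump step (class and function names here are illustrative, not the skill's defined API):

```python
from dataclasses import dataclass, field

@dataclass
class SharedState:
    """Stand-in for the shared coordination store."""
    sync_version: int = 0
    owner_map: dict = field(default_factory=dict)

def try_commit(store: SharedState, base_version: int, new_owner_map: dict) -> bool:
    """Commit only if nobody updated the store since we read it
    (compare on sync_version, then bump it)."""
    if store.sync_version != base_version:
        return False  # conflict: caller must re-fetch and resolve
    store.owner_map = new_owner_map
    store.sync_version += 1
    return True
```

The first writer against a given version wins; a second writer holding the stale version is rejected and must re-synchronize, which is exactly the conflict path `synchronize_coordination_state` handles.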
## Integration with Prompt Engineering (SKILL-003)

### Context Organization in System Prompts

**System Prompt Structure** (leveraging SKILL-003 principles):

```markdown
# SYSTEM PROMPT - Agent ID: Implementation-Lead-003

## Role and Capabilities
[Agent role definition - 500 tokens]

## Project Context (Hierarchical)
[Level 1 Executive Summary - 1,000 tokens]
- Automatically included in every interaction
- Provides constant orientation

## Active Task Context
[Current task from Level 2 - 2,000 tokens]
- Dynamically updated based on current work
- Links to Level 3 details as needed

## Critical Knowledge
[Key decisions and constraints - 3,000 tokens]
- Architecture decisions with IDs
- Critical issues and blockers
- Must-follow constraints

## Available Resources
[Links to Level 3, Level 4 content]
- Load on-demand using section markers
- "For database schema details, see §3.2"
- "For JWT implementation, see §4.services.auth.jwt_handler"

## Success Criteria
[Acceptance criteria for current task - 1,000 tokens]
```

Total system prompt: ~7,500 tokens (3.75% of the context window)
**Dynamic Context Loading:**

```python
def construct_system_prompt_with_context(
    agent_id: str,
    project_id: str,
    current_task: str
) -> str:
    """
    Build a system prompt with appropriate context for the agent and task.
    Target: 5,000-10,000 tokens for the system prompt portion.
    """
    # Core agent definition (static)
    agent_definition = load_agent_definition(agent_id)             # ~500 tokens

    # Level 1 summary (always included)
    executive_summary = get_level_1_summary(project_id)            # ~1,000 tokens

    # Task-relevant Level 2 sections
    relevant_sections = identify_relevant_sections(current_task)
    section_summaries = load_level_2_summaries(relevant_sections)  # ~2,000 tokens

    # Critical decisions and constraints
    critical_knowledge = load_critical_knowledge(
        project_id=project_id,
        importance_gte=8,
        relevance_to_task=current_task
    )                                                              # ~3,000 tokens

    # Acceptance criteria
    success_criteria = extract_acceptance_criteria(current_task)   # ~1,000 tokens

    # Resource links (Level 3, Level 4)
    resource_index = generate_resource_index(relevant_sections)    # ~500 tokens

    prompt = f"""
{agent_definition}

# PROJECT CONTEXT
{executive_summary}

# CURRENT FOCUS
{section_summaries}

# CRITICAL KNOWLEDGE
{critical_knowledge}

# SUCCESS CRITERIA
{success_criteria}

# AVAILABLE RESOURCES
{resource_index}
---
"""
    return prompt
```
### Dynamic State in User Messages

**User Message Context Loading Strategy:**

Instead of loading everything into the system prompt, include context dynamically in user messages:
```python
def construct_user_message_with_context(
    user_query: str,
    required_context: list[str]
) -> str:
    """Augment a user query with just-in-time context."""
    # Score and prioritize context items
    scored_context = [
        (ctx, calculate_relevance_score(ctx, user_query))
        for ctx in required_context
    ]

    # Sort by relevance and load until the token budget is reached
    scored_context.sort(key=lambda x: x[1], reverse=True)
    context_sections = []
    token_count = estimate_tokens(user_query)
    max_tokens = 50000  # Reserve 50k tokens for user-message context

    for context_item, score in scored_context:
        if score < 0.3:  # Relevance threshold
            break
        content = load_context_content(context_item)
        content_tokens = estimate_tokens(content)
        if token_count + content_tokens > max_tokens:
            break
        context_sections.append(content)
        token_count += content_tokens

    # Construct the message
    message = f"""
{user_query}

<relevant_context>
{''.join(context_sections)}
</relevant_context>
"""
    return message
```
### Token-Aware Prompt Design

**Prompt Engineering Patterns for Context Management:**

1. **Progressive Disclosure Pattern:**

```
System Prompt: "You have access to detailed documentation via section markers (§).
When you need specific information, indicate which section you need, and it will be
loaded into context. Do not request all sections at once."

User Message: "Implement the payment service endpoint."

Agent Response: "I'll need the payment service specifications. Please load §3.3.2
Payment API Specifications."

[System loads §3.3.2 into the next user message]
```

2. **Context Pruning Pattern:**

```
System Prompt: "Periodically review your context and identify information that is
no longer needed. When you identify such information, explicitly state:
'PRUNE: [section_id] - [reason]' and it will be removed to free tokens."

Agent: "PRUNE: §4.services.user.old_implementation - Replaced by new version,
no longer needed for reference."

[System removes the pruned section]
```

3. **Summary Elevation Pattern:**

```
System Prompt: "When working with large files (>10,000 tokens), first generate
a 500-token summary and propose working with the summary. Only load the full file
if the summary is insufficient."

Agent: "I've analyzed §4.database.migration_001 (15,000 tokens). Here's a summary:
[500-token summary]. This should be sufficient for the current task. Load the full
file only if we need to modify the migration."
```
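The pruning pattern only works if the orchestrator can detect PRUNE statements reliably in agent output. A sketch parser for the statement format shown above (the regex is an illustrative assumption):

```python
import re

# Matches statements of the form: PRUNE: <section_id> - <reason>
PRUNE_RE = re.compile(r"PRUNE:\s*(§?[\w.\-]+)\s*-\s*(.+)")

def parse_prune_statements(agent_output: str) -> list[tuple[str, str]]:
    """Extract (section_id, reason) pairs from an agent's output."""
    return PRUNE_RE.findall(agent_output)
```

Logging the returned reason alongside each removal preserves an audit trail, so a pruned section can be re-loaded later if the agent's judgment turns out to be wrong.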
## Best Practices Checklist

**Context Window Optimization:**
- Token allocation follows 40% knowledge / 50% active / 10% session / 5% buffer distribution
- File loading uses relevance scoring algorithm (50% relevance, 30% recency, 20% dependency)
- Capacity monitoring implemented with thresholds (80% yellow, 90% orange, 95% red)
- Just-in-time loading strategy for files scoring < 70
- Token estimation uses Claude API or validated approximation formulas
**Context Compaction:**
- Compaction triggered automatically at 80% capacity
- All preservation rules (score: 100) enforced - no critical data discarded
- Architectural decisions preserved with full context (decision, rationale, alternatives, impact)
- Unresolved issues (critical/high severity) retained with investigation status
- Last 5 file modifications preserved with change summaries
- Current task state includes completion %, substeps, next actions, blockers
- Deduplication applied to tool outputs (85% similarity threshold)
- Compaction achieves 20-35% token savings target
- Validation confirms no score-100 items removed
**Cross-Session Memory:**
- Session summary generated at session end capturing decisions, issues, state, learnings
- Decision logging uses complete template with all required fields
- Persistent memory schema implemented with all required fields (memory_id, type, timestamp, project_id, agent_id, content, importance)
- Memory rehydration loads: active state (most recent), high-importance decisions (last 30 days), critical/high issues (unresolved), relevant learnings (last 60 days)
- Memory search supports query, type filtering, tag filtering, importance filtering, time filtering
- Memory lifecycle automation runs daily (archive aged/low-importance, delete expired)
- Access count tracking implemented for usage-based retention
**Hierarchical Organization:**
- Four-level hierarchy implemented (Executive → Section → Detailed → Raw)
- Level 1 (Executive) ≤ 1,000 tokens with goal, status, priorities, blockers
- Level 2 (Section) 200-500 tokens per section with overview, decisions, status, next actions
- Level 3 (Detailed) 1,000-3,000 tokens per section with specs, rationale, implementation notes
- Level 4 (Raw) contains complete source files with metadata
- Navigation index provided with section markers (§) for all levels
- Section markers used consistently (§1.0, §2.1, §3.1.2, §4.file.path)
- Update propagation rules followed (L4 → L3 → L2 → L1 based on change significance)
- Consistency validation run after updates (references valid, status percentages aligned)
**Multi-Agent Coordination:**
- Context ownership tracking implemented per section
- Ownership types defined (exclusive | shared-write | read-only)
- Agent handoff protocol implemented with structured handoff package
- Handoff package includes: metadata, context summary, work products, ownership transfers, next actions, agent notes, compacted context
- Handoff compaction: 40% reduction for continuation, 20% for other types
- Read access rules enforced (L1/L2 readable by all, L3/L4 restricted)
- Write access rules enforced (exclusive ownership required for writes)
- Coordination state synchronized across agents with conflict resolution
**Prompt Engineering Integration:**
- System prompt structure follows SKILL-003 principles
- System prompt includes: role, L1 summary, active task, critical knowledge, resource links
- System prompt token budget: 5,000-10,000 tokens (2.5-5% of context window)
- Dynamic context loading in user messages based on relevance scoring
- Progressive disclosure pattern implemented (load details on-demand)
- Context pruning pattern enabled (explicit PRUNE statements)
- Summary elevation pattern used for large files (>10,000 tokens)
**General:**
- All quantitative thresholds explicitly defined (no vague guidance)
- All algorithms include implementation details
- All schemas include complete field specifications
- All procedures are step-by-step executable
- Automation-friendly rules (threshold-based, not subjective)
- Examples provided with actual token counts
- Integration with SKILL-003 clearly documented
## Common Pitfalls to Avoid

**Premature Loading**: Loading all files at session start without relevance assessment
- Problem: Wastes 30-40% of context window on unused files
- Solution: Use the file loading prioritization algorithm: preload only for score ≥ 70, load scores 40-69 just-in-time, fetch scores < 40 on-demand
No Capacity Monitoring: Ignoring context usage until hitting hard limit
- Problem: Emergency compaction loses information, disrupts workflow
- Solution: Implement monitoring with thresholds, compact proactively at 80%
Discarding Architectural Decisions: Removing decisions during compaction to save tokens
- Problem: Loss of rationale leads to contradictory future decisions
- Solution: Always preserve preservation-score 100 items, use validation checklist
Verbose Tool Output Retention: Keeping complete logs of successful operations
- Problem: Redundant confirmations consume 10-15% of context
- Solution: Summarize successful operations, keep only actionable data
No Cross-Session Memory: Starting each session from scratch
- Problem: Repeatedly re-analyzing same codebase, forgetting past decisions
- Solution: Implement session summary generation and memory rehydration
Flat Information Structure: Organizing all information at same detail level
- Problem: Cannot navigate quickly, must read entire context for any query
- Solution: Use 4-level hierarchy with navigation markers
Missing Decision Rationale: Recording "what" without "why"
- Problem: Future agents/sessions don't understand constraints behind decisions
- Solution: Use complete decision logging template with alternatives considered
Over-Aggressive Compaction: Targeting >50% token reduction
- Problem: Loses important context details, breaks continuity
- Solution: Target 20-35% reduction, focus on deduplication and discard rules
No Agent Handoff Protocol: Informal context transfer between agents
- Problem: Knowledge loss, duplicated work, contradictory approaches
- Solution: Use structured handoff package with ownership transfers
Static System Prompts: Loading all context in system prompt regardless of task
- Problem: Wastes tokens on irrelevant information
- Solution: Dynamic context loading based on current task relevance
Ignoring Token Costs: Using approximations when accuracy critical
- Problem: 10-15% estimation errors lead to context overflow
- Solution: Use Claude API tokenization for long sessions (justified marginal cost)
No Memory Lifecycle: Accumulating memories indefinitely
- Problem: Memory search becomes slow, outdated information pollutes results
- Solution: Implement archival and deletion rules with automated maintenance
Duplicate Information Across Levels: Repeating same details in L1, L2, L3
- Problem: Wastes tokens, creates update inconsistency
- Solution: Follow information scoping rules, link between levels instead of duplicating
Poor Section Marker Discipline: Inconsistent or missing navigation markers
- Problem: Cannot implement progressive disclosure, forced to load everything
- Solution: Use consistent §X.Y.Z notation, maintain navigation index
No Validation After Compaction: Trusting compaction didn't lose critical data
- Problem: Silently loses architectural decisions, unresolved issues
- Solution: Run validation checklist, verify preservation-scored items present
Token Budget Examples
Example 1: Small Feature Implementation (Single session, 2-4 hours)
Total Budget: 200,000 tokens
Allocation:
- System prompts & skills: 60,000 (30%)
• Agent definition: 5,000
• Prompt engineering skill: 8,000
• Context management skill: 12,000
• Language-specific skills: 15,000
• Other skills: 20,000
- Active context: 90,000 (45%)
• Level 1 executive summary: 1,000
• Level 2 section summaries: 5,000
• Level 3 relevant details: 10,000
• Level 4 active files (3-5 files): 35,000
• Tool outputs: 15,000
• Working notes: 10,000
• Task state: 2,000
• Recent modifications: 5,000
• Buffer: 7,000
- Session memory: 30,000 (15%)
• Architectural decisions: 8,000
• Unresolved issues: 5,000
• Critical implementation notes: 7,000
• Recent change history: 10,000
- Buffer/overhead: 20,000 (10%)
• Safety margin: 20,000
Compaction Strategy: Likely not needed for a single-session feature
Memory Persistence: Generate session summary at end (~5,000 tokens)
Example 2: Medium Complexity Project (Multi-session, 2-3 days, 8-12 hours total)
Session 1 Budget: 200,000 tokens
Initial Allocation:
- System prompts & skills: 70,000 (35%)
• Increased due to multi-session requirements
- Active context: 85,000 (42.5%)
• Level 1-2: 6,000
• Level 3: 15,000
• Level 4 files (10-15 files): 50,000
• Tool outputs: 14,000
- Session memory: 25,000 (12.5%)
• Decisions: 10,000
• Issues: 8,000
• State: 7,000
- Buffer: 20,000 (10%)
Compaction Timeline:
- Session 1, Hour 3: 80% capacity → Compact to 65% (-30,000 tokens)
- Session 1 end: Generate summary (8,000 tokens)
Session 2 Budget: 200,000 tokens
Rehydrated Allocation:
- System prompts & skills: 70,000 (35%)
- Rehydrated memory: 20,000 (10%)
• Session 1 summary: 8,000
• Persisted decisions: 7,000
• Unresolved issues: 5,000
- Active context: 90,000 (45%)
- Session memory: 20,000 (10%)
Example 3: Large Codebase Analysis (Research phase, multi-session, 1 week)
Session 1 (Initial exploration) Budget: 200,000 tokens
Allocation:
- System prompts & skills: 65,000 (32.5%)
- Active context: 95,000 (47.5%)
• Hierarchical navigation: 15,000
• File samples (20+ files): 60,000
• Web search results: 15,000
• Analysis notes: 5,000
- Session memory: 25,000 (12.5%)
- Buffer: 15,000 (7.5%)
Compaction Events:
- Hour 2: 85% → Deduplicate web search results (-12,000)
- Hour 4: 82% → Compact file samples, keep summaries (-25,000)
- Session end: Generate comprehensive summary (15,000 tokens)
Sessions 2-5 (Deep dive):
Each session:
- Rehydrate 25,000 tokens from previous sessions
- Compact every 3-4 hours
- Generate summary with learnings (10,000 tokens each)
Final Session (Synthesis):
Budget: 200,000 tokens
Rehydrated: 50,000 tokens (compressed from 5 sessions)
- Key decisions from all sessions: 15,000
- Critical findings: 20,000
- Architecture summary: 15,000
Active work: 120,000 tokens
- Synthesizing final report
- Creating architectural diagrams
- Documenting decisions
Example 4: Multi-Agent Development (Implementation phase, coordinated team)
Project Total: 5 agents × 200,000 = 1,000,000 tokens available
Shared Context (Replicated across all agents): 80,000 tokens
- Level 1 executive summary: 2,000
- Level 2 complete: 10,000
- Critical architectural decisions: 20,000
- System-wide constraints: 8,000
- Agent coordination state: 5,000
- Shared resources: 15,000
- Navigation index: 5,000
- Multi-agent protocols: 15,000
Per-Agent Allocation: 120,000 tokens individual context
- Agent-specific system prompts: 20,000
- Agent task context: 60,000
- Agent working memory: 25,000
- Agent buffer: 15,000
Agent Handoff Budget: 25,000 tokens per handoff
- Handoff metadata: 1,000
- Context summary: 8,000
- Work products: 10,000
- Next actions: 3,000
- Agent notes: 3,000
Coordination Overhead: 40,000 tokens
- Ownership tracking: 10,000
- Conflict resolution state: 10,000
- Global blockers: 5,000
- Agent task queue: 15,000
Total Effective Usage: 80,000 (shared) + (5 × 120,000) (agents) + 40,000 (coordination) = 720,000 tokens
Efficiency: 72% (the remaining 28% covers shared-context replication and protocol overhead, an acceptable cost of coordination)
Quick Reference
Context Allocation (200k tokens)
- 30-40%: Knowledge base & system instructions (60k-80k)
- 40-50%: Active task context (80k-100k)
- 10-15%: Session memory (20k-30k)
- 5-10%: Buffer/overhead (10k-20k)
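The allocation bands above are mechanical enough to verify automatically. A minimal sketch of a budget checker; the dict keys and function name are illustrative assumptions, not part of the skill:

```python
# Target fraction bands per category for a 200k-token window.
ALLOCATION_BANDS = {
    "knowledge_base": (0.30, 0.40),
    "active_task": (0.40, 0.50),
    "session_memory": (0.10, 0.15),
    "buffer": (0.05, 0.10),
}

def check_allocation(budget: dict[str, int], window: int = 200_000) -> list[str]:
    """Return one violation message per category outside its target band."""
    problems = []
    for category, (lo, hi) in ALLOCATION_BANDS.items():
        fraction = budget.get(category, 0) / window
        if not lo <= fraction <= hi:
            problems.append(f"{category}: {fraction:.1%} outside {lo:.0%}-{hi:.0%}")
    return problems
```

For example, the Example 2 session-1 split (70k / 85k / 25k / 20k) falls inside every band and produces no violations.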
Capacity Thresholds
- Green (0-79%): Normal operation
- Yellow (80-89%): Plan compaction within 10 operations
- Orange (90-94%): Compact immediately before next major operation
- Red (95-100%): Emergency compaction, shed low-priority content
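These tiers map directly to a threshold function that monitoring automation can call after each operation. A sketch, with the action strings and function name as assumptions:

```python
def capacity_action(used_tokens: int, window: int = 200_000) -> str:
    """Map current context usage to the threshold tier defined above."""
    pct = used_tokens / window * 100
    if pct < 80:
        return "green: normal operation"
    if pct < 90:
        return "yellow: plan compaction within 10 operations"
    if pct < 95:
        return "orange: compact immediately before next major operation"
    return "red: emergency compaction, shed low-priority content"
```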
File Loading Prioritization
SCORE = (RELEVANCE × 0.50) + (RECENCY × 0.30) + (DEPENDENCY × 0.20)
- Score ≥ 70: Preload
- Score 40-69: Just-in-time
- Score < 40: On-demand only
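The scoring formula and tier cutoffs can be combined into one decision function. A sketch assuming each component is already normalized to a 0-100 scale:

```python
def file_load_decision(relevance: float, recency: float, dependency: float) -> tuple[float, str]:
    """Weighted loading score (components on 0-100) and the resulting tier."""
    score = relevance * 0.50 + recency * 0.30 + dependency * 0.20
    if score >= 70:
        tier = "preload"
    elif score >= 40:
        tier = "just-in-time"
    else:
        tier = "on-demand"
    return score, tier
```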
Compaction Preservation (Score: 100)
Must preserve:
- Architectural decisions (with timestamp, rationale, alternatives, impact)
- Active bugs and unresolved issues (Critical/High severity)
- Critical implementation details (security, performance, data integrity)
- Recent file modifications (last 5 operations)
- Current task state (completion %, substeps, next actions, blockers)
Compaction Discard (Score: 0-30)
Can safely discard:
- Redundant tool outputs (85%+ similarity)
- Resolved issues with confirmed fixes (30+ days old)
- Exploratory attempts explicitly abandoned
- Verbose debug logs when summary captures key points
- Successful operation confirmations (keep only summary)
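The 85%-similarity discard rule needs a concrete similarity metric; the skill does not name one, so `difflib.SequenceMatcher` is used here as one reasonable assumption:

```python
import difflib

def is_redundant(new_output: str, kept_outputs: list[str], threshold: float = 0.85) -> bool:
    """Discard rule sketch: a tool output is redundant when it is
    >= 85% similar to an output already retained in context."""
    return any(
        difflib.SequenceMatcher(None, new_output, prev).ratio() >= threshold
        for prev in kept_outputs
    )
```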
Memory Schema Fields (Required)
memory_id: str # UUID
memory_type: str # "decision" | "issue" | "state" | "learning"
timestamp: str # ISO 8601
project_id: str
agent_id: str
title: str # Max 100 chars
content: dict # Type-specific structured content
tags: list[str]
importance: int # 1-10
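The required fields above translate naturally into a dataclass. Field names follow the schema; the defaults and validation in `__post_init__` are illustrative assumptions:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    memory_type: str   # "decision" | "issue" | "state" | "learning"
    project_id: str
    agent_id: str
    title: str         # max 100 chars
    content: dict      # type-specific structured content
    importance: int    # 1-10
    tags: list[str] = field(default_factory=list)
    memory_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        if self.memory_type not in {"decision", "issue", "state", "learning"}:
            raise ValueError(f"unknown memory_type: {self.memory_type}")
        if not 1 <= self.importance <= 10:
            raise ValueError("importance must be 1-10")
        if len(self.title) > 100:
            raise ValueError("title exceeds 100 chars")
```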
Memory Lifecycle
Archive:
- Decisions: Importance ≤5 AND age >90 days
- Issues: Resolved AND age >30 days
- Learnings: Access count=0 AND age >180 days
Delete:
- Decisions: Importance ≤3 AND archived >365 days
- Issues: Resolved AND archived >180 days
- Learnings: Access count=0 AND age >365 days
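These lifecycle rules are threshold-based and can run as an automated maintenance pass. A sketch; the function name and the "keep" fallback are assumptions:

```python
def lifecycle_action(memory_type: str, importance: int, age_days: int,
                     resolved: bool = False, access_count: int = 0,
                     archived_days: int = 0) -> str:
    """Apply the archive/delete thresholds above; 'keep' when no rule fires."""
    if memory_type == "decision":
        if importance <= 3 and archived_days > 365:
            return "delete"
        if importance <= 5 and age_days > 90:
            return "archive"
    elif memory_type == "issue":
        if resolved and archived_days > 180:
            return "delete"
        if resolved and age_days > 30:
            return "archive"
    elif memory_type == "learning":
        if access_count == 0 and age_days > 365:
            return "delete"
        if access_count == 0 and age_days > 180:
            return "archive"
    return "keep"
```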
Hierarchy Levels
- L1 Executive: ≤1,000 tokens - Goal, status, priorities, blockers
- L2 Section: 200-500 tokens/section - Overview, decisions, status, next actions
- L3 Detailed: 1,000-3,000 tokens/section - Specs, rationale, implementation
- L4 Raw: Variable - Complete source files
Navigation Markers
§1.0 - Level 1 (Executive)
§2.1 - Level 2 (Section)
§3.1.2 - Level 3 (Detailed)
§4.file.path - Level 4 (Raw)
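Because the leading digit of a § marker names its hierarchy level, routing a marker to the right level is a one-regex job. A helper sketch (the regex and error handling are assumptions):

```python
import re

# §<level digit> followed by dot-separated segments, e.g. §3.1.2 or §4.file.path
MARKER_RE = re.compile(r"§(\d+)(?:\.[\w-]+)*")

def marker_level(marker: str) -> int:
    """Infer the hierarchy level (1-4) from a § navigation marker."""
    m = MARKER_RE.fullmatch(marker)
    if not m:
        raise ValueError(f"not a section marker: {marker}")
    return int(m.group(1))
```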
Agent Handoff Token Budget
- Complete task: 15k-25k tokens
- Specialist: 5k-10k tokens
- Continuation: 30k-40k tokens
- Emergency: 10k-15k tokens
Handoff Compaction
- Continuation: 40% reduction
- Other types: 20% reduction
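Combining the per-type budgets with the compaction percentages gives a target size for any handoff package. A sketch; clamping the compacted size into the type's budget range is an assumption about how the two tables interact:

```python
# Token budget ranges per handoff type (from the table above).
HANDOFF_BUDGET = {
    "complete": (15_000, 25_000),
    "specialist": (5_000, 10_000),
    "continuation": (30_000, 40_000),
    "emergency": (10_000, 15_000),
}

def compacted_handoff_size(raw_tokens: int, handoff_type: str) -> int:
    """Apply the 40%/20% handoff-compaction rule, then clamp to the budget."""
    reduction = 0.40 if handoff_type == "continuation" else 0.20
    target = round(raw_tokens * (1 - reduction))
    low, high = HANDOFF_BUDGET[handoff_type]
    return max(low, min(high, target))
```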
Context Ownership Types
- exclusive: Single agent write access
- shared-write: Multiple agents can write
- read-only: All agents can read, none can write
Token Estimation (Approximation)
- Code: 0.75 tokens/char
- Documentation: 0.65 tokens/char
- JSON/data: 0.85 tokens/char
- Logs: 0.70 tokens/char
For critical long sessions: Use Claude API tokenization (justified cost)
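The approximation rates above reduce to a one-line estimator; remember the 10-15% error margin noted earlier when using it near capacity thresholds. A sketch with illustrative names:

```python
# Approximate tokens-per-character rates by content type (from the table above).
TOKEN_RATE = {"code": 0.75, "documentation": 0.65, "json": 0.85, "logs": 0.70}

def estimate_tokens(text: str, content_type: str = "documentation") -> int:
    """Character-count approximation; expect ~10-15% error vs. real tokenization."""
    return round(len(text) * TOKEN_RATE[content_type])
```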
System Prompt Budget
Target: 5,000-10,000 tokens (2.5-5% of context window)
- Agent definition: 500
- L1 summary: 1,000
- Task context: 2,000
- Critical knowledge: 3,000
- Success criteria: 1,000
- Resource index: 500
Progressive Disclosure Patterns
- Load on-demand: Reference §markers, load when needed
- Prune explicitly: State "PRUNE: §X.Y - reason"
- Summarize first: 500-token summary before loading large files (>10k tokens)
Document Version: 1.0.0
Last Updated: 2025-11-04
Total Token Count: ~58,000 tokens
Integration: SKILL-003 (Prompt Engineering)