Claude Code Plugins (community-maintained marketplace)

context-packing-memory-management
@Konstantinospil/FitVibe-v2

SKILL.md

name: context-packing-memory-management
description: Systematic context window optimization and cross-session memory management for long-running multi-agent tasks. Use when working on projects spanning multiple sessions, managing large codebases with 50+ files, conducting extended research, or coordinating context between multiple agents. Includes token allocation strategies, intelligent compaction procedures, persistent memory schemas, and agent handoff protocols.

Context Packing & Memory Management

Overview

Context window management is the foundational constraint governing multi-agent system performance. With Claude Sonnet 4.5's 200,000-token context window, efficient utilization determines whether agents can maintain project continuity across sessions, coordinate effectively, and execute complex tasks without information loss.

This skill provides quantitative, automation-ready strategies for:

  • Optimal token allocation across context types
  • Systematic context compaction when approaching capacity
  • Cross-session memory persistence with structured schemas
  • Hierarchical information organization for rapid navigation
  • Multi-agent context coordination and handoff protocols

Target outcome: Maintain 95%+ critical information retention while operating within token constraints across extended sessions.

Core Principles

  1. Token Budget Discipline: Treat context window as a scarce, shared resource with explicit allocation rules
  2. Progressive Disclosure: Load information just-in-time based on relevance scoring, not preemptively
  3. Compression Without Loss: Preserve architectural decisions and critical state while eliminating redundancy
  4. Hierarchical Navigation: Organize information in layers enabling quick jumps without full context traversal
  5. Persistent Memory: Extract and store session-independent knowledge for rehydration in future sessions
  6. Automation-First Design: Use threshold-based rules and scoring algorithms, not subjective judgment
  7. Multi-Agent Coordination: Explicit ownership and handoff protocols prevent context fragmentation

Context Window Optimization

Token Allocation Strategy

Claude Sonnet 4.5 Context Budget: 200,000 tokens

Allocate tokens according to this distribution:

| Context Type | Token Allocation | Percentage | Purpose |
|---|---|---|---|
| Knowledge Base & System Instructions | 60,000-80,000 | 30-40% | Skills, system prompts, core procedures |
| Active Task Context | 80,000-100,000 | 40-50% | Current files, recent outputs, working state |
| Session Memory | 20,000-30,000 | 10-15% | Architectural decisions, persistent state |
| Buffer/Overhead | 10,000-20,000 | 5-10% | Tool outputs, safety margin |

Rationale: Knowledge base is static and necessary. Active context is dynamic and scales with task complexity. Session memory grows slowly. Buffer prevents hard limits.

Implementation:

MAX_CONTEXT = 200000
KNOWLEDGE_BASE_MAX = 80000  # 40%
ACTIVE_CONTEXT_MAX = 100000  # 50%
SESSION_MEMORY_MAX = 30000   # 15%
BUFFER_MIN = 10000           # 5% minimum safety
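A minimal sketch of enforcing these caps (the defaults mirror the constants above; the function name is illustrative, not part of the skill):

```python
def within_budget(knowledge: int, active: int, memory: int,
                  max_context: int = 200_000,
                  knowledge_max: int = 80_000,
                  active_max: int = 100_000,
                  memory_max: int = 30_000,
                  buffer_min: int = 10_000) -> bool:
    """True when each pool respects its cap and the safety buffer survives."""
    if knowledge > knowledge_max or active > active_max or memory > memory_max:
        return False
    # Whatever is not allocated must still cover the minimum buffer
    return max_context - (knowledge + active + memory) >= buffer_min
```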

Token Estimation Techniques

Using Claude API Tokenization (Recommended for accuracy):

# Anthropic API call for exact token count
import anthropic

def count_tokens(text: str) -> int:
    client = anthropic.Anthropic()
    response = client.messages.count_tokens(
        model="claude-sonnet-4-5-20250929",
        messages=[{"role": "user", "content": text}]
    )
    return response.input_tokens

Cost-Benefit: API tokenization costs ~$0.0001 per call. For context management in long sessions (>4 hours), the marginal cost ($0.01-0.05 total) is justified by preventing context overflow errors that waste entire sessions.

Approximation Formulas (Use when API calls are impractical):

  • Code files: ~0.30 tokens/character (syntax and whitespace tokenize less efficiently than prose)
  • Documentation: ~0.25 tokens/character (English prose averages roughly 4 characters per token)
  • JSON/structured data: ~0.33 tokens/character (brackets and quotes add overhead)
  • Log files: ~0.28 tokens/character (mixed content)

Validation: These ratios are rules of thumb. Test approximations against API counts for your specific content mix and adjust if error exceeds ±10%.
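The ratios plug into a trivial estimator; a sketch, with the ±10% re-fit check alongside (function names are illustrative):

```python
def estimate_tokens(text: str, tokens_per_char: float) -> int:
    """Apply a tokens-per-character ratio from the table above."""
    return round(len(text) * tokens_per_char)

def approximation_error(estimated: int, exact: int) -> float:
    """Relative error vs. an exact API count; re-fit the ratio above 0.10."""
    return abs(estimated - exact) / exact
```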

File Loading Prioritization

Relevance Scoring Algorithm:

Each file receives a score from 0-100. The three component ranges below (50, 30, and 20 points) already encode the weighting, so the total is a direct sum, with each component capped at its maximum:

FILE_SCORE = RELEVANCE_SCORE (0-50)
           + RECENCY_SCORE (0-30)
           + DEPENDENCY_SCORE (0-20)

Relevance Score (0-50 points):

  • Mentioned in current task description: +25
  • Modified in last 5 operations: +15
  • Contains unresolved issues/TODOs: +10
  • Core architectural file (config, schema): +20
  • Utility/helper file: +5

Recency Score (0-30 points):

  • Modified in last hour: +30
  • Modified in last 4 hours: +20
  • Modified in last 24 hours: +10
  • Modified in last week: +5
  • Older: 0

Dependency Score (0-20 points):

  • Direct dependency of active file: +20
  • Second-degree dependency: +10
  • Imports/references active file: +15
  • No relationship: 0
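One way to turn the rubric above into code (a sketch: the boolean inputs are assumptions standing in for real signal detection, and each component is capped at its range maximum):

```python
def relevance_score(mentioned: bool, recently_modified: bool, has_todos: bool,
                    is_core: bool, is_utility: bool) -> int:
    # Point values from the rubric; True counts as 1 in the products
    pts = (25 * mentioned + 15 * recently_modified + 10 * has_todos
           + 20 * is_core + 5 * is_utility)
    return min(pts, 50)

def recency_score(hours_since_modified: float) -> int:
    if hours_since_modified <= 1:
        return 30
    if hours_since_modified <= 4:
        return 20
    if hours_since_modified <= 24:
        return 10
    if hours_since_modified <= 24 * 7:
        return 5
    return 0

def dependency_score(direct_dep: bool, second_degree: bool,
                     imports_active: bool) -> int:
    return min(20 * direct_dep + 10 * second_degree + 15 * imports_active, 20)

def file_score(relevance: int, recency: int, dependency: int) -> int:
    # The 50/30/20 point ranges already encode the weights, so sum directly
    return relevance + recency + dependency
```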

Loading Strategy:

  1. Sort files by score (descending)
  2. Load files until reaching 70% of ACTIVE_CONTEXT_MAX
  3. Reserve remaining 30% for:
    • Tool outputs (15%)
    • Dynamic context expansion (10%)
    • Safety buffer (5%)
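The greedy variant of steps 1-2 above might look like this (a sketch; the dict keys are assumptions about how scored files are represented):

```python
def select_files(scored_files: list[dict], active_context_max: int = 100_000):
    """Load files by descending score until 70% of the active-context cap."""
    budget = int(active_context_max * 0.70)
    loaded, used = [], 0
    for f in sorted(scored_files, key=lambda f: f["score"], reverse=True):
        if used + f["tokens"] > budget:
            continue  # Skip files that would overflow; smaller ones may still fit
        loaded.append(f["path"])
        used += f["tokens"]
    return loaded, used
```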

Just-In-Time vs. Preloading Decision:

| Condition | Strategy | Rationale |
|---|---|---|
| Score ≥ 70 | Preload | High probability of need |
| Score 40-69 | Just-in-time | Moderate probability |
| Score < 40 | On-demand only | Low probability |
| File size > 10,000 tokens | Just-in-time | Large footprint |
| Total context > 80% | Just-in-time all | Capacity constraint |
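As code, with the capacity and size rows taking precedence over the score rows (that precedence order is an assumption where rows overlap):

```python
def loading_strategy(score: int, file_tokens: int, context_usage_pct: float) -> str:
    if context_usage_pct > 80:
        return "just-in-time"   # Capacity constraint overrides everything
    if file_tokens > 10_000:
        return "just-in-time"   # Large footprint
    if score >= 70:
        return "preload"
    if score >= 40:
        return "just-in-time"
    return "on-demand"
```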

Capacity Monitoring

Warning Thresholds:

| Level | Context Used | Action Required |
|---|---|---|
| Green | 0-79% (0-158k tokens) | Normal operation |
| Yellow | 80-89% (160k-178k) | Begin planning compaction |
| Orange | 90-94% (180k-188k) | Initiate compaction immediately |
| Red | 95-100% (190k-200k) | Emergency compaction + shed low-priority |

Monitoring Implementation:

def check_capacity_status(current_tokens: int) -> str:
    usage_pct = (current_tokens / MAX_CONTEXT) * 100
    
    if usage_pct < 80:
        return "GREEN"
    elif usage_pct < 90:
        return "YELLOW: Plan compaction within 10 operations"
    elif usage_pct < 95:
        return "ORANGE: Compact now before next major operation"
    else:
        return "RED: Emergency compaction required"

Capacity Alert Responses:

  • Yellow: Generate compaction plan, identify candidates for removal
  • Orange: Execute compaction procedure (see below), defer non-critical file loads
  • Red: Aggressive compaction, shed all files with score < 30, summarize verbose outputs

Intelligent Context Compaction

When to Compact

Automatic Triggers:

  1. Context usage reaches 80% (Yellow threshold)
  2. Planning session completion (natural break point)
  3. Before loading large file set (>20,000 tokens)
  4. Agent handoff initiation (clean context for recipient)
  5. Every 50 operations (proactive maintenance)

Manual Triggers:

  1. User requests context summary
  2. Performance degradation observed (slow responses)
  3. Before critical decision-making operations
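The automatic triggers reduce to a single predicate (a sketch; parameter names are illustrative, and the manual triggers above remain judgment calls):

```python
def should_compact(usage_pct: float, ops_since_compaction: int,
                   pending_load_tokens: int, at_break_point: bool,
                   handoff_pending: bool) -> bool:
    return (usage_pct >= 80                  # Yellow threshold reached
            or at_break_point                # Planning session completed
            or pending_load_tokens > 20_000  # Large file set incoming
            or handoff_pending               # Clean context for recipient
            or ops_since_compaction >= 50)   # Proactive maintenance
```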

Preservation Rules

MUST PRESERVE (Critical retention score: 100):

  1. Architectural Decisions:

    • Decision description with timestamp
    • Rationale and alternatives considered
    • Expected impact and validation criteria
    • Author agent identifier (if multi-agent)

    Example:

    [2025-11-04T14:23:00Z] DECISION: Adopt microservices architecture
    Rationale: Enables independent team scaling, better fault isolation
    Alternatives: Monolith (rejected: scaling limits), Serverless (rejected: vendor lock-in)
    Impact: 3-month migration timeline, reduced coupling by 60%
    Validated: Service isolation tests pass, deployment time reduced 45%
    Agent: Architecture-Planner-001
    
  2. Active Bugs and Unresolved Issues:

    • Issue ID, description, reproduction steps
    • Impact severity (Critical/High/Medium/Low)
    • Current investigation status
    • Attempted fixes and results

    Example:

    BUG-2847 [CRITICAL]: Auth service timeout under load
    Repro: 100+ concurrent requests → 30% timeout rate
    Status: Root cause identified (connection pool exhaustion)
    Attempted: Increased pool size (no effect), Added retry logic (partial improvement)
    Next: Implement connection queueing with backpressure
    
  3. Critical Implementation Details:

    • Non-obvious algorithm choices with rationale
    • Performance-critical optimizations
    • Security considerations and threat model assumptions
    • Data integrity constraints

    Example:

    CRITICAL: User.email uses case-insensitive unique index
    Rationale: Prevent bob@example.com vs Bob@example.com duplicates
    Implementation: PostgreSQL LOWER(email) functional index
    Query pattern: WHERE LOWER(email) = LOWER($1)
    
  4. Recent File Modifications (Last 5 operations):

    • File path, modification timestamp
    • Change summary (1-2 sentences)
    • Reason for change
    • Related files impacted

    Example:

    [2025-11-04T15:47:00Z] Modified: src/auth/jwt_handler.py
    Changes: Added refresh token rotation, increased expiry to 7 days
    Reason: Support mobile offline mode per FEATURE-892
    Impact: Affects src/api/auth_routes.py (refresh endpoint updated)
    
  5. Current Task State:

    • Active task description and acceptance criteria
    • Completion percentage (with substep breakdown)
    • Next 3 planned actions with dependencies
    • Blockers and resolution strategies

    Example:

    TASK: Implement user profile API endpoints
    Progress: 65% complete
      ✓ GET /profile (done)
      ✓ PUT /profile (done)
      ⧖ DELETE /profile (in progress - cascade logic remaining)
      ☐ PATCH /profile (not started)
    Next actions:
      1. Complete delete cascade to related tables (blocks: finalize schema)
      2. Implement PATCH with partial update support
      3. Add rate limiting to all endpoints
    Blockers: DB migration approval needed from DBA team
    

Discard Rules

CAN SAFELY DISCARD (Retention score: 0-30):

  1. Redundant Tool Outputs:

    • Duplicate search results with same information
    • Repeated file listings showing unchanged directories
    • Multiple passes of same linting output
    • Successful operation confirmations without actionable data

    Deduplication algorithm:

    import re
    from difflib import SequenceMatcher

    def normalize(text):
        # Strip timestamps and other non-semantic variation before comparing
        return re.sub(r'\d{4}-\d{2}-\d{2}T[\d:]+Z', '', text)

    def is_duplicate_output(new_output, existing_outputs):
        # Exact match after normalization, then fuzzy match above 85% similarity
        new_norm = normalize(new_output)
        for existing in existing_outputs:
            existing_norm = normalize(existing)
            if existing_norm == new_norm:
                return True
            if SequenceMatcher(None, new_norm, existing_norm).ratio() > 0.85:
                return True
        return False
    
  2. Resolved Issues with Confirmed Fixes:

    • Bugs marked "RESOLVED" with passing tests
    • Completed tasks with acceptance criteria validated
    • Questions answered with no follow-up needed

    Retention criteria: Keep resolved issues for 10 operations, then discard if no re-mention

  3. Exploratory Attempts That Didn't Lead Anywhere:

    • Dead-end implementation approaches explicitly abandoned
    • Failed experiments with documented negative results
    • Prototype code replaced by production implementation

    Preserve as lessons learned: Extract 1-sentence summary before discarding details

  4. Verbose Debug Logs:

    • Stack traces after issue is identified and fixed
    • Verbose logging output when summary captures key points
    • Intermediate computation steps when only result matters

    Preserve the error message and root cause (discard the full trace); preserve summary statistics (discard raw logs).

  5. Successful Operation Confirmations:

    • "File saved successfully" (retain only file path + timestamp)
    • "Tests passed" (retain only pass count, discard individual test output)
    • "Build completed" (retain only artifact location, discard build logs)

Deduplication Strategies

Text-Based Deduplication:

  1. Exact match elimination: Hash-based identification of identical content
  2. Semantic clustering: Group similar outputs, keep most recent representative
  3. Incremental diff preservation: For similar file versions, store only deltas

Implementation:

from difflib import SequenceMatcher

def semantic_similarity(text1: str, text2: str) -> float:
    """Returns similarity score 0.0-1.0"""
    return SequenceMatcher(None, text1, text2).ratio()

def deduplicate_outputs(outputs: list[str]) -> list[str]:
    """Returns deduplicated list, preserving most recent unique items"""
    unique = []
    seen_hashes = set()
    
    for output in reversed(outputs):  # Process newest first
        output_hash = hash(output)
        
        if output_hash in seen_hashes:
            continue
            
        # Check semantic similarity against existing unique items
        is_duplicate = False
        for unique_item in unique:
            if semantic_similarity(output, unique_item) > 0.85:
                is_duplicate = True
                break
        
        if not is_duplicate:
            unique.append(output)
            seen_hashes.add(output_hash)
    
    return list(reversed(unique))  # Return in chronological order

Tool Output Consolidation:

Instead of:

[Search 1] Found 12 files matching "auth"
  - auth_handler.py
  - auth_routes.py
  - ...
[Search 2] Found 12 files matching "auth"
  - auth_handler.py
  - auth_routes.py
  - ...

Consolidate to:

[Searches 1-2] Found 12 files matching "auth" (checked 2x, unchanged)
  - auth_handler.py
  - auth_routes.py
  - ...
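Consolidating runs of identical outputs, as in the example above, can be sketched as a run-length pass (the label format is illustrative):

```python
from itertools import groupby

def consolidate(outputs: list[str]) -> list[str]:
    """Collapse runs of identical outputs into one entry with a repeat count."""
    merged = []
    for text, group in groupby(outputs):
        n = len(list(group))
        merged.append(text if n == 1 else f"{text} (checked {n}x, unchanged)")
    return merged
```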

Compaction Process

Step-by-step procedure:

  1. Snapshot Current State (Safety first):

    def create_pre_compaction_snapshot():
        snapshot = {
            'timestamp': current_time(),
            'total_tokens': estimate_current_context_size(),
            'file_list': list_loaded_files(),
            'decision_count': count_architectural_decisions(),
            'issue_count': count_unresolved_issues()
        }
        save_snapshot(snapshot)
        return snapshot
    
  2. Score All Context Elements:

    • Apply preservation rules (score: 100)
    • Apply discard rules (score: 0-30)
    • Score remaining elements (score: 31-99) by:
      • Recency (30 points)
      • Reference count (20 points)
      • Task relevance (30 points)
      • Information density (20 points)
  3. Calculate Token Savings Target:

    current_usage = estimate_current_context_size()
    target_usage = MAX_CONTEXT * 0.65  # Target 65% after compaction
    tokens_to_remove = current_usage - target_usage
    
  4. Remove Low-Score Elements (Ascending score order):

    removed_tokens = 0
    sorted_elements = sort_by_score(context_elements)
    
    for element in sorted_elements:
        if element.score >= 31:  # Only discard-rule items (score 0-30) are removable
            break
        if removed_tokens >= tokens_to_remove:
            break
            
        remove_from_context(element)
        removed_tokens += element.token_count
    
  5. Deduplicate Remaining Content:

    • Apply deduplication algorithms to tool outputs
    • Consolidate similar findings
    • Merge redundant sections
  6. Generate Compaction Summary:

    tokens_after = estimate_current_context_size()
    summary = {
        'tokens_before': pre_snapshot['total_tokens'],
        'tokens_after': tokens_after,
        'tokens_saved': pre_snapshot['total_tokens'] - tokens_after,
        'elements_removed': count_removed_elements(),
        'preservation_validation': validate_critical_content_present()
    }
    
  7. Validate No Critical Loss:

    • Check all architectural decisions still present
    • Verify unresolved issues retained
    • Confirm current task state intact
    • Validate recent modifications preserved

Validation Checklist:

  • All decisions with timestamp in last 7 days preserved
  • All CRITICAL and HIGH severity issues preserved
  • Last 5 file modifications with full context preserved
  • Current task state with next actions preserved
  • Agent ownership information preserved (if multi-agent)
  • Context reduction achieved (target: 20-35% token savings)
  • No preservation-scored (100) items removed

Compaction Examples

Example 1: Research Session Compaction

Before Compaction (185,000 tokens - 92.5% capacity):

[Web Search 1] "machine learning optimization" - 15 results (4,500 tokens)
[Web Fetch 1] Article: "Gradient Descent Variants" (12,000 tokens full text)
[Web Search 2] "machine learning optimization" - 15 results (4,500 tokens - DUPLICATE)
[Web Fetch 2] Article: "Adam Optimizer Explained" (8,000 tokens full text)
[Web Search 3] "neural network architectures" - 20 results (5,500 tokens)
[Web Fetch 3] Article: "CNN Architectures Review" (15,000 tokens full text)
[Analysis 1] "Compare optimization algorithms" (3,000 tokens)
[Analysis 2] "Evaluate CNN architectures" (2,500 tokens)
[Chat History] 45 exchanges of clarifying questions (22,000 tokens)
[System Prompts & Skills] (80,000 tokens)
[Working Notes] Architectural decision log (8,000 tokens)

After Compaction (128,000 tokens - 64% capacity, 31% savings):

[Web Search - Deduplicated] Combined results on ML optimization (4,500 tokens)
[Key Findings Summary] 
  - Gradient Descent: Vanilla approach, slow convergence
  - Adam: Adaptive learning rate, fastest convergence (85% of cases)
  - RMSprop: Good for RNNs, second choice
  (Extracted from 20,000 tokens of full articles → 800 tokens summary)
[Architecture Evaluation]
  - CNN: Best for image tasks (preserved from 15k token article)
  - RNN: Sequential data (not needed for current task - discarded)
  - Transformer: NLP focus (not relevant - discarded)
  (2,000 tokens preserved from 15k)
[Analyses Consolidated] Merged overlapping sections (2,500 tokens)
[Chat History] Retained last 10 exchanges + key decisions (8,000 tokens)
[System Prompts & Skills] (80,000 tokens - unchanged)
[Working Notes] (8,000 tokens - unchanged, contains decisions)

Token savings: 57,000 tokens (31%)
Critical information retained: 100% (decisions, current task, key findings)

Example 2: Code Development Compaction

Before Compaction (178,000 tokens - 89% capacity):

[Files Loaded]
  - src/main.py (5,000 tokens)
  - src/auth.py (8,000 tokens)
  - src/database.py (12,000 tokens)
  - src/utils.py (3,000 tokens)
  - tests/test_auth.py (6,000 tokens)
  - tests/test_database.py (8,000 tokens)
  - Old version: src/auth_old_backup.py (8,000 tokens - DISCARD)
  - Experimental: src/auth_prototype_v2.py (7,000 tokens - DISCARD)
[Test Outputs]
  - Test run 1: All passed (2,500 tokens of verbose output)
  - Test run 2: All passed (2,500 tokens - DUPLICATE)
  - Test run 3: 1 failure in auth (2,500 tokens)
  - Test run 4: All passed after fix (2,500 tokens)
[Linting Results]
  - Run 1: 23 issues found (3,000 tokens detailed output)
  - Run 2: 15 issues remain (2,500 tokens - PARTIAL DUPLICATE)
  - Run 3: All clear (500 tokens)
[Git Logs] Last 50 commits (15,000 tokens)
[Documentation] API reference loaded but unused (12,000 tokens)
[System Prompts & Skills] (70,000 tokens)

After Compaction (115,000 tokens - 57.5% capacity, 35% savings):

[Files Loaded - Current Versions Only]
  - src/main.py (5,000 tokens)
  - src/auth.py (8,000 tokens)
  - src/database.py (12,000 tokens)
  - src/utils.py (3,000 tokens)
  - tests/test_auth.py (6,000 tokens) [Referenced in recent work]
  
  Removed: Old backups, prototypes (15,000 tokens saved)
  Deferred: test_database.py (not modified recently, load if needed)
  
[Test Summary]
  Latest status: All tests passing (4 runs consolidated)
  Critical: Test run 3 showed auth timeout bug → Fixed in run 4
  (Preserved: Bug description + fix. Discarded: Verbose passing test logs)
  (2,500 tokens preserved from 10,000 tokens)
  
[Code Quality]
  Linting: Clean (23 initial issues resolved)
  Critical fixes: SQL injection prevention added to database.py
  (500 tokens preserved from 6,000 tokens)
  
[Recent Changes Summary]
  Last 5 commits relevant to current task:
    - Fixed auth timeout bug (critical, preserved)
    - Added rate limiting (relevant, preserved)
    - Updated dependencies (not critical, summarized)
  (2,000 tokens preserved from 15,000 tokens)
  
[System Prompts & Skills] (70,000 tokens - unchanged)

Token savings: 63,000 tokens (35%)
Critical information retained: 100% (bug fix, architecture, current files)

Example 3: Multi-Agent Coordination Compaction

Before Agent Handoff (172,000 tokens - 86% capacity):

[Agent A - Research Phase Context]
  - Market analysis (15,000 tokens)
  - Competitor research (12,000 tokens)
  - Technology evaluation (18,000 tokens)
  - 30 web searches with full results (35,000 tokens)
  - Working notes and intermediate analyses (20,000 tokens)
[Agent A - Decisions Made]
  - Technology stack selected: React + Node.js
  - Database: PostgreSQL (rationale: ACID + JSON support)
  - Architecture: Microservices (rationale: team scaling)
  (2,000 tokens)
[Shared Project Context]
  - Requirements document (8,000 tokens)
  - Project timeline (1,000 tokens)
  - Team assignments (500 tokens)
[System Prompts & Skills] (60,000 tokens)

After Compaction for Handoff to Agent B - Implementation (98,000 tokens - 49% capacity):

[Handoff Package for Agent B]
  - Key Decisions Summary:
    • Tech Stack: React + Node.js (rationale: team expertise, ecosystem maturity)
    • Database: PostgreSQL (rationale: ACID compliance, JSON support for flexibility)
    • Architecture: Microservices (rationale: enables independent team scaling)
    (800 tokens - extracted from 2,000 tokens)
    
  - Critical Constraints:
    • Must support 10k concurrent users (performance requirement)
    • 99.9% uptime SLA (reliability requirement)
    • GDPR compliance mandatory (legal requirement)
    (300 tokens - extracted from research)
    
  - Implementation Priorities:
    1. Auth service (foundation for other services)
    2. User profile service
    3. Core business logic service
    (200 tokens)
    
  - Resources for Agent B:
    • Requirements doc (8,000 tokens - full preservation)
    • Technology decision rationale (800 tokens)
    • Project timeline (1,000 tokens - full preservation)
    (9,800 tokens total)

[Agent A Research - Archived]
  Compressed summary stored in persistent memory:
    "Evaluated 5 tech stacks, 3 databases, 2 architectures.
     Final selections justified by: team capabilities (60% weight),
     ecosystem maturity (25% weight), scalability (15% weight).
     Detailed analysis available in session archive."
  (Stored externally, not loaded unless explicitly needed)

[Shared Project Context] (9,500 tokens - preserved)
[System Prompts & Skills] (60,000 tokens - unchanged)

Token savings: 74,000 tokens (43%)
Agent B receives: Actionable decisions, critical constraints, clear priorities
Agent A work preserved: Archived in persistent memory for future reference

Cross-Session Memory Persistence

Memory Schema

Abstract storage format (implementation-agnostic):

class PersistentMemory:
    """
    Base schema for persistent memory objects.
    Implementation can be: JSON files, database, MCP memory server, etc.
    """
    
    # Required fields
    memory_id: str           # Unique identifier (UUID)
    memory_type: str         # "decision" | "issue" | "state" | "learning"
    timestamp: str           # ISO 8601 format
    project_id: str          # Links memories to specific projects
    agent_id: str            # Agent that created this memory
    
    # Content fields
    title: str               # Brief description (max 100 chars)
    content: str             # Full content (structured based on memory_type)
    tags: list[str]          # Searchable tags
    
    # Metadata
    importance: int          # 1-10 scale (10 = critical)
    expires_at: str | None   # Optional expiration timestamp
    parent_memory_id: str | None  # Links related memories
    
    # State management
    is_archived: bool        # False = active, True = archived
    access_count: int        # Number of times retrieved
    last_accessed: str       # Last retrieval timestamp

class DecisionMemory(PersistentMemory):
    """Architectural and significant decisions"""
    content: {
        'decision': str,              # What was decided
        'rationale': str,             # Why this decision
        'alternatives_considered': list[str],  # What else was evaluated
        'expected_impact': str,       # Predicted outcomes
        'validation_criteria': str,   # How to verify success
        'context': str                # Situational factors
    }

class IssueMemory(PersistentMemory):
    """Bugs, blockers, and unresolved problems"""
    content: {
        'description': str,           # Issue description
        'severity': str,              # "critical" | "high" | "medium" | "low"
        'reproduction_steps': str,    # How to reproduce
        'investigation_status': str,  # Current understanding
        'attempted_fixes': list[str], # What's been tried
        'next_actions': str           # Planned resolution steps
    }

class StateMemory(PersistentMemory):
    """Project state and progress tracking"""
    content: {
        'task_description': str,      # Current task
        'completion_percentage': int, # 0-100
        'substeps_completed': list[str],
        'substeps_remaining': list[str],
        'blockers': list[str],
        'next_actions': list[str]
    }

class LearningMemory(PersistentMemory):
    """Lessons learned and insights"""
    content: {
        'lesson': str,                # What was learned
        'context': str,               # Situation where learned
        'applicability': str,         # When to apply this lesson
        'evidence': str               # Supporting data/results
    }
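A minimal concrete sketch of this schema with JSON serialization, assuming a dataclass-based implementation (the field subset and record format are illustrative, not prescribed):

```python
import json
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class Memory:
    memory_type: str          # "decision" | "issue" | "state" | "learning"
    title: str
    content: dict
    project_id: str
    agent_id: str
    importance: int = 5
    tags: list = field(default_factory=list)
    memory_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def to_record(m: Memory) -> str:
    """Serialize for a JSON-lines store; any backend could consume this."""
    return json.dumps(asdict(m), sort_keys=True)
```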

Session Summary Generation

Trigger: End of session (explicit user end, or after 2 hours of inactivity)

Algorithm:

def generate_session_summary(session_context) -> dict:
    """
    Extract critical information for cross-session persistence.
    Target: 5,000-10,000 tokens for typical session.
    """
    
    summary = {
        'session_metadata': {
            'session_id': generate_uuid(),
            'start_time': session_context.start_time,
            'end_time': current_time(),
            'duration_minutes': calculate_duration(),
            'agent_ids': session_context.active_agents,
            'project_id': session_context.project_id
        },
        
        'decisions_made': extract_decisions(session_context),
        # Returns list of DecisionMemory objects
        # Criteria: Any explicit "DECISION:" markers, architecture changes,
        #           technology selections, design pattern adoptions
        
        'issues_encountered': extract_issues(session_context),
        # Returns list of IssueMemory objects  
        # Criteria: Unresolved bugs, blockers, questions pending answers
        # Excludes: Resolved issues unless resolution method is noteworthy
        
        'current_state': extract_state(session_context),
        # Returns StateMemory object
        # Captures: Task progress, file modifications, next planned actions
        
        'learnings': extract_learnings(session_context),
        # Returns list of LearningMemory objects
        # Criteria: Failed approaches (save future time), unexpected behaviors,
        #           performance insights, useful patterns discovered
        
        'file_modifications': {
            'created': list_files_created(),
            'modified': list_files_modified_with_summary(),
            'deleted': list_files_deleted()
        },
        
        'metrics': {
            'tokens_used': session_context.total_tokens,
            'files_accessed': len(session_context.accessed_files),
            'operations_performed': session_context.operation_count,
            'compactions_executed': session_context.compaction_count
        }
    }
    
    return summary

Extraction Criteria:

  1. Decision Extraction:

    • Scan for explicit decision markers: "DECISION:", "We will", "Chose X because"
    • Identify architecture changes in code structure
    • Detect technology/library selections
    • Score importance: 8-10 (preserve forever), 5-7 (preserve 30 days), 1-4 (preserve 7 days)
  2. Issue Extraction:

    • Identify unresolved "BUG-" or "ISSUE-" markers
    • Detect error messages not followed by resolution
    • Find "TODO:" comments in code with context
    • Capture blockers mentioned in task status
  3. State Extraction:

    • Parse task description and completion percentage
    • List last 5 file modifications with change summaries
    • Extract "Next actions:" or "Next steps:" sections
    • Identify blockers and dependencies
  4. Learning Extraction:

    • Find "Lesson:" or "Note:" markers
    • Detect failed approaches with analysis
    • Identify performance improvements with measurements
    • Capture useful patterns or anti-patterns discovered
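The marker scans in criteria 1, 2, and 4 can be prototyped with line-anchored regexes (a sketch; the exact marker set and return shape are assumptions):

```python
import re

MARKER_PATTERNS = {
    "decision": r"^DECISION:\s*.+$",
    "issue": r"^(?:BUG|ISSUE)-\d+\b.*$",
    "learning": r"^(?:Lesson|Note):\s*.+$",
}

def extract_markers(transcript: str, memory_type: str) -> list[str]:
    """Return whole lines carrying the given marker, in document order."""
    pattern = re.compile(MARKER_PATTERNS[memory_type], re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(transcript)]
```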

Decision Logging Template

Usage: Apply this template whenever an architectural or significant decision is made.

## DECISION: [Brief decision title]

**Decision ID**: DEC-[YYYY-MM-DD]-[Sequential Number]
**Timestamp**: [ISO 8601 timestamp]
**Agent**: [Agent identifier]
**Project**: [Project identifier]
**Importance**: [1-10 score]

### Decision
[Clear statement of what was decided, 2-3 sentences max]

### Rationale
[Why this decision was made, addressing:]
- Primary factors driving the decision
- Key constraints or requirements satisfied
- Expected benefits

### Alternatives Considered
1. **[Alternative 1]**: [Why rejected/not selected]
2. **[Alternative 2]**: [Why rejected/not selected]
3. **[Alternative 3]**: [Why rejected/not selected]

### Expected Impact
[Predicted outcomes, including:]
- Performance implications (quantitative if possible)
- Development timeline effects
- Team/process changes required
- Technical debt or trade-offs accepted

### Validation Criteria
[How to verify this was the right decision:]
- Metric 1: [Measurable criterion]
- Metric 2: [Measurable criterion]
- Timeframe: [When to evaluate]

### Context
[Situational factors relevant to this decision:]
- Project phase, team composition, constraints, deadlines
- Assumptions made
- Related decisions (reference Decision IDs)

### Tags
[Searchable tags]: architecture, performance, security, etc.

Example:

## DECISION: Adopt PostgreSQL over MongoDB

**Decision ID**: DEC-2025-11-04-003
**Timestamp**: 2025-11-04T14:32:00Z
**Agent**: Database-Architect-002
**Project**: ecommerce-platform-v2
**Importance**: 9

### Decision
Selected PostgreSQL as the primary database for the e-commerce platform, replacing the initially proposed MongoDB solution.

### Rationale
- ACID compliance critical for order processing and payment transactions
- Complex relational queries needed for inventory management across warehouses
- JSONB support provides document-store flexibility where needed
- Team has 5 years PostgreSQL experience vs. 1 year MongoDB experience

### Alternatives Considered
1. **MongoDB**: Rejected due to lack of multi-document ACID transactions (required for order processing) and weaker consistency guarantees
2. **MySQL**: Rejected due to inferior JSON handling and less robust full-text search capabilities
3. **Hybrid (PostgreSQL + MongoDB)**: Rejected due to operational complexity and data synchronization challenges

### Expected Impact
- Development: Estimated 2 weeks faster due to team expertise
- Performance: 95th percentile query latency < 100ms (validated in prototype)
- Scalability: Vertical scaling to 10k transactions/sec before sharding needed
- Cost: $800/month for managed service (vs. $1,200/month for MongoDB Atlas equivalent tier)

### Validation Criteria
- Order processing maintains ACID guarantees under load testing (10k orders/hour)
- Complex inventory queries execute in < 200ms at 95th percentile
- Database operational overhead < 5 hours/week after 3 months
- Timeframe: Validate after 3 months in production

### Context
- Project in initial architecture phase, no existing database commitment
- Team composition: 3 developers with strong PostgreSQL background
- Requirement: Support 10k concurrent users at launch, scale to 50k within 12 months
- Compliance: GDPR and PCI-DSS requirements mandate ACID transactions
- Timeline: 6 months to production launch

### Tags
database, architecture, postgresql, acid-transactions, ecommerce
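A record in this template is structured enough to generate mechanically from stored memory fields. A minimal sketch, assuming a `render_decision_record` helper and dict field names that are illustrative rather than part of this skill's required interface (only a few template sections are shown):

```python
# Illustrative sketch: serialize a structured decision memory into the
# markdown decision-record template above. Field names are assumptions.
def render_decision_record(d: dict) -> str:
    # Number the alternatives as in the template's "Alternatives Considered"
    alts = "\n".join(
        f"{i}. **{a['name']}**: {a['reason']}"
        for i, a in enumerate(d["alternatives"], start=1)
    )
    return (
        f"## DECISION: {d['title']}\n\n"
        f"**Decision ID**: {d['id']}\n"
        f"**Timestamp**: {d['timestamp']}\n"
        f"**Agent**: {d['agent']}\n"
        f"**Importance**: {d['importance']}\n\n"
        f"### Decision\n{d['decision']}\n\n"
        f"### Alternatives Considered\n{alts}\n\n"
        f"### Tags\n{', '.join(d['tags'])}\n"
    )
```

Rendering from structured data keeps the record machine-queryable (for the search interface below) while still producing the human-readable template.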

### Memory Rehydration Process

**Trigger**: Starting a new session in an existing project

**Procedure**:

```python
def rehydrate_memory_for_session(project_id: str, agent_id: str) -> dict:
    """
    Load relevant persistent memories into new session context.
    Target: 20,000-30,000 tokens for memory context.
    """
    
    # Step 1: Load active state
    active_state = load_memories(
        project_id=project_id,
        memory_type="state",
        is_archived=False,
        sort_by="timestamp",
        limit=1  # Most recent state
    )
    
    # Step 2: Load recent decisions (importance-weighted)
    decisions = load_memories(
        project_id=project_id,
        memory_type="decision",
        is_archived=False,
        importance_gte=7,  # High importance only
        timestamp_after=days_ago(30),  # Last 30 days
        sort_by="importance DESC, timestamp DESC",
        limit=10
    )
    
    # Step 3: Load unresolved issues
    issues = load_memories(
        project_id=project_id,
        memory_type="issue",
        is_archived=False,
        content_severity_in=["critical", "high"],  # Critical/high only
        sort_by="severity DESC, timestamp ASC",  # Oldest critical issues first
        limit=15
    )
    
    # Step 4: Load relevant learnings
    learnings = load_memories(
        project_id=project_id,
        memory_type="learning",
        is_archived=False,
        timestamp_after=days_ago(60),  # Last 60 days
        sort_by="access_count DESC",  # Most referenced first
        limit=5
    )
    
    # Step 5: Update access metadata
    for memory in (active_state + decisions + issues + learnings):
        increment_access_count(memory.memory_id)
        update_last_accessed(memory.memory_id, current_time())
    
    # Step 6: Format for context loading
    rehydrated_context = {
        'project_summary': generate_project_summary(project_id),
        'current_state': format_state_for_context(active_state),
        'key_decisions': format_decisions_for_context(decisions),
        'open_issues': format_issues_for_context(issues),
        'applicable_learnings': format_learnings_for_context(learnings),
        'session_continuity': {
            'last_session_end': get_last_session_end_time(project_id),
            'time_since_last_session': calculate_time_gap(),
            'sessions_in_project': count_total_sessions(project_id)
        }
    }
    
    return rehydrated_context
```

**Formatting for Context**:

```python
def format_for_context(memories: list[PersistentMemory], max_tokens: int) -> str:
    """Convert memory objects to human-readable context text"""
    
    output = []
    token_count = 0
    
    for memory in memories:
        formatted = f"""
### {memory.title}
**ID**: {memory.memory_id} | **Importance**: {memory.importance}/10 | **Date**: {memory.timestamp}

{format_memory_content(memory)}

---
"""
        
        tokens = estimate_tokens(formatted)
        if token_count + tokens > max_tokens:
            break
            
        output.append(formatted)
        token_count += tokens
    
    return "\n".join(output)
```

### Memory Search and Retrieval

**Query Interface**:

```python
def search_memories(
    project_id: str,
    query: str = None,           # Full-text search
    memory_type: str = None,     # Filter by type
    tags: list[str] = None,      # Tag-based filtering
    importance_gte: int = None,  # Minimum importance
    timestamp_after: str = None, # Time-based filtering
    timestamp_before: str = None,
    is_archived: bool = False,
    sort_by: str = "relevance DESC",
    limit: int = 10
) -> list[PersistentMemory]:
    """
    Flexible memory search supporting multiple query patterns.
    """
    pass

# Usage examples:

# Find all high-importance decisions about authentication
auth_decisions = search_memories(
    project_id="proj-123",
    query="authentication security",
    memory_type="decision",
    importance_gte=7,
    tags=["security", "authentication"]
)

# Find recent critical bugs
critical_bugs = search_memories(
    project_id="proj-123",
    memory_type="issue",
    timestamp_after=days_ago(7),
    sort_by="severity DESC, timestamp DESC"
)

# Find learnings related to performance
perf_learnings = search_memories(
    project_id="proj-123",
    memory_type="learning",
    tags=["performance", "optimization"],
    sort_by="access_count DESC"
)
```

**Search Ranking Algorithm**:

```python
def calculate_relevance_score(memory: PersistentMemory, query: str) -> float:
    """
    Score memories for relevance to search query.
    Returns 0.0-1.0 score.
    """
    
    score = 0.0
    
    # Text similarity (40% weight)
    title_similarity = text_similarity(query, memory.title)
    content_similarity = text_similarity(query, str(memory.content))
    score += (title_similarity * 0.15) + (content_similarity * 0.25)
    
    # Importance (25% weight)
    score += (memory.importance / 10) * 0.25
    
    # Recency (20% weight)
    days_old = (current_time() - memory.timestamp).days
    recency_factor = 1 / (1 + days_old / 30)  # Decay over 30 days
    score += recency_factor * 0.20
    
    # Access frequency (15% weight)
    access_factor = min(memory.access_count / 10, 1.0)  # Cap at 10 accesses
    score += access_factor * 0.15
    
    return score
```

### Memory Lifecycle Management

**Archival Rules**:

| Memory Type | Archive Condition | Rationale |
|---|---|---|
| Decision | Importance ≤ 5 AND age > 90 days | Low-importance decisions lose relevance |
| Decision | Superseded by newer conflicting decision | Keep history but archive the old decision |
| Issue | Status = "resolved" AND age > 30 days | Resolved issues rarely need review |
| State | Superseded by newer state | Only current state needs active loading |
| Learning | Access count = 0 AND age > 180 days | Unused learnings are unlikely to apply |

**Deletion Rules** (permanent removal):

| Memory Type | Delete Condition | Rationale |
|---|---|---|
| Any | Marked for deletion by user | User override |
| Decision | Importance ≤ 3 AND archived > 365 days | Very low importance and very old |
| Issue | Resolved AND archived > 180 days | Unlikely to recur after 6 months |
| State | Superseded AND age > 90 days | Historical state not needed |
| Learning | Access count = 0 AND age > 365 days | Not useful after 1 year of no use |

**Lifecycle Automation**:

```python
def execute_memory_lifecycle_maintenance(project_id: str):
    """
    Run periodically (daily) to manage memory lifecycle.
    """
    
    # Archive candidates
    archive_candidates = find_memories_matching(
        project_id=project_id,
        is_archived=False,
        conditions=[
            ("memory_type='decision' AND importance<=5 AND age_days>90"),
            ("memory_type='issue' AND content->>'status'='resolved' AND age_days>30"),
            ("memory_type='learning' AND access_count=0 AND age_days>180")
        ]
    )
    
    for memory in archive_candidates:
        archive_memory(memory.memory_id)
        log_lifecycle_action("ARCHIVED", memory.memory_id, reason="Age and importance criteria")
    
    # Delete candidates
    delete_candidates = find_memories_matching(
        project_id=project_id,
        is_archived=True,
        conditions=[
            ("memory_type='decision' AND importance<=3 AND archived_days>365"),
            ("memory_type='issue' AND archived_days>180"),
            ("memory_type='state' AND archived_days>90"),
            ("memory_type='learning' AND access_count=0 AND age_days>365")
        ]
    )
    
    for memory in delete_candidates:
        delete_memory(memory.memory_id)
        log_lifecycle_action("DELETED", memory.memory_id, reason="Retention period expired")
```

## Hierarchical Information Organization

### Organization Template

**Four-Level Hierarchy**:

```
Level 1: Executive Summary (Target: 500-1,000 tokens)
├─ Level 2: Section Summaries (Target: 2,000-4,000 tokens)
│  ├─ Level 3: Detailed Information (Target: 10,000-20,000 tokens)
│  │  └─ Level 4: Raw Code/Data (Variable: 20,000-80,000 tokens)
```
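The budgets in this hierarchy can be encoded and sanity-checked against the model's context window. A hedged sketch: `LEVEL_BUDGETS` mirrors the targets above, while the 40% reserve for conversation and output in `fits_in_window` and both function names are assumptions for illustration, not part of this skill's required interface:

```python
# Per-level (min, max) token budgets from the four-level hierarchy above.
LEVEL_BUDGETS = {
    1: (500, 1_000),      # Executive Summary
    2: (2_000, 4_000),    # Section Summaries
    3: (10_000, 20_000),  # Detailed Information
    4: (20_000, 80_000),  # Raw Code/Data
}

def max_hierarchy_tokens() -> int:
    """Worst case: every level loaded at its upper bound."""
    return sum(hi for _, hi in LEVEL_BUDGETS.values())

def fits_in_window(window: int = 200_000, reserved: float = 0.40) -> bool:
    """True if the full hierarchy fits in the non-reserved part of the window.
    The 40% reservation for conversation/output is an assumed default."""
    return max_hierarchy_tokens() <= window * (1 - reserved)
```

Even fully loaded (105,000 tokens), the hierarchy leaves substantial headroom in a 200k window, which is what makes on-demand Level 4 loading viable.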

**Level 1 - Executive Summary**:

- **Purpose**: 30-second read for project orientation
- **Contents**:
  - Project goal (1 sentence)
  - Current status and completion percentage
  - Top 3 priorities
  - Critical blockers (if any)
  - Key architectural decisions (links to Level 2)

```markdown
# PROJECT EXECUTIVE SUMMARY

**Goal**: Build scalable e-commerce API supporting 50k concurrent users

**Status**: 45% complete - Authentication and user management done, payment processing in progress

**Priorities**:
1. Complete payment service integration [BLOCKED: pending PCI compliance review]
2. Implement order management service
3. Set up production infrastructure

**Architecture**: Microservices on Kubernetes, PostgreSQL database, Redis caching
[Details: §2.1-Architecture](#21-architecture)

**Last Updated**: 2025-11-04T15:30:00Z | **Agent**: Implementation-Lead-003
```

**Level 2 - Section Summaries**:

- **Purpose**: 2-minute read for context establishment
- **Contents**:
  - Subsystem summaries (3-5 sentences each)
  - Key decisions per subsystem with rationale
  - Current progress and next actions
  - Links to Level 3 detailed information

```markdown
## 2.1 Architecture

**Overview**: Microservices architecture with 5 core services (Auth, User, Payment, Order, Inventory) deployed on Kubernetes. API Gateway handles routing and rate limiting. PostgreSQL for transactional data, Redis for caching and sessions.

**Key Decisions**:
- Microservices over monolith for independent scaling [DEC-2025-10-15-001]
- PostgreSQL over MongoDB for ACID compliance [DEC-2025-10-20-003]
- Kubernetes for orchestration supporting multi-region deployment [DEC-2025-10-22-005]

**Status**: Architecture validated through proof-of-concept. Auth and User services deployed to staging. Payment service 60% complete.

**Next Actions**: Complete payment service, deploy order service, implement circuit breakers between services.

[Detailed Architecture Docs: §3.1](#31-architecture-details)
```

**Level 3 - Detailed Information**:

- **Purpose**: Deep technical context for implementation
- **Contents**:
  - Complete technical specifications
  - Implementation details with code snippets
  - Decision rationale with alternatives
  - Known issues and workarounds
  - References to Level 4 raw files

```markdown
## 3.1 Architecture Details

### 3.1.1 Service Communication Pattern

Services communicate via synchronous REST APIs for critical paths and asynchronous message queues (RabbitMQ) for non-blocking operations.

**Synchronous Patterns** (Request-Response):
- Auth validation: Gateway → Auth Service
- User data retrieval: Any Service → User Service
- Payment processing: Order Service → Payment Service

**Asynchronous Patterns** (Event-Driven):
- Order confirmation: Order Service → Email Service (via queue)
- Inventory updates: Order Service → Inventory Service (via queue)
- Analytics events: All Services → Analytics Pipeline (via queue)

**Rationale**: Synchronous for operations requiring an immediate response (user-facing). Asynchronous for fire-and-forget operations, improving perceived performance.

**Implementation**: See [gateway/routing_config.yaml](§4.gateway.routing) for routing rules, [services/order/queue_handlers.py](§4.services.order.queue) for message handling.

### 3.1.2 Database Schema Design

[Detailed schema documentation with ER diagrams, constraints, indexing strategies]

[See full schema: §4.database.schema](#4-database-schema)
```

**Level 4 - Raw Code/Data**:

- **Purpose**: Source of truth for implementation
- **Contents**:
  - Complete source files
  - Configuration files
  - Database schemas
  - API specifications
  - Test suites

```python
# §4.services.auth.jwt_handler
# Complete implementation file loaded from: src/services/auth/jwt_handler.py

import jwt
from datetime import datetime, timedelta
from typing import Dict, Optional

class JWTHandler:
    """
    JWT token generation and validation for authentication service.
    
    Design decisions:
    - 15-minute access token expiry (security vs. UX balance)
    - 7-day refresh token expiry (mobile offline support)
    - RS256 algorithm (asymmetric for multi-service verification)
    """
    
    def __init__(self, private_key: str, public_key: str):
        self.private_key = private_key
        self.public_key = public_key
        self.access_token_expiry = timedelta(minutes=15)
        self.refresh_token_expiry = timedelta(days=7)
    
    def generate_access_token(self, user_id: str, permissions: list[str]) -> str:
        """Generate short-lived access token with user permissions"""
        payload = {
            'user_id': user_id,
            'permissions': permissions,
            'token_type': 'access',
            'exp': datetime.utcnow() + self.access_token_expiry,
            'iat': datetime.utcnow()
        }
        return jwt.encode(payload, self.private_key, algorithm='RS256')
    
    # ... [Additional 200 lines of implementation]
```

### Navigation System

**Section Markers**:

Use consistent notation for cross-references:

```
§1.0    - Level 1 (Executive Summary)
§2.1    - Level 2 (Section Summary)
§3.1.2  - Level 3 (Detailed Information)
§4.file.path - Level 4 (Raw Files)
```
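Markers in this notation are simple to parse programmatically, which is what makes automated reference checking (see the consistency checks below) practical. A sketch under assumptions: the regex and the helper name are illustrative, following the level-number-then-dotted-path convention above:

```python
import re

# Matches a leading level digit, then an optional dotted path,
# e.g. "§3.1.2" or "§4.services.auth.jwt_handler".
_MARKER = re.compile(r"§(\d+)((?:\.[\w-]+)*)")

def parse_marker(marker: str) -> tuple[int, list[str]]:
    """Parse a section marker into (hierarchy level, path components)."""
    m = _MARKER.fullmatch(marker.strip())
    if not m:
        raise ValueError(f"not a section marker: {marker!r}")
    level = int(m.group(1))
    path = [p for p in m.group(2).split(".") if p]
    return level, path
```

For Level 4 markers the path components map directly onto file-system segments, so a broken-reference check reduces to parsing the marker and testing whether the joined path exists.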

**Quick Reference Index** (insert at the top of context):

```markdown
# NAVIGATION INDEX

## Executive Summary
[§1.0 - Project Overview](#10-project-overview) - Goal, status, priorities, blockers

## Section Summaries
[§2.1 - Architecture](#21-architecture) - System design, key decisions
[§2.2 - Authentication](#22-authentication) - Auth service implementation
[§2.3 - Payment](#23-payment) - Payment integration status [IN PROGRESS]
[§2.4 - Infrastructure](#24-infrastructure) - Deployment and operations

## Detailed Documentation
[§3.1 - Architecture Details](#31-architecture-details) - Deep technical specs
[§3.2 - Database Schema](#32-database-schema) - Tables, relationships, indexes
[§3.3 - API Specifications](#33-api-specifications) - Endpoint definitions

## Source Files (Level 4)
[§4.services.auth.*](#4-services-auth) - Authentication service code
[§4.services.payment.*](#4-services-payment) - Payment service code
[§4.database.migrations.*](#4-database-migrations) - DB migration scripts
[§4.tests.*](#4-tests) - Test suites

---
```

**Navigation Shortcuts**:

```markdown
<!-- Quick jump to specific information -->

Need: JWT implementation → [§4.services.auth.jwt_handler](#jwt-handler)
Need: Decision rationale for PostgreSQL → [§2.1 Architecture - Key Decisions](#21-architecture)
Need: Current blockers → [§1.0 Executive Summary](#10-project-overview)
Need: Payment service status → [§2.3 Payment](#23-payment)
```

### Summary Guidelines

**Level 1 (Executive) Summary Rules**:

1. Maximum 1,000 tokens
2. No technical jargon - understandable by non-technical stakeholders
3. Focus on outcomes, not implementation details
4. Always include: goal, status %, top priorities, blockers
5. Update at every major milestone (≥10% progress change)

**Level 2 (Section) Summary Rules**:

1. 200-500 tokens per section
2. Moderate technical detail - understandable by technical generalists
3. Include: overview, key decisions with IDs, current status, next actions
4. Link to Level 3 for details
5. Update when the section changes significantly

**Level 3 (Detailed) Summary Rules**:

1. 1,000-3,000 tokens per detailed section
2. Full technical depth - for implementation
3. Include: specifications, rationale, implementation notes, references to code
4. Link to Level 4 raw files
5. Update when implementation details change

**Level 4 (Raw) Summary Rules**:

1. No summary - raw content only
2. Include file metadata: path, last modified, size, purpose
3. For large files (>5,000 tokens), provide section markers within the file
4. Keep synchronized with actual files
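The token-size rules above can be enforced automatically. A minimal sketch, assuming a crude four-characters-per-token estimate standing in for a real tokenizer (the limits table and function names are illustrative):

```python
# Per-level (min, max) token limits from the summary rules above.
# Level 1 has no lower bound; Level 4 has no summary at all.
SUMMARY_LIMITS = {1: (0, 1_000), 2: (200, 500), 3: (1_000, 3_000)}

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token."""
    return max(1, len(text) // 4)

def check_summary(level: int, text: str) -> list[str]:
    """Return rule violations for a summary at the given hierarchy level."""
    if level == 4:
        return []  # Level 4 carries raw content; no summary rules apply
    lo, hi = SUMMARY_LIMITS[level]
    n = estimate_tokens(text)
    problems = []
    if n < lo:
        problems.append(f"L{level} summary too short: {n} < {lo} tokens")
    if n > hi:
        problems.append(f"L{level} summary too long: {n} > {hi} tokens")
    return problems
```

Running this as part of the consistency checks catches summaries that have drifted outside their budget before they inflate the context window.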

### Information Scoping Rules

**What belongs at each level**:

| Information Type | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| Project goals | ✓ | ✓ | - | - |
| Current status | ✓ | ✓ | ✓ | - |
| Critical blockers | ✓ | ✓ | ✓ | - |
| Architectural decisions | Summary only | Title + rationale | Full analysis | - |
| Implementation details | - | Brief mention | ✓ | ✓ |
| Code snippets | - | - | Key excerpts | Full files |
| Configuration | - | - | Critical settings | Full configs |
| API endpoints | Count only | List | Specifications | Implementation |
| Database schema | Mention only | Table names | Full schema | Migration files |
| Test results | Pass/fail status | Summary | Detailed results | Raw logs |

### Maintenance During Updates

**Propagation Rules**:

When updating information, propagate changes according to this matrix:

| Update Type | Update L4 | Update L3 | Update L2 | Update L1 |
|---|---|---|---|---|
| Code change | Always | If significant | If impacts status | If changes status % |
| Bug fix | Always | If notable | If critical bug | If was blocker |
| Config change | Always | If architecture | If impacts design | Rarely |
| New feature | Always | Always | Always | If major feature |
| Progress update | - | Status only | Status + % | Status + % |
| Decision made | Reference | Full details | Summary | Link only |

**Update Procedure**:

```python
def update_hierarchical_documentation(change_type: str, scope: str, details: dict):
    """
    Propagate documentation updates through hierarchy.
    """
    
    # Always update Level 4 (raw files) first
    update_level_4(details['file_path'], details['changes'])
    
    # Determine propagation based on change type and scope
    propagate_to_level_3 = should_propagate_to_level_3(change_type, scope)
    propagate_to_level_2 = should_propagate_to_level_2(change_type, scope)
    propagate_to_level_1 = should_propagate_to_level_1(change_type, scope)
    
    if propagate_to_level_3:
        update_level_3_section(
            section=details['section'],
            change_summary=details['technical_summary']
        )
    
    if propagate_to_level_2:
        update_level_2_summary(
            section=details['section'],
            status_change=details['status_impact'],
            next_actions_change=details['next_actions']
        )
    
    if propagate_to_level_1:
        update_executive_summary(
            status_pct_change=details['completion_delta'],
            priority_change=details['priority_impact'],
            blocker_change=details['blocker_status']
        )
    
    # Update navigation index if structure changed
    if details.get('structure_change', False):
        regenerate_navigation_index()
```

**Consistency Checks**:

Run these validations after updates:

```python
def validate_hierarchy_consistency() -> list[str]:
    """Returns list of inconsistencies found"""
    
    issues = []
    
    # Check 1: All Level 3 references to Level 4 are valid
    for ref in extract_level_4_references_from_level_3():
        if not file_exists(ref.file_path):
            issues.append(f"Broken L3→L4 reference: {ref}")
    
    # Check 2: All Level 2 summaries have corresponding Level 3 details
    for summary in get_level_2_summaries():
        if not has_level_3_details(summary.section_id):
            issues.append(f"L2 summary without L3 details: {summary.section_id}")
    
    # Check 3: Status percentages consistent across levels
    l1_status = get_level_1_status_percentage()
    l2_aggregated = aggregate_level_2_status_percentages()
    if abs(l1_status - l2_aggregated) > 5:  # Allow 5% tolerance
        issues.append(f"Status mismatch: L1={l1_status}%, L2 aggregate={l2_aggregated}%")
    
    # Check 4: Decision IDs referenced exist in memory
    for decision_ref in extract_all_decision_references():
        if not memory_exists(decision_ref):
            issues.append(f"Referenced decision not found: {decision_ref}")
    
    return issues
```

## Multi-Agent Context Coordination

### Agent Ownership Model

**Context Ownership Principles**:

1. **Primary Owner**: One agent has write access to a context section at a time
2. **Read-Only Access**: Other agents can read but not modify owned sections
3. **Ownership Transfer**: An explicit handoff protocol is required to transfer ownership
4. **Shared Sections**: Common reference materials (Level 1, Level 2 summaries) are read-only to all

**Ownership Tracking**:

```python
class ContextOwnership:
    """Track which agent owns which context sections"""
    
    section_id: str              # e.g., "§2.3-payment", "§4.services.auth"
    owner_agent_id: str          # Current owner
    ownership_type: str          # "exclusive" | "shared-write" | "read-only"
    acquired_at: str             # When ownership acquired
    expires_at: str | None       # Optional expiration for auto-release
    previous_owner: str | None   # For audit trail
    lock_reason: str             # Why this section is owned

class ContextSection:
    """Represents a section of context with ownership metadata"""
    
    section_id: str
    level: int                   # 1-4 hierarchy level
    content: str
    ownership: ContextOwnership
    last_modified_by: str
    last_modified_at: str
    modification_count: int
    agents_with_read_access: list[str]
```

### Agent Handoff Protocols

**Handoff Trigger Conditions**:

| Condition | Action | Example |
|---|---|---|
| Agent A completes assigned task | Automatic handoff to coordinator | Research agent → Implementation agent |
| Agent A encounters blocker outside expertise | Request handoff to specialist | Backend agent → Database agent for schema design |
| Agent A reaches token capacity | Compress and hand off to fresh agent | Long-running agent → Continuation agent |
| Scheduled rotation | Planned handoff at milestone | Phase 1 agent → Phase 2 agent |
| Agent A timeout/failure | Emergency handoff to recovery agent | Failed agent → Supervisor agent |

**Handoff Protocol Procedure**:

```python
def execute_agent_handoff(
    from_agent: str,
    to_agent: str,
    handoff_type: str,  # "complete" | "specialist" | "continuation" | "emergency"
    context_sections: list[str]
) -> dict:
    """
    Execute structured handoff between agents.
    Returns handoff package for recipient agent.
    """
    
    # Step 1: Compact context for handoff
    if handoff_type == "continuation":
        # Heavy compaction for token refresh
        target_reduction = 0.40  # 40% reduction
    else:
        # Light compaction to preserve relevant details
        target_reduction = 0.20  # 20% reduction
    
    compacted_context = compact_context_for_handoff(
        sections=context_sections,
        reduction_target=target_reduction
    )
    
    # Step 2: Generate handoff package
    handoff_package = {
        'handoff_metadata': {
            'from_agent_id': from_agent,
            'to_agent_id': to_agent,
            'handoff_type': handoff_type,
            'timestamp': current_time(),
            'reason': get_handoff_reason(),
            'context_token_count': estimate_tokens(compacted_context)
        },
        
        'agent_expertise_match': {
            'required_skills': identify_required_skills(context_sections),
            'to_agent_capabilities': get_agent_capabilities(to_agent),
            'skill_match_score': calculate_skill_match(to_agent, context_sections)
        },
        
        'context_summary': {
            'executive_summary': extract_level_1_summary(),
            'critical_decisions': extract_decisions_from_context(importance_gte=7),
            'open_issues': extract_unresolved_issues(),
            'current_task_state': extract_current_state(),
            'blocking_dependencies': identify_blockers()
        },
        
        'work_products': {
            'completed': list_completed_items_by_agent(from_agent),
            'in_progress': list_in_progress_items(from_agent),
            'not_started': list_pending_items()
        },
        
        'ownership_transfers': [
            ContextOwnership(
                section_id=section,
                owner_agent_id=to_agent,
                ownership_type="exclusive",
                acquired_at=current_time(),
                previous_owner=from_agent,
                lock_reason=f"Handoff from {from_agent}"
            )
            for section in context_sections
        ],
        
        'next_actions': {
            'immediate': extract_immediate_next_actions(),
            'short_term': extract_short_term_goals(),
            'success_criteria': extract_acceptance_criteria()
        },
        
        'agent_specific_notes': {
            'from_agent_observations': collect_agent_observations(from_agent),
            'suggested_approach': get_suggested_approach(from_agent),
            'known_pitfalls': list_known_pitfalls_for_task(),
            'useful_resources': list_helpful_resources()
        },
        
        'compacted_context': compacted_context
    }
    
    # Step 3: Record handoff in persistent memory
    handoff_memory = create_handoff_memory(handoff_package)
    store_memory(handoff_memory)
    
    # Step 4: Update ownership records
    for transfer in handoff_package['ownership_transfers']:
        update_ownership_record(transfer)
    
    # Step 5: Notify coordinator (if multi-agent orchestration)
    notify_coordinator({
        'event': 'agent_handoff',
        'from': from_agent,
        'to': to_agent,
        'sections': context_sections,
        'timestamp': current_time()
    })
    
    return handoff_package
```

**Handoff Package Token Budget**:

| Handoff Type | Target Token Budget | Rationale |
|---|---|---|
| Complete task | 15,000-25,000 | Full context transfer including learnings |
| Specialist consultation | 5,000-10,000 | Focused problem scope only |
| Continuation (token refresh) | 30,000-40,000 | Preserve maximum context for continuity |
| Emergency recovery | 10,000-15,000 | Critical state only, fast recovery |
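The budget table maps directly to a small lookup that an orchestrator can use to size handoff packages. A sketch (the constant and function names are illustrative):

```python
# (min, max) token budgets per handoff type, from the table above.
HANDOFF_BUDGETS = {
    "complete": (15_000, 25_000),
    "specialist": (5_000, 10_000),
    "continuation": (30_000, 40_000),
    "emergency": (10_000, 15_000),
}

def handoff_budget(handoff_type: str) -> tuple[int, int]:
    """Return the (min, max) token budget for a handoff package."""
    try:
        return HANDOFF_BUDGETS[handoff_type]
    except KeyError:
        raise ValueError(f"unknown handoff type: {handoff_type!r}")
```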

### Context Coordination Rules

**Read Access Rules**:

1. **Level 1 (Executive Summary)**: Always readable by all agents
2. **Level 2 (Section Summaries)**: Readable by all agents in the project
3. **Level 3 (Detailed Info)**: Readable by agents with task relevance
4. **Level 4 (Raw Files)**: Readable only by the owner and agents with explicit access

**Write Access Rules**:

1. **Exclusive Ownership**: Only the owner can modify owned sections
2. **Shared Write Sections**: Multiple agents can write if designated "shared-write"
3. **Conflict Resolution**: Last-write-wins with conflict detection
4. **Audit Trail**: All modifications logged with agent ID and timestamp
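Write rules 3 and 4 (last-write-wins with conflict detection, plus an audit trail) can be combined in one record type. A minimal sketch under assumptions: class, method, and field names are illustrative, using an optimistic version check to flag racing writers:

```python
from dataclasses import dataclass, field

@dataclass
class SectionRecord:
    """One context section with versioned writes and an audit log."""
    content: str = ""
    version: int = 0
    audit_log: list = field(default_factory=list)

    def write(self, agent_id: str, content: str, base_version: int,
              timestamp: str) -> bool:
        """Apply a write; return True if it conflicted with another write."""
        conflict = base_version != self.version  # writer read stale content
        self.content = content                   # last write wins regardless
        self.version += 1
        self.audit_log.append(
            {"agent": agent_id, "timestamp": timestamp,
             "version": self.version, "conflict": conflict}
        )
        return conflict
```

Because the losing write is preserved in the audit log rather than silently dropped, a coordinator can review flagged conflicts after the fact.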

**Conflict Prevention**:

```python
def acquire_section_ownership(
    agent_id: str,
    section_id: str,
    operation: str  # "read" | "write"
) -> bool:
    """
    Attempt to acquire ownership or access to a section.
    Returns True if successful, False if denied.
    """
    
    current_ownership = get_section_ownership(section_id)
    
    # Read operations: Always allowed for Levels 1-2, check for 3-4
    if operation == "read":
        if section_id.startswith("§1") or section_id.startswith("§2"):
            return True
        return agent_id in get_section_read_access_list(section_id)
    
    # Write operations: Check ownership
    if operation == "write":
        # No current owner - acquire ownership
        if current_ownership is None:
            set_section_ownership(section_id, agent_id, "exclusive")
            return True
        
        # Shared write section
        if current_ownership.ownership_type == "shared-write":
            return True
        
        # Exclusive ownership by current agent
        if current_ownership.owner_agent_id == agent_id:
            return True
        
        # Owned by another agent - denied
        return False
    
    # Unknown operation type - deny by default
    return False
```

**Coordination State Synchronization**:

For distributed multi-agent systems, maintain synchronization:

```python
from typing import Any

class CoordinationState:
    """Shared state for multi-agent coordination"""
    
    project_id: str
    active_agents: list[str]
    section_ownership_map: dict[str, ContextOwnership]
    agent_task_assignments: dict[str, list[str]]
    global_blockers: list[str]
    shared_resources: dict[str, Any]
    
    last_sync_timestamp: str
    sync_version: int  # Optimistic locking

def synchronize_coordination_state(agent_id: str) -> CoordinationState:
    """
    Fetch latest coordination state and resolve any conflicts.
    """
    
    local_state = get_local_coordination_state(agent_id)
    remote_state = fetch_remote_coordination_state()
    
    # Detect conflicts
    if local_state.sync_version != remote_state.sync_version:
        # Conflict: Resolve using strategy
        resolved_state = resolve_coordination_conflict(
            local_state,
            remote_state,
            resolution_strategy="remote-wins-on-ownership"
        )
        
        # Apply resolved state locally
        apply_coordination_state(agent_id, resolved_state)
        return resolved_state
    
    return remote_state
```

## Integration with Prompt Engineering (SKILL-003)

### Context Organization in System Prompts

**System Prompt Structure** (leveraging SKILL-003 principles):

```markdown
# SYSTEM PROMPT - Agent ID: Implementation-Lead-003

## Role and Capabilities
[Agent role definition - 500 tokens]

## Project Context (Hierarchical)
[Level 1 Executive Summary - 1,000 tokens]
- Automatically included in every interaction
- Provides constant orientation

## Active Task Context
[Current task from Level 2 - 2,000 tokens]
- Dynamically updated based on current work
- Links to Level 3 details as needed

## Critical Knowledge
[Key decisions and constraints - 3,000 tokens]
- Architecture decisions with IDs
- Critical issues and blockers
- Must-follow constraints

## Available Resources
[Links to Level 3, Level 4 content]
- Load on-demand using section markers
- "For database schema details, see §3.2"
- "For JWT implementation, see §4.services.auth.jwt_handler"

## Success Criteria
[Acceptance criteria for current task - 1,000 tokens]
```

Total system prompt: ~7,500 tokens (3.75% of the 200k-token context window)

**Dynamic Context Loading**:

```python
def construct_system_prompt_with_context(
    agent_id: str,
    project_id: str,
    current_task: str
) -> str:
    """
    Build system prompt with appropriate context for agent and task.
    Target: 5,000-10,000 tokens for system prompt portion.
    """
    
    # Core agent definition (static)
    agent_definition = load_agent_definition(agent_id)  # 500 tokens
    
    # Level 1 summary (always included)
    executive_summary = get_level_1_summary(project_id)  # 1,000 tokens
    
    # Task-relevant Level 2 sections
    relevant_sections = identify_relevant_sections(current_task)
    section_summaries = load_level_2_summaries(relevant_sections)  # 2,000 tokens
    
    # Critical decisions and constraints
    critical_knowledge = load_critical_knowledge(
        project_id=project_id,
        importance_gte=8,
        relevance_to_task=current_task
    )  # 3,000 tokens
    
    # Acceptance criteria
    success_criteria = extract_acceptance_criteria(current_task)  # 1,000 tokens
    
    # Resource links (Level 3, Level 4)
    resource_index = generate_resource_index(relevant_sections)  # 500 tokens
    
    prompt = f"""
{agent_definition}

# PROJECT CONTEXT
{executive_summary}

# CURRENT FOCUS
{section_summaries}

# CRITICAL KNOWLEDGE
{critical_knowledge}

# SUCCESS CRITERIA
{success_criteria}

# AVAILABLE RESOURCES
{resource_index}

---
"""
    
    return prompt
```

### Dynamic State in User Messages

**User Message Context Loading Strategy**:

Rather than loading everything into the system prompt, include context just-in-time in user messages:

```python
def construct_user_message_with_context(
    user_query: str,
    required_context: list[str]
) -> str:
    """
    Augment user query with just-in-time context.
    """

    # Score and prioritize context items
    scored_context = [
        (ctx, calculate_relevance_score(ctx, user_query))
        for ctx in required_context
    ]

    # Sort by relevance and load until the token budget is reached
    scored_context.sort(key=lambda x: x[1], reverse=True)

    context_sections = []
    token_count = estimate_tokens(user_query)
    max_tokens = 50000  # Reserve 50k tokens for user message context

    for context_item, score in scored_context:
        if score < 0.3:  # Relevance threshold
            break

        content = load_context_content(context_item)
        content_tokens = estimate_tokens(content)

        if token_count + content_tokens > max_tokens:
            break

        context_sections.append(content)
        token_count += content_tokens

    # Construct message; join with blank lines so sections don't run together
    joined_context = "\n\n".join(context_sections)
    message = f"""
{user_query}

<relevant_context>
{joined_context}
</relevant_context>
"""

    return message
```
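
The two constructors above assume `estimate_tokens` and `calculate_relevance_score` helpers. A minimal sketch of both, using this skill's approximation ratios from the Quick Reference and a simple keyword-overlap score (illustrative stand-ins, not production tokenizers or rankers):

```python
def estimate_tokens(text: str, content_type: str = "documentation") -> int:
    """Approximate token count from character count using this skill's ratios."""
    ratios = {"code": 0.75, "documentation": 0.65, "json": 0.85, "logs": 0.70}
    return int(len(text) * ratios.get(content_type, 0.65))


def calculate_relevance_score(context_item: str, query: str) -> float:
    """Keyword-overlap relevance in [0, 1] (Jaccard similarity over words)."""
    a = set(context_item.lower().split())
    b = set(query.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

For long sessions where accuracy matters, replace `estimate_tokens` with a real tokenization call; the overlap score can likewise be swapped for embedding similarity without changing the calling code.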

### Token-Aware Prompt Design

Prompt Engineering Patterns for Context Management:

  1. Progressive Disclosure Pattern:
System Prompt: "You have access to detailed documentation via section markers (§).
When you need specific information, indicate which section you need, and it will be
loaded into context. Do not request all sections at once."

User Message: "Implement the payment service endpoint."

Agent Response: "I'll need the payment service specifications. Please load §3.3.2
Payment API Specifications."

[System loads §3.3.2 into next user message]
  2. Context Pruning Pattern:
System Prompt: "Periodically review your context and identify information that is
no longer needed. When you identify such information, explicitly state:
'PRUNE: [section_id] - [reason]' and it will be removed to free tokens."

Agent: "PRUNE: §4.services.user.old_implementation - Replaced by new version,
no longer needed for reference."

[System removes pruned section]
  3. Summary Elevation Pattern:
System Prompt: "When working with large files (>10,000 tokens), first generate
a 500-token summary and propose working with the summary. Only load full file
if summary is insufficient."

Agent: "I've analyzed §4.database.migration_001 (15,000 tokens). Here's a summary:
[500-token summary]. This summary should be sufficient for current task. Load full
file only if we need to modify the migration."
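
On the orchestrator side, the PRUNE and load statements in these patterns need to be machine-readable. A hypothetical parser sketch (the regexes and function name are illustrative, not part of any existing API):

```python
import re

# Matches "PRUNE: §<section> - <reason>" and "load §<section>" statements
PRUNE_RE = re.compile(r"PRUNE:\s*(§[\w.]+)\s*-\s*(.+)")
LOAD_RE = re.compile(r"load\s+(§[\w.]+)", re.IGNORECASE)


def parse_agent_directives(response: str) -> dict[str, list]:
    """Extract prune and load requests from an agent's response text."""
    return {
        "prune": PRUNE_RE.findall(response),
        "load": LOAD_RE.findall(response),
    }
```

The orchestrator can then remove pruned sections and queue requested sections for the next user message.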

## Best Practices Checklist

Context Window Optimization:

  • Token allocation follows 40% knowledge / 50% active / 10% session / 5% buffer distribution
  • File loading uses relevance scoring algorithm (50% relevance, 30% recency, 20% dependency)
  • Capacity monitoring implemented with thresholds (80% yellow, 90% orange, 95% red)
  • Just-in-time or on-demand loading for files scoring < 70 (preload only at ≥ 70)
  • Token estimation uses Claude API or validated approximation formulas

Context Compaction:

  • Compaction triggered automatically at 80% capacity
  • All preservation rules (score: 100) enforced - no critical data discarded
  • Architectural decisions preserved with full context (decision, rationale, alternatives, impact)
  • Unresolved issues (critical/high severity) retained with investigation status
  • Last 5 file modifications preserved with change summaries
  • Current task state includes completion %, substeps, next actions, blockers
  • Deduplication applied to tool outputs (85% similarity threshold)
  • Compaction achieves 20-35% token savings target
  • Validation confirms no score-100 items removed

Cross-Session Memory:

  • Session summary generated at session end capturing decisions, issues, state, learnings
  • Decision logging uses complete template with all required fields
  • Persistent memory schema implemented with all required fields (memory_id, type, timestamp, project_id, agent_id, content, importance)
  • Memory rehydration loads: active state (most recent), high-importance decisions (last 30 days), critical/high issues (unresolved), relevant learnings (last 60 days)
  • Memory search supports query, type filtering, tag filtering, importance filtering, time filtering
  • Memory lifecycle automation runs daily (archive aged/low-importance, delete expired)
  • Access count tracking implemented for usage-based retention

Hierarchical Organization:

  • Four-level hierarchy implemented (Executive → Section → Detailed → Raw)
  • Level 1 (Executive) ≤ 1,000 tokens with goal, status, priorities, blockers
  • Level 2 (Section) 200-500 tokens per section with overview, decisions, status, next actions
  • Level 3 (Detailed) 1,000-3,000 tokens per section with specs, rationale, implementation notes
  • Level 4 (Raw) contains complete source files with metadata
  • Navigation index provided with section markers (§) for all levels
  • Section markers used consistently (§1.0, §2.1, §3.1.2, §4.file.path)
  • Update propagation rules followed (L4 → L3 → L2 → L1 based on change significance)
  • Consistency validation run after updates (references valid, status percentages aligned)

Multi-Agent Coordination:

  • Context ownership tracking implemented per section
  • Ownership types defined (exclusive | shared-write | read-only)
  • Agent handoff protocol implemented with structured handoff package
  • Handoff package includes: metadata, context summary, work products, ownership transfers, next actions, agent notes, compacted context
  • Handoff compaction: 40% reduction for continuation, 20% for other types
  • Read access rules enforced (L1/L2 readable by all, L3/L4 restricted)
  • Write access rules enforced (exclusive ownership required for writes)
  • Coordination state synchronized across agents with conflict resolution

Prompt Engineering Integration:

  • System prompt structure follows SKILL-003 principles
  • System prompt includes: role, L1 summary, active task, critical knowledge, resource links
  • System prompt token budget: 5,000-10,000 tokens (2.5-5% of context window)
  • Dynamic context loading in user messages based on relevance scoring
  • Progressive disclosure pattern implemented (load details on-demand)
  • Context pruning pattern enabled (explicit PRUNE statements)
  • Summary elevation pattern used for large files (>10,000 tokens)

General:

  • All quantitative thresholds explicitly defined (no vague guidance)
  • All algorithms include implementation details
  • All schemas include complete field specifications
  • All procedures are step-by-step executable
  • Automation-friendly rules (threshold-based, not subjective)
  • Examples provided with actual token counts
  • Integration with SKILL-003 clearly documented

## Common Pitfalls to Avoid

  1. Premature Loading: Loading all files at session start without relevance assessment

    • Problem: Wastes 30-40% of context window on unused files
    • Solution: Use the file loading prioritization algorithm; defer files scoring < 70 to just-in-time or on-demand loading
  2. No Capacity Monitoring: Ignoring context usage until hitting hard limit

    • Problem: Emergency compaction loses information, disrupts workflow
    • Solution: Implement monitoring with thresholds, compact proactively at 80%
  3. Discarding Architectural Decisions: Removing decisions during compaction to save tokens

    • Problem: Loss of rationale leads to contradictory future decisions
    • Solution: Always preserve preservation-score 100 items, use validation checklist
  4. Verbose Tool Output Retention: Keeping complete logs of successful operations

    • Problem: Redundant confirmations consume 10-15% of context
    • Solution: Summarize successful operations, keep only actionable data
  5. No Cross-Session Memory: Starting each session from scratch

    • Problem: Repeatedly re-analyzing same codebase, forgetting past decisions
    • Solution: Implement session summary generation and memory rehydration
  6. Flat Information Structure: Organizing all information at same detail level

    • Problem: Cannot navigate quickly, must read entire context for any query
    • Solution: Use 4-level hierarchy with navigation markers
  7. Missing Decision Rationale: Recording "what" without "why"

    • Problem: Future agents/sessions don't understand constraints behind decisions
    • Solution: Use complete decision logging template with alternatives considered
  8. Over-Aggressive Compaction: Targeting >50% token reduction

    • Problem: Loses important context details, breaks continuity
    • Solution: Target 20-35% reduction, focus on deduplication and discard rules
  9. No Agent Handoff Protocol: Informal context transfer between agents

    • Problem: Knowledge loss, duplicated work, contradictory approaches
    • Solution: Use structured handoff package with ownership transfers
  10. Static System Prompts: Loading all context in system prompt regardless of task

    • Problem: Wastes tokens on irrelevant information
    • Solution: Dynamic context loading based on current task relevance
  11. Ignoring Token Costs: Using approximations when accuracy critical

    • Problem: 10-15% estimation errors lead to context overflow
    • Solution: Use Claude API tokenization for long sessions (justified marginal cost)
  12. No Memory Lifecycle: Accumulating memories indefinitely

    • Problem: Memory search becomes slow, outdated information pollutes results
    • Solution: Implement archival and deletion rules with automated maintenance
  13. Duplicate Information Across Levels: Repeating same details in L1, L2, L3

    • Problem: Wastes tokens, creates update inconsistency
    • Solution: Follow information scoping rules, link between levels instead of duplicating
  14. Poor Section Marker Discipline: Inconsistent or missing navigation markers

    • Problem: Cannot implement progressive disclosure, forced to load everything
    • Solution: Use consistent §X.Y.Z notation, maintain navigation index
  15. No Validation After Compaction: Trusting compaction didn't lose critical data

    • Problem: Silently loses architectural decisions, unresolved issues
    • Solution: Run validation checklist, verify preservation-scored items present

## Token Budget Examples

### Example 1: Small Feature Implementation (Single session, 2-4 hours)

Total Budget: 200,000 tokens

Allocation:
- System prompts & skills: 60,000 (30%)
  • Agent definition: 5,000
  • Prompt engineering skill: 8,000
  • Context management skill: 12,000
  • Language-specific skills: 15,000
  • Other skills: 20,000

- Active context: 90,000 (45%)
  • Level 1 executive summary: 1,000
  • Level 2 section summaries: 5,000
  • Level 3 relevant details: 10,000
  • Level 4 active files (3-5 files): 35,000
  • Tool outputs: 15,000
  • Working notes: 10,000
  • Task state: 2,000
  • Recent modifications: 5,000
  • Buffer: 7,000

- Session memory: 30,000 (15%)
  • Architectural decisions: 8,000
  • Unresolved issues: 5,000
  • Critical implementation notes: 7,000
  • Recent change history: 10,000

- Buffer/overhead: 20,000 (10%)
  • Safety margin: 20,000

Compaction Strategy: Likely not needed for single session feature
Memory Persistence: Generate session summary at end (~5,000 tokens)
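
The top-level sums in a budget like this can be sanity-checked mechanically before a session starts. A minimal sketch of such a validator (hypothetical helper, not part of the skill's required tooling):

```python
def validate_budget(categories: dict[str, int], total: int = 200_000) -> bool:
    """Confirm that top-level allocations exactly fill the context window."""
    return sum(categories.values()) == total


# Example 1 top-level split: 60k system + 90k active + 30k memory + 20k buffer
assert validate_budget({
    "system_prompts": 60_000,
    "active_context": 90_000,
    "session_memory": 30_000,
    "buffer": 20_000,
})
```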

### Example 2: Medium Complexity Project (Multi-session, 2-3 days, 8-12 hours total)

Session 1 Budget: 200,000 tokens

Initial Allocation:
- System prompts & skills: 70,000 (35%)
  • Increased due to multi-session requirements
  
- Active context: 85,000 (42.5%)
  • Level 1-2: 6,000
  • Level 3: 15,000
  • Level 4 files (10-15 files): 50,000
  • Tool outputs: 14,000

- Session memory: 25,000 (12.5%)
  • Decisions: 10,000
  • Issues: 8,000
  • State: 7,000

- Buffer: 20,000 (10%)

Compaction Timeline:
- Session 1, Hour 3: 80% capacity → Compact to 65% (-30,000 tokens)
- Session 1 end: Generate summary (8,000 tokens)

Session 2 Budget: 200,000 tokens

Rehydrated Allocation:
- System prompts & skills: 70,000 (35%)
- Rehydrated memory: 20,000 (10%)
  • Session 1 summary: 8,000
  • Persisted decisions: 7,000
  • Unresolved issues: 5,000
- Active context: 90,000 (45%)
- Session memory: 20,000 (10%)

### Example 3: Large Codebase Analysis (Research phase, multi-session, 1 week)

Session 1 (Initial exploration) Budget: 200,000 tokens

Allocation:
- System prompts & skills: 65,000 (32.5%)
- Active context: 95,000 (47.5%)
  • Hierarchical navigation: 15,000
  • File samples (20+ files): 60,000
  • Web search results: 15,000
  • Analysis notes: 5,000
- Session memory: 25,000 (12.5%)
- Buffer: 15,000 (7.5%)

Compaction Events:
- Hour 2: 85% → Compact web search deduplication (-12,000)
- Hour 4: 82% → Compact file samples, keep summaries (-25,000)
- Session end: Generate comprehensive summary (15,000 tokens)

Sessions 2-5 (Deep dive):

Each session:
- Rehydrate 25,000 tokens from previous sessions
- Compact every 3-4 hours
- Generate summary with learnings (10,000 tokens each)

Final Session (Synthesis):

Budget: 200,000 tokens
Rehydrated: 50,000 tokens (compressed from 5 sessions)
- Key decisions from all sessions: 15,000
- Critical findings: 20,000
- Architecture summary: 15,000

Active work: 120,000 tokens
- Synthesizing final report
- Creating architectural diagrams
- Documenting decisions

### Example 4: Multi-Agent Development (Implementation phase, coordinated team)

Project Total: 5 agents × 200,000 = 1,000,000 tokens available

Shared Context (Replicated across all agents): 80,000 tokens
- Level 1 executive summary: 2,000
- Level 2 complete: 10,000
- Critical architectural decisions: 20,000
- System-wide constraints: 8,000
- Agent coordination state: 5,000
- Shared resources: 15,000
- Navigation index: 5,000
- Multi-agent protocols: 15,000

Per-Agent Allocation: 120,000 tokens individual context
- Agent-specific system prompts: 20,000
- Agent task context: 60,000
- Agent working memory: 25,000
- Agent buffer: 15,000

Agent Handoff Budget: 25,000 tokens per handoff
- Handoff metadata: 1,000
- Context summary: 8,000
- Work products: 10,000
- Next actions: 3,000
- Agent notes: 3,000

Coordination Overhead: 40,000 tokens
- Ownership tracking: 10,000
- Conflict resolution state: 10,000
- Global blockers: 5,000
- Agent task queue: 15,000

Total Effective Usage: 80,000 (shared) + (5 × 120,000) (agents) + 40,000 (coordination) = 720,000 tokens
Efficiency: 72% (the remaining 28% is headroom reserved for handoffs and protocol overhead, an acceptable cost of coordination)

## Quick Reference

### Context Allocation (200k tokens)

  • 30-40%: Knowledge base & system instructions (60k-80k)
  • 40-50%: Active task context (80k-100k)
  • 10-15%: Session memory (20k-30k)
  • 5-10%: Buffer/overhead (10k-20k)

### Capacity Thresholds

  • Green (0-79%): Normal operation
  • Yellow (80-89%): Plan compaction within 10 operations
  • Orange (90-94%): Compact immediately before next major operation
  • Red (95-100%): Emergency compaction, shed low-priority content
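
These thresholds map directly to a classifier for automated monitoring; a minimal sketch (function name is illustrative):

```python
def capacity_status(used_tokens: int, window: int = 200_000) -> str:
    """Map context usage to the green/yellow/orange/red thresholds above."""
    pct = used_tokens / window * 100
    if pct >= 95:
        return "red"      # emergency compaction, shed low-priority content
    if pct >= 90:
        return "orange"   # compact immediately before next major operation
    if pct >= 80:
        return "yellow"   # plan compaction within 10 operations
    return "green"        # normal operation
```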

### File Loading Prioritization

```
SCORE = (RELEVANCE × 0.50) + (RECENCY × 0.30) + (DEPENDENCY × 0.20)
- Score ≥ 70: Preload
- Score 40-69: Just-in-time
- Score < 40: On-demand only
```
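
The formula and its tiers translate directly into code; a minimal sketch, assuming each component is already normalized to a 0-100 scale:

```python
def file_load_score(relevance: float, recency: float, dependency: float) -> float:
    """Weighted loading score (0-100) per the formula above."""
    return relevance * 0.50 + recency * 0.30 + dependency * 0.20


def load_strategy(score: float) -> str:
    """Map a score to the preload / just-in-time / on-demand tiers."""
    if score >= 70:
        return "preload"
    if score >= 40:
        return "just-in-time"
    return "on-demand"
```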

### Compaction Preservation (Score: 100)

Must preserve:

  1. Architectural decisions (with timestamp, rationale, alternatives, impact)
  2. Active bugs and unresolved issues (Critical/High severity)
  3. Critical implementation details (security, performance, data integrity)
  4. Recent file modifications (last 5 operations)
  5. Current task state (completion %, substeps, next actions, blockers)

### Compaction Discard (Score: 0-30)

Can safely discard:

  1. Redundant tool outputs (85%+ similarity)
  2. Resolved issues with confirmed fixes (30+ days old)
  3. Exploratory attempts explicitly abandoned
  4. Verbose debug logs when summary captures key points
  5. Successful operation confirmations (keep only summary)
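
The 85% similarity threshold for redundant tool outputs can be approximated with the standard library; a sketch using `difflib` (the actual similarity metric is an implementation choice):

```python
from difflib import SequenceMatcher


def is_redundant(output_a: str, output_b: str, threshold: float = 0.85) -> bool:
    """Two tool outputs are redundant when their similarity ratio
    meets the 85% discard threshold above."""
    return SequenceMatcher(None, output_a, output_b).ratio() >= threshold
```

In a compaction pass, keep the first of each redundant pair and replace the rest with a one-line summary.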

### Memory Schema Fields (Required)

```python
memory_id: str           # UUID
memory_type: str         # "decision" | "issue" | "state" | "learning"
timestamp: str           # ISO 8601
project_id: str
agent_id: str
title: str               # Max 100 chars
content: dict            # Type-specific structured content
tags: list[str]
importance: int          # 1-10
```
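
A hypothetical Python realization of this schema as a dataclass, with the field constraints enforced (the field names follow the schema; the validation logic and defaults are illustrative):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class Memory:
    memory_type: str                 # "decision" | "issue" | "state" | "learning"
    project_id: str
    agent_id: str
    title: str                       # max 100 chars, enforced below
    content: dict                    # type-specific structured content
    importance: int                  # 1-10, enforced below
    tags: list[str] = field(default_factory=list)
    memory_id: str = field(default_factory=lambda: str(uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        if len(self.title) > 100:
            raise ValueError("title exceeds 100 characters")
        if not 1 <= self.importance <= 10:
            raise ValueError("importance must be between 1 and 10")
```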

### Memory Lifecycle

Archive:

  • Decisions: Importance ≤5 AND age >90 days
  • Issues: Resolved AND age >30 days
  • Learnings: Access count=0 AND age >180 days

Delete:

  • Decisions: Importance ≤3 AND archived >365 days
  • Issues: Resolved AND archived >180 days
  • Learnings: Access count=0 AND age >365 days
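
The archive and delete rules above can be expressed as a single decision function for the daily maintenance job; a sketch (the signature and argument defaults are illustrative):

```python
def lifecycle_action(memory_type: str, importance: int, age_days: int,
                     resolved: bool = False, access_count: int = 0,
                     archived_days: int = 0) -> str:
    """Return 'delete', 'archive', or 'retain' per the lifecycle rules above."""
    if memory_type == "decision":
        if importance <= 3 and archived_days > 365:
            return "delete"
        if importance <= 5 and age_days > 90:
            return "archive"
    elif memory_type == "issue":
        if resolved and archived_days > 180:
            return "delete"
        if resolved and age_days > 30:
            return "archive"
    elif memory_type == "learning":
        if access_count == 0 and age_days > 365:
            return "delete"
        if access_count == 0 and age_days > 180:
            return "archive"
    return "retain"
```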

### Hierarchy Levels

  1. L1 Executive: ≤1,000 tokens - Goal, status, priorities, blockers
  2. L2 Section: 200-500 tokens/section - Overview, decisions, status, next actions
  3. L3 Detailed: 1,000-3,000 tokens/section - Specs, rationale, implementation
  4. L4 Raw: Variable - Complete source files

### Navigation Markers

```
§1.0         - Level 1 (Executive)
§2.1         - Level 2 (Section)
§3.1.2       - Level 3 (Detailed)
§4.file.path - Level 4 (Raw)
```
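
Because the leading digit of each marker encodes its hierarchy level, resolving a marker is trivial; a sketch:

```python
def marker_level(marker: str) -> int:
    """Extract the hierarchy level from a § navigation marker."""
    return int(marker.lstrip("§").split(".", 1)[0])
```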

### Agent Handoff Token Budget

  • Complete task: 15k-25k tokens
  • Specialist: 5k-10k tokens
  • Continuation: 30k-40k tokens
  • Emergency: 10k-15k tokens

### Handoff Compaction

  • Continuation: 40% reduction
  • Other types: 20% reduction

### Context Ownership Types

  • exclusive: Single agent write access
  • shared-write: Multiple agents can write
  • read-only: All agents can read, none can write

### Token Estimation (Approximation)

  • Code: 0.75 tokens/char
  • Documentation: 0.65 tokens/char
  • JSON/data: 0.85 tokens/char
  • Logs: 0.70 tokens/char

For critical long sessions: Use Claude API tokenization (justified cost)

### System Prompt Budget

Target: 5,000-10,000 tokens (2.5-5% of context window)

  • Agent definition: 500
  • L1 summary: 1,000
  • Task context: 2,000
  • Critical knowledge: 3,000
  • Success criteria: 1,000
  • Resource index: 500

### Progressive Disclosure Patterns

  1. Load on-demand: Reference §markers, load when needed
  2. Prune explicitly: State "PRUNE: §X.Y - reason"
  3. Summarize first: 500-token summary before loading large files (>10k tokens)

Document Version: 1.0.0
Last Updated: 2025-11-04
Total Token Count: ~58,000 tokens
Integration: SKILL-003 (Prompt Engineering)