---
name: maker-methodology
description: Apply MAKER (Massively Decomposed Agentic Processes) to solve long sequential tasks using task decomposition, multi-agent voting, and error correction. Use when facing complex multi-step problems, sequential planning, constraint satisfaction, or tasks requiring many consecutive decisions.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob
---
# MAKER Methodology

Solve million-step tasks with zero errors using Massively Decomposed Agentic Processes.

Based on: "Solving a Million-Step LLM Task with Zero Errors" (https://arxiv.org/html/2511.09030v1)
## Core Principles
### 1. Maximal Agentic Decomposition (MAD)

Break complex tasks into minimal single-step subtasks, not monolithic solutions.

- Instead of: "Generate the entire solution"
- Do: "Determine the next single step", repeated N times
### 2. First-to-Ahead-by-k Voting

Use multiple independent agents to vote on each step:
- Continue sampling until one option leads by k votes
- k grows logarithmically with the number of steps s: k = Θ(ln s)
- Prevents error propagation through consensus
### 3. Red-Flagging

Detect and discard unreliable responses:
- Check length (too short/long)
- Validate format
- Detect failure patterns
- Domain-specific validation
## When to Use MAKER

### ✅ Good Fit
- Task has >10 sequential steps
- Each step has enumerable options
- State is trackable between steps
- Progress is measurable
- Intermediate states are verifiable
- Single sophisticated approach struggles
### ❌ Poor Fit
- Creative/open-ended generation
- Requires holistic understanding
- Continuous optimization
- Tasks completing in <10 steps
- Highly parallel tasks (order doesn't matter)
## Task Types MAKER Excels At

- **Constraint Satisfaction**: Sudoku, scheduling, resource allocation
- **Sequential Planning**: Route planning, multi-step refactoring
- **Code Generation**: Multi-file implementation, test generation
- **Mathematical Reasoning**: Proof construction, equation solving
- **Data Pipelines**: ETL workflows, data cleaning sequences
## Implementation Steps

### Step 1: Define Task Interface

Every MAKER task needs these components:
```python
from typing import List

# State and Action are placeholder types; define them for your task.

class YourTask:
    def get_current_state(self) -> State:
        """Return current task state."""
        pass

    def get_possible_actions(self) -> List[Action]:
        """Return valid actions from current state."""
        pass

    def apply_action(self, action: Action) -> bool:
        """Apply action and update state. Return success."""
        pass

    def is_complete(self) -> bool:
        """Check if task is finished."""
        pass

    def get_progress(self) -> float:
        """Return completion fraction (0.0 to 1.0)."""
        pass

    def format_for_agent(self) -> str:
        """Format state for LLM consumption (minimal context)."""
        pass
```
### Step 2: Compute Voting Margin

```python
import math

def compute_k(num_steps: int) -> int:
    """Voting margin grows logarithmically with the number of steps."""
    if num_steps <= 10:
        return 2
    elif num_steps <= 100:
        return 3
    elif num_steps <= 1000:
        return 4
    else:
        return max(3, int(math.log(num_steps)) + 1)
```
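For intuition, the margin stays small even at extreme scale. A quick check (the 15-step case matches the Towers of Hanoi example later in this document):

```python
>>> compute_k(15)         # 4-disk Towers of Hanoi: 15 steps
3
>>> compute_k(1_000_000)  # max(3, int(math.log(1_000_000)) + 1)
14
```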
### Step 3: Create Minimal Agent Prompts

Key: Each agent sees ONLY what's needed for the current step.

```text
You are solving {task_name}. This is step {step_num}/{expected_steps}.

Current state:
{minimal_state_representation}

What is the next action? Respond ONLY with the action in format: {expected_format}
Do not explain. Just give the action.
```
### Step 4: Implement Voting

```python
from collections import Counter

def vote_on_next_action(state, k=3, max_agents=50):
    votes = Counter()
    agents_sampled = 0

    while agents_sampled < max_agents:
        action = get_agent_vote(state)  # LiteLLM call (see the sketch below)
        if action and not should_red_flag(action):
            votes[action] += 1

        # Check for a k-vote lead
        sorted_votes = votes.most_common()
        if sorted_votes:
            leader, leader_count = sorted_votes[0]
            second_count = sorted_votes[1][1] if len(sorted_votes) > 1 else 0
            if leader_count - second_count >= k:
                return leader  # Consensus!

        agents_sampled += 1

    return votes.most_common(1)[0][0] if votes else None
```
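`get_agent_vote` is referenced but not defined above. A minimal sketch using LiteLLM, as the comment suggests; the model name, temperature, and token limit here are illustrative assumptions:

```python
from litellm import completion  # pip install litellm

def get_agent_vote(state, model="gpt-4o-mini"):
    """One independent agent samples one candidate action (or None on failure)."""
    try:
        response = completion(
            model=model,
            messages=[{"role": "user", "content": state.format_for_agent()}],
            temperature=1.0,  # sampling diversity helps decorrelate agent errors
            max_tokens=50,    # actions are short; long outputs get red-flagged anyway
        )
        return response.choices[0].message.content.strip()
    except Exception:
        return None  # treat transport or model failures as a discarded vote
```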
### Step 5: Configure Red-Flagging

```python
def should_red_flag(response: str, context: dict | None = None) -> bool:
    # Length checks
    if len(response) > 200 or len(response) < 1:
        return True

    # Failure patterns
    if any(pattern in response.lower() for pattern in
           ["i cannot", "i don't know", "error", "invalid"]):
        return True

    # Format validation (task-specific)
    if not matches_expected_format(response):
        return True

    # Domain-specific checks
    return not domain_validator(response, context)
```
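`matches_expected_format` and `domain_validator` are task-specific hooks. A sketch for a task whose actions are single digits, like the Sudoku pattern below; the `valid_options` context key is an assumed convention:

```python
import re

def matches_expected_format(response: str) -> bool:
    """Example: accept only a bare digit 1-9."""
    return re.fullmatch(r"[1-9]", response.strip()) is not None

def domain_validator(response: str, context: dict | None) -> bool:
    """Example: the digit must be among the precomputed valid options."""
    if context is None:
        return True
    # Runs after matches_expected_format, so the response is a clean digit.
    return int(response.strip()) in context.get("valid_options", range(1, 10))
```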
### Step 6: Execute MAKER Loop

```python
state = initialize_task()
k = compute_k(estimated_steps)

while not state.is_complete():
    # Vote on the next action
    action = vote_on_next_action(state, k=k)
    if action is None:
        # No consensus - may need to backtrack or increase k
        handle_voting_failure()
        continue

    # Apply the action
    success = state.apply_action(action)
    if not success:
        # Invalid action - this shouldn't happen with good voting
        handle_invalid_action()
        continue

# Verify the final solution
verify_solution(state)
```
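`handle_voting_failure` is left abstract above. One plausible escalation policy (an assumption, not prescribed by the paper) is to widen the margin and eventually backtrack; `backtrack()` and `remaining_steps()` are assumed task methods:

```python
def handle_voting_failure(state, k: int, max_k: int = 8) -> int:
    """Return a new voting margin after a step fails to reach consensus."""
    if k < max_k:
        return k + 1  # demand a stronger consensus and retry this step
    state.backtrack()  # undo the last applied action (task-specific)
    return compute_k(state.remaining_steps())  # reset the margin and continue
```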
## Adaptation Patterns

### Pattern A: Constraint Satisfaction

Example: Solving Sudoku

```python
class SudokuTask:
    def get_possible_actions(self):
        # Return valid numbers for the next empty cell
        cell = self.next_empty_cell()
        return [num for num in range(1, 10)
                if self.is_valid(cell, num)]

    def format_for_agent(self):
        return f"""
Grid state: {self.grid}
Next cell to fill: {self.next_empty_cell()}
Valid options: {self.get_possible_actions()}
Constraints: Row/Column/Box must have 1-9 exactly once
"""
```
Agent Prompt:

```text
You are solving Sudoku. This is step {step}/81.

Current grid:
{grid_visualization}

Which number should go in cell ({row}, {col})?
Valid options: {valid_numbers}

Respond ONLY with the number (1-9). No explanation.
```
### Pattern B: Sequential Planning

Example: Multi-step code refactoring

```python
class CodeRefactorTask:
    def get_possible_actions(self):
        return [
            "rename_function(old_name, new_name)",
            "extract_method(lines, new_name)",
            "move_to_module(function, target)",
            "update_imports()",
        ]

    def format_for_agent(self):
        return f"""
Current file: {self.current_file}
Function to refactor: {self.target_function}
Available refactorings: {self.get_possible_actions()}
Tests passing: {self.test_status}
"""
```
Agent Prompt:

```text
You are refactoring {project_name}. This is step {step}.

Current situation:
- File: {filename}
- Function: {function_name}
- Issue: {code_smell}

What refactoring should be applied next?

Options:
{numbered_options}

Respond ONLY with the option number. No explanation.
```
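Since agents answer with a bare option number, the voting layer must map that number back to a canonical action string before tallying; a sketch (the helper name is an assumption):

```python
def parse_option_vote(response: str, options: list[str]):
    """Map a bare option number back to its action; return None to red-flag."""
    text = response.strip().rstrip(".")
    if not text.isdigit():
        return None              # red-flag: not a bare option number
    index = int(text) - 1        # the prompt numbers options from 1
    if 0 <= index < len(options):
        return options[index]    # vote on the canonical action string
    return None                  # red-flag: out-of-range option
```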
### Pattern C: Mathematical Reasoning

Example: Constructing a proof

```python
class ProofTask:
    def get_possible_actions(self):
        # Return applicable inference rules
        return [rule for rule in self.inference_rules
                if rule.can_apply(self.current_statement)]

    def format_for_agent(self):
        return f"""
Current statement: {self.current_statement}
Goal statement: {self.goal}
Available axioms: {self.axioms}
Available rules: {self.get_possible_actions()}
"""
```
### Pattern D: Data Processing Pipeline

Example: ETL workflow

```python
class ETLTask:
    def get_possible_actions(self):
        return [
            "remove_duplicates(column)",
            "fill_missing(column, strategy)",
            "normalize(column, method)",
            "merge_tables(table1, table2, key)",
        ]

    def format_for_agent(self):
        return f"""
Data shape: {self.df.shape}
Missing values: {self.missing_summary()}
Data quality score: {self.quality_score()}
Next transformation options: {self.get_possible_actions()}
"""
```
## Red-Flagging by Task Type

### For Code Generation
- Check syntax validity (see the sketch after this list)
- Ensure imports are defined
- Verify function signatures match
- Flag overly long responses (likely hallucination)
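The syntax check above is a one-liner with Python's standard `ast` module:

```python
import ast

def red_flag_python_syntax(response: str) -> bool:
    """Red-flag any candidate that is not parseable Python."""
    try:
        ast.parse(response)
        return False  # parses cleanly: keep the vote
    except SyntaxError:
        return True   # discard before it enters the tally
```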
### For Mathematical Reasoning
- Verify notation consistency
- Check logical structure
- Flag undefined symbols
- Ensure rule application is valid
### For Planning Tasks
- Verify preconditions are met
- Check action is in allowed set
- Flag circular dependencies
- Ensure resources are available
### For Constraint Satisfaction
- Verify constraints not violated
- Check value in domain
- Flag contradictions
- Ensure progress toward goal
## Cost Analysis

MAKER is cost-effective when:

```text
(cheap_model_cost × avg_votes × num_steps) < (expensive_model_cost × num_steps)
```

Since num_steps appears on both sides, this reduces to: cheap_model_cost × avg_votes < expensive_model_cost.

Key Insight: Even with 10-50 votes per step, cheap models (gpt-4o-mini) are often cheaper than one expensive model (gpt-4, o1).
Example:
- GPT-4: $0.015/step
- MAKER (gpt-4o-mini, avg 5 votes): $0.00015/step
- 100× cheaper!
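A quick sanity check of the inequality with the figures above (the per-call price is the example's per-step price divided by the vote count):

```python
def maker_is_cheaper(cheap_cost_per_call: float, avg_votes: float,
                     expensive_cost_per_step: float) -> bool:
    """num_steps cancels on both sides, so compare per-step costs directly."""
    return cheap_cost_per_call * avg_votes < expensive_cost_per_step

# gpt-4o-mini at ~$0.00003/call with 5 votes/step vs GPT-4 at $0.015/step
assert maker_is_cheaper(0.00003, 5, 0.015)  # $0.00015 vs $0.015 -> 100x cheaper
```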
## Implementation Checklist

When applying MAKER to your task:

- [ ] Define clear state representation
- [ ] Enumerate possible actions per state
- [ ] Create minimal agent prompts (only current-step context)
- [ ] Implement state validation
- [ ] Configure red-flagging for your domain
- [ ] Compute appropriate k based on task length
- [ ] Set up progress tracking
- [ ] Implement final solution verification
- [ ] Estimate cost vs. single-model approach
- [ ] Test with small instances first
## Debugging MAKER Implementations

### Issue: Agents don't converge (no consensus)

**Causes:**
- k too high for task complexity
- Ambiguous state representation
- Multiple valid solutions

**Solutions:**
- Reduce k or use adaptive k
- Add more context to agent prompts
- Add tie-breaking rules
### Issue: Agents converge to the wrong answer

**Causes:**
- Insufficient red-flagging
- Misleading state representation
- Correlated errors (agents make the same mistake)

**Solutions:**
- Tighten red-flagging criteria
- Clarify prompt formatting
- Increase temperature for diversity
- Add validation after each step
### Issue: Too slow / too expensive

**Causes:**
- k too high
- Too many agents per vote
- Expensive model selected

**Solutions:**
- Use a cheaper model (gpt-4o-mini)
- Reduce k if possible
- Parallelize agent calls
- Cache repeated states
## Examples

### Example 1: Solving Towers of Hanoi (4 disks)

```python
from maker import MAKER, MAKERConfig
from towers_of_hanoi import GameState

# Configure
config = MAKERConfig(
    model="gpt-4o-mini",
    k=3,  # for 15 steps, k=3 is sufficient
    verbose=True,
)

# Solve
maker = MAKER(config)
success, moves, stats = maker.solve_towers_of_hanoi(num_disks=4)

# Expected: 15 moves, zero errors
```
### Example 2: Code Refactoring

```python
class RefactorTask:
    def __init__(self, codebase, target_pattern):
        self.codebase = codebase
        self.target = target_pattern
        self.changes = []

    def get_possible_actions(self):
        # Find all instances needing refactoring
        instances = find_pattern(self.codebase, self.target)
        return [f"refactor_{i}" for i in instances]

# One step per instance, so size the voting margin accordingly
instances = find_pattern(codebase, pattern)
config = MAKERConfig(
    model="gpt-4o-mini",
    k=compute_k(len(instances)),
    task_type="code_refactoring",
)

maker = MAKER(config, task=RefactorTask(codebase, pattern))
success, changes, stats = maker.solve()
```
## Key Takeaways

- **Decompose maximally**: Smallest possible steps
- **Minimize context**: Each agent sees only the current step
- **Vote for consensus**: Prevents error propagation
- **Red-flag aggressively**: Catch errors early
- **Scale logarithmically**: k grows as Θ(ln s)
- **Use cheap models**: They work better with voting!
## Reference Implementation
See MAKER_GENERALIZATION.md for:
- Universal task interface
- Adaptation patterns for different domains
- Detailed cost analysis
- Real-world examples
- Troubleshooting guide
## Further Reading

- Paper: https://arxiv.org/html/2511.09030v1
- Implementation: see `maker.py` for working code
- Examples: see `test_maker.py` for different scenarios