---
name: maker-methodology
description: Apply MAKER (Massively Decomposed Agentic Processes) to solve long sequential tasks using task decomposition, multi-agent voting, and error correction. Use when facing complex multi-step problems, sequential planning, constraint satisfaction, or tasks requiring many consecutive decisions.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob
---
# MAKER Methodology

Solve million-step tasks with zero errors using Massively Decomposed Agentic Processes.

Based on: "Solving a Million-Step LLM Task with Zero Errors" (https://arxiv.org/html/2511.09030v1)
## Core Principles
### 1. Maximal Agentic Decomposition (MAD)

Break complex tasks into minimal single-step subtasks, not monolithic solutions.

- Instead of: "Generate the entire solution"
- Do: "Determine the next single step", repeated N times
### 2. First-to-Ahead-by-k Voting

Use multiple independent agents to vote on each step:
- Continue sampling until one option leads by k votes
- k grows logarithmically with the number of steps s: k = Θ(ln s)
- Prevents error propagation through consensus
### 3. Red-Flagging

Detect and discard unreliable responses:
- Check length (too short/long)
- Validate format
- Detect failure patterns
- Domain-specific validation
## When to Use MAKER

### ✅ Good Fit
- Task has >10 sequential steps
- Each step has enumerable options
- State is trackable between steps
- Progress is measurable
- Intermediate states are verifiable
- Single sophisticated approach struggles
### ❌ Poor Fit
- Creative/open-ended generation
- Requires holistic understanding
- Continuous optimization
- Tasks completing in <10 steps
- Highly parallel tasks (order doesn't matter)
## Task Types MAKER Excels At

- **Constraint Satisfaction**: Sudoku, scheduling, resource allocation
- **Sequential Planning**: Route planning, multi-step refactoring
- **Code Generation**: Multi-file implementation, test generation
- **Mathematical Reasoning**: Proof construction, equation solving
- **Data Pipelines**: ETL workflows, data cleaning sequences
## Implementation Steps

### Step 1: Define Task Interface

Every MAKER task needs these components:
```python
from typing import List

# State and Action are placeholder types; define them for your task.

class YourTask:
    def get_current_state(self) -> State:
        """Return current task state."""
        pass

    def get_possible_actions(self) -> List[Action]:
        """Return valid actions from current state."""
        pass

    def apply_action(self, action: Action) -> bool:
        """Apply action and update state. Return success."""
        pass

    def is_complete(self) -> bool:
        """Check if task is finished."""
        pass

    def get_progress(self) -> float:
        """Return completion fraction (0.0 to 1.0)."""
        pass

    def format_for_agent(self) -> str:
        """Format state for LLM consumption (minimal context)."""
        pass
```
### Step 2: Compute Voting Margin

```python
import math

def compute_k(num_steps: int) -> int:
    """Voting margin grows logarithmically with the number of steps."""
    if num_steps <= 10:
        return 2
    elif num_steps <= 100:
        return 3
    elif num_steps <= 1000:
        return 4
    else:
        return max(3, int(math.log(num_steps)) + 1)
```
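For intuition, the margin stays small even at extreme scale. A quick check (the 15-step case matches the Towers of Hanoi example later in this document):

```python
>>> compute_k(15)         # 4-disk Towers of Hanoi: 15 steps
3
>>> compute_k(1_000_000)  # max(3, int(math.log(1_000_000)) + 1)
14
```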
### Step 3: Create Minimal Agent Prompts

Key: Each agent sees ONLY what's needed for the current step.

```text
You are solving {task_name}. This is step {step_num}/{expected_steps}.

Current state:
{minimal_state_representation}

What is the next action? Respond ONLY with the action in format: {expected_format}
Do not explain. Just give the action.
```
### Step 4: Implement Voting

```python
from collections import Counter

def vote_on_next_action(state, k=3, max_agents=50):
    votes = Counter()
    agents_sampled = 0

    while agents_sampled < max_agents:
        action = get_agent_vote(state)  # LiteLLM call (see the sketch below)
        if action and not should_red_flag(action):
            votes[action] += 1

        # Check for a k-vote lead
        sorted_votes = votes.most_common()
        if sorted_votes:
            leader, leader_count = sorted_votes[0]
            second_count = sorted_votes[1][1] if len(sorted_votes) > 1 else 0
            if leader_count - second_count >= k:
                return leader  # Consensus!

        agents_sampled += 1

    return votes.most_common(1)[0][0] if votes else None
```
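`get_agent_vote` is referenced but not defined above. A minimal sketch using LiteLLM, as the comment suggests; the model name, temperature, and token limit here are illustrative assumptions:

```python
from litellm import completion  # pip install litellm

def get_agent_vote(state, model="gpt-4o-mini"):
    """One independent agent samples one candidate action (or None on failure)."""
    try:
        response = completion(
            model=model,
            messages=[{"role": "user", "content": state.format_for_agent()}],
            temperature=1.0,  # sampling diversity helps decorrelate agent errors
            max_tokens=50,    # actions are short; long outputs get red-flagged anyway
        )
        return response.choices[0].message.content.strip()
    except Exception:
        return None  # treat transport or model failures as a discarded vote
```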
### Step 5: Configure Red-Flagging

```python
def should_red_flag(response: str, context: dict | None = None) -> bool:
    # Length checks
    if len(response) > 200 or len(response) < 1:
        return True

    # Failure patterns
    if any(pattern in response.lower() for pattern in
           ["i cannot", "i don't know", "error", "invalid"]):
        return True

    # Format validation (task-specific)
    if not matches_expected_format(response):
        return True

    # Domain-specific checks
    return not domain_validator(response, context)
```
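`matches_expected_format` and `domain_validator` are task-specific hooks. A sketch for a task whose actions are single digits, like the Sudoku pattern below; the `valid_options` context key is an assumed convention:

```python
import re

def matches_expected_format(response: str) -> bool:
    """Example: accept only a bare digit 1-9."""
    return re.fullmatch(r"[1-9]", response.strip()) is not None

def domain_validator(response: str, context: dict | None) -> bool:
    """Example: the digit must be among the precomputed valid options."""
    if context is None:
        return True
    # Runs after matches_expected_format, so the response is a clean digit.
    return int(response.strip()) in context.get("valid_options", range(1, 10))
```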
### Step 6: Execute MAKER Loop

```python
state = initialize_task()
k = compute_k(estimated_steps)

while not state.is_complete():
    # Vote on the next action
    action = vote_on_next_action(state, k=k)
    if action is None:
        # No consensus - may need to backtrack or increase k
        handle_voting_failure()
        continue

    # Apply the action
    success = state.apply_action(action)
    if not success:
        # Invalid action - this shouldn't happen with good voting
        handle_invalid_action()
        continue

# Verify the final solution
verify_solution(state)
```
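`handle_voting_failure` is left abstract above. One plausible escalation policy (an assumption, not prescribed by the paper) is to widen the margin and eventually backtrack; `backtrack()` and `remaining_steps()` are assumed task methods:

```python
def handle_voting_failure(state, k: int, max_k: int = 8) -> int:
    """Return a new voting margin after a step fails to reach consensus."""
    if k < max_k:
        return k + 1  # demand a stronger consensus and retry this step
    state.backtrack()  # undo the last applied action (task-specific)
    return compute_k(state.remaining_steps())  # reset the margin and continue
```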
## Adaptation Patterns

### Pattern A: Constraint Satisfaction

Example: Solving Sudoku

```python
class SudokuTask:
    def get_possible_actions(self):
        # Return valid numbers for the next empty cell
        cell = self.next_empty_cell()
        return [num for num in range(1, 10)
                if self.is_valid(cell, num)]

    def format_for_agent(self):
        return f"""
Grid state: {self.grid}
Next cell to fill: {self.next_empty_cell()}
Valid options: {self.get_possible_actions()}
Constraints: Row/Column/Box must have 1-9 exactly once
"""
```
Agent Prompt:

```text
You are solving Sudoku. This is step {step}/81.

Current grid:
{grid_visualization}

Which number should go in cell ({row}, {col})?
Valid options: {valid_numbers}

Respond ONLY with the number (1-9). No explanation.
```
### Pattern B: Sequential Planning

Example: Multi-step code refactoring

```python
class CodeRefactorTask:
    def get_possible_actions(self):
        return [
            "rename_function(old_name, new_name)",
            "extract_method(lines, new_name)",
            "move_to_module(function, target)",
            "update_imports()",
        ]

    def format_for_agent(self):
        return f"""
Current file: {self.current_file}
Function to refactor: {self.target_function}
Available refactorings: {self.get_possible_actions()}
Tests passing: {self.test_status}
"""
```
Agent Prompt:

```text
You are refactoring {project_name}. This is step {step}.

Current situation:
- File: {filename}
- Function: {function_name}
- Issue: {code_smell}

What refactoring should be applied next?

Options:
{numbered_options}

Respond ONLY with the option number. No explanation.
```
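Since agents answer with a bare option number, the voting layer must map that number back to a canonical action string before tallying; a sketch (the helper name is an assumption):

```python
def parse_option_vote(response: str, options: list[str]):
    """Map a bare option number back to its action; return None to red-flag."""
    text = response.strip().rstrip(".")
    if not text.isdigit():
        return None              # red-flag: not a bare option number
    index = int(text) - 1        # the prompt numbers options from 1
    if 0 <= index < len(options):
        return options[index]    # vote on the canonical action string
    return None                  # red-flag: out-of-range option
```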
### Pattern C: Mathematical Reasoning

Example: Constructing a proof

```python
class ProofTask:
    def get_possible_actions(self):
        # Return applicable inference rules
        return [rule for rule in self.inference_rules
                if rule.can_apply(self.current_statement)]

    def format_for_agent(self):
        return f"""
Current statement: {self.current_statement}
Goal statement: {self.goal}
Available axioms: {self.axioms}
Available rules: {self.get_possible_actions()}
"""
```
### Pattern D: Data Processing Pipeline

Example: ETL workflow

```python
class ETLTask:
    def get_possible_actions(self):
        return [
            "remove_duplicates(column)",
            "fill_missing(column, strategy)",
            "normalize(column, method)",
            "merge_tables(table1, table2, key)",
        ]

    def format_for_agent(self):
        return f"""
Data shape: {self.df.shape}
Missing values: {self.missing_summary()}
Data quality score: {self.quality_score()}
Next transformation options: {self.get_possible_actions()}
"""
```
## Red-Flagging by Task Type

### For Code Generation
- Check syntax validity (see the sketch after this list)
- Ensure imports are defined
- Verify function signatures match
- Flag overly long responses (likely hallucination)
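The syntax check above is a one-liner with Python's standard `ast` module:

```python
import ast

def red_flag_python_syntax(response: str) -> bool:
    """Red-flag any candidate that is not parseable Python."""
    try:
        ast.parse(response)
        return False  # parses cleanly: keep the vote
    except SyntaxError:
        return True   # discard before it enters the tally
```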
### For Mathematical Reasoning
- Verify notation consistency
- Check logical structure
- Flag undefined symbols
- Ensure rule application is valid
### For Planning Tasks
- Verify preconditions are met
- Check action is in allowed set
- Flag circular dependencies
- Ensure resources are available
### For Constraint Satisfaction
- Verify constraints not violated
- Check value in domain
- Flag contradictions
- Ensure progress toward goal
## Cost Analysis

MAKER is cost-effective when:

```text
(cheap_model_cost × avg_votes × num_steps) < (expensive_model_cost × num_steps)
```

Since num_steps appears on both sides, this reduces to: cheap_model_cost × avg_votes < expensive_model_cost.

Key Insight: Even with 10-50 votes per step, cheap models (gpt-4o-mini) are often cheaper than one expensive model (gpt-4, o1).
Example:
- GPT-4: $0.015/step
- MAKER (gpt-4o-mini, avg 5 votes): $0.00015/step
- 100× cheaper!
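A quick sanity check of the inequality with the figures above (the per-call price is the example's per-step price divided by the vote count):

```python
def maker_is_cheaper(cheap_cost_per_call: float, avg_votes: float,
                     expensive_cost_per_step: float) -> bool:
    """num_steps cancels on both sides, so compare per-step costs directly."""
    return cheap_cost_per_call * avg_votes < expensive_cost_per_step

# gpt-4o-mini at ~$0.00003/call with 5 votes/step vs GPT-4 at $0.015/step
assert maker_is_cheaper(0.00003, 5, 0.015)  # $0.00015 vs $0.015 -> 100x cheaper
```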
## Implementation Checklist

When applying MAKER to your task:

- [ ] Define clear state representation
- [ ] Enumerate possible actions per state
- [ ] Create minimal agent prompts (only current-step context)
- [ ] Implement state validation
- [ ] Configure red-flagging for your domain
- [ ] Compute appropriate k based on task length
- [ ] Set up progress tracking
- [ ] Implement final solution verification
- [ ] Estimate cost vs. single-model approach
- [ ] Test with small instances first
## Debugging MAKER Implementations

### Issue: Agents don't converge (no consensus)

**Causes:**
- k too high for task complexity
- Ambiguous state representation
- Multiple valid solutions

**Solutions:**
- Reduce k or use adaptive k
- Add more context to agent prompts
- Add tie-breaking rules
### Issue: Agents converge to the wrong answer

**Causes:**
- Insufficient red-flagging
- Misleading state representation
- Correlated errors (agents make the same mistake)

**Solutions:**
- Tighten red-flagging criteria
- Clarify prompt formatting
- Increase temperature for diversity
- Add validation after each step
### Issue: Too slow / too expensive

**Causes:**
- k too high
- Too many agents per vote
- Expensive model selected

**Solutions:**
- Use a cheaper model (gpt-4o-mini)
- Reduce k if possible
- Parallelize agent calls
- Cache repeated states
## Examples

### Example 1: Solving Towers of Hanoi (4 disks)

```python
from maker import MAKER, MAKERConfig
from towers_of_hanoi import GameState

# Configure
config = MAKERConfig(
    model="gpt-4o-mini",
    k=3,  # for 15 steps, k=3 is sufficient
    verbose=True,
)

# Solve
maker = MAKER(config)
success, moves, stats = maker.solve_towers_of_hanoi(num_disks=4)

# Expected: 15 moves, zero errors
```
### Example 2: Code Refactoring

```python
class RefactorTask:
    def __init__(self, codebase, target_pattern):
        self.codebase = codebase
        self.target = target_pattern
        self.changes = []

    def get_possible_actions(self):
        # Find all instances needing refactoring
        instances = find_pattern(self.codebase, self.target)
        return [f"refactor_{i}" for i in instances]

# One step per instance, so size the voting margin accordingly
instances = find_pattern(codebase, pattern)
config = MAKERConfig(
    model="gpt-4o-mini",
    k=compute_k(len(instances)),
    task_type="code_refactoring",
)

maker = MAKER(config, task=RefactorTask(codebase, pattern))
success, changes, stats = maker.solve()
```
## Key Takeaways

- **Decompose maximally**: Smallest possible steps
- **Minimize context**: Each agent sees only the current step
- **Vote for consensus**: Prevents error propagation
- **Red-flag aggressively**: Catch errors early
- **Scale logarithmically**: k grows as Θ(ln s)
- **Use cheap models**: They work better with voting!
## Reference Implementation
See MAKER_GENERALIZATION.md for:
- Universal task interface
- Adaptation patterns for different domains
- Detailed cost analysis
- Real-world examples
- Troubleshooting guide
## Further Reading

- Paper: https://arxiv.org/html/2511.09030v1
- Implementation: see `maker.py` for working code
- Examples: see `test_maker.py` for different scenarios