| name | multi-agent-coordination-framework |
| description | Advanced multi-agent coordination for managing AI agent pods and human teams. Includes agent architectures, communication patterns (pub/sub, message queues), task distribution, consensus mechanisms, conflict resolution, agent specialization, collaborative problem-solving, shared state management, lifecycle management, and multi-agent observability. Supports LangGraph, AutoGen, CrewAI, and distributed systems patterns. |
| allowed-tools | Read, Write, Edit, Bash, Glob, Grep, WebFetch |
Multi-Agent Coordination Framework
Purpose
Managing multiple AI agents or human teams requires sophisticated coordination mechanisms. This Skill provides comprehensive capabilities for:
- Multi-Agent System Architectures - Hub-spoke, peer-to-peer, hierarchical coordination
- Agent Communication Patterns - Pub/sub, message queues, direct messaging, broadcast
- Task Distribution Algorithms - Load balancing, capability-based routing, priority queues
- Consensus and Voting Mechanisms - Agreement protocols, quorum-based decisions
- Conflict Resolution - Handling disagreements, resource contention, priority conflicts
- Agent Specialization and Routing - Role-based agents, skill matching, dynamic routing
- Collaborative Problem-Solving - Multi-agent reasoning, distributed search, collective intelligence
- Shared State Management - Distributed state, CRDTs, event sourcing
- Agent Lifecycle Management - Registration, health checks, scaling, retirement
- Multi-Agent Debugging and Observability - Distributed tracing, agent metrics, visualization
When to Use This Skill
Use this skill when you need to:
- Build multi-agent AI systems with specialized agents
- Coordinate human teams with AI assistance
- Implement distributed problem-solving requiring multiple perspectives
- Design complex workflows requiring agent collaboration
- Create agent swarms for parallel processing
- Implement human-in-the-loop AI systems
- Orchestrate multi-model systems (GPT-4, Claude, local models)
- Build competitive agent systems (agents voting/competing)
- Design hierarchical agent organizations
- Create agent mesh networks for resilience
- Implement collaborative code review by multiple agents
- Build ensemble AI systems for improved accuracy
Quick Start
1. Choose Your Architecture
Start by selecting the right architecture for your use case:
Hub-Spoke (Centralized) - A single coordinator routes tasks to specialized agents
- Use when: Single point of coordination is acceptable, simple debugging needed
- Example: Supervisor agent coordinating code review by specialized agents
Peer-to-Peer (Distributed) - Agents communicate directly without central coordinator
- Use when: High availability needed, no single point of failure tolerated
- Example: Agent mesh for distributed data processing
Hierarchical (Tree) - Multi-level supervision with delegation
- Use when: Complex workflows, need clear responsibility hierarchy
- Example: Engineering organization simulation with managers and workers
Mesh (Fully Connected) - All agents can communicate with all others
- Use when: Maximum resilience required, communication overhead acceptable
- Example: Consensus-based decision systems
See REFERENCE.md for detailed architecture diagrams.
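As a rough illustration of the hub-spoke option, the sketch below routes each task to an agent that advertises the required capability, preferring the least-loaded one. It uses only the standard library. The `Agent` and `Coordinator` classes and the capability strings are hypothetical names invented for this example, not part of any framework listed here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Agent:
    """Hypothetical spoke: a name, the capabilities it offers, and a handler."""
    name: str
    capabilities: Set[str]
    handle: Callable[[str], str]
    active_tasks: int = 0


@dataclass
class Coordinator:
    """Hypothetical hub that routes each task to a capable agent."""
    agents: List[Agent] = field(default_factory=list)

    def register(self, agent: Agent) -> None:
        self.agents.append(agent)

    def route(self, capability: str, payload: str) -> str:
        candidates = [a for a in self.agents if capability in a.capabilities]
        if not candidates:
            raise LookupError(f"no agent offers capability {capability!r}")
        # Prefer the candidate with the fewest tasks in flight. With the
        # synchronous handlers used here this is always a tie; it only
        # matters once handlers run concurrently.
        agent = min(candidates, key=lambda a: a.active_tasks)
        agent.active_tasks += 1
        try:
            return agent.handle(payload)
        finally:
            agent.active_tasks -= 1


hub = Coordinator()
hub.register(Agent("security", {"review:security"}, lambda code: f"security ok: {code}"))
hub.register(Agent("style", {"review:style"}, lambda code: f"style ok: {code}"))

security_result = hub.route("review:security", "login.py")
```

Because every task passes through `route`, this is also the natural place to log or trace interactions, which is the debugging benefit the hub-spoke option trades against its single point of failure.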
2. Select Your Framework
Choose the framework that matches your needs:
| Framework | Best For | Complexity | Key Strength |
|---|---|---|---|
| LangGraph | Complex workflows, state management | Medium | Graph-based coordination |
| AutoGen | Conversations, human-in-loop | Low | Easy multi-agent chat |
| CrewAI | Role-based teams | Low | Task delegation |
| Custom | Full control, unique requirements | High | Maximum flexibility |
See KNOWLEDGE.md for detailed comparison.
3. Implement Your First Multi-Agent System
Example: Simple supervisor pattern with LangGraph
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    # operator.add makes LangGraph append each node's messages to the list
    messages: Annotated[list, operator.add]
    next_agent: str


# Create specialized agents; each returns a partial state update
def researcher(state: AgentState) -> dict:
    # Research logic goes here
    return {"messages": ["Research complete"], "next_agent": "writer"}


def writer(state: AgentState) -> dict:
    # Writing logic goes here
    return {"messages": ["Report written"], "next_agent": "FINISH"}


# Build graph. The edges below are static (researcher, then writer, then END),
# so next_agent is recorded in the state but does not drive routing here.
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)
workflow.set_entry_point("researcher")

app = workflow.compile()
result = app.invoke({"messages": [], "next_agent": "researcher"})
See EXAMPLES.md for complete working examples.
4. Add Consensus Mechanism (Optional)
For critical decisions, implement voting:
from multi_agent_coordination import ConsensusEngine, Vote, VoteType

consensus = ConsensusEngine(agents=["agent_1", "agent_2", "agent_3"])

votes = [
    Vote("agent_1", VoteType.YES, 0.9, "High confidence"),
    Vote("agent_2", VoteType.YES, 0.8, "Agree"),
    Vote("agent_3", VoteType.NO, 0.7, "Concerns exist"),
]

result = consensus.simple_majority(votes)
# Returns: {"result": "PASS", "yes": 2, "no": 1, "percentage": 66.7}
See PATTERNS.md for full consensus patterns.
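The internals of `multi_agent_coordination` are not shown in this document, so the following is only a standalone, stdlib-only sketch of how a simple-majority tally could work. The field names on `Vote` and the `weighted_majority` helper (which sums confidence scores instead of counting heads) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class VoteType(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class Vote:
    agent_id: str      # assumed field names; the real class may differ
    vote: VoteType
    confidence: float
    reasoning: str


def simple_majority(votes: List[Vote]) -> dict:
    """One agent, one vote: passes when YES votes outnumber NO votes."""
    yes = sum(1 for v in votes if v.vote is VoteType.YES)
    no = sum(1 for v in votes if v.vote is VoteType.NO)
    total = yes + no
    percentage = round(100 * yes / total, 1) if total else 0.0
    return {"result": "PASS" if yes > no else "FAIL",
            "yes": yes, "no": no, "percentage": percentage}


def weighted_majority(votes: List[Vote]) -> dict:
    """Hypothetical variant: each vote counts in proportion to its confidence."""
    yes_w = sum(v.confidence for v in votes if v.vote is VoteType.YES)
    no_w = sum(v.confidence for v in votes if v.vote is VoteType.NO)
    return {"result": "PASS" if yes_w > no_w else "FAIL",
            "yes_weight": round(yes_w, 2), "no_weight": round(no_w, 2)}


votes = [
    Vote("agent_1", VoteType.YES, 0.9, "High confidence"),
    Vote("agent_2", VoteType.YES, 0.8, "Agree"),
    Vote("agent_3", VoteType.NO, 0.7, "Concerns exist"),
]
majority = simple_majority(votes)
weighted = weighted_majority(votes)
```

A tie counts as a failure in both functions, which is a conservative default for critical decisions; a real deployment would also need a quorum rule for agents that never respond.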
Implementation Patterns
This skill provides six implementation patterns:
Pattern 1: LangGraph Multi-Agent with Supervisor
- When to use: Complex workflows with state persistence and conditional routing
- Complexity: Medium
- Key features: Graph-based coordination, state management, cycle prevention
View Pattern Details | View Code Example
Pattern 2: AutoGen Multi-Agent Conversation
- When to use: Conversational agents, human-in-the-loop, group chat scenarios
- Complexity: Low
- Key features: Natural conversation flow, easy human interaction, code execution
View Pattern Details | View Code Example
Pattern 3: CrewAI Role-Based Teams
- When to use: Clear role assignments, task delegation, sequential workflows
- Complexity: Low
- Key features: Role specialization, task dependencies, built-in tools
View Pattern Details | View Code Example
Pattern 4: Consensus and Voting Mechanisms
- When to use: Critical decisions, multiple agent perspectives, conflict resolution
- Complexity: Medium
- Key features: Multiple voting types, weighted decisions, quorum support
View Pattern Details | View Code Example
Pattern 5: Shared State with Event Sourcing
- When to use: Distributed state, audit trail needed, state replay required
- Complexity: High
- Key features: Immutable events, state reconstruction, time-travel debugging
View Pattern Details | View Code Example
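To make the event-sourcing idea concrete, here is a minimal in-memory sketch: agents append immutable events, and state is rebuilt by replaying them, optionally only up to an earlier sequence number (the "time-travel" case). `Event`, `EventStore`, and their fields are hypothetical names for this example; a production store would persist events durably.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)  # frozen=True makes each event immutable
class Event:
    seq: int
    agent_id: str
    key: str
    value: Any


class EventStore:
    """Hypothetical append-only log; state is never mutated in place."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, agent_id: str, key: str, value: Any) -> Event:
        event = Event(len(self._events) + 1, agent_id, key, value)
        self._events.append(event)
        return event

    def replay(self, up_to: Optional[int] = None) -> Dict[str, Any]:
        """Rebuild state from events, stopping after sequence number up_to."""
        state: Dict[str, Any] = {}
        for event in self._events:
            if up_to is not None and event.seq > up_to:
                break
            state[event.key] = event.value
        return state

    def history(self, key: str) -> List[Event]:
        """Audit trail: every event that ever touched a key."""
        return [e for e in self._events if e.key == key]


store = EventStore()
store.append("researcher", "status", "researching")
store.append("researcher", "sources", 3)
store.append("writer", "status", "writing")

current = store.replay()          # state after all three events
earlier = store.replay(up_to=2)   # state as it was before the writer acted
```

Here `current` reflects the writer's update while `earlier` still shows the researcher's status, and `history("status")` lists which agent changed the value and in what order.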
Pattern 6: Agent Lifecycle Management
- When to use: Dynamic agent pools, health monitoring, auto-scaling needed
- Complexity: High
- Key features: Health checks, registration, auto-scaling, metrics
View Pattern Details | View Code Example
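A heartbeat-based registry is one common way to implement registration and health checks. The sketch below is an assumption-laden illustration: `AgentRegistry`, its methods, and the 30-second timeout are invented for this example, and the clock is injectable so the behaviour can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, List


class AgentRegistry:
    """Hypothetical registry that treats a missed heartbeat as unhealthy."""

    def __init__(self, heartbeat_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = heartbeat_timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def register(self, agent_id: str) -> None:
        self._last_seen[agent_id] = self._clock()

    def heartbeat(self, agent_id: str) -> None:
        if agent_id not in self._last_seen:
            raise KeyError(f"unknown agent {agent_id!r}; register it first")
        self._last_seen[agent_id] = self._clock()

    def healthy(self) -> List[str]:
        now = self._clock()
        return sorted(a for a, seen in self._last_seen.items()
                      if now - seen <= self._timeout)

    def reap(self) -> List[str]:
        """Retire agents whose last heartbeat is older than the timeout."""
        stale = sorted(set(self._last_seen) - set(self.healthy()))
        for agent_id in stale:
            del self._last_seen[agent_id]
        return stale


# Simulated clock so the example is deterministic.
fake_now = [0.0]
registry = AgentRegistry(heartbeat_timeout=30.0, clock=lambda: fake_now[0])
registry.register("worker-1")
registry.register("worker-2")

fake_now[0] = 20.0
registry.heartbeat("worker-1")    # worker-2 stays silent

fake_now[0] = 40.0
healthy_agents = registry.healthy()
reaped = registry.reap()
```

An auto-scaler could compare `len(registry.healthy())` against queue depth to decide when to start or retire agents; that policy is left out here.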
Top Gotchas
1. Coordination Overhead
- Problem: Too much agent communication slows everything down
- Solution: Batch communications, use async patterns, minimize chatter
- Detection: Monitor message count and latency between agents
2. Agent Deadlock
- Problem: Agents waiting for each other in circular dependency
- Solution: Timeout on all waits, detect cycles, use coordinator to break deadlocks
- Detection: Trace agent state transitions, look for circular waits
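Circular waits can be detected by modelling "agent X is waiting on agent Y" as a directed graph and searching it for a cycle. The following is a minimal depth-first-search sketch; the `find_wait_cycle` function and the agent names are hypothetical, and it assumes the coordinator can observe who is waiting on whom.

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_wait_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular wait as a list of agent names, or None if acyclic."""
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in waits_for:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


deadlocked = find_wait_cycle({
    "planner": ["coder"],
    "coder": ["reviewer"],
    "reviewer": ["planner"],
})
healthy = find_wait_cycle({"planner": ["coder"], "coder": ["reviewer"], "reviewer": []})
```

Once a cycle is reported, a coordinator can break it by cancelling or timing out one of the waits on the returned path.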
3. State Inconsistency
- Problem: Agents have different views of shared state
- Solution: Event sourcing, CRDTs, eventual consistency, versioning
- Detection: Compare agent state snapshots, look for divergence
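As one example of the CRDT option, a grow-only counter (G-Counter) lets each agent increment its own slot and merge replicas by taking the per-agent maximum, so replicas converge regardless of merge order. This is a minimal sketch; the `GCounter` class and the agent names are illustrative and not taken from a specific library.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per agent."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.agent_id] = self.counts.get(self.agent_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # which is what makes replicas converge.
        for agent_id, count in other.counts.items():
            self.counts[agent_id] = max(self.counts.get(agent_id, 0), count)


replica_a = GCounter("agent_a")
replica_b = GCounter("agent_b")
replica_a.increment(2)   # agent_a processed two tasks
replica_b.increment(3)   # agent_b processed three tasks

# Before merging, the replicas disagree (2 vs 3): the divergence to detect.
diverged = replica_a.value() != replica_b.value()

replica_a.merge(replica_b)
replica_b.merge(replica_a)
replica_a.merge(replica_b)   # repeating a merge changes nothing
```

After the merges both replicas report the same total, which is the convergence property that makes CRDTs useful when agents cannot coordinate on every write.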
View All 10 Gotchas - Detailed troubleshooting guide
Communication Patterns
Synchronous (Request-Response)
Direct agent-to-agent communication with immediate response.
Agent A ──request──► Agent B
Agent A ◄─response── Agent B
Asynchronous (Message Queue)
Fire-and-forget messaging via a queue: the sender does not wait for a reply.
Agent A ──msg──► Queue ──msg──► Agent B
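The queue pattern can be sketched with `asyncio.Queue` from the standard library. The producer and consumer below stand in for Agent A and Agent B; the `None` sentinel used to signal completion and the task names are assumptions made for this example.

```python
import asyncio
from typing import List


async def agent_a(queue: asyncio.Queue, tasks: List[str]) -> None:
    """Producer: enqueue messages without waiting for them to be handled."""
    for task in tasks:
        await queue.put(task)
    await queue.put(None)  # sentinel: no more messages


async def agent_b(queue: asyncio.Queue, processed: List[str]) -> None:
    """Consumer: handle messages in arrival order until the sentinel."""
    while True:
        message = await queue.get()
        if message is None:
            break
        processed.append(message.upper())


async def main() -> List[str]:
    # A bounded queue applies backpressure if the consumer falls behind.
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    processed: List[str] = []
    await asyncio.gather(
        agent_a(queue, ["parse", "summarize"]),
        agent_b(queue, processed),
    )
    return processed


processed_messages = asyncio.run(main())
```

An in-process queue like this does not survive a crash; a deployment that needs durability would swap in an external broker while keeping the same producer/consumer shape.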
Pub/Sub (Broadcast)
One-to-many broadcasting to subscribers.
Publisher ──event──► Topic ──┬──► Subscriber 1
                             ├──► Subscriber 2
                             └──► Subscriber 3
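A minimal in-process version of this pattern keeps a list of callbacks per topic and calls each one when an event is published. The `Broker` class and the topic name below are hypothetical; real systems usually put a message broker between publisher and subscribers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[dict], None]


class Broker:
    """Hypothetical in-memory pub/sub broker with synchronous delivery."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        delivered = 0
        # Copy the list so a callback may subscribe or publish safely.
        for callback in list(self._subscribers.get(topic, [])):
            callback(event)
            delivered += 1
        return delivered


broker = Broker()
inboxes = {name: [] for name in ("subscriber_1", "subscriber_2", "subscriber_3")}
for name in inboxes:
    # Bind name as a default argument so each callback keeps its own inbox.
    broker.subscribe("task.completed", lambda event, name=name: inboxes[name].append(event))

delivered_count = broker.publish("task.completed", {"task_id": 42})
unrouted_count = broker.publish("task.failed", {"task_id": 43})
```

The publisher never references a subscriber directly, so subscribers can be added or removed without touching publishing code.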
See REFERENCE.md for detailed patterns.
Best Practices
DO's
- Start Simple - Begin with a single agent, add multi-agent only when needed
- Clear Contracts - Define explicit communication protocols between agents
- Timeout Everything - All agent interactions should have timeouts
- Monitor Conversations - Log all agent-to-agent communications
- Use Voting - For critical decisions, use consensus mechanisms
- Specialize Agents - Each agent should have clear, focused responsibility
- Handle Failures - Expect agents to fail, implement graceful degradation
- Version Protocols - Use versioned message formats for compatibility
- Test in Isolation - Test each agent independently before integration
- Implement Observability - Trace multi-agent interactions for debugging
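Two of the practices above, "Timeout Everything" and "Handle Failures", combine naturally: wrap each agent call in a timeout and return a fallback when it expires. The sketch uses `asyncio.wait_for` from the standard library; `call_with_timeout`, the two demo agents, and the fallback string are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def call_with_timeout(agent_call: Callable[[], Awaitable[str]],
                            timeout_s: float, fallback: str) -> str:
    """Await an agent call, degrading to a fallback if it takes too long."""
    try:
        return await asyncio.wait_for(agent_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the slow call before raising, so nothing leaks.
        return fallback


async def slow_agent() -> str:
    await asyncio.sleep(1.0)   # simulates an agent that is stuck or overloaded
    return "detailed answer"


async def fast_agent() -> str:
    return "quick answer"


async def demo() -> Tuple[str, str]:
    slow = await call_with_timeout(slow_agent, 0.05, "fallback answer")
    fast = await call_with_timeout(fast_agent, 0.05, "fallback answer")
    return slow, fast


slow_result, fast_result = asyncio.run(demo())
```

The slow agent's result is replaced by the fallback after 50 ms, while the fast agent's answer passes through unchanged.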
DON'Ts
- Don't Create Agent Explosion - Resist the urge to create too many agents
- Don't Share Mutable State - Use message passing, not shared memory
- Don't Ignore Deadlocks - Test for circular dependencies
- Don't Skip Health Checks - Monitor agent health continuously
- Don't Hardcode Routing - Use dynamic agent discovery and routing
- Don't Trust All Agents - Validate agent responses, especially in open systems
- Don't Forget Cleanup - Properly shut down and clean up agent resources
- Don't Over-Engineer - Simple coordination often beats complex protocols
Documentation Structure
This skill uses progressive disclosure - start here and drill down as needed:
- KNOWLEDGE.md - Framework deep-dives, theory, research, protocols
- PATTERNS.md - Implementation pattern details, architecture guidance
- EXAMPLES.md - Complete, runnable code examples for all patterns
- GOTCHAS.md - All 10 common pitfalls with detailed solutions
- REFERENCE.md - Architecture diagrams, API reference, feature matrix
Production Deployment Checklist
Before deploying multi-agent systems:
- Define agent roles and responsibilities
- Design communication protocols (sync vs async)
- Implement agent registration and discovery
- Set up health monitoring and alerts
- Configure timeouts for all operations
- Implement consensus mechanisms for critical decisions
- Set up distributed tracing (trace ID propagation)
- Add circuit breakers for agent-to-agent calls
- Implement agent authentication/authorization
- Configure resource limits (CPU, memory, concurrency)
- Set up dead letter queues for failed messages
- Implement agent versioning and compatibility
- Add metrics and dashboards for agent performance
- Test failure scenarios (agent crashes, network partition)
- Document agent handoff protocols
- Implement graceful shutdown procedures
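For the circuit-breaker item in the checklist, the sketch below shows the usual three-state idea in its simplest form: after a run of consecutive failures the breaker opens and skips calls, then allows a trial call once a cool-down has passed. `CircuitBreaker`, `CircuitOpenError`, and the thresholds are assumptions for illustration, and the clock is injectable so the example is deterministic.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("circuit open; call skipped")
            # Half-open: allow one trial call; a single failure reopens.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0   # any success closes the circuit fully
        return result


def failing_agent() -> str:
    raise ConnectionError("agent unreachable")


fake_now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0,
                         clock=lambda: fake_now[0])
outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_agent)
    except CircuitOpenError:
        outcomes.append("skipped")
    except ConnectionError:
        outcomes.append("failed")

fake_now[0] = 11.0                       # cool-down has elapsed
recovered = breaker.call(lambda: "pong")  # trial call succeeds
```

The third call is skipped without contacting the failing agent, which is what protects the rest of the system from piling up on an unhealthy peer.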
Related Skills
- orchestration-coordination-framework - General orchestration patterns
- evaluation-reporting-framework - Evaluating multi-agent performance
- mcp-integration-toolkit - Agent communication via MCP
- ai-evaluation-suite - Testing agent interactions
- architecture-evaluation-framework - System architecture assessment
Quick Reference
Key Concepts
- Agent: Autonomous entity that can perceive, decide, and act
- Coordinator: Agent that orchestrates other agents
- Consensus: Agreement mechanism among multiple agents
- State: Shared or distributed data accessed by agents
- Handoff: Transfer of control from one agent to another
Framework URLs
Common Tasks
- Create supervisor agent: See Example 1
- Implement voting: See Example 4
- Manage shared state: See Example 5
- Auto-scale agents: See Example 6
- Debug agent interactions: See GOTCHAS.md