| name | coordination |
| description | Advanced multi-agent coordination patterns including mesh, hierarchical, and adaptive topologies with state synchronization, consensus mechanisms, and distributed task execution. Use when orchestrating complex multi-agent workflows requiring sophisticated coordination, consensus building, or distributed decision-making. |
| version | 2.0.0 |
| tier | gold |
| tags | coordination, multi-agent, orchestration, topology, consensus, distributed |
| category | orchestration |
| agents | hierarchical-coordinator, mesh-coordinator, adaptive-coordinator, collective-intelligence-coordinator, consensus-validator |
| complexity | advanced |
| estimated_duration | 30-120 minutes |
| prerequisites | Claude Flow MCP installed, Understanding of distributed systems, Multi-agent architecture knowledge |
| outputs | Coordination topologies, Agent network configurations, Consensus protocols, Distributed execution plans |
| author | ruv |
Orchestration Skill Guidelines
When to Use This Skill
- Multi-agent coordination requiring topology-aware task distribution
- Cloud-based orchestration with Flow Nexus platform integration
- Event-driven workflows needing message-based coordination
- Distributed systems spanning multiple execution environments
- Scalable swarms with adaptive topology management
When NOT to Use This Skill
- Single-agent tasks with no coordination overhead
- Local-only execution not using cloud features
- Simple sequential work without event-driven needs
- Static topologies not requiring adaptive scaling
Success Criteria
- Coordination topology established (mesh/hierarchical/star)
- All agents registered in coordination namespace
- Event routing functional with <50ms message latency
- No coordination deadlocks - All agents progressing
- Scalability validated - Handles target agent count
Edge Cases to Handle
- Network partitions - Implement partition tolerance with eventual consistency
- Message loss - Add message acknowledgment and retry logic
- Agent disconnection - Detect disconnects, redistribute work
- Topology reconfiguration - Support live topology changes without restart
- Rate limiting - Handle cloud API rate limits with backoff
Guardrails (NEVER Violate)
- NEVER lose coordination state - Persist topology and agent registry
- ALWAYS validate topology - Check for cycles, orphaned nodes
- ALWAYS monitor message queues - Prevent queue overflow
- NEVER skip health checks - Continuous agent liveness monitoring
- ALWAYS handle failures gracefully - No cascading failures
Evidence-Based Validation
- Verify topology structure - Validate graph properties (connected, acyclic if needed)
- Check message delivery - Confirm all messages reached targets
- Measure coordination overhead - Calculate % time spent on coordination vs work
- Validate agent reachability - Ping all agents, verify responses
- Audit scalability - Test with max agent count, measure performance
Agent Coordination Skill
Overview
Advanced multi-agent coordination system implementing sophisticated topologies (mesh, hierarchical, ring, star), consensus mechanisms (Byzantine, Raft, gossip), and distributed task execution patterns. This skill enables orchestration of complex workflows requiring multiple agents to collaborate, synchronize state, reach consensus, and execute tasks in a coordinated manner.
When to Use This Skill
Activate this skill when:
- Complex Multi-Agent Workflows: Need to coordinate 3+ agents with interdependencies
- Distributed Decision-Making: Require consensus among multiple agents
- Sophisticated Topologies: Need mesh, hierarchical, or adaptive network structures
- State Synchronization: Agents must share and synchronize state
- Fault-Tolerant Execution: Require Byzantine fault tolerance or graceful degradation
- Dynamic Scaling: Agent count or topology changes during execution
- Performance Optimization: Need optimal agent allocation and load balancing
Core Coordination Patterns
1. Topology Patterns
Mesh Topology
When: All agents need to communicate directly with each other
Characteristics:
- Peer-to-peer communication
- No single point of failure
- High redundancy
- Complex coordination overhead
- Best for: Small agent counts (3-8), consensus building
Script: resources/scripts/init-mesh-topology.js
Template: resources/templates/mesh-topology.yaml
Hierarchical Topology
When: Clear command structure with parent-child relationships
Characteristics:
- Tree-like structure
- Clear authority hierarchy
- Efficient delegation
- Centralized coordination
- Best for: Large agent counts (8+), task delegation
Script: resources/scripts/init-hierarchical-topology.js
Template: resources/templates/hierarchical-topology.yaml
Adaptive Topology
When: Topology needs to evolve based on task requirements
Characteristics:
- Dynamic reconfiguration
- Self-organizing structure
- Performance-driven optimization
- Complexity management
- Best for: Variable workloads, unpredictable requirements
Script: resources/scripts/init-adaptive-topology.js
Template: resources/templates/adaptive-topology.yaml
2. Consensus Mechanisms
Byzantine Consensus
When: Need fault tolerance against malicious or faulty agents
Characteristics:
- 3f+1 agents required to tolerate f faults (commit quorum of 2f+1)
- Cryptographic verification
- High security guarantees
- Moderate performance overhead
- Best for: Critical decisions, security-sensitive operations
Script: resources/scripts/byzantine-consensus.js
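To make the sizing rule concrete, here is a minimal sketch of the quorum arithmetic. It is a hypothetical helper, not the bundled byzantine-consensus.js: tolerating f faulty agents requires n ≥ 3f+1 total, and a proposal commits once 2f+1 matching votes arrive.

```javascript
// Sketch: Byzantine quorum math (hypothetical helper, not byzantine-consensus.js).
function byzantineQuorum(totalAgents) {
  const f = Math.floor((totalAgents - 1) / 3); // max faults tolerable at this size
  return { faultsTolerated: f, votesRequired: 2 * f + 1 };
}

function hasConsensus(votes, proposal, totalAgents) {
  const { votesRequired } = byzantineQuorum(totalAgents);
  const matching = votes.filter((v) => v.proposal === proposal && v.accept).length;
  return matching >= votesRequired;
}

// Example: 7 agents tolerate f = 2 faults and need 5 matching votes.
console.log(byzantineQuorum(7)); // { faultsTolerated: 2, votesRequired: 5 }
```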
Raft Consensus
When: Need leader-based consensus with strong consistency
Characteristics:
- Leader election
- Log replication
- Strong consistency
- Simple mental model
- Best for: State machine replication, distributed logs
Script: resources/scripts/raft-consensus.js
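The election core is small enough to sketch. The snippet below is illustrative only (the shipped raft-consensus.js is not shown here) and omits log comparison and randomized timeouts; it shows just the majority rule that makes leader election safe.

```javascript
// Sketch: simplified Raft-style leader election (illustrative, not raft-consensus.js).
// A candidate wins a term when it collects votes from a strict majority of the cluster.
function runElection(candidateId, term, peers) {
  let votes = 1; // the candidate votes for itself
  for (const peer of peers) {
    if (peer.votedTerm < term) { // peer has not yet voted in this term
      peer.votedTerm = term;
      votes += 1;
    }
  }
  const clusterSize = peers.length + 1;
  const majority = Math.floor(clusterSize / 2) + 1;
  return votes >= majority ? { leader: candidateId, term } : null;
}

const peers = [{ votedTerm: 0 }, { votedTerm: 0 }, { votedTerm: 1 }, { votedTerm: 0 }];
console.log(runElection('agent-1', 1, peers)); // agent-1 wins with 4 of 5 votes
```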
3. Execution Patterns
Parallel Execution
When: Tasks are independent and can run simultaneously
Benefits: Maximum throughput, reduced latency
Trade-offs: Higher resource usage, complex coordination
Sequential Execution
When: Tasks have strict dependencies
Benefits: Simple coordination, deterministic order
Trade-offs: Lower throughput, longer total time
Adaptive Execution
When: Mix of dependent and independent tasks
Benefits: Balanced throughput and coordination
Trade-offs: Complex orchestration logic
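The three patterns map directly onto promise composition. A minimal sketch (runTask and the task names are hypothetical; assumes an ES module so top-level await is available):

```javascript
// Sketch: the three execution patterns with promises (hypothetical runTask helper).
async function runTask(name) { /* dispatch to an agent and await its result */ return name; }

// Parallel: independent subtasks run simultaneously.
const parallel = await Promise.all(['lint', 'test', 'docs'].map(runTask));

// Sequential: each subtask waits for its predecessor.
const schema = await runTask('design-schema');
const api = await runTask('implement-api'); // starts only after design-schema resolves

// Adaptive: run whatever has no unmet dependencies, wave by wave.
const waves = [['design-schema'], ['implement-api', 'write-docs'], ['integration-test']];
for (const wave of waves) await Promise.all(wave.map(runTask));
```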
Implementation Workflow
Phase 1: Topology Selection & Initialization
1.1 Analyze Requirements
```bash
# Use topology selector script
node resources/scripts/topology-selector.js \
  --agents 6 \
  --task-type distributed \
  --fault-tolerance high \
  --output topology-recommendation.json
```
1.2 Initialize Topology
```bash
# Initialize selected topology (example: mesh)
node resources/scripts/init-mesh-topology.js \
  --max-agents 6 \
  --strategy balanced \
  --config topology-config.yaml
```
Key Considerations:
- Agent count: ≤8 → mesh, >8 → hierarchical
- Fault tolerance: High → Byzantine, Medium → Raft, Low → simple consensus
- Communication overhead: Minimize for large swarms
- Task dependencies: Strong → hierarchical, Weak → mesh
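These rules are mechanical enough to express as a lookup. A sketch of the heuristic (hypothetical function; the actual logic lives in resources/scripts/topology-selector.js, which is not reproduced here):

```javascript
// Sketch: the selection heuristic above as code (hypothetical helper).
function recommendTopology({ agentCount, faultTolerance, dependencies }) {
  const consensus =
    faultTolerance === 'high' ? 'byzantine' :
    faultTolerance === 'medium' ? 'raft' : 'simple';
  const topology =
    dependencies === 'strong' || agentCount > 8 ? 'hierarchical' : 'mesh';
  return { topology, consensus };
}

console.log(recommendTopology({ agentCount: 6, faultTolerance: 'high', dependencies: 'weak' }));
// → { topology: 'mesh', consensus: 'byzantine' }
```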
Phase 2: Agent Spawning & Network Formation
2.1 Spawn Specialized Agents
```bash
# Spawn agents with role assignments
node resources/scripts/spawn-coordinated-agents.js \
  --topology mesh \
  --roles researcher,coder,analyst,optimizer \
  --config agent-roles.yaml
```
2.2 Establish Communication Channels
- Mesh: All-to-all connections
- Hierarchical: Parent-child links
- Ring: Circular connections
- Star: Hub-and-spoke
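A minimal sketch of channel formation for the four shapes (hypothetical helper; agent IDs are plain strings):

```javascript
// Sketch: computing channel lists for each topology (hypothetical helper).
function buildChannels(agents, topology) {
  const links = [];
  switch (topology) {
    case 'mesh': // all-to-all: n(n-1)/2 links
      for (let i = 0; i < agents.length; i++)
        for (let j = i + 1; j < agents.length; j++) links.push([agents[i], agents[j]]);
      break;
    case 'hierarchical': // parent-child: node i's parent is floor((i-1)/2), as in a binary tree
      for (let i = 1; i < agents.length; i++)
        links.push([agents[Math.floor((i - 1) / 2)], agents[i]]);
      break;
    case 'ring': // circular: each agent links to its successor, wrapping around
      for (let i = 0; i < agents.length; i++)
        links.push([agents[i], agents[(i + 1) % agents.length]]);
      break;
    case 'star': // hub-and-spoke: the first agent acts as the hub
      for (let i = 1; i < agents.length; i++) links.push([agents[0], agents[i]]);
      break;
  }
  return links;
}
```

For six agents this yields 15 mesh links versus 5 hierarchical or star links, which is exactly the communication-overhead tradeoff noted in the key considerations above.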
2.3 Initialize Shared State
```bash
# Initialize distributed state store
npx claude-flow@alpha memory store \
  --key "coordination/shared-state" \
  --value '{
    "topology": "mesh",
    "agent_count": 6,
    "consensus": "byzantine",
    "session_id": "coord-2024-001"
  }'
```
Phase 3: Task Orchestration
3.1 Task Decomposition
Break complex tasks into agent-specific subtasks:
- Analyze dependencies between subtasks
- Identify parallel vs. sequential execution opportunities
- Assign subtasks to specialized agents
- Define success criteria for each subtask
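Dependency analysis reduces to grouping subtasks into waves: everything whose dependencies are satisfied runs in parallel, and waves run in order. A sketch (hypothetical task-map shape):

```javascript
// Sketch: grouping subtasks into parallel "waves" from a dependency map.
function planWaves(tasks) {
  // tasks: { name: [namesOfDependencies...] }
  const done = new Set();
  const waves = [];
  let remaining = Object.keys(tasks);
  while (remaining.length > 0) {
    const wave = remaining.filter((t) => tasks[t].every((d) => done.has(d)));
    if (wave.length === 0) throw new Error('Dependency cycle detected');
    wave.forEach((t) => done.add(t));
    remaining = remaining.filter((t) => !done.has(t));
    waves.push(wave); // each wave runs in parallel; waves run sequentially
  }
  return waves;
}

console.log(planWaves({ research: [], api: ['research'], ui: ['research'], e2e: ['api', 'ui'] }));
// → [['research'], ['api', 'ui'], ['e2e']]
```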
3.2 Distributed Task Assignment
```bash
# Orchestrate task across agents
node resources/scripts/orchestrate-distributed-task.js \
  --task "Full-stack feature development" \
  --strategy adaptive \
  --priority high \
  --agents researcher,coder,tester,reviewer
```
3.3 Execution Monitoring
```bash
# Monitor distributed execution
npx claude-flow@alpha swarm_monitor \
  --duration 60 \
  --interval 5 \
  --output execution-metrics.json
```
Phase 4: Consensus & Decision Making
4.1 Initiate Consensus Protocol
For critical decisions requiring agreement among agents:
```bash
# Run Byzantine consensus
node resources/scripts/byzantine-consensus.js \
  --proposal "Deploy to production" \
  --agents agent-1,agent-2,agent-3,agent-4,agent-5 \
  --threshold 0.67 \
  --timeout 30
```
4.2 Validate Consensus
- Verify quorum reached (e.g., 2f+1 matching votes out of 3f+1 agents for Byzantine)
- Check signature validity
- Confirm majority agreement
- Handle dissenting opinions
4.3 Execute Consensus Decision
Once consensus is achieved, coordinate execution across agents.
Phase 5: State Synchronization
5.1 Continuous State Sync
```bash
# Synchronize state across agents
node resources/scripts/sync-agent-state.js \
  --topology mesh \
  --interval 10 \
  --conflict-resolution last-write-wins
```
5.2 Conflict Resolution
- Last-Write-Wins: Simple, eventual consistency
- Vector Clocks: Causal consistency
- CRDT: Conflict-free replicated data types
- Consensus-Based: Strong consistency, higher latency
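The first two strategies fit in a few lines. A sketch (hypothetical record shapes; timestamps are milliseconds, vector clocks are agent→counter maps):

```javascript
// Sketch: last-write-wins keeps the value with the newest timestamp.
function lww(a, b) {
  return a.timestamp >= b.timestamp ? a : b;
}

// Sketch: vector clocks detect whether one update causally precedes the other.
function compareClocks(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aAhead = false, bAhead = false;
  for (const k of keys) {
    if ((a[k] || 0) > (b[k] || 0)) aAhead = true;
    if ((b[k] || 0) > (a[k] || 0)) bAhead = true;
  }
  if (aAhead && bAhead) return 'concurrent'; // true conflict: needs a merge policy
  if (!aAhead && !bAhead) return 'equal';
  return aAhead ? 'a-after-b' : 'b-after-a';
}

console.log(compareClocks({ agent1: 2, agent2: 1 }, { agent1: 1, agent2: 3 })); // 'concurrent'
```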
Phase 6: Performance Optimization
6.1 Analyze Bottlenecks
```bash
# Identify coordination bottlenecks
npx claude-flow@alpha agent_metrics \
  --metric all \
  --output performance-analysis.json
```
6.2 Optimize Topology
- Rebalance agent load
- Adjust communication patterns
- Scale agent count
- Switch topology if needed
6.3 Measure Improvements
Track key metrics:
- Task completion time
- Agent utilization
- Communication overhead
- Consensus latency
- Error rates
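The coordination-overhead metric is worth computing explicitly, since it is the number that tells you whether the topology is earning its keep. A sketch against a hypothetical per-agent timing shape (adapt it to whatever your metrics export contains):

```javascript
// Sketch: fraction of total time spent coordinating rather than working.
function coordinationOverhead(samples) {
  const coord = samples.reduce((sum, m) => sum + m.coordinationMs, 0);
  const work = samples.reduce((sum, m) => sum + m.workMs, 0);
  return coord / (coord + work);
}

const overhead = coordinationOverhead([
  { coordinationMs: 1200, workMs: 8800 },
  { coordinationMs: 900, workMs: 9100 },
]);
console.log(`${(overhead * 100).toFixed(1)}% overhead`); // 10.5% overhead
```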
Advanced Features
Dynamic Reconfiguration
Adapt topology during execution:
```bash
# Reconfigure topology on-the-fly
node resources/scripts/reconfigure-topology.js \
  --from mesh \
  --to hierarchical \
  --preserve-state true \
  --migration-strategy gradual
```
Fault Detection & Recovery
Implement automatic fault handling:
```bash
# Enable fault detection
node resources/scripts/enable-fault-detection.js \
  --heartbeat-interval 5 \
  --failure-threshold 3 \
  --recovery-strategy spawn-replacement
```
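The detection logic behind those flags is a standard heartbeat sweep. A sketch (hypothetical in-memory registry; the real script's persistence and recovery hooks are not shown):

```javascript
// Sketch: heartbeat-based liveness tracking (hypothetical in-memory registry).
const HEARTBEAT_INTERVAL_MS = 5000;
const FAILURE_THRESHOLD = 3; // missed heartbeats before an agent is declared dead

const lastSeen = new Map(); // agentId → timestamp of last heartbeat

function recordHeartbeat(agentId) {
  lastSeen.set(agentId, Date.now());
}

function findFailedAgents() {
  const deadline = Date.now() - HEARTBEAT_INTERVAL_MS * FAILURE_THRESHOLD;
  return [...lastSeen].filter(([, ts]) => ts < deadline).map(([id]) => id);
}

// A periodic sweep flags failed agents and triggers the recovery strategy
// (e.g., spawn a replacement and redistribute the agent's tasks).
setInterval(() => {
  for (const agentId of findFailedAgents()) {
    console.warn(`agent ${agentId} missed ${FAILURE_THRESHOLD} heartbeats; recovering`);
  }
}, HEARTBEAT_INTERVAL_MS);
```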
Load Balancing
Distribute work evenly:
```bash
# Enable dynamic load balancing
node resources/scripts/load-balancer.js \
  --strategy adaptive \
  --rebalance-interval 30 \
  --target-utilization 0.75
```
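At its core, adaptive balancing is least-loaded selection against the utilization target. A sketch (hypothetical agent records):

```javascript
// Sketch: least-loaded assignment toward a target utilization.
const TARGET_UTILIZATION = 0.75;

function pickAgent(agents) {
  // Prefer agents under the utilization target; otherwise take the least loaded overall.
  const eligible = agents.filter((a) => a.utilization < TARGET_UTILIZATION);
  const pool = eligible.length > 0 ? eligible : agents;
  return pool.reduce((best, a) => (a.utilization < best.utilization ? a : best));
}

console.log(pickAgent([
  { id: 'coder-1', utilization: 0.9 },
  { id: 'coder-2', utilization: 0.6 },
])); // → { id: 'coder-2', utilization: 0.6 }
```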
Success Criteria
- Appropriate topology selected and initialized
- Agents spawned with correct role assignments
- Communication channels established
- Tasks distributed and executed
- Consensus reached on critical decisions
- State synchronized across agents
- Performance metrics within targets
- Fault tolerance validated
Performance Targets
- Agent Spawning: <5 seconds per agent
- Task Assignment: <2 seconds
- Consensus Latency: <10 seconds (Byzantine), <5 seconds (Raft)
- State Sync Frequency: Every 5-10 seconds
- Agent Utilization: 70-85%
- Task Success Rate: >95%
Best Practices
- Topology Selection: Match topology to task characteristics
- Consensus Usage: Only for critical decisions (overhead cost)
- State Management: Use shared memory for coordination state
- Error Handling: Implement retry logic and graceful degradation
- Monitoring: Continuous performance tracking
- Scaling: Add agents incrementally, test at each scale
- Testing: Validate with chaos engineering (agent failures, network partitions)
- Documentation: Record topology decisions and rationale
Common Patterns & Solutions
Pattern: Research → Implementation → Validation
Topology: Hierarchical (coordinator → specialist agents)
Execution: Sequential with parallel validation
Consensus: Simple majority vote on validation
Pattern: Distributed Code Review
Topology: Mesh (peer review among 4-6 agents)
Execution: Parallel analysis, consensus on approval
Consensus: Byzantine (2f+1 security checks)
Pattern: Multi-Model Inference
Topology: Star (hub routes requests to model agents)
Execution: Parallel inference, ensemble aggregation
Consensus: Weighted voting based on confidence
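Confidence-weighted voting for this pattern is simple to sketch (hypothetical result shape):

```javascript
// Sketch: pick the answer with the highest summed confidence across model agents.
function weightedVote(results) {
  const scores = new Map();
  for (const { answer, confidence } of results) {
    scores.set(answer, (scores.get(answer) || 0) + confidence);
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

console.log(weightedVote([
  { answer: 'A', confidence: 0.9 },
  { answer: 'B', confidence: 0.6 },
  { answer: 'B', confidence: 0.5 },
])); // → 'B' (combined weight 1.1 beats 0.9)
```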
Troubleshooting
Issue: Consensus Timeout
Symptoms: Consensus protocol fails to reach agreement
Solutions:
- Increase timeout threshold
- Reduce agent count
- Check network connectivity
- Verify agent health
Issue: State Desynchronization
Symptoms: Agents have conflicting state
Solutions:
- Increase sync frequency
- Use stronger consistency (CRDT, consensus)
- Implement conflict resolution
- Reset shared state
Issue: Performance Degradation
Symptoms: Slow task completion, high latency
Solutions:
- Analyze bottlenecks (use metrics)
- Optimize topology (fewer connections)
- Reduce consensus overhead
- Scale horizontally (more agents)
Integration Points
- when-chaining-workflows-use-cascade-orchestrator: For workflow cascades
- when-coordinating-collective-intelligence-use-hive-mind: For hierarchical queen-led coordination
- swarm-orchestration: For basic swarm operations
- performance-analysis: For bottleneck detection
Resources
Scripts (resources/scripts/)
- topology-selector.js - Recommend optimal topology
- init-mesh-topology.js - Initialize mesh coordination
- init-hierarchical-topology.js - Initialize hierarchical coordination
- init-adaptive-topology.js - Initialize adaptive coordination
- spawn-coordinated-agents.js - Spawn agents with roles
- orchestrate-distributed-task.js - Distribute tasks
- byzantine-consensus.js - Run Byzantine consensus
- raft-consensus.js - Run Raft consensus
- sync-agent-state.js - Synchronize agent state
- reconfigure-topology.js - Dynamic reconfiguration
- enable-fault-detection.js - Fault detection & recovery
- load-balancer.js - Dynamic load balancing
Templates (resources/templates/)
- mesh-topology.yaml - Mesh coordination config
- hierarchical-topology.yaml - Hierarchical coordination config
- adaptive-topology.yaml - Adaptive coordination config
- agent-roles.yaml - Agent role definitions
Tests (tests/)
- test-mesh-topology.js - Mesh coordination tests
- test-hierarchical-topology.js - Hierarchical coordination tests
- test-consensus-mechanisms.js - Consensus protocol tests
References
- Distributed Systems: Concepts and Design (Coulouris et al.)
- Practical Byzantine Fault Tolerance (Castro & Liskov)
- Raft Consensus Algorithm (Ongaro & Ousterhout)
- An Introduction to MultiAgent Systems (Wooldridge)
Core Principles
1. CAP Theorem Applies to Agent Coordination
In distributed agent systems, you cannot simultaneously guarantee Consistency (all agents see the same state), Availability (every request receives a response), and Partition tolerance (system functions despite network splits). Choose based on requirements: Strong consistency (Raft consensus, higher latency) for financial transactions. Eventual consistency (gossip protocols, lower latency) for analytics dashboards. Partition tolerance is non-negotiable in distributed systems - network failures are inevitable.
2. Topology Encodes Coordination Strategy
Topology is not cosmetic - it defines communication patterns and failure modes. Mesh topology (all-to-all) enables Byzantine fault tolerance but requires O(N^2) messages. Hierarchical topology (tree structure) enables efficient delegation but creates single points of failure at coordinator nodes. Star topology (hub-and-spoke) centralizes coordination but bottlenecks at the hub. Ring topology (circular chain) enables pipeline processing but suffers from head-of-line blocking. Select topology to match your coordination requirements, then accept the inherent tradeoffs.
3. State Synchronization Frequency is a Tradeoff
High-frequency sync (every 1-2 seconds) provides near-real-time consistency but generates massive coordination traffic (hundreds of messages per minute in large swarms). Low-frequency sync (every 30-60 seconds) reduces traffic but increases divergence windows where agents operate on stale state. Optimal frequency depends on task characteristics: Real-time collaboration (high frequency), batch processing (low frequency), mixed workloads (adaptive frequency based on phase).
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Synchronous Consensus for Every Decision | Agent A blocks 10 seconds waiting for consensus on variable naming. Agent B blocks 10 seconds on file organization. 80% of execution time spent in consensus protocols for non-critical decisions. | Reserve consensus for critical decisions only (deployment, architecture, security). Use leader-decides pattern (coordinator makes unilateral decision) for routine choices. Async notification (inform agents of decision, don't block) for non-blocking updates. |
| Ignoring Network Partitions | Agent 5 disconnects due to network split. Coordination assumes all agents reachable. System deadlocks waiting for Agent 5's consensus vote that never arrives. | Implement partition detection (heartbeat + timeout), partition handling (quorum with majority), and partition recovery (state reconciliation when agent reconnects). Partitions are inevitable - design for them, don't ignore them. |
| Static Topology for Dynamic Workloads | Initialize 10-agent hierarchical swarm for research task (needs peer collaboration). Coordinator becomes bottleneck as agents attempt peer-to-peer discussions. Workflow stalls on coordination overhead. | Dynamic topology reconfiguration: Start with topology matching initial phase (hierarchical for planning), reconfigure mid-execution when requirements change (mesh for brainstorming), restore original topology for later phases (hierarchical for integration). |
Conclusion
Agent coordination is distributed systems engineering applied to AI workflows. The patterns documented here (mesh, hierarchical, star, ring topologies; Byzantine, Raft, gossip consensus; state synchronization protocols) are not novel inventions - they are proven distributed systems techniques adapted for multi-agent orchestration.
The fundamental insight: coordination is overhead, not value. Every consensus round, state sync, and topology reconfiguration consumes time that could be spent on productive work. Effective coordination minimizes this overhead while preserving correctness guarantees. This means selecting appropriate topology (mesh for small collaborative swarms <8 agents, hierarchical for large coordinated swarms >8 agents), using consensus sparingly (only for critical decisions), and synchronizing state at task-appropriate frequencies (high for real-time collaboration, low for batch processing).
The workflow documented here (topology selection, agent spawning, task orchestration, consensus decision-making, state synchronization, performance optimization) provides a systematic approach to distributed agent coordination. Follow this workflow for complex multi-agent tasks (6+ agents, sophisticated dependencies, fault tolerance requirements). For simpler scenarios (2-5 agents, loose coupling, no consensus requirements), direct agent spawning without formal coordination infrastructure is sufficient.
Coordination patterns scale agent capabilities from isolated execution to distributed collaboration. The cost is coordination overhead. The benefit is parallelized execution across specialized agents. The art is balancing these tradeoffs to match task requirements.
Key takeaways:
- Topology selection determines communication patterns and failure modes - mesh for collaboration (3-8 agents), hierarchical for delegation (8+ agents), star for centralized coordination, ring for pipeline processing
- Consensus mechanisms are expensive - reserve Byzantine/Raft consensus for critical decisions only, use eventual consistency for routine coordination
- State synchronization frequency trades consistency for overhead - high frequency for real-time collaboration, low frequency for batch processing
- Coordination is overhead, not value - minimize through appropriate topology, sparse consensus, and task-appropriate sync frequency
- CAP theorem applies - choose between consistency, availability, and partition tolerance based on requirements
Use this skill when orchestrating complex multi-agent workflows (6+ agents) requiring sophisticated coordination, consensus building, fault tolerance, or distributed decision-making across mesh, hierarchical, or adaptive topologies. Avoid for simple 2-5 agent tasks with loose coupling where direct spawning without formal coordination infrastructure is more efficient.