name	multi-agent-coordination-framework
description	Advanced multi-agent coordination for managing AI agent pods and human teams. Includes agent architectures, communication patterns (pub/sub, message queues), task distribution, consensus mechanisms, conflict resolution, agent specialization, collaborative problem-solving, shared state management, lifecycle management, and multi-agent observability. Supports LangGraph, AutoGen, CrewAI, and distributed systems patterns.
allowed-tools	Read, Write, Edit, Bash, Glob, Grep, WebFetch

Multi-Agent Coordination Framework

Purpose

Managing multiple AI agents or human teams requires sophisticated coordination mechanisms. This Skill provides comprehensive capabilities for:

Multi-Agent System Architectures - Hub-spoke, peer-to-peer, hierarchical coordination
Agent Communication Patterns - Pub/sub, message queues, direct messaging, broadcast
Task Distribution Algorithms - Load balancing, capability-based routing, priority queues
Consensus and Voting Mechanisms - Agreement protocols, quorum-based decisions
Conflict Resolution - Handling disagreements, resource contention, priority conflicts
Agent Specialization and Routing - Role-based agents, skill matching, dynamic routing
Collaborative Problem-Solving - Multi-agent reasoning, distributed search, collective intelligence
Shared State Management - Distributed state, CRDTs, event sourcing
Agent Lifecycle Management - Registration, health checks, scaling, retirement
Multi-Agent Debugging and Observability - Distributed tracing, agent metrics, visualization

When to Use This Skill

Use this skill when you need to:

Build multi-agent AI systems with specialized agents
Coordinate human teams with AI assistance
Implement distributed problem-solving requiring multiple perspectives
Design complex workflows requiring agent collaboration
Create agent swarms for parallel processing
Implement human-in-the-loop AI systems
Orchestrate multi-model systems (GPT-4, Claude, local models)
Build competitive agent systems (agents voting/competing)
Design hierarchical agent organizations
Create agent mesh networks for resilience
Implement collaborative code review by multiple agents
Build ensemble AI systems for improved accuracy

Quick Start

1. Choose Your Architecture

Start by selecting the right architecture for your use case:

Hub-Spoke (Centralized) - Simple coordinator routes tasks to specialized agents
- Use when: Single point of coordination is acceptable, simple debugging needed
- Example: Supervisor agent coordinating code review by specialized agents
Peer-to-Peer (Distributed) - Agents communicate directly without central coordinator
- Use when: High availability needed, no single point of failure tolerated
- Example: Agent mesh for distributed data processing
Hierarchical (Tree) - Multi-level supervision with delegation
- Use when: Complex workflows, need clear responsibility hierarchy
- Example: Engineering organization simulation with managers and workers
Mesh (Fully Connected) - All agents can communicate with all others
- Use when: Maximum resilience required, communication overhead acceptable
- Example: Consensus-based decision systems

See REFERENCE.md for detailed architecture diagrams.
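
As a rough illustration of the hub-spoke option, the sketch below routes each task through a single coordinator that looks up a registered worker. The `Coordinator` class, the `Worker` type, and the task-type names are hypothetical and chosen only for this example; they are not part of any framework named in this skill.

```python
from typing import Callable, Dict

# Hypothetical worker signature: takes a task payload, returns a result string.
Worker = Callable[[str], str]


class Coordinator:
    """Hub in a hub-spoke layout: every task passes through route()."""

    def __init__(self) -> None:
        self._workers: Dict[str, Worker] = {}

    def register(self, task_type: str, worker: Worker) -> None:
        """Attach a specialized agent (spoke) to a task type."""
        self._workers[task_type] = worker

    def route(self, task_type: str, payload: str) -> str:
        """Forward the payload to the agent registered for task_type."""
        worker = self._workers.get(task_type)
        if worker is None:
            raise LookupError(f"no agent registered for {task_type!r}")
        return worker(payload)


hub = Coordinator()
hub.register("security", lambda code: f"security review of {len(code)} chars")
hub.register("style", lambda code: f"style review of {len(code)} chars")
```

Because every call goes through `route()`, the hub is the natural place for logging and debugging, and also the single point of failure noted above.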

2. Select Your Framework

Choose the framework that matches your needs:

Framework	Best For	Complexity	Key Strength
LangGraph	Complex workflows, state management	Medium	Graph-based coordination
AutoGen	Conversations, human-in-loop	Low	Easy multi-agent chat
CrewAI	Role-based teams	Low	Task delegation
Custom	Full control, unique requirements	High	Maximum flexibility

See KNOWLEDGE.md for detailed comparison.

3. Implement Your First Multi-Agent System

Example: Simple supervisor pattern with LangGraph

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    next_agent: str

# Create specialized agents
def researcher(state):
    # Research logic
    return {"messages": ["Research complete"], "next_agent": "writer"}

def writer(state):
    # Writing logic
    return {"messages": ["Report written"], "next_agent": "FINISH"}

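# Build graph. The edges below are static (researcher -> writer -> END), so
# "next_agent" is recorded in state but not used for routing here; a full
# supervisor would branch on it with workflow.add_conditional_edges(...).
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)
workflow.set_entry_point("researcher")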

app = workflow.compile()
result = app.invoke({"messages": [], "next_agent": "researcher"})

See EXAMPLES.md for complete working examples.

4. Add Consensus Mechanism (Optional)

For critical decisions, implement voting:

from multi_agent_coordination import ConsensusEngine, Vote, VoteType

consensus = ConsensusEngine(agents=["agent_1", "agent_2", "agent_3"])

votes = [
    Vote("agent_1", VoteType.YES, 0.9, "High confidence"),
    Vote("agent_2", VoteType.YES, 0.8, "Agree"),
    Vote("agent_3", VoteType.NO, 0.7, "Concerns exist"),
]

result = consensus.simple_majority(votes)
# Returns: {"result": "PASS", "yes": 2, "no": 1, "percentage": 66.7}

See PATTERNS.md for full consensus patterns.
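
For readers without the `multi_agent_coordination` package at hand, here is a self-contained sketch of what a simple-majority vote with a quorum check might look like. It is not the `ConsensusEngine` implementation; the `electorate_size` and `quorum` parameters and the `NO_QUORUM` result are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union


class VoteType(Enum):
    YES = "yes"
    NO = "no"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class Vote:
    agent_id: str
    choice: VoteType
    confidence: float
    reason: str = ""


def simple_majority(
    votes: List[Vote], electorate_size: int, quorum: float = 0.5
) -> Dict[str, Union[str, int, float]]:
    """Pass if YES outnumbers NO, provided enough agents cast a YES/NO vote."""
    yes = sum(1 for v in votes if v.choice is VoteType.YES)
    no = sum(1 for v in votes if v.choice is VoteType.NO)
    decided = yes + no  # abstentions do not count toward the quorum
    if electorate_size <= 0 or decided == 0 or decided / electorate_size < quorum:
        return {"result": "NO_QUORUM", "yes": yes, "no": no, "percentage": 0.0}
    percentage = round(100 * yes / decided, 1)
    result = "PASS" if yes > no else "FAIL"
    return {"result": result, "yes": yes, "no": no, "percentage": percentage}
```

This sketch ignores `confidence`; a weighted variant could sum confidences instead of counting votes.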

Implementation Patterns

This skill provides six implementation patterns:

Pattern 1: LangGraph Multi-Agent with Supervisor

When to use: Complex workflows with state persistence and conditional routing
Complexity: Medium
Key features: Graph-based coordination, state management, cycle prevention

View Pattern Details | View Code Example

Pattern 2: AutoGen Multi-Agent Conversation

When to use: Conversational agents, human-in-the-loop, group chat scenarios
Complexity: Low
Key features: Natural conversation flow, easy human interaction, code execution

View Pattern Details | View Code Example

Pattern 3: CrewAI Role-Based Teams

When to use: Clear role assignments, task delegation, sequential workflows
Complexity: Low
Key features: Role specialization, task dependencies, built-in tools

View Pattern Details | View Code Example

Pattern 4: Consensus and Voting Mechanisms

When to use: Critical decisions, multiple agent perspectives, conflict resolution
Complexity: Medium
Key features: Multiple voting types, weighted decisions, quorum support

View Pattern Details | View Code Example

Pattern 5: Shared State with Event Sourcing

When to use: Distributed state, audit trail needed, state replay required
Complexity: High
Key features: Immutable events, state reconstruction, time-travel debugging

View Pattern Details | View Code Example
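
To make the idea of immutable events and state reconstruction concrete, here is a minimal in-memory sketch. The `Event` and `EventStore` names and the key/value event shape are assumptions for illustration; a production store would persist events durably.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen: events cannot be modified once recorded
class Event:
    seq: int
    agent_id: str
    key: str
    value: str


class EventStore:
    """Append-only log; current state is derived by replaying events."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, agent_id: str, key: str, value: str) -> Event:
        event = Event(len(self._events) + 1, agent_id, key, value)
        self._events.append(event)
        return event

    def replay(self, up_to: Optional[int] = None) -> Dict[str, str]:
        """Rebuild state from the log, optionally stopping at sequence up_to."""
        state: Dict[str, str] = {}
        for event in self._events:
            if up_to is not None and event.seq > up_to:
                break
            state[event.key] = event.value
        return state
```

Calling `replay(up_to=n)` gives the state as it was after the n-th event, which is the basis for the time-travel debugging mentioned above.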

Pattern 6: Agent Lifecycle Management

When to use: Dynamic agent pools, health monitoring, auto-scaling needed
Complexity: High
Key features: Health checks, registration, auto-scaling, metrics

View Pattern Details | View Code Example
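
Registration and health checks can be sketched with a heartbeat registry like the one below. `AgentRegistry`, its method names, and the injectable `clock` parameter are hypothetical; the clock is injectable only so the timeout logic can be tested without sleeping.

```python
import time
from typing import Callable, Dict, List


class AgentRegistry:
    """Tracks registered agents and treats stale heartbeats as unhealthy."""

    def __init__(
        self,
        heartbeat_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = heartbeat_timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def register(self, agent_id: str) -> None:
        self._last_seen[agent_id] = self._clock()

    def heartbeat(self, agent_id: str) -> None:
        if agent_id not in self._last_seen:
            raise KeyError(f"unknown agent {agent_id!r}")
        self._last_seen[agent_id] = self._clock()

    def retire(self, agent_id: str) -> None:
        self._last_seen.pop(agent_id, None)

    def healthy(self) -> List[str]:
        """Agents whose last heartbeat is within the timeout, sorted by id."""
        now = self._clock()
        return sorted(
            agent_id
            for agent_id, seen in self._last_seen.items()
            if now - seen <= self._timeout
        )
```

An auto-scaler could compare `len(registry.healthy())` against the pending workload to decide when to start or retire agents.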

Top Gotchas

1. Coordination Overhead

Problem: Too much agent communication slows everything down
Solution: Batch communications, use async patterns, minimize chatter
Detection: Monitor message count and latency between agents

2. Agent Deadlock

Problem: Agents waiting for each other in circular dependency
Solution: Timeout on all waits, detect cycles, use coordinator to break deadlocks
Detection: Trace agent state transitions, look for circular waits
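
One way a coordinator might detect circular waits is to walk a wait-for graph. The sketch below assumes each blocked agent waits on exactly one other agent; the `find_wait_cycle` function and the dictionary shape are illustrative, not part of any framework API.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the agents forming a circular wait, or None if there is none.

    waits_for maps each blocked agent to the single agent it is waiting on.
    """
    for start in waits_for:
        path: List[str] = []
        seen_at: Dict[str, int] = {}
        current = start
        # Follow the chain until it ends or revisits an agent on this path.
        while current in waits_for and current not in seen_at:
            seen_at[current] = len(path)
            path.append(current)
            current = waits_for[current]
        if current in seen_at:
            return path[seen_at[current]:]
    return None
```

Once a cycle is found, the coordinator can break it by cancelling or timing out one of the agents in the returned list.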

3. State Inconsistency

Problem: Agents have different views of shared state
Solution: Event sourcing, CRDTs, eventual consistency, versioning
Detection: Compare agent state snapshots, look for divergence
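
As an example of the CRDT approach, here is a sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge regardless of merge order. The class and attribute names are chosen for this example.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one non-decreasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Take the element-wise maximum; safe to repeat and reorder."""
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())
```

Two agents that each count their own completed tasks can exchange and merge counters at any time and will agree on the total.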

View All 10 Gotchas - Detailed troubleshooting guide

Communication Patterns

Synchronous (Request-Response)

Direct agent-to-agent communication with immediate response.

Agent A ──request──► Agent B
Agent A ◄─response── Agent B

Asynchronous (Message Queue)

Fire-and-forget messaging via queue.

Agent A ──msg──► Queue ──msg──► Agent B
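
Within a single process, the queue pattern can be sketched with the standard library's `asyncio.Queue`. The `producer`, `consumer`, and `run_demo` functions and the `None` sentinel are assumptions for this example; a distributed deployment would typically use an external message broker instead.

```python
import asyncio
from typing import List


async def producer(queue: asyncio.Queue, messages: List[str]) -> None:
    """Agent A: enqueue messages without waiting for them to be processed."""
    for message in messages:
        await queue.put(message)
    await queue.put(None)  # sentinel telling the consumer to stop


async def consumer(queue: asyncio.Queue, handled: List[str]) -> None:
    """Agent B: process messages in arrival order until the sentinel."""
    while True:
        message = await queue.get()
        if message is None:
            break
        handled.append(message.upper())


async def run_demo(messages: List[str]) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)  # bounded for backpressure
    handled: List[str] = []
    await asyncio.gather(producer(queue, messages), consumer(queue, handled))
    return handled
```

Running `asyncio.run(run_demo(["task one", "task two"]))` drives both agents to completion; the bounded queue makes the producer wait when the consumer falls behind.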

Pub/Sub (Broadcast)

One-to-many broadcasting to subscribers.

Publisher ──event──► Topic ──┬──► Subscriber 1
                             ├──► Subscriber 2
                             └──► Subscriber 3
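
The diagram above can be approximated in-process with a small topic broker. The `Broker` class and the dictionary event shape are hypothetical and only illustrate the one-to-many fan-out.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Broker:
    """In-process pub/sub: each event goes to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event and return how many subscribers received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)
```

The publisher never needs to know who is listening, which is what keeps agents loosely coupled in this pattern.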

See REFERENCE.md for detailed patterns.

Best Practices

DO's

Start Simple - Begin with a single agent, add multi-agent only when needed
Clear Contracts - Define explicit communication protocols between agents
Timeout Everything - All agent interactions should have timeouts
Monitor Conversations - Log all agent-to-agent communications
Use Voting - For critical decisions, use consensus mechanisms
Specialize Agents - Each agent should have clear, focused responsibility
Handle Failures - Expect agents to fail, implement graceful degradation
Version Protocols - Use versioned message formats for compatibility
Test in Isolation - Test each agent independently before integration
Implement Observability - Trace multi-agent interactions for debugging
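
The "Timeout Everything" and "Handle Failures" practices can be combined in a small wrapper around `asyncio.wait_for`. The helper name `call_agent_with_timeout`, the `slow_agent` stand-in, and the fallback value are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable


async def call_agent_with_timeout(
    agent_call: Awaitable[Any], timeout_s: float, fallback: Any
) -> Any:
    """Await an agent call, returning fallback if it exceeds timeout_s."""
    try:
        return await asyncio.wait_for(agent_call, timeout=timeout_s)
    except asyncio.TimeoutError:
        # Degrade gracefully instead of blocking the caller indefinitely.
        return fallback


async def slow_agent(delay_s: float) -> str:
    """Stand-in for a real agent call that takes delay_s seconds."""
    await asyncio.sleep(delay_s)
    return "answer"
```

`asyncio.wait_for` cancels the underlying call on timeout, so a stuck agent does not keep consuming resources after the caller has moved on.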

DON'Ts

Don't Create Agent Explosion - Resist the urge to create too many agents
Don't Share Mutable State - Use message passing, not shared memory
Don't Ignore Deadlocks - Test for circular dependencies
Don't Skip Health Checks - Monitor agent health continuously
Don't Hardcode Routing - Use dynamic agent discovery and routing
Don't Trust All Agents - Validate agent responses, especially in open systems
Don't Forget Cleanup - Shut down agents properly and clean up their resources
Don't Over-Engineer - Simple coordination often beats complex protocols
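
As an alternative to hardcoded routing, agents can advertise capabilities at registration and a router can pick the least-loaded match. The `Router` and `AgentInfo` names and the tie-breaking rule (lowest agent id) are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class AgentInfo:
    capabilities: Set[str]
    active_tasks: int = 0


class Router:
    """Capability-based routing with least-loaded selection."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentInfo] = {}

    def register(self, agent_id: str, capabilities: Set[str]) -> None:
        self._agents[agent_id] = AgentInfo(set(capabilities))

    def assign(self, required: Set[str]) -> str:
        """Pick the agent with all required capabilities and the fewest tasks."""
        candidates = [
            (info.active_tasks, agent_id)
            for agent_id, info in self._agents.items()
            if required <= info.capabilities
        ]
        if not candidates:
            raise LookupError(f"no agent offers {sorted(required)}")
        _, chosen = min(candidates)  # ties broken by agent id
        self._agents[chosen].active_tasks += 1
        return chosen

    def complete(self, agent_id: str) -> None:
        """Record that the agent finished one task."""
        info = self._agents[agent_id]
        info.active_tasks = max(0, info.active_tasks - 1)
```

New agents become routable as soon as they register, with no change to the callers.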

Documentation Structure

This skill uses progressive disclosure - start here and drill down as needed:

KNOWLEDGE.md - Framework deep-dives, theory, research, protocols
PATTERNS.md - Implementation pattern details, architecture guidance
EXAMPLES.md - Complete, runnable code examples for all patterns
GOTCHAS.md - All 10 common pitfalls with detailed solutions
REFERENCE.md - Architecture diagrams, API reference, feature matrix

Production Deployment Checklist

Before deploying multi-agent systems:

Define agent roles and responsibilities
Design communication protocols (sync vs async)
Implement agent registration and discovery
Set up health monitoring and alerts
Configure timeouts for all operations
Implement consensus mechanisms for critical decisions
Set up distributed tracing (trace ID propagation)
Add circuit breakers for agent-to-agent calls
Implement agent authentication/authorization
Configure resource limits (CPU, memory, concurrency)
Set up dead letter queues for failed messages
Implement agent versioning and compatibility
Add metrics and dashboards for agent performance
Test failure scenarios (agent crashes, network partition)
Document agent handoff protocols
Implement graceful shutdown procedures
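
For the circuit-breaker item in the checklist, the following is a minimal synchronous sketch. `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical, and the injectable `clock` exists only to make the reset window testable.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("circuit open; call skipped")
            # Reset window elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping each agent-to-agent call in a breaker keeps one failing agent from tying up its callers with repeated doomed requests.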

Related Skills

orchestration-coordination-framework - General orchestration patterns
evaluation-reporting-framework - Evaluating multi-agent performance
mcp-integration-toolkit - Agent communication via MCP
ai-evaluation-suite - Testing agent interactions
architecture-evaluation-framework - System architecture assessment

Quick Reference

Key Concepts

Agent: Autonomous entity that can perceive, decide, and act
Coordinator: Agent that orchestrates other agents
Consensus: Agreement mechanism among multiple agents
State: Shared or distributed data accessed by agents
Handoff: Transfer of control from one agent to another

Framework URLs

Common Tasks

Create supervisor agent: See Example 1
Implement voting: See Example 4
Manage shared state: See Example 5
Auto-scale agents: See Example 6
Debug agent interactions: See GOTCHAS.md
