| name | ai-agent-frameworks |
| description | Expert guidance on building production-ready multi-agent AI systems using CrewAI, LangChain, AutoGen, and custom architectures. Use when building agent systems, selecting frameworks, designing multi-agent workflows, debugging agent behavior, or deploying agents to production. |
AI Agent Frameworks Skill
When to Use This Skill
Invoke this skill when working with multi-agent AI systems, including:
- Selecting between CrewAI, LangChain, AutoGen, LangGraph, or custom agent architectures
- Designing multi-agent collaboration patterns
- Implementing agent workflows with proper observability
- Debugging stuck agents, infinite loops, or runaway costs
- Integrating agents with MCP servers and external tools
- Deploying agent systems to production
- Recognizing when agents add unnecessary complexity
Framework Selection Decision Process
Decision Tree
Start by determining if agents are truly needed:
Question 1: Is multi-step reasoning or tool orchestration required?
- If NO → Use a single LLM call or simple chain
- If YES → Continue
Question 2: Does the workflow map to clear team roles (research, write, edit)?
- If YES → Prefer CrewAI for its role-based simplicity
- If NO → Continue
Question 3: Is RAG (Retrieval-Augmented Generation) central to the application?
- If YES → Prefer LangChain for best-in-class RAG capabilities
- If NO → Continue
Question 4: Do agents need conversational debate or consensus-reaching?
- If YES → Consider AutoGen for conversation patterns
- If NO → Continue
Question 5: Does the workflow require complex state machines with loops and branching?
- If YES → Consider LangGraph for stateful control
- If NO → Continue
Question 6: Are minimal dependencies and maximum control essential?
- If YES → Build custom agent system
- If NO → Default to LangChain (most versatile)
Quick Framework Comparison Reference
For detailed comparisons, refer to references/framework-comparison.md
- CrewAI: Best for role-based teams, sequential workflows, quick prototypes
- LangChain: Best for RAG, extensive integrations, production tooling (LangSmith)
- AutoGen: Best for conversational agents, code execution, research projects
- LangGraph: Best for complex stateful workflows requiring loops and retries
- Custom: Best for simple use cases, performance-critical systems, unique requirements
Agent Architecture Patterns
Single-Agent vs Multi-Agent Criteria
Use single agent when:
- Task is straightforward with clear steps
- No benefit from specialization
- Cost and complexity must be minimized
Use multi-agent when:
- Tasks benefit from specialized expertise (research vs writing vs review)
- Parallel execution would improve performance
- Different agents need different models or tools
- Workflow involves handoffs or collaboration
Common Orchestration Patterns
Sequential Pattern:
```
Agent A → Agent B → Agent C
```
- Use when each step depends on previous output
- Examples: Research → Write → Edit → Publish
Parallel Pattern:
```
          ┌→ Agent A ─┐
Input ────┼→ Agent B ─┼──→ Merge ──→ Output
          └→ Agent C ─┘
```
- Use when tasks are independent and can run concurrently (see the asyncio sketch below)
- Examples: Parallel research from multiple sources
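A minimal fan-out/fan-in sketch of this pattern with asyncio; the three coroutines are hypothetical stand-ins for real agent calls:
```python
import asyncio

# Hypothetical stand-ins for real agent invocations
async def research_docs(query: str) -> str: return f"docs: {query}"
async def research_web(query: str) -> str: return f"web: {query}"
async def research_forums(query: str) -> str: return f"forums: {query}"

async def parallel_research(query: str) -> str:
    # Fan out to independent agents, then merge their outputs
    results = await asyncio.gather(
        research_docs(query), research_web(query), research_forums(query)
    )
    return "\n".join(results)

print(asyncio.run(parallel_research("python agent frameworks")))
```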
Hierarchical Pattern:
```
        Manager Agent
       /      |      \
Agent A   Agent B   Agent C
```
- Use when dynamic task allocation is needed
- Manager delegates and coordinates
Conversational Pattern:
```
Agent A ⟷ Agent B ⟷ Agent C
```
- Use for debate, consensus, or iterative refinement
- Common in AutoGen workflows
Framework-Specific Guidance
When Working with CrewAI
Refer to references/crewai-patterns.md for comprehensive patterns.
Key principles:
- Design agents with specific roles (not generic "helper")
- Keep 2-5 agents per workflow (avoid over-engineering)
- Use sequential process for predictable workflows
- Use hierarchical process when dynamic delegation is needed
- Set `max_iter` (CrewAI's per-agent iteration cap) to prevent runaway costs
Common anti-pattern to avoid:
# ❌ Don't: Vague agents with overlapping responsibilities
agent = Agent(role="Helper", goal="Help with stuff")
# ✅ Do: Specific agents with clear boundaries
researcher = Agent(
role="Technical Researcher",
goal="Find accurate data on Python frameworks",
tools=[search_tool, docs_tool]
)
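To turn such an agent into a runnable crew, a sketch along these lines (using CrewAI's `Task`, `Crew`, and `Process`; the task text is illustrative, and field names should be checked against your installed version):
```python
from crewai import Crew, Process, Task

research_task = Task(
    description="Summarize current benchmarks for the top Python frameworks",
    expected_output="A 200-word summary citing each source",
    agent=researcher,
)

crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    process=Process.sequential,  # linear, predictable execution
)
result = crew.kickoff()
```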
When Working with LangChain
Refer to references/langchain-patterns.md for comprehensive patterns.
Key principles:
- Always set `max_iterations` on AgentExecutor to prevent loops
- Use LCEL (the `|` operator) for composing chains
- Enable LangSmith tracing in production
- Implement proper error handling with `handle_parsing_errors=True`
- Consider using GPT-3.5-turbo for simple agents to reduce costs
RAG pattern (common use case):
```python
from langchain.chains import RetrievalQA

# Assumes `llm` and `vectorstore` are already configured
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)
```
Critical: Always limit iterations:
```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,        # REQUIRED: hard stop against loops
    max_execution_time=60,    # seconds
    handle_parsing_errors=True,
)
```
When Working with AutoGen
Refer to references/autogen-patterns.md for comprehensive patterns.
Key principles:
- Always use Docker for code execution in production (`use_docker=True`)
- Set `max_round` on GroupChat to prevent infinite conversations
- Use `human_input_mode` appropriately (NEVER, TERMINATE, or ALWAYS)
- Keep group chats to 3-5 agents maximum
- Track token usage as conversations can be expensive
Safe code execution:
```python
from autogen import UserProxyAgent

user_proxy = UserProxyAgent(
    name="executor",
    code_execution_config={
        "use_docker": True,  # CRITICAL for production: sandbox generated code
        "timeout": 60,
    },
)
```
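And a sketch of capping a group chat's length, assuming `analyst`, `critic`, and `llm_config` are defined elsewhere:
```python
from autogen import GroupChat, GroupChatManager

group = GroupChat(
    agents=[user_proxy, analyst, critic],
    messages=[],
    max_round=10,  # hard stop on conversation length
)
manager = GroupChatManager(groupchat=group, llm_config=llm_config)
```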
When Building Custom Agents
Refer to references/custom-agent-guide.md for complete implementations.
Build custom when:
- Single-purpose agent with minimal complexity
- Framework overhead is unjustified
- Need maximum performance control
- Learning exercise to understand agents deeply
Minimal implementation pattern (sketched after this list):
- Prompt template
- LLM client with retry logic
- Tool execution with error handling
- Response parsing
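A sketch combining these four pieces; `complete()` is a placeholder for your LLM client, and the JSON action format is an illustrative choice, not a standard:
```python
import json
import logging
import time

logger = logging.getLogger(__name__)

PROMPT = (  # prompt template
    "Task: {task}\n{history}\n"
    'Reply with JSON: {{"tool": "<name>", "input": "<arg>"}} or {{"answer": "<text>"}}'
)

def complete(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM client here")

def call_with_retry(prompt: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            return complete(prompt)
        except Exception as exc:
            logger.warning("LLM error (attempt %d): %s", attempt + 1, exc)
            time.sleep(2 ** attempt)  # exponential backoff
    raise RuntimeError("LLM unavailable after retries")

def run_agent(task: str, tools: dict, max_iterations: int = 10) -> str:
    history = ""
    for _ in range(max_iterations):
        raw = call_with_retry(PROMPT.format(task=task, history=history))
        try:
            action = json.loads(raw)  # response parsing
        except json.JSONDecodeError:
            history += f"\nInvalid JSON, try again: {raw}"
            continue
        if "answer" in action:
            return action["answer"]
        tool = tools.get(action.get("tool", ""))  # tool execution with error handling
        observation = tool(action.get("input", "")) if tool else "Error: unknown tool"
        history += f"\nObservation: {observation}"
    raise RuntimeError("max_iterations reached without an answer")
```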
Production-ready custom agent includes:
- Timeout protection
- Retry logic with exponential backoff
- Cost tracking
- Comprehensive logging
- Circuit breakers for external APIs
Debugging Stuck or Looping Agents
Detection Strategies
Loop detection:
- Track action history (last 3-5 actions)
- Alert when the same action repeats consecutively (see the detector sketch below)
- Implement max_iterations as hard stop
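A minimal detector along these lines (a hypothetical helper, not a framework API):
```python
from collections import deque

class LoopDetector:
    """Flags an agent that repeats the same action/parameters consecutively."""

    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def record(self, action: str, params: str) -> bool:
        """Returns True once `window` identical actions occur in a row."""
        self.recent.append((action, params))
        return len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1
```
Call `record()` after each tool invocation and abort, or inject a corrective prompt, when it returns True.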
Stuck detection:
- Set execution timeouts (30-60 seconds per agent; see the timeout sketch below)
- Monitor for agents waiting indefinitely
- Check for missing tool results
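One simple way to enforce per-step timeouts is to run each step in a worker and bound the wait (a sketch; note the limitation in the comment):
```python
import concurrent.futures

def run_with_timeout(step, *args, timeout_s: float = 60.0):
    """Runs one agent step with a wall-clock bound; raises TimeoutError on expiry.
    Caveat: the worker thread is abandoned, not killed, if the step hangs."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(step, *args).result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)  # don't block on a hung step
```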
Tools available:
- Use `scripts/agent-debugger.py` to analyze trace files
- Examine span durations to identify bottlenecks
- Look for repeated patterns in execution logs
Common Causes and Solutions
Cause: Agent keeps calling the same tool with the same parameters.
Solution: Improve the prompt to steer the agent away from repetition; include action history in its context.

Cause: Agent waits on a tool that never completes.
Solution: Add timeouts to all tool calls; implement circuit breakers.

Cause: Agent is confused about when to terminate.
Solution: Define clear termination conditions and explicit TERMINATE signals.
Tool Integration Best Practices
MCP Server Integration
When integrating MCP servers with agent frameworks:
With LangChain:
```python
from langchain.agents import create_openai_functions_agent
from langchain.tools import Tool

# Wrap an MCP server function as a LangChain tool
# (assumes `mcp_server` exposes a synchronous `query` callable)
mcp_tool = Tool(
    name="mcp_server_query",
    func=mcp_server.query,
    description="Query the MCP server for data",
)

# Add to the agent (assumes `llm` and `prompt` are already configured)
agent = create_openai_functions_agent(llm, tools=[mcp_tool], prompt=prompt)
```
With CrewAI:
- Define MCP functions as `@tool`-decorated functions
- Register with specific agents that need MCP access
General principles:
- Always handle MCP server errors gracefully (return error strings, not exceptions)
- Set timeouts on MCP calls (don't let agents hang)
- Log all MCP interactions for debugging
Tool Error Handling Pattern
```python
import json
import logging

from langchain.tools import tool  # or your framework's @tool decorator

logger = logging.getLogger(__name__)

@tool
def resilient_tool(query: str) -> str:
    """Tool with proper error handling."""
    try:
        result = external_api.call(query, timeout=10)  # stand-in for your API client
        return json.dumps(result)
    except TimeoutError:
        return "Error: Request timed out. Please try again."
    except Exception as e:
        logger.error(f"Tool failed: {e}")
        return f"Error: {str(e)}"
```
Never let tools crash the agent - always return strings, even for errors.
Cost Optimization
Estimation Before Building
Use scripts/cost-estimator.py to estimate costs before implementing:
```bash
python scripts/cost-estimator.py --config workflow.json --requests 1000 --optimize
```
Optimization Strategies
1. Use cheaper models for simple agents:
- GPT-4 for complex reasoning (strategist, analyst)
- GPT-3.5-turbo for simple tasks (formatter, router)
2. Limit iterations strictly:
- Set `max_iterations` to 5-10 (not 50)
- Set `max_execution_time` to prevent runaway costs
3. Cache repeated work:
- Enable LLM response caching
- Store results of expensive operations
- Reuse research across similar queries
4. Implement budgets (a minimal `CostTracker` sketch follows this list):
```python
cost_tracker = CostTracker()
if cost_tracker.total_cost > budget:
    raise BudgetExceededError()
```
5. Monitor costs in real-time:
- Track token usage per agent
- Alert when costs approach thresholds
- Analyze cost breakdowns to identify expensive agents
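A sketch of the `CostTracker` behind that budget check; the class is hypothetical and the per-token rates are illustrative, so verify against your provider's current pricing:
```python
class BudgetExceededError(Exception):
    pass

class CostTracker:
    # Illustrative USD rates per 1K tokens -- check current provider pricing
    PRICE_PER_1K = {"gpt-4": 0.03, "gpt-3.5-turbo": 0.0005}

    def __init__(self, budget_usd: float = 1.0):
        self.budget = budget_usd
        self.total_cost = 0.0

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        rate = self.PRICE_PER_1K.get(model, 0.03)
        self.total_cost += (prompt_tokens + completion_tokens) / 1000 * rate
        if self.total_cost > self.budget:
            raise BudgetExceededError(f"${self.total_cost:.4f} exceeds budget")
```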
Production Deployment Considerations
Refer to references/production-patterns.md for deployment architectures and references/observability-guide.md for monitoring.
Essential Production Patterns
1. Async Task Queue (Recommended):
- Use Celery + Redis for long-running workflows
- Return task ID immediately, poll for results
- Enables horizontal scaling
2. Always Enable Observability:
- Structured logging (JSON format)
- Distributed tracing (OpenTelemetry or LangSmith)
- Cost tracking per request
- Error monitoring (Sentry)
3. Implement Fault Tolerance:
- Retry with exponential backoff
- Circuit breakers for external APIs
- Graceful degradation (fallback to cheaper models)
4. Security:
- API authentication (JWT tokens)
- Rate limiting (10-100 requests/minute per user)
- Input validation (prevent prompt injection)
- Sandboxed code execution (Docker for AutoGen)
5. Set Hard Limits:
```python
LIMITS = {
    "max_iterations": 10,
    "max_execution_time": 60,     # seconds
    "max_cost_per_request": 1.0,  # USD
    "max_concurrent_agents": 5,
}
```
Anti-Patterns to Avoid
❌ Over-Engineering with Agents
Problem: Using multiple agents when a single LLM call would suffice
Example:
```python
# ❌ Overkill for a simple task
agents = [InputParser(), Processor(), OutputFormatter()]
# ...just to format text!
```
Solution: Start simple. One LLM call. Add agents only when truly beneficial.
❌ No Termination Conditions
Problem: Agents run forever or hit arbitrary iteration limits
Solution:
```python
# ✅ Always set termination conditions
max_iterations=10
is_termination_msg=lambda msg: "TERMINATE" in msg.get("content", "")
```
❌ Vague Agent Roles
Problem: Agents with unclear responsibilities cause overlap and confusion
Solution: Specific roles with clear boundaries. "Senior Python Developer" not "Helper".
❌ Ignoring Costs
Problem: Running expensive models without tracking or budgets
Solution: Track costs per request, set budgets, alert on anomalies
❌ No Observability
Problem: Can't debug when things go wrong
Solution: Implement logging, tracing, and metrics from day one
When NOT to Use Agents
Recognize situations where agents add unnecessary complexity:
Use a single LLM call instead of agents when:
- Task is straightforward (translation, summarization, Q&A)
- No tool usage required
- Single-step reasoning suffices
- Cost and latency must be minimized
Example decision:
- "Translate this text" → Single LLM call
- "Research 10 sources and synthesize findings" → Multi-agent system
Resource Reference Guide
This skill includes comprehensive reference materials. Load them as needed:
Framework Comparisons
- `references/framework-comparison.md` - Detailed CrewAI vs LangChain vs AutoGen comparison with migration paths
Framework-Specific Patterns
- `references/crewai-patterns.md` - Role design, task patterns, processes, production tips
- `references/langchain-patterns.md` - LCEL chains, RAG patterns, LangGraph workflows
- `references/autogen-patterns.md` - Conversational agents, group chats, code execution
Advanced Topics
- `references/custom-agent-guide.md` - Building agents from scratch, minimal implementations
- `references/observability-guide.md` - Tracing, logging, metrics, debugging tools
- `references/production-patterns.md` - Deployment architectures, scaling, security, monitoring
Helper Scripts
- `scripts/cost-estimator.py` - Estimate workflow costs before building
- `scripts/agent-debugger.py` - Analyze traces to identify loops and bottlenecks
- `scripts/agent-system-template/` - Boilerplate for new agent systems
Getting Started Workflow
For new agent system development:
1. Validate need for agents - can this be solved with a single LLM call?
2. Choose framework - use the decision tree above
3. Estimate costs - run `cost-estimator.py` against the expected workflow
4. Implement with observability - enable logging/tracing from the start
5. Test for loops - run `agent-debugger.py` on traces
6. Set hard limits - `max_iterations`, budgets, timeouts
7. Monitor in production - track costs, errors, performance
Framework-Specific Quick References
CrewAI Essentials
- Agents need: role, goal, backstory, tools
- Tasks need: description, expected_output, agent
- Processes: Sequential (linear) or Hierarchical (manager delegates)
- Limit: 2-5 agents per crew
LangChain Essentials
- Always set `max_iterations` on AgentExecutor
- Use LCEL for chain composition: `prompt | model | parser` (see the sketch below)
- Enable LangSmith for production: `LANGCHAIN_TRACING_V2=true`
- RAG: use RetrievalQA or ConversationalRetrievalChain
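A minimal LCEL sketch; package names follow the split `langchain-core`/`langchain-openai` layout, so adjust imports to your installed versions:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

print(chain.invoke({"text": "LCEL composes runnables with the | operator."}))
```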
AutoGen Essentials
- UserProxyAgent executes code, AssistantAgent generates it
- Always use Docker in production: `use_docker=True`
- GroupChat needs a `max_round` limit
- Human input modes: NEVER (autonomous), TERMINATE (confirm end), ALWAYS (every step)
LangGraph Essentials
- Define state, nodes (functions), and edges (transitions)
- Use conditional edges for branching logic
- Supports cycles (retries, loops)
- Requires StateGraph setup and compilation (see the sketch below)
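A minimal StateGraph sketch with a retry loop; API names reflect recent langgraph releases, so verify against your version:
```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class State(TypedDict):
    attempts: int
    done: bool

def work(state: State) -> State:
    # Pretend the step succeeds on the third attempt
    attempts = state["attempts"] + 1
    return {"attempts": attempts, "done": attempts >= 3}

def route(state: State) -> str:
    return "finish" if state["done"] else "retry"

graph = StateGraph(State)
graph.add_node("work", work)
graph.set_entry_point("work")
graph.add_conditional_edges("work", route, {"retry": "work", "finish": END})

app = graph.compile()
print(app.invoke({"attempts": 0, "done": False}))  # cycles until done is True
```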
Summary
Multi-agent systems add complexity. Use this skill to:
✅ Make informed framework selections
✅ Design effective agent architectures
✅ Avoid common pitfalls (loops, runaway costs)
✅ Implement production-ready observability
✅ Recognize when agents are overkill
Prioritize simplicity. Start with the minimum viable agent system. Add complexity only when justified by concrete benefits.