name	ai-agents
description	Production-grade AI agent patterns with MCP integration, agentic RAG, handoff orchestration, multi-layer guardrails, and observability (modern best practices)

AI Agents Development — Production Skill Hub

Modern Best Practices: MCP-based tool integration, agentic RAG, handoff-first orchestration, multi-layer guardrails, LangGraph workflows, OpenTelemetry observability, and human-in-the-loop controls.

This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents. It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.

No theory. No narrative. Only what Claude can execute.

When to Use This Skill

Claude should activate this skill whenever the user asks for:

Designing an agent (LLM-based, tool-based, OS-based, or multi-agent).
Scoping capability maturity and rollout risk for new agent behaviors.
Creating action loops, plans, workflows, or delegation logic.
Writing tool definitions, MCP tools, schemas, or validation logic.
Generating RAG pipelines, retrieval modules, or context injection.
Building memory systems (session, long-term, episodic, task).
Creating evaluation harnesses, observability plans, or safety gates.
Preparing CI/CD, rollout, deployment, or production operational specs.
Producing any template in /resources/ or /templates/.
Implementing MCP servers or integrating Model Context Protocol.
Setting up agent handoffs and orchestration patterns.
Configuring multi-layer guardrails and safety controls.
For prompt scaffolds, retrieval tuning, or security depth, see Scope Boundaries below.

Scope Boundaries (Use These Skills for Depth)

Prompt scaffolds & structured outputs → ai-prompt-engineering
RAG retrieval & chunking → ai-rag
Search tuning (BM25/HNSW/hybrid) → ai-rag
Security/guardrails → ai-mlops
Inference optimization → ai-llm-inference

Quick Reference

Agent Type	Capability	Frameworks	MCP/A2A	When to Use
RAG Agent	Knowledge-grounded responses	LangChain, LlamaIndex	MCP for tools	Answering questions from knowledge base
Tool-Using	API/function calls	LangGraph, Autogen	MCP for tools	External actions (search, DB, APIs)
Multi-Agent	Task delegation, collaboration	CrewAI, AutoGen, ADK	A2A for handoffs	Complex workflows requiring specialization
OS Agent	Computer/browser control	Anthropic Computer Use	MCP for system	Desktop automation, web browsing
Agentic RAG	Dynamic multi-step retrieval	Custom (ReAct + RAG)	MCP for data	Complex queries requiring iterative search
Planning Agent	Strategic decomposition	LangGraph (ReAct/Plan-Execute)	A2A for delegation	Multi-step problems, long-horizon tasks
Code/SWE Agent	Autonomous coding, PR creation	HyperAgent, Devin, Claude Code	MCP for git/fs	Issue resolution, feature implementation

Decision Tree: Choosing Agent Architecture

What does the agent need to do?
    ├─ Answer questions from knowledge base?
    │   ├─ Simple lookup? → RAG Agent (LangChain/LlamaIndex + vector DB)
    │   └─ Complex multi-step? → Agentic RAG (iterative retrieval + reasoning)
    │
    ├─ Perform external actions (APIs, tools, functions)?
    │   ├─ 1-3 tools, linear flow? → Tool-Using Agent (LangGraph + MCP)
    │   └─ Complex workflows, branching? → Planning Agent (ReAct/Plan-Execute)
    │
    ├─ Write/modify code autonomously?
    │   ├─ Single file edits? → Tool-Using Agent with code tools
    │   └─ Multi-file, issue resolution? → Code/SWE Agent (HyperAgent pattern)
    │
    ├─ Delegate tasks to specialists?
    │   ├─ Fixed workflow? → Multi-Agent Sequential (A → B → C)
    │   ├─ Manager-Worker? → Multi-Agent Hierarchical (Manager + Workers)
    │   └─ Dynamic routing? → Multi-Agent Group Chat (collaborative)
    │
    ├─ Control desktop/browser?
    │   └─ OS Agent (Anthropic Computer Use + MCP for system access)
    │
    └─ Hybrid (combination of above)?
        └─ Planning Agent that coordinates:
            - Tool-using for actions (MCP)
            - RAG for knowledge (MCP)
            - Multi-agent for delegation (A2A)
            - Code agents for implementation

Protocol Selection:

Use MCP for: Tool access, data retrieval, single-agent integration
Use A2A for: Agent-to-agent handoffs, multi-agent coordination, task delegation

Navigation: Core Concepts & Patterns

Governance & Maturity

Agent Maturity & Governance - `resources/agent-maturity-governance.md`
- Capability maturity levels (L0-L4)
- Identity & policy enforcement
- Fleet control and registry management
- Deprecation rules and kill switches

Modern Best Practices

Modern Best Practices - `resources/modern-best-practices.md`
- Model Context Protocol (MCP)
- Agent-to-Agent Protocol (A2A)
- Agentic RAG (Dynamic Retrieval)
- Multi-layer guardrails
- LangGraph over LangChain
- OpenTelemetry for agents

Context Management

Context Engineering - `resources/context-engineering.md`
- Progressive disclosure
- Session management
- Memory provenance
- Retrieval timing
- Multimodal context

Core Operational Patterns

Operational Patterns - `resources/operational-patterns.md`
- Agent loop pattern (PLAN → ACT → OBSERVE → UPDATE)
- OS agent action loop
- RAG pipeline pattern
- Tool specification
- Memory system pattern
- Multi-agent workflow
- Safety & guardrails
- Observability
- Evaluation patterns
- Deployment & CI/CD

Navigation: Protocol Implementation

MCP Practical Guide - `resources/mcp-practical-guide.md` Building MCP servers, tool integration, and standardized data access
MCP Server Builder - `resources/mcp-server-builder.md` End-to-end checklist for workflow-focused MCP servers (design → build → test)
A2A Handoff Patterns - `resources/a2a-handoff-patterns.md` Agent-to-agent communication, task delegation, and coordination protocols
Protocol Decision Tree - `resources/protocol-decision-tree.md` When to use MCP vs A2A, decision framework, and selection criteria

Navigation: Agent Capabilities

Agent Operations - `resources/agent-operations-best-practices.md` Action loops, planning, observation, and execution patterns
RAG Patterns - `resources/rag-patterns.md` Contextual retrieval, agentic RAG, and hybrid search strategies
Memory Systems - `resources/memory-systems.md` Session, long-term, episodic, and task memory architectures
Tool Design & Validation - `resources/tool-design-specs.md` Tool schemas, validation, error handling, and MCP integration

Skill Packaging & Sharing

Skill Lifecycle - `resources/skill-lifecycle.md` Scaffold, validate, package, and share Claude skills with teams (Slack-ready)
API Contracts for Agents - `resources/api-contracts-for-agents.md` Request/response envelopes, safety gates, streaming/async patterns, error taxonomy
Multi-Agent Patterns - `resources/multi-agent-patterns.md` Manager-worker, sequential, handoff, and group chat orchestration
OS Agent Capabilities - `resources/os-agent-capabilities.md` Desktop automation, UI grounding, and computer use patterns
Code/SWE Agents - `resources/code-swe-agents.md` SE 3.0 paradigm, autonomous coding patterns, SWE-Bench, HyperAgent architecture

Navigation: Production Operations

Evaluation & Observability - `resources/evaluation-and-observability.md` OpenTelemetry GenAI, metrics, LLM-as-judge, and monitoring
Deployment, CI/CD & Safety - `resources/deployment-ci-cd-and-safety.md` Multi-layer guardrails, HITL controls, NIST AI RMF, production checklists

Navigation: Templates (Copy-Paste Ready)

Core Agent Templates

Standard Agent Template - `templates/core/agent-template-standard.md` Full production spec: memory, tools, RAG, evaluation, observability, safety
Specialized Agent Template - `templates/core/agent-template-specialized.md` Domain-specific agents with custom capabilities and constraints
Quick Agent Template - `templates/core/agent-template-quick.md` Minimal viable agent for rapid prototyping

RAG Templates

Basic RAG - `templates/rag/rag-basic.md` Simple retrieval-augmented generation pipeline
Advanced RAG - `templates/rag/rag-advanced.md` Contextual retrieval, reranking, and agentic RAG patterns
Hybrid Retrieval - `templates/rag/hybrid-retrieval.md` Semantic + keyword search with BM25 fusion

Tool Templates

Tool Definition - `templates/tools/tool-definition.md` MCP-compatible tool schemas with validation and error handling
Tool Validation Checklist - `templates/tools/tool-validation-checklist.md` Testing, security, and production readiness checks

Multi-Agent Templates

Manager-Worker Template - `templates/multi-agent/manager-worker-template.md` Orchestration pattern with task delegation and result aggregation
Evaluator-Router Template - `templates/multi-agent/evaluator-router-template.md` Dynamic routing with quality assessment and domain classification

Service Layer Templates

FastAPI Agent Service - `../dev-api-design/templates/fastapi/fastapi-complete-api.md` Auth, pagination, validation, error handling; extend with model lifespan loads, SSE, background tasks

External Sources Metadata

Curated References - `data/sources.json` 95 authoritative sources across 13 categories including arXiv research papers and Code/SWE agents

Shared Utilities (Centralized patterns — extract, don't duplicate)

../_shared/utilities/llm-utilities.md — Token counting, streaming, cost estimation
../_shared/utilities/error-handling.md — Effect Result types, correlation IDs
../_shared/utilities/resilience-utilities.md — p-retry v6, circuit breaker for API calls
../_shared/utilities/logging-utilities.md — pino v9 + OpenTelemetry integration
../_shared/utilities/observability-utilities.md — OpenTelemetry SDK, tracing, metrics
../_shared/utilities/testing-utilities.md — Test factories, fixtures, mocks
../_shared/resources/code-quality-operational-playbook.md — Canonical coding rules & LLM code review

Related Skills

This skill integrates with complementary Claude Code skills:

Core Dependencies

`../ai-llm/` - LLM patterns, prompt engineering, and model selection for agents
`../ai-rag/` - Deep RAG implementation: chunking, embedding, reranking
`../ai-prompt-engineering/` - System prompt design, few-shot patterns, reasoning strategies

Production & Operations

`../qa-observability/` - OpenTelemetry, metrics, distributed tracing
`../software-security-appsec/` - OWASP Top 10, input validation, secure tool design
`../ops-devops-platform/` - CI/CD pipelines, deployment strategies, infrastructure

Supporting Patterns

`../dev-api-design/` - REST/GraphQL design for agent APIs and tool interfaces
`../ai-mlops/` - Model deployment, monitoring, drift detection
`../qa-debugging/` - Agent debugging, error analysis, root cause investigation

Usage pattern: Start here for agent architecture, then reference specialized skills for deep implementation details.

Usage Notes for Claude

Modern Standards: Default to MCP for tools, agentic RAG for retrieval, handoff-first for multi-agent
Lightweight SKILL.md: Use this file for quick reference and navigation
Drill-down resources: Reference detailed resources for implementation guidance
Copy-paste templates: Use templates when the user asks for structured artifacts
External sources: Reference data/sources.json for authoritative documentation links
No theory: Never include theoretical explanations; only operational steps

Key Modern Migrations

Traditional → Modern:

Custom APIs → Model Context Protocol (MCP)
Static RAG → Agentic RAG with contextual retrieval
Ad-hoc handoffs → Versioned handoff APIs with JSON Schema
Single guardrail → Multi-layer defense (5+ layers)
LangChain agents → LangGraph stateful workflows
Custom observability → OpenTelemetry GenAI standards
Model-centric → Context engineering-centric

AI-Native SDLC Pattern (Delegate → Review → Own)

Plan: Have the agent draft PLAN.md or use a planning tool; require code-path trace, dependency map, and risk/edge-case list before build starts.
Design: Convert mocks to components; enforce design tokens/style guides; surface accessibility gaps; keep MCP-linked component libraries in context.
Build: Let the agent scaffold end-to-end (models/APIs/UI/tests/docs); enforce long-run guardrails (time cap, allowed commands/tools, commit/PR gating, kill switch).
Test: Demand failing test first; agent generates and runs suites; require coverage deltas and flaky-test notes; human reviews assertions and fixtures.
Review: Agent runs first-pass review tuned for P0/P1; human focuses on architecture, performance, safety, and migration risk; always own final merge.
Document: Agent drafts PR summaries, module/file notes, and mermaid diagrams; require doc updates in the same run; human adds “why” and approvals.
Deploy & Maintain: Agent links logs/metrics via MCP for triage; propose hotfixes with rollback plans; human approves rollouts; track drift/regressions with evals.

Executive Briefing (Optional)

Value: Coding agents compress SDLC time; delegate mechanical work, keep humans on intent/architecture; measurable gains come from tight guardrails plus eval loops.
Cost & Risk: Training vs inference economics; long runs need caps/kill switches; data/secret handling and supply-chain policies stay human-owned.
Governance: Multi-layer guardrails (policy prompt, tool allowlist, auth scopes, eval gates, audit logs); require human sign-off for deploys and safety-sensitive changes.

Install Skill

SKILL.md