Claude Code Plugins

Community-maintained marketplace

Feedback

Interactive AI agent development workflow orchestrator. Guides users through complete agent creation from brainstorming to production-ready implementation. Use when user wants to build a new AI agent, design an agentic system, scaffold an agent project, or needs help with agent architecture decisions. Supports three-phase workflow - design documentation, implementation scaffolding, and operations (deploy/observe/evaluate).

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name agent-builder
description Interactive AI agent development workflow orchestrator. Guides users through complete agent creation from brainstorming to production-ready implementation. Use when user wants to build a new AI agent, design an agentic system, scaffold an agent project, or needs help with agent architecture decisions. Supports three-phase workflow - design documentation, implementation scaffolding, and operations (deploy/observe/evaluate).
version 1.1.0
last-updated Fri Jan 24 2025 00:00:00 GMT+0000 (Coordinated Universal Time)

Agent Builder Skill

Overview

The Agent Builder skill is an interactive workflow orchestrator that guides you through building production-ready AI agents from concept to deployment. It implements a proven 9-stage methodology and supports two-phase operation for maximum flexibility:

  • Phase 1 (Design): Interactive planning and comprehensive design documentation generation
  • Phase 2 (Scaffolding): Project setup and boilerplate code generation in any target location
  • Phase 3 (Operations): Deploy, observe, and evaluate in production with continuous improvement

The 9 Stages:

  1. BRAINSTORM → 2. DESIGN → 3. IMPLEMENT → 4. TEST → 5. DOCUMENT → 6. DEPLOY → 7. OBSERVE → 8. EVALUATE → 9. ITERATE

This skill helps you make informed architectural decisions, apply proven patterns, avoid common pitfalls, and produce high-quality documentation from the start. It integrates 6 foundational principles and organizational standards through optional progressive hints at key decision points.

When to Use This Skill

Activate this skill when you:

  • Want to build a new AI agent from scratch
  • Need to design an agentic system or multi-agent architecture
  • Are planning an agent project and need architecture guidance
  • Want to scaffold an agent project with proper structure
  • Need help with SDK selection (Claude SDK, OpenAI Agents, Strands, LangGraph, etc.)
  • Want to apply proven agent patterns to your use case
  • Need comprehensive design documentation before implementation
  • Want to separate design and implementation phases

Trigger phrases:

  • "Help me build an agent for..."
  • "I want to create an AI agent that..."
  • "Guide me through designing an agent for..."
  • "What's the best way to build an agent that..."
  • "I need to scaffold a new agent project"
  • "Help me choose the right SDK for..."

How This Skill Works

Two-Phase Workflow

Phase 1: Design & Documentation (Interactive)

Purpose: Plan and document your agent thoroughly before writing code

Process: Interactive guided questioning through:

  1. BRAINSTORM - Define problem, use case, success criteria, choose SDK/language
  2. DESIGN - Define agent persona, design tools, plan workflow, choose patterns

Output:

  • agent-design-document.md - Comprehensive human-readable design specification
  • scaffolding-manifest.json - Machine-readable specification for code generation

Key Benefits:

  • Document in any location (doesn't require target project to exist)
  • Review and approve design before implementation
  • Version control design decisions
  • Share designs with team for feedback
  • Portable between projects

Phase 2: Scaffolding & Implementation

Purpose: Generate project structure and boilerplate from design document

Input: Design document from Phase 1

Process:

  1. Parse design document and manifest
  2. Create directory structure
  3. Generate boilerplate code with placeholders
  4. Create configuration files
  5. Generate pre-filled documentation
  6. Setup testing structure

Output: Complete project structure ready for customization

Key Benefits:

  • Can scaffold in different location than design
  • Consistent project structure
  • Pre-filled documentation
  • Validation schemas scaffolded
  • 2-4 hours of setup time saved

Interactive Questioning Framework

The skill asks targeted questions at each stage to gather requirements and guide decisions. Questions are based on proven best practices and real-world patterns.

Question Types:

  • Decision questions - Choose between well-defined options (SDK, language, complexity)
  • Definition questions - Describe use case, success criteria, tool specifications
  • Validation questions - Confirm decisions, check for missing requirements
  • Pattern questions - Identify which proven patterns apply to your use case

Complete Workflow Stages

Stage 1: BRAINSTORM (Define Problem Space)

Objective: Understand the problem and make foundational decisions

Questions Asked:

  1. Use Case Definition

    • What problem does this agent solve?
    • Who are the primary users?
    • What makes this valuable?
    • What would success look like?
  2. Success Criteria

    • What are 3-5 measurable goals?
    • What are the key edge cases to handle?
    • What are acceptable limitations?
  3. SDK Selection

    • Need multi-agent orchestration?
    • Model flexibility required?
    • Complex state management needed?
    • Performance requirements?
    • Recommendation: Claude SDK / OpenAI Agents / Strands / LangGraph
  4. Language Selection

    • TypeScript (type safety, web/API focus, Bun runtime)
    • Python (data science, ML libraries, asyncio)
  5. Complexity Estimation

    • Low (1-3 tools, no subagents, single purpose)
    • Medium (4-7 tools, 0-2 subagents, moderate workflows)
    • High (8+ tools, 3+ subagents, complex orchestration)

Output: Clear problem statement, success criteria, SDK/language selected, complexity assessed

Stage 2: DESIGN (Create Architectural Blueprint)

Objective: Design the complete agent architecture

Questions Asked:

  1. Agent Persona & Role

    • What expertise does the agent have?
    • What communication style is appropriate?
    • What are the agent's limitations?
    • What should the agent never do?
  2. Tool Design (3-5 recommended to start)

    • Tool name and purpose
    • Input parameters and types
    • Output structure and types
    • Validation requirements (Zod/Pydantic schemas)
    • Error conditions to handle
  3. Workflow Design

    • What are the conversation stages?
    • How does the conversation progress?
    • What information is collected at each stage?
    • How does the workflow handle failures?
  4. Subagent Planning (if needed)

    • What specialized tasks need delegation?
    • Which model for which subagent? (Sonnet for analysis, Haiku for optimization)
    • How do subagents communicate?
    • What context do they need?
  5. Data Model Design

    • What data structures are needed?
    • How is state managed?
    • What needs persistence?
    • Type safety approach (Zod for TS, Pydantic for Python)
  6. MCP Integration (if applicable)

    • Which MCP servers are needed?
    • What documentation sources?
    • Custom MCP tools required?
  7. Pattern Selection

    • Which proven patterns apply? (see Pattern Catalog)
    • Validation pattern (Zod/Pydantic)
    • Subagent pattern (Specialization, Parallel, Handoff)
    • Data pattern (Mock-first, Service-based, Direct context)
    • Scoring pattern (Weighted, Multi-criteria, Suitability)

Output: Complete architectural blueprint with all decisions documented

Stage 3: IMPLEMENT (Guided Implementation)

Note: This stage begins Phase 2 (Scaffolding)

Objective: Generate project structure and guide implementation

Activities:

  1. Project Setup - Create directory structure, install dependencies
  2. Tool Scaffolding - Generate tool templates with validation
  3. Agent Configuration - Create agent config with system prompt
  4. Data Model Setup - Generate type definitions and schemas
  5. Mock Data - Create realistic mock data structures
  6. Testing Structure - Setup test framework and initial tests

Guidance Provided:

  • Best practices for tool implementation
  • Error handling patterns
  • Validation schema examples
  • Type safety recommendations
  • Naming conventions
  • Code organization

Stage 4: TEST (Validation Strategy)

Objective: Plan comprehensive testing approach

Test Levels:

  1. Unit Tests - Individual tool testing
  2. Integration Tests - End-to-end workflows
  3. Edge Case Tests - Invalid inputs, failures, timeouts
  4. Performance Tests - Response times, token usage
  5. Manual QA - Realistic scenarios, UX verification

Guidance Provided:

  • Test structure templates
  • Critical test scenarios
  • Edge cases to consider
  • Performance benchmarks
  • QA checklist

Stage 5: DOCUMENT (Comprehensive Documentation)

Objective: Create production-grade documentation

Documents Generated:

  1. Agent CLAUDE.md - Complete agent documentation

    • Overview and purpose
    • Directory structure
    • Setup instructions
    • Tool descriptions
    • Workflow explanation
    • Testing approach
    • Known limitations
    • Key decisions and rationale
  2. README.md - Quick start guide

    • Installation steps
    • Basic usage
    • Configuration
    • Examples
  3. Comparison Matrix Entry - Template for comparison documentation

  4. Memory System Entries - Knowledge graph templates (entities, relations, observations)

  5. Tool Catalog Contributions - Reusable pattern documentation

Stage 6: DEPLOY (Production Release)

Objective: Safely release agent to production

Note: This stage begins Phase 3 (Operations)

Activities:

  1. Deployment Gates - Offline evals → Red team → Staging → Production
  2. Rollout Strategy - Canary (1%) → Limited (10%) → Gradual (50%) → Full (100%)
  3. Rollback Planning - Automatic triggers, manual procedures
  4. Agent Identity - Register SPIFFE ID (if applicable)
  5. Inventory Documentation - Add to central agent inventory

Guidance Provided:

  • Deployment gates checklist
  • Rollout strategy templates
  • Rollback trigger criteria
  • Security hardening requirements

💡 Progressive Hint: "Apply deployment gates checklist?" → See references/operations-guide.md

Stage 7: OBSERVE (Production Monitoring)

Objective: Monitor agent behavior and performance in production

Activities:

  1. Tracing Setup - Root span per request with OTel taxonomy
  2. Metrics Configuration - Latency (p50/p90/p95/p99), token usage, cost
  3. Logging Standards - Structured JSON, no PII, appropriate levels
  4. Alerting - Define thresholds and notification channels

Guidance Provided:

  • OpenTelemetry configuration templates
  • Required trace attributes
  • Metric collection standards
  • Log sanitization guidelines

💡 Progressive Hint: "Setup OpenTelemetry observability?" → See references/operations-guide.md

Stage 8: EVALUATE (Performance Assessment)

Objective: Comprehensively assess agent performance and quality

Activities:

  1. Offline Evaluation - Benchmarks, RAG metrics, safety testing
  2. Online Evaluation - A/B tests, shadow traffic, canary analysis
  3. Human Feedback - In-product feedback, review queues
  4. Success Metrics - Acceptance rate, win rate, task completion

Guidance Provided:

  • Evaluation framework templates
  • Metric calculation methods
  • Feedback collection patterns
  • Feedback → Eval Case pipeline

💡 Progressive Hint: "Define evaluation framework (Offline/Online/Human)?" → See references/operations-guide.md

Stage 9: ITERATE (Continuous Improvement)

Objective: Plan for feedback and iteration

Activities:

  1. Feedback Collection - User input, metrics, errors, usage patterns
  2. Improvement Identification - What worked/didn't, missing features
  3. Incremental Updates - Bug fixes, new tools, prompt improvements
  4. Documentation Updates - Keep docs current with changes
  5. Learning Capture - Update catalogs and patterns
  6. Feedback Loop Closure - Convert feedback into durable eval cases

💡 Progressive Hint: "Setup feedback loops?" → See references/operations-guide.md

Decision Frameworks

SDK Selection Matrix

Choose Claude SDK when:

  • ✅ Excellent reasoning required (complex decision-making)
  • ✅ Anthropic models preferred
  • ✅ Simple subagent support sufficient
  • ✅ Conversational agents
  • ⚠️ Multi-agent orchestration limited
  • ⚠️ Anthropic-only (not multi-provider)

Choose OpenAI Agents when:

  • ✅ Multi-agent coordination critical
  • ✅ Parallel + handoff patterns needed
  • ✅ Built-in session memory valuable (SQLite)
  • ✅ OpenAI models preferred
  • ⚠️ OpenAI-only (not multi-provider)

Choose Strands Agents when:

  • ✅ Model flexibility required (multi-provider)
  • ✅ Lightweight deployment
  • ✅ Extensive MCP integration
  • ✅ Model-driven architecture
  • ✅ Google Drive, Slack, etc. integrations

Choose LangGraph when:

  • ✅ Complex state management required
  • ✅ Multi-step workflows with branching
  • ✅ Workflow visualization needed
  • ✅ Python-only acceptable
  • ✅ LangChain ecosystem integration

Language Selection Criteria

Choose TypeScript when:

  • Web/API integrations primary focus
  • Production web agents
  • Bun runtime benefits valued
  • Strict compile-time typing preferred
  • Validation via Zod

Choose Python when:

  • Data science/ML integrations needed
  • Rich scientific computing ecosystem required
  • Async patterns (asyncio) important
  • Validation via Pydantic
  • NumPy, Pandas, SciPy, etc. needed

Complexity Level Guidance

Low Complexity (1-3 tools, 0 subagents)

  • Single-purpose agents
  • Direct context passing
  • Simple workflows (1-3 stages)
  • Example: Document Q&A, simple search

Medium Complexity (4-7 tools, 0-2 subagents)

  • Multi-stage workflows
  • Subagent specialization
  • Moderate state management
  • Example: Financial analysis, product comparison

High Complexity (8+ tools, 3+ subagents)

  • Multi-agent orchestration
  • Parallel execution patterns
  • Complex state management
  • Example: Portfolio analysis, multi-agent collaboration

Proven Patterns Catalog

Pattern 1: Zod/Pydantic Validation

Purpose: Type-safe input validation with runtime checking

When to use: All agents should validate tool inputs

Implementation:

// TypeScript with Zod
const ToolInputSchema = z.object({
  param: z.number().positive(),
  option: z.enum(['a', 'b', 'c'])
});
const validated = ToolInputSchema.parse(input);
# Python with Pydantic
class ToolInput(BaseModel):
    param: int = Field(gt=0)
    option: Literal['a', 'b', 'c']

Pattern 2: Subagent Specialization

Purpose: Delegate tasks by complexity or domain

When to use: Different tasks require different reasoning depth

Implementation:

  • Sonnet for complex reasoning and deep analysis
  • Haiku for optimization and quick calculations
  • Pass clear context in subagent prompts
  • Define specific success criteria

Pattern 3: Mock Data First

Purpose: Development independence from external APIs

When to use: Always (accelerates testing, enables local development)

Implementation:

  • Structure mock data identically to real API responses
  • Use clear TODO comments for replacement points
  • Support both mock and real via environment config
  • Test with mock, deploy with real

Pattern 4: Service-Based Architecture

Purpose: Separation of concerns, testability, reusability

When to use: Medium to high complexity agents

Implementation:

  • Models layer (Pydantic validation)
  • Services layer (business logic)
  • Tools layer (Claude-callable wrappers)

Pattern 5: Parallel Execution

Purpose: Performance improvement for independent tasks

When to use: Multiple agents/tools with no dependencies

Implementation:

# Python with asyncio
results = await asyncio.gather(
    agent1.run(task1),
    agent2.run(task2),
    agent3.run(task3)
)

Results: 65%+ faster for 3+ parallel tasks

Pattern 6: Weighted Scoring

Purpose: Multi-dimensional assessment with transparent calculation

When to use: Suitability analysis, ranking, prioritization

Implementation:

score = (
    compliance_score * 0.35 +
    risk_score * 0.25 +
    performance_score * 0.25 +
    time_horizon_score * 0.15
)

Pattern 7: Direct Context Passing (No RAG)

Purpose: Avoid RAG complexity for small document collections

When to use:

  • Working with 1 file at a time
  • Documents are 5-20 pages each
  • Entire document fits in LLM context window
  • User doesn't need semantic search across 100s of documents

Benefits: 3-step workflow vs 7+ steps with RAG

Pattern 8: Graceful Degradation

Purpose: Partial success better than total failure

When to use: Always (production-grade systems)

Implementation:

  • Handle missing data → Skip specific calculations, continue
  • Handle API unavailable → Use cached/mock data
  • Handle timeouts → Return partial results with warnings
  • Always provide user feedback on degraded operation

Quality Checklists

Before Starting Checklist

  • Can explain agent purpose in 1-2 sentences
  • Success criteria clearly defined (3-5 measurable goals)
  • SDK/language selection justified
  • Checked for similar existing patterns
  • Confirmed this is the simplest solution

Design Complete Checklist

  • Agent persona clearly defined
  • 3-5 tools designed with input/output specs
  • Workflow stages identified and ordered
  • Data model designed with type safety
  • Pattern selections justified
  • Edge cases identified
  • Known limitations documented

Implementation Ready Checklist

  • All tools have validation schemas
  • Error handling strategy defined
  • Mock data structure planned
  • Test strategy outlined
  • Documentation templates ready

Production Ready Checklist

  • All tools handle errors gracefully
  • Input validation on all tools (Zod/Pydantic)
  • Type safety throughout
  • Naming conventions followed
  • Comprehensive CLAUDE.md exists
  • README.md with quick start
  • Manual QA scenarios tested
  • Edge cases handled
  • Performance acceptable (<2min workflows)

How to Use This Skill

Starting Phase 1: Design & Documentation

Simply say:

  • "Help me design an agent for [use case]"
  • "I want to build an agent that [does something]"
  • "Guide me through planning an agent for [problem]"

The skill will:

  1. Ask interactive questions through BRAINSTORM and DESIGN stages
  2. Help you make informed decisions using frameworks and patterns
  3. Generate comprehensive design documentation
  4. Create scaffolding manifest for Phase 2

Starting Phase 2: Scaffolding

Provide the design document from Phase 1:

  • "Scaffold this agent design in [target directory]"
  • "Generate the project structure for this design"
  • "Create the boilerplate from this design document"

The skill will:

  1. Parse your design document
  2. Create directory structure
  3. Generate boilerplate code
  4. Setup configuration files
  5. Create pre-filled documentation

Outputs

Phase 1 Outputs

1. Agent Design Document (agent-design-document.md)

  • Complete agent specification
  • All architectural decisions documented
  • Tool specifications with validation schemas
  • Workflow stages defined
  • Pattern selections justified
  • Scaffolding instructions included
  • Human-readable and comprehensive

2. Scaffolding Manifest (scaffolding-manifest.json)

  • Machine-readable specification
  • File structure definition
  • Configuration parameters
  • Code generation instructions
  • Used by Phase 2 for automation

Phase 2 Outputs

1. Project Structure

project-name/
├── src/
│   ├── index.ts (or main.py)
│   ├── types/
│   ├── tools/
│   └── data/
├── .claude/
│   └── CLAUDE.md
├── tests/
├── package.json (or requirements.txt)
├── .env.example
├── README.md
└── tsconfig.json (or pyproject.toml)

2. Boilerplate Code

  • Agent configuration with system prompt
  • Tool templates with validation
  • Type definitions and schemas
  • Mock data structures
  • Error handling patterns

3. Configuration Files

  • Package manager config (package.json / requirements.txt)
  • TypeScript config (tsconfig.json) or Python config (pyproject.toml)
  • Environment template (.env.example)
  • Git ignore rules

4. Documentation

  • Pre-filled CLAUDE.md from design
  • README.md with quick start
  • Inline code comments
  • Test structure

Best Practices

Keep It Simple

  • Start with 3-5 tools, add more only if needed
  • Use 0-2 subagents for most agents
  • Apply patterns only when they solve real problems
  • Avoid over-engineering

Document as You Go

  • Update design document if requirements change
  • Keep CLAUDE.md current
  • Document "why" not just "what"
  • Capture learnings in memory system

Validate Early

  • Use Zod/Pydantic from the start
  • Handle errors at tool boundaries
  • Test with invalid inputs immediately
  • Don't defer error handling

Think Production-Grade

  • Error handling from day one
  • No "fix it later" mentality
  • Mock data structured like production data
  • Documentation complete before deployment

Iterate Incrementally

  • Build tools one at a time
  • Test each tool individually
  • Add complexity gradually
  • Commit often

Common Pitfalls to Avoid

Over-engineering - Building for imaginary future needs ✅ Do instead: Build exactly what's needed now, refactor later

Too many tools - Starting with 10+ tools ✅ Do instead: Start with 3-5 focused tools, add as needed

Skipping validation - "I'll add validation later" ✅ Do instead: Zod/Pydantic schemas from the first tool

Unclear workflows - Agent does "everything" ✅ Do instead: Define clear stages (Discovery → Analysis → Recommendation)

Poor error handling - Assuming APIs always work ✅ Do instead: Graceful degradation, clear error messages

Documentation debt - "I'll document it when it's done" ✅ Do instead: Design document before code, update as you build

Premature optimization - Complex caching before measuring performance ✅ Do instead: Build simple version, measure, then optimize

Examples

See the examples/ directory for:

  • Complete design document example
  • Sample scaffolding output
  • Before/after comparisons
  • Real-world agent designs

Progressive Standards Integration

This skill offers optional progressive hints at key decision points throughout the workflow. When you see a hint, you can choose to apply the referenced standard or skip it based on your project's needs.

How Progressive Hints Work

At specific stages, you'll be prompted with questions like:

💡 "Apply 6 Principles decision framework?" 💡 "Classify with Agent Taxonomy (Level 0-4)?" 💡 "Apply deployment gates checklist?"

If you answer Yes: The skill will guide you through applying that standard, referencing the appropriate documentation.

If you answer No: The skill continues without enforcement—standards are guidance, not gates.

When Standards Apply

Scope Apply These Standards
All Agents Zod/Pydantic validation, error handling, basic logging
Production 6 Principles check, testing pyramid (80/15/5), deployment gates
Enterprise Full OTel observability, evaluation framework, SPIFFE identity, A2A protocol

Hint Trigger Points

Stage Hints Available
BRAINSTORM "Apply 6 Principles decision framework?", "Classify with Agent Taxonomy?"
DESIGN "Apply A2A protocol standards?", "Need memory strategy guidance?"
IMPLEMENT "Apply coding standards (lint/validation)?"
TEST "Apply testing pyramid (80/15/5)?"
DEPLOY "Apply deployment gates checklist?"
OBSERVE "Setup OpenTelemetry observability?"
EVALUATE "Define evaluation framework?"
ITERATE "Setup feedback loops?"

Standards & Principles

This skill integrates organizational standards and foundational principles. Reference documents provide quick application guidance; source documents contain full specifications.

Quick Reference Files (Application Guides)

  • references/principles-framework.md - 6 foundational principles (TRUTH, HONOR, EXCELLENCE, SERVE, PERSEVERE, SHARPEN)
  • references/agent-taxonomy.md - Agent classification (Level 0-4), A2A protocol
  • references/memory-strategies.md - STM/LTM/Hybrid, RAG vs NL2SQL decisions
  • references/coding-standards.md - Validation (Zod/Pydantic), testing pyramid, linting
  • references/operations-guide.md - Deploy, Observe, Evaluate guidance
  • references/standards-integration.md - Stage-to-standards mapping, quick links

Source Documents (Full Standards)

  • docs/principles/CORE_FRAMEWORK.md - Biblical principles, cascade framework
  • docs/standards/AGENTIC.md - Agent architecture, lifecycle, memory, A2A
  • docs/standards/DEVELOPMENT.md - Coding standards, testing, git workflow
  • docs/standards/OPERATIONS.md - Observability, evaluation, security

Support References

  • references/workflow-stages.md - Detailed stage instructions and questioning templates
  • references/decision-frameworks.md - Complete SDK selection matrices and criteria
  • references/pattern-catalog.md - All proven patterns with implementation details
  • references/quality-checklists.md - Comprehensive quality validation criteria
  • references/principles-framework.md - 6 foundational principles application guide
  • references/agent-taxonomy.md - Agent classification and A2A protocol
  • references/memory-strategies.md - Memory strategy selection guide
  • references/coding-standards.md - Coding standards and validation guide
  • references/operations-guide.md - Operations, deployment, and evaluation guide
  • references/standards-integration.md - Stage-to-standards mapping
  • templates/design-document-template.md - Template for design output
  • templates/scaffolding-manifest.json - Template for manifest output

Version History

v1.1.0 (2025-01-24)

  • Expanded to 9-stage workflow (added DEPLOY, OBSERVE, EVALUATE)
  • Added Phase 3: Operations
  • Integrated 6 foundational principles (TRUTH, HONOR, EXCELLENCE, SERVE, PERSEVERE, SHARPEN)
  • Added progressive hints system for optional standards application
  • Added reference files for principles, agent taxonomy, memory strategies, coding standards, operations
  • Added standards-integration.md for stage-to-standards mapping

v1.0.0 (2025-01-24)

  • Initial release
  • Complete 6-stage workflow
  • Two-phase operation (Design + Scaffolding)
  • Interactive questioning framework
  • Proven patterns catalog
  • Decision frameworks
  • Quality checklists

Remember: Excellence is not about complexity. It's about doing simple things consistently well, with transparency, clear documentation, and continuous improvement.