| name | agent-builder |
| description | Interactive AI agent development workflow orchestrator. Guides users through complete agent creation from brainstorming to production-ready implementation. Use when user wants to build a new AI agent, design an agentic system, scaffold an agent project, or needs help with agent architecture decisions. Supports three-phase workflow - design documentation, implementation scaffolding, and operations (deploy/observe/evaluate). |
| version | 1.1.0 |
| last-updated | 2025-01-24 |
Agent Builder Skill
Overview
The Agent Builder skill is an interactive workflow orchestrator that guides you through building production-ready AI agents from concept to deployment. It implements a proven 9-stage methodology and supports three-phase operation for maximum flexibility:
- Phase 1 (Design): Interactive planning and comprehensive design documentation generation
- Phase 2 (Scaffolding): Project setup and boilerplate code generation in any target location
- Phase 3 (Operations): Deploy, observe, and evaluate in production with continuous improvement
The 9 Stages:
1. BRAINSTORM → 2. DESIGN → 3. IMPLEMENT → 4. TEST → 5. DOCUMENT → 6. DEPLOY → 7. OBSERVE → 8. EVALUATE → 9. ITERATE
This skill helps you make informed architectural decisions, apply proven patterns, avoid common pitfalls, and produce high-quality documentation from the start. It integrates 6 foundational principles and organizational standards through optional progressive hints at key decision points.
When to Use This Skill
Activate this skill when you:
- Want to build a new AI agent from scratch
- Need to design an agentic system or multi-agent architecture
- Are planning an agent project and need architecture guidance
- Want to scaffold an agent project with proper structure
- Need help with SDK selection (Claude SDK, OpenAI Agents, Strands, LangGraph, etc.)
- Want to apply proven agent patterns to your use case
- Need comprehensive design documentation before implementation
- Want to separate design and implementation phases
Trigger phrases:
- "Help me build an agent for..."
- "I want to create an AI agent that..."
- "Guide me through designing an agent for..."
- "What's the best way to build an agent that..."
- "I need to scaffold a new agent project"
- "Help me choose the right SDK for..."
How This Skill Works
Three-Phase Workflow
Phase 1: Design & Documentation (Interactive)
Purpose: Plan and document your agent thoroughly before writing code
Process: Interactive guided questioning through:
- BRAINSTORM - Define problem, use case, success criteria, choose SDK/language
- DESIGN - Define agent persona, design tools, plan workflow, choose patterns
Output:
- `agent-design-document.md` - Comprehensive human-readable design specification
- `scaffolding-manifest.json` - Machine-readable specification for code generation
Key Benefits:
- Document in any location (doesn't require target project to exist)
- Review and approve design before implementation
- Version control design decisions
- Share designs with team for feedback
- Portable between projects
Phase 2: Scaffolding & Implementation
Purpose: Generate project structure and boilerplate from design document
Input: Design document from Phase 1
Process:
- Parse design document and manifest
- Create directory structure
- Generate boilerplate code with placeholders
- Create configuration files
- Generate pre-filled documentation
- Setup testing structure
Output: Complete project structure ready for customization
Key Benefits:
- Can scaffold in a different location than the design
- Consistent project structure
- Pre-filled documentation
- Validation schemas scaffolded
- 2-4 hours of setup time saved
Phase 3: Operations (Production)
Purpose: Deploy, observe, evaluate, and iterate on the agent in production
Process: Covered by Stages 6-9 (DEPLOY → OBSERVE → EVALUATE → ITERATE) below
Interactive Questioning Framework
The skill asks targeted questions at each stage to gather requirements and guide decisions. Questions are based on proven best practices and real-world patterns.
Question Types:
- Decision questions - Choose between well-defined options (SDK, language, complexity)
- Definition questions - Describe use case, success criteria, tool specifications
- Validation questions - Confirm decisions, check for missing requirements
- Pattern questions - Identify which proven patterns apply to your use case
Complete Workflow Stages
Stage 1: BRAINSTORM (Define Problem Space)
Objective: Understand the problem and make foundational decisions
Questions Asked:
Use Case Definition
- What problem does this agent solve?
- Who are the primary users?
- What makes this valuable?
- What would success look like?
Success Criteria
- What are 3-5 measurable goals?
- What are the key edge cases to handle?
- What are acceptable limitations?
SDK Selection
- Need multi-agent orchestration?
- Model flexibility required?
- Complex state management needed?
- Performance requirements?
- Recommendation: Claude SDK / OpenAI Agents / Strands / LangGraph
Language Selection
- TypeScript (type safety, web/API focus, Bun runtime)
- Python (data science, ML libraries, asyncio)
Complexity Estimation
- Low (1-3 tools, no subagents, single purpose)
- Medium (4-7 tools, 0-2 subagents, moderate workflows)
- High (8+ tools, 3+ subagents, complex orchestration)
Output: Clear problem statement, success criteria, SDK/language selected, complexity assessed
Stage 2: DESIGN (Create Architectural Blueprint)
Objective: Design the complete agent architecture
Questions Asked:
Agent Persona & Role
- What expertise does the agent have?
- What communication style is appropriate?
- What are the agent's limitations?
- What should the agent never do?
Tool Design (3-5 recommended to start)
- Tool name and purpose
- Input parameters and types
- Output structure and types
- Validation requirements (Zod/Pydantic schemas)
- Error conditions to handle
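Captured concretely, a tool specification from this stage might look like the following TypeScript sketch (the tool name, fields, and error cases are illustrative, not prescribed):

```typescript
import { z } from 'zod';

// Illustrative tool specification: names, fields, and messages are placeholders
const SearchDocsInput = z.object({
  query: z.string().min(1, 'query must not be empty'),
  maxResults: z.number().int().positive().default(5),
});

const SearchDocsOutput = z.object({
  results: z.array(z.object({ title: z.string(), snippet: z.string() })),
  truncated: z.boolean(), // true when more matches exist than maxResults
});

// Error conditions to document alongside the schemas:
// - query matches nothing -> return empty results, not an error
// - upstream index unavailable -> surface a clear, retryable error
```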
Workflow Design
- What are the conversation stages?
- How does the conversation progress?
- What information is collected at each stage?
- How does the workflow handle failures?
Subagent Planning (if needed)
- What specialized tasks need delegation?
- Which model for which subagent? (Sonnet for analysis, Haiku for optimization)
- How do subagents communicate?
- What context do they need?
Data Model Design
- What data structures are needed?
- How is state managed?
- What needs persistence?
- Type safety approach (Zod for TS, Pydantic for Python)
MCP Integration (if applicable)
- Which MCP servers are needed?
- What documentation sources?
- Custom MCP tools required?
Pattern Selection
- Which proven patterns apply? (see Pattern Catalog)
- Validation pattern (Zod/Pydantic)
- Subagent pattern (Specialization, Parallel, Handoff)
- Data pattern (Mock-first, Service-based, Direct context)
- Scoring pattern (Weighted, Multi-criteria, Suitability)
Output: Complete architectural blueprint with all decisions documented
Stage 3: IMPLEMENT (Guided Implementation)
Note: This stage begins Phase 2 (Scaffolding)
Objective: Generate project structure and guide implementation
Activities:
- Project Setup - Create directory structure, install dependencies
- Tool Scaffolding - Generate tool templates with validation
- Agent Configuration - Create agent config with system prompt
- Data Model Setup - Generate type definitions and schemas
- Mock Data - Create realistic mock data structures
- Testing Structure - Setup test framework and initial tests
Guidance Provided:
- Best practices for tool implementation
- Error handling patterns
- Validation schema examples
- Type safety recommendations
- Naming conventions
- Code organization
Stage 4: TEST (Validation Strategy)
Objective: Plan comprehensive testing approach
Test Levels:
- Unit Tests - Individual tool testing
- Integration Tests - End-to-end workflows
- Edge Case Tests - Invalid inputs, failures, timeouts
- Performance Tests - Response times, token usage
- Manual QA - Realistic scenarios, UX verification
Guidance Provided:
- Test structure templates
- Critical test scenarios
- Edge cases to consider
- Performance benchmarks
- QA checklist
Stage 5: DOCUMENT (Comprehensive Documentation)
Objective: Create production-grade documentation
Documents Generated:
Agent CLAUDE.md - Complete agent documentation
- Overview and purpose
- Directory structure
- Setup instructions
- Tool descriptions
- Workflow explanation
- Testing approach
- Known limitations
- Key decisions and rationale
README.md - Quick start guide
- Installation steps
- Basic usage
- Configuration
- Examples
Comparison Matrix Entry - Template for comparison documentation
Memory System Entries - Knowledge graph templates (entities, relations, observations)
Tool Catalog Contributions - Reusable pattern documentation
Stage 6: DEPLOY (Production Release)
Note: This stage begins Phase 3 (Operations)
Objective: Safely release agent to production
Activities:
- Deployment Gates - Offline evals → Red team → Staging → Production
- Rollout Strategy - Canary (1%) → Limited (10%) → Gradual (50%) → Full (100%)
- Rollback Planning - Automatic triggers, manual procedures
- Agent Identity - Register SPIFFE ID (if applicable)
- Inventory Documentation - Add to central agent inventory
Guidance Provided:
- Deployment gates checklist
- Rollout strategy templates
- Rollback trigger criteria
- Security hardening requirements
💡 Progressive Hint: "Apply deployment gates checklist?" → See `references/operations-guide.md`
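One way to make the rollout strategy above explicit is to capture it as data so gates and triggers are reviewable; the structure below is an assumed sketch, not a required format:

```typescript
// Hypothetical rollout plan mirroring the canary -> full strategy above
interface RolloutStage {
  name: string;
  trafficPercent: number;
  minSoakHours: number;        // time to observe before advancing
  rollbackTriggers: string[];  // conditions that auto-revert this stage
}

const rolloutPlan: RolloutStage[] = [
  { name: 'canary',  trafficPercent: 1,   minSoakHours: 24, rollbackTriggers: ['error_rate > 2%', 'p95_latency > 5s'] },
  { name: 'limited', trafficPercent: 10,  minSoakHours: 24, rollbackTriggers: ['error_rate > 1%'] },
  { name: 'gradual', trafficPercent: 50,  minSoakHours: 48, rollbackTriggers: ['error_rate > 1%'] },
  { name: 'full',    trafficPercent: 100, minSoakHours: 0,  rollbackTriggers: ['error_rate > 1%'] },
];
```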
Stage 7: OBSERVE (Production Monitoring)
Objective: Monitor agent behavior and performance in production
Activities:
- Tracing Setup - Root span per request with OTel taxonomy
- Metrics Configuration - Latency (p50/p90/p95/p99), token usage, cost
- Logging Standards - Structured JSON, no PII, appropriate levels
- Alerting - Define thresholds and notification channels
Guidance Provided:
- OpenTelemetry configuration templates
- Required trace attributes
- Metric collection standards
- Log sanitization guidelines
💡 Progressive Hint: "Setup OpenTelemetry observability?" → See `references/operations-guide.md`
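A minimal tracing sketch using the OpenTelemetry JavaScript API (`@opentelemetry/api`); the span name, attribute, and `runAgent` helper are illustrative assumptions:

```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

declare function runAgent(query: string): Promise<string>; // placeholder for your agent entry point

const tracer = trace.getTracer('my-agent'); // tracer name is illustrative

async function handleRequest(userQuery: string): Promise<string> {
  // One root span per request, as recommended above
  return tracer.startActiveSpan('agent.request', async (span) => {
    try {
      span.setAttribute('agent.query_length', userQuery.length); // no PII: record length, not content
      const answer = await runAgent(userQuery);
      span.setStatus({ code: SpanStatusCode.OK });
      return answer;
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```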
Stage 8: EVALUATE (Performance Assessment)
Objective: Comprehensively assess agent performance and quality
Activities:
- Offline Evaluation - Benchmarks, RAG metrics, safety testing
- Online Evaluation - A/B tests, shadow traffic, canary analysis
- Human Feedback - In-product feedback, review queues
- Success Metrics - Acceptance rate, win rate, task completion
Guidance Provided:
- Evaluation framework templates
- Metric calculation methods
- Feedback collection patterns
- Feedback → Eval Case pipeline
💡 Progressive Hint: "Define evaluation framework (Offline/Online/Human)?" → See `references/operations-guide.md`
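To make the feedback → eval case pipeline concrete, here is one possible shape for a durable eval case (field names are assumptions, not a required schema):

```typescript
// Hypothetical shape for a durable eval case derived from user feedback
interface EvalCase {
  id: string;
  input: string;              // the prompt or task that triggered feedback
  expected: string;           // reference answer or rubric
  source: 'user_feedback' | 'synthetic' | 'regression';
  tags: string[];             // e.g. ['edge-case', 'tool:search']
  createdAt: string;          // ISO 8601 timestamp
}

// Feedback loop: a thumbs-down in production becomes a permanent test case
function feedbackToEvalCase(input: string, expected: string): EvalCase {
  return {
    id: crypto.randomUUID(), // Node 19+/browser global
    input,
    expected,
    source: 'user_feedback',
    tags: [],
    createdAt: new Date().toISOString(),
  };
}
```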
Stage 9: ITERATE (Continuous Improvement)
Objective: Plan for feedback and iteration
Activities:
- Feedback Collection - User input, metrics, errors, usage patterns
- Improvement Identification - What worked/didn't, missing features
- Incremental Updates - Bug fixes, new tools, prompt improvements
- Documentation Updates - Keep docs current with changes
- Learning Capture - Update catalogs and patterns
- Feedback Loop Closure - Convert feedback into durable eval cases
💡 Progressive Hint: "Setup feedback loops?" → See `references/operations-guide.md`
Decision Frameworks
SDK Selection Matrix
Choose Claude SDK when:
- ✅ Excellent reasoning required (complex decision-making)
- ✅ Anthropic models preferred
- ✅ Simple subagent support sufficient
- ✅ Conversational agents
- ⚠️ Multi-agent orchestration limited
- ⚠️ Anthropic-only (not multi-provider)
Choose OpenAI Agents when:
- ✅ Multi-agent coordination critical
- ✅ Parallel + handoff patterns needed
- ✅ Built-in session memory valuable (SQLite)
- ✅ OpenAI models preferred
- ⚠️ OpenAI-only (not multi-provider)
Choose Strands Agents when:
- ✅ Model flexibility required (multi-provider)
- ✅ Lightweight deployment
- ✅ Extensive MCP integration
- ✅ Model-driven architecture
- ✅ Google Drive, Slack, etc. integrations
Choose LangGraph when:
- ✅ Complex state management required
- ✅ Multi-step workflows with branching
- ✅ Workflow visualization needed
- ✅ Python-only acceptable
- ✅ LangChain ecosystem integration
Language Selection Criteria
Choose TypeScript when:
- Web/API integrations primary focus
- Production web agents
- Bun runtime benefits valued
- Strict compile-time typing preferred
- Validation via Zod
Choose Python when:
- Data science/ML integrations needed
- Rich scientific computing ecosystem required
- Async patterns (asyncio) important
- Validation via Pydantic
- NumPy, Pandas, SciPy, etc. needed
Complexity Level Guidance
Low Complexity (1-3 tools, 0 subagents)
- Single-purpose agents
- Direct context passing
- Simple workflows (1-3 stages)
- Example: Document Q&A, simple search
Medium Complexity (4-7 tools, 0-2 subagents)
- Multi-stage workflows
- Subagent specialization
- Moderate state management
- Example: Financial analysis, product comparison
High Complexity (8+ tools, 3+ subagents)
- Multi-agent orchestration
- Parallel execution patterns
- Complex state management
- Example: Portfolio analysis, multi-agent collaboration
Proven Patterns Catalog
Pattern 1: Zod/Pydantic Validation
Purpose: Type-safe input validation with runtime checking
When to use: All agents should validate tool inputs
Implementation:
```typescript
// TypeScript with Zod
import { z } from 'zod';

const ToolInputSchema = z.object({
  param: z.number().positive(),
  option: z.enum(['a', 'b', 'c'])
});

// Throws a ZodError describing every invalid field
const validated = ToolInputSchema.parse(input);
```

```python
# Python with Pydantic
from typing import Literal
from pydantic import BaseModel, Field

class ToolInput(BaseModel):
    param: int = Field(gt=0)
    option: Literal['a', 'b', 'c']
```
Pattern 2: Subagent Specialization
Purpose: Delegate tasks by complexity or domain
When to use: Different tasks require different reasoning depth
Implementation:
- Sonnet for complex reasoning and deep analysis
- Haiku for optimization and quick calculations
- Pass clear context in subagent prompts
- Define specific success criteria
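A routing sketch under the assumption of a generic `runSubagent` helper (the real call and model IDs depend on your chosen SDK):

```typescript
// Hypothetical helper: the actual invocation depends on your SDK
declare function runSubagent(opts: { model: string; prompt: string }): Promise<string>;

type TaskKind = 'analysis' | 'optimization';

// Route by reasoning depth: Sonnet for deep analysis, Haiku for quick calculations
const MODEL_FOR_TASK: Record<TaskKind, string> = {
  analysis: 'claude-sonnet',     // model IDs are placeholders
  optimization: 'claude-haiku',
};

async function delegate(kind: TaskKind, context: string): Promise<string> {
  return runSubagent({
    model: MODEL_FOR_TASK[kind],
    // Pass clear context and explicit success criteria, per the guidance above
    prompt: `Task: ${kind}\nContext: ${context}\nSucceed when: output is complete and verifiable.`,
  });
}
```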
Pattern 3: Mock Data First
Purpose: Development independence from external APIs
When to use: Always (accelerates testing, enables local development)
Implementation:
- Structure mock data identically to real API responses
- Use clear TODO comments for replacement points
- Support both mock and real via environment config
- Test with mock, deploy with real
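One way this might look in TypeScript, with both implementations behind the same interface (the `USE_MOCK_DATA` variable and service shape are assumptions):

```typescript
// Both implementations return the same shape, so callers never change
interface QuoteService {
  getQuote(symbol: string): Promise<{ symbol: string; price: number }>;
}

const mockQuotes: QuoteService = {
  // Mock data structured identically to the real API response
  async getQuote(symbol) {
    return { symbol, price: 123.45 };
  },
};

const realQuotes: QuoteService = {
  async getQuote(symbol) {
    // TODO: replace with the real API call when available
    throw new Error('real API not wired up yet');
  },
};

// Switch via environment config; USE_MOCK_DATA is an illustrative name
export const quotes: QuoteService =
  process.env.USE_MOCK_DATA === 'true' ? mockQuotes : realQuotes;
```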
Pattern 4: Service-Based Architecture
Purpose: Separation of concerns, testability, reusability
When to use: Medium to high complexity agents
Implementation:
- Models layer (Pydantic validation)
- Services layer (business logic)
- Tools layer (Claude-callable wrappers)
Pattern 5: Parallel Execution
Purpose: Performance improvement for independent tasks
When to use: Multiple agents/tools with no dependencies
Implementation:
```python
# Python with asyncio (run inside an async function)
import asyncio

# Independent tasks start concurrently; total time ~ the slowest task
results = await asyncio.gather(
    agent1.run(task1),
    agent2.run(task2),
    agent3.run(task3)
)
```
Results: 65%+ faster for 3+ parallel tasks
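For TypeScript agents, the equivalent sketch uses `Promise.all` (the agent objects and their `run` signatures are assumed):

```typescript
// Hypothetical agents sharing a common run(task) interface
declare const agent1: { run(task: string): Promise<string> };
declare const agent2: { run(task: string): Promise<string> };
declare const agent3: { run(task: string): Promise<string> };

// Independent tasks start concurrently; total time ~ the slowest task
const [r1, r2, r3] = await Promise.all([
  agent1.run('task1'),
  agent2.run('task2'),
  agent3.run('task3'),
]);
```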
Pattern 6: Weighted Scoring
Purpose: Multi-dimensional assessment with transparent calculation
When to use: Suitability analysis, ranking, prioritization
Implementation:
```python
# Weights sum to 1.0, keeping the score on the same scale as its inputs
score = (
    compliance_score * 0.35 +
    risk_score * 0.25 +
    performance_score * 0.25 +
    time_horizon_score * 0.15
)
```
Pattern 7: Direct Context Passing (No RAG)
Purpose: Avoid RAG complexity for small document collections
When to use:
- Working with 1 file at a time
- Documents are 5-20 pages each
- Entire document fits in LLM context window
- User doesn't need semantic search across 100s of documents
Benefits: 3-step workflow vs 7+ steps with RAG
Pattern 8: Graceful Degradation
Purpose: Partial success better than total failure
When to use: Always (production-grade systems)
Implementation:
- Handle missing data → Skip specific calculations, continue
- Handle API unavailable → Use cached/mock data
- Handle timeouts → Return partial results with warnings
- Always provide user feedback on degraded operation
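A sketch of what this could look like in TypeScript, running each section independently and downgrading failures to warnings (helper names are illustrative):

```typescript
// Run every section, keep what succeeds, and report what was skipped
interface AnalysisResult {
  sections: Record<string, unknown>;
  warnings: string[];
}

async function analyzeWithDegradation(
  tasks: Record<string, () => Promise<unknown>>,
): Promise<AnalysisResult> {
  const result: AnalysisResult = { sections: {}, warnings: [] };
  const outcomes = await Promise.all(
    Object.entries(tasks).map(async ([name, fn]) => {
      try {
        return { name, ok: true as const, value: await fn() };
      } catch (err) {
        return { name, ok: false as const, error: String(err) };
      }
    }),
  );
  for (const o of outcomes) {
    if (o.ok) {
      result.sections[o.name] = o.value;
    } else {
      // Partial success: skip the failed section and tell the user why
      result.warnings.push(`Skipped "${o.name}": ${o.error}`);
    }
  }
  return result;
}
```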
Quality Checklists
Before Starting Checklist
- Can explain agent purpose in 1-2 sentences
- Success criteria clearly defined (3-5 measurable goals)
- SDK/language selection justified
- Checked for similar existing patterns
- Confirmed this is the simplest solution
Design Complete Checklist
- Agent persona clearly defined
- 3-5 tools designed with input/output specs
- Workflow stages identified and ordered
- Data model designed with type safety
- Pattern selections justified
- Edge cases identified
- Known limitations documented
Implementation Ready Checklist
- All tools have validation schemas
- Error handling strategy defined
- Mock data structure planned
- Test strategy outlined
- Documentation templates ready
Production Ready Checklist
- All tools handle errors gracefully
- Input validation on all tools (Zod/Pydantic)
- Type safety throughout
- Naming conventions followed
- Comprehensive CLAUDE.md exists
- README.md with quick start
- Manual QA scenarios tested
- Edge cases handled
- Performance acceptable (<2min workflows)
How to Use This Skill
Starting Phase 1: Design & Documentation
Simply say:
- "Help me design an agent for [use case]"
- "I want to build an agent that [does something]"
- "Guide me through planning an agent for [problem]"
The skill will:
- Ask interactive questions through BRAINSTORM and DESIGN stages
- Help you make informed decisions using frameworks and patterns
- Generate comprehensive design documentation
- Create scaffolding manifest for Phase 2
Starting Phase 2: Scaffolding
Provide the design document from Phase 1:
- "Scaffold this agent design in [target directory]"
- "Generate the project structure for this design"
- "Create the boilerplate from this design document"
The skill will:
- Parse your design document
- Create directory structure
- Generate boilerplate code
- Setup configuration files
- Create pre-filled documentation
Outputs
Phase 1 Outputs
1. Agent Design Document (agent-design-document.md)
- Complete agent specification
- All architectural decisions documented
- Tool specifications with validation schemas
- Workflow stages defined
- Pattern selections justified
- Scaffolding instructions included
- Human-readable and comprehensive
2. Scaffolding Manifest (scaffolding-manifest.json)
- Machine-readable specification
- File structure definition
- Configuration parameters
- Code generation instructions
- Used by Phase 2 for automation
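As a rough illustration only, the manifest's contents might be described by a type like this; the authoritative shape is `templates/scaffolding-manifest.json`:

```typescript
// Hypothetical shape of scaffolding-manifest.json; field names are assumptions
interface ScaffoldingManifest {
  agentName: string;
  language: 'typescript' | 'python';
  sdk: string;                    // e.g. 'claude-sdk'
  complexity: 'low' | 'medium' | 'high';
  tools: Array<{
    name: string;
    description: string;
    inputSchema: string;          // name of the Zod/Pydantic schema to generate
  }>;
  files: string[];                // paths to generate, relative to project root
}
```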
Phase 2 Outputs
1. Project Structure
```
project-name/
├── src/
│   ├── index.ts (or main.py)
│   ├── types/
│   ├── tools/
│   └── data/
├── .claude/
│   └── CLAUDE.md
├── tests/
├── package.json (or requirements.txt)
├── .env.example
├── README.md
└── tsconfig.json (or pyproject.toml)
```
2. Boilerplate Code
- Agent configuration with system prompt
- Tool templates with validation
- Type definitions and schemas
- Mock data structures
- Error handling patterns
3. Configuration Files
- Package manager config (package.json / requirements.txt)
- TypeScript config (tsconfig.json) or Python config (pyproject.toml)
- Environment template (.env.example)
- Git ignore rules
4. Documentation
- Pre-filled CLAUDE.md from design
- README.md with quick start
- Inline code comments
- Test structure
Best Practices
Keep It Simple
- Start with 3-5 tools, add more only if needed
- Use 0-2 subagents for most agents
- Apply patterns only when they solve real problems
- Avoid over-engineering
Document as You Go
- Update design document if requirements change
- Keep CLAUDE.md current
- Document "why" not just "what"
- Capture learnings in memory system
Validate Early
- Use Zod/Pydantic from the start
- Handle errors at tool boundaries
- Test with invalid inputs immediately
- Don't defer error handling
Think Production-Grade
- Error handling from day one
- No "fix it later" mentality
- Mock data structured like production data
- Documentation complete before deployment
Iterate Incrementally
- Build tools one at a time
- Test each tool individually
- Add complexity gradually
- Commit often
Common Pitfalls to Avoid
❌ Over-engineering - Building for imaginary future needs
✅ Do instead: Build exactly what's needed now, refactor later

❌ Too many tools - Starting with 10+ tools
✅ Do instead: Start with 3-5 focused tools, add as needed

❌ Skipping validation - "I'll add validation later"
✅ Do instead: Zod/Pydantic schemas from the first tool

❌ Unclear workflows - Agent does "everything"
✅ Do instead: Define clear stages (Discovery → Analysis → Recommendation)

❌ Poor error handling - Assuming APIs always work
✅ Do instead: Graceful degradation, clear error messages

❌ Documentation debt - "I'll document it when it's done"
✅ Do instead: Design document before code, update as you build

❌ Premature optimization - Complex caching before measuring performance
✅ Do instead: Build simple version, measure, then optimize
Examples
See the examples/ directory for:
- Complete design document example
- Sample scaffolding output
- Before/after comparisons
- Real-world agent designs
Progressive Standards Integration
This skill offers optional progressive hints at key decision points throughout the workflow. When you see a hint, you can choose to apply the referenced standard or skip it based on your project's needs.
How Progressive Hints Work
At specific stages, you'll be prompted with questions like:
💡 "Apply 6 Principles decision framework?" 💡 "Classify with Agent Taxonomy (Level 0-4)?" 💡 "Apply deployment gates checklist?"
If you answer Yes: The skill will guide you through applying that standard, referencing the appropriate documentation.
If you answer No: The skill continues without enforcement—standards are guidance, not gates.
When Standards Apply
| Scope | Apply These Standards |
|---|---|
| All Agents | Zod/Pydantic validation, error handling, basic logging |
| Production | 6 Principles check, testing pyramid (80/15/5), deployment gates |
| Enterprise | Full OTel observability, evaluation framework, SPIFFE identity, A2A protocol |
Hint Trigger Points
| Stage | Hints Available |
|---|---|
| BRAINSTORM | "Apply 6 Principles decision framework?", "Classify with Agent Taxonomy?" |
| DESIGN | "Apply A2A protocol standards?", "Need memory strategy guidance?" |
| IMPLEMENT | "Apply coding standards (lint/validation)?" |
| TEST | "Apply testing pyramid (80/15/5)?" |
| DEPLOY | "Apply deployment gates checklist?" |
| OBSERVE | "Setup OpenTelemetry observability?" |
| EVALUATE | "Define evaluation framework?" |
| ITERATE | "Setup feedback loops?" |
Standards & Principles
This skill integrates organizational standards and foundational principles. Reference documents provide quick application guidance; source documents contain full specifications.
Quick Reference Files (Application Guides)
- `references/principles-framework.md` - 6 foundational principles (TRUTH, HONOR, EXCELLENCE, SERVE, PERSEVERE, SHARPEN)
- `references/agent-taxonomy.md` - Agent classification (Level 0-4), A2A protocol
- `references/memory-strategies.md` - STM/LTM/Hybrid, RAG vs NL2SQL decisions
- `references/coding-standards.md` - Validation (Zod/Pydantic), testing pyramid, linting
- `references/operations-guide.md` - Deploy, Observe, Evaluate guidance
- `references/standards-integration.md` - Stage-to-standards mapping, quick links
Source Documents (Full Standards)
- `docs/principles/CORE_FRAMEWORK.md` - Biblical principles, cascade framework
- `docs/standards/AGENTIC.md` - Agent architecture, lifecycle, memory, A2A
- `docs/standards/DEVELOPMENT.md` - Coding standards, testing, git workflow
- `docs/standards/OPERATIONS.md` - Observability, evaluation, security
Support References
- `references/workflow-stages.md` - Detailed stage instructions and questioning templates
- `references/decision-frameworks.md` - Complete SDK selection matrices and criteria
- `references/pattern-catalog.md` - All proven patterns with implementation details
- `references/quality-checklists.md` - Comprehensive quality validation criteria
- `references/principles-framework.md` - 6 foundational principles application guide
- `references/agent-taxonomy.md` - Agent classification and A2A protocol
- `references/memory-strategies.md` - Memory strategy selection guide
- `references/coding-standards.md` - Coding standards and validation guide
- `references/operations-guide.md` - Operations, deployment, and evaluation guide
- `references/standards-integration.md` - Stage-to-standards mapping
- `templates/design-document-template.md` - Template for design output
- `templates/scaffolding-manifest.json` - Template for manifest output
Version History
v1.1.0 (2025-01-24)
- Expanded to 9-stage workflow (added DEPLOY, OBSERVE, EVALUATE)
- Added Phase 3: Operations
- Integrated 6 foundational principles (TRUTH, HONOR, EXCELLENCE, SERVE, PERSEVERE, SHARPEN)
- Added progressive hints system for optional standards application
- Added reference files for principles, agent taxonomy, memory strategies, coding standards, operations
- Added standards-integration.md for stage-to-standards mapping
v1.0.0 (2025-01-24)
- Initial release
- Complete 6-stage workflow
- Two-phase operation (Design + Scaffolding)
- Interactive questioning framework
- Proven patterns catalog
- Decision frameworks
- Quality checklists
Remember: Excellence is not about complexity. It's about doing simple things consistently well, with transparency, clear documentation, and continuous improvement.