| name | agent-builder |
| description | Interactive AI agent development workflow orchestrator. Guides users through complete agent creation from brainstorming to production-ready implementation. Use when user wants to build a new AI agent, design an agentic system, scaffold an agent project, or needs help with agent architecture decisions. Supports three-phase workflow - design documentation, implementation scaffolding, and operations (deploy/observe/evaluate). |
| version | 1.1.0 |
| last-updated | 2025-01-24 |
Agent Builder Skill
Overview
The Agent Builder skill is an interactive workflow orchestrator that guides you through building production-ready AI agents from concept to deployment. It implements a proven 9-stage methodology and supports three-phase operation for maximum flexibility:
- Phase 1 (Design): Interactive planning and comprehensive design documentation generation
- Phase 2 (Scaffolding): Project setup and boilerplate code generation in any target location
- Phase 3 (Operations): Deploy, observe, and evaluate in production with continuous improvement
The 9 Stages:
1. BRAINSTORM → 2. DESIGN → 3. IMPLEMENT → 4. TEST → 5. DOCUMENT → 6. DEPLOY → 7. OBSERVE → 8. EVALUATE → 9. ITERATE
This skill helps you make informed architectural decisions, apply proven patterns, avoid common pitfalls, and produce high-quality documentation from the start. It integrates 6 foundational principles and organizational standards through optional progressive hints at key decision points.
When to Use This Skill
Activate this skill when you:
- Want to build a new AI agent from scratch
- Need to design an agentic system or multi-agent architecture
- Are planning an agent project and need architecture guidance
- Want to scaffold an agent project with proper structure
- Need help with SDK selection (Claude SDK, OpenAI Agents, Strands, LangGraph, etc.)
- Want to apply proven agent patterns to your use case
- Need comprehensive design documentation before implementation
- Want to separate design and implementation phases
Trigger phrases:
- "Help me build an agent for..."
- "I want to create an AI agent that..."
- "Guide me through designing an agent for..."
- "What's the best way to build an agent that..."
- "I need to scaffold a new agent project"
- "Help me choose the right SDK for..."
How This Skill Works
Three-Phase Workflow
Phase 1: Design & Documentation (Interactive)
Purpose: Plan and document your agent thoroughly before writing code
Process: Interactive guided questioning through:
- BRAINSTORM - Define problem, use case, success criteria, choose SDK/language
- DESIGN - Define agent persona, design tools, plan workflow, choose patterns
Output:
- `agent-design-document.md` - Comprehensive human-readable design specification
- `scaffolding-manifest.json` - Machine-readable specification for code generation
Key Benefits:
- Document in any location (doesn't require target project to exist)
- Review and approve design before implementation
- Version control design decisions
- Share designs with team for feedback
- Portable between projects
Phase 2: Scaffolding & Implementation
Purpose: Generate project structure and boilerplate from design document
Input: Design document from Phase 1
Process:
- Parse design document and manifest
- Create directory structure
- Generate boilerplate code with placeholders
- Create configuration files
- Generate pre-filled documentation
- Setup testing structure
Output: Complete project structure ready for customization
Key Benefits:
- Can scaffold in a different location than the design
- Consistent project structure
- Pre-filled documentation
- Validation schemas scaffolded
- 2-4 hours of setup time saved
Phase 3: Operations (Production)
Purpose: Deploy, observe, evaluate, and iterate on the agent in production
Process: Covered by Stages 6-9 (DEPLOY → OBSERVE → EVALUATE → ITERATE) below
Interactive Questioning Framework
The skill asks targeted questions at each stage to gather requirements and guide decisions. Questions are based on proven best practices and real-world patterns.
Question Types:
- Decision questions - Choose between well-defined options (SDK, language, complexity)
- Definition questions - Describe use case, success criteria, tool specifications
- Validation questions - Confirm decisions, check for missing requirements
- Pattern questions - Identify which proven patterns apply to your use case
Complete Workflow Stages
Stage 1: BRAINSTORM (Define Problem Space)
Objective: Understand the problem and make foundational decisions
Questions Asked:
Use Case Definition
- What problem does this agent solve?
- Who are the primary users?
- What makes this valuable?
- What would success look like?
Success Criteria
- What are 3-5 measurable goals?
- What are the key edge cases to handle?
- What are acceptable limitations?
SDK Selection
- Need multi-agent orchestration?
- Model flexibility required?
- Complex state management needed?
- Performance requirements?
- Recommendation: Claude SDK / OpenAI Agents / Strands / LangGraph
Language Selection
- TypeScript (type safety, web/API focus, Bun runtime)
- Python (data science, ML libraries, asyncio)
Complexity Estimation
- Low (1-3 tools, no subagents, single purpose)
- Medium (4-7 tools, 0-2 subagents, moderate workflows)
- High (8+ tools, 3+ subagents, complex orchestration)
Output: Clear problem statement, success criteria, SDK/language selected, complexity assessed
Stage 2: DESIGN (Create Architectural Blueprint)
Objective: Design the complete agent architecture
Questions Asked:
Agent Persona & Role
- What expertise does the agent have?
- What communication style is appropriate?
- What are the agent's limitations?
- What should the agent never do?
Tool Design (3-5 recommended to start)
- Tool name and purpose
- Input parameters and types
- Output structure and types
- Validation requirements (Zod/Pydantic schemas)
- Error conditions to handle
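Captured concretely, a tool specification from this stage might look like the following TypeScript sketch (the tool name, fields, and error cases are illustrative, not prescribed):

```typescript
import { z } from 'zod';

// Illustrative tool specification: names, fields, and messages are placeholders
const SearchDocsInput = z.object({
  query: z.string().min(1, 'query must not be empty'),
  maxResults: z.number().int().positive().default(5),
});

const SearchDocsOutput = z.object({
  results: z.array(z.object({ title: z.string(), snippet: z.string() })),
  truncated: z.boolean(), // true when more matches exist than maxResults
});

// Error conditions to document alongside the schemas:
// - query matches nothing -> return empty results, not an error
// - upstream index unavailable -> surface a clear, retryable error
```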
Workflow Design
- What are the conversation stages?
- How does the conversation progress?
- What information is collected at each stage?
- How does the workflow handle failures?
Subagent Planning (if needed)
- What specialized tasks need delegation?
- Which model for which subagent? (Sonnet for analysis, Haiku for optimization)
- How do subagents communicate?
- What context do they need?
Data Model Design
- What data structures are needed?
- How is state managed?
- What needs persistence?
- Type safety approach (Zod for TS, Pydantic for Python)
MCP Integration (if applicable)
- Which MCP servers are needed?
- What documentation sources?
- Custom MCP tools required?
Pattern Selection
- Which proven patterns apply? (see Pattern Catalog)
- Validation pattern (Zod/Pydantic)
- Subagent pattern (Specialization, Parallel, Handoff)
- Data pattern (Mock-first, Service-based, Direct context)
- Scoring pattern (Weighted, Multi-criteria, Suitability)
Output: Complete architectural blueprint with all decisions documented
Stage 3: IMPLEMENT (Guided Implementation)
Note: This stage begins Phase 2 (Scaffolding)
Objective: Generate project structure and guide implementation
Activities:
- Project Setup - Create directory structure, install dependencies
- Tool Scaffolding - Generate tool templates with validation
- Agent Configuration - Create agent config with system prompt
- Data Model Setup - Generate type definitions and schemas
- Mock Data - Create realistic mock data structures
- Testing Structure - Setup test framework and initial tests
Guidance Provided:
- Best practices for tool implementation
- Error handling patterns
- Validation schema examples
- Type safety recommendations
- Naming conventions
- Code organization
Stage 4: TEST (Validation Strategy)
Objective: Plan comprehensive testing approach
Test Levels:
- Unit Tests - Individual tool testing
- Integration Tests - End-to-end workflows
- Edge Case Tests - Invalid inputs, failures, timeouts
- Performance Tests - Response times, token usage
- Manual QA - Realistic scenarios, UX verification
Guidance Provided:
- Test structure templates
- Critical test scenarios
- Edge cases to consider
- Performance benchmarks
- QA checklist
Stage 5: DOCUMENT (Comprehensive Documentation)
Objective: Create production-grade documentation
Documents Generated:
Agent CLAUDE.md - Complete agent documentation
- Overview and purpose
- Directory structure
- Setup instructions
- Tool descriptions
- Workflow explanation
- Testing approach
- Known limitations
- Key decisions and rationale
README.md - Quick start guide
- Installation steps
- Basic usage
- Configuration
- Examples
Comparison Matrix Entry - Template for comparison documentation
Memory System Entries - Knowledge graph templates (entities, relations, observations)
Tool Catalog Contributions - Reusable pattern documentation
Stage 6: DEPLOY (Production Release)
Note: This stage begins Phase 3 (Operations)
Objective: Safely release agent to production
Activities:
- Deployment Gates - Offline evals → Red team → Staging → Production
- Rollout Strategy - Canary (1%) → Limited (10%) → Gradual (50%) → Full (100%)
- Rollback Planning - Automatic triggers, manual procedures
- Agent Identity - Register SPIFFE ID (if applicable)
- Inventory Documentation - Add to central agent inventory
Guidance Provided:
- Deployment gates checklist
- Rollout strategy templates
- Rollback trigger criteria
- Security hardening requirements
💡 Progressive Hint: "Apply deployment gates checklist?" → See `references/operations-guide.md`
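One way to make the rollout strategy above explicit is to capture it as data so gates and triggers are reviewable; the structure below is an assumed sketch, not a required format:

```typescript
// Hypothetical rollout plan mirroring the canary -> full strategy above
interface RolloutStage {
  name: string;
  trafficPercent: number;
  minSoakHours: number;        // time to observe before advancing
  rollbackTriggers: string[];  // conditions that auto-revert this stage
}

const rolloutPlan: RolloutStage[] = [
  { name: 'canary',  trafficPercent: 1,   minSoakHours: 24, rollbackTriggers: ['error_rate > 2%', 'p95_latency > 5s'] },
  { name: 'limited', trafficPercent: 10,  minSoakHours: 24, rollbackTriggers: ['error_rate > 1%'] },
  { name: 'gradual', trafficPercent: 50,  minSoakHours: 48, rollbackTriggers: ['error_rate > 1%'] },
  { name: 'full',    trafficPercent: 100, minSoakHours: 0,  rollbackTriggers: ['error_rate > 1%'] },
];
```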
Stage 7: OBSERVE (Production Monitoring)
Objective: Monitor agent behavior and performance in production
Activities:
- Tracing Setup - Root span per request with OTel taxonomy
- Metrics Configuration - Latency (p50/p90/p95/p99), token usage, cost
- Logging Standards - Structured JSON, no PII, appropriate levels
- Alerting - Define thresholds and notification channels
Guidance Provided:
- OpenTelemetry configuration templates
- Required trace attributes
- Metric collection standards
- Log sanitization guidelines
💡 Progressive Hint: "Setup OpenTelemetry observability?" → See `references/operations-guide.md`
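A minimal tracing sketch using the OpenTelemetry JavaScript API (`@opentelemetry/api`); the span name, attribute, and `runAgent` helper are illustrative assumptions:

```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

declare function runAgent(query: string): Promise<string>; // placeholder for your agent entry point

const tracer = trace.getTracer('my-agent'); // tracer name is illustrative

async function handleRequest(userQuery: string): Promise<string> {
  // One root span per request, as recommended above
  return tracer.startActiveSpan('agent.request', async (span) => {
    try {
      span.setAttribute('agent.query_length', userQuery.length); // no PII: record length, not content
      const answer = await runAgent(userQuery);
      span.setStatus({ code: SpanStatusCode.OK });
      return answer;
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```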
Stage 8: EVALUATE (Performance Assessment)
Objective: Comprehensively assess agent performance and quality
Activities:
- Offline Evaluation - Benchmarks, RAG metrics, safety testing
- Online Evaluation - A/B tests, shadow traffic, canary analysis
- Human Feedback - In-product feedback, review queues
- Success Metrics - Acceptance rate, win rate, task completion
Guidance Provided:
- Evaluation framework templates
- Metric calculation methods
- Feedback collection patterns
- Feedback → Eval Case pipeline
💡 Progressive Hint: "Define evaluation framework (Offline/Online/Human)?" → See `references/operations-guide.md`
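To make the feedback → eval case pipeline concrete, here is one possible shape for a durable eval case (field names are assumptions, not a required schema):

```typescript
// Hypothetical shape for a durable eval case derived from user feedback
interface EvalCase {
  id: string;
  input: string;              // the prompt or task that triggered feedback
  expected: string;           // reference answer or rubric
  source: 'user_feedback' | 'synthetic' | 'regression';
  tags: string[];             // e.g. ['edge-case', 'tool:search']
  createdAt: string;          // ISO 8601 timestamp
}

// Feedback loop: a thumbs-down in production becomes a permanent test case
function feedbackToEvalCase(input: string, expected: string): EvalCase {
  return {
    id: crypto.randomUUID(), // Node 19+/browser global
    input,
    expected,
    source: 'user_feedback',
    tags: [],
    createdAt: new Date().toISOString(),
  };
}
```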
Stage 9: ITERATE (Continuous Improvement)
Objective: Plan for feedback and iteration
Activities:
- Feedback Collection - User input, metrics, errors, usage patterns
- Improvement Identification - What worked/didn't, missing features
- Incremental Updates - Bug fixes, new tools, prompt improvements
- Documentation Updates - Keep docs current with changes
- Learning Capture - Update catalogs and patterns
- Feedback Loop Closure - Convert feedback into durable eval cases
💡 Progressive Hint: "Setup feedback loops?" → See `references/operations-guide.md`
Decision Frameworks
SDK Selection Matrix
Choose Claude SDK when:
- ✅ Excellent reasoning required (complex decision-making)
- ✅ Anthropic models preferred
- ✅ Simple subagent support sufficient
- ✅ Conversational agents
- ⚠️ Multi-agent orchestration limited
- ⚠️ Anthropic-only (not multi-provider)
Choose OpenAI Agents when:
- ✅ Multi-agent coordination critical
- ✅ Parallel + handoff patterns needed
- ✅ Built-in session memory valuable (SQLite)
- ✅ OpenAI models preferred
- ⚠️ OpenAI-only (not multi-provider)
Choose Strands Agents when:
- ✅ Model flexibility required (multi-provider)
- ✅ Lightweight deployment
- ✅ Extensive MCP integration
- ✅ Model-driven architecture
- ✅ Google Drive, Slack, etc. integrations
Choose LangGraph when:
- ✅ Complex state management required
- ✅ Multi-step workflows with branching
- ✅ Workflow visualization needed
- ✅ Python-only acceptable
- ✅ LangChain ecosystem integration
Language Selection Criteria
Choose TypeScript when:
- Web/API integrations primary focus
- Production web agents
- Bun runtime benefits valued
- Strict compile-time typing preferred
- Validation via Zod
Choose Python when:
- Data science/ML integrations needed
- Rich scientific computing ecosystem required
- Async patterns (asyncio) important
- Validation via Pydantic
- NumPy, Pandas, SciPy, etc. needed
Complexity Level Guidance
Low Complexity (1-3 tools, 0 subagents)
- Single-purpose agents
- Direct context passing
- Simple workflows (1-3 stages)
- Example: Document Q&A, simple search
Medium Complexity (4-7 tools, 0-2 subagents)
- Multi-stage workflows
- Subagent specialization
- Moderate state management
- Example: Financial analysis, product comparison
High Complexity (8+ tools, 3+ subagents)
- Multi-agent orchestration
- Parallel execution patterns
- Complex state management
- Example: Portfolio analysis, multi-agent collaboration
Proven Patterns Catalog
Pattern 1: Zod/Pydantic Validation
Purpose: Type-safe input validation with runtime checking
When to use: All agents should validate tool inputs
Implementation:
```typescript
// TypeScript with Zod
import { z } from 'zod';

const ToolInputSchema = z.object({
  param: z.number().positive(),
  option: z.enum(['a', 'b', 'c'])
});

// Throws a ZodError describing every invalid field
const validated = ToolInputSchema.parse(input);
```

```python
# Python with Pydantic
from typing import Literal
from pydantic import BaseModel, Field

class ToolInput(BaseModel):
    param: int = Field(gt=0)
    option: Literal['a', 'b', 'c']
```
Pattern 2: Subagent Specialization
Purpose: Delegate tasks by complexity or domain
When to use: Different tasks require different reasoning depth
Implementation:
- Sonnet for complex reasoning and deep analysis
- Haiku for optimization and quick calculations
- Pass clear context in subagent prompts
- Define specific success criteria
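A routing sketch under the assumption of a generic `runSubagent` helper (the real call and model IDs depend on your chosen SDK):

```typescript
// Hypothetical helper: the actual invocation depends on your SDK
declare function runSubagent(opts: { model: string; prompt: string }): Promise<string>;

type TaskKind = 'analysis' | 'optimization';

// Route by reasoning depth: Sonnet for deep analysis, Haiku for quick calculations
const MODEL_FOR_TASK: Record<TaskKind, string> = {
  analysis: 'claude-sonnet',     // model IDs are placeholders
  optimization: 'claude-haiku',
};

async function delegate(kind: TaskKind, context: string): Promise<string> {
  return runSubagent({
    model: MODEL_FOR_TASK[kind],
    // Pass clear context and explicit success criteria, per the guidance above
    prompt: `Task: ${kind}\nContext: ${context}\nSucceed when: output is complete and verifiable.`,
  });
}
```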
Pattern 3: Mock Data First
Purpose: Development independence from external APIs
When to use: Always (accelerates testing, enables local development)
Implementation:
- Structure mock data identically to real API responses
- Use clear TODO comments for replacement points
- Support both mock and real via environment config
- Test with mock, deploy with real
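One way this might look in TypeScript, with both implementations behind the same interface (the `USE_MOCK_DATA` variable and service shape are assumptions):

```typescript
// Both implementations return the same shape, so callers never change
interface QuoteService {
  getQuote(symbol: string): Promise<{ symbol: string; price: number }>;
}

const mockQuotes: QuoteService = {
  // Mock data structured identically to the real API response
  async getQuote(symbol) {
    return { symbol, price: 123.45 };
  },
};

const realQuotes: QuoteService = {
  async getQuote(symbol) {
    // TODO: replace with the real API call when available
    throw new Error('real API not wired up yet');
  },
};

// Switch via environment config; USE_MOCK_DATA is an illustrative name
export const quotes: QuoteService =
  process.env.USE_MOCK_DATA === 'true' ? mockQuotes : realQuotes;
```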
Pattern 4: Service-Based Architecture
Purpose: Separation of concerns, testability, reusability
When to use: Medium to high complexity agents
Implementation:
- Models layer (Pydantic validation)
- Services layer (business logic)
- Tools layer (Claude-callable wrappers)
Pattern 5: Parallel Execution
Purpose: Performance improvement for independent tasks
When to use: Multiple agents/tools with no dependencies
Implementation:
```python
# Python with asyncio (run inside an async function)
import asyncio

# Independent tasks start concurrently; total time ~ the slowest task
results = await asyncio.gather(
    agent1.run(task1),
    agent2.run(task2),
    agent3.run(task3)
)
```
Results: 65%+ faster for 3+ parallel tasks
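For TypeScript agents, the equivalent sketch uses `Promise.all` (the agent objects and their `run` signatures are assumed):

```typescript
// Hypothetical agents sharing a common run(task) interface
declare const agent1: { run(task: string): Promise<string> };
declare const agent2: { run(task: string): Promise<string> };
declare const agent3: { run(task: string): Promise<string> };

// Independent tasks start concurrently; total time ~ the slowest task
const [r1, r2, r3] = await Promise.all([
  agent1.run('task1'),
  agent2.run('task2'),
  agent3.run('task3'),
]);
```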
Pattern 6: Weighted Scoring
Purpose: Multi-dimensional assessment with transparent calculation
When to use: Suitability analysis, ranking, prioritization
Implementation:
```python
# Weights sum to 1.0, keeping the score on the same scale as its inputs
score = (
    compliance_score * 0.35 +
    risk_score * 0.25 +
    performance_score * 0.25 +
    time_horizon_score * 0.15
)
```
Pattern 7: Direct Context Passing (No RAG)
Purpose: Avoid RAG complexity for small document collections
When to use:
- Working with 1 file at a time
- Documents are 5-20 pages each
- Entire document fits in LLM context window
- User doesn't need semantic search across 100s of documents
Benefits: 3-step workflow vs 7+ steps with RAG
Pattern 8: Graceful Degradation
Purpose: Partial success better than total failure
When to use: Always (production-grade systems)
Implementation:
- Handle missing data → Skip specific calculations, continue
- Handle API unavailable → Use cached/mock data
- Handle timeouts → Return partial results with warnings
- Always provide user feedback on degraded operation
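A sketch of what this could look like in TypeScript, running each section independently and downgrading failures to warnings (helper names are illustrative):

```typescript
// Run every section, keep what succeeds, and report what was skipped
interface AnalysisResult {
  sections: Record<string, unknown>;
  warnings: string[];
}

async function analyzeWithDegradation(
  tasks: Record<string, () => Promise<unknown>>,
): Promise<AnalysisResult> {
  const result: AnalysisResult = { sections: {}, warnings: [] };
  const outcomes = await Promise.all(
    Object.entries(tasks).map(async ([name, fn]) => {
      try {
        return { name, ok: true as const, value: await fn() };
      } catch (err) {
        return { name, ok: false as const, error: String(err) };
      }
    }),
  );
  for (const o of outcomes) {
    if (o.ok) {
      result.sections[o.name] = o.value;
    } else {
      // Partial success: skip the failed section and tell the user why
      result.warnings.push(`Skipped "${o.name}": ${o.error}`);
    }
  }
  return result;
}
```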
Quality Checklists
Before Starting Checklist
- Can explain agent purpose in 1-2 sentences
- Success criteria clearly defined (3-5 measurable goals)
- SDK/language selection justified
- Checked for similar existing patterns
- Confirmed this is the simplest solution
Design Complete Checklist
- Agent persona clearly defined
- 3-5 tools designed with input/output specs
- Workflow stages identified and ordered
- Data model designed with type safety
- Pattern selections justified
- Edge cases identified
- Known limitations documented
Implementation Ready Checklist
- All tools have validation schemas
- Error handling strategy defined
- Mock data structure planned
- Test strategy outlined
- Documentation templates ready
Production Ready Checklist
- All tools handle errors gracefully
- Input validation on all tools (Zod/Pydantic)
- Type safety throughout
- Naming conventions followed
- Comprehensive CLAUDE.md exists
- README.md with quick start
- Manual QA scenarios tested
- Edge cases handled
- Performance acceptable (<2min workflows)
How to Use This Skill
Starting Phase 1: Design & Documentation
Simply say:
- "Help me design an agent for [use case]"
- "I want to build an agent that [does something]"
- "Guide me through planning an agent for [problem]"
The skill will:
- Ask interactive questions through BRAINSTORM and DESIGN stages
- Help you make informed decisions using frameworks and patterns
- Generate comprehensive design documentation
- Create scaffolding manifest for Phase 2
Starting Phase 2: Scaffolding
Provide the design document from Phase 1:
- "Scaffold this agent design in [target directory]"
- "Generate the project structure for this design"
- "Create the boilerplate from this design document"
The skill will:
- Parse your design document
- Create directory structure
- Generate boilerplate code
- Setup configuration files
- Create pre-filled documentation
Outputs
Phase 1 Outputs
1. Agent Design Document (agent-design-document.md)
- Complete agent specification
- All architectural decisions documented
- Tool specifications with validation schemas
- Workflow stages defined
- Pattern selections justified
- Scaffolding instructions included
- Human-readable and comprehensive
2. Scaffolding Manifest (scaffolding-manifest.json)
- Machine-readable specification
- File structure definition
- Configuration parameters
- Code generation instructions
- Used by Phase 2 for automation
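As a rough illustration only, the manifest's contents might be described by a type like this; the authoritative shape is `templates/scaffolding-manifest.json`:

```typescript
// Hypothetical shape of scaffolding-manifest.json; field names are assumptions
interface ScaffoldingManifest {
  agentName: string;
  language: 'typescript' | 'python';
  sdk: string;                    // e.g. 'claude-sdk'
  complexity: 'low' | 'medium' | 'high';
  tools: Array<{
    name: string;
    description: string;
    inputSchema: string;          // name of the Zod/Pydantic schema to generate
  }>;
  files: string[];                // paths to generate, relative to project root
}
```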
Phase 2 Outputs
1. Project Structure
```
project-name/
├── src/
│   ├── index.ts (or main.py)
│   ├── types/
│   ├── tools/
│   └── data/
├── .claude/
│   └── CLAUDE.md
├── tests/
├── package.json (or requirements.txt)
├── .env.example
├── README.md
└── tsconfig.json (or pyproject.toml)
```
2. Boilerplate Code
- Agent configuration with system prompt
- Tool templates with validation
- Type definitions and schemas
- Mock data structures
- Error handling patterns
3. Configuration Files
- Package manager config (package.json / requirements.txt)
- TypeScript config (tsconfig.json) or Python config (pyproject.toml)
- Environment template (.env.example)
- Git ignore rules
4. Documentation
- Pre-filled CLAUDE.md from design
- README.md with quick start
- Inline code comments
- Test structure
Best Practices
Keep It Simple
- Start with 3-5 tools, add more only if needed
- Use 0-2 subagents for most agents
- Apply patterns only when they solve real problems
- Avoid over-engineering
Document as You Go
- Update design document if requirements change
- Keep CLAUDE.md current
- Document "why" not just "what"
- Capture learnings in memory system
Validate Early
- Use Zod/Pydantic from the start
- Handle errors at tool boundaries
- Test with invalid inputs immediately
- Don't defer error handling
Think Production-Grade
- Error handling from day one
- No "fix it later" mentality
- Mock data structured like production data
- Documentation complete before deployment
Iterate Incrementally
- Build tools one at a time
- Test each tool individually
- Add complexity gradually
- Commit often
Common Pitfalls to Avoid
❌ Over-engineering - Building for imaginary future needs
✅ Do instead: Build exactly what's needed now, refactor later

❌ Too many tools - Starting with 10+ tools
✅ Do instead: Start with 3-5 focused tools, add as needed

❌ Skipping validation - "I'll add validation later"
✅ Do instead: Zod/Pydantic schemas from the first tool

❌ Unclear workflows - Agent does "everything"
✅ Do instead: Define clear stages (Discovery → Analysis → Recommendation)

❌ Poor error handling - Assuming APIs always work
✅ Do instead: Graceful degradation, clear error messages

❌ Documentation debt - "I'll document it when it's done"
✅ Do instead: Design document before code, update as you build

❌ Premature optimization - Complex caching before measuring performance
✅ Do instead: Build simple version, measure, then optimize
Examples
See the examples/ directory for:
- Complete design document example
- Sample scaffolding output
- Before/after comparisons
- Real-world agent designs
Progressive Standards Integration
This skill offers optional progressive hints at key decision points throughout the workflow. When you see a hint, you can choose to apply the referenced standard or skip it based on your project's needs.
How Progressive Hints Work
At specific stages, you'll be prompted with questions like:
💡 "Apply 6 Principles decision framework?" 💡 "Classify with Agent Taxonomy (Level 0-4)?" 💡 "Apply deployment gates checklist?"
If you answer Yes: The skill will guide you through applying that standard, referencing the appropriate documentation.
If you answer No: The skill continues without enforcement—standards are guidance, not gates.
When Standards Apply
| Scope | Apply These Standards |
|---|---|
| All Agents | Zod/Pydantic validation, error handling, basic logging |
| Production | 6 Principles check, testing pyramid (80/15/5), deployment gates |
| Enterprise | Full OTel observability, evaluation framework, SPIFFE identity, A2A protocol |
Hint Trigger Points
| Stage | Hints Available |
|---|---|
| BRAINSTORM | "Apply 6 Principles decision framework?", "Classify with Agent Taxonomy?" |
| DESIGN | "Apply A2A protocol standards?", "Need memory strategy guidance?" |
| IMPLEMENT | "Apply coding standards (lint/validation)?" |
| TEST | "Apply testing pyramid (80/15/5)?" |
| DEPLOY | "Apply deployment gates checklist?" |
| OBSERVE | "Setup OpenTelemetry observability?" |
| EVALUATE | "Define evaluation framework?" |
| ITERATE | "Setup feedback loops?" |
Standards & Principles
This skill integrates organizational standards and foundational principles. Reference documents provide quick application guidance; source documents contain full specifications.
Quick Reference Files (Application Guides)
- `references/principles-framework.md` - 6 foundational principles (TRUTH, HONOR, EXCELLENCE, SERVE, PERSEVERE, SHARPEN)
- `references/agent-taxonomy.md` - Agent classification (Level 0-4), A2A protocol
- `references/memory-strategies.md` - STM/LTM/Hybrid, RAG vs NL2SQL decisions
- `references/coding-standards.md` - Validation (Zod/Pydantic), testing pyramid, linting
- `references/operations-guide.md` - Deploy, Observe, Evaluate guidance
- `references/standards-integration.md` - Stage-to-standards mapping, quick links
Source Documents (Full Standards)
- `docs/principles/CORE_FRAMEWORK.md` - Biblical principles, cascade framework
- `docs/standards/AGENTIC.md` - Agent architecture, lifecycle, memory, A2A
- `docs/standards/DEVELOPMENT.md` - Coding standards, testing, git workflow
- `docs/standards/OPERATIONS.md` - Observability, evaluation, security
Support References
- `references/workflow-stages.md` - Detailed stage instructions and questioning templates
- `references/decision-frameworks.md` - Complete SDK selection matrices and criteria
- `references/pattern-catalog.md` - All proven patterns with implementation details
- `references/quality-checklists.md` - Comprehensive quality validation criteria
- `references/principles-framework.md` - 6 foundational principles application guide
- `references/agent-taxonomy.md` - Agent classification and A2A protocol
- `references/memory-strategies.md` - Memory strategy selection guide
- `references/coding-standards.md` - Coding standards and validation guide
- `references/operations-guide.md` - Operations, deployment, and evaluation guide
- `references/standards-integration.md` - Stage-to-standards mapping
- `templates/design-document-template.md` - Template for design output
- `templates/scaffolding-manifest.json` - Template for manifest output
Version History
v1.1.0 (2025-01-24)
- Expanded to 9-stage workflow (added DEPLOY, OBSERVE, EVALUATE)
- Added Phase 3: Operations
- Integrated 6 foundational principles (TRUTH, HONOR, EXCELLENCE, SERVE, PERSEVERE, SHARPEN)
- Added progressive hints system for optional standards application
- Added reference files for principles, agent taxonomy, memory strategies, coding standards, operations
- Added standards-integration.md for stage-to-standards mapping
v1.0.0 (2025-01-24)
- Initial release
- Complete 6-stage workflow
- Two-phase operation (Design + Scaffolding)
- Interactive questioning framework
- Proven patterns catalog
- Decision frameworks
- Quality checklists
Remember: Excellence is not about complexity. It's about doing simple things consistently well, with transparency, clear documentation, and continuous improvement.