---
name: ai-tool-designer
description: Guide for designing effective tools for AI agents. Use when creating tools for custom agent systems or any AI tool interfaces. Provides principles for tool naming, input/output design, error handling, and evaluation methodologies that maximize agent effectiveness.
license: Complete terms in LICENSE.txt
---
# AI Agent Tool Designer

## Overview
This skill provides comprehensive guidance for designing tools that AI agents can use effectively. Whether building custom agent tools or any AI-accessible interfaces, these principles maximize agent success in accomplishing real-world tasks.
Note: Use the more specific `mcp-builder` skill if you want to create an MCP server.
The quality of a tool system is measured not by how comprehensively it implements features, but by how well it enables AI agents to accomplish realistic, complex tasks using only the tools provided.
## Agent-Centric Design Principles
Before implementing any tool system, understand these foundational principles for designing tools that AI agents can use effectively:
### 1. Build for Workflows, Not Just API Endpoints
Principle: Design thoughtful, high-impact workflow tools rather than simply wrapping existing API endpoints.
Why it matters: Agents need to accomplish complete tasks, not just make individual API calls. Tools that consolidate related operations reduce the number of steps agents must take and improve success rates.
How to apply:
- Consolidate related operations (e.g., a `schedule_event` tool that both checks availability and creates the event)
- Focus on tools that enable complete tasks, not just individual API calls
- Consider what workflows agents actually need to accomplish, not just what the underlying API offers
- Ask: "What is the user trying to accomplish?" rather than "What does the API provide?"
Examples:
- ❌ Bad: Separate `check_calendar_availability`, `create_calendar_event`, and `send_event_notification` tools
- ✅ Good: A single `schedule_event` tool with parameters for checking conflicts and sending notifications (see the sketch below)
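A minimal sketch of such a consolidated tool in Ruby. The `calendar` and `notifier` collaborators and their method names are hypothetical stand-ins for whatever clients your system provides; the point is that one call covers the whole workflow:

```ruby
# Hypothetical consolidated workflow tool: one call checks conflicts,
# creates the event, and optionally notifies attendees.
def schedule_event(title:, start_time:, end_time:, attendees: [], notify: true)
  conflicts = calendar.conflicts_for(attendees, start_time, end_time)
  unless conflicts.empty?
    # Return an actionable result instead of silently failing
    return { scheduled: false, conflicts: conflicts,
             message: "Scheduling conflicts found. Try a different time." }
  end

  event = calendar.create_event(title: title, start_time: start_time,
                                end_time: end_time, attendees: attendees)
  notifier.send_invites(event) if notify

  { scheduled: true, event_id: event.id }
end
```

With this shape, the agent makes one tool call instead of coordinating three, which removes two opportunities for it to mis-sequence the workflow.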
### 2. Optimize for Limited Context

Principle: Agents have constrained context windows, so make every token count.
Why it matters: When agents run out of context, they fail to complete tasks. Verbose tool outputs force agents to make difficult decisions about what information to keep or discard.
How to apply:
- Return high-signal information, not exhaustive data dumps
- Provide "concise" vs "detailed" response format options (default to concise)
- Default to human-readable identifiers over technical codes (names over IDs when possible)
- Consider the agent's context budget as a scarce resource
- Implement character limits and graceful truncation (typically 25,000 characters)
- Use pagination with reasonable defaults (20-50 items)
Examples:
- ❌ Bad: Return all 50 fields from user object including metadata, internal IDs, timestamps in multiple formats
- ✅ Good: Return name, email, role, and key status fields; offer a `detailed=true` parameter for full data (see the sketch below)
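A sketch of the concise-by-default pattern, assuming a `user` object that responds to the attributes shown; `metadata` stands in for the long tail of extra fields:

```ruby
# Return a handful of high-signal fields by default; expose the full
# record only when the agent explicitly asks for it.
def format_user(user, detailed: false)
  concise = {
    name:   user.name,
    email:  user.email,
    role:   user.role,
    active: user.active?
  }
  detailed ? concise.merge(user.metadata) : concise
end
```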
### 3. Design Actionable Error Messages
Principle: Error messages should guide agents toward correct usage patterns, not just report failures.
Why it matters: Agents learn tool usage through feedback. Clear, educational errors help agents self-correct and succeed on retry.
How to apply:
- Suggest specific next steps in error messages
- Make errors educational, not just diagnostic
- Include examples of correct usage when parameters are invalid
- Guide agents toward solutions: "Try using filter='active_only' to reduce results"
- Avoid technical jargon; use natural language
Examples:
- ❌ Bad: "Error 400: Invalid request"
- ✅ Good: "The limit parameter must be between 1-100. You provided 500. Try using limit=50 and pagination with offset to retrieve more results." (sketched in code below)
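One way the "good" error above might be constructed, with illustrative names; the shape is what matters: what failed, what the agent sent, and a concrete next step.

```ruby
# Illustrative builder for an actionable validation error.
def limit_error(provided:, min: 1, max: 100)
  {
    error: "Invalid 'limit' parameter",
    detail: "The limit must be between #{min} and #{max}. You provided #{provided}.",
    suggestion: "Try limit=#{max / 2} and use the 'offset' parameter " \
                "to page through the remaining results."
  }
end

limit_error(provided: 500)[:suggestion]
# => "Try limit=50 and use the 'offset' parameter to page through the remaining results."
```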
### 4. Follow Natural Task Subdivisions
Principle: Tool names and organization should reflect how humans think about tasks, not just API structure.
Why it matters: Agents use tool names and descriptions to decide which tool to call. Natural naming improves tool discovery and reduces wrong tool selections.
How to apply:
- Tool names should reflect human mental models of tasks
- Group related tools with consistent prefixes for discoverability
- Design tools around natural workflows, not just API structure
- Use action-oriented naming: `search_users`, `create_project`, `send_message`
- Include a service/system prefix to avoid conflicts: `slack_send_message`, not just `send_message`
Examples:
- ❌ Bad: `api_endpoint_users_post`, `api_endpoint_users_get`, `api_endpoint_users_delete`
- ✅ Good: `create_user`, `search_users`, `delete_user`
### 5. Use Evaluation-Driven Development
Principle: Create realistic evaluation scenarios early and let agent feedback drive tool improvements.
Why it matters: Only by testing tools with actual agents can you discover usability issues. Prototype quickly and iterate based on real agent performance.
How to apply:
- Create 10+ complex, realistic questions agents should answer using your tools
- Test with actual AI agents attempting to solve these questions
- Observe where agents struggle, make mistakes, or run out of context
- Iterate on tool design based on agent feedback
- Measure success by agent task completion rate, not feature completeness
Process:
1. Build initial tools based on these principles
2. Create evaluation questions (see Evaluation Guide)
3. Test with agents (a minimal harness is sketched after this list)
4. Identify failure patterns
5. Refine tools
6. Repeat
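A minimal harness for this loop might look like the sketch below; `run_agent` and `MY_TOOLS` are hypothetical stand-ins for however your system invokes an agent with a tool set, and the grading lambdas are yours to define per question:

```ruby
# Minimal evaluation-loop sketch: run each question through the agent
# and measure the task completion rate.
QUESTIONS = [
  {
    prompt: "Find all active users who joined this month and summarize them.",
    check:  ->(transcript) { transcript.include?("summary") }
  }
  # ...aim for 10+ realistic, multi-step questions
].freeze

results = QUESTIONS.map do |q|
  transcript = run_agent(q[:prompt], tools: MY_TOOLS) # hypothetical helper
  { prompt: q[:prompt], passed: q[:check].call(transcript) }
end

pass_rate = results.count { |r| r[:passed] }.fdiv(results.size)
puts "Task completion rate: #{(pass_rate * 100).round(1)}%"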
## Tool Design Framework
Follow this systematic framework when designing any tool for AI agents:
### Phase 1: Planning
**1. Identify Core Workflows**
- List the most valuable operations agents need to perform
- Prioritize tools that enable the most common and important use cases
- Consider which tools work together to enable complex workflows
**2. Design Input Schemas**
- Use strong validation (dry-validation for Ruby, JSON Schema); see the sketch after this list
- Include proper constraints (min/max length, regex patterns, ranges)
- Provide clear, descriptive field descriptions with examples
- Set sensible defaults to reduce required parameters
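A sketch of such a schema using dry-validation, as suggested above; the tool and field names are illustrative:

```ruby
require "dry/validation"

# Constraints, defaults, and formats live in one declarative place.
class SearchUsersContract < Dry::Validation::Contract
  params do
    required(:query).filled(:string, min_size?: 2)
    optional(:limit).filled(:integer, gteq?: 1, lteq?: 100)
    optional(:offset).filled(:integer, gteq?: 0)
    optional(:response_format).filled(:string, included_in?: %w[markdown json])
  end
end

result = SearchUsersContract.new.call(query: "jo", limit: 500)
result.errors.to_h # => { limit: ["must be less than or equal to 100"] }
```

The validation errors it produces map directly onto the actionable error messages described earlier.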
**3. Design Output Formats**
- Support multiple formats (JSON for programmatic, Markdown for human-readable)
- Define consistent response structures across similar tools
- Plan for large-scale usage (thousands of users/resources)
- Implement character limits and truncation strategies
- Include pagination metadata (`has_more`, `next_offset`, `total_count`)
**4. Plan Error Handling**
- Design clear, actionable, agent-friendly error messages
- Handle authentication and authorization errors gracefully
- Consider rate limiting and timeout scenarios
- Provide guidance on how to proceed after errors
### Phase 2: Implementation
Tool Naming Conventions:
- Use snake_case: `search_users`, `create_project`
- Include a service prefix: `github_create_issue`, `slack_send_message`
- Be action-oriented: start with verbs (get, list, search, create, update, delete)
- Be specific: avoid generic names that could conflict
Tool Descriptions: Write comprehensive descriptions that include:
- One-line summary of what the tool does
- Detailed explanation of purpose and functionality
- When to use this tool (and when NOT to use it)
- Parameter descriptions with examples
- Return value schema
- Error handling guidance
Tool Annotations (if supported by your system; see the combined example after this list):
- `readOnlyHint: true` for read-only operations
- `destructiveHint: false` for non-destructive operations
- `idempotentHint: true` if repeated calls have the same effect
- `openWorldHint: true` if interacting with external systems
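Putting the description checklist and the annotations together, a tool definition might look like the following sketch; the exact registration format depends on your agent framework, so treat this as a shape rather than an API:

```ruby
# Illustrative tool definition: rich description plus behavioral hints.
SLACK_SEND_MESSAGE_TOOL = {
  name: "slack_send_message",
  description: <<~DESC,
    Send a message to a Slack channel or user.

    Use this to post a single notification or reply. Do NOT use it to
    read history; use a search tool for that.

    Parameters: channel (e.g. "#engineering"), text (Markdown supported).
    Returns: ok, ts, and channel on success; on failure, an error message
    with a suggested fix.
  DESC
  annotations: {
    readOnlyHint:    false,
    destructiveHint: false,
    idempotentHint:  false,
    openWorldHint:   true
  }
}.freeze
```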
### Phase 3: Refinement
Code Quality Checklist:
- ✅ No duplicated code between tools (DRY principle)
- ✅ Shared logic extracted into reusable functions
- ✅ Similar operations return similar formats (consistency)
- ✅ All external calls have error handling
- ✅ Full type coverage (type hints, TypeScript types)
- ✅ Every tool has comprehensive documentation
Testing:
- Test with valid and invalid inputs
- Test error handling paths
- Test with real AI agents using evaluation questions
- Test pagination and large result sets
- Test character limits and truncation
## Response Format Guidelines
All tools that return data should support multiple formats for flexibility:
### JSON Format (`response_format="json"`)
Purpose: Machine-readable structured data for programmatic processing
Best practices:
- Include all available fields and metadata
- Use consistent field names and types
- Use when agents need to process data further
- Return IDs alongside names for precision
Example:

```json
{
  "users": [
    {
      "id": "U123456",
      "name": "John Doe",
      "email": "john@example.com",
      "role": "developer",
      "active": true
    }
  ],
  "total": 150,
  "count": 20,
  "has_more": true,
  "next_offset": 20
}
```
### Markdown Format (`response_format="markdown"`, typically the default)
Purpose: Human-readable formatted text for user presentation
Best practices:
- Use headers, lists, and formatting for clarity
- Convert timestamps to readable format ("2024-01-15 10:30 UTC" vs epoch)
- Show display names with IDs in parentheses ("@john.doe (U123456)")
- Omit verbose metadata (show one profile image URL, not all sizes)
- Group related information logically
- Use when presenting information to end users
Example:

```markdown
## Users (20 of 150)

- **John Doe** (@john.doe)
  - Email: john@example.com
  - Role: Developer
  - Status: Active
- **Jane Smith** (@jane.smith)
  - Email: jane@example.com
  - Role: Designer
  - Status: Active

*Showing 20 results. Use offset=20 to see more.*
```
## Pagination Best Practices
For tools that list resources:
Implementation requirements:
- Always respect the `limit` parameter (never load all results when a limit is specified)
- Implement offset-based or cursor-based pagination
- Return pagination metadata: `has_more`, `next_offset`/`next_cursor`, `total_count`
- Never load all results into memory for large datasets
- Default to reasonable limits (20-50 items typical)
Response structure:

```json
{
  "items": [...],
  "total": 150,
  "count": 20,
  "offset": 0,
  "has_more": true,
  "next_offset": 20
}
```
Clear guidance in responses: include instructions for getting more data (an implementation sketch follows these examples):
- "Showing 20 of 150 results. Use offset=20 to see the next page."
- "Results truncated. Add filters to narrow the search."
## Character Limits and Truncation
To prevent overwhelming context windows:
Implementation:
- Define a `CHARACTER_LIMIT` constant (typically 25,000 characters)
- Check response size before returning
- Truncate gracefully with clear indicators
- Provide guidance on how to filter/paginate for complete results
Example handling:

```ruby
CHARACTER_LIMIT = 25_000 # max characters returned to the agent

# `result` is the serialized response string; `data` is the underlying item list.
if result.length > CHARACTER_LIMIT
  # Keep at least one item; drop the second half of the list.
  truncated_data = data[0...[1, data.length / 2].max]
  response[:truncated] = true
  response[:truncation_message] =
    "Response truncated from #{data.length} to #{truncated_data.length} items. " \
    "Use 'offset' parameter or add filters like status='active' to see more."
end
```
## Input Validation Best Practices
Security and usability:
- Validate all parameters against schema before processing
- Sanitize file paths to prevent directory traversal (see the sketch after this list)
- Validate URLs and external identifiers
- Check parameter sizes and ranges
- Prevent command injection in system calls
- Return clear validation errors with examples of correct format
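As referenced above, a directory-traversal guard can be as small as the following sketch; `BASE_DIR` is a hypothetical root the tool is allowed to serve files from:

```ruby
require "pathname"

BASE_DIR = Pathname.new("/srv/tool-data").freeze

# Resolve a user-supplied path and reject anything that escapes BASE_DIR,
# including "../" tricks and absolute paths (expand_path resolves both).
def safe_path(user_supplied)
  candidate = BASE_DIR.join(user_supplied).expand_path

  unless candidate.to_s.start_with?("#{BASE_DIR}/")
    raise ArgumentError,
          "Path escapes the allowed directory. Provide a path relative to " \
          "the data root, e.g. 'reports/summary.csv'."
  end

  candidate
end
```

Note the error message follows the actionable-error principle: it explains the constraint and shows a valid example rather than just refusing.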
Schema design:
- Use strong validation (dry-validation, JSON Schema)
- Include constraints (minLength, maxLength, pattern, minimum, maximum)
- Provide detailed field descriptions with examples
- Mark required vs optional parameters clearly
- Set sensible defaults where possible
## Resources
This skill includes reference documentation for deeper exploration:
**references/tool_design_patterns.md**
Comprehensive patterns and anti-patterns for common tool design scenarios with detailed examples.
**references/evaluation_guide.md**
Complete methodology for creating evaluation questions that test tool effectiveness with AI agents, including how to run evaluations and interpret results.
## Further Reading
For detailed examples and advanced patterns:
- Tool Design Patterns - Comprehensive patterns and examples
- Evaluation Guide - Testing methodology and evaluation creation