| name | ai-dev-integration |
| description | Expert guidance for developing and integrating AI systems using LLM APIs, SDKs, and Model Context Protocol (MCP). Covers API selection, SDK patterns, MCP development, production patterns, security, cost optimization, and architecture decisions for building production-ready AI integrations. |
AI Development & Integration
Purpose
Guide developers in building production-ready AI integrations with comprehensive coverage of LLM APIs, SDKs, and the Model Context Protocol (MCP). Provide decision frameworks for choosing the right integration approach, implementing secure and cost-effective solutions, and avoiding common pitfalls.
When to Use This Skill
Invoke this skill when addressing:
- API selection decisions: Choosing between OpenAI, Anthropic Claude, Google Gemini, Ollama, or other providers
- Integration architecture: Designing systems that use LLMs for chat, document processing, analysis, or other tasks
- MCP vs direct API: Deciding whether to build an MCP server or use direct API calls
- MCP development: Creating MCP servers with FastMCP (Python) or TypeScript SDK
- Production readiness: Implementing error handling, rate limiting, caching, monitoring, or security measures
- Multi-provider strategies: Building systems with fallback logic or provider switching
- Cost optimization: Reducing token usage, implementing caching, or tracking expenses
- Security concerns: Preventing prompt injection, handling PII, or implementing auth
- Streaming implementations: Building real-time chat or processing systems
- Agent frameworks: Deciding if CrewAI, LangChain, AutoGen, or similar tools are needed
Core Decision Frameworks
API Selection Decision Tree
When to use OpenAI:
- Need GPT-4 Turbo or GPT-4o specific capabilities
- Require DALL-E image generation or Whisper transcription
- Building on existing OpenAI integrations
- Cost-sensitive applications with GPT-3.5-turbo
- Need function calling with streaming
When to use Anthropic Claude:
- Require 200K+ token context windows (Claude 3.5 Sonnet/Opus)
- Need strong reasoning and analysis capabilities
- Building tool-heavy integrations (MCP compatible)
- Prefer thoughtful, nuanced responses
- Require strong security and reduced hallucinations
When to use Google Gemini:
- Need multimodal inputs (images, video, audio in same context)
- Require 2M+ token context window (Gemini 1.5 Pro)
- Building Google Cloud integrations
- Need competitive pricing on long-context tasks
When to use Ollama (local):
- Privacy requirements prevent cloud API usage
- Need offline operation or airgapped environments
- Want to avoid per-token costs
- Acceptable with lower quality for simpler tasks
- Have GPU resources for inference
Multi-provider strategy: Implement when requiring high availability, cost optimization through provider switching, or different models for different task types.
MCP vs Direct API Integration
Use MCP when:
- Building tool-heavy integrations requiring multiple capabilities (database access, file operations, API calls)
- Creating reusable tool packages for multiple projects
- Giving Claude extended capabilities beyond simple completions
- Need standardized tool discovery and lifecycle management
- Building integrations specifically for Claude (MCP is Anthropic-specific)
- Want to separate tool implementation from application logic
Use direct API when:
- Need simple completions without extensive tooling
- Building one-off integrations for specific tasks
- Using language models other than Claude (MCP is Claude-specific)
- Require maximum control over request/response handling
- Need custom streaming or token-level processing
- Integration complexity doesn't justify MCP overhead
Architecture pattern comparison:
MCP Architecture:
Application → MCP Client → MCP Server → Tools/Resources
Benefits: Standardized, reusable, discoverable tools
Complexity: Higher initial setup, server lifecycle management
Direct API Architecture:
Application → SDK/HTTP Client → LLM API → Response
Benefits: Simple, direct control, any LLM provider
Complexity: Lower initial setup, manual function calling
For detailed MCP development guidance, consult references/mcp-development.md.
SDK Integration Best Practices
Error handling pattern (all SDKs):
- Implement exponential backoff for rate limits
- Catch provider-specific exceptions
- Log errors with request IDs for debugging
- Implement circuit breakers for repeated failures
- Provide user-friendly error messages
Streaming vs batch processing:
- Use streaming: Chat applications, real-time UIs, long-running generations where partial results are valuable
- Use batch: Background processing, bulk operations, when final result is needed before showing anything
Rate limiting strategies:
- Token bucket algorithm for smooth request distribution
- Provider-specific limits (consult
references/api-comparison.md) - Implement queuing for burst handling
- Track usage per user/tenant for fair distribution
Context window management:
- Track token counts before sending (use tiktoken for OpenAI, Anthropic tokenizer for Claude)
- Implement sliding window for conversations
- Summarize old messages to preserve context
- Chunk large documents with overlap
- Use references/pointers instead of full content when possible
For detailed SDK patterns and code examples, consult references/sdk-patterns.md.
MCP Development Essentials
FastMCP vs TypeScript SDK Selection
Choose FastMCP (Python) when:
- Primary codebase is Python
- Integrating with data science/ML tooling (pandas, numpy, scikit-learn)
- Need rapid prototyping with minimal boilerplate
- Team expertise is Python-focused
- Integrating with FastAPI, Django, or Flask backends
Choose TypeScript SDK when:
- Primary codebase is Node.js/TypeScript
- Need tight integration with JavaScript ecosystem
- Building full-stack applications with shared types
- Team expertise is TypeScript-focused
- Require advanced type safety and IDE support
Tool Design Patterns
Effective tool design:
- Single responsibility: Each tool does one thing well
- Clear parameters: Use Pydantic/Zod schemas for validation
- Descriptive names: Tool name clearly indicates purpose
- Helpful descriptions: Explain when and why to use the tool
- Error context: Return actionable error messages
Anti-patterns to avoid:
- God tools: Tools that do too many things (split them up)
- Vague parameters: Unclear what values are valid or expected
- Silent failures: Tools that fail without informative errors
- State dependencies: Tools that require specific call order (make them independent)
- Inconsistent interfaces: Similar tools with different parameter patterns
Security Considerations for MCP Servers
Authentication and authorization:
- Implement authentication for sensitive tools
- Use environment variables for credentials, never hardcode
- Apply principle of least privilege to database/API access
- Validate all user inputs rigorously
- Audit tool access and usage
Data access controls:
- Scope database queries to authorized data only
- Implement row-level security where applicable
- Sanitize outputs to prevent leaking PII
- Use read-only connections when write access isn't needed
- Rate limit expensive operations
For comprehensive MCP security guidance, consult references/mcp-development.md.
Production Patterns
Caching Strategies
LLM response caching:
- Cache identical prompts with same parameters
- Use semantic similarity for near-duplicate queries
- Implement TTL based on data freshness requirements
- Cache embeddings for reuse across requests
- Provider-specific caching (Anthropic prompt caching, OpenAI response caching)
Cost-effective caching:
- Cache expensive operations (embeddings, long completions)
- Invalidate caches when underlying data changes
- Use Redis/Memcached for distributed caching
- Monitor cache hit rates for optimization
Monitoring and Observability
Key metrics to track:
- Request latency (p50, p95, p99)
- Token usage per request and total
- Error rates by type and provider
- Cost per request and daily/monthly totals
- Cache hit rates
Implement:
- Structured logging with request IDs
- Distributed tracing for multi-service flows
- Alerting on error rate spikes or cost anomalies
- Dashboard for real-time monitoring
Cost Optimization Techniques
Token reduction strategies:
- Remove unnecessary whitespace and formatting
- Use shorter system prompts when possible
- Implement prompt compression techniques
- Cache common responses
- Use cheaper models for simpler tasks
Intelligent routing:
- Route simple queries to GPT-3.5-turbo or Claude Haiku
- Use expensive models (GPT-4, Claude Opus) only when needed
- Implement confidence scoring to determine model selection
- A/B test model performance vs cost tradeoffs
For cost tracking tools, use scripts/cost-calculator.py.
Prompt Injection Prevention
Input validation:
- Sanitize user inputs before inclusion in prompts
- Use structured inputs (JSON) instead of free text when possible
- Implement content filtering for malicious patterns
- Separate user content from instructions clearly
- Use XML tags or delimiters to demarcate user content
System prompt hardening:
- Be explicit about ignoring instructions in user content
- Use Anthropic's Constitutional AI principles for Claude
- Implement output validation to detect leaked instructions
- Test with adversarial prompts regularly
PII and Sensitive Data Handling
Data minimization:
- Avoid sending PII to LLM APIs when possible
- Anonymize or pseudonymize data before processing
- Use on-premise models (Ollama) for highly sensitive data
- Implement data retention policies
Compliance considerations:
- Review provider data processing agreements (DPAs)
- Understand data residency requirements
- Implement audit logging for compliance
- Consider zero data retention options (where available)
For comprehensive production checklist, consult references/production-checklist.md.
Integration Architectures
Synchronous vs Asynchronous Patterns
Synchronous (request-response):
- Use for: Chat interfaces, real-time interactions, simple queries
- Pattern: User waits for completion, streaming for UX
- Implementation: Direct API calls, WebSocket for streaming
- Pros: Simple, immediate feedback
- Cons: User blocked during processing, scaling challenges
Asynchronous (queue-based):
- Use for: Batch processing, long-running tasks, high volume
- Pattern: Queue request, process in background, notify on completion
- Implementation: Celery, RQ, AWS SQS, Google Cloud Tasks
- Pros: Scalable, resilient, non-blocking
- Cons: Added complexity, eventual consistency
Hybrid approach:
- Streaming response for initial results (synchronous UX)
- Queue follow-up processing (asynchronous backend)
- WebSocket/SSE for progress updates
- Best of both worlds for complex workflows
Webhook Handling
Best practices:
- Validate webhook signatures for security
- Respond quickly (< 3s), queue actual processing
- Implement idempotency to handle duplicates
- Retry failed webhooks with exponential backoff
- Monitor webhook delivery success rates
State Management Across Conversations
Conversation state strategies:
- Stateless: Include full history in each request (simple, scales horizontally)
- Session storage: Store conversation in Redis/database (better for long conversations)
- Hybrid: Recent messages in request, older messages summarized/retrieved as needed
State storage considerations:
- Choose storage based on conversation length expectations
- Implement conversation expiration for cleanup
- Consider multi-tenant isolation requirements
- Use conversation IDs for tracking and debugging
When to Consider Agent Frameworks
Decision Criteria
Use simple API calls when:
- Single-task workflows (chat, completion, classification)
- No need for multi-step reasoning or tool orchestration
- Direct control over prompts and responses is required
- Minimal dependencies are preferred
Consider agent frameworks when:
- Multi-agent collaboration is needed (CrewAI)
- Complex tool orchestration across multiple steps (LangChain)
- Need for autonomous task breakdown and execution (AutoGen)
- Building agent-to-agent communication patterns
- Require pre-built integrations and abstractions
Framework selection guidance:
- CrewAI: Multi-agent workflows with role-based collaboration
- LangChain: Tool chaining, retrieval-augmented generation (RAG)
- AutoGen: Autonomous agents with code execution
- Haystack: NLP pipelines and document processing
Note: For detailed agent framework guidance, consult the ai-agent-frameworks skill (if available).
Signs you need orchestration:
- Tasks require multiple sequential LLM calls with dependencies
- Need to coordinate between different tools and APIs
- Workflows vary based on intermediate results
- Building complex autonomous behaviors
Signs simple API calls suffice:
- Single prompt → single response pattern
- Predictable, linear workflows
- Minimal tool usage or simple function calling
- Performance and control are critical
Template Resources
MCP Server Templates
Python template: scripts/mcp-template-python/
- FastMCP-based server structure
- Example tool implementations
- Environment configuration
- Testing setup with pytest
TypeScript template: scripts/mcp-template-typescript/
- MCP SDK server structure
- Example tool implementations
- Build configuration with tsup
- Testing setup with vitest
Cost Estimation Tool
Usage: python scripts/cost-calculator.py
Calculate estimated costs for different providers based on:
- Input/output token counts
- Model selection
- Request volume
- Compare costs across providers
Helps make informed decisions about provider selection and budget planning.
Additional Resources
references/api-comparison.md: Detailed provider comparison matrix (features, pricing, rate limits, capabilities)references/mcp-development.md: Comprehensive MCP development guide (FastMCP/TypeScript SDK, tool patterns, security, debugging)references/sdk-patterns.md: Code examples for error handling, streaming, rate limiting, context managementreferences/production-checklist.md: Pre-deployment validation, monitoring setup, security audit, performance optimization
Workflow Recommendations
For new integrations:
- Start with decision frameworks above to select provider and architecture
- Review
references/api-comparison.mdfor detailed provider evaluation - Implement basic integration using patterns from
references/sdk-patterns.md - Add production hardening using
references/production-checklist.md - Use
scripts/cost-calculator.pyto validate cost assumptions
For MCP development:
- Decide if MCP is appropriate using decision framework above
- Choose FastMCP or TypeScript SDK based on team expertise
- Start from template in
scripts/mcp-template-python/orscripts/mcp-template-typescript/ - Review
references/mcp-development.mdfor tool design patterns and security - Test thoroughly before deployment
For production deployments:
- Complete all items in
references/production-checklist.md - Implement monitoring and cost tracking
- Set up alerting for errors and cost anomalies
- Document operational runbooks
- Plan for scaling and disaster recovery