name	ai-dev-integration
description	Expert guidance for developing and integrating AI systems using LLM APIs, SDKs, and Model Context Protocol (MCP). Covers API selection, SDK patterns, MCP development, production patterns, security, cost optimization, and architecture decisions for building production-ready AI integrations.

AI Development & Integration

Purpose

Guide developers in building production-ready AI integrations with comprehensive coverage of LLM APIs, SDKs, and the Model Context Protocol (MCP). Provide decision frameworks for choosing the right integration approach, implementing secure and cost-effective solutions, and avoiding common pitfalls.

When to Use This Skill

Invoke this skill when addressing:

API selection decisions: Choosing between OpenAI, Anthropic Claude, Google Gemini, Ollama, or other providers
Integration architecture: Designing systems that use LLMs for chat, document processing, analysis, or other tasks
MCP vs direct API: Deciding whether to build an MCP server or use direct API calls
MCP development: Creating MCP servers with FastMCP (Python) or TypeScript SDK
Production readiness: Implementing error handling, rate limiting, caching, monitoring, or security measures
Multi-provider strategies: Building systems with fallback logic or provider switching
Cost optimization: Reducing token usage, implementing caching, or tracking expenses
Security concerns: Preventing prompt injection, handling PII, or implementing auth
Streaming implementations: Building real-time chat or processing systems
Agent frameworks: Deciding if CrewAI, LangChain, AutoGen, or similar tools are needed

Core Decision Frameworks

API Selection Decision Tree

When to use OpenAI:

Need GPT-4 Turbo or GPT-4o specific capabilities
Require DALL-E image generation or Whisper transcription
Building on existing OpenAI integrations
Cost-sensitive applications with GPT-3.5-turbo
Need function calling with streaming

When to use Anthropic Claude:

Require 200K+ token context windows (Claude 3.5 Sonnet/Opus)
Need strong reasoning and analysis capabilities
Building tool-heavy integrations (MCP compatible)
Prefer thoughtful, nuanced responses
Require strong security and reduced hallucinations

When to use Google Gemini:

Need multimodal inputs (images, video, audio in same context)
Require 2M+ token context window (Gemini 1.5 Pro)
Building Google Cloud integrations
Need competitive pricing on long-context tasks

When to use Ollama (local):

Privacy requirements prevent cloud API usage
Need offline operation or airgapped environments
Want to avoid per-token costs
Acceptable with lower quality for simpler tasks
Have GPU resources for inference

Multi-provider strategy: Implement when requiring high availability, cost optimization through provider switching, or different models for different task types.

MCP vs Direct API Integration

Use MCP when:

Building tool-heavy integrations requiring multiple capabilities (database access, file operations, API calls)
Creating reusable tool packages for multiple projects
Giving Claude extended capabilities beyond simple completions
Need standardized tool discovery and lifecycle management
Building integrations specifically for Claude (MCP is Anthropic-specific)
Want to separate tool implementation from application logic

Use direct API when:

Need simple completions without extensive tooling
Building one-off integrations for specific tasks
Using language models other than Claude (MCP is Claude-specific)
Require maximum control over request/response handling
Need custom streaming or token-level processing
Integration complexity doesn't justify MCP overhead

Architecture pattern comparison:

MCP Architecture:
Application → MCP Client → MCP Server → Tools/Resources
Benefits: Standardized, reusable, discoverable tools
Complexity: Higher initial setup, server lifecycle management

Direct API Architecture:
Application → SDK/HTTP Client → LLM API → Response
Benefits: Simple, direct control, any LLM provider
Complexity: Lower initial setup, manual function calling

For detailed MCP development guidance, consult references/mcp-development.md.

SDK Integration Best Practices

Error handling pattern (all SDKs):

Implement exponential backoff for rate limits
Catch provider-specific exceptions
Log errors with request IDs for debugging
Implement circuit breakers for repeated failures
Provide user-friendly error messages

Streaming vs batch processing:

Use streaming: Chat applications, real-time UIs, long-running generations where partial results are valuable
Use batch: Background processing, bulk operations, when final result is needed before showing anything

Rate limiting strategies:

Token bucket algorithm for smooth request distribution
Provider-specific limits (consult references/api-comparison.md)
Implement queuing for burst handling
Track usage per user/tenant for fair distribution

Context window management:

Track token counts before sending (use tiktoken for OpenAI, Anthropic tokenizer for Claude)
Implement sliding window for conversations
Summarize old messages to preserve context
Chunk large documents with overlap
Use references/pointers instead of full content when possible

For detailed SDK patterns and code examples, consult references/sdk-patterns.md.

MCP Development Essentials

FastMCP vs TypeScript SDK Selection

Choose FastMCP (Python) when:

Primary codebase is Python
Integrating with data science/ML tooling (pandas, numpy, scikit-learn)
Need rapid prototyping with minimal boilerplate
Team expertise is Python-focused
Integrating with FastAPI, Django, or Flask backends

Choose TypeScript SDK when:

Primary codebase is Node.js/TypeScript
Need tight integration with JavaScript ecosystem
Building full-stack applications with shared types
Team expertise is TypeScript-focused
Require advanced type safety and IDE support

Tool Design Patterns

Effective tool design:

Single responsibility: Each tool does one thing well
Clear parameters: Use Pydantic/Zod schemas for validation
Descriptive names: Tool name clearly indicates purpose
Helpful descriptions: Explain when and why to use the tool
Error context: Return actionable error messages

Anti-patterns to avoid:

God tools: Tools that do too many things (split them up)
Vague parameters: Unclear what values are valid or expected
Silent failures: Tools that fail without informative errors
State dependencies: Tools that require specific call order (make them independent)
Inconsistent interfaces: Similar tools with different parameter patterns

Security Considerations for MCP Servers

Authentication and authorization:

Implement authentication for sensitive tools
Use environment variables for credentials, never hardcode
Apply principle of least privilege to database/API access
Validate all user inputs rigorously
Audit tool access and usage

Data access controls:

Scope database queries to authorized data only
Implement row-level security where applicable
Sanitize outputs to prevent leaking PII
Use read-only connections when write access isn't needed
Rate limit expensive operations

For comprehensive MCP security guidance, consult references/mcp-development.md.

Production Patterns

Caching Strategies

LLM response caching:

Cache identical prompts with same parameters
Use semantic similarity for near-duplicate queries
Implement TTL based on data freshness requirements
Cache embeddings for reuse across requests
Provider-specific caching (Anthropic prompt caching, OpenAI response caching)

Cost-effective caching:

Cache expensive operations (embeddings, long completions)
Invalidate caches when underlying data changes
Use Redis/Memcached for distributed caching
Monitor cache hit rates for optimization

Monitoring and Observability

Key metrics to track:

Request latency (p50, p95, p99)
Token usage per request and total
Error rates by type and provider
Cost per request and daily/monthly totals
Cache hit rates

Implement:

Structured logging with request IDs
Distributed tracing for multi-service flows
Alerting on error rate spikes or cost anomalies
Dashboard for real-time monitoring

Cost Optimization Techniques

Token reduction strategies:

Remove unnecessary whitespace and formatting
Use shorter system prompts when possible
Implement prompt compression techniques
Cache common responses
Use cheaper models for simpler tasks

Intelligent routing:

Route simple queries to GPT-3.5-turbo or Claude Haiku
Use expensive models (GPT-4, Claude Opus) only when needed
Implement confidence scoring to determine model selection
A/B test model performance vs cost tradeoffs

For cost tracking tools, use scripts/cost-calculator.py.

Prompt Injection Prevention

Input validation:

Sanitize user inputs before inclusion in prompts
Use structured inputs (JSON) instead of free text when possible
Implement content filtering for malicious patterns
Separate user content from instructions clearly
Use XML tags or delimiters to demarcate user content

System prompt hardening:

Be explicit about ignoring instructions in user content
Use Anthropic's Constitutional AI principles for Claude
Implement output validation to detect leaked instructions
Test with adversarial prompts regularly

PII and Sensitive Data Handling

Data minimization:

Avoid sending PII to LLM APIs when possible
Anonymize or pseudonymize data before processing
Use on-premise models (Ollama) for highly sensitive data
Implement data retention policies

Compliance considerations:

Review provider data processing agreements (DPAs)
Understand data residency requirements
Implement audit logging for compliance
Consider zero data retention options (where available)

For comprehensive production checklist, consult references/production-checklist.md.

Integration Architectures

Synchronous vs Asynchronous Patterns

Synchronous (request-response):

Use for: Chat interfaces, real-time interactions, simple queries
Pattern: User waits for completion, streaming for UX
Implementation: Direct API calls, WebSocket for streaming
Pros: Simple, immediate feedback
Cons: User blocked during processing, scaling challenges

Asynchronous (queue-based):

Use for: Batch processing, long-running tasks, high volume
Pattern: Queue request, process in background, notify on completion
Implementation: Celery, RQ, AWS SQS, Google Cloud Tasks
Pros: Scalable, resilient, non-blocking
Cons: Added complexity, eventual consistency

Hybrid approach:

Streaming response for initial results (synchronous UX)
Queue follow-up processing (asynchronous backend)
WebSocket/SSE for progress updates
Best of both worlds for complex workflows

Webhook Handling

Best practices:

Validate webhook signatures for security
Respond quickly (< 3s), queue actual processing
Implement idempotency to handle duplicates
Retry failed webhooks with exponential backoff
Monitor webhook delivery success rates

State Management Across Conversations

Conversation state strategies:

Stateless: Include full history in each request (simple, scales horizontally)
Session storage: Store conversation in Redis/database (better for long conversations)
Hybrid: Recent messages in request, older messages summarized/retrieved as needed

State storage considerations:

Choose storage based on conversation length expectations
Implement conversation expiration for cleanup
Consider multi-tenant isolation requirements
Use conversation IDs for tracking and debugging

When to Consider Agent Frameworks

Decision Criteria

Use simple API calls when:

Single-task workflows (chat, completion, classification)
No need for multi-step reasoning or tool orchestration
Direct control over prompts and responses is required
Minimal dependencies are preferred

Consider agent frameworks when:

Multi-agent collaboration is needed (CrewAI)
Complex tool orchestration across multiple steps (LangChain)
Need for autonomous task breakdown and execution (AutoGen)
Building agent-to-agent communication patterns
Require pre-built integrations and abstractions

Framework selection guidance:

CrewAI: Multi-agent workflows with role-based collaboration
LangChain: Tool chaining, retrieval-augmented generation (RAG)
AutoGen: Autonomous agents with code execution
Haystack: NLP pipelines and document processing

Note: For detailed agent framework guidance, consult the ai-agent-frameworks skill (if available).

Signs you need orchestration:

Tasks require multiple sequential LLM calls with dependencies
Need to coordinate between different tools and APIs
Workflows vary based on intermediate results
Building complex autonomous behaviors

Signs simple API calls suffice:

Single prompt → single response pattern
Predictable, linear workflows
Minimal tool usage or simple function calling
Performance and control are critical

Template Resources

MCP Server Templates

Python template: scripts/mcp-template-python/

FastMCP-based server structure
Example tool implementations
Environment configuration
Testing setup with pytest

TypeScript template: scripts/mcp-template-typescript/

MCP SDK server structure
Example tool implementations
Build configuration with tsup
Testing setup with vitest

Cost Estimation Tool

Usage: python scripts/cost-calculator.py

Calculate estimated costs for different providers based on:

Input/output token counts
Model selection
Request volume
Compare costs across providers

Helps make informed decisions about provider selection and budget planning.

Additional Resources

references/api-comparison.md: Detailed provider comparison matrix (features, pricing, rate limits, capabilities)
references/mcp-development.md: Comprehensive MCP development guide (FastMCP/TypeScript SDK, tool patterns, security, debugging)
references/sdk-patterns.md: Code examples for error handling, streaming, rate limiting, context management
references/production-checklist.md: Pre-deployment validation, monitoring setup, security audit, performance optimization

Workflow Recommendations

For new integrations:

Start with decision frameworks above to select provider and architecture
Review references/api-comparison.md for detailed provider evaluation
Implement basic integration using patterns from references/sdk-patterns.md
Add production hardening using references/production-checklist.md
Use scripts/cost-calculator.py to validate cost assumptions

For MCP development:

Decide if MCP is appropriate using decision framework above
Choose FastMCP or TypeScript SDK based on team expertise
Start from template in scripts/mcp-template-python/ or scripts/mcp-template-typescript/
Review references/mcp-development.md for tool design patterns and security
Test thoroughly before deployment

For production deployments:

Complete all items in references/production-checklist.md
Implement monitoring and cost tracking
Set up alerting for errors and cost anomalies
Document operational runbooks
Plan for scaling and disaster recovery

ai-dev-integration

Install Skill

SKILL.md