| name | ai-integration-patterns |
| description | This skill should be used when integrating AI capabilities (Claude API, OpenAI, embeddings, etc.) into applications - covers prompt engineering, streaming responses, cost optimization, rate limiting, context window management, error handling, and production-ready patterns for AI-powered features. |
AI Integration Patterns
Overview
Integrate AI capabilities into applications effectively and reliably. This skill teaches production-ready patterns for working with Claude API, OpenAI, and other AI services.
Core principle: AI integrations should be fast and cost-effective, and should handle failures gracefully.
When to Use
Use this skill when:
- Adding AI features to applications (chat, generation, analysis)
- Integrating Claude API or OpenAI
- Building AI-powered tools or assistants
- Implementing embeddings and semantic search
- Optimizing AI costs and performance
- Handling streaming responses
- Managing context windows
Common use cases:
- Chatbots and conversational interfaces
- Content generation (text, code, summaries)
- Document analysis and Q&A
- Semantic search
- AI-powered recommendations
- Code assistants
AI Service Selection
| Use Case | Best Choice | Why |
|---|---|---|
| Chat/conversation | Claude API | Superior reasoning, long context |
| Code generation | Claude API or GPT-4 | Strong coding abilities |
| Embeddings/search | OpenAI text-embedding-3 | Cost-effective, proven |
| Image generation | DALL-E or Midjourney | Best quality |
| Voice | OpenAI Whisper/TTS | Standard solutions |
| Vision | GPT-4V or Claude | Strong multimodal |
| Fast/cheap tasks | Claude Haiku | Fastest and cheapest Claude model |
Claude API Integration
Basic Setup
// Install SDK
npm install @anthropic-ai/sdk
// Initialize client
import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
})
Simple Message
async function chat(userMessage: string) {
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
{ role: 'user', content: userMessage }
],
})
  // content is an array of blocks; check the block type before reading .text
  const block = message.content[0]
  return block.type === 'text' ? block.text : ''
}
Streaming Responses
async function* streamChat(userMessage: string) {
const stream = await anthropic.messages.stream({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
{ role: 'user', content: userMessage }
],
})
for await (const chunk of stream) {
if (chunk.type === 'content_block_delta' &&
chunk.delta.type === 'text_delta') {
yield chunk.delta.text
}
}
}
// Usage in API route
export async function POST(req: Request) {
const { message } = await req.json()
  const encoder = new TextEncoder()
  const stream = new ReadableStream({
    async start(controller) {
      for await (const text of streamChat(message)) {
        // Encode to bytes; Response body streams expect Uint8Array chunks
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`))
      }
      controller.close()
    },
  })
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
})
}
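On the client, this SSE-style stream can be consumed with fetch and a stream reader. A minimal sketch, assuming the /api/chat route above; the parsing is simplified and does not handle events split across chunk boundaries:
async function readChatStream(message: string, onText: (text: string) => void) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  })
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // Each chunk carries one or more "data: {...}" lines
    for (const line of decoder.decode(value, { stream: true }).split('\n')) {
      if (line.startsWith('data: ')) {
        onText(JSON.parse(line.slice(6)).text)
      }
    }
  }
}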
System Prompts (Critical for Quality)
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
system: `You are a helpful coding assistant.
Rules:
- Always provide working, tested code
- Explain your reasoning
- Use TypeScript when possible
- Follow best practices
- Be concise but thorough`,
messages: [
{ role: 'user', content: userMessage }
],
})
Multi-Turn Conversations
interface Message {
role: 'user' | 'assistant'
content: string
}
async function conversationChat(
messages: Message[],
newMessage: string
) {
const updatedMessages = [
...messages,
{ role: 'user' as const, content: newMessage }
]
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: updatedMessages,
})
return {
messages: [
...updatedMessages,
{
role: 'assistant' as const,
content: response.content[0].text
}
],
response: response.content[0].text
}
}
Model Selection Strategy
Claude Models
| Model | Use For | Speed | Cost | Context |
|---|---|---|---|---|
| Sonnet 4.5 | General tasks, coding | Fast | Medium | 200k |
| Haiku 4.5 | Simple tasks, high volume | Fastest | Lowest | 200k |
| Opus 4 | Complex reasoning, critical tasks | Slower | Highest | 200k |
Decision framework:
Start with Haiku for:
- Simple Q&A
- Classification tasks
- Summarization
- High-volume requests
Use Sonnet for:
- Conversational AI
- Code generation
- Content creation
- General-purpose tasks
Use Opus for:
- Complex analysis
- Critical decisions
- Research tasks
- When quality > cost
Dynamic Model Selection
function selectModel(taskComplexity: 'simple' | 'medium' | 'complex') {
const modelMap = {
simple: 'claude-haiku-4-5-20251001',
medium: 'claude-sonnet-4-5-20250929',
complex: 'claude-opus-4-20250514',
}
return modelMap[taskComplexity]
}
// Example: Use cheap model for classification
const model = userMessage.length < 100
? 'claude-haiku-4-5-20251001'
: 'claude-sonnet-4-5-20250929'
Prompt Engineering Patterns
The Template Pattern
function buildPrompt(context: {
userQuery: string
documents: string[]
rules: string[]
}) {
return `Context: ${context.documents.join('\n\n')}
Rules:
${context.rules.map(r => `- ${r}`).join('\n')}
User question: ${context.userQuery}
Provide a clear, accurate answer based on the context above.`
}
The XML Structure Pattern
// Claude works well with XML-structured prompts
const prompt = `
<context>
<documents>
<document name="user_guide.md">
${userGuide}
</document>
<document name="api_docs.md">
${apiDocs}
</document>
</documents>
</context>
<task>
Answer the user's question using ONLY the information in the documents above.
If the answer isn't in the documents, say so clearly.
</task>
<question>
${userQuestion}
</question>
`
The Few-Shot Pattern
const fewShotPrompt = `Classify the sentiment of customer feedback.
Examples:
Feedback: "This product is amazing! Best purchase ever."
Sentiment: positive
Feedback: "Terrible service, will never buy again."
Sentiment: negative
Feedback: "It's okay, nothing special."
Sentiment: neutral
Now classify this:
Feedback: "${customerFeedback}"
Sentiment:`
The Chain-of-Thought Pattern
const prompt = `Solve this problem step by step:
Problem: ${problem}
Think through this carefully:
1. What information do we have?
2. What are we trying to find?
3. What steps are needed?
4. Execute each step
5. Verify the answer
Show your work.`
Context Window Management
Truncation Strategies
function truncateToTokenLimit(
text: string,
maxTokens: number
): string {
// Rough estimate: 1 token ≈ 4 characters
const maxChars = maxTokens * 4
if (text.length <= maxChars) return text
// Truncate from middle to preserve beginning and end
const keepSize = maxChars / 2
return text.slice(0, keepSize) +
'\n\n[... content truncated ...]\n\n' +
text.slice(-keepSize)
}
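The 4-characters-per-token rule is only a rough estimate. Recent versions of the Anthropic SDK also expose a token-counting endpoint, which can be used to check a prompt's real size before sending a large request. A sketch, assuming the method is available in your SDK version:
// Count input tokens exactly before sending a large prompt
async function countPromptTokens(prompt: string): Promise<number> {
  const result = await anthropic.messages.countTokens({
    model: 'claude-sonnet-4-5-20250929',
    messages: [{ role: 'user', content: prompt }],
  })
  return result.input_tokens
}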
Chunking Long Documents
function chunkDocument(
document: string,
chunkSize: number = 4000 // ~1000 tokens
): string[] {
const chunks: string[] = []
let currentChunk = ''
const paragraphs = document.split('\n\n')
for (const para of paragraphs) {
if ((currentChunk + para).length > chunkSize) {
if (currentChunk) chunks.push(currentChunk)
currentChunk = para
} else {
currentChunk += (currentChunk ? '\n\n' : '') + para
}
}
if (currentChunk) chunks.push(currentChunk)
return chunks
}
// Process each chunk
async function processLongDocument(document: string) {
const chunks = chunkDocument(document)
const summaries = []
for (const chunk of chunks) {
const summary = await chat(`Summarize this section:\n\n${chunk}`)
summaries.push(summary)
}
// Final synthesis
const finalSummary = await chat(
`Synthesize these section summaries into a coherent overview:\n\n${summaries.join('\n\n')}`
)
return finalSummary
}
Conversation History Management
interface ConversationManager {
messages: Message[]
maxTokens: number
}
function trimConversationHistory(
manager: ConversationManager
): Message[] {
  // Claude's system prompt lives outside the messages array, so only user/assistant
  // turns need trimming; always keep the first message plus the most recent turns
  const firstMsg = manager.messages[0]
  let recentMessages = manager.messages.slice(1).slice(-10) // Last 10 messages
  // Estimate tokens (rough: ~4 characters per token)
  const estimateTokens = (msgs: Message[]) =>
    msgs.reduce((sum, m) => sum + m.content.length / 4, 0)
  // Drop the oldest messages until under the limit
  while (estimateTokens(recentMessages) > manager.maxTokens &&
         recentMessages.length > 2) {
    recentMessages = recentMessages.slice(1)
  }
  return [firstMsg, ...recentMessages]
}
Cost Optimization
Token Usage Tracking
interface UsageMetrics {
inputTokens: number
outputTokens: number
cost: number
}
async function chatWithCostTracking(
userMessage: string
): Promise<{ response: string; usage: UsageMetrics }> {
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
// Pricing (as of creation - verify current rates)
const inputCostPer1M = 3.00 // $3 per million input tokens
const outputCostPer1M = 15.00 // $15 per million output tokens
const usage = {
inputTokens: message.usage.input_tokens,
outputTokens: message.usage.output_tokens,
cost: (
(message.usage.input_tokens / 1_000_000) * inputCostPer1M +
(message.usage.output_tokens / 1_000_000) * outputCostPer1M
)
}
return {
response: message.content[0].text,
usage
}
}
Caching Strategies
import { createHash } from 'crypto'
// Simple in-memory cache
const responseCache = new Map<string, string>()
function getCacheKey(prompt: string, model: string): string {
return createHash('sha256')
.update(`${model}:${prompt}`)
.digest('hex')
}
async function cachedChat(userMessage: string, model: string) {
const cacheKey = getCacheKey(userMessage, model)
// Check cache
if (responseCache.has(cacheKey)) {
return {
response: responseCache.get(cacheKey)!,
cached: true
}
}
// Call API
const response = await chat(userMessage)
// Cache response
responseCache.set(cacheKey, response)
return {
response,
cached: false
}
}
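Note that this Map never evicts entries and only lives in a single process. A sketch of the same idea with a per-entry TTL; for caching across instances, a shared store such as Redis (an assumption, not shown here) would replace the Map:
// In-memory cache with per-entry expiry
interface CacheEntry { value: string; expiresAt: number }
const ttlCache = new Map<string, CacheEntry>()
function cacheGet(key: string): string | undefined {
  const entry = ttlCache.get(key)
  if (!entry) return undefined
  if (entry.expiresAt <= Date.now()) {
    ttlCache.delete(key) // Drop stale entries lazily
    return undefined
  }
  return entry.value
}
function cacheSet(key: string, value: string, ttlSeconds: number) {
  ttlCache.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
}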
Batch Processing
// Process multiple requests in parallel
async function batchProcess(
requests: string[],
batchSize: number = 5
) {
const results = []
for (let i = 0; i < requests.length; i += batchSize) {
const batch = requests.slice(i, i + batchSize)
const batchResults = await Promise.all(
batch.map(req => chat(req))
)
results.push(...batchResults)
// Small delay between batches
if (i + batchSize < requests.length) {
await new Promise(resolve => setTimeout(resolve, 100))
}
}
return results
}
Error Handling & Retry Logic
Robust API Calls
async function robustApiCall<T>(
apiCall: () => Promise<T>,
maxRetries: number = 3
): Promise<T> {
let lastError: Error | undefined
for (let i = 0; i < maxRetries; i++) {
try {
return await apiCall()
} catch (error: any) {
lastError = error
// Don't retry client errors (4xx), except 429 rate limits
if (error.status >= 400 && error.status < 500 && error.status !== 429) {
throw error
}
// Exponential backoff
const delay = Math.min(1000 * Math.pow(2, i), 10000)
await new Promise(resolve => setTimeout(resolve, delay))
}
}
throw lastError!
}
// Usage
const response = await robustApiCall(() =>
anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
)
Rate Limiting
class RateLimiter {
private queue: Array<() => Promise<any>> = []
private processing = false
private lastRequestTime = 0
private minInterval: number
constructor(requestsPerMinute: number) {
this.minInterval = 60000 / requestsPerMinute
}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
return new Promise((resolve, reject) => {
this.queue.push(async () => {
try {
const result = await fn()
resolve(result)
} catch (error) {
reject(error)
}
})
this.processQueue()
})
}
private async processQueue() {
if (this.processing || this.queue.length === 0) return
this.processing = true
while (this.queue.length > 0) {
const now = Date.now()
const timeSinceLastRequest = now - this.lastRequestTime
if (timeSinceLastRequest < this.minInterval) {
await new Promise(resolve =>
setTimeout(resolve, this.minInterval - timeSinceLastRequest)
)
}
const fn = this.queue.shift()!
this.lastRequestTime = Date.now()
await fn()
}
this.processing = false
}
}
// Usage
const limiter = new RateLimiter(50) // 50 requests per minute
const response = await limiter.throttle(() =>
anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
)
Embeddings & Semantic Search
OpenAI Embeddings (Standard Choice)
import OpenAI from 'openai'
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
})
async function createEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small', // Cheaper
// model: 'text-embedding-3-large', // Better quality
input: text,
})
return response.data[0].embedding
}
Simple Vector Storage (In-Memory)
interface Document {
id: string
text: string
embedding: number[]
metadata?: Record<string, any>
}
class SimpleVectorStore {
private documents: Document[] = []
async addDocument(id: string, text: string, metadata?: any) {
const embedding = await createEmbedding(text)
this.documents.push({ id, text, embedding, metadata })
}
cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0)
const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0))
const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0))
return dotProduct / (magA * magB)
}
async search(query: string, limit: number = 5): Promise<Document[]> {
const queryEmbedding = await createEmbedding(query)
const scored = this.documents.map(doc => ({
doc,
score: this.cosineSimilarity(queryEmbedding, doc.embedding)
}))
return scored
.sort((a, b) => b.score - a.score)
.slice(0, limit)
.map(item => item.doc)
}
}
// Usage
const store = new SimpleVectorStore()
await store.addDocument('doc1', 'Claude is an AI assistant...')
await store.addDocument('doc2', 'JavaScript is a programming language...')
const results = await store.search('Tell me about AI')
RAG (Retrieval-Augmented Generation)
async function answerWithRAG(
question: string,
vectorStore: SimpleVectorStore
): Promise<string> {
// 1. Retrieve relevant documents
const relevantDocs = await vectorStore.search(question, 3)
// 2. Build context
const context = relevantDocs
.map((doc, i) => `Document ${i + 1}:\n${doc.text}`)
.join('\n\n')
// 3. Generate answer with Claude
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
system: `Answer questions based on the provided documents.
If the answer isn't in the documents, say so clearly.`,
messages: [{
role: 'user',
content: `Context:\n${context}\n\nQuestion: ${question}`
}],
})
return response.content[0].text
}
Production Patterns
API Route Example (Next.js)
// app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server'
import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY!,
})
export async function POST(req: NextRequest) {
try {
const { message, conversationHistory = [] } = await req.json()
// Validate input
if (!message || typeof message !== 'string') {
return NextResponse.json(
{ error: 'Message is required' },
{ status: 400 }
)
}
// Call Claude
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
...conversationHistory,
{ role: 'user', content: message }
],
})
return NextResponse.json({
response: response.content[0].text,
usage: response.usage,
})
} catch (error: any) {
console.error('Claude API error:', error)
return NextResponse.json(
{ error: 'Failed to process request' },
{ status: 500 }
)
}
}
Frontend Integration (React)
'use client'
import { useState } from 'react'
export function ChatInterface() {
const [messages, setMessages] = useState<Array<{role: string, content: string}>>([])
const [input, setInput] = useState('')
const [loading, setLoading] = useState(false)
const sendMessage = async () => {
if (!input.trim()) return
const userMessage = { role: 'user', content: input }
setMessages(prev => [...prev, userMessage])
setInput('')
setLoading(true)
try {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: input,
conversationHistory: messages
}),
})
const data = await response.json()
setMessages(prev => [...prev, {
role: 'assistant',
content: data.response
}])
} catch (error) {
console.error('Failed to send message:', error)
} finally {
setLoading(false)
}
}
return (
<div className="flex flex-col h-screen">
<div className="flex-1 overflow-y-auto p-4">
{messages.map((msg, i) => (
<div key={i} className={msg.role === 'user' ? 'text-right' : 'text-left'}>
<div className="inline-block p-2 rounded bg-gray-100">
{msg.content}
</div>
</div>
))}
</div>
<div className="p-4 border-t">
<input
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
disabled={loading}
className="w-full p-2 border rounded"
placeholder="Type a message..."
/>
</div>
</div>
)
}
Security Best Practices
API Key Management
// ✅ GOOD: Server-side only
// app/api/chat/route.ts
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY, // Server environment
})
// ❌ BAD: Never expose API keys in client-side code
// components/Chat.tsx
const anthropic = new Anthropic({
apiKey: 'sk-ant-...' // NEVER DO THIS!
})
Input Sanitization
function sanitizeInput(input: string): string {
// Strip obvious role-injection markers (a basic mitigation, not a complete defense)
return input
.replace(/system:/gi, '')
.replace(/assistant:/gi, '')
.trim()
.slice(0, 10000) // Limit length
}
Rate Limiting per User
import { rateLimit } from '@/lib/rate-limit'
export async function POST(req: NextRequest) {
  const userId = req.headers.get('x-user-id')
  if (!userId) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
  }
  // Allow 10 requests per minute per user
  const allowed = await rateLimit(userId, 10, 60)
if (!allowed) {
return NextResponse.json(
{ error: 'Rate limit exceeded' },
{ status: 429 }
)
}
// Process request...
}
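The rateLimit helper imported from '@/lib/rate-limit' is assumed rather than provided by a library. A minimal in-memory sketch using fixed windows (per process only; across multiple instances you would back the counter with Redis or similar):
// lib/rate-limit.ts (sketch): fixed-window counter per key
const windows = new Map<string, { count: number; resetAt: number }>()
export async function rateLimit(
  key: string,
  limit: number,
  windowSeconds: number
): Promise<boolean> {
  const now = Date.now()
  const entry = windows.get(key)
  // Start a fresh window if none exists or the current one has expired
  if (!entry || entry.resetAt <= now) {
    windows.set(key, { count: 1, resetAt: now + windowSeconds * 1000 })
    return true
  }
  // Reject once the per-window limit is reached
  if (entry.count >= limit) return false
  entry.count++
  return true
}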
Monitoring & Logging
import { logger } from '@/lib/logger'
async function monitoredChatCall(userMessage: string) {
const startTime = Date.now()
try {
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
const duration = Date.now() - startTime
logger.info('Claude API call successful', {
duration,
inputTokens: response.usage.input_tokens,
outputTokens: response.usage.output_tokens,
model: response.model,
})
return response
} catch (error: any) {
logger.error('Claude API call failed', {
error: error.message,
duration: Date.now() - startTime,
})
throw error
}
}
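The logger imported from '@/lib/logger' is likewise an assumed local helper. A minimal structured-logging sketch that satisfies the calls above (in production you would typically use pino, winston, or your platform's logger):
// lib/logger.ts (sketch): JSON-structured console logging
type Meta = Record<string, unknown>
export const logger = {
  info: (message: string, meta: Meta = {}) =>
    console.log(JSON.stringify({ level: 'info', message, ...meta, ts: new Date().toISOString() })),
  error: (message: string, meta: Meta = {}) =>
    console.error(JSON.stringify({ level: 'error', message, ...meta, ts: new Date().toISOString() })),
}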
Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| Exposing API keys client-side | Security breach | Always call from server |
| No rate limiting | API bill shock | Implement per-user limits |
| Ignoring token limits | Errors, failed requests | Track and truncate inputs |
| No error handling | Poor UX | Implement retries and fallbacks |
| Sending sensitive data | Privacy violations | Sanitize inputs |
| Not caching responses | Unnecessary costs | Cache identical requests |
| Using wrong model | High costs or poor quality | Match model to task complexity |
Cost Estimation
Claude Sonnet 4.5 (example pricing):
- Input: $3 per million tokens
- Output: $15 per million tokens
Example calculations:
- 1,000 chat messages (~100 input + ~10 output tokens each): ~$0.45
- 100 document summaries (~2,000 input + ~200 output tokens each): ~$0.90
- 10,000 simple classifications (~50 input + ~5 output tokens each): ~$2.25
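These estimates can be reproduced with a small helper; a sketch using the example rates above (verify current pricing before relying on the numbers):
// Rough cost estimator using the example Sonnet 4.5 rates ($3 / $15 per million tokens)
function estimateCost(
  requests: number,
  inputTokensPerRequest: number,
  outputTokensPerRequest: number,
  inputPer1M = 3.0,
  outputPer1M = 15.0
): number {
  const inputCost = (requests * inputTokensPerRequest / 1_000_000) * inputPer1M
  const outputCost = (requests * outputTokensPerRequest / 1_000_000) * outputPer1M
  return inputCost + outputCost
}
estimateCost(1_000, 100, 10)   // ≈ $0.45 (chat messages)
estimateCost(100, 2000, 200)   // ≈ $0.90 (document summaries)
estimateCost(10_000, 50, 5)    // ≈ $2.25 (classifications)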
Cost optimization tips:
- Use Haiku for simple tasks (far cheaper than Sonnet)
- Cache repeated queries
- Batch similar requests
- Truncate long inputs when possible
- Use smaller max_tokens when appropriate
Resources
Documentation:
- Claude API: https://docs.anthropic.com
- OpenAI API: https://platform.openai.com/docs
Tools:
- Anthropic SDK: @anthropic-ai/sdk
- OpenAI SDK: openai
- Vector databases: Pinecone, Weaviate, Qdrant
Best Practices:
- Prompt Engineering Guide: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
- Claude API Cookbook: https://github.com/anthropics/anthropic-cookbook
AI integrations should be reliable and cost-effective, and should provide real value. Start simple, measure everything, and optimize based on real usage patterns.