| name | ai-integration-patterns |
| description | This skill should be used when integrating AI capabilities (Claude API, OpenAI, embeddings, etc.) into applications - covers prompt engineering, streaming responses, cost optimization, rate limiting, context window management, error handling, and production-ready patterns for AI-powered features. |
AI Integration Patterns
Overview
Integrate AI capabilities into applications effectively and reliably. This skill teaches production-ready patterns for working with Claude API, OpenAI, and other AI services.
Core principle: AI integrations should be fast and cost-effective, and should handle failures gracefully.
When to Use
Use this skill when:
- Adding AI features to applications (chat, generation, analysis)
- Integrating Claude API or OpenAI
- Building AI-powered tools or assistants
- Implementing embeddings and semantic search
- Optimizing AI costs and performance
- Handling streaming responses
- Managing context windows
Common use cases:
- Chatbots and conversational interfaces
- Content generation (text, code, summaries)
- Document analysis and Q&A
- Semantic search
- AI-powered recommendations
- Code assistants
AI Service Selection
| Use Case | Best Choice | Why |
|---|---|---|
| Chat/conversation | Claude API | Superior reasoning, long context |
| Code generation | Claude API or GPT-4 | Strong coding abilities |
| Embeddings/search | OpenAI text-embedding-3 | Cost-effective, proven |
| Image generation | DALL-E or Midjourney | Best quality |
| Voice | OpenAI Whisper/TTS | Standard solutions |
| Vision | GPT-4V or Claude | Strong multimodal |
| Fast/cheap tasks | Claude Haiku | Fastest and cheapest Claude model |
Claude API Integration
Basic Setup
// Install SDK
npm install @anthropic-ai/sdk
// Initialize client
import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
})
Simple Message
async function chat(userMessage: string) {
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
{ role: 'user', content: userMessage }
],
})
  // content is an array of blocks; check the block type before reading .text
  const block = message.content[0]
  return block.type === 'text' ? block.text : ''
}
Streaming Responses
async function* streamChat(userMessage: string) {
const stream = await anthropic.messages.stream({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
{ role: 'user', content: userMessage }
],
})
for await (const chunk of stream) {
if (chunk.type === 'content_block_delta' &&
chunk.delta.type === 'text_delta') {
yield chunk.delta.text
}
}
}
// Usage in API route
export async function POST(req: Request) {
const { message } = await req.json()
  const encoder = new TextEncoder()
  const stream = new ReadableStream({
    async start(controller) {
      for await (const text of streamChat(message)) {
        // Encode to bytes; Response body streams expect Uint8Array chunks
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`))
      }
      controller.close()
    },
  })
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
})
}
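On the client, this SSE-style stream can be consumed with fetch and a stream reader. A minimal sketch, assuming the /api/chat route above; the parsing is simplified and does not handle events split across chunk boundaries:
async function readChatStream(message: string, onText: (text: string) => void) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  })
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // Each chunk carries one or more "data: {...}" lines
    for (const line of decoder.decode(value, { stream: true }).split('\n')) {
      if (line.startsWith('data: ')) {
        onText(JSON.parse(line.slice(6)).text)
      }
    }
  }
}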
System Prompts (Critical for Quality)
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
system: `You are a helpful coding assistant.
Rules:
- Always provide working, tested code
- Explain your reasoning
- Use TypeScript when possible
- Follow best practices
- Be concise but thorough`,
messages: [
{ role: 'user', content: userMessage }
],
})
Multi-Turn Conversations
interface Message {
role: 'user' | 'assistant'
content: string
}
async function conversationChat(
messages: Message[],
newMessage: string
) {
const updatedMessages = [
...messages,
{ role: 'user' as const, content: newMessage }
]
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: updatedMessages,
})
return {
messages: [
...updatedMessages,
{
role: 'assistant' as const,
content: response.content[0].text
}
],
response: response.content[0].text
}
}
Model Selection Strategy
Claude Models
| Model | Use For | Speed | Cost | Context |
|---|---|---|---|---|
| Sonnet 4.5 | General tasks, coding | Fast | Medium | 200k |
| Haiku 4.5 | Simple tasks, high volume | Fastest | Lowest | 200k |
| Opus 4 | Complex reasoning, critical tasks | Slower | Highest | 200k |
Decision framework:
Start with Haiku for:
- Simple Q&A
- Classification tasks
- Summarization
- High-volume requests
Use Sonnet for:
- Conversational AI
- Code generation
- Content creation
- General-purpose tasks
Use Opus for:
- Complex analysis
- Critical decisions
- Research tasks
- When quality > cost
Dynamic Model Selection
function selectModel(taskComplexity: 'simple' | 'medium' | 'complex') {
const modelMap = {
simple: 'claude-haiku-4-5-20251001',
medium: 'claude-sonnet-4-5-20250929',
complex: 'claude-opus-4-20250514',
}
return modelMap[taskComplexity]
}
// Example: Use cheap model for classification
const model = userMessage.length < 100
? 'claude-haiku-4-5-20251001'
: 'claude-sonnet-4-5-20250929'
Prompt Engineering Patterns
The Template Pattern
function buildPrompt(context: {
userQuery: string
documents: string[]
rules: string[]
}) {
return `Context: ${context.documents.join('\n\n')}
Rules:
${context.rules.map(r => `- ${r}`).join('\n')}
User question: ${context.userQuery}
Provide a clear, accurate answer based on the context above.`
}
The XML Structure Pattern
// Claude works well with XML-structured prompts
const prompt = `
<context>
<documents>
<document name="user_guide.md">
${userGuide}
</document>
<document name="api_docs.md">
${apiDocs}
</document>
</documents>
</context>
<task>
Answer the user's question using ONLY the information in the documents above.
If the answer isn't in the documents, say so clearly.
</task>
<question>
${userQuestion}
</question>
`
The Few-Shot Pattern
const fewShotPrompt = `Classify the sentiment of customer feedback.
Examples:
Feedback: "This product is amazing! Best purchase ever."
Sentiment: positive
Feedback: "Terrible service, will never buy again."
Sentiment: negative
Feedback: "It's okay, nothing special."
Sentiment: neutral
Now classify this:
Feedback: "${customerFeedback}"
Sentiment:`
The Chain-of-Thought Pattern
const prompt = `Solve this problem step by step:
Problem: ${problem}
Think through this carefully:
1. What information do we have?
2. What are we trying to find?
3. What steps are needed?
4. Execute each step
5. Verify the answer
Show your work.`
Context Window Management
Truncation Strategies
function truncateToTokenLimit(
text: string,
maxTokens: number
): string {
// Rough estimate: 1 token ≈ 4 characters
const maxChars = maxTokens * 4
if (text.length <= maxChars) return text
// Truncate from middle to preserve beginning and end
const keepSize = maxChars / 2
return text.slice(0, keepSize) +
'\n\n[... content truncated ...]\n\n' +
text.slice(-keepSize)
}
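The 4-characters-per-token rule is only a rough estimate. Recent versions of the Anthropic SDK also expose a token-counting endpoint, which can be used to check a prompt's real size before sending a large request. A sketch, assuming the method is available in your SDK version:
// Count input tokens exactly before sending a large prompt
async function countPromptTokens(prompt: string): Promise<number> {
  const result = await anthropic.messages.countTokens({
    model: 'claude-sonnet-4-5-20250929',
    messages: [{ role: 'user', content: prompt }],
  })
  return result.input_tokens
}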
Chunking Long Documents
function chunkDocument(
document: string,
chunkSize: number = 4000 // ~1000 tokens
): string[] {
const chunks: string[] = []
let currentChunk = ''
const paragraphs = document.split('\n\n')
for (const para of paragraphs) {
if ((currentChunk + para).length > chunkSize) {
if (currentChunk) chunks.push(currentChunk)
currentChunk = para
} else {
currentChunk += (currentChunk ? '\n\n' : '') + para
}
}
if (currentChunk) chunks.push(currentChunk)
return chunks
}
// Process each chunk
async function processLongDocument(document: string) {
const chunks = chunkDocument(document)
const summaries = []
for (const chunk of chunks) {
const summary = await chat(`Summarize this section:\n\n${chunk}`)
summaries.push(summary)
}
// Final synthesis
const finalSummary = await chat(
`Synthesize these section summaries into a coherent overview:\n\n${summaries.join('\n\n')}`
)
return finalSummary
}
Conversation History Management
interface ConversationManager {
messages: Message[]
maxTokens: number
}
function trimConversationHistory(
manager: ConversationManager
): Message[] {
  // Claude's system prompt lives outside the messages array, so only user/assistant
  // turns need trimming; always keep the first message plus the most recent turns
  const firstMsg = manager.messages[0]
  let recentMessages = manager.messages.slice(1).slice(-10) // Last 10 messages
  // Estimate tokens (rough: ~4 characters per token)
  const estimateTokens = (msgs: Message[]) =>
    msgs.reduce((sum, m) => sum + m.content.length / 4, 0)
  // Drop the oldest messages until under the limit
  while (estimateTokens(recentMessages) > manager.maxTokens &&
         recentMessages.length > 2) {
    recentMessages = recentMessages.slice(1)
  }
  return [firstMsg, ...recentMessages]
}
Cost Optimization
Token Usage Tracking
interface UsageMetrics {
inputTokens: number
outputTokens: number
cost: number
}
async function chatWithCostTracking(
userMessage: string
): Promise<{ response: string; usage: UsageMetrics }> {
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
// Pricing (as of creation - verify current rates)
const inputCostPer1M = 3.00 // $3 per million input tokens
const outputCostPer1M = 15.00 // $15 per million output tokens
const usage = {
inputTokens: message.usage.input_tokens,
outputTokens: message.usage.output_tokens,
cost: (
(message.usage.input_tokens / 1_000_000) * inputCostPer1M +
(message.usage.output_tokens / 1_000_000) * outputCostPer1M
)
}
return {
response: message.content[0].text,
usage
}
}
Caching Strategies
import { createHash } from 'crypto'
// Simple in-memory cache
const responseCache = new Map<string, string>()
function getCacheKey(prompt: string, model: string): string {
return createHash('sha256')
.update(`${model}:${prompt}`)
.digest('hex')
}
async function cachedChat(userMessage: string, model: string) {
const cacheKey = getCacheKey(userMessage, model)
// Check cache
if (responseCache.has(cacheKey)) {
return {
response: responseCache.get(cacheKey)!,
cached: true
}
}
// Call API
const response = await chat(userMessage)
// Cache response
responseCache.set(cacheKey, response)
return {
response,
cached: false
}
}
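Note that this Map never evicts entries and only lives in a single process. A sketch of the same idea with a per-entry TTL; for caching across instances, a shared store such as Redis (an assumption, not shown here) would replace the Map:
// In-memory cache with per-entry expiry
interface CacheEntry { value: string; expiresAt: number }
const ttlCache = new Map<string, CacheEntry>()
function cacheGet(key: string): string | undefined {
  const entry = ttlCache.get(key)
  if (!entry) return undefined
  if (entry.expiresAt <= Date.now()) {
    ttlCache.delete(key) // Drop stale entries lazily
    return undefined
  }
  return entry.value
}
function cacheSet(key: string, value: string, ttlSeconds: number) {
  ttlCache.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
}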
Batch Processing
// Process multiple requests in parallel
async function batchProcess(
requests: string[],
batchSize: number = 5
) {
const results = []
for (let i = 0; i < requests.length; i += batchSize) {
const batch = requests.slice(i, i + batchSize)
const batchResults = await Promise.all(
batch.map(req => chat(req))
)
results.push(...batchResults)
// Small delay between batches
if (i + batchSize < requests.length) {
await new Promise(resolve => setTimeout(resolve, 100))
}
}
return results
}
Error Handling & Retry Logic
Robust API Calls
async function robustApiCall<T>(
apiCall: () => Promise<T>,
maxRetries: number = 3
): Promise<T> {
let lastError: Error | undefined
for (let i = 0; i < maxRetries; i++) {
try {
return await apiCall()
} catch (error: any) {
lastError = error
// Don't retry client errors (4xx), except 429 rate limits
if (error.status >= 400 && error.status < 500 && error.status !== 429) {
throw error
}
// Exponential backoff
const delay = Math.min(1000 * Math.pow(2, i), 10000)
await new Promise(resolve => setTimeout(resolve, delay))
}
}
throw lastError!
}
// Usage
const response = await robustApiCall(() =>
anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
)
Rate Limiting
class RateLimiter {
private queue: Array<() => Promise<any>> = []
private processing = false
private lastRequestTime = 0
private minInterval: number
constructor(requestsPerMinute: number) {
this.minInterval = 60000 / requestsPerMinute
}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
return new Promise((resolve, reject) => {
this.queue.push(async () => {
try {
const result = await fn()
resolve(result)
} catch (error) {
reject(error)
}
})
this.processQueue()
})
}
private async processQueue() {
if (this.processing || this.queue.length === 0) return
this.processing = true
while (this.queue.length > 0) {
const now = Date.now()
const timeSinceLastRequest = now - this.lastRequestTime
if (timeSinceLastRequest < this.minInterval) {
await new Promise(resolve =>
setTimeout(resolve, this.minInterval - timeSinceLastRequest)
)
}
const fn = this.queue.shift()!
this.lastRequestTime = Date.now()
await fn()
}
this.processing = false
}
}
// Usage
const limiter = new RateLimiter(50) // 50 requests per minute
const response = await limiter.throttle(() =>
anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
)
Embeddings & Semantic Search
OpenAI Embeddings (Standard Choice)
import OpenAI from 'openai'
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
})
async function createEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small', // Cheaper
// model: 'text-embedding-3-large', // Better quality
input: text,
})
return response.data[0].embedding
}
Simple Vector Storage (In-Memory)
interface Document {
id: string
text: string
embedding: number[]
metadata?: Record<string, any>
}
class SimpleVectorStore {
private documents: Document[] = []
async addDocument(id: string, text: string, metadata?: any) {
const embedding = await createEmbedding(text)
this.documents.push({ id, text, embedding, metadata })
}
cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0)
const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0))
const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0))
return dotProduct / (magA * magB)
}
async search(query: string, limit: number = 5): Promise<Document[]> {
const queryEmbedding = await createEmbedding(query)
const scored = this.documents.map(doc => ({
doc,
score: this.cosineSimilarity(queryEmbedding, doc.embedding)
}))
return scored
.sort((a, b) => b.score - a.score)
.slice(0, limit)
.map(item => item.doc)
}
}
// Usage
const store = new SimpleVectorStore()
await store.addDocument('doc1', 'Claude is an AI assistant...')
await store.addDocument('doc2', 'JavaScript is a programming language...')
const results = await store.search('Tell me about AI')
RAG (Retrieval-Augmented Generation)
async function answerWithRAG(
question: string,
vectorStore: SimpleVectorStore
): Promise<string> {
// 1. Retrieve relevant documents
const relevantDocs = await vectorStore.search(question, 3)
// 2. Build context
const context = relevantDocs
.map((doc, i) => `Document ${i + 1}:\n${doc.text}`)
.join('\n\n')
// 3. Generate answer with Claude
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
system: `Answer questions based on the provided documents.
If the answer isn't in the documents, say so clearly.`,
messages: [{
role: 'user',
content: `Context:\n${context}\n\nQuestion: ${question}`
}],
})
return response.content[0].text
}
Production Patterns
API Route Example (Next.js)
// app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server'
import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY!,
})
export async function POST(req: NextRequest) {
try {
const { message, conversationHistory = [] } = await req.json()
// Validate input
if (!message || typeof message !== 'string') {
return NextResponse.json(
{ error: 'Message is required' },
{ status: 400 }
)
}
// Call Claude
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
...conversationHistory,
{ role: 'user', content: message }
],
})
return NextResponse.json({
response: response.content[0].text,
usage: response.usage,
})
} catch (error: any) {
console.error('Claude API error:', error)
return NextResponse.json(
{ error: 'Failed to process request' },
{ status: 500 }
)
}
}
Frontend Integration (React)
'use client'
import { useState } from 'react'
export function ChatInterface() {
const [messages, setMessages] = useState<Array<{role: string, content: string}>>([])
const [input, setInput] = useState('')
const [loading, setLoading] = useState(false)
const sendMessage = async () => {
if (!input.trim()) return
const userMessage = { role: 'user', content: input }
setMessages(prev => [...prev, userMessage])
setInput('')
setLoading(true)
try {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: input,
conversationHistory: messages
}),
})
const data = await response.json()
setMessages(prev => [...prev, {
role: 'assistant',
content: data.response
}])
} catch (error) {
console.error('Failed to send message:', error)
} finally {
setLoading(false)
}
}
return (
<div className="flex flex-col h-screen">
<div className="flex-1 overflow-y-auto p-4">
{messages.map((msg, i) => (
<div key={i} className={msg.role === 'user' ? 'text-right' : 'text-left'}>
<div className="inline-block p-2 rounded bg-gray-100">
{msg.content}
</div>
</div>
))}
</div>
<div className="p-4 border-t">
<input
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
disabled={loading}
className="w-full p-2 border rounded"
placeholder="Type a message..."
/>
</div>
</div>
)
}
Security Best Practices
API Key Management
// ✅ GOOD: Server-side only
// app/api/chat/route.ts
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY, // Server environment
})
// ❌ BAD: Never expose API keys in client-side code
// components/Chat.tsx
const anthropic = new Anthropic({
apiKey: 'sk-ant-...' // NEVER DO THIS!
})
Input Sanitization
function sanitizeInput(input: string): string {
// Strip obvious role-injection markers (a basic mitigation, not a complete defense)
return input
.replace(/system:/gi, '')
.replace(/assistant:/gi, '')
.trim()
.slice(0, 10000) // Limit length
}
Rate Limiting per User
import { rateLimit } from '@/lib/rate-limit'
export async function POST(req: NextRequest) {
  const userId = req.headers.get('x-user-id')
  if (!userId) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
  }
  // Allow 10 requests per minute per user
  const allowed = await rateLimit(userId, 10, 60)
if (!allowed) {
return NextResponse.json(
{ error: 'Rate limit exceeded' },
{ status: 429 }
)
}
// Process request...
}
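The rateLimit helper imported from '@/lib/rate-limit' is assumed rather than provided by a library. A minimal in-memory sketch using fixed windows (per process only; across multiple instances you would back the counter with Redis or similar):
// lib/rate-limit.ts (sketch): fixed-window counter per key
const windows = new Map<string, { count: number; resetAt: number }>()
export async function rateLimit(
  key: string,
  limit: number,
  windowSeconds: number
): Promise<boolean> {
  const now = Date.now()
  const entry = windows.get(key)
  // Start a fresh window if none exists or the current one has expired
  if (!entry || entry.resetAt <= now) {
    windows.set(key, { count: 1, resetAt: now + windowSeconds * 1000 })
    return true
  }
  // Reject once the per-window limit is reached
  if (entry.count >= limit) return false
  entry.count++
  return true
}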
Monitoring & Logging
import { logger } from '@/lib/logger'
async function monitoredChatCall(userMessage: string) {
const startTime = Date.now()
try {
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: userMessage }],
})
const duration = Date.now() - startTime
logger.info('Claude API call successful', {
duration,
inputTokens: response.usage.input_tokens,
outputTokens: response.usage.output_tokens,
model: response.model,
})
return response
} catch (error: any) {
logger.error('Claude API call failed', {
error: error.message,
duration: Date.now() - startTime,
})
throw error
}
}
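The logger imported from '@/lib/logger' is likewise an assumed local helper. A minimal structured-logging sketch that satisfies the calls above (in production you would typically use pino, winston, or your platform's logger):
// lib/logger.ts (sketch): JSON-structured console logging
type Meta = Record<string, unknown>
export const logger = {
  info: (message: string, meta: Meta = {}) =>
    console.log(JSON.stringify({ level: 'info', message, ...meta, ts: new Date().toISOString() })),
  error: (message: string, meta: Meta = {}) =>
    console.error(JSON.stringify({ level: 'error', message, ...meta, ts: new Date().toISOString() })),
}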
Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| Exposing API keys client-side | Security breach | Always call from server |
| No rate limiting | API bill shock | Implement per-user limits |
| Ignoring token limits | Errors, failed requests | Track and truncate inputs |
| No error handling | Poor UX | Implement retries and fallbacks |
| Sending sensitive data | Privacy violations | Sanitize inputs |
| Not caching responses | Unnecessary costs | Cache identical requests |
| Using wrong model | High costs or poor quality | Match model to task complexity |
Cost Estimation
Claude Sonnet 4.5 (example pricing):
- Input: $3 per million tokens
- Output: $15 per million tokens
Example calculations:
- 1,000 chat messages (~100 input + ~10 output tokens each): ~$0.45
- 100 document summaries (~2,000 input + ~200 output tokens each): ~$0.90
- 10,000 simple classifications (~50 input + ~5 output tokens each): ~$2.25
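These estimates can be reproduced with a small helper; a sketch using the example rates above (verify current pricing before relying on the numbers):
// Rough cost estimator using the example Sonnet 4.5 rates ($3 / $15 per million tokens)
function estimateCost(
  requests: number,
  inputTokensPerRequest: number,
  outputTokensPerRequest: number,
  inputPer1M = 3.0,
  outputPer1M = 15.0
): number {
  const inputCost = (requests * inputTokensPerRequest / 1_000_000) * inputPer1M
  const outputCost = (requests * outputTokensPerRequest / 1_000_000) * outputPer1M
  return inputCost + outputCost
}
estimateCost(1_000, 100, 10)   // ≈ $0.45 (chat messages)
estimateCost(100, 2000, 200)   // ≈ $0.90 (document summaries)
estimateCost(10_000, 50, 5)    // ≈ $2.25 (classifications)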
Cost optimization tips:
- Use Haiku for simple tasks (far cheaper than Sonnet)
- Cache repeated queries
- Batch similar requests
- Truncate long inputs when possible
- Use smaller max_tokens when appropriate
Resources
Documentation:
- Claude API: https://docs.anthropic.com
- OpenAI API: https://platform.openai.com/docs
Tools:
- Anthropic SDK: @anthropic-ai/sdk
- OpenAI SDK: openai
- Vector databases: Pinecone, Weaviate, Qdrant
Best Practices:
- Prompt Engineering Guide: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
- Claude API Cookbook: https://github.com/anthropics/anthropic-cookbook
AI integrations should be reliable and cost-effective, and should provide real value. Start simple, measure everything, and optimize based on real usage patterns.