---
name: Paid Models Integration
description: Expert guidance for integrating paid AI model APIs including Claude (Anthropic), OpenAI, Google Gemini, Groq, and others with Python applications
version: 1.0.0
---
# Paid AI Models Integration

Complete guide for integrating commercial AI model APIs with Python applications.

## Overview
This skill covers integration with major paid AI model providers:

- **Anthropic Claude** - Advanced reasoning and coding
- **OpenAI** - GPT-4o, GPT-4, GPT-3.5
- **Google Gemini** - Multimodal capabilities
- **Groq** - Fast inference (paid tier)
- **Amazon Bedrock** - AWS-hosted models
- **Azure OpenAI** - Enterprise OpenAI
## Anthropic Claude Integration

### Setup

```bash
# Install SDK
pip install anthropic

# Set API key
export ANTHROPIC_API_KEY='your-api-key'
```
### Basic Usage

```python
import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key"
)

# Simple message
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(response.content[0].text)
```
### Advanced Features

```python
from anthropic import Anthropic

client = Anthropic()

# Streaming response
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# With system prompt
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=2048,
    system="You are a Python expert. Provide clear, well-documented code.",
    messages=[
        {"role": "user", "content": "Write a binary search function"}
    ]
)
```
### Tool Use (Function Calling)

```python
import anthropic

client = anthropic.Anthropic()

def get_weather(location: str) -> str:
    """Stub tool implementation - replace with a real weather lookup."""
    return f"20°C and sunny in {location}"

# Define tools
tools = [
    {
        "name": "get_weather",
        "description": "Get weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
]

# Request with tools
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ]
)

# Process tool calls
for content in response.content:
    if content.type == "tool_use":
        print(f"Tool: {content.name}")
        print(f"Input: {content.input}")

        # Execute tool and send result back
        tool_result = get_weather(content.input["location"])

        # Continue conversation with tool result
        response = client.messages.create(
            model="claude-3-7-sonnet-20250219",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Paris?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": content.id,
                            "content": tool_result
                        }
                    ]
                }
            ]
        )
```
### Vision (Image Analysis)

```python
import base64
import anthropic

client = anthropic.Anthropic()

# Read and base64-encode the image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image"
                }
            ]
        }
    ]
)

print(response.content[0].text)
```
### Prompt Caching (Cost Optimization)

```python
# Enable prompt caching for repeated content
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert in Python programming...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "Explain decorators"}
    ]
)

# Check cache usage
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
```
## OpenAI Integration

### Setup

```bash
pip install openai

export OPENAI_API_KEY='your-api-key'
```
### Basic Usage

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain async programming"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
```
### Streaming

```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
### Function Calling

```python
# The legacy `functions`/`function_call` parameters are deprecated;
# current SDKs use `tools`/`tool_choice` instead.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get current stock price",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock symbol (e.g., AAPL)"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the price of Apple stock?"}],
    tools=tools,
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```
### Vision

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```
### Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here"
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
```
### Assistants API

```python
import time

# Create assistant
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor. Help with math problems.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}]
)

# Create thread
thread = client.beta.threads.create()

# Add message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Solve: 3x + 11 = 14"
)

# Run assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll until the run reaches a terminal state (avoids looping forever on failures)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

# Get messages
messages = client.beta.threads.messages.list(thread_id=thread.id)
for msg in messages.data:
    print(f"{msg.role}: {msg.content[0].text.value}")
```
## Google Gemini Integration

### Setup

```bash
pip install google-generativeai

export GOOGLE_API_KEY='your-api-key'
```

### Basic Usage

```python
import google.generativeai as genai

genai.configure(api_key="your-api-key")

# Create model
model = genai.GenerativeModel('gemini-2.0-flash-exp')

# Generate content
response = model.generate_content("Explain machine learning")
print(response.text)
```
### Streaming

```python
response = model.generate_content(
    "Write a long story",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)
```
### Multimodal (Vision + Text)

```python
import PIL.Image

# Load image
image = PIL.Image.open('photo.jpg')

# Analyze image
model = genai.GenerativeModel('gemini-2.0-flash-exp')
response = model.generate_content([
    "What's in this image?",
    image
])
print(response.text)
```
### Function Calling

```python
def get_exchange_rate(currency_from: str, currency_to: str) -> dict:
    """Get exchange rate between currencies"""
    return {"rate": 1.12, "from": currency_from, "to": currency_to}

model = genai.GenerativeModel(
    'gemini-2.0-flash-exp',
    tools=[get_exchange_rate]
)

response = model.generate_content(
    "What's the exchange rate from USD to EUR?"
)

# Check for function calls
for part in response.parts:
    if fn := part.function_call:
        print(f"Calling: {fn.name}")
        print(f"Args: {fn.args}")
```
## Groq Integration

### Setup

```bash
pip install groq

export GROQ_API_KEY='your-api-key'
```

### Basic Usage

```python
from groq import Groq

client = Groq(api_key="your-api-key")

# Fast inference
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain neural networks"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
```
### Streaming

```python
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
## Amazon Bedrock Integration

### Setup

```bash
pip install boto3

# Configure AWS credentials
aws configure
```

### Basic Usage

```python
import boto3
import json

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

# Claude on Bedrock
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Explain cloud computing"}
    ]
})

response = bedrock.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    body=body
)

response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
```
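Bedrock surfaces throttling and missing model access as standard botocore errors, so production code should wrap invocations; a minimal sketch (error codes as documented by AWS):

```python
from botocore.exceptions import ClientError

try:
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        body=body
    )
except ClientError as e:
    code = e.response["Error"]["Code"]
    if code == "ThrottlingException":
        print("Throttled - retry with exponential backoff")
    elif code == "AccessDeniedException":
        print("Model access not enabled for this account/region")
    else:
        raise
```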
## Azure OpenAI Integration

### Setup

```bash
pip install openai

export AZURE_OPENAI_API_KEY='your-api-key'
export AZURE_OPENAI_ENDPOINT='https://your-resource.openai.azure.com/'
```

### Basic Usage

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="your-api-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com/"
)

response = client.chat.completions.create(
    model="gpt-4o",  # Your deployment name
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
```
## Unified Interface with Agno

### Multi-Provider Support

```python
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat
from agno.models.gemini import Gemini
from agno.models.groq import Groq

# Create agents with different providers
claude_agent = Agent(
    model=Claude(id="claude-3-7-sonnet-latest"),
    description="Claude-powered agent"
)

openai_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="OpenAI-powered agent"
)

gemini_agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp"),
    description="Gemini-powered agent"
)

groq_agent = Agent(
    model=Groq(id="llama-3.3-70b-versatile"),
    description="Groq-powered agent"
)

# Use the same interface for every provider
for agent in [claude_agent, openai_agent, gemini_agent, groq_agent]:
    response = agent.run("Explain AI")
    print(f"{agent.description}: {response.content[:100]}...")
```
## Production Patterns

### Retry Logic with Exponential Backoff

```python
import time
from functools import wraps

from anthropic import Anthropic, RateLimitError, APIError

def retry_with_backoff(max_retries=3, initial_delay=1):
    """Retry decorator with exponential backoff"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                # RateLimitError must be caught first: it subclasses APIError
                except RateLimitError:
                    if attempt == max_retries - 1:
                        raise
                    print(f"Rate limited, retrying in {delay}s...")
                    time.sleep(delay)
                    delay *= 2
                except APIError:
                    if attempt == max_retries - 1:
                        raise
                    print(f"API error, retrying in {delay}s...")
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator

@retry_with_backoff(max_retries=3)
def call_claude(prompt: str):
    client = Anthropic()
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
### Rate Limiting

```python
import asyncio
from asyncio import Semaphore

from anthropic import AsyncAnthropic

class RateLimitedClient:
    """Rate-limited API client"""

    def __init__(self, max_concurrent=5, requests_per_minute=60):
        self.client = AsyncAnthropic()
        self.semaphore = Semaphore(max_concurrent)
        self.requests_per_minute = requests_per_minute
        self.request_times = []

    async def call(self, prompt: str):
        """Make rate-limited API call"""
        async with self.semaphore:
            # Wait if we've hit the rate limit
            await self._wait_if_needed()

            response = await self.client.messages.create(
                model="claude-3-7-sonnet-20250219",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )

            self.request_times.append(asyncio.get_event_loop().time())
            return response.content[0].text

    async def _wait_if_needed(self):
        """Wait if rate limit would be exceeded"""
        now = asyncio.get_event_loop().time()
        minute_ago = now - 60

        # Drop requests older than one minute
        self.request_times = [t for t in self.request_times if t > minute_ago]

        # Sleep until the oldest request ages out of the window
        if len(self.request_times) >= self.requests_per_minute:
            sleep_time = 60 - (now - self.request_times[0])
            if sleep_time > 0:
                await asyncio.sleep(sleep_time)
```
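A usage sketch that fans several prompts through the client while respecting the concurrency cap (the prompts are illustrative):

```python
async def main():
    limited = RateLimitedClient(max_concurrent=5, requests_per_minute=60)
    prompts = [f"Summarize topic {i}" for i in range(10)]
    # Fan out; the semaphore and sliding window throttle the actual calls
    results = await asyncio.gather(*(limited.call(p) for p in prompts))
    for result in results:
        print(result[:80])

asyncio.run(main())
```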
### Cost Tracking

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class TokenUsage:
    """Track token usage"""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

class CostTracker:
    """Track API costs"""

    # Pricing per 1M tokens (adjust based on current pricing)
    PRICING = {
        "claude-3-7-sonnet-20250219": {
            "input": 3.00,
            "output": 15.00,
            "cache_write": 3.75,
            "cache_read": 0.30
        },
        "gpt-4o": {
            "input": 2.50,
            "output": 10.00
        }
    }

    def __init__(self):
        self.usage: Dict[str, TokenUsage] = {}

    def track(self, model: str, response):
        """Track usage from response"""
        if model not in self.usage:
            self.usage[model] = TokenUsage()

        usage = self.usage[model]

        if not hasattr(response, 'usage'):
            return

        # Distinguish formats by the attribute names on the usage object
        if hasattr(response.usage, 'input_tokens'):
            # Anthropic format
            usage.input_tokens += response.usage.input_tokens
            usage.output_tokens += response.usage.output_tokens
            usage.cache_read_tokens += getattr(response.usage, 'cache_read_input_tokens', 0) or 0
            usage.cache_creation_tokens += getattr(response.usage, 'cache_creation_input_tokens', 0) or 0
        elif hasattr(response.usage, 'prompt_tokens'):
            # OpenAI format
            usage.input_tokens += response.usage.prompt_tokens
            usage.output_tokens += response.usage.completion_tokens

    def calculate_cost(self, model: str) -> float:
        """Calculate total cost for model"""
        if model not in self.usage or model not in self.PRICING:
            return 0.0

        usage = self.usage[model]
        pricing = self.PRICING[model]

        cost = 0.0
        cost += (usage.input_tokens / 1_000_000) * pricing["input"]
        cost += (usage.output_tokens / 1_000_000) * pricing["output"]

        if "cache_write" in pricing:
            cost += (usage.cache_creation_tokens / 1_000_000) * pricing["cache_write"]
        if "cache_read" in pricing:
            cost += (usage.cache_read_tokens / 1_000_000) * pricing["cache_read"]

        return cost

    def report(self):
        """Generate cost report"""
        total = 0.0
        print("\n=== API Cost Report ===")
        for model, usage in self.usage.items():
            cost = self.calculate_cost(model)
            total += cost
            print(f"\n{model}:")
            print(f"  Input tokens: {usage.input_tokens:,}")
            print(f"  Output tokens: {usage.output_tokens:,}")
            if usage.cache_read_tokens:
                print(f"  Cache read: {usage.cache_read_tokens:,}")
            if usage.cache_creation_tokens:
                print(f"  Cache write: {usage.cache_creation_tokens:,}")
            print(f"  Cost: ${cost:.4f}")
        print(f"\nTotal cost: ${total:.4f}")
        return total
```
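A usage sketch wiring the tracker into a Claude call:

```python
from anthropic import Anthropic

tracker = CostTracker()
client = Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain list comprehensions"}]
)

# Accumulate usage, then print a per-model cost breakdown
tracker.track("claude-3-7-sonnet-20250219", response)
tracker.report()
```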
### Environment Configuration

```python
from typing import Optional

from anthropic import Anthropic
from pydantic_settings import BaseSettings

class AIConfig(BaseSettings):
    """AI API configuration"""

    # API Keys
    anthropic_api_key: Optional[str] = None
    openai_api_key: Optional[str] = None
    google_api_key: Optional[str] = None
    groq_api_key: Optional[str] = None

    # Default models
    default_model: str = "claude-3-7-sonnet-20250219"

    # Limits
    max_tokens: int = 4096
    temperature: float = 0.7
    timeout: int = 30

    class Config:
        env_file = ".env"
        case_sensitive = False

# Usage
config = AIConfig()
client = Anthropic(api_key=config.anthropic_api_key)
```
## Best Practices

- **Use environment variables for API keys** - Never hardcode them
- **Implement retry logic** - Handle transient failures
- **Track token usage and costs** - Monitor spending
- **Use streaming for long responses** - Better UX
- **Use prompt caching** - Save on large, repeated prompt prefixes
- **Set reasonable timeouts** - Prevent hanging requests (see the sketch below)
- **Handle rate limits gracefully** - Exponential backoff
- **Validate inputs** - Use Pydantic
- **Log API calls** - Debug and audit
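Several of these practices can be set directly on the client. A sketch using the Anthropic SDK's built-in options, assuming its `timeout` and `max_retries` client parameters:

```python
import os
import logging

from anthropic import Anthropic

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ai-client")

# Key from the environment (never hardcoded), with timeout and SDK-level retries
client = Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    timeout=30.0,    # seconds; prevents hanging requests
    max_retries=3,   # retries transient failures with backoff
)

logger.info("Sending request")
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello"}]
)
logger.info(
    "Usage: %s input / %s output tokens",
    response.usage.input_tokens,
    response.usage.output_tokens,
)
```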
## Model Selection Guide

### When to Use Claude

- Complex reasoning tasks
- Code generation and review
- Long context understanding (200K tokens)
- Safety-critical applications
- Document analysis

### When to Use GPT-4o

- General purpose tasks
- Function calling
- Vision tasks
- Fast responses for a flagship model
- Good balance of cost/performance

### When to Use Gemini

- Multimodal tasks (text + image + video)
- Cost-effective for simple tasks
- Long context (2M tokens with Gemini 1.5 Pro)
- Fast inference

### When to Use Groq

- Ultra-fast inference needed
- Real-time applications
- Cost-effective for simple tasks
- Open-source models
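One way to encode this guidance in code is a simple router; a sketch where the task categories and default model ids are illustrative choices, not fixed recommendations:

```python
def pick_model(task: str) -> str:
    """Map a task category to a reasonable default model id."""
    routes = {
        "reasoning": "claude-3-7-sonnet-20250219",   # complex reasoning, code review
        "general": "gpt-4o",                         # balanced cost/performance
        "multimodal": "gemini-2.0-flash-exp",        # text + image + video
        "realtime": "llama-3.3-70b-versatile",       # ultra-fast inference via Groq
    }
    return routes.get(task, "gpt-4o")

print(pick_model("reasoning"))  # claude-3-7-sonnet-20250219
```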
## Resources
- Anthropic Docs: https://docs.anthropic.com/
- OpenAI Docs: https://platform.openai.com/docs
- Google AI Docs: https://ai.google.dev/docs
- Groq Docs: https://console.groq.com/docs
- AWS Bedrock: https://aws.amazon.com/bedrock/
- Azure OpenAI: https://azure.microsoft.com/en-us/products/ai-services/openai-service