| name | openrouter |
| description | OpenRouter unified AI API - Access 200+ LLMs through a single interface with intelligent routing, streaming, cost optimization, and model fallbacks |
| version | 1.0.0 |
| category | ai-service |
| author | Claude MPM Team |
| license | MIT |
| progressive_disclosure | |
| context_limit | 4200 |
| tags | ai, llm, api, openai-compatible, streaming, cost-optimization, multi-model |
| requires_tools |
OpenRouter - Unified AI API Gateway
Overview
OpenRouter provides a single API for accessing 200+ language models from OpenAI, Anthropic, Google, Meta, Mistral, and more. It offers intelligent routing, streaming, cost optimization, and a standardized OpenAI-compatible interface.
Key Features:
- Access 200+ models through one API
- OpenAI-compatible interface (drop-in replacement)
- Intelligent model routing and fallbacks
- Real-time streaming responses
- Cost tracking and optimization (see the sketch after this list)
- Model performance analytics
- Function calling support
- Vision model support
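OpenRouter also returns a generation id with each completion, and its API reference documents a generation-stats endpoint for retrieving the actual cost of a request after the fact. A minimal sketch; the endpoint shape and response field names here are assumptions to confirm against the API reference:
// Hypothetical sketch: fetch per-request cost from OpenRouter's generation-stats endpoint.
// Verify the endpoint and field names against https://openrouter.ai/docs/api-reference.
async function getGenerationCost(generationId: string): Promise<number> {
  const res = await fetch(
    `https://openrouter.ai/api/v1/generation?id=${generationId}`,
    { headers: { Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}` } }
  );
  const { data } = await res.json();
  return data.total_cost; // assumed field name
}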
Pricing Model:
- Pay-per-token (no subscriptions)
- Volume discounts available
- Free tier with credits
- Per-model pricing varies (worked example below)
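As a quick worked example using the per-model rates listed in the Cost Optimization section below, a request with 10,000 input tokens and 2,000 output tokens on anthropic/claude-3-haiku ($0.25 input / $1.25 output per 1M tokens) costs about half a cent:
// Worked example with the Haiku rates from the pricing table below
const inputCost = (10_000 / 1_000_000) * 0.25;  // $0.0025
const outputCost = (2_000 / 1_000_000) * 1.25;  // $0.0025
console.log(inputCost + outputCost);            // ≈ $0.005 per request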
Installation:
npm install openai # Use OpenAI SDK
# or
pip install openai # Python
Quick Start
1. Get API Key
# Sign up at https://openrouter.ai/keys
export OPENROUTER_API_KEY="sk-or-v1-..."
2. Basic Chat Completion
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: process.env.OPENROUTER_API_KEY,
defaultHeaders: {
'HTTP-Referer': 'https://your-app.com', // Optional
'X-Title': 'Your App Name', // Optional
}
});
async function chat() {
const completion = await client.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [
{ role: 'user', content: 'Explain quantum computing in simple terms' }
],
});
console.log(completion.choices[0].message.content);
}
3. Streaming Response
async function streamChat() {
const stream = await client.chat.completions.create({
model: 'openai/gpt-4-turbo',
messages: [
{ role: 'user', content: 'Write a short story about AI' }
],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
process.stdout.write(content);
}
}
Model Selection Strategy
Available Model Categories
Flagship Models (Highest Quality):
const flagshipModels = {
claude: 'anthropic/claude-3.5-sonnet', // Best reasoning
gpt4: 'openai/gpt-4-turbo', // Best general purpose
gemini: 'google/gemini-pro-1.5', // Best long context
opus: 'anthropic/claude-3-opus', // Best complex tasks
};
Fast Models (Low Latency):
const fastModels = {
claude: 'anthropic/claude-3-haiku', // Fastest Claude
gpt35: 'openai/gpt-3.5-turbo', // Fast GPT
gemini: 'google/gemini-flash-1.5', // Fast Gemini
llama: 'meta-llama/llama-3.1-8b-instruct', // Fast open source
};
Cost-Optimized Models:
const budgetModels = {
haiku: 'anthropic/claude-3-haiku', // $0.25/$1.25 per 1M tokens
gemini: 'google/gemini-flash-1.5', // $0.075/$0.30 per 1M tokens
llama: 'meta-llama/llama-3.1-8b-instruct', // $0.06/$0.06 per 1M tokens
mixtral: 'mistralai/mixtral-8x7b-instruct', // $0.24/$0.24 per 1M tokens
};
Specialized Models:
const specializedModels = {
vision: 'openai/gpt-4-vision-preview', // Image understanding
code: 'anthropic/claude-3.5-sonnet', // Code generation
longContext: 'google/gemini-pro-1.5', // 2M token context
function: 'openai/gpt-4-turbo', // Function calling
};
Model Selection Logic
interface ModelSelector {
task: 'chat' | 'code' | 'vision' | 'function' | 'summary';
priority: 'quality' | 'speed' | 'cost';
maxCost?: number; // Max cost per 1M tokens
contextSize?: number;
}
function selectModel(criteria: ModelSelector): string {
if (criteria.task === 'vision') {
return 'openai/gpt-4-vision-preview';
}
if (criteria.task === 'code') {
return criteria.priority === 'quality'
? 'anthropic/claude-3.5-sonnet'
: 'meta-llama/llama-3.1-70b-instruct';
}
if (criteria.contextSize && criteria.contextSize > 100000) {
return 'google/gemini-pro-1.5'; // 2M context
}
// Default selection by priority
switch (criteria.priority) {
case 'quality':
return 'anthropic/claude-3.5-sonnet';
case 'speed':
return 'anthropic/claude-3-haiku';
case 'cost':
return criteria.maxCost && criteria.maxCost < 0.5
? 'google/gemini-flash-1.5'
: 'anthropic/claude-3-haiku';
default:
return 'openai/gpt-4-turbo';
}
}
// Usage
const model = selectModel({
task: 'code',
priority: 'quality',
});
Streaming Implementation
TypeScript Streaming with Error Handling
async function robustStreamingChat(
prompt: string,
model: string = 'anthropic/claude-3.5-sonnet'
) {
try {
const stream = await client.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
stream: true,
max_tokens: 4000,
});
let fullResponse = '';
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta;
if (delta?.content) {
fullResponse += delta.content;
process.stdout.write(delta.content);
}
// Handle streamed tool calls (only emitted when `tools` are passed in the request)
if (delta?.tool_calls) {
console.log('\nTool call delta:', delta.tool_calls);
}
// Check for finish reason
if (chunk.choices[0]?.finish_reason) {
console.log(`\n[Finished: ${chunk.choices[0].finish_reason}]`);
}
}
return fullResponse;
} catch (error) {
if (error instanceof Error) {
console.error('Streaming error:', error.message);
}
throw error;
}
}
Python Streaming
import os

from openai import OpenAI
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.environ.get("OPENROUTER_API_KEY"),
)
def stream_chat(prompt: str, model: str = "anthropic/claude-3.5-sonnet"):
stream = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
stream=True,
)
full_response = ""
for chunk in stream:
if chunk.choices[0].delta.content:
content = chunk.choices[0].delta.content
full_response += content
print(content, end="", flush=True)
print() # New line
return full_response
React Streaming Component
import { useState } from 'react';
function StreamingChat() {
const [response, setResponse] = useState('');
const [isStreaming, setIsStreaming] = useState(false);
async function handleSubmit(prompt: string) {
setIsStreaming(true);
setResponse('');
try {
const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
// WARNING: calling OpenRouter directly from the browser exposes the key.
// In production, route this request through a server-side proxy (see Common Pitfalls).
'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: prompt }],
stream: true,
}),
});
if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n').filter(line => line.trim());
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') continue;
try {
const parsed = JSON.parse(data);
const content = parsed.choices[0]?.delta?.content || '';
setResponse(prev => prev + content);
} catch (e) {
// Skip invalid JSON
}
}
}
}
} catch (error) {
console.error('Streaming error:', error);
} finally {
setIsStreaming(false);
}
}
return (
<div>
<textarea
value={response}
readOnly
rows={20}
cols={80}
placeholder="Response will appear here..."
/>
<button onClick={() => handleSubmit('Explain AI')}>
{isStreaming ? 'Streaming...' : 'Send'}
</button>
</div>
);
}
Function Calling
Basic Function Calling
const tools = [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name, e.g. San Francisco',
},
unit: {
type: 'string',
enum: ['celsius', 'fahrenheit'],
},
},
required: ['location'],
},
},
},
];
async function chatWithFunctions() {
const completion = await client.chat.completions.create({
model: 'openai/gpt-4-turbo',
messages: [
{ role: 'user', content: 'What is the weather in Tokyo?' }
],
tools,
tool_choice: 'auto',
});
const message = completion.choices[0].message;
if (message.tool_calls) {
for (const toolCall of message.tool_calls) {
console.log('Function:', toolCall.function.name);
console.log('Arguments:', toolCall.function.arguments);
// Execute function
const args = JSON.parse(toolCall.function.arguments);
const result = await getWeather(args.location, args.unit);
// Send result back
const followUp = await client.chat.completions.create({
model: 'openai/gpt-4-turbo',
messages: [
{ role: 'user', content: 'What is the weather in Tokyo?' },
message,
{
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result),
},
],
tools,
});
console.log(followUp.choices[0].message.content);
}
}
}
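Note that the example calls a getWeather helper that is not defined in this document. A minimal hypothetical stub for local testing might look like:
// Hypothetical stub; replace with a real weather API call.
async function getWeather(location: string, unit: string = 'celsius') {
  return { location, unit, temperature: 22, condition: 'sunny' }; // canned data
}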
Multi-Step Function Calling
async function multiStepFunctionCall(userQuery: string) {
const messages: any[] = [{ role: 'user', content: userQuery }]; // typed loosely; use the SDK's message param type for strict typing
let iterationCount = 0;
const maxIterations = 5;
while (iterationCount < maxIterations) {
const completion = await client.chat.completions.create({
model: 'openai/gpt-4-turbo',
messages,
tools,
tool_choice: 'auto',
});
const message = completion.choices[0].message;
messages.push(message);
if (!message.tool_calls) {
// No more function calls, return final response
return message.content;
}
// Execute all function calls
for (const toolCall of message.tool_calls) {
const functionName = toolCall.function.name;
const args = JSON.parse(toolCall.function.arguments);
// Execute function (implement your function registry)
const result = await executeFunctionCall(functionName, args);
messages.push({
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result),
});
}
iterationCount++;
}
throw new Error('Max iterations reached');
}
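One way to implement the executeFunctionCall helper referenced above is a plain name-to-handler registry; a minimal sketch (getWeather is the hypothetical stub from earlier):
// Minimal function registry sketch
const functionRegistry: Record<string, (args: any) => Promise<unknown>> = {
  get_weather: (args) => getWeather(args.location, args.unit),
};

async function executeFunctionCall(name: string, args: any) {
  const handler = functionRegistry[name];
  if (!handler) throw new Error(`Unknown function: ${name}`);
  return handler(args);
}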
Cost Optimization
Token Counting and Cost Estimation
import { encoding_for_model } from 'tiktoken';
interface CostEstimate {
promptTokens: number;
completionTokens: number;
promptCost: number;
completionCost: number;
totalCost: number;
}
const modelPricing = {
'anthropic/claude-3.5-sonnet': { input: 3.00, output: 15.00 }, // per 1M tokens
'anthropic/claude-3-haiku': { input: 0.25, output: 1.25 },
'openai/gpt-4-turbo': { input: 10.00, output: 30.00 },
'openai/gpt-3.5-turbo': { input: 0.50, output: 1.50 },
'google/gemini-flash-1.5': { input: 0.075, output: 0.30 },
};
function estimateCost(
prompt: string,
expectedCompletion: number,
model: string
): CostEstimate {
const encoder = encoding_for_model('gpt-4'); // Approximation; exact tokenizers vary by model
const promptTokens = encoder.encode(prompt).length;
encoder.free(); // tiktoken's WASM encoder must be freed explicitly
const completionTokens = expectedCompletion;
const pricing = modelPricing[model] || { input: 0, output: 0 };
const promptCost = (promptTokens / 1_000_000) * pricing.input;
const completionCost = (completionTokens / 1_000_000) * pricing.output;
return {
promptTokens,
completionTokens,
promptCost,
completionCost,
totalCost: promptCost + completionCost,
};
}
// Usage
const estimate = estimateCost(
'Explain quantum computing',
500, // Expected response tokens
'anthropic/claude-3.5-sonnet'
);
console.log(`Estimated cost: $${estimate.totalCost.toFixed(4)}`);
Dynamic Model Selection by Budget
async function budgetOptimizedChat(
prompt: string,
maxCostPerRequest: number = 0.01 // $0.01 max
) {
// Estimate with expensive model
const expensiveEstimate = estimateCost(
prompt,
1000,
'anthropic/claude-3.5-sonnet'
);
let selectedModel = 'anthropic/claude-3.5-sonnet';
if (expensiveEstimate.totalCost > maxCostPerRequest) {
// Try cheaper models
const cheapEstimate = estimateCost(
prompt,
1000,
'anthropic/claude-3-haiku'
);
if (cheapEstimate.totalCost > maxCostPerRequest) {
selectedModel = 'google/gemini-flash-1.5';
} else {
selectedModel = 'anthropic/claude-3-haiku';
}
}
console.log(`Selected model: ${selectedModel}`);
const completion = await client.chat.completions.create({
model: selectedModel,
messages: [{ role: 'user', content: prompt }],
});
return completion.choices[0].message.content;
}
Batching for Cost Reduction
async function batchProcess(prompts: string[], model: string) {
// Process multiple prompts in parallel with rate limiting
const concurrency = 5;
const results = [];
for (let i = 0; i < prompts.length; i += concurrency) {
const batch = prompts.slice(i, i + concurrency);
const batchResults = await Promise.all(
batch.map(prompt =>
client.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
max_tokens: 500, // Limit tokens to control cost
})
)
);
results.push(...batchResults);
// Rate limiting delay
if (i + concurrency < prompts.length) {
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
return results;
}
Model Fallback and Retry Strategy
Automatic Fallback
const modelFallbackChain = [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4-turbo',
'anthropic/claude-3-haiku',
'google/gemini-flash-1.5',
];
async function chatWithFallback(prompt: string): Promise<string> {
for (const model of modelFallbackChain) {
try {
console.log(`Trying model: ${model}`);
const completion = await client.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
max_tokens: 2000,
});
return completion.choices[0].message.content || '';
} catch (error) {
console.warn(`Model ${model} failed:`, error);
// Continue to next model
if (model === modelFallbackChain[modelFallbackChain.length - 1]) {
throw new Error('All models failed');
}
}
}
throw new Error('No models available');
}
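A client-side loop like the one above works with any gateway, but OpenRouter's docs also describe server-side fallbacks: pass a models array in the request body and the router tries each entry in order. A sketch using raw fetch, since the field is OpenRouter-specific (verify the parameter name against the current docs):
// Sketch of OpenRouter's documented server-side fallback routing (verify against the docs)
const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4-turbo'], // tried in order
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);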
Exponential Backoff for Rate Limits
async function retryWithBackoff<T>(
fn: () => Promise<T>,
maxRetries: number = 5
): Promise<T> {
let lastError: Error;
for (let i = 0; i < maxRetries; i++) {
try {
return await fn();
} catch (error) {
lastError = error as Error;
// Check if rate limit error
if ((error as any)?.status === 429) {
const delay = Math.pow(2, i) * 1000; // Exponential backoff
console.log(`Rate limited. Retrying in ${delay}ms...`);
await new Promise(resolve => setTimeout(resolve, delay));
} else {
throw error; // Non-retryable error
}
}
}
throw lastError!;
}
// Usage
const result = await retryWithBackoff(() =>
client.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Hello' }],
})
);
Prompt Engineering Best Practices
System Prompts for Consistency
const systemPrompts = {
concise: 'You are a helpful assistant. Be concise and direct.',
detailed: 'You are a knowledgeable expert. Provide comprehensive answers with examples.',
code: 'You are an expert programmer. Provide clean, well-commented code with explanations.',
creative: 'You are a creative writing assistant. Be imaginative and engaging.',
};
async function chatWithPersonality(
prompt: string,
personality: keyof typeof systemPrompts
) {
const completion = await client.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [
{ role: 'system', content: systemPrompts[personality] },
{ role: 'user', content: prompt },
],
});
return completion.choices[0].message.content;
}
Few-Shot Prompting
async function fewShotClassification(text: string) {
const completion = await client.chat.completions.create({
model: 'openai/gpt-4-turbo',
messages: [
{
role: 'system',
content: 'Classify text sentiment as positive, negative, or neutral.',
},
{ role: 'user', content: 'I love this product!' },
{ role: 'assistant', content: 'positive' },
{ role: 'user', content: 'This is terrible.' },
{ role: 'assistant', content: 'negative' },
{ role: 'user', content: 'It works fine.' },
{ role: 'assistant', content: 'neutral' },
{ role: 'user', content: text },
],
});
return completion.choices[0].message.content;
}
Chain of Thought Prompting
async function reasoningTask(problem: string) {
const completion = await client.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [
{
role: 'user',
content: `${problem}\n\nLet's solve this step by step:\n1.`,
},
],
max_tokens: 3000,
});
return completion.choices[0].message.content;
}
Rate Limits and Throttling
Rate Limit Handler
class RateLimitedClient {
private requestQueue: Array<() => Promise<any>> = [];
private processing = false;
private requestsPerMinute = 60;
private requestInterval = 60000 / this.requestsPerMinute;
async enqueue<T>(request: () => Promise<T>): Promise<T> {
return new Promise((resolve, reject) => {
this.requestQueue.push(async () => {
try {
const result = await request();
resolve(result);
} catch (error) {
reject(error);
}
});
this.processQueue();
});
}
private async processQueue() {
if (this.processing || this.requestQueue.length === 0) return;
this.processing = true;
while (this.requestQueue.length > 0) {
const request = this.requestQueue.shift()!;
await request();
await new Promise(resolve => setTimeout(resolve, this.requestInterval));
}
this.processing = false;
}
}
// Usage
const rateLimitedClient = new RateLimitedClient();
const result = await rateLimitedClient.enqueue(() =>
client.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Hello' }],
})
);
Vision Models
Image Understanding
async function analyzeImage(imageUrl: string, question: string) {
const completion = await client.chat.completions.create({
model: 'openai/gpt-4-vision-preview',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: question },
{ type: 'image_url', image_url: { url: imageUrl } },
],
},
],
max_tokens: 1000,
});
return completion.choices[0].message.content;
}
// Usage
const result = await analyzeImage(
'https://example.com/image.jpg',
'What objects are in this image?'
);
Multi-Image Analysis
async function compareImages(imageUrls: string[]) {
const completion = await client.chat.completions.create({
model: 'openai/gpt-4-vision-preview',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'Compare these images and describe the differences:' },
...imageUrls.map(url => ({
type: 'image_url' as const,
image_url: { url },
})),
],
},
],
});
return completion.choices[0].message.content;
}
Error Handling and Monitoring
Comprehensive Error Handler
interface ErrorResponse {
error: {
message: string;
type: string;
code: string;
};
}
async function robustCompletion(prompt: string) {
try {
const completion = await client.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: prompt }],
});
return completion.choices[0].message.content;
} catch (error: any) {
// Rate limit errors
if (error.status === 429) {
console.error('Rate limit exceeded. Please wait.');
throw new Error('RATE_LIMIT_EXCEEDED');
}
// Invalid API key
if (error.status === 401) {
console.error('Invalid API key');
throw new Error('INVALID_API_KEY');
}
// Model not found
if (error.status === 404) {
console.error('Model not found');
throw new Error('MODEL_NOT_FOUND');
}
// Server errors
if (error.status >= 500) {
console.error('OpenRouter server error');
throw new Error('SERVER_ERROR');
}
// Unknown error
console.error('Unknown error:', error);
throw error;
}
}
Request/Response Logging
class LoggingClient {
async chat(prompt: string, model: string) {
const startTime = Date.now();
console.log('[Request]', {
timestamp: new Date().toISOString(),
model,
promptLength: prompt.length,
});
try {
const completion = await client.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
});
const duration = Date.now() - startTime;
console.log('[Response]', {
timestamp: new Date().toISOString(),
duration,
usage: completion.usage,
finishReason: completion.choices[0].finish_reason,
});
return completion;
} catch (error) {
console.error('[Error]', {
timestamp: new Date().toISOString(),
duration: Date.now() - startTime,
error,
});
throw error;
}
}
}
Best Practices
Model Selection:
- Use fast models (Haiku, Flash) for simple tasks
- Use flagship models (Sonnet, GPT-4) for complex reasoning
- Consider context size requirements
- Test multiple models for your use case
Cost Optimization:
- Estimate costs before requests
- Use cheaper models when possible
- Implement token limits
- Cache common responses
- Batch similar requests
Streaming:
- Always use streaming for user-facing apps
- Handle connection interruptions
- Show progress indicators
- Buffer partial responses (see the sketch after this list)
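The React example earlier splits each network chunk on newlines, which breaks when an SSE line straddles two chunks. A minimal sketch of a carry-over buffer that handles this, assuming the same data: framing:
// SSE line-buffering sketch: carry the trailing partial line between reads
async function readSSE(body: ReadableStream<Uint8Array>, onData: (json: any) => void) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // the last element may be a partial line
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6);
      if (data === '[DONE]') return;
      try { onData(JSON.parse(data)); } catch { /* skip malformed frames */ }
    }
  }
}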
Error Handling:
- Implement retry logic with exponential backoff
- Use model fallbacks for reliability
- Log all errors for debugging
- Handle rate limits gracefully
Prompt Engineering:
- Use system prompts for consistency
- Implement few-shot learning for specific tasks
- Use chain-of-thought for complex reasoning
- Keep prompts concise to reduce costs
Rate Limiting:
- Respect API rate limits
- Implement request queuing
- Use exponential backoff
- Monitor usage metrics
Security:
- Never expose API keys in client code
- Use environment variables
- Implement server-side proxies
- Validate user inputs
Monitoring:
- Track token usage (see the sketch after this list)
- Monitor response times
- Log errors and failures
- Analyze model performance
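Because OpenAI-compatible responses include a usage object, a small accumulator is enough to track totals across requests. A sketch reusing the modelPricing table from the Cost Optimization section:
// Simple usage accumulator; modelPricing is defined in the Cost Optimization section
const usageTotals = { promptTokens: 0, completionTokens: 0, cost: 0 };

function recordUsage(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number }
) {
  const pricing = modelPricing[model] ?? { input: 0, output: 0 };
  usageTotals.promptTokens += usage.prompt_tokens;
  usageTotals.completionTokens += usage.completion_tokens;
  usageTotals.cost +=
    (usage.prompt_tokens / 1_000_000) * pricing.input +
    (usage.completion_tokens / 1_000_000) * pricing.output;
}

// e.g. recordUsage(model, completion.usage) after each non-streaming request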
Common Pitfalls
❌ Exposing API keys in frontend:
// WRONG - API key exposed
const client = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: 'sk-or-v1-...', // Exposed!
});
✅ Correct - Server-side proxy:
// Backend proxy
app.post('/api/chat', async (req, res) => {
const { prompt } = req.body;
const completion = await client.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: prompt }],
});
res.json(completion);
});
❌ Not handling streaming errors:
// WRONG - no error handling
for await (const chunk of stream) {
console.log(chunk.choices[0].delta.content);
}
✅ Correct - with error handling:
try {
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
process.stdout.write(content);
}
} catch (error) {
console.error('Stream error:', error);
// Implement retry or fallback
}
❌ Ignoring rate limits:
// WRONG - no rate limiting
const promises = prompts.map(prompt => chat(prompt));
await Promise.all(promises); // May hit rate limits
✅ Correct - with rate limiting:
const results = [];
for (let i = 0; i < prompts.length; i += 5) {
const batch = prompts.slice(i, i + 5);
const batchResults = await Promise.all(batch.map(chat));
results.push(...batchResults);
await new Promise(r => setTimeout(r, 1000)); // Delay between batches
}
Performance Optimization
Caching Responses
const responseCache = new Map<string, string>(); // Unbounded; prefer an LRU with TTL in production
async function cachedChat(prompt: string, model: string) {
const cacheKey = `${model}:${prompt}`;
if (responseCache.has(cacheKey)) {
console.log('Cache hit');
return responseCache.get(cacheKey)!;
}
const completion = await client.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
});
const response = completion.choices[0].message.content || '';
responseCache.set(cacheKey, response);
return response;
}
Parallel Processing
async function parallelChat(prompts: string[], model: string) {
const results = await Promise.all(
prompts.map(prompt =>
client.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
})
)
);
return results.map(r => r.choices[0].message.content);
}
Resources
- Documentation: https://openrouter.ai/docs
- API Reference: https://openrouter.ai/docs/api-reference
- Model List: https://openrouter.ai/models
- Pricing: https://openrouter.ai/docs/pricing
- Status Page: https://status.openrouter.ai
Related Skills
- MCP Servers: Integration with Model Context Protocol (when built)
- TypeScript API Integration: Type-safe OpenRouter clients
- Python API Integration: Python SDK usage patterns
Summary
- OpenRouter provides unified access to 200+ LLMs
- OpenAI-compatible API for easy migration
- Cost optimization through model selection and token management
- Streaming for responsive user experiences
- Function calling for tool integration
- Vision models for image understanding
- Fallback strategies for reliability
- Rate limiting and error handling essential
- Perfect for multi-model apps, cost-sensitive deployments, avoiding vendor lock-in