| name | openrouter |
| description | OpenRouter API - Unified access to 400+ AI models through one API |
OpenRouter Skill
Comprehensive assistance with OpenRouter API development, providing unified access to hundreds of AI models through a single endpoint with intelligent routing, automatic fallbacks, and standardized interfaces.
When to Use This Skill
This skill should be triggered when:
- Making API calls to multiple AI model providers through a unified interface
- Implementing model fallback strategies or auto-routing
- Working with OpenAI-compatible SDKs but targeting multiple providers
- Configuring advanced sampling parameters (temperature, top_p, penalties)
- Setting up streaming responses or structured JSON outputs
- Comparing costs across different AI models
- Building applications that need automatic provider failover
- Implementing function/tool calling across different models
- Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)
Quick Reference
Basic Chat Completion (Python)
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
)
print(completion.choices[0].message.content)
Basic Chat Completion (JavaScript/TypeScript)
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: '<OPENROUTER_API_KEY>',
});
const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
});
console.log(completion.choices[0].message);
cURL Request
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'
Model Fallback Configuration (Python)
completion = client.chat.completions.create(
    model="openai/gpt-4o",  # Primary model
    extra_body={
        # Fallbacks, tried in order if the primary fails
        "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
    },
    messages=[{"role": "user", "content": "Your prompt here"}],
)
Model Fallback Configuration (TypeScript)
// `openai` is the client created in the JavaScript/TypeScript example above
const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
  messages: [{ role: 'user', content: 'Your prompt here' }],
});
Auto Router (Dynamic Model Selection)
completion = client.chat.completions.create(
    model="openrouter/auto",  # Automatically selects best model for the prompt
    messages=[{"role": "user", "content": "Your prompt here"}],
)
Advanced Parameters Example
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a creative story"}],
    temperature=0.8,        # Higher for creativity (0.0-2.0)
    max_tokens=500,         # Limit response length
    top_p=0.9,              # Nucleus sampling (0.0-1.0)
    frequency_penalty=0.5,  # Reduce repetition (-2.0-2.0)
    presence_penalty=0.3,   # Encourage topic diversity (-2.0-2.0)
)
Streaming Response
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or no text; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
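When the stream is read without the SDK, each event arrives as a Server-Sent Events line. The sketch below is a minimal parser under stated assumptions: OpenAI-style `data: {json}` framing, a final `data: [DONE]` sentinel, and SSE comment lines (starting with `:`) that a server may send as keep-alives. `iter_stream_text` is a hypothetical helper name, not part of any SDK, and the sample lines are hand-written.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from raw SSE lines (assumed OpenAI-style framing)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and SSE comments such as keep-alive pings.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        text = delta.get("content")
        if text:
            yield text


# Hand-written sample lines, not captured from a real response.
sample_lines = [
    ": keep-alive comment",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(sample_lines))
print(streamed_text)
```

Ignoring comment lines and empty deltas keeps the loop from failing on events that carry no text.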
JSON Mode (Structured Output)
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract person's name, age, and city from: John is 30 and lives in NYC",
    }],
    response_format={"type": "json_object"},
)
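JSON mode returns the object as a string in the message content, so the caller still has to parse it. A minimal sketch of that step follows; `parse_json_reply` is a hypothetical helper, and the reply string is hand-written rather than real model output.

```python
import json


def parse_json_reply(content: str, required: tuple = ()) -> dict:
    """Parse a JSON-mode reply and check that the expected keys are present."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply was valid JSON but not an object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


# With the SDK, `reply` would come from completion.choices[0].message.content.
reply = '{"name": "John", "age": 30, "city": "NYC"}'
person = parse_json_reply(reply, required=("name", "age", "city"))
print(person["city"])
```

Checking for required keys is worthwhile because JSON mode constrains the syntax of the reply, not its schema.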
Deterministic Output with Seed
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Generate a random number"}],
    seed=42,          # Same seed = same output (when supported)
    temperature=0.0,  # Deterministic sampling
)
Key Concepts
Model Routing
OpenRouter provides intelligent routing capabilities:
- Auto Router (openrouter/auto): Automatically selects the best model for your prompt, using NotDiamond
- Fallback Models: Specify multiple models; if the primary fails, the request is retried with the next one
- Provider Routing: Automatically routes across providers for reliability
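The routing options above all come down to what goes into the request body. The sketch below builds such a body as a plain dictionary; `build_routed_request` is a hypothetical helper, and the `models` field mirrors the fallback examples in the Quick Reference.

```python
import json


def build_routed_request(primary: str, fallbacks: list, prompt: str) -> dict:
    """Build a chat-completions body with a primary model and ordered fallbacks."""
    body = {
        "model": primary,
        "messages": [{"role": "user", "content": prompt}],
    }
    if fallbacks:
        # Same `models` field as in the fallback examples above.
        body["models"] = list(fallbacks)
    return body


# Let the auto router choose the model.
auto_body = build_routed_request("openrouter/auto", [], "Your prompt here")

# Name a primary model and two fallbacks explicitly.
fallback_body = build_routed_request(
    "openai/gpt-4o",
    ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
    "Your prompt here",
)
print(json.dumps(fallback_body, indent=2))
```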
Authentication
- Uses Bearer token authentication with API keys
- API keys can be managed programmatically
- Compatible with OpenAI SDK authentication patterns
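As a small illustration of the Bearer scheme, the sketch below assembles the request headers by hand. `auth_headers` is a hypothetical helper; the `OPENROUTER_API_KEY` environment variable name is taken from the cURL example above.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Return HTTP headers carrying the API key as a Bearer token."""
    key = api_key or os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("set OPENROUTER_API_KEY or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


headers = auth_headers("sk-example")  # placeholder key, not a real credential
print(headers["Authorization"])
```

Reading the key from the environment keeps it out of source control; the explicit argument is there for tests.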
Model Naming Convention
Models use the format provider/model-name:
- openai/gpt-4o - OpenAI's GPT-4o
- anthropic/claude-3.5-sonnet - Anthropic's Claude 3.5 Sonnet
- google/gemini-2.0-flash-exp:free - Google's free Gemini model
- openrouter/auto - Auto-routing system
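The naming convention can be checked mechanically before a request is sent. The sketch below splits an identifier into its parts, assuming the `provider/model-name` format with an optional `:variant` suffix such as `:free`; `parse_model_id` and `ModelId` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    provider: str
    name: str
    variant: Optional[str]


def parse_model_id(model: str) -> ModelId:
    """Split 'provider/model-name[:variant]' into its components."""
    provider, sep, rest = model.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider/model-name', got {model!r}")
    name, _, variant = rest.partition(":")
    return ModelId(provider, name, variant or None)


print(parse_model_id("google/gemini-2.0-flash-exp:free"))
print(parse_model_id("openai/gpt-4o"))
```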
Sampling Parameters
Temperature (0.0-2.0, default: 1.0)
- Lower = more predictable, focused responses
- Higher = more creative, diverse responses
- Use low (0.0-0.3) for factual tasks, high (0.8-1.5) for creative work
Top P (0.0-1.0, default: 1.0)
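- Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches top_p
- Filters out improbable options dynamically, so the number of candidates varies from step to step
- Balances consistency against variety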
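To make temperature and top_p concrete, here is a toy illustration on a hand-made distribution. It is a sketch of the general technique (temperature-scaled softmax followed by nucleus filtering), not a description of how any particular provider implements sampling; `softmax` and `nucleus` are illustrative names.

```python
import math


def softmax(logits: list, temperature: float = 1.0) -> list:
    """Convert logits to probabilities; lower temperature sharpens the result."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: list, top_p: float) -> list:
    """Return indices of the most likely tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


base = softmax([2.0, 1.0, 0.0], temperature=1.0)
sharp = softmax([2.0, 1.0, 0.0], temperature=0.5)  # more mass on the top token

probs = [0.5, 0.3, 0.15, 0.05]
kept = nucleus(probs, top_p=0.9)  # keeps the first three tokens, drops the 0.05 tail
print(kept)
```

With `top_p=1.0` every token stays eligible, which matches the default listed above.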
Frequency/Presence Penalties (-2.0-2.0, default: 0.0)
- Frequency: Penalizes a token in proportion to how often it has already appeared
- Presence: Applies a flat penalty once a token has appeared, regardless of count
- Positive values reduce repetition; negative values encourage reuse
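The difference between the two penalties is easiest to see as arithmetic on logits. The sketch below follows the formula published in OpenAI's API documentation (logit minus count times frequency penalty, minus presence penalty if the token has appeared at all); providers behind OpenRouter may implement this differently, so treat it as an illustration. `apply_penalties` and the toy logits are hypothetical.

```python
from collections import Counter


def apply_penalties(
    logits: dict,
    generated: list,
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> dict:
    """Lower the logits of tokens that already appear in the generated text."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        # Frequency scales with the count; presence is applied once if count > 0.
        penalty = count * frequency_penalty
        if count > 0:
            penalty += presence_penalty
        adjusted[token] = logit - penalty
    return adjusted


logits = {"the": 2.0, "cat": 1.0, "sat": 0.5}
generated = ["the", "the", "cat"]
adjusted = apply_penalties(logits, generated, frequency_penalty=0.5, presence_penalty=0.3)
print(adjusted)
```

Here "the" loses 2 × 0.5 + 0.3, "cat" loses 0.5 + 0.3, and "sat" is untouched because it has not appeared.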
Max Tokens (integer)
- Sets maximum response length
- Cannot exceed context length minus prompt length
- Use to control costs and enforce concise replies
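The ranges listed in this section can be checked client-side before a request goes out. The following is a minimal sketch: the bounds come from this section, while `check_sampling_params`, `clamp_max_tokens`, and the example context length are hypothetical.

```python
# Documented ranges from the Sampling Parameters section.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_sampling_params(params: dict) -> list:
    """Return a list of human-readable problems; empty means all values are in range."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    return problems


def clamp_max_tokens(requested: int, context_length: int, prompt_tokens: int) -> int:
    """Cap max_tokens at the room left in the context window after the prompt."""
    available = max(context_length - prompt_tokens, 0)
    return min(requested, available)


print(check_sampling_params({"temperature": 2.5, "top_p": 0.9}))
# Example numbers only: an 8192-token window with an 8000-token prompt.
print(clamp_max_tokens(500, context_length=8192, prompt_tokens=8000))
```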
Response Formats
- Standard JSON: Default chat completion format
- Streaming: Server-Sent Events (SSE) with stream: true
- JSON Mode: Guaranteed valid JSON with response_format: {"type": "json_object"}
- Structured Outputs: Schema-validated JSON responses
Advanced Features
- Tool/Function Calling: Connect models to external APIs
- Multimodal Inputs: Support for images, PDFs, audio
- Prompt Caching: Reduce costs for repeated prompts
- Web Search Integration: Enhanced responses with web data
- Zero Completion Insurance: Protection against failed responses
- Logprobs: Access token probabilities for confidence analysis
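Of the features above, tool calling needs the most client-side code, because the application has to execute what the model requests. The sketch below assumes the OpenAI-compatible shapes (a `tools` list of function definitions in the request, and `tool_calls` with JSON-encoded `arguments` in the reply). `get_weather`, `REGISTRY`, and `run_tool_calls` are hypothetical, and the sample call is hand-written.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call an external API."""
    return f"Sunny in {city}"


# Tool definitions to send in the request's `tools` field.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names to the local functions that implement them.
REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list) -> list:
    """Execute each requested call and build `tool` role messages with the results."""
    results = []
    for call in tool_calls:
        fn = call["function"]
        handler = REGISTRY.get(fn["name"])
        if handler is None:
            output = f"unknown tool: {fn['name']}"
        else:
            args = json.loads(fn["arguments"] or "{}")
            output = handler(**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results


sample_calls = [{
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}]
tool_messages = run_tool_calls(sample_calls)
print(tool_messages)
```

The resulting messages would be appended to the conversation and sent back so the model can produce its final answer.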
Reference Files
This skill includes comprehensive documentation in references/:
- llms-full.md - Complete list of available models with metadata
- llms-small.md - Curated subset of popular models
- llms.md - Standard model listings
Use view to read specific reference files when detailed model information is needed.
Working with This Skill
For Beginners
- Start with basic chat completion examples (Python/JavaScript/cURL above)
- Use the standard OpenAI SDK for easy integration
- Try simple model names like openai/gpt-4o or anthropic/claude-3.5-sonnet
- Keep parameters simple initially (just model and messages)
For Intermediate Users
- Implement model fallback arrays for reliability
- Experiment with sampling parameters (temperature, top_p)
- Use streaming for better UX in conversational apps
- Try openrouter/auto for automatic model selection
- Implement JSON mode for structured data extraction
For Advanced Users
- Fine-tune multiple sampling parameters together
- Implement custom routing logic with fallback chains
- Use logprobs for confidence scoring
- Leverage tool/function calling capabilities
- Optimize costs by selecting appropriate models per task
- Implement prompt caching strategies
- Use seed parameter for reproducible testing
Common Patterns
Error Handling with Fallbacks
try:
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        extra_body={
            "models": [
                "anthropic/claude-3.5-sonnet",
                "google/gemini-2.0-flash-exp:free",
            ]
        },
        messages=[{"role": "user", "content": "Your prompt"}],
    )
except Exception as e:
    # Reached only after the primary model and every fallback have failed
    print(f"All models failed: {e}")
Cost-Optimized Routing
# Use cheaper models for simple tasks
simple_completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[{"role": "user", "content": "Simple question"}],
)

# Use premium models for complex tasks
complex_completion = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Complex reasoning task"}],
)
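The choice between the two calls above can be wrapped in a small selector. This is a deliberately crude sketch: the length threshold and the `needs_reasoning` flag are illustrative assumptions, `pick_model` is a hypothetical helper, and a real application would tune the rule to its own workload.

```python
CHEAP_MODEL = "google/gemini-2.0-flash-exp:free"
PREMIUM_MODEL = "openai/o1"


def pick_model(prompt: str, needs_reasoning: bool = False, long_prompt_chars: int = 2000) -> str:
    """Pick the premium model only for long prompts or explicit reasoning tasks."""
    if needs_reasoning or len(prompt) > long_prompt_chars:
        return PREMIUM_MODEL
    return CHEAP_MODEL


print(pick_model("Simple question"))
print(pick_model("Complex reasoning task", needs_reasoning=True))
```

The returned string is what would be passed as `model` in the calls above.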
Context-Aware Temperature
# Low temperature for factual responses
factual = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# High temperature for creative content
creative = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=1.2,
    messages=[{"role": "user", "content": "Write a unique story opening"}],
)
Resources
Official Documentation
- API Reference: https://openrouter.ai/docs/api-reference/overview
- Quickstart Guide: https://openrouter.ai/docs/quickstart
- Model List: https://openrouter.ai/docs/models
- Parameters Guide: https://openrouter.ai/docs/api-reference/parameters
Key Endpoints
- Chat Completions: POST https://openrouter.ai/api/v1/chat/completions
- List Models: GET https://openrouter.ai/api/v1/models
- Generation Info: GET https://openrouter.ai/api/v1/generation
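For callers who prefer not to use an SDK, the three endpoints can be addressed with the standard library. The sketch below only constructs the requests and does not send them. The helper names are hypothetical, and passing the generation identifier as an `id` query parameter is an assumption to confirm against the API reference.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://openrouter.ai/api/v1"


def _headers(api_key: str) -> dict:
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


def chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """POST /chat/completions with a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=_headers(api_key),
        method="POST",
    )


def models_request(api_key: str) -> urllib.request.Request:
    """GET /models to list available models."""
    return urllib.request.Request(f"{BASE_URL}/models", headers=_headers(api_key), method="GET")


def generation_request(api_key: str, generation_id: str) -> urllib.request.Request:
    """GET /generation for one completed request (assumed `id` query parameter)."""
    query = urllib.parse.urlencode({"id": generation_id})
    return urllib.request.Request(
        f"{BASE_URL}/generation?{query}", headers=_headers(api_key), method="GET"
    )


req = chat_request(
    "sk-example",  # placeholder key, not a real credential
    {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req) returns a response whose body is JSON.
```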
Notes
- OpenRouter normalizes API schemas across all providers
- Uses OpenAI-compatible API format for easy migration
- Automatic provider fallback if models are rate-limited or down
- Pricing based on actual model used (important for fallbacks)
- Response includes metadata about which model processed the request
- All models support streaming via Server-Sent Events
- Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)
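Because pricing follows the model that actually ran, it helps to record that model whenever fallbacks are configured. The sketch below assumes the response carries an OpenAI-style top-level `model` field; `served_model` is a hypothetical helper and the response dictionary is hand-written. With the SDK, the same value is available as `completion.model`.

```python
def served_model(response: dict, requested: str) -> tuple:
    """Return (model that answered, whether it differs from the requested one)."""
    actual = response.get("model", requested)
    return actual, actual != requested


# Hand-written example of a parsed response body, trimmed to the relevant fields.
sample_response = {
    "id": "gen-123",
    "model": "anthropic/claude-3.5-sonnet",
    "choices": [{"message": {"role": "assistant", "content": "Hello"}}],
}
actual, fell_back = served_model(sample_response, requested="openai/gpt-4o")
if fell_back:
    print(f"Request was served by fallback model {actual}")
```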
Best Practices
- Always implement fallbacks for production applications
- Use appropriate temperature based on task type (low for factual, high for creative)
- Set max_tokens to control costs and response length
- Enable streaming for better user experience in chat applications
- Use JSON mode when you need guaranteed structured output
- Test with seed parameter for reproducible results during development
- Monitor costs by selecting appropriate models per task
- Use auto-routing when unsure which model performs best
- Implement proper error handling for rate limits and failures
- Cache prompts for repeated requests to reduce costs
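For the error-handling practice above, one common pattern is retrying with exponential backoff. The following is a generic sketch rather than an OpenRouter feature: `call_with_retries` is a hypothetical helper, the delays are arbitrary defaults, and `flaky` stands in for a real API call.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep, retry_on=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 10))  # small jitter


# Stand-in for an API call that fails twice before succeeding.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


delays = []
result = call_with_retries(flaky, sleep=delays.append, retry_on=(TimeoutError,))
print(result, len(delays))
```

With the OpenAI SDK, `fn` would wrap the `client.chat.completions.create(...)` call and `retry_on` would name the SDK exception classes worth retrying, such as its rate-limit error.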