| name | openrouter-prod-checklist |
| description | Pre-launch production readiness checklist for OpenRouter. Use when preparing to deploy to production. Trigger with phrases like 'openrouter production', 'openrouter go-live', 'openrouter launch checklist', 'deploy openrouter'. |
| allowed-tools | Read, Write, Edit, Grep |
| version | 1.0.0 |
| license | MIT |
| author | Jeremy Longshore <jeremy@intentsolutions.io> |
OpenRouter Production Checklist
Overview
This skill provides a comprehensive checklist covering security, monitoring, error handling, and operational readiness for production OpenRouter deployments.
Prerequisites
- Working OpenRouter integration
- Production infrastructure ready
Instructions
Follow these steps to implement this skill:
- Verify Prerequisites: Ensure all prerequisites listed above are met
- Review the Implementation: Study the code examples and patterns below
- Adapt to Your Environment: Modify configuration values for your setup
- Test the Integration: Run the verification steps to confirm functionality
- Monitor in Production: Set up appropriate logging and monitoring
Pre-Launch Checklist
API Key Security
[ ] API key stored in environment variable, not code
[ ] Key has appropriate credit limit set
[ ] Production key separate from development key
[ ] Key has descriptive label in dashboard
[ ] HTTP-Referer header set for tracking
[ ] X-Title header set for identification
Error Handling
[ ] All API calls wrapped in try/catch
[ ] Rate limit errors handled with exponential backoff (see the backoff sketch after this checklist)
[ ] Payment errors (402) caught and alerted
[ ] Model unavailability triggers fallback
[ ] Timeout handling implemented
[ ] Connection errors retry appropriately
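As a reference for the items above, here is a minimal retry sketch with exponential backoff and jitter. It assumes the OpenRouter-configured OpenAI client (client) used throughout this skill; the string checks for retryable errors are illustrative, not exhaustive.
import random
import time

def chat_with_backoff(prompt: str, model: str, max_attempts: int = 5):
    # Sketch: retry transient failures with exponential backoff plus jitter.
    # Assumes `client` is the OpenRouter-configured OpenAI client from this skill.
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
        except Exception as e:
            message = str(e).lower()
            # Treat rate limits, server errors, and timeouts as retryable
            retryable = any(s in message for s in ("429", "rate", "500", "timeout"))
            if not retryable or attempt == max_attempts - 1:
                raise
            # Backoff schedule: 1s, 2s, 4s, ... plus up to 1s of random jitter
            time.sleep(2 ** attempt + random.random())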
Fallbacks Configured
[ ] Primary model selected
[ ] At least 2 fallback models configured
[ ] Fallbacks tested and verified working
[ ] Model unavailability detection in place
[ ] Automatic failover logic implemented
Cost Controls
[ ] Per-key credit limits set
[ ] max_tokens limit on all requests
[ ] Token usage monitoring active
[ ] Cost alerts configured (see the credit-check sketch after this checklist)
[ ] Budget thresholds defined
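A minimal budget-check sketch, using the same GET /api/v1/auth/key endpoint the verification script at the end of this skill relies on. The threshold value and alerting hook are assumptions to adapt to your own policy.
import os
import requests

def check_remaining_credits(threshold: float = 10.0) -> bool:
    # Sketch: query key info; limit_remaining is null for keys without a limit.
    resp = requests.get(
        "https://openrouter.ai/api/v1/auth/key",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    )
    remaining = resp.json()["data"]["limit_remaining"]
    if remaining is not None and remaining < threshold:
        # Hook your alerting here (Slack webhook, PagerDuty, etc.)
        print(f"⚠️ OpenRouter credits low: {remaining} remaining")
        return False
    return True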
Monitoring
[ ] Request logging enabled
[ ] Error rate tracking active
[ ] Latency monitoring configured
[ ] Token usage dashboards set up
[ ] Alert thresholds defined
Security Checklist
Key Management
# ✓ Correct: environment variable
import os

api_key = os.environ["OPENROUTER_API_KEY"]

# ✗ Wrong: hardcoded in source
api_key = "sk-or-v1-xxxxx"

# ✓ Correct: secrets manager (Google Secret Manager shown)
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
# secret_path is the full resource name of the secret version
api_key = client.access_secret_version(name=secret_path).payload.data.decode()
Request Validation
def validate_request(prompt: str, model: str) -> str:
    """Validate and normalize a request before sending it to the API."""
    if not prompt or not prompt.strip():
        raise ValueError("Empty prompt")
    if len(prompt) > 100_000:  # ~25K tokens
        raise ValueError("Prompt too long")
    if "/" not in model:
        raise ValueError("Model must include provider prefix")
    # Sanitize if needed
    return prompt.strip()
Response Handling
def safe_extract_response(response) -> str:
    """Safely extract content from response."""
    if not response.choices:
        return ""
    content = response.choices[0].message.content
    if content is None:
        return ""
    # Optionally sanitize output
    return content
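Putting both helpers together, a guarded call might look like the sketch below. guarded_chat is a hypothetical wrapper; it assumes the client configured earlier in this skill.
def guarded_chat(prompt: str, model: str = "anthropic/claude-3.5-sonnet") -> str:
    # Sketch combining the validation and extraction helpers above
    clean_prompt = validate_request(prompt, model)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": clean_prompt}],
        max_tokens=4096,
    )
    return safe_extract_response(response)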
Production Configuration
Recommended Defaults
PRODUCTION_CONFIG = {
    # Timeouts
    "timeout": 60.0,      # Connection timeout
    "max_retries": 3,     # Retry count
    # Token limits
    "max_tokens": 4096,   # Default response limit
    "temperature": 0.7,   # Balanced creativity
    # Headers
    "http_referer": "https://your-app.com",
    "x_title": "Your App Name",
    # Fallback models
    "fallback_models": [
        "anthropic/claude-3.5-sonnet",
        "openai/gpt-4-turbo",
        "meta-llama/llama-3.1-70b-instruct",
    ],
}
Production Client
import os

from openai import OpenAI

class ProductionOpenRouterClient:
    def __init__(self, config: dict | None = None):
        config = config or PRODUCTION_CONFIG
        self.client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.environ["OPENROUTER_API_KEY"],
            timeout=config.get("timeout", 60.0),
            max_retries=config.get("max_retries", 3),
            default_headers={
                "HTTP-Referer": config.get("http_referer", ""),
                "X-Title": config.get("x_title", ""),
            },
        )
        self.fallback_models = config.get("fallback_models", [])
        self.max_tokens = config.get("max_tokens", 4096)

    def chat(
        self,
        prompt: str,
        model: str = "anthropic/claude-3.5-sonnet",
        **kwargs,
    ):
        # Try the requested model first, then each fallback that differs from it
        models_to_try = [model] + [
            m for m in self.fallback_models if m != model
        ]
        last_error = None
        for try_model in models_to_try:
            try:
                return self.client.chat.completions.create(
                    model=try_model,
                    messages=[{"role": "user", "content": prompt}],
                    max_tokens=kwargs.get("max_tokens", self.max_tokens),
                    **{k: v for k, v in kwargs.items() if k != "max_tokens"},
                )
            except Exception as e:
                last_error = e
                # Fall through to the next model only on availability errors;
                # re-raise anything else (auth, payment, validation) immediately
                if "unavailable" in str(e).lower():
                    continue
                raise
        raise last_error or RuntimeError("All models failed")
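A minimal usage sketch; it relies on OPENROUTER_API_KEY being set in the environment and uses the defaults from PRODUCTION_CONFIG:
router = ProductionOpenRouterClient()
response = router.chat("Summarize our launch checklist in one sentence.")
print(response.choices[0].message.content)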
Monitoring Setup
Basic Metrics
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class RequestMetrics:
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    success: bool
    error: Optional[str] = None

class MetricsCollector:
    def __init__(self):
        self.metrics = []

    def record(self, metrics: RequestMetrics):
        self.metrics.append(metrics)
        # Send to your monitoring system
        self._send_to_monitoring(metrics)

    def _send_to_monitoring(self, metrics: RequestMetrics):
        # Example: Send to StatsD/Datadog
        # statsd.gauge('openrouter.latency', metrics.latency_ms)
        # statsd.increment('openrouter.requests')
        pass

collector = MetricsCollector()

def monitored_chat(prompt: str, model: str):
    # Assumes `client` is the OpenRouter-configured OpenAI client from above
    start = time.time()
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        collector.record(RequestMetrics(
            model=model,
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
            latency_ms=(time.time() - start) * 1000,
            success=True,
        ))
        return response
    except Exception as e:
        collector.record(RequestMetrics(
            model=model,
            prompt_tokens=0,
            completion_tokens=0,
            latency_ms=(time.time() - start) * 1000,
            success=False,
            error=str(e),
        ))
        raise
Health Check Endpoint
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health/openrouter")
def openrouter_health():
    try:
        # Quick health check against an inexpensive model
        response = client.chat.completions.create(
            model="openai/gpt-3.5-turbo",
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=5,
            timeout=5.0,
        )
        return jsonify({
            "status": "healthy",
            "model": "openai/gpt-3.5-turbo",
            "response_received": True,
        })
    except Exception as e:
        return jsonify({
            "status": "unhealthy",
            "error": str(e),
        }), 503
Scaling Considerations
Connection Pooling
import os

import httpx
from openai import OpenAI

# Use connection pooling for high-volume workloads
http_client = httpx.Client(
    limits=httpx.Limits(
        max_keepalive_connections=20,
        max_connections=100,
    )
)

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    http_client=http_client,
)
Async for High Throughput
import asyncio
import os

from openai import AsyncOpenAI

async_client = AsyncOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

async def process_batch(prompts: list, max_concurrent: int = 10):
    # Bound concurrency so bursts don't trip rate limits
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_one(prompt):
        async with semaphore:
            return await async_client.chat.completions.create(
                model="openai/gpt-4-turbo",
                messages=[{"role": "user", "content": prompt}],
            )

    return await asyncio.gather(*[process_one(p) for p in prompts])
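A usage sketch with a hypothetical prompt list:
prompts = ["Summarize A", "Summarize B", "Summarize C"]
responses = asyncio.run(process_batch(prompts, max_concurrent=5))
print(len(responses), "responses received")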
Launch Verification
Pre-Launch Tests
import requests

def run_production_verification():
    """Run before going live. Assumes `client` and `api_key` from above."""
    tests = []

    # Test 1: API connectivity
    try:
        models = requests.get(
            "https://openrouter.ai/api/v1/models",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        tests.append(("API Connectivity", models.status_code == 200))
    except Exception:
        tests.append(("API Connectivity", False))

    # Test 2: Primary model works
    try:
        client.chat.completions.create(
            model="anthropic/claude-3.5-sonnet",
            messages=[{"role": "user", "content": "test"}],
            max_tokens=5,
        )
        tests.append(("Primary Model", True))
    except Exception:
        tests.append(("Primary Model", False))

    # Test 3: Fallback models work
    for fallback in ["openai/gpt-4-turbo", "meta-llama/llama-3.1-70b-instruct"]:
        try:
            client.chat.completions.create(
                model=fallback,
                messages=[{"role": "user", "content": "test"}],
                max_tokens=5,
            )
            tests.append((f"Fallback: {fallback}", True))
        except Exception:
            tests.append((f"Fallback: {fallback}", False))

    # Test 4: Credit balance
    try:
        key_info = requests.get(
            "https://openrouter.ai/api/v1/auth/key",
            headers={"Authorization": f"Bearer {api_key}"},
        ).json()
        # limit_remaining is null for keys without a credit limit
        remaining = key_info["data"]["limit_remaining"]
        has_credits = remaining is None or remaining > 10
        tests.append(("Sufficient Credits", has_credits))
    except Exception:
        tests.append(("Sufficient Credits", False))

    # Print results
    print("\nProduction Verification Results:")
    print("-" * 40)
    all_passed = True
    for name, passed in tests:
        status = "✓" if passed else "✗"
        print(f"  {status} {name}")
        if not passed:
            all_passed = False
    return all_passed

if not run_production_verification():
    print("\n⚠️ Some checks failed. Review before going live.")
Output
Successful execution produces:
- Working OpenRouter integration
- Verified API connectivity
- Example responses demonstrating functionality
Error Handling
Common errors and solutions:
- 401 Unauthorized: Check API key format (must start with sk-or-)
- 429 Rate Limited: Implement exponential backoff
- 500 Server Error: Retry with backoff, check OpenRouter status page
- Model Not Found: Verify model ID includes provider prefix
Examples
See code examples in sections above for complete, runnable implementations.