name	openai-responses
description	This skill provides comprehensive knowledge for working with OpenAI's Responses API, the unified stateful API for building agentic applications. It should be used when building AI agents that preserve reasoning across turns, integrating MCP servers for external tools, using built-in tools (Code Interpreter, File Search, Web Search, Image Generation), managing stateful conversations, implementing background processing, or migrating from Chat Completions API. Use when building agentic workflows, conversational AI with memory, tools-based applications, RAG systems, data analysis agents, or any application requiring OpenAI's reasoning models with persistent state. Covers both Node.js SDK and Cloudflare Workers implementations. Keywords: responses api, openai responses, stateful openai, openai mcp, code interpreter openai, file search openai, web search openai, image generation openai, reasoning preservation, agentic workflows, conversation state, background mode, chat completions migration, gpt-5, polymorphic outputs
license	MIT

OpenAI Responses API

Status: Production Ready Last Updated: 2025-10-25 API Launch: March 2025 Dependencies: openai@5.19.1+ (Node.js) or fetch API (Cloudflare Workers)

What Is the Responses API?

The Responses API (/v1/responses) is OpenAI's unified interface for building agentic applications, launched in March 2025. It fundamentally changes how you interact with OpenAI models by providing stateful conversations and a structured loop for reasoning and acting.

Key Innovation: Preserved Reasoning State

Unlike Chat Completions where reasoning is discarded between turns, Responses keeps the notebook open. The model's step-by-step thought processes survive into the next turn, improving performance by approximately 5% on TAUBench and enabling better multi-turn interactions.

Why Use Responses Over Chat Completions?

Feature	Chat Completions	Responses API	Benefit
State Management	Manual (you track history)	Automatic (conversation IDs)	Simpler code, less error-prone
Reasoning	Dropped between turns	Preserved across turns	Better multi-turn performance
Tools	Client-side round trips	Server-side hosted	Lower latency, simpler code
Output Format	Single message	Polymorphic (messages, reasoning, tool calls)	Richer debugging, better UX
Cache Utilization	Baseline	40-80% better	Lower costs, faster responses
MCP Support	Manual integration	Built-in	Easy external tool connections

Quick Start (5 Minutes)

1. Get API Key

# Sign up at https://platform.openai.com/
# Navigate to API Keys section
# Create new key and save securely
export OPENAI_API_KEY="sk-proj-..."

Why this matters:

API key required for all requests
Keep secure (never commit to git)
Use environment variables

2. Install SDK (Node.js)

npm install openai

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What are the 5 Ds of dodgeball?',
});

console.log(response.output_text);

CRITICAL:

Always use server-side (never expose API key in client code)
Model defaults to gpt-5 (can use gpt-5-mini, gpt-4o, etc.)
input can be string or array of messages

3. Or Use Direct API (Cloudflare Workers)

// No SDK needed - use fetch()
const response = await fetch('https://api.openai.com/v1/responses', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-5',
    input: 'Hello, world!',
  }),
});

const data = await response.json();
console.log(data.output_text);

Why fetch?

No dependencies in edge environments
Full control over request/response
Works in Cloudflare Workers, Deno, Bun

Responses vs Chat Completions: Complete Comparison

When to Use Each

Use Responses API when:

✅ Building agentic applications (reasoning + actions)
✅ Need preserved reasoning state across turns
✅ Want built-in tools (Code Interpreter, File Search, Web Search)
✅ Using MCP servers for external integrations
✅ Implementing conversational AI with automatic state management
✅ Background processing for long-running tasks
✅ Need polymorphic outputs (messages, reasoning, tool calls)

Use Chat Completions when:

✅ Simple one-off text generation
✅ Fully stateless interactions (no conversation continuity needed)
✅ Legacy integrations (existing Chat Completions code)
✅ Very simple use cases without tools

Architecture Differences

Chat Completions Flow:

User Input → Model → Single Message → Done
(Reasoning discarded, state lost)

Responses API Flow:

User Input → Model (preserved reasoning) → Polymorphic Outputs
            ↓ (server-side tools)
    Tool Call → Tool Result → Model → Final Response
(Reasoning preserved, state maintained)

Performance Benefits

Cache Utilization:

Chat Completions: Baseline performance
Responses API: 40-80% better cache utilization
Result: Lower latency + reduced costs

Reasoning Performance:

Chat Completions: Reasoning dropped between turns
Responses API: Reasoning preserved across turns
Result: 5% better on TAUBench (GPT-5 with Responses vs Chat Completions)

Stateful Conversations

Automatic State Management

The Responses API can automatically manage conversation state using conversation IDs.

Creating a Conversation

// Create conversation with initial message
const conversation = await openai.conversations.create({
  metadata: { user_id: 'user_123' },
  items: [
    {
      type: 'message',
      role: 'user',
      content: 'Hello!',
    },
  ],
});

console.log(conversation.id); // "conv_abc123..."

Using Conversation ID

// First turn
const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: 'conv_abc123',
  input: 'What are the 5 Ds of dodgeball?',
});

console.log(response1.output_text);

// Second turn - model remembers previous context
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: 'conv_abc123',
  input: 'Tell me more about the first one',
});

console.log(response2.output_text);
// Model automatically knows "first one" refers to first D from previous turn

Why this matters:

No manual history tracking required
Reasoning state preserved between turns
Automatic context management
Lower risk of context errors

Manual State Management (Alternative)

If you need full control, you can manually manage history:

let history = [
  { role: 'user', content: 'Tell me a joke' },
];

const response = await openai.responses.create({
  model: 'gpt-5',
  input: history,
  store: true, // Optional: store for retrieval later
});

// Add response to history
history = [
  ...history,
  ...response.output.map(el => ({
    role: el.role,
    content: el.content,
  })),
];

// Next turn
history.push({ role: 'user', content: 'Tell me another' });

const secondResponse = await openai.responses.create({
  model: 'gpt-5',
  input: history,
});

When to use manual management:

Need custom history pruning logic
Want to modify conversation history programmatically
Implementing custom caching strategies

Built-in Tools (Server-Side)

The Responses API includes server-side hosted tools that eliminate costly backend round trips.

Available Tools

Tool	Purpose	Use Case
Code Interpreter	Execute Python code	Data analysis, calculations, charts
File Search	RAG without vector stores	Search uploaded files for answers
Web Search	Real-time web information	Current events, fact-checking
Image Generation	DALL-E integration	Create images from descriptions
MCP	Connect external tools	Stripe, databases, custom APIs

Code Interpreter

Execute Python code server-side for data analysis, calculations, and visualizations.

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Calculate the mean, median, and mode of: 10, 20, 30, 40, 50',
  tools: [{ type: 'code_interpreter' }],
});

console.log(response.output_text);
// Model writes and executes Python code, returns results

Advanced Example: Data Analysis

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Analyze this sales data and create a bar chart showing monthly revenue: [data here]',
  tools: [{ type: 'code_interpreter' }],
});

// Check output for code execution results
response.output.forEach(item => {
  if (item.type === 'code_interpreter_call') {
    console.log('Code executed:', item.input);
    console.log('Result:', item.output);
  }
});

Why this matters:

No need to run Python locally
Sandboxed execution environment
Automatic chart generation
Can process uploaded files

File Search (RAG Without Vector Stores)

Search through uploaded files without building your own RAG pipeline.

// 1. Upload files first (one-time setup)
const file = await openai.files.create({
  file: fs.createReadStream('knowledge-base.pdf'),
  purpose: 'assistants',
});

// 2. Use file search
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What does the document say about pricing?',
  tools: [
    {
      type: 'file_search',
      file_ids: [file.id],
    },
  ],
});

console.log(response.output_text);
// Model searches file and provides answer with citations

Supported File Types:

PDFs, Word docs, text files
Markdown, HTML
Code files (Python, JavaScript, etc.)
Max: 512MB per file

Web Search

Get real-time information from the web.

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What are the latest updates on GPT-5?',
  tools: [{ type: 'web_search' }],
});

console.log(response.output_text);
// Model searches web and provides current information with sources

Why this matters:

No cutoff date limitations
Automatic source citations
Real-time data access
No need for external search APIs

Image Generation (DALL-E)

Generate images directly in the Responses API.

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Create an image of a futuristic cityscape at sunset',
  tools: [{ type: 'image_generation' }],
});

// Find image in output
response.output.forEach(item => {
  if (item.type === 'image_generation_call') {
    console.log('Image URL:', item.output.url);
  }
});

Models Available:

DALL-E 3 (default)
Various sizes and quality options

MCP Server Integration

The Responses API has built-in support for Model Context Protocol (MCP) servers, allowing you to connect external tools.

What Is MCP?

MCP is an open protocol that standardizes how applications provide context to LLMs. It allows you to:

Connect to external APIs (Stripe, databases, CRMs)
Use hosted MCP servers
Build custom tool integrations

Basic MCP Integration

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Roll 2d6 dice',
  tools: [
    {
      type: 'mcp',
      server_label: 'dice',
      server_url: 'https://example.com/mcp',
    },
  ],
});

// Model discovers available tools on MCP server and uses them
console.log(response.output_text);

MCP with Authentication (OAuth)

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Create a $20 payment link',
  tools: [
    {
      type: 'mcp',
      server_label: 'stripe',
      server_url: 'https://mcp.stripe.com',
      authorization: process.env.STRIPE_OAUTH_TOKEN,
    },
  ],
});

console.log(response.output_text);
// Model uses Stripe MCP server to create payment link

CRITICAL:

API does NOT store authorization tokens
Must provide token with each request
Use environment variables for security

Polymorphic Output: MCP Tool Calls

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Roll 2d4+1',
  tools: [
    {
      type: 'mcp',
      server_label: 'dice',
      server_url: 'https://dmcp.example.com',
    },
  ],
});

// Inspect tool calls
response.output.forEach(item => {
  if (item.type === 'mcp_call') {
    console.log('Tool:', item.name);
    console.log('Arguments:', item.arguments);
    console.log('Output:', item.output);
  }
  if (item.type === 'mcp_list_tools') {
    console.log('Available tools:', item.tools);
  }
});

Output Types:

mcp_list_tools - Tools discovered on server
mcp_call - Tool invocation and result
message - Final response to user

Reasoning Preservation

How It Works

The Responses API preserves the model's internal reasoning state across turns, unlike Chat Completions which discards it.

Visual Analogy:

Chat Completions: Model has a scratchpad, writes reasoning, then tears out the page before responding
Responses API: Model keeps the scratchpad open, previous reasoning visible for next turn

Performance Impact

TAUBench Results (GPT-5):

Chat Completions: Baseline score
Responses API: +5% better (purely from preserved reasoning)

Why This Matters:

Better multi-turn problem solving
More coherent long conversations
Improved step-by-step reasoning
Fewer context errors

Reasoning Summaries (Free!)

The Responses API provides reasoning summaries at no additional cost.

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Solve this complex math problem: [problem]',
});

// Inspect reasoning
response.output.forEach(item => {
  if (item.type === 'reasoning') {
    console.log('Model reasoning:', item.summary[0].text);
  }
  if (item.type === 'message') {
    console.log('Final answer:', item.content[0].text);
  }
});

Use Cases:

Debugging model decisions
Audit trails for compliance
Understanding model thought process
Building transparent AI systems

Background Mode (Long-Running Tasks)

For tasks that take longer than standard timeout limits, use background mode.

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Analyze this 500-page document and summarize key findings',
  background: true,
  tools: [{ type: 'file_search', file_ids: [fileId] }],
});

// Returns immediately with status
console.log(response.status); // "in_progress"
console.log(response.id); // Use to check status later

// Poll for completion
const checkStatus = async (responseId) => {
  const result = await openai.responses.retrieve(responseId);
  if (result.status === 'completed') {
    console.log(result.output_text);
  } else if (result.status === 'failed') {
    console.error('Task failed:', result.error);
  } else {
    // Still running, check again later
    setTimeout(() => checkStatus(responseId), 5000);
  }
};

checkStatus(response.id);

When to Use:

Large file processing
Complex calculations
Multi-step research tasks
Data analysis on large datasets

Timeout Limits:

Standard mode: 60 seconds
Background mode: Up to 10 minutes

Polymorphic Outputs

The Responses API returns multiple output types instead of a single message.

Output Types

Type	Description	Example
`message`	Text response to user	Final answer, explanation
`reasoning`	Model's internal thought process	Step-by-step reasoning summary
`code_interpreter_call`	Code execution	Python code + results
`mcp_call`	Tool invocation	Tool name, args, output
`mcp_list_tools`	Available tools	Tool definitions from MCP server
`file_search_call`	File search results	Matched chunks, citations
`web_search_call`	Web search results	URLs, snippets
`image_generation_call`	Image generation	Image URL

Processing Polymorphic Outputs

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Search the web for the latest AI news and summarize',
  tools: [{ type: 'web_search' }],
});

// Process different output types
response.output.forEach(item => {
  switch (item.type) {
    case 'reasoning':
      console.log('Reasoning:', item.summary[0].text);
      break;
    case 'web_search_call':
      console.log('Searched:', item.query);
      console.log('Sources:', item.results);
      break;
    case 'message':
      console.log('Response:', item.content[0].text);
      break;
  }
});

// Or use helper for text-only
console.log(response.output_text);

Why This Matters:

Better debugging (see all steps)
Audit trails (track all tool calls)
Richer UX (show progress to users)
Compliance (log all actions)

Migration from Chat Completions

Breaking Changes

Feature	Chat Completions	Responses API	Migration
Endpoint	`/v1/chat/completions`	`/v1/responses`	Update URL
Parameter	`messages`	`input`	Rename parameter
State	Manual (`messages` array)	Automatic (`conversation` ID)	Use conversation IDs
Tools	`tools` array with functions	Built-in types + MCP	Update tool definitions
Output	`choices[0].message.content`	`output_text` or `output` array	Update response parsing
Streaming	`data: {"choices":[...]}`	SSE with multiple item types	Update stream parser

Migration Example

Before (Chat Completions):

const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);

After (Responses):

const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);

Key Differences:

chat.completions.create → responses.create
messages → input
system role → developer role
choices[0].message.content → output_text

When to Migrate

Migrate now if:

✅ Building new applications
✅ Need stateful conversations
✅ Using agentic patterns (reasoning + tools)
✅ Want better performance (preserved reasoning)

Stay on Chat Completions if:

✅ Simple one-off generations
✅ Legacy integrations
✅ No need for state management

Error Handling

Common Errors and Solutions

1. Session State Not Persisting

Error:

Conversation state not maintained between turns

Cause:

Not using conversation IDs
Using different conversation IDs per turn

Solution:

// Create conversation once
const conv = await openai.conversations.create();

// Reuse conversation ID for all turns
const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Same ID
  input: 'First message',
});

const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Same ID
  input: 'Follow-up message',
});

2. MCP Server Connection Failed

Error:

{
  "error": {
    "type": "mcp_connection_error",
    "message": "Failed to connect to MCP server"
  }
}

Causes:

Invalid server URL
Missing or expired authorization token
Server not responding

Solutions:

// 1. Verify URL is correct
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Test MCP',
  tools: [
    {
      type: 'mcp',
      server_label: 'test',
      server_url: 'https://api.example.com/mcp', // ✅ Full URL
      authorization: process.env.AUTH_TOKEN, // ✅ Valid token
    },
  ],
});

// 2. Test server URL manually
const testResponse = await fetch('https://api.example.com/mcp');
console.log(testResponse.status); // Should be 200

// 3. Check token expiration
console.log('Token expires:', parseJWT(token).exp);

3. Code Interpreter Timeout

Error:

{
  "error": {
    "type": "code_interpreter_timeout",
    "message": "Code execution exceeded time limit"
  }
}

Cause:

Code runs longer than 30 seconds

Solution:

// Use background mode for long-running code
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Process this large dataset',
  background: true, // ✅ Extended timeout
  tools: [{ type: 'code_interpreter' }],
});

// Poll for results
const result = await openai.responses.retrieve(response.id);

4. Image Generation Rate Limit

Error:

{
  "error": {
    "type": "rate_limit_error",
    "message": "DALL-E rate limit exceeded"
  }
}

Cause:

Too many image generation requests

Solution:

// Implement retry with exponential backoff
const generateImage = async (prompt, retries = 3) => {
  try {
    return await openai.responses.create({
      model: 'gpt-5',
      input: prompt,
      tools: [{ type: 'image_generation' }],
    });
  } catch (error) {
    if (error.type === 'rate_limit_error' && retries > 0) {
      const delay = (4 - retries) * 1000; // 1s, 2s, 3s
      await new Promise(resolve => setTimeout(resolve, delay));
      return generateImage(prompt, retries - 1);
    }
    throw error;
  }
};

5. File Search Relevance Issues

Problem:

File search returns irrelevant results

Solution:

// Use more specific queries
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Find sections about pricing in Q4 2024 specifically', // ✅ Specific
  // NOT: 'Find pricing' (too vague)
  tools: [{ type: 'file_search', file_ids: [fileId] }],
});

// Or filter results manually
response.output.forEach(item => {
  if (item.type === 'file_search_call') {
    const relevantChunks = item.results.filter(
      chunk => chunk.score > 0.7 // ✅ Only high-confidence matches
    );
  }
});

6. Cost Tracking Confusion

Problem:

Billing different than expected

Explanation:

Responses API bills for: input tokens + output tokens + tool usage + stored conversations
Chat Completions bills only: input tokens + output tokens

Solution:

// Monitor usage
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello',
  store: false, // ✅ Don't store if not needed
});

console.log('Usage:', response.usage);
// {
//   prompt_tokens: 10,
//   completion_tokens: 20,
//   tool_tokens: 5,
//   total_tokens: 35
// }

7. Conversation Not Found

Error:

{
  "error": {
    "type": "invalid_request_error",
    "message": "Conversation conv_xyz not found"
  }
}

Causes:

Conversation ID typo
Conversation deleted
Conversation expired (90 days)

Solution:

// Verify conversation exists before using
const conversations = await openai.conversations.list();
const exists = conversations.data.some(c => c.id === 'conv_xyz');

if (!exists) {
  // Create new conversation
  const newConv = await openai.conversations.create();
  // Use newConv.id
}

8. Tool Output Parsing Failed

Problem:

Can't access tool outputs correctly

Solution:

// Use helper methods
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Search for AI news',
  tools: [{ type: 'web_search' }],
});

// Helper: Get text-only output
console.log(response.output_text);

// Manual: Inspect all outputs
response.output.forEach(item => {
  console.log('Type:', item.type);
  console.log('Content:', item);
});

Production Patterns

Cost Optimization

1. Use Conversation IDs (Cache Benefits)

// ✅ GOOD: Reuse conversation ID
const conv = await openai.conversations.create();
const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id,
  input: 'Question 1',
});
// 40-80% better cache utilization

// ❌ BAD: New manual history each time
const response2 = await openai.responses.create({
  model: 'gpt-5',
  input: [...previousHistory, newMessage],
});
// No cache benefits

2. Disable Storage When Not Needed

// For one-off requests
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Quick question',
  store: false, // ✅ Don't store conversation
});

3. Use Smaller Models When Possible

// For simple tasks
const response = await openai.responses.create({
  model: 'gpt-5-mini', // ✅ 50% cheaper
  input: 'Summarize this paragraph',
});

Rate Limit Handling

const createResponseWithRetry = async (params, maxRetries = 3) => {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await openai.responses.create(params);
    } catch (error) {
      if (error.type === 'rate_limit_error' && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000; // Exponential backoff
        console.log(`Rate limited, retrying in ${delay}ms`);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
};

Monitoring and Logging

const monitoredResponse = async (input) => {
  const startTime = Date.now();

  try {
    const response = await openai.responses.create({
      model: 'gpt-5',
      input,
    });

    // Log success metrics
    console.log({
      status: 'success',
      latency: Date.now() - startTime,
      tokens: response.usage.total_tokens,
      model: response.model,
      conversation: response.conversation_id,
    });

    return response;
  } catch (error) {
    // Log error metrics
    console.error({
      status: 'error',
      latency: Date.now() - startTime,
      error: error.message,
      type: error.type,
    });
    throw error;
  }
};

Node.js vs Cloudflare Workers

Node.js Implementation

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function handleRequest(input: string) {
  const response = await openai.responses.create({
    model: 'gpt-5',
    input,
    tools: [{ type: 'web_search' }],
  });

  return response.output_text;
}

Pros:

Full SDK support
Type safety
Streaming helpers

Cons:

Requires Node.js runtime
Larger bundle size

Cloudflare Workers Implementation

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { input } = await request.json();

    const response = await fetch('https://api.openai.com/v1/responses', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'gpt-5',
        input,
        tools: [{ type: 'web_search' }],
      }),
    });

    const data = await response.json();

    return new Response(data.output_text, {
      headers: { 'Content-Type': 'text/plain' },
    });
  },
};

Pros:

No dependencies
Edge deployment
Faster cold starts

Cons:

Manual request building
No type safety without custom types

Always Do / Never Do

✅ Always Do

Use conversation IDs for multi-turn interactions

const conv = await openai.conversations.create();
// Reuse conv.id for all related turns

Handle all output types in polymorphic responses

response.output.forEach(item => {
  if (item.type === 'reasoning') { /* log */ }
  if (item.type === 'message') { /* display */ }
});

Use background mode for long-running tasks

const response = await openai.responses.create({
  background: true, // ✅ For tasks >30s
  ...
});

Provide authorization tokens for MCP servers

tools: [{
  type: 'mcp',
  authorization: process.env.TOKEN, // ✅ Required
}]

Monitor token usage for cost control

console.log(response.usage.total_tokens);

❌ Never Do

Never expose API keys in client-side code

// ❌ DANGER: API key in browser
const response = await fetch('https://api.openai.com/v1/responses', {
  headers: { 'Authorization': 'Bearer sk-proj-...' }
});

Never assume single message output

// ❌ BAD: Ignores reasoning, tool calls
console.log(response.output[0].content);

// ✅ GOOD: Use helper or check all types
console.log(response.output_text);

Never reuse conversation IDs across users

// ❌ DANGER: User A sees User B's conversation
const sharedConv = 'conv_123';

Never ignore error types

// ❌ BAD: Generic error handling
try { ... } catch (e) { console.log('error'); }

// ✅ GOOD: Type-specific handling
catch (e) {
  if (e.type === 'rate_limit_error') { /* retry */ }
  if (e.type === 'mcp_connection_error') { /* alert */ }
}

Never poll faster than 1 second for background tasks

// ❌ BAD: Too frequent
setInterval(() => checkStatus(), 100);

// ✅ GOOD: Reasonable interval
setInterval(() => checkStatus(), 5000);

References

Official Documentation

Responses API Guide: https://platform.openai.com/docs/guides/responses
API Reference: https://platform.openai.com/docs/api-reference/responses
MCP Integration: https://platform.openai.com/docs/guides/tools-connectors-mcp
Blog Post (Why Responses API): https://developers.openai.com/blog/responses-api/
Starter App: https://github.com/openai/openai-responses-starter-app

Skill Resources

templates/ - Working code examples
references/responses-vs-chat-completions.md - Feature comparison
references/mcp-integration-guide.md - MCP server setup
references/built-in-tools-guide.md - Tool usage patterns
references/stateful-conversations.md - Conversation management
references/migration-guide.md - Chat Completions → Responses
references/top-errors.md - Common errors and solutions

Next Steps

✅ Read templates/basic-response.ts - Simple example
✅ Try templates/stateful-conversation.ts - Multi-turn chat
✅ Explore templates/mcp-integration.ts - External tools
✅ Review references/top-errors.md - Avoid common pitfalls
✅ Check references/migration-guide.md - If migrating from Chat Completions

Happy building with the Responses API! 🚀

Install Skill

SKILL.md

OpenAI Responses API

What Is the Responses API?

Key Innovation: Preserved Reasoning State

Why Use Responses Over Chat Completions?

Quick Start (5 Minutes)

1. Get API Key

2. Install SDK (Node.js)

3. Or Use Direct API (Cloudflare Workers)

Responses vs Chat Completions: Complete Comparison

When to Use Each

Architecture Differences

Performance Benefits

Stateful Conversations

Automatic State Management

Creating a Conversation

Using Conversation ID

Manual State Management (Alternative)

Built-in Tools (Server-Side)

Available Tools

Code Interpreter

File Search (RAG Without Vector Stores)

Web Search

Image Generation (DALL-E)

MCP Server Integration

What Is MCP?

Basic MCP Integration

MCP with Authentication (OAuth)

Polymorphic Output: MCP Tool Calls

Reasoning Preservation

How It Works

Performance Impact

Reasoning Summaries (Free!)

Background Mode (Long-Running Tasks)

Polymorphic Outputs

Output Types

Processing Polymorphic Outputs

Migration from Chat Completions

Breaking Changes

Migration Example

When to Migrate

Error Handling

Common Errors and Solutions

1. Session State Not Persisting

2. MCP Server Connection Failed

3. Code Interpreter Timeout

4. Image Generation Rate Limit

5. File Search Relevance Issues

6. Cost Tracking Confusion

7. Conversation Not Found

8. Tool Output Parsing Failed

Production Patterns

Cost Optimization

Rate Limit Handling

Monitoring and Logging

Node.js vs Cloudflare Workers

Node.js Implementation

Cloudflare Workers Implementation

Always Do / Never Do

✅ Always Do

❌ Never Do

References

Official Documentation

Skill Resources

Next Steps