Claude Code Plugins

Community-maintained marketplace

runpod-serverless

@profzeller/claude-skills

Create and deploy a RunPod Serverless GPU worker. Use when setting up a new RunPod worker, creating a serverless endpoint, or deploying ML models to RunPod. Guides through handler.py, Dockerfile, hub.json, tests.json, GitHub repo setup, and release creation.

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: runpod-serverless
description: Create and deploy a RunPod Serverless GPU worker. Use when setting up a new RunPod worker, creating a serverless endpoint, or deploying ML models to RunPod. Guides through handler.py, Dockerfile, hub.json, tests.json, GitHub repo setup, and release creation.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep

RunPod Serverless Worker Setup

Create and deploy a RunPod Serverless worker from a GitHub repository.

Overview

This skill guides you through creating a complete RunPod Serverless worker that can be deployed via the RunPod Hub. It covers all required files, the correct handler format, and automated deployment via GitHub releases.

Prerequisites

  • GitHub account
  • RunPod account
  • GitHub Personal Access Token (for automated releases)

Steps

Step 1: Gather Information

Ask the user for:

  1. Worker name - e.g., my-awesome-worker
  2. GitHub username - e.g., profzeller
  3. Worker description - What does this worker do?
  4. Category - One of: image, video, audio, text, embedding, other
  5. Base image - e.g., nvidia/cuda:12.1.0-runtime-ubuntu22.04, python:3.11-slim
  6. GPU requirements - e.g., RTX 4090, L40S, A100
  7. VRAM requirements - e.g., 8GB, 24GB, 48GB
  8. Python dependencies - List of pip packages needed

Step 2: Create Directory Structure

infrastructure/runpod/{worker-name}/
├── .runpod/
│   ├── hub.json          # RunPod Hub configuration
│   └── tests.json        # Test definitions
├── Dockerfile            # Container definition
├── handler.py            # RunPod serverless handler
├── requirements.txt      # Python dependencies (required)
└── README.md             # Documentation with badge

Create the directory:

mkdir -p infrastructure/runpod/{worker-name}/.runpod
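
A minimal requirements.txt might look like the following. The packages listed here are placeholders; include whatever your handler actually imports (the runpod SDK is installed separately in the Dockerfile below, so listing it here is optional but harmless).

runpod
torch
transformers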

Step 3: Create handler.py

The handler MUST follow this exact format:

"""
RunPod Serverless Handler for {Worker Name}
{Description}
"""

import runpod

# Global resources (loaded once per worker)
model = None


def load_resources():
    """Load models/resources on worker startup."""
    global model
    if model is not None:
        return model

    print("[Handler] Loading resources...")
    # TODO: Load your model/resources here
    # model = YourModel.load()
    print("[Handler] Resources loaded")
    return model


def handler(job: dict) -> dict:
    """
    Main RunPod handler function.

    Args:
        job: Dictionary containing 'input' key with request data

    Returns:
        Dictionary with results or error information
    """
    job_input = job.get("input", {})

    # Validate required inputs
    # Example:
    # if not job_input.get("prompt"):
    #     return {"error": "No prompt provided"}

    try:
        # Load resources
        resources = load_resources()

        # TODO: Process the request
        # result = resources.process(job_input)

        return {
            "status": "success",
            # "output": result,
        }

    except Exception as e:
        import traceback
        return {
            "error": str(e),
            "traceback": traceback.format_exc()
        }


# Optional: Pre-load resources on worker start
print("[Handler] Pre-loading resources...")
try:
    load_resources()
except Exception as e:
    print(f"[Handler] Warning: Could not pre-load: {e}")

# REQUIRED: RunPod serverless entry point
runpod.serverless.start({"handler": handler})

Critical Requirements:

  • import runpod at the top
  • handler(job) function that extracts job.get("input", {})
  • runpod.serverless.start({"handler": handler}) at the end

Step 4: Create Dockerfile

# Base image with CUDA support (adjust version as needed)
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Install Python and system dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    python3-venv \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements first (for Docker layer caching)
COPY requirements.txt* ./
RUN if [ -f requirements.txt ]; then pip3 install --no-cache-dir -r requirements.txt; fi

# Install RunPod SDK
RUN pip3 install --no-cache-dir runpod

# Copy handler
COPY handler.py .

# TODO: Add any additional setup (model downloads, etc.)

# Run the handler
CMD ["python3", "handler.py"]

Step 5: Create .runpod/hub.json

{
  "title": "{Worker Display Name}",
  "description": "{Worker description for RunPod Hub}",
  "type": "serverless",
  "category": "{category}",
  "config": {
    "runsOn": "GPU",
    "containerDiskInGb": 20,
    "presets": [
      {
        "name": "Default",
        "defaults": {}
      }
    ],
    "env": [
      {
        "key": "EXAMPLE_VAR",
        "input": {
          "name": "Example Variable",
          "type": "string",
          "description": "Description of this variable",
          "default": ""
        }
      }
    ]
  }
}

Category options: image, video, audio, text, embedding, other

Step 6: Create .runpod/tests.json

{
  "tests": [
    {
      "name": "basic_test",
      "input": {
        "example_param": "test_value"
      },
      "timeout": 60000
    }
  ],
  "config": {
    "gpuTypeId": "{GPU Type}",
    "gpuCount": 1,
    "env": [],
    "allowedCudaVersions": [
      "12.7",
      "12.6",
      "12.5",
      "12.4",
      "12.3",
      "12.2",
      "12.1"
    ]
  }
}

GPU Type examples: NVIDIA GeForce RTX 4090, NVIDIA L40S, NVIDIA A100 80GB PCIe

Step 7: Create README.md

# {Worker Name}

[![RunPod](https://api.runpod.io/badge/{github-username}/{repo-name})](https://console.runpod.io/hub/{github-username}/{repo-name})

{Worker description}

## Features

- Feature 1
- Feature 2

## API Usage

### Basic Request

```json
{
  "input": {
    "param1": "value1"
  }
}
```

### Response

```json
{
  "status": "success",
  "output": "..."
}
```

## Requirements

- GPU: {GPU requirement}
- VRAM: {VRAM requirement}

## Local Development

```bash
docker build -t {worker-name} .
docker run --gpus all {worker-name}
```

## License

MIT


Step 8: Initialize Git Repository

cd infrastructure/runpod/{worker-name}

# Initialize git
git init

# Configure git user (if needed)
git config user.email "{user}@users.noreply.github.com"
git config user.name "{username}"

# Add all files
git add .

# Initial commit
git commit -m "Initial release of {worker-name} for RunPod Serverless"

# Rename branch to main
git branch -M main

Step 9: Create GitHub Repository

The user should create a new repository on GitHub:

  1. Go to https://github.com/new
  2. Repository name: {worker-name}
  3. Keep it public (required for RunPod Hub)
  4. Do NOT initialize with README (we already have one)

Step 10: Push to GitHub

git remote add origin https://github.com/{username}/{worker-name}.git
git push -u origin main

Step 11: Create GitHub Release

Using the GitHub API (requires token):

curl -X POST "https://api.github.com/repos/{username}/{worker-name}/releases" \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer {GITHUB_TOKEN}" \
  -d '{
    "tag_name": "v1.0.0",
    "name": "v1.0.0 - Initial Release",
    "body": "Initial release of {worker-name} for RunPod Serverless.",
    "draft": false,
    "prerelease": false
  }'

Or manually:

  1. Go to https://github.com/{username}/{worker-name}/releases/new
  2. Tag: v1.0.0
  3. Title: v1.0.0 - Initial Release
  4. Publish release

Step 12: Deploy on RunPod

After the release is created:

  1. Go to https://console.runpod.io/hub/{username}/{worker-name}
  2. Click "Deploy"
  3. Configure:
    • GPU Type: Select appropriate GPU
    • Max Workers: 1-3
    • Idle Timeout: 5 seconds (for cost savings)
  4. Add Network Volume if needed for model storage
  5. Deploy endpoint

Step 13: Get Endpoint ID

After deployment:

  1. Go to RunPod Serverless dashboard
  2. Find your endpoint
  3. Copy the Endpoint ID (e.g., abc123xyz)

This ID is used to call the endpoint:

https://api.runpod.ai/v2/{endpoint-id}/runsync
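
As a reference, a minimal synchronous call from Python might look like the sketch below. The endpoint ID, the RUNPOD_API_KEY environment variable, and the input payload are placeholders; the request body must match whatever your handler expects.

import os
import requests

ENDPOINT_ID = "abc123xyz"  # replace with your endpoint ID
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

# Authenticate with a RunPod API key (assumed to be in the RUNPOD_API_KEY env var)
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"prompt": "hello world"}},  # example payload; adjust to your handler
    timeout=120,
)
response.raise_for_status()
print(response.json())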

Common Handler Patterns

Image Generation (returns base64)

import base64

def handler(job):
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")

    # Generate image
    image = generate_image(prompt)

    # Convert to base64
    img_bytes = image_to_bytes(image)
    img_b64 = base64.b64encode(img_bytes).decode("utf-8")

    return {
        "image_base64": img_b64,
        "width": image.width,
        "height": image.height
    }
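
On the client side, the base64 payload can be decoded back into an image file. The sketch below is not part of the worker itself and assumes the field name ("image_base64") used in the example above.

import base64

def save_image(result: dict, path: str = "output.png") -> str:
    """Decode the worker's base64 image payload and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(result["image_base64"]))
    return path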

Audio Generation (returns base64 WAV)

import base64
import io
import soundfile as sf

def handler(job):
    job_input = job.get("input", {})
    text = job_input.get("text", "")

    # Generate audio
    audio_array, sample_rate = generate_audio(text)

    # Convert to base64 WAV
    buffer = io.BytesIO()
    sf.write(buffer, audio_array, sample_rate, format="WAV")
    buffer.seek(0)
    audio_b64 = base64.b64encode(buffer.read()).decode("utf-8")

    return {
        "audio_base64": audio_b64,
        "sample_rate": sample_rate,
        "duration_seconds": len(audio_array) / sample_rate
    }

Long-Running Jobs (async polling)

For jobs taking more than 30 seconds, clients should use async mode:

# Client calls /run instead of /runsync
POST https://api.runpod.ai/v2/{endpoint-id}/run

# Then polls for status
GET https://api.runpod.ai/v2/{endpoint-id}/status/{job-id}
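
A minimal Python client for this submit-and-poll flow might look like the sketch below. The endpoint ID, API key variable, payload, and polling interval are placeholders, and the terminal status names shown are the commonly returned ones.

import os
import time
import requests

ENDPOINT_ID = "abc123xyz"  # replace with your endpoint ID
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# Submit the job asynchronously
job = requests.post(
    f"{BASE_URL}/run",
    headers=HEADERS,
    json={"input": {"prompt": "hello world"}},  # example payload
    timeout=30,
).json()
job_id = job["id"]

# Poll until the job reaches a terminal state
while True:
    status = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS, timeout=30).json()
    if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(2)

print(status.get("output", status))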

Troubleshooting

Handler not found

  • Ensure runpod.serverless.start({"handler": handler}) is at the end of handler.py

GPU not available

  • Check that Dockerfile uses CUDA base image
  • Verify GPU type is available in your region

Model download fails

  • Use a Network Volume for large models (see the sketch below)
  • Set HF_TOKEN for gated models
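
One common pattern, sketched below, is to point the Hugging Face cache at the attached Network Volume so large models persist between workers. The /runpod-volume mount path, the model ID, and the transformers dependency are assumptions; adjust them to your endpoint configuration.

import os

# Cache models on the Network Volume (path is an assumption; verify the mount
# point in your endpoint settings). Set this before Hugging Face libraries load.
os.environ.setdefault("HF_HOME", "/runpod-volume/huggingface")

def load_model():
    from transformers import AutoModel  # assumed dependency for illustration

    return AutoModel.from_pretrained(
        "org/model-name",                   # hypothetical model ID
        token=os.environ.get("HF_TOKEN"),   # required for gated models
    )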

Timeout errors

  • Increase timeout in tests.json
  • Consider async mode for long jobs