---
name: runpod-serverless
description: Create and deploy a RunPod Serverless GPU worker. Use when setting up a new RunPod worker, creating a serverless endpoint, or deploying ML models to RunPod. Guides through handler.py, Dockerfile, hub.json, tests.json, GitHub repo setup, and release creation.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep
---
# RunPod Serverless Worker Setup
Create and deploy a RunPod Serverless worker from a GitHub repository.
## Overview
This skill guides you through creating a complete RunPod Serverless worker that can be deployed via the RunPod Hub. It includes all required files, proper handler format, and automated deployment via GitHub releases.
## Prerequisites
- GitHub account
- RunPod account
- GitHub Personal Access Token (for automated releases)
## Steps
### Step 1: Gather Information
Ask the user for:

- **Worker name** - e.g., `my-awesome-worker`
- **GitHub username** - e.g., `profzeller`
- **Worker description** - What does this worker do?
- **Category** - One of: `image`, `video`, `audio`, `text`, `embedding`, `other`
- **Base image** - e.g., `nvidia/cuda:12.1.0-runtime-ubuntu22.04`, `python:3.11-slim`
- **GPU requirements** - e.g., `RTX 4090`, `L40S`, `A100`
- **VRAM requirements** - e.g., `8GB`, `24GB`, `48GB`
- **Python dependencies** - List of pip packages needed
### Step 2: Create Directory Structure
```
infrastructure/runpod/{worker-name}/
├── .runpod/
│   ├── hub.json          # RunPod Hub configuration
│   └── tests.json        # Test definitions
├── Dockerfile            # Container definition
├── handler.py            # RunPod serverless handler
├── requirements.txt      # Python dependencies (required)
└── README.md             # Documentation with badge
```
Create the directory:

```bash
mkdir -p infrastructure/runpod/{worker-name}/.runpod
```
### Step 3: Create handler.py
The handler MUST follow this exact format:
"""
RunPod Serverless Handler for {Worker Name}
{Description}
"""
import runpod
# Global resources (loaded once per worker)
model = None
def load_resources():
"""Load models/resources on worker startup."""
global model
if model is not None:
return model
print("[Handler] Loading resources...")
# TODO: Load your model/resources here
# model = YourModel.load()
print("[Handler] Resources loaded")
return model
def handler(job: dict) -> dict:
"""
Main RunPod handler function.
Args:
job: Dictionary containing 'input' key with request data
Returns:
Dictionary with results or error information
"""
job_input = job.get("input", {})
# Validate required inputs
# Example:
# if not job_input.get("prompt"):
# return {"error": "No prompt provided"}
try:
# Load resources
resources = load_resources()
# TODO: Process the request
# result = resources.process(job_input)
return {
"status": "success",
# "output": result,
}
except Exception as e:
import traceback
return {
"error": str(e),
"traceback": traceback.format_exc()
}
# Optional: Pre-load resources on worker start
print("[Handler] Pre-loading resources...")
try:
load_resources()
except Exception as e:
print(f"[Handler] Warning: Could not pre-load: {e}")
# REQUIRED: RunPod serverless entry point
runpod.serverless.start({"handler": handler})
**Critical Requirements:**

- `import runpod` at the top
- A `handler(job)` function that extracts `job.get("input", {})`
- `runpod.serverless.start({"handler": handler})` at the end
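Before building the image, you can smoke-test the handler locally: when the RunPod SDK detects no serverless environment, `runpod.serverless.start()` runs the handler once against a `--test_input` argument and exits. A minimal sketch of driving that mode; the `prompt` key is hypothetical and should match whatever inputs your handler validates:

```python
# Run handler.py once in the RunPod SDK's local test mode (sketch).
# The {"prompt": ...} input is hypothetical - use your handler's real schema.
import json
import subprocess

test_input = json.dumps({"input": {"prompt": "hello world"}})
subprocess.run(["python3", "handler.py", "--test_input", test_input], check=True)
```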
### Step 4: Create Dockerfile
```dockerfile
# Base image with CUDA support (adjust version as needed)
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Install Python and system dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    python3-venv \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements first (for Docker layer caching)
COPY requirements.txt* ./
RUN if [ -f requirements.txt ]; then pip3 install --no-cache-dir -r requirements.txt; fi

# Install RunPod SDK
RUN pip3 install --no-cache-dir runpod

# Copy handler
COPY handler.py .

# TODO: Add any additional setup (model downloads, etc.)

# Run the handler
CMD ["python3", "handler.py"]
```
### Step 5: Create .runpod/hub.json
```json
{
  "title": "{Worker Display Name}",
  "description": "{Worker description for RunPod Hub}",
  "type": "serverless",
  "category": "{category}",
  "config": {
    "runsOn": "GPU",
    "containerDiskInGb": 20,
    "presets": [
      {
        "name": "Default",
        "defaults": {}
      }
    ],
    "env": [
      {
        "key": "EXAMPLE_VAR",
        "input": {
          "name": "Example Variable",
          "type": "string",
          "description": "Description of this variable",
          "default": ""
        }
      }
    ]
  }
}
```
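Variables declared under `env` are exposed to the container as ordinary process environment variables. A minimal sketch of reading the example variable above from inside handler.py:

```python
# Read the hub.json-declared variable inside the worker (sketch).
import os

example_var = os.environ.get("EXAMPLE_VAR", "")  # matches the "key" field above
```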
Category options: `image`, `video`, `audio`, `text`, `embedding`, `other`
### Step 6: Create .runpod/tests.json
```json
{
  "tests": [
    {
      "name": "basic_test",
      "input": {
        "example_param": "test_value"
      },
      "timeout": 60000
    }
  ],
  "config": {
    "gpuTypeId": "{GPU Type}",
    "gpuCount": 1,
    "env": [],
    "allowedCudaVersions": [
      "12.7",
      "12.6",
      "12.5",
      "12.4",
      "12.3",
      "12.2",
      "12.1"
    ]
  }
}
```
GPU Type examples: `NVIDIA GeForce RTX 4090`, `NVIDIA L40S`, `NVIDIA A100 80GB PCIe`
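Note that `timeout` is in milliseconds (60000 = 60 seconds). Each test's `input` object is exactly what the handler receives under `job["input"]`, so a test case can be replicated locally; a sketch, assuming `handler` is in scope (e.g., pasted into a scratch copy of handler.py above the `runpod.serverless.start()` line):

```python
# Replicate the "basic_test" case directly against the handler function (sketch).
job = {"id": "basic_test", "input": {"example_param": "test_value"}}
result = handler(job)
assert "error" not in result, result
```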
### Step 7: Create README.md
````markdown
# {Worker Name}

[![RunPod](https://api.runpod.io/badge/{github-username}/{repo-name})](https://console.runpod.io/hub/{github-username}/{repo-name})

{Worker description}

## Features

- Feature 1
- Feature 2

## API Usage

### Basic Request

```json
{
  "input": {
    "param1": "value1"
  }
}
```

### Response

```json
{
  "status": "success",
  "output": "..."
}
```

## Requirements

- GPU: {GPU requirement}
- VRAM: {VRAM requirement}

## Local Development

```bash
docker build -t {worker-name} .
docker run --gpus all {worker-name}
```

## License

MIT
````
### Step 8: Initialize Git Repository
```bash
cd infrastructure/runpod/{worker-name}

# Initialize git
git init

# Configure git user (if needed)
git config user.email "{user}@users.noreply.github.com"
git config user.name "{username}"

# Add all files
git add .

# Initial commit
git commit -m "Initial release of {worker-name} for RunPod Serverless"

# Rename branch to main
git branch -M main
```
### Step 9: Create GitHub Repository
The user should create a new repository on GitHub:

- Go to https://github.com/new
- Repository name: `{worker-name}`
- Keep it public (required for RunPod Hub)
- Do NOT initialize with a README (we already have one)
### Step 10: Push to GitHub
```bash
git remote add origin https://github.com/{username}/{worker-name}.git
git push -u origin main
```
### Step 11: Create GitHub Release
Using the GitHub API (requires token):
```bash
curl -X POST "https://api.github.com/repos/{username}/{worker-name}/releases" \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer {GITHUB_TOKEN}" \
  -d '{
    "tag_name": "v1.0.0",
    "name": "v1.0.0 - Initial Release",
    "body": "Initial release of {worker-name} for RunPod Serverless.",
    "draft": false,
    "prerelease": false
  }'
```
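The same call from Python, if you prefer to avoid curl (a sketch mirroring the request above; the token needs permission to create releases, e.g. `repo` scope on a classic token):

```python
# Create the GitHub release via the REST API (sketch; mirrors the curl call).
import os
import requests

resp = requests.post(
    "https://api.github.com/repos/{username}/{worker-name}/releases",  # fill in placeholders
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    json={
        "tag_name": "v1.0.0",
        "name": "v1.0.0 - Initial Release",
        "body": "Initial release of {worker-name} for RunPod Serverless.",
        "draft": False,
        "prerelease": False,
    },
)
resp.raise_for_status()
print(resp.json()["html_url"])
```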
Or manually:

- Go to `https://github.com/{username}/{worker-name}/releases/new`
- Tag: `v1.0.0`
- Title: `v1.0.0 - Initial Release`
- Publish release
### Step 12: Deploy on RunPod
After the release is created:

- Go to https://console.runpod.io/hub/{username}/{worker-name}
- Click "Deploy"
- Configure:
  - GPU Type: Select an appropriate GPU
  - Max Workers: 1-3
  - Idle Timeout: 5 seconds (for cost savings)
- Add a Network Volume if needed for model storage
- Deploy the endpoint
### Step 13: Get Endpoint ID
After deployment:

- Go to the RunPod Serverless dashboard
- Find your endpoint
- Copy the Endpoint ID (e.g., `abc123xyz`)

This ID is used to call the endpoint:

```
https://api.runpod.ai/v2/{endpoint-id}/runsync
```
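A minimal sketch of a synchronous call from Python, assuming `requests` is installed, a `RUNPOD_API_KEY` environment variable, and the hypothetical `param1` input from the README template:

```python
# Call the deployed endpoint synchronously (sketch).
import os
import requests

ENDPOINT_ID = "abc123xyz"  # replace with your endpoint ID

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"param1": "value1"}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": ..., "status": "COMPLETED", "output": {...}}
```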
## Common Handler Patterns
### Image Generation (returns base64)
```python
import base64

def handler(job):
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")

    # Generate image
    image = generate_image(prompt)

    # Convert to base64
    img_bytes = image_to_bytes(image)
    img_b64 = base64.b64encode(img_bytes).decode("utf-8")

    return {
        "image_base64": img_b64,
        "width": image.width,
        "height": image.height
    }
```
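On the client side, the payload decodes back to an image with standard tooling; a sketch assuming Pillow is installed and the response shape above:

```python
# Decode a base64 image payload from the handler response (sketch).
import base64
import io

from PIL import Image

def save_image(payload: dict, path: str = "output.png") -> None:
    """payload is the handler's return dict, e.g. {"image_base64": ...}."""
    image = Image.open(io.BytesIO(base64.b64decode(payload["image_base64"])))
    image.save(path)
```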
### Audio Generation (returns base64 WAV)
```python
import base64
import io

import soundfile as sf

def handler(job):
    job_input = job.get("input", {})
    text = job_input.get("text", "")

    # Generate audio
    audio_array, sample_rate = generate_audio(text)

    # Convert to base64 WAV
    buffer = io.BytesIO()
    sf.write(buffer, audio_array, sample_rate, format="WAV")
    buffer.seek(0)
    audio_b64 = base64.b64encode(buffer.read()).decode("utf-8")

    return {
        "audio_base64": audio_b64,
        "sample_rate": sample_rate,
        "duration_seconds": len(audio_array) / sample_rate
    }
```
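The client can write the payload straight back to a playable file; a sketch assuming the response shape above:

```python
# Write a base64 WAV payload from the handler response to disk (sketch).
import base64

def save_audio(payload: dict, path: str = "output.wav") -> None:
    """payload is the handler's return dict, e.g. {"audio_base64": ...}."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(payload["audio_base64"]))
```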
### Long-Running Jobs (async polling)

For jobs taking more than 30 seconds, clients should use async mode:

```
# Client calls /run instead of /runsync
POST https://api.runpod.ai/v2/{endpoint-id}/run

# Then polls for status
GET https://api.runpod.ai/v2/{endpoint-id}/status/{job-id}
```
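A minimal polling loop in Python, under the same assumptions as the `/runsync` example in Step 13 (the 2-second interval is an arbitrary choice):

```python
# Submit an async job and poll until it reaches a terminal state (sketch).
import os
import time

import requests

BASE = "https://api.runpod.ai/v2/abc123xyz"  # replace with your endpoint ID
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

job = requests.post(f"{BASE}/run", headers=HEADERS,
                    json={"input": {"param1": "value1"}}).json()

status = {}
while status.get("status") not in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
    time.sleep(2)
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()

print(status)
```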
## Troubleshooting

### Handler not found

- Ensure `runpod.serverless.start({"handler": handler})` is at the end of handler.py

### GPU not available

- Check that the Dockerfile uses a CUDA base image
- Verify the GPU type is available in your region

### Model download fails

- Use a Network Volume for large models
- Set `HF_TOKEN` for gated models

### Timeout errors

- Increase the timeout in tests.json
- Consider async mode for long jobs