Claude Code Plugins

Community-maintained marketplace

runpod-serverless

@profzeller/claude-skills

Create and deploy a RunPod Serverless GPU worker. Use when setting up a new RunPod worker, creating a serverless endpoint, or deploying ML models to RunPod. Guides through handler.py, Dockerfile, hub.json, tests.json, GitHub repo setup, and release creation.

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: runpod-serverless
description: Create and deploy a RunPod Serverless GPU worker. Use when setting up a new RunPod worker, creating a serverless endpoint, or deploying ML models to RunPod. Guides through handler.py, Dockerfile, hub.json, tests.json, GitHub repo setup, and release creation.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep

RunPod Serverless Worker Setup

Create and deploy a RunPod Serverless worker from a GitHub repository.

Overview

This skill guides you through creating a complete RunPod Serverless worker that can be deployed via the RunPod Hub. It covers all required files, the correct handler format, and automated deployment via GitHub releases.

Prerequisites

  • GitHub account
  • RunPod account
  • GitHub Personal Access Token (for automated releases)

Steps

Step 1: Gather Information

Ask the user for:

  1. Worker name - e.g., my-awesome-worker
  2. GitHub username - e.g., profzeller
  3. Worker description - What does this worker do?
  4. Category - One of: image, video, audio, text, embedding, other
  5. Base image - e.g., nvidia/cuda:12.1.0-runtime-ubuntu22.04, python:3.11-slim
  6. GPU requirements - e.g., RTX 4090, L40S, A100
  7. VRAM requirements - e.g., 8GB, 24GB, 48GB
  8. Python dependencies - List of pip packages needed

Step 2: Create Directory Structure

infrastructure/runpod/{worker-name}/
├── .runpod/
│   ├── hub.json          # RunPod Hub configuration
│   └── tests.json        # Test definitions
├── Dockerfile            # Container definition
├── handler.py            # RunPod serverless handler
├── requirements.txt      # Python dependencies (required)
└── README.md             # Documentation with badge

Create the directory:

mkdir -p infrastructure/runpod/{worker-name}/.runpod
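
A minimal requirements.txt might look like the following. The packages listed here are placeholders; include whatever your handler actually imports (the runpod SDK is installed separately in the Dockerfile below, so listing it here is optional but harmless).

runpod
torch
transformers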

Step 3: Create handler.py

The handler MUST follow this exact format:

"""
RunPod Serverless Handler for {Worker Name}
{Description}
"""

import runpod

# Global resources (loaded once per worker)
model = None


def load_resources():
    """Load models/resources on worker startup."""
    global model
    if model is not None:
        return model

    print("[Handler] Loading resources...")
    # TODO: Load your model/resources here
    # model = YourModel.load()
    print("[Handler] Resources loaded")
    return model


def handler(job: dict) -> dict:
    """
    Main RunPod handler function.

    Args:
        job: Dictionary containing 'input' key with request data

    Returns:
        Dictionary with results or error information
    """
    job_input = job.get("input", {})

    # Validate required inputs
    # Example:
    # if not job_input.get("prompt"):
    #     return {"error": "No prompt provided"}

    try:
        # Load resources
        resources = load_resources()

        # TODO: Process the request
        # result = resources.process(job_input)

        return {
            "status": "success",
            # "output": result,
        }

    except Exception as e:
        import traceback
        return {
            "error": str(e),
            "traceback": traceback.format_exc()
        }


# Optional: Pre-load resources on worker start
print("[Handler] Pre-loading resources...")
try:
    load_resources()
except Exception as e:
    print(f"[Handler] Warning: Could not pre-load: {e}")

# REQUIRED: RunPod serverless entry point
runpod.serverless.start({"handler": handler})

Critical Requirements:

  • import runpod at the top
  • handler(job) function that extracts job.get("input", {})
  • runpod.serverless.start({"handler": handler}) at the end

Step 4: Create Dockerfile

# Base image with CUDA support (adjust version as needed)
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Install Python and system dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    python3-venv \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements first (for Docker layer caching)
COPY requirements.txt* ./
RUN if [ -f requirements.txt ]; then pip3 install --no-cache-dir -r requirements.txt; fi

# Install RunPod SDK
RUN pip3 install --no-cache-dir runpod

# Copy handler
COPY handler.py .

# TODO: Add any additional setup (model downloads, etc.)

# Run the handler
CMD ["python3", "handler.py"]

Step 5: Create .runpod/hub.json

{
  "title": "{Worker Display Name}",
  "description": "{Worker description for RunPod Hub}",
  "type": "serverless",
  "category": "{category}",
  "config": {
    "runsOn": "GPU",
    "containerDiskInGb": 20,
    "presets": [
      {
        "name": "Default",
        "defaults": {}
      }
    ],
    "env": [
      {
        "key": "EXAMPLE_VAR",
        "input": {
          "name": "Example Variable",
          "type": "string",
          "description": "Description of this variable",
          "default": ""
        }
      }
    ]
  }
}

Category options: image, video, audio, text, embedding, other

Step 6: Create .runpod/tests.json

{
  "tests": [
    {
      "name": "basic_test",
      "input": {
        "example_param": "test_value"
      },
      "timeout": 60000
    }
  ],
  "config": {
    "gpuTypeId": "{GPU Type}",
    "gpuCount": 1,
    "env": [],
    "allowedCudaVersions": [
      "12.7",
      "12.6",
      "12.5",
      "12.4",
      "12.3",
      "12.2",
      "12.1"
    ]
  }
}

GPU Type examples: NVIDIA GeForce RTX 4090, NVIDIA L40S, NVIDIA A100 80GB PCIe

Step 7: Create README.md

# {Worker Name}

[![RunPod](https://api.runpod.io/badge/{github-username}/{repo-name})](https://console.runpod.io/hub/{github-username}/{repo-name})

{Worker description}

## Features

- Feature 1
- Feature 2

## API Usage

### Basic Request

```json
{
  "input": {
    "param1": "value1"
  }
}
```

### Response

```json
{
  "status": "success",
  "output": "..."
}
```

## Requirements

- GPU: {GPU requirement}
- VRAM: {VRAM requirement}

## Local Development

```bash
docker build -t {worker-name} .
docker run --gpus all {worker-name}
```

## License

MIT


Step 8: Initialize Git Repository

cd infrastructure/runpod/{worker-name}

# Initialize git
git init

# Configure git user (if needed)
git config user.email "{user}@users.noreply.github.com"
git config user.name "{username}"

# Add all files
git add .

# Initial commit
git commit -m "Initial release of {worker-name} for RunPod Serverless"

# Rename branch to main
git branch -M main

Step 9: Create GitHub Repository

The user should create a new repository on GitHub:

  1. Go to https://github.com/new
  2. Repository name: {worker-name}
  3. Keep it public (required for RunPod Hub)
  4. Do NOT initialize with README (we already have one)

Step 10: Push to GitHub

git remote add origin https://github.com/{username}/{worker-name}.git
git push -u origin main

Step 11: Create GitHub Release

Using the GitHub API (requires token):

curl -X POST "https://api.github.com/repos/{username}/{worker-name}/releases" \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer {GITHUB_TOKEN}" \
  -d '{
    "tag_name": "v1.0.0",
    "name": "v1.0.0 - Initial Release",
    "body": "Initial release of {worker-name} for RunPod Serverless.",
    "draft": false,
    "prerelease": false
  }'

Or manually:

  1. Go to https://github.com/{username}/{worker-name}/releases/new
  2. Tag: v1.0.0
  3. Title: v1.0.0 - Initial Release
  4. Publish release

Step 12: Deploy on RunPod

After the release is created:

  1. Go to https://console.runpod.io/hub/{username}/{worker-name}
  2. Click "Deploy"
  3. Configure:
    • GPU Type: Select appropriate GPU
    • Max Workers: 1-3
    • Idle Timeout: 5 seconds (for cost savings)
  4. Add Network Volume if needed for model storage
  5. Deploy endpoint

Step 13: Get Endpoint ID

After deployment:

  1. Go to RunPod Serverless dashboard
  2. Find your endpoint
  3. Copy the Endpoint ID (e.g., abc123xyz)

This ID is used to call the endpoint:

https://api.runpod.ai/v2/{endpoint-id}/runsync
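
As a reference, a minimal synchronous call from Python might look like the sketch below. The endpoint ID, the RUNPOD_API_KEY environment variable, and the input payload are placeholders; the request body must match whatever your handler expects.

import os
import requests

ENDPOINT_ID = "abc123xyz"  # replace with your endpoint ID
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

# Authenticate with a RunPod API key (assumed to be in the RUNPOD_API_KEY env var)
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"prompt": "hello world"}},  # example payload; adjust to your handler
    timeout=120,
)
response.raise_for_status()
print(response.json())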

Common Handler Patterns

Image Generation (returns base64)

import base64

def handler(job):
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")

    # Generate image
    image = generate_image(prompt)

    # Convert to base64
    img_bytes = image_to_bytes(image)
    img_b64 = base64.b64encode(img_bytes).decode("utf-8")

    return {
        "image_base64": img_b64,
        "width": image.width,
        "height": image.height
    }
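
On the client side, the base64 payload can be decoded back into an image file. The sketch below is not part of the worker itself and assumes the field name ("image_base64") used in the example above.

import base64

def save_image(result: dict, path: str = "output.png") -> str:
    """Decode the worker's base64 image payload and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(result["image_base64"]))
    return path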

Audio Generation (returns base64 WAV)

import base64
import io
import soundfile as sf

def handler(job):
    job_input = job.get("input", {})
    text = job_input.get("text", "")

    # Generate audio
    audio_array, sample_rate = generate_audio(text)

    # Convert to base64 WAV
    buffer = io.BytesIO()
    sf.write(buffer, audio_array, sample_rate, format="WAV")
    buffer.seek(0)
    audio_b64 = base64.b64encode(buffer.read()).decode("utf-8")

    return {
        "audio_base64": audio_b64,
        "sample_rate": sample_rate,
        "duration_seconds": len(audio_array) / sample_rate
    }

Long-Running Jobs (async polling)

For jobs taking more than 30 seconds, clients should use async mode:

# Client calls /run instead of /runsync
POST https://api.runpod.ai/v2/{endpoint-id}/run

# Then polls for status
GET https://api.runpod.ai/v2/{endpoint-id}/status/{job-id}
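
A minimal Python client for this submit-and-poll flow might look like the sketch below. The endpoint ID, API key variable, payload, and polling interval are placeholders, and the terminal status names shown are the commonly returned ones.

import os
import time
import requests

ENDPOINT_ID = "abc123xyz"  # replace with your endpoint ID
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# Submit the job asynchronously
job = requests.post(
    f"{BASE_URL}/run",
    headers=HEADERS,
    json={"input": {"prompt": "hello world"}},  # example payload
    timeout=30,
).json()
job_id = job["id"]

# Poll until the job reaches a terminal state
while True:
    status = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS, timeout=30).json()
    if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(2)

print(status.get("output", status))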

Troubleshooting

Handler not found

  • Ensure runpod.serverless.start({"handler": handler}) is at the end of handler.py

GPU not available

  • Check that Dockerfile uses CUDA base image
  • Verify GPU type is available in your region

Model download fails

  • Use a Network Volume for large models (see the sketch below)
  • Set HF_TOKEN for gated models
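
One common pattern, sketched below, is to point the Hugging Face cache at the attached Network Volume so large models persist between workers. The /runpod-volume mount path, the model ID, and the transformers dependency are assumptions; adjust them to your endpoint configuration.

import os

# Cache models on the Network Volume (path is an assumption; verify the mount
# point in your endpoint settings). Set this before Hugging Face libraries load.
os.environ.setdefault("HF_HOME", "/runpod-volume/huggingface")

def load_model():
    from transformers import AutoModel  # assumed dependency for illustration

    return AutoModel.from_pretrained(
        "org/model-name",                   # hypothetical model ID
        token=os.environ.get("HF_TOKEN"),   # required for gated models
    )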

Timeout errors

  • Increase timeout in tests.json
  • Consider async mode for long jobs