
Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices

Install Skill

1. Download skill

2. Enable skills in Claude

   Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

   Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: modal-knowledge
description: Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices

Modal Knowledge Skill

Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices. Activate this skill when users need detailed information about Modal's serverless cloud platform.

Activation Triggers

Activate this skill when users ask about:

  • Modal.com platform features and capabilities
  • GPU-accelerated Python functions
  • Serverless container configuration
  • Modal pricing and billing
  • Modal CLI commands
  • Web endpoints and APIs on Modal
  • Scheduled/cron jobs on Modal
  • Modal volumes, secrets, and storage
  • Parallel processing with Modal
  • Modal deployment and CI/CD

Platform Overview

Modal is a serverless cloud platform for running Python code, optimized for AI/ML workloads with:

  • Zero Configuration: Everything defined in Python code
  • Fast GPU Startup: ~1 second container spin-up
  • Automatic Scaling: Scale to zero, scale to thousands
  • Per-Second Billing: Only pay for active compute
  • Multi-Cloud: AWS, GCP, Oracle Cloud Infrastructure

Core Components Reference

Apps and Functions

import modal

app = modal.App("app-name")

@app.function()
def basic_function(arg: str) -> str:
    return f"Result: {arg}"

@app.local_entrypoint()
def main():
    result = basic_function.remote("test")
    print(result)

Function Decorator Parameters

| Parameter | Type | Description |
|---|---|---|
| image | Image | Container image configuration |
| gpu | str/list | GPU type(s): "T4", "A100", ["H100", "A100"] |
| cpu | float | CPU cores (0.125 to 64) |
| memory | int | Memory in MB (128 to 262144) |
| timeout | int | Max execution time in seconds |
| retries | int | Retry attempts on failure |
| secrets | list | Secrets to inject |
| volumes | dict | Volume mount points |
| schedule | Cron/Period | Scheduled execution |
| concurrency_limit | int | Max concurrent executions |
| container_idle_timeout | int | Seconds to keep a container warm |
| include_source | bool | Auto-sync source code |
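Most of these parameters compose on a single decorator. A hedged sketch combining several of them (the secret name and resource values are illustrative):

import modal

app = modal.App("configured-app")
image = modal.Image.debian_slim(python_version="3.11").pip_install("requests")

@app.function(
    image=image,
    gpu="T4",
    cpu=2.0,
    memory=4096,    # MB
    timeout=600,    # seconds
    retries=2,
    secrets=[modal.Secret.from_name("api-keys")],  # assumes this secret exists in your workspace
)
def configured(arg: str) -> str:
    return f"Processed: {arg}"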

GPU Reference

Available GPUs

| GPU | Memory | Use Case | ~Cost/hr |
|---|---|---|---|
| T4 | 16 GB | Small inference | $0.59 |
| L4 | 24 GB | Medium inference | $0.80 |
| A10G | 24 GB | Inference/fine-tuning | $1.10 |
| L40S | 48 GB | Heavy inference | $1.50 |
| A100-40GB | 40 GB | Training | $2.00 |
| A100-80GB | 80 GB | Large models | $3.00 |
| H100 | 80 GB | Cutting-edge | $5.00 |
| H200 | 141 GB | Largest models | $5.00 |
| B200 | 180+ GB | Latest gen | $6.25 |

GPU Configuration

# Single GPU
@app.function(gpu="A100")

# Specific memory variant
@app.function(gpu="A100-80GB")

# Multi-GPU
@app.function(gpu="H100:4")

# Fallbacks (tries in order)
@app.function(gpu=["H100", "A100", "any"])

# "any" = L4, A10G, or T4
@app.function(gpu="any")

Image Building

Base Images

# Debian slim (recommended)
modal.Image.debian_slim(python_version="3.11")

# From Dockerfile
modal.Image.from_dockerfile("./Dockerfile")

# From Docker registry
modal.Image.from_registry("nvidia/cuda:12.1.0-base-ubuntu22.04")

Package Installation

# pip (standard)
image.pip_install("torch", "transformers")

# uv (typically 10-100x faster)
image.uv_pip_install("torch", "transformers")

# System packages
image.apt_install("ffmpeg", "libsm6")

# Shell commands
image.run_commands("apt-get update", "make install")

Adding Files

# Single file
image.add_local_file("./config.json", "/app/config.json")

# Directory
image.add_local_dir("./models", "/app/models")

# Python source
image.add_local_python_source("my_module")

# Environment variables
image.env({"VAR": "value"})
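Each Image method returns a new Image, so builds are usually written as a single chain assigned to a variable and passed to the function decorator. A minimal sketch (package names illustrative):

import modal

app = modal.App("image-demo")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ffmpeg")
    .uv_pip_install("torch", "transformers")
    .env({"HF_HOME": "/cache"})
)

@app.function(image=image)
def uses_image():
    import torch  # installed in the image, not necessarily locally
    return torch.__version__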

Build-Time Function

def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("model-name")

image.run_function(download_model, secrets=[...])

Storage

Volumes

# Create/reference volume
vol = modal.Volume.from_name("my-vol", create_if_missing=True)

# Mount in function
@app.function(volumes={"/data": vol})
def func():
    # Read/write to /data
    vol.commit()  # Persist changes

Secrets

# From dashboard (recommended)
modal.Secret.from_name("secret-name")

# From dictionary
modal.Secret.from_dict({"KEY": "value"})

# From local env
modal.Secret.from_local_environ(["KEY1", "KEY2"])

# From .env file
modal.Secret.from_dotenv()

# Usage
@app.function(secrets=[modal.Secret.from_name("api-keys")])
def func():
    import os
    key = os.environ["API_KEY"]

Dict and Queue

# Distributed dict
d = modal.Dict.from_name("cache", create_if_missing=True)
d["key"] = "value"
d.put("key", "value", ttl=3600)

# Distributed queue
q = modal.Queue.from_name("jobs", create_if_missing=True)
q.put("task")
item = q.get()
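Both objects are shared across containers, so one function can enqueue work that another consumes. A hedged sketch (function names are illustrative; q.get() blocks until an item is available):

import modal

app = modal.App("queue-demo")

@app.function()
def producer():
    q = modal.Queue.from_name("jobs", create_if_missing=True)
    for i in range(10):
        q.put(i)

@app.function()
def consumer():
    q = modal.Queue.from_name("jobs", create_if_missing=True)
    for _ in range(10):
        item = q.get()  # blocks until an item arrives
        print("processing", item)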

Web Endpoints

FastAPI Endpoint (Simple)

@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}

ASGI App (Full FastAPI)

from fastapi import FastAPI
web_app = FastAPI()

@web_app.post("/predict")
def predict(text: str):
    return {"result": process(text)}

@app.function()
@modal.asgi_app()
def fastapi_app():
    return web_app

WSGI App (Flask)

from flask import Flask
flask_app = Flask(__name__)

@app.function()
@modal.wsgi_app()
def flask_endpoint():
    return flask_app

Custom Web Server

import subprocess

@app.function()
@modal.web_server(port=8000)
def custom_server():
    # Start the server in the background; the function returns while
    # Modal proxies incoming traffic to the port
    subprocess.Popen(["python", "-m", "http.server", "8000"])

Custom Domains

@modal.asgi_app(custom_domains=["api.example.com"])

Scheduling

Cron

# Daily at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * *"))

# With timezone
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))

Period

@app.function(schedule=modal.Period(hours=5))
@app.function(schedule=modal.Period(days=1))

Note: Scheduled functions only run with modal deploy, not modal run.


Parallel Processing

Map

# Parallel execution (up to 1000 concurrent)
results = list(func.map(items))

# Unordered (faster)
results = list(func.map(items, order_outputs=False))

Starmap

# Spread args
pairs = [(1, 2), (3, 4)]
results = list(add.starmap(pairs))

Spawn

# Async job (returns immediately)
call = func.spawn(data)
result = call.get()  # Get result later

# Spawn many
calls = [func.spawn(item) for item in items]
results = [call.get() for call in calls]
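Spawned calls can also be retrieved later by ID, which suits job-queue workflows where the submitter and the consumer of results are different processes; a hedged sketch:

call = func.spawn(data)
call_id = call.object_id  # persist this identifier somewhere

# Later, possibly from a different process:
call = modal.FunctionCall.from_id(call_id)
result = call.get(timeout=60)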

Container Lifecycle (Classes)

@app.cls(gpu="A100", container_idle_timeout=300)
class Server:

    @modal.enter()
    def load(self):
        self.model = load_model()

    @modal.method()
    def predict(self, text):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        del self.model
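Instantiating the class gives a handle whose methods are invoked like functions; a minimal sketch:

@app.local_entrypoint()
def main():
    server = Server()
    print(server.predict.remote("some input"))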

Concurrency

# Applied at the class level, alongside @app.cls
@app.cls()
@modal.concurrent(max_inputs=100, target_inputs=80)
class Batched:
    @modal.method()
    def handle(self, item):
        ...

CLI Commands

Development

modal run app.py              # Run function
modal serve app.py            # Hot-reload dev server
modal shell app.py            # Interactive shell
modal shell app.py --gpu A100 # Shell with GPU

Deployment

modal deploy app.py           # Deploy
modal app list                # List apps
modal app logs app-name       # View logs
modal app stop app-name       # Stop app

Resources

# Volumes
modal volume create name
modal volume list
modal volume put name local remote
modal volume get name remote local

# Secrets
modal secret create name KEY=value
modal secret list

# Environments
modal environment create staging

Pricing (2025)

Plans

| Plan | Price | Containers | GPU Concurrency |
|---|---|---|---|
| Starter | Free ($30 credits) | 100 | 10 |
| Team | $250/month | 1000 | 50 |
| Enterprise | Custom | Unlimited | Custom |

Compute

  • CPU: $0.0000131/core/sec
  • Memory: $0.00000222/GiB/sec
  • GPUs: See GPU table above
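As a worked example at these rates, a function with 2 CPU cores and 4 GiB of memory running for 10 minutes (600 seconds) costs about 2 × $0.0000131 × 600 ≈ $0.0157 for CPU plus 4 × $0.00000222 × 600 ≈ $0.0053 for memory, roughly $0.021 in total.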

Special Programs

  • Startups: Up to $25k credits
  • Researchers: Up to $10k credits

Best Practices

  1. Use @modal.enter() for model loading
  2. Use uv_pip_install for faster builds
  3. Use GPU fallbacks for availability
  4. Set appropriate timeouts and retries
  5. Use environments (dev/staging/prod)
  6. Download models during build, not runtime
  7. Use order_outputs=False when order doesn't matter
  8. Set container_idle_timeout to balance cost/latency
  9. Monitor costs in Modal dashboard
  10. Test with modal run before modal deploy
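Several of these practices combine naturally on a single function; a hedged sketch (names and values illustrative, numbers referring to the list above):

import modal

app = modal.App("tuned-app")
image = modal.Image.debian_slim().uv_pip_install("requests")  # faster builds (2)

@app.function(
    image=image,
    gpu=["H100", "A100", "any"],    # GPU fallbacks for availability (3)
    timeout=600,                    # explicit timeout (4)
    retries=2,                      # retries on failure (4)
    container_idle_timeout=300,     # balance cost vs. latency (8)
)
def job(x):
    return x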

Common Patterns

LLM Inference

@app.cls(gpu="A100", container_idle_timeout=300)
class LLM:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="...")

    @modal.method()
    def generate(self, prompt):
        return self.llm.generate([prompt])

Batch Processing

@app.function(volumes={"/data": vol})
def process(file):
    # Process file
    vol.commit()

# Parallel
results = list(process.map(files))

Scheduled ETL

@app.function(
    schedule=modal.Cron("0 6 * * *"),
    secrets=[modal.Secret.from_name("db")]
)
def daily_etl():
    extract()
    transform()
    load()

Quick Reference

| Task | Code |
|---|---|
| Create app | app = modal.App("name") |
| Basic function | @app.function() |
| With GPU | @app.function(gpu="A100") |
| With image | @app.function(image=img) |
| Web endpoint | @modal.asgi_app() |
| Scheduled | schedule=modal.Cron("...") |
| Mount volume | volumes={"/path": vol} |
| Use secret | secrets=[modal.Secret.from_name("x")] |
| Parallel map | func.map(items) |
| Async spawn | func.spawn(arg) |
| Class pattern | @app.cls() with @modal.enter() |