Modal Knowledge Skill
Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices. Activate this skill when users need detailed information about Modal's serverless cloud platform.
Activation Triggers
Activate this skill when users ask about:
- Modal.com platform features and capabilities
- GPU-accelerated Python functions
- Serverless container configuration
- Modal pricing and billing
- Modal CLI commands
- Web endpoints and APIs on Modal
- Scheduled/cron jobs on Modal
- Modal volumes, secrets, and storage
- Parallel processing with Modal
- Modal deployment and CI/CD
Platform Overview
Modal is a serverless cloud platform for running Python code, optimized for AI/ML workloads with:
- Zero Configuration: Everything defined in Python code
- Fast GPU Startup: ~1 second container spin-up
- Automatic Scaling: Scale to zero, scale to thousands
- Per-Second Billing: Only pay for active compute
- Multi-Cloud: AWS, GCP, Oracle Cloud Infrastructure
Core Components Reference
Apps and Functions
import modal
app = modal.App("app-name")
@app.function()
def basic_function(arg: str) -> str:
return f"Result: {arg}"
@app.local_entrypoint()
def main():
result = basic_function.remote("test")
print(result)
Function Decorator Parameters
| Parameter |
Type |
Description |
image |
Image |
Container image configuration |
gpu |
str/list |
GPU type(s): "T4", "A100", ["H100", "A100"] |
cpu |
float |
CPU cores (0.125 to 64) |
memory |
int |
Memory in MB (128 to 262144) |
timeout |
int |
Max execution seconds |
retries |
int |
Retry attempts on failure |
secrets |
list |
Secrets to inject |
volumes |
dict |
Volume mount points |
schedule |
Cron/Period |
Scheduled execution |
concurrency_limit |
int |
Max concurrent executions |
container_idle_timeout |
int |
Seconds to keep warm |
include_source |
bool |
Auto-sync source code |
GPU Reference
Available GPUs
| GPU |
Memory |
Use Case |
~Cost/hr |
| T4 |
16 GB |
Small inference |
$0.59 |
| L4 |
24 GB |
Medium inference |
$0.80 |
| A10G |
24 GB |
Inference/fine-tuning |
$1.10 |
| L40S |
48 GB |
Heavy inference |
$1.50 |
| A100-40GB |
40 GB |
Training |
$2.00 |
| A100-80GB |
80 GB |
Large models |
$3.00 |
| H100 |
80 GB |
Cutting-edge |
$5.00 |
| H200 |
141 GB |
Largest models |
$5.00 |
| B200 |
180+ GB |
Latest gen |
$6.25 |
GPU Configuration
# Single GPU
@app.function(gpu="A100")
# Specific memory variant
@app.function(gpu="A100-80GB")
# Multi-GPU
@app.function(gpu="H100:4")
# Fallbacks (tries in order)
@app.function(gpu=["H100", "A100", "any"])
# "any" = L4, A10G, or T4
@app.function(gpu="any")
Image Building
Base Images
# Debian slim (recommended)
modal.Image.debian_slim(python_version="3.11")
# From Dockerfile
modal.Image.from_dockerfile("./Dockerfile")
# From Docker registry
modal.Image.from_registry("nvidia/cuda:12.1.0-base-ubuntu22.04")
Package Installation
# pip (standard)
image.pip_install("torch", "transformers")
# uv (FASTER - 10-100x)
image.uv_pip_install("torch", "transformers")
# System packages
image.apt_install("ffmpeg", "libsm6")
# Shell commands
image.run_commands("apt-get update", "make install")
Adding Files
# Single file
image.add_local_file("./config.json", "/app/config.json")
# Directory
image.add_local_dir("./models", "/app/models")
# Python source
image.add_local_python_source("my_module")
# Environment variables
image.env({"VAR": "value"})
Build-Time Function
def download_model():
from huggingface_hub import snapshot_download
snapshot_download("model-name")
image.run_function(download_model, secrets=[...])
Storage
Volumes
# Create/reference volume
vol = modal.Volume.from_name("my-vol", create_if_missing=True)
# Mount in function
@app.function(volumes={"/data": vol})
def func():
# Read/write to /data
vol.commit() # Persist changes
Secrets
# From dashboard (recommended)
modal.Secret.from_name("secret-name")
# From dictionary
modal.Secret.from_dict({"KEY": "value"})
# From local env
modal.Secret.from_local_environ(["KEY1", "KEY2"])
# From .env file
modal.Secret.from_dotenv()
# Usage
@app.function(secrets=[modal.Secret.from_name("api-keys")])
def func():
import os
key = os.environ["API_KEY"]
Dict and Queue
# Distributed dict
d = modal.Dict.from_name("cache", create_if_missing=True)
d["key"] = "value"
d.put("key", "value", ttl=3600)
# Distributed queue
q = modal.Queue.from_name("jobs", create_if_missing=True)
q.put("task")
item = q.get()
Web Endpoints
FastAPI Endpoint (Simple)
@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
return {"message": f"Hello, {name}!"}
ASGI App (Full FastAPI)
from fastapi import FastAPI
web_app = FastAPI()
@web_app.post("/predict")
def predict(text: str):
return {"result": process(text)}
@app.function()
@modal.asgi_app()
def fastapi_app():
return web_app
WSGI App (Flask)
from flask import Flask
flask_app = Flask(__name__)
@app.function()
@modal.wsgi_app()
def flask_endpoint():
return flask_app
Custom Web Server
@app.function()
@modal.web_server(port=8000)
def custom_server():
subprocess.run(["python", "-m", "http.server", "8000"])
Custom Domains
@modal.asgi_app(custom_domains=["api.example.com"])
Scheduling
Cron
# Daily at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * *"))
# With timezone
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
Period
@app.function(schedule=modal.Period(hours=5))
@app.function(schedule=modal.Period(days=1))
Note: Scheduled functions only run with modal deploy, not modal run.
Parallel Processing
Map
# Parallel execution (up to 1000 concurrent)
results = list(func.map(items))
# Unordered (faster)
results = list(func.map(items, order_outputs=False))
Starmap
# Spread args
pairs = [(1, 2), (3, 4)]
results = list(add.starmap(pairs))
Spawn
# Async job (returns immediately)
call = func.spawn(data)
result = call.get() # Get result later
# Spawn many
calls = [func.spawn(item) for item in items]
results = [call.get() for call in calls]
Container Lifecycle (Classes)
@app.cls(gpu="A100", container_idle_timeout=300)
class Server:
@modal.enter()
def load(self):
self.model = load_model()
@modal.method()
def predict(self, text):
return self.model(text)
@modal.exit()
def cleanup(self):
del self.model
Concurrency
@modal.concurrent(max_inputs=100, target_inputs=80)
@modal.method()
def batched(self, item):
pass
CLI Commands
Development
modal run app.py # Run function
modal serve app.py # Hot-reload dev server
modal shell app.py # Interactive shell
modal shell app.py --gpu A100 # Shell with GPU
Deployment
modal deploy app.py # Deploy
modal app list # List apps
modal app logs app-name # View logs
modal app stop app-name # Stop app
Resources
# Volumes
modal volume create name
modal volume list
modal volume put name local remote
modal volume get name remote local
# Secrets
modal secret create name KEY=value
modal secret list
# Environments
modal environment create staging
Pricing (2025)
Plans
| Plan |
Price |
Containers |
GPU Concurrency |
| Starter |
Free ($30 credits) |
100 |
10 |
| Team |
$250/month |
1000 |
50 |
| Enterprise |
Custom |
Unlimited |
Custom |
Compute
- CPU: $0.0000131/core/sec
- Memory: $0.00000222/GiB/sec
- GPUs: See GPU table above
Special Programs
- Startups: Up to $25k credits
- Researchers: Up to $10k credits
Best Practices
- Use
@modal.enter() for model loading
- Use
uv_pip_install for faster builds
- Use GPU fallbacks for availability
- Set appropriate timeouts and retries
- Use environments (dev/staging/prod)
- Download models during build, not runtime
- Use
order_outputs=False when order doesn't matter
- Set
container_idle_timeout to balance cost/latency
- Monitor costs in Modal dashboard
- Test with
modal run before modal deploy
Common Patterns
LLM Inference
@app.cls(gpu="A100", container_idle_timeout=300)
class LLM:
@modal.enter()
def load(self):
from vllm import LLM
self.llm = LLM(model="...")
@modal.method()
def generate(self, prompt):
return self.llm.generate([prompt])
Batch Processing
@app.function(volumes={"/data": vol})
def process(file):
# Process file
vol.commit()
# Parallel
results = list(process.map(files))
Scheduled ETL
@app.function(
schedule=modal.Cron("0 6 * * *"),
secrets=[modal.Secret.from_name("db")]
)
def daily_etl():
extract()
transform()
load()
Quick Reference
| Task |
Code |
| Create app |
app = modal.App("name") |
| Basic function |
@app.function() |
| With GPU |
@app.function(gpu="A100") |
| With image |
@app.function(image=img) |
| Web endpoint |
@modal.asgi_app() |
| Scheduled |
schedule=modal.Cron("...") |
| Mount volume |
volumes={"/path": vol} |
| Use secret |
secrets=[modal.Secret.from_name("x")] |
| Parallel map |
func.map(items) |
| Async spawn |
func.spawn(arg) |
| Class pattern |
@app.cls() with @modal.enter() |