| name | veo |
| description | Video generation with Veo 3.1 |
Veo Video Generation
Generate high-fidelity videos with Google Veo 3.1 via API. Supports text-to-video, image-to-video, and video-to-video workflows with advanced cinematography control and synchronized audio generation.
Quick Start
Installation
uv pip install google-genai
Basic Generation
from google import genai
from google.genai import types
import time
import os

# Initialize client
client = genai.Client(
    vertexai=True,
    project=os.getenv("GOOGLE_CLOUD_PROJECT"),  # Set via environment variable
    location="us-central1",
)

# Generate video from text
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='A neon hologram of a cat driving at top speed',
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        duration_seconds=6,  # supported clip lengths: 4, 6, or 8 seconds
        # enhance_prompt defaults to True and cannot be disabled in Veo 3.1
    ),
)

# Poll until complete (typically 2-5 minutes)
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Check for errors before accessing response
if operation.error:
    raise Exception(f"Video generation failed: {operation.error}")

# Save the generated video
video = operation.response.generated_videos[0].video
with open('output.mp4', 'wb') as f:
    f.write(video.video_bytes)
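The polling loop above runs until the operation finishes, however long that takes. The following is a minimal sketch of a bounded version, not part of the SDK: `wait_for_operation` is a hypothetical helper, and the 10-minute default timeout is an assumption chosen to sit above the typical 2-5 minute generation time. It only relies on the operation exposing `done` and `error`, as in the example above.

```python
import time
from typing import Any, Callable


def wait_for_operation(
    operation: Any,
    refresh: Callable[[Any], Any],
    poll_interval: float = 20.0,
    timeout: float = 600.0,  # assumption: 10 minutes is a generous upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until operation.done is true; raise on timeout or reported error."""
    deadline = time.monotonic() + timeout
    while not operation.done:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation not finished after {timeout} seconds")
        sleep(poll_interval)
        # refresh returns the latest state of the long-running operation
        operation = refresh(operation)
    if getattr(operation, "error", None):
        raise RuntimeError(f"Video generation failed: {operation.error}")
    return operation
```

With the client from the example above, this would be called as `operation = wait_for_operation(operation, client.operations.get)`. Passing `refresh` and `sleep` as arguments keeps the helper testable without calling the API.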
Core Capabilities
- Resolutions: 720p or 1080p
- Aspect ratios: 16:9 or 9:16
- Clip lengths: 4, 6, or 8 seconds
- Rich audio: Synchronized dialogue, sound effects, ambient noise
- Advanced controls: Image-to-video, first/last frame transitions, consistent characters
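Because a generation request takes minutes to complete, it can help to check the settings locally before submitting. The sketch below encodes only the values listed above; `validate_video_settings` is a hypothetical helper, not an SDK function, and the API may accept or reject values differently.

```python
# Values taken from the Core Capabilities list in this document.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_DURATIONS = {4, 6, 8}


def validate_video_settings(resolution: str, aspect_ratio: str, duration_seconds: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the settings look valid."""
    problems = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}, got {resolution!r}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}, got {aspect_ratio!r}")
    if duration_seconds not in ALLOWED_DURATIONS:
        problems.append(f"duration_seconds must be one of {sorted(ALLOWED_DURATIONS)}, got {duration_seconds!r}")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every invalid setting at once.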
The Prompting Formula
For consistent, high-quality results, structure prompts using:
[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]
Example
Medium shot, a tired corporate worker, rubbing his temples in exhaustion,
in front of a bulky 1980s computer in a cluttered office late at night.
The scene is lit by the harsh fluorescent overhead lights and the green
glow of the monochrome monitor. Retro aesthetic, shot as if on 1980s
color film, slightly grainy.
Formula Components
- Cinematography - Camera work and shot composition
- Subject - Main character or focal point
- Action - What is happening
- Context - Environment and setting
- Style & Ambiance - Mood, lighting, artistic style
For detailed cinematography language (camera movements, composition, lens techniques), see references/prompting-guide.md.
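The five-part formula can also be assembled programmatically, which makes it harder to forget a component. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the punctuation it uses to join the parts (commas for the scene, a separate sentence for style) is one reasonable choice rather than a format Veo requires.

```python
def build_prompt(cinematography: str, subject: str, action: str, context: str, style: str) -> str:
    """Join the five formula components into one prompt; reject empty components."""
    parts = {
        "cinematography": cinematography,
        "subject": subject,
        "action": action,
        "context": context,
        "style": style,
    }
    # Trim whitespace and trailing punctuation so the joined text reads cleanly.
    cleaned = {name: text.strip().rstrip(".,") for name, text in parts.items()}
    missing = [name for name, text in cleaned.items() if not text]
    if missing:
        raise ValueError(f"Missing prompt components: {', '.join(missing)}")
    scene = ", ".join(cleaned[name] for name in ("cinematography", "subject", "action", "context"))
    return f"{scene}. {cleaned['style']}."


prompt = build_prompt(
    cinematography="Medium shot",
    subject="a tired corporate worker",
    action="rubbing his temples in exhaustion",
    context="in front of a bulky 1980s computer in a cluttered office late at night",
    style="Retro aesthetic, shot as if on 1980s color film, slightly grainy",
)
```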
Generation Workflows
1. Text-to-Video
Generate video from text prompt only.
operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='Your detailed prompt here',
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        duration_seconds=6,
        aspect_ratio='16:9',
        resolution='1080p',
    ),
)

# Poll and save
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Check for errors
if operation.error:
    raise Exception(f"Video generation failed: {operation.error}")

video = operation.response.generated_videos[0].video
with open('output.mp4', 'wb') as f:
    f.write(video.video_bytes)
2. Image-to-Video
Animate a source image with optional prompt guidance.
from PIL import Image as PILImage

# Load image with proper MIME type
with open('path/to/image.jpg', 'rb') as f:
    image_data = f.read()

image = types.Image(
    image_bytes=image_data,
    mime_type='image/jpeg',  # or 'image/png' for PNG files
)

# Detect aspect ratio from image (square images fall back to 9:16)
with PILImage.open('path/to/image.jpg') as pil_image:
    width, height = pil_image.size
aspect_ratio = '16:9' if width > height else '9:16'

operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='Slow dolly shot moving closer, cinematic lighting',
    image=image,
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        duration_seconds=6,  # supported clip lengths: 4, 6, or 8 seconds
        aspect_ratio=aspect_ratio,
        resolution='1080p',
    ),
)

# Poll and save
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Check for errors
if operation.error:
    raise Exception(f"Video generation failed: {operation.error}")

video = operation.response.generated_videos[0].video
with open('output.mp4', 'wb') as f:
    f.write(video.video_bytes)
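The example above hardcodes the MIME type and computes the aspect ratio inline. A small sketch of deriving both is shown here. The helper names are hypothetical, and restricting input to JPEG and PNG is an assumption based on the two formats mentioned in the example, not a documented API limit.

```python
import mimetypes

# Assumption: only the two formats mentioned in the example above are accepted here.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def guess_image_mime_type(path: str) -> str:
    """Infer the MIME type from the file extension, rejecting unsupported formats."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported image type for {path!r}: {mime_type}")
    return mime_type


def pick_aspect_ratio(width: int, height: int) -> str:
    """Map image dimensions to the nearer supported ratio; square images fall back to 9:16."""
    return "16:9" if width > height else "9:16"
```

The result of `guess_image_mime_type('path/to/image.jpg')` could then be passed as `mime_type` when constructing `types.Image`.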
3. Video-to-Video
Edit or transform existing video content.
video_input = types.Video(
    uri="gs://bucket-name/video.mp4",  # GCS URI for Vertex AI
)

operation = client.models.generate_videos(
    model='veo-3.1-generate-preview',
    prompt='Transform into cyberpunk style with neon lights',
    video=video_input,
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        duration_seconds=6,  # supported clip lengths: 4, 6, or 8 seconds
    ),
)

# Poll and save
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Check for errors
if operation.error:
    raise Exception(f"Video generation failed: {operation.error}")

video = operation.response.generated_videos[0].video
with open('output.mp4', 'wb') as f:
    f.write(video.video_bytes)
Audio Direction
Veo 3.1 generates complete soundtracks based on text instructions.
Dialogue
Use quotation marks for specific speech:
A woman says, "We have to leave now."
Sound Effects
Describe sounds explicitly:
SFX: thunder cracks in the distance
Ambient Noise
Define background soundscape:
Ambient noise: the quiet hum of a starship bridge
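The three audio conventions above can be appended to a visual prompt in a consistent order. The sketch below is illustrative only: `add_audio_direction` is a hypothetical helper, and the ordering (dialogue, then SFX, then ambient noise) is an arbitrary choice, not something Veo is known to require.

```python
from typing import Optional, Sequence, Tuple


def _sentence(text: str) -> str:
    """Strip whitespace and make sure the text ends with terminal punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?", '"')) else text + "."


def add_audio_direction(
    prompt: str,
    dialogue: Optional[Sequence[Tuple[str, str]]] = None,  # (speaker, line) pairs
    sfx: Optional[Sequence[str]] = None,
    ambient: Optional[str] = None,
) -> str:
    """Append dialogue, sound effects, and ambient noise cues to a visual prompt."""
    pieces = [_sentence(prompt)]
    for speaker, line in dialogue or []:
        # Quotation marks mark the exact words to be spoken.
        pieces.append(f'{speaker} says, "{_sentence(line)}"')
    for effect in sfx or []:
        pieces.append(_sentence(f"SFX: {effect}"))
    if ambient:
        pieces.append(_sentence(f"Ambient noise: {ambient}"))
    return " ".join(pieces)
```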
Advanced Workflows
For complex projects requiring precise control, use multi-step workflows combining Veo with Gemini 2.5 Flash Image.
When to Use Advanced Workflows
- First and Last Frame: Create controlled transitions between two specific viewpoints
- Ingredients to Video: Maintain consistent characters/objects across multiple shots
- Timestamp Prompting: Direct multi-shot sequences with precise timing in single generation
See references/prompting-guide.md for detailed workflow instructions.
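As a rough illustration of timestamp prompting, the sketch below assembles a multi-shot prompt from timed segments. The `[MM:SS-MM:SS]` prefix is an assumed notation that should be checked against references/prompting-guide.md, `build_timestamp_prompt` is a hypothetical helper, and the 8-second cap reflects the longest clip length listed under Core Capabilities.

```python
from typing import Sequence, Tuple


def _fmt(seconds: int) -> str:
    """Format whole seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def build_timestamp_prompt(shots: Sequence[Tuple[int, int, str]], max_seconds: int = 8) -> str:
    """Build one prompt line per (start, end, description) shot, checking the timeline."""
    if not shots:
        raise ValueError("At least one shot is required")
    lines = []
    expected_start = 0
    for start, end, description in shots:
        if start != expected_start:
            raise ValueError(f"Shot starting at {start}s leaves a gap or overlap; expected {expected_start}s")
        if end <= start:
            raise ValueError(f"Shot {start}-{end} must end after it starts")
        lines.append(f"[{_fmt(start)}-{_fmt(end)}] {description.strip()}")
        expected_start = end
    if expected_start > max_seconds:
        raise ValueError(f"Total length {expected_start}s exceeds {max_seconds}s")
    return "\n".join(lines)
```

Checking that the shots are contiguous and fit within one clip catches timeline mistakes before a generation is spent on them.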
Reference Guides
When to Read Each Guide
| Situation | Reference |
|---|---|
| Need cinematography vocabulary or camera techniques | prompting-guide.md |
| Want advanced audio direction or negative prompts | prompting-guide.md |
| Need multi-shot workflows with Gemini integration | prompting-guide.md |
| Need complete working code examples | api-examples.md |
| Implementing error handling or retry logic | api-examples.md |
| Using advanced features (first/last frame, ingredients) | api-examples.md |
Key Prompting Tips
- Be specific: Detailed prompts yield more precise results
- Use the formula: Structure every prompt with all five components
- Master cinematography: Camera work conveys emotion and tone
- Direct audio explicitly: Specify dialogue, SFX, and ambient noise
- Experiment: Test different approaches to find what works best
Resources
Workflow Summary
- Structure your prompt using the five-part formula
- Choose generation method: text-to-video, image-to-video, or video-to-video
- Specify technical parameters: duration, resolution, aspect ratio
- Add audio direction: dialogue, SFX, ambient noise
- Poll for completion (typically 2-5 minutes)
- Iterate and refine based on results
For detailed prompting techniques and advanced workflows, consult the reference guides.
Troubleshooting
Permission Denied (403 PERMISSION_DENIED)
If you encounter Permission 'aiplatform.endpoints.predict' denied:
Authenticate: Set up Application Default Credentials
gcloud auth application-default login
Add IAM role: Grant Vertex AI access to your account
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:YOUR_EMAIL" \
  --role="roles/aiplatform.user"
Enable API: Activate Vertex AI API for your project
gcloud services enable aiplatform.googleapis.com --project=PROJECT_ID
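When generation runs inside a script, the same remediation steps can be surfaced automatically. The following is a sketch under stated assumptions: `permission_fix_commands` is a hypothetical helper that matches on the error text quoted above, and it only returns the commands as strings for the user to review; it does not run them.

```python
def permission_fix_commands(error_message: str, project_id: str, user_email: str) -> list[str]:
    """Return the gcloud commands from this section if the error looks like a 403 permission failure."""
    markers = ("PERMISSION_DENIED", "aiplatform.endpoints.predict")
    if not any(marker in error_message for marker in markers):
        return []  # not a permission problem; nothing to suggest
    return [
        "gcloud auth application-default login",
        f"gcloud projects add-iam-policy-binding {project_id} "
        f'--member="user:{user_email}" --role="roles/aiplatform.user"',
        f"gcloud services enable aiplatform.googleapis.com --project={project_id}",
    ]
```

A caller could wrap `client.models.generate_videos(...)` in a `try` block and print these commands when the caught exception's message matches.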