| name | description |
|---|---|
| gemini-imagegen | This skill should be used when generating and editing images using the Gemini API (Nano Banana Pro). It applies when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images. |
Gemini Image Generation (Nano Banana Pro)
Generate and edit images using Google's Gemini API. The environment variable GEMINI_API_KEY must be set.
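To fail fast with a clear message when the key is missing, a custom script could check the variable before creating a client. This is a minimal sketch; the `require_api_key` helper is hypothetical and not part of the bundled scripts:

```python
import os


def require_api_key(var: str = "GEMINI_API_KEY") -> str:
    """Hypothetical helper: return the API key or raise a descriptive error."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before calling the Gemini API")
    return key
```

The returned value can then be passed as `genai.Client(api_key=require_api_key())`.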
CLI Scripts
The scripts are self-installing via uv: their dependencies are installed automatically on first run.
IMPORTANT: Always run scripts with `uv run` to ensure dependencies are available:
```bash
# Generate image from prompt
uv run ./scripts/generate_image.py "A sunset over mountains" sunset.jpg

# Edit existing image
uv run ./scripts/edit_image.py photo.jpg "Add a rainbow" edited.jpg

# Compose multiple images
uv run ./scripts/compose_images.py "Combine these into a collage" result.jpg img1.jpg img2.jpg

# Interactive multi-turn chat
uv run ./scripts/multi_turn_chat.py
```
Options available for all scripts:
- `--model`/`-m`: Model selection (`gemini-2.5-flash-image` or `gemini-3-pro-image-preview`)
- `--aspect`/`-a`: Aspect ratio (1:1, 16:9, 9:16, etc.)
- `--size`/`-s`: Resolution (1K, 2K, 4K)
- `--open`/`-o`: Open the generated image with the default system viewer
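The scripts' own argument handling is not shown here. As a hedged sketch, the shared options could be declared with the standard `argparse` module as follows; the `build_common_parser` name and the parser layout are assumptions, and the defaults are taken from the Quick Reference below:

```python
import argparse

# Values listed in this document; the real scripts may accept others.
MODELS = ["gemini-3-pro-image-preview", "gemini-2.5-flash-image"]
ASPECT_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]
SIZES = ["1K", "2K", "4K"]


def build_common_parser(description: str) -> argparse.ArgumentParser:
    """Hypothetical parser for the options shared by all scripts."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("-m", "--model", choices=MODELS, default=MODELS[0],
                        help="Model to use (Pro model by default)")
    parser.add_argument("-a", "--aspect", choices=ASPECT_RATIOS, default="1:1",
                        help="Output aspect ratio")
    parser.add_argument("-s", "--size", choices=SIZES, default="1K",
                        help="Output resolution")
    parser.add_argument("-o", "--open", action="store_true",
                        help="Open the result in the default system viewer")
    return parser
```

For example, `build_common_parser("Generate an image").parse_args(["-a", "16:9", "-o"])` yields `aspect="16:9"`, `size="1K"`, and `open=True`; an unsupported value such as `-s 8K` is rejected by `choices`.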
Python Library
`gemini_images.py` provides a high-level API for use in custom scripts:
```python
from gemini_images import GeminiImageGenerator

gen = GeminiImageGenerator()

# Generate
gen.generate("A sunset over mountains", "sunset.jpg")

# Edit
gen.edit("input.jpg", "Add clouds", "output.jpg")

# Compose multiple images
gen.compose("Combine into collage", ["img1.jpg", "img2.jpg"], "result.jpg")

# Multi-turn chat
chat = gen.chat()
img, text = chat.send("Create a logo for Acme Corp")
img, text = chat.send("Make the text bolder")
```
Default Model
| Model | Resolution | Best For |
|---|---|---|
| `gemini-3-pro-image-preview` | 1K-4K | All image generation (default) |
Note: Always use this Pro model. Only use a different model if explicitly requested.
Quick Reference
Default Settings
- Model: `gemini-3-pro-image-preview`
- Resolution: 1K (default; options: 1K, 2K, 4K)
- Aspect Ratio: 1:1 (default)
Available Aspect Ratios
1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Available Resolutions
1K (default), 2K, 4K
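A custom script can check these settings locally before spending an API call. The sketch below is hypothetical (not part of the SDK or the scripts); it returns keyword arguments for `types.ImageConfig` and normalizes the size to an uppercase `K`, assuming the API expects the values exactly as written in this document:

```python
# Supported values as listed in the Quick Reference.
SUPPORTED_ASPECT_RATIOS = frozenset(
    {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
)
SUPPORTED_SIZES = ("1K", "2K", "4K")


def validate_image_settings(aspect_ratio: str = "1:1", image_size: str = "1K") -> dict:
    """Hypothetical helper: validate settings and return ImageConfig keyword arguments."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio {aspect_ratio!r}")
    size = image_size.upper()  # accept "2k" but send "2K"
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported image size {image_size!r}")
    return {"aspect_ratio": aspect_ratio, "image_size": size}
```

Usage: `types.ImageConfig(**validate_image_settings("16:9", "2K"))`.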
Core API Pattern
```python
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Basic generation (1K, 1:1 - defaults)
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)

for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.jpg")  # Gemini returns JPEG by default, so use .jpg
```
Custom Resolution & Aspect Ratio
```python
from google.genai import types

# `client` is created as in the Core API Pattern above
prompt = "A sunset over mountains"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Wide format
            image_size="2K",  # Higher resolution
        ),
    ),
)
```
Resolution Examples
```python
# 1K (default) - Fast, good for previews
image_config=types.ImageConfig(image_size="1K")

# 2K - Balanced quality/speed
image_config=types.ImageConfig(image_size="2K")

# 4K - Maximum quality, slower
image_config=types.ImageConfig(image_size="4K")
```
Aspect Ratio Examples
```python
# Square (default)
image_config=types.ImageConfig(aspect_ratio="1:1")

# Landscape wide
image_config=types.ImageConfig(aspect_ratio="16:9")

# Ultra-wide panoramic
image_config=types.ImageConfig(aspect_ratio="21:9")

# Portrait
image_config=types.ImageConfig(aspect_ratio="9:16")

# Photo standard
image_config=types.ImageConfig(aspect_ratio="4:3")
```
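When the output should match an existing photo, a hypothetical helper can pick the nearest supported ratio from its pixel size. This sketch compares ratios on a log scale so that portrait and landscape deviations are weighted alike; it is an illustration, not part of the API:

```python
import math

SUPPORTED_ASPECT_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Hypothetical helper: return the supported ratio nearest to width/height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = (int(x) for x in ratio.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)
```

With Pillow, `types.ImageConfig(aspect_ratio=closest_aspect_ratio(*img.size))` uses the `(width, height)` tuple of an opened image.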
Editing Images
Pass existing images with text prompts:
```python
from PIL import Image

img = Image.open("input.png")
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Add a sunset to this scene", img],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
```
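Instead of a Pillow image, the raw file bytes can be sent with `types.Part.from_bytes(data=..., mime_type=...)`. A minimal sketch of the file-reading side follows; the `read_image_for_request` helper is hypothetical, and because it guesses the MIME type from the extension, it will mislabel files whose extension does not match their contents (see the file-format note below):

```python
import mimetypes
from pathlib import Path


def read_image_for_request(path: str) -> tuple:
    """Hypothetical helper: return (bytes, mime_type) for an image file."""
    mime_type, _ = mimetypes.guess_type(path)  # based on the extension only
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    return Path(path).read_bytes(), mime_type
```

Usage: `data, mime = read_image_for_request("input.jpg")`, then pass `types.Part.from_bytes(data=data, mime_type=mime)` in `contents` next to the prompt.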
Multi-Turn Refinement
Use chat for iterative editing:
```python
from google.genai import types

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE']),
)

response = chat.send_message("Create a logo for 'Acme Corp'")
# Save first image...

response = chat.send_message("Make the text bolder and add a blue gradient")
# Save refined image...
```
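The save steps in a chat can be filled in with a small helper so that each turn gets its own file. This is a sketch under assumptions: the `save_response_images` name, the `<stem>_turn<N>` naming scheme, and the MIME-to-extension table are hypothetical. Picking the extension from `part.inline_data.mime_type` (falling back to `.jpg`) keeps each file name consistent with the bytes actually returned:

```python
# Hypothetical mapping from inline_data.mime_type to a file extension.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/webp": ".webp"}


def save_response_images(response, stem: str, turn: int) -> list:
    """Hypothetical helper: print text parts and save image parts of one response.

    Files are named <stem>_turn<N><ext>, with _1, _2, ... appended when one
    response contains several images. Unknown MIME types fall back to .jpg.
    """
    saved = []
    for part in response.parts or []:
        if part.text:
            print(part.text)
        elif part.inline_data:
            ext = EXTENSIONS.get(part.inline_data.mime_type, ".jpg")
            suffix = f"_{len(saved)}" if saved else ""
            path = f"{stem}_turn{turn}{suffix}{ext}"
            part.as_image().save(path)
            saved.append(path)
    return saved
```

For example, call `save_response_images(response, "acme_logo", turn=1)` after the first `send_message` and `turn=2` after the refinement, so the earlier version is not overwritten.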
Prompting Best Practices
Photorealistic Scenes
Include camera details: lens type, lighting, angle, mood.
"A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"
Stylized Art
Specify style explicitly:
"A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"
Text in Images
Be explicit about font style and placement:
"Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"
Product Mockups
Describe lighting setup and surface:
"Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"
Advanced Features
Google Search Grounding
Generate images based on real-time data:
```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Visualize today's weather in Tokyo as an infographic"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}],
    ),
)
```
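Because the notes at the end of this document state that image-only output does not work with Search grounding, a script that assembles configs dynamically can reject that combination up front. This is a hypothetical sketch that returns plain keyword arguments for `types.GenerateContentConfig`, relying on the SDK accepting plain dicts for nested fields, as the `tools` value above already does:

```python
from typing import Optional


def build_config_kwargs(*, search: bool = False, image_only: bool = False,
                        aspect_ratio: Optional[str] = None,
                        image_size: Optional[str] = None) -> dict:
    """Hypothetical helper: assemble GenerateContentConfig keyword arguments."""
    if search and image_only:
        # Image-only mode does not work together with Google Search grounding.
        raise ValueError("Google Search grounding requires TEXT in response_modalities")
    kwargs = {"response_modalities": ["IMAGE"] if image_only else ["TEXT", "IMAGE"]}
    if search:
        kwargs["tools"] = [{"google_search": {}}]
    image_config = {k: v for k, v in
                    {"aspect_ratio": aspect_ratio, "image_size": image_size}.items() if v}
    if image_config:
        kwargs["image_config"] = image_config
    return kwargs
```

Usage: `config=types.GenerateContentConfig(**build_config_kwargs(search=True, aspect_ratio="16:9"))`.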
Multiple Reference Images (Up to 14)
Combine elements from multiple sources:
```python
from PIL import Image

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
```
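When the reference list is built programmatically, a hypothetical helper can enforce the 14-image limit stated in the heading before making a request, and keep the prompt first as in the example above. The helper name is an assumption; any objects accepted in `contents` (such as Pillow images) can be passed:

```python
from typing import Any, Sequence

MAX_REFERENCE_IMAGES = 14  # limit stated in this document


def build_contents(prompt: str, images: Sequence[Any]) -> list:
    """Hypothetical helper: return [prompt, *images], enforcing the image limit."""
    if not images:
        raise ValueError("at least one reference image is required")
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"{len(images)} images given; at most {MAX_REFERENCE_IMAGES} are supported"
        )
    return [prompt, *images]
```

Usage: `contents=build_contents("Create a group photo of these people in an office", [Image.open(p) for p in paths])`.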
Important: File Format & Media Type
CRITICAL: The Gemini API returns images in JPEG format by default. When saving, use the `.jpg` extension to prevent media type mismatches.
```python
# CORRECT - Use .jpg extension (Gemini returns JPEG)
image.save("output.jpg")

# WRONG - Will cause "Image does not match media type" errors
image.save("output.png")  # Creates JPEG with PNG extension!
```
Converting to PNG (if needed)
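If you specifically need PNG format, decode the returned bytes with Pillow and re-encode them; saving the SDK image object under a `.png` name does not convert the data:
```python
import io

from PIL import Image

# `response` comes from one of the generate_content calls above
for part in response.parts:
    if part.inline_data:
        # Decode the raw JPEG bytes, then write them out explicitly as PNG
        img = Image.open(io.BytesIO(part.inline_data.data))
        img.save("output.png", format="PNG")
```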
Verifying Image Format
Check the actual format against the extension with the `file` command:
```bash
file image.png
# If the output shows "JPEG image data", rename the file to .jpg!
```
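For many files, the same check can be scripted. This sketch (the `sniff_image_format` and `fix_extension` helpers are hypothetical) reads only the file header, recognizes the JPEG, PNG, and WebP signatures, and renames the file when its extension disagrees with its contents:

```python
from pathlib import Path
from typing import Optional, Tuple

# (magic prefix, format name, accepted extensions; the first one is preferred)
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg", (".jpg", ".jpeg")),
    (b"\x89PNG\r\n\x1a\n", "png", (".png",)),
]


def sniff_image_format(head: bytes) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Hypothetical helper: identify an image from its first bytes, or return None."""
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp", (".webp",)
    for magic, fmt, extensions in SIGNATURES:
        if head.startswith(magic):
            return fmt, extensions
    return None


def fix_extension(path: str) -> str:
    """Hypothetical helper: rename `path` so its extension matches its contents."""
    p = Path(path)
    with p.open("rb") as f:
        detected = sniff_image_format(f.read(12))  # 12 bytes cover all three signatures
    if detected is None or p.suffix.lower() in detected[1]:
        return str(p)  # unknown format, or extension already correct
    target = p.with_suffix(detected[1][0])
    p.rename(target)  # on POSIX this replaces an existing file with the same name
    return str(target)
```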
Notes
- All generated images include SynthID watermarks.
- Gemini returns JPEG format by default; always use the `.jpg` extension.
- Image-only mode (`responseModalities: ["IMAGE"]`) won't work with Google Search grounding.
- For editing, describe changes conversationally; the model understands semantic masking.
- Default to 1K resolution for speed; use 2K/4K when quality is critical.