| name | nano-banana |
| description | AI image generation and editing using Google's Nano Banana (Gemini 2.5 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image) APIs. Use this skill when the user wants to generate, edit, or compose images using AI. Triggers include requests to create images from text descriptions, edit existing images, add/remove elements from photos, apply style transfers, maintain character consistency across images, generate images with text overlays (logos, posters, infographics), or create multi-image compositions. Also use when users mention "Nano Banana", "Gemini image", or want AI-generated visuals. |
Nano Banana Image Generation Skill
Generate and edit images using Google's Nano Banana (Gemini 2.5 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image) APIs.
Model Selection
| Model | ID | Best For | Resolution | Cost |
|---|---|---|---|---|
| Nano Banana | gemini-2.5-flash-image | Fast generation, iteration, basic edits | Up to 1024px | ~$0.039/image |
| Nano Banana Pro | gemini-3-pro-image-preview | Professional assets, text rendering, complex compositions | Up to 4K | Higher cost |
Selection Guide:
- Use Nano Banana for: rapid prototyping, simple edits, high-volume generation
- Use Nano Banana Pro for: text in images, 4K output, up to 14 reference images, Google Search grounding
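If you script against both models, these rules can be encoded in a small helper. This is an illustrative sketch only: the model IDs come from the table above, while the function name, parameters, and the 3-image threshold are assumptions, not part of any API.

```python
def pick_model(needs_text: bool = False,
               needs_4k: bool = False,
               reference_images: int = 0) -> str:
    """Return the model ID suggested by the selection guide above."""
    # Pro handles text rendering, 4K output, and up to 14 reference images.
    # The 3-image cutoff for the base model is an assumption, not a documented limit.
    if needs_text or needs_4k or reference_images > 3:
        return "gemini-3-pro-image-preview"
    # Default to the faster, cheaper model for iteration and simple edits.
    return "gemini-2.5-flash-image"
```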
Core Capabilities
- Text-to-Image: Generate images from text descriptions
- Image Editing: Add, remove, modify elements in existing images
- Multi-Image Composition: Blend up to 14 images (Pro only)
- Character Consistency: Maintain the same character across multiple generations
- Text Rendering: Generate legible text in images (Pro excels here)
- Style Transfer: Apply artistic styles to images
- Iterative Refinement: Conversational multi-turn editing
Quick Start
Generate an Image
python scripts/generate_image.py "A cozy coffee shop interior with warm lighting" --output coffee_shop.png
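The bundled scripts wrap a single API call. If you prefer to call the API directly, here is a minimal sketch using the google-genai Python SDK; the output filename mirrors the command above and can be changed freely.

```python
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="A cozy coffee shop interior with warm lighting",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response interleaves text and image parts; save the first image part.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("coffee_shop.png")
```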
Edit an Image
python scripts/edit_image.py input.jpg "Add a cat sitting on the chair" --output output.png
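Editing follows the same pattern; the only difference is that the contents list carries the input image alongside the instruction. Again a sketch assuming the google-genai SDK and Pillow:

```python
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[Image.open("input.jpg"), "Add a cat sitting on the chair"],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("output.png")
```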
Prompting Best Practices
Core Principle: Describe the scene, don't just list keywords.
The model understands natural language narratives better than comma-separated tags.
Prompt Structure (for best results)
Include these elements in your prompts:
- Subject: Who/what is in the image (be specific)
- Action: What is happening
- Environment/Location: Setting and context
- Lighting: Natural, studio, golden hour, etc.
- Style: Photorealistic, illustration, watercolor, etc.
- Composition: Camera angle, framing, perspective
- Mood/Atmosphere: Emotional tone
Example - Good vs Bad Prompts
Bad: cat, hat, wizard, cute
Good: A fluffy ginger cat wearing a tiny knitted wizard hat, sitting on a wooden floor in a cozy living room. Soft natural light streams through a nearby window, creating a warm, magical atmosphere. Photorealistic, shot with an 85mm portrait lens.
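If you assemble prompts programmatically, the structure above maps onto a simple template. The helper below is purely illustrative; its name and fields are not part of any API.

```python
def build_prompt(subject: str, action: str, environment: str,
                 lighting: str, style: str, composition: str, mood: str) -> str:
    """Assemble a narrative prompt from the elements listed above."""
    return (
        f"{subject} {action} in {environment}. "
        f"{lighting}, creating {mood}. "
        f"{style}, {composition}."
    )

prompt = build_prompt(
    subject="A fluffy ginger cat wearing a tiny knitted wizard hat",
    action="sitting on a wooden floor",
    environment="a cozy living room",
    lighting="Soft natural light streams through a nearby window",
    mood="a warm, magical atmosphere",
    style="Photorealistic",
    composition="shot with an 85mm portrait lens",
)
```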
For comprehensive prompting strategies, see: references/prompting-guide.md
API Configuration
Required Environment Variable
export GEMINI_API_KEY="your-api-key-here"
Get your API key from: https://aistudio.google.com/apikey
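With the google-genai Python SDK, the client reads GEMINI_API_KEY from the environment automatically; passing the key explicitly also works:

```python
import os

from google import genai

# Either rely on the environment variable...
client = genai.Client()
# ...or pass the key explicitly.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
```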
Response Modalities
Always set responseModalities: ["TEXT", "IMAGE"] to receive generated images.
Image Configuration Options
image_config = {
    "aspect_ratio": "16:9",  # Options: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
    "image_size": "2K"       # Options: 1K, 2K, 4K (Pro only for 4K)
}
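In the Python SDK these options travel inside the generation config. A sketch assuming a recent google-genai release that exposes types.ImageConfig (verify the field names against your installed version):

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="A minimalist poster that says 'Open Late' in bold type",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="2K",
        ),
    ),
)
```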
For complete API reference, see: references/api-reference.md
Scripts Reference
| Script | Purpose |
|---|---|
| scripts/generate_image.py | Text-to-image generation |
| scripts/edit_image.py | Edit existing images with text prompts |
| scripts/multi_image_compose.py | Compose multiple images (Pro only) |
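Under the hood, composition is just a generate_content call with several images in one contents list (the Pro model accepts up to 14). A sketch assuming the google-genai SDK and two hypothetical local files, product.png and background.jpg:

```python
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        Image.open("product.png"),
        Image.open("background.jpg"),
        "Place the product from the first image onto the table in the second image, matching its lighting",
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("composed.png")
```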
Important Notes
- All generated images include an invisible SynthID watermark
- The Pro model uses a "thinking" mode for complex prompts (enabled by default)
- For multi-turn editing, maintain the conversation history (see the sketch after this list)
- Supported input formats: JPEG, PNG, WebP (up to 5MB)
- Best-performing prompt languages: EN, es-MX, ja-JP, zh-CN, hi-IN
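For multi-turn refinement, the SDK's chat interface keeps the conversation history for you, which also helps with character consistency. A sketch assuming the google-genai chat API:

```python
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()
chat = client.chats.create(model="gemini-2.5-flash-image")


def save_image(response, path):
    # Persist the first image part of a response, if any.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(path)


save_image(chat.send_message("A ginger cat in a wizard hat, photorealistic"), "v1.png")
# Follow-up edits reuse the history, so the character stays consistent.
save_image(chat.send_message("Keep the same cat, but move it onto a velvet armchair"), "v2.png")
```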