name	image-gen
description	Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) with workflow-based prompting
triggers	create image, generate image, make infographic, create infographic, generate diagram, make diagram, design visual, create visual
allowed-tools	Read, Write, Bash
version	0.1.0

Image Generation Skill

Generate professional images, infographics, and diagrams using Google's Nano Banana Pro model (gemini-3-pro-image-preview).

Model Capabilities

Nano Banana Pro (released November 20, 2025):

Text rendering - Accurate, legible text in images
Google Search grounding - Real-time data (weather, stocks, etc.)
Multi-turn conversation - Iterative refinement
Up to 14 reference images - For composition and style transfer
Resolutions: 1K, 2K, 4K
Aspect ratios: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9

Scripts

All scripts use Python via uv run with inline dependencies.

generate.py - Text to Image

uv run scripts/generate.py "prompt" output.png [aspect_ratio] [size]

Examples:

# Basic image
uv run scripts/generate.py "A cozy coffee shop in autumn" coffee.png

# Infographic with specific aspect ratio
uv run scripts/generate.py "Infographic explaining how neural networks work" nn.png 16:9 2K

# 4K professional image
uv run scripts/generate.py "Professional headshot, studio lighting" headshot.png 3:2 4K

edit.py - Image Editing

uv run scripts/edit.py input.png "edit instructions" output.png

Examples:

# Edit existing image
uv run scripts/edit.py photo.png "Change the background to a beach sunset" edited.png

compose.py - Multi-Image Composition

uv run scripts/compose.py "prompt" output.png --refs image1.png image2.png

Examples:

# Combine styles from multiple images
uv run scripts/compose.py "Combine these styles into a logo" logo.png --refs style1.png style2.png

Workflows

Workflows provide structured approaches for specific visual types. Each workflow follows the PAI 6-step editorial process:

Extract narrative - Understand the complete story/concept
Derive visual concept - Single metaphor with 2-3 physical objects
Apply aesthetic - Define style, colors, mood
Construct prompt - Build detailed generation instructions
Generate - Execute via script
Validate - Check against criteria, regenerate if needed

Available Workflows

infographic.md - Data visualization, statistics, explainers
diagram.md - Technical diagrams, flowcharts, architecture

Workflow Usage

When generating images, follow the appropriate workflow:

For Infographics

1. What data/concept needs visualization?
2. What's the key insight or takeaway?
3. Aspect ratio: 16:9 (landscape) recommended
4. Include: clear hierarchy, minimal text, supporting icons
5. Generate at 2K minimum for text clarity

For Diagrams

1. What system/process is being illustrated?
2. What are the key components and relationships?
3. Style: flat colors, clean lines, minimal detail
4. Generate at 2K for label clarity

Environment Setup

Requires GEMINI_API_KEY environment variable. This should be set from Geoffrey's secrets:

source ~/Library/Mobile\ Documents/com~apple~CloudDocs/Geoffrey/secrets/.env

Best Practices

Infographics

Use simple, direct prompts: "Infographic explaining how X works"
Model auto-includes relevant icons/logos
16:9 aspect ratio works best
Generate at 2K+ for readable text

General

Multi-turn refinement: generate, then ask for specific changes
Reference images improve consistency
Be specific about style, mood, lighting
SynthID watermark is automatic (Google provenance)

Output Location

By default, save images to /tmp/ or user-specified paths. For persistent storage, use:

~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/images/

⚠️ CRITICAL: Never Read Generated Images

DO NOT use the Read tool on generated images.

Why:

4K images (3840x2160) are within the 8000px limit
2K images (2560x1440) are also safe
BUT: Do not Read them - they're for user consumption, not analysis
For edits, use edit.py script, not Read tool

Workflow:

Generate image with script
Return file path to user
User views the high-quality output

Limitations

No photorealistic humans (safety filter)
No copyrighted characters
Maximum 14 reference images for composition
4K only available with Nano Banana Pro

Pricing

Size	Cost per Image
1K	Free tier / $0.04
2K	$0.134
4K	$0.24

image-gen

Install Skill

SKILL.md

Image Generation Skill

Model Capabilities

Scripts

generate.py - Text to Image

edit.py - Image Editing

compose.py - Multi-Image Composition

Workflows

Available Workflows

Workflow Usage

For Infographics

For Diagrams

Environment Setup

Best Practices

Infographics

General

Output Location

⚠️ CRITICAL: Never Read Generated Images

Limitations

Pricing