| name | invoking-gemini |
| description | Invokes Google Gemini models for structured outputs, multi-modal tasks, and Google-specific features. Use when users request Gemini, structured JSON output, Google API integration, or cost-effective parallel processing. |
Invoking Gemini
Delegate tasks to Google's Gemini models when they offer advantages over Claude.
When to Use Gemini
Structured outputs:
- JSON Schema validation with property ordering guarantees
- Pydantic model compliance
- Strict schema adherence (enum values, required fields)
Cost optimization:
- Parallel batch processing (Gemini Flash is lightweight)
- High-volume simple tasks
- Budget-constrained operations
Google ecosystem:
- Integration with Google services
- Vertex AI workflows
- Google-specific APIs
Multi-modal tasks:
- Image analysis with JSON output
- Video processing
- Audio transcription with structure
Available Models
gemini-2.0-flash-exp (Recommended):
- Fast, cost-effective
- Native JSON Schema support
- Good for structured outputs
gemini-1.5-pro:
- More capable reasoning
- Better for complex tasks
- Higher cost
gemini-1.5-flash:
- Balanced speed/quality
- Good for most tasks
See references/models.md for full model details.
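The trade-offs above can be captured in a tiny routing helper. This is an illustrative sketch, not part of gemini_client; only the model names come from the list above:

```python
def choose_model(complex_reasoning: bool = False, high_volume: bool = False) -> str:
    """Pick a Gemini model from the task profile (heuristic, illustrative)."""
    if complex_reasoning:
        return "gemini-1.5-pro"        # more capable, higher cost
    if high_volume:
        return "gemini-2.0-flash-exp"  # fast and cost-effective
    return "gemini-1.5-flash"          # balanced default

print(choose_model(high_volume=True))  # gemini-2.0-flash-exp
```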
Setup
Prerequisites:
Install google-generativeai:

```bash
pip install google-generativeai --break-system-packages
```

Configure the API key via the api-credentials skill by creating /home/claude/config.json:

```json
{ "google_api_key": "AIzaSy..." }
```
Basic Usage
Import the client:
```python
import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini

# Simple prompt
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-2.0-flash-exp"
)
print(response)
```
Structured Output
Use Pydantic models for guaranteed JSON Schema compliance:
```python
from pydantic import BaseModel, Field
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a BookAnalysis instance
print(result.title)  # "1984"
print(result.genre)  # "Dystopian Fiction"
```
Advantages over Claude:
- Guaranteed property ordering in JSON
- Strict enum enforcement
- Native schema validation (no prompt engineering)
- Lower cost for simple extractions
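The schema Gemini enforces is derived directly from the Pydantic model. As a sketch of what that looks like, `model_json_schema()` (standard Pydantic v2) emits the JSON Schema that a client like gemini_client is assumed to pass as the response schema; note how a `Literal` field becomes a strict enum and numeric bounds become schema constraints:

```python
from typing import Literal
from pydantic import BaseModel, Field

class Sentiment(BaseModel):
    # Literal fields become JSON Schema enums, which Gemini enforces strictly
    label: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)

schema = Sentiment.model_json_schema()
print(schema["required"])  # ['label', 'confidence']
print(schema["properties"]["confidence"]["maximum"])  # 1.0
```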
Parallel Invocation
Process multiple prompts concurrently:
```python
from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello",
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-2.0-flash-exp"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    print(f"A: {result[:100]}...\n")
```
Use cases:
- Batch classification tasks
- Data labeling
- Multiple independent analyses
- A/B testing prompts
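For batch classification, it helps to generate one uniform prompt per item before fanning out. A minimal sketch (the `build_prompts` helper and labels are illustrative, not part of gemini_client):

```python
# Illustrative labels and helper for a batch classification task
LABELS = ["bug report", "feature request", "question"]

def build_prompts(tickets: list[str]) -> list[str]:
    """One classification prompt per ticket, suitable for invoke_parallel."""
    return [
        f"Classify this ticket as one of {LABELS}. "
        f"Reply with the label only.\n\nTicket: {t}"
        for t in tickets
    ]

prompts = build_prompts(["App crashes on launch", "Please add dark mode"])
print(len(prompts))  # 2
# results = invoke_parallel(prompts=prompts, model="gemini-2.0-flash-exp")
```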
Error Handling
The client handles common errors:
```python
from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-2.0-flash-exp"
)

if response is None:
    print("Error: API call failed")
    # Check /home/claude/config.json for a valid google_api_key
```
Common issues:
- Missing API key → Returns None, prints error
- Invalid model → Raises ValueError
- Rate limit → Automatically retries with backoff
- Network error → Returns None after retries
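The automatic retry described above typically uses exponential backoff. A sketch of that delay schedule (the real client's base, cap, and retry count may differ):

```python
def backoff_delays(retries: int = 4, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0]
```

In practice clients also add random jitter to each delay so parallel callers do not retry in lockstep.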
Advanced Features
Custom Generation Config
```python
response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-2.0-flash-exp",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)
```
Multi-modal Input
```python
# Image analysis with structured output
from pydantic import BaseModel

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)
```
See references/advanced.md for more patterns.
Comparison: Gemini vs Claude
Use Gemini when:
- Structured output is primary goal
- Cost is a constraint
- Property ordering matters
- Batch processing many simple tasks
Use Claude when:
- Complex reasoning is required
- Deep comprehension of long documents matters (up to Claude's 200K-token context)
- Code generation quality matters
- Nuanced instruction following is needed
Use both:
- Claude for planning/reasoning
- Gemini for structured extraction
- Parallel workflows with different strengths
Token Efficiency Pattern
Gemini Flash is cost-effective for sub-tasks:
```python
# Claude (you) plans the approach;
# Gemini executes structured extractions.
data_points = []
for file in uploaded_files:
    # Gemini extracts structured data
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from {file}",
        pydantic_model=ContactInfo
    )
    data_points.append(result)

# Claude synthesizes results
# ... your analysis here ...
```
Limitations
Not suitable for:
- Tasks requiring deep reasoning
- Long context (>1M tokens)
- Complex code generation
- Subjective creative writing
Token limits:
- gemini-2.0-flash-exp: ~1M input tokens
- gemini-1.5-pro: ~2M input tokens
Rate limits:
- Vary by API tier
- Client handles automatic retry
Examples
See references/examples.md for:
- Data extraction from documents
- Batch classification
- Multi-modal analysis
- Hybrid Claude+Gemini workflows
Troubleshooting
"API key not configured":
```bash
# Create config file
cat > /home/claude/config.json << EOF
{
  "google_api_key": "AIzaSy..."
}
EOF
```
Import errors:
```bash
pip install google-generativeai --break-system-packages
```
Schema validation failures:
- Check Pydantic model definitions
- Ensure prompt is clear about expected structure
- Add examples to prompt if needed
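"Add examples to prompt if needed" can be as simple as embedding one worked example of the expected structure. A sketch (the field names and helper are illustrative):

```python
# One concrete example of the target shape, shown to the model verbatim
EXAMPLE = '{"name": "Ada Lovelace", "email": "ada@example.com"}'

def with_example(task: str) -> str:
    """Append a worked example so the model sees the expected structure."""
    return (
        f"{task}\n\n"
        f"Return JSON matching this example exactly in shape:\n{EXAMPLE}"
    )

prompt = with_example("Extract the contact from the text below.")
print("ada@example.com" in prompt)  # True
```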
Cost Comparison
Approximate pricing (as of 2024):
Gemini 2.0 Flash:
- Input: $0.15 / 1M tokens
- Output: $0.60 / 1M tokens
Claude Sonnet:
- Input: $3.00 / 1M tokens
- Output: $15.00 / 1M tokens
For 1000 simple extraction tasks (~100 input and ~100 output tokens each):
- Gemini Flash: ~$0.10
- Claude Sonnet: ~$2.00
Strategy: Use Claude for complex reasoning, Gemini for high-volume simple tasks.
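As a back-of-envelope check of the figures above (assuming ~100 input and ~100 output tokens per task, at the per-1M-token prices listed):

```python
def batch_cost(n_tasks: int, in_tok: int, out_tok: int,
               in_price: float, out_price: float) -> float:
    """Total batch cost in USD; prices are USD per 1M tokens."""
    return n_tasks * (in_tok * in_price + out_tok * out_price) / 1_000_000

gemini = batch_cost(1000, 100, 100, 0.15, 0.60)
claude = batch_cost(1000, 100, 100, 3.00, 15.00)
print(f"Gemini Flash: ${gemini:.3f}")   # Gemini Flash: $0.075
print(f"Claude Sonnet: ${claude:.2f}")  # Claude Sonnet: $1.80
```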