name	gemini-token-optimization
description	Optimize token usage when delegating to Gemini CLI. Covers token caching, batch queries, model selection (Flash vs Pro), and cost tracking. Use when planning bulk Gemini operations.
allowed-tools	Read, Skill

Gemini Token Optimization

🚨 MANDATORY: Invoke gemini-cli-docs First

STOP - Before providing ANY response about Gemini token usage:

INVOKE gemini-cli-docs skill

QUERY for the specific token or pricing topic

BASE all responses EXCLUSIVELY on official documentation loaded

Overview

Skill for optimizing cost and token usage when delegating to Gemini CLI. Essential for efficient bulk operations and cost-conscious workflows.

When to Use This Skill

Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens

Use this skill when:

Planning bulk Gemini operations
Optimizing costs for large-scale analysis
Choosing between Flash and Pro models
Understanding token caching benefits
Tracking usage across sessions

Token Caching

Gemini CLI automatically caches context to reduce costs by reusing previously processed content.

Availability

Auth Method	Caching Available
API key (Gemini API)	YES
Vertex AI	YES
OAuth (personal/enterprise)	NO

How It Works

System instructions and repeated context are cached
Cached tokens don't count toward billing
View savings via /stats command or JSON output

Maximizing Cache Hits

Use consistent system prompts - Same prefix increases cache reuse
Batch similar queries - Group related analysis together
Reuse context files - Same files in same order

Monitoring Cache Usage

result=$(gemini "query" --output-format json)
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
billable=$((total - cached))
savings=$((cached * 100 / total))

echo "Total: $total tokens"
echo "Cached: $cached tokens ($savings% savings)"
echo "Billable: $billable tokens"

Model Selection

Model Comparison

Model	Context Window	Speed	Cost	Quality
gemini-2.5-flash	Large	Fast	Lower	Good
gemini-2.5-pro	Very large	Slower	Higher	Best

Selection Criteria

Use Flash (-m gemini-2.5-flash) when:

Processing large files (bulk analysis)
Simple extraction tasks
Cost is a primary concern
Speed is critical
Task is straightforward

Use Pro (-m gemini-2.5-pro) when:

Complex reasoning required
Quality is critical
Nuanced analysis needed
Task requires deep understanding
Context exceeds 1M tokens

Model Selection Examples

# Bulk file analysis - use Flash
for file in src/*.ts; do
  gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done

# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts

# Cost tracking with model info
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | keys[0]')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"

Batching Strategy

Why Batch?

Reduces API overhead
Increases cache hit rate
Provides consistent context

Batching Patterns

Pattern 1: Concatenate Files

# Instead of N separate calls
# Do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json

Pattern 2: Batch Prompts

# Combine related questions
gemini "Answer these questions about the codebase:
1. What is the main architecture pattern?
2. How is authentication handled?
3. What database is used?" --output-format json

Pattern 3: Staged Analysis

# First pass: Quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)

# Second pass: Deep dive critical areas with Pro
echo "$overview" | jq -r '.response' | grep "auth\|security" | while read module; do
  gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json
done

Cost Tracking

Per-Query Tracking

result=$(gemini "query" --output-format json)

# Extract all cost-relevant stats
total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')

echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log

Session Tracking

# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0

track_usage() {
  local result="$1"
  local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')

  total_session_tokens=$((total_session_tokens + tokens))
  total_session_cached=$((total_session_cached + cached))
  total_session_calls=$((total_session_calls + 1))
}

# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"

result=$(gemini "query 2" --output-format json)
track_usage "$result"

echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"

Optimization Checklist

Before Large Operations

Choose appropriate model (Flash vs Pro)
Check if caching is available (API key or Vertex)
Plan batching strategy
Set up usage tracking

During Operations

Monitor cache hit rates
Track per-query costs
Adjust model if quality insufficient
Batch similar queries

After Operations

Review total usage
Calculate effective cost
Identify optimization opportunities
Document learnings

Quick Reference

Cost-Saving Commands

# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json

# Check cache effectiveness
gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}'

# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json

Cost Estimation

Rough token estimates:

1 token ~ 4 characters (English)
1 page of code ~ 500-1000 tokens
Typical source file ~ 200-2000 tokens

Keyword Registry (Delegates to gemini-cli-docs)

Topic	Query Keywords
Caching	`token caching`, `cached tokens`, `/stats`
Model selection	`model routing`, `flash vs pro`, `-m flag`
Costs	`quota pricing`, `token usage`, `billing`
Output control	`output format`, `json output`

Test Scenarios

Scenario 1: Check Token Usage

Query: "How do I see how many tokens Gemini used?" Expected Behavior:

Skill activates on "token usage" or "gemini cost"
Provides JSON stats extraction pattern Success Criteria: User receives jq commands to extract token counts

Scenario 2: Reduce Costs

Query: "How do I reduce Gemini CLI costs for bulk analysis?" Expected Behavior:

Skill activates on "cost optimization" or "reduce tokens"
Recommends Flash model and batching Success Criteria: User receives cost optimization strategies

Scenario 3: Model Selection

Query: "Should I use Flash or Pro for this task?" Expected Behavior:

Skill activates on "flash vs pro" or "model selection"
Provides decision criteria table Success Criteria: User receives model comparison and recommendation

References

Query gemini-cli-docs for official documentation on:

"token caching"
"model selection"
"quota and pricing"

Version History

v1.1.0 (2025-12-01): Added Test Scenarios section
v1.0.0 (2025-11-25): Initial release

gemini-token-optimization

Install Skill

SKILL.md