name	run-benchmark
description	Run and interpret the File API vs Inline benchmark for Gemini performance testing. Use when discussing performance optimization, caching strategies, or comparing document upload approaches.
allowed-tools	Bash, Read

Run Benchmark Tool

Purpose

Compare Gemini File API vs Inline document approaches for performance.

What It Tests

File API: Upload documents once, reuse cached URIs for multiple queries
Inline: Send raw document bytes with each request

The benchmark shuffles document order each round to prevent Gemini's native caching from affecting results.

Running the Benchmark

Basic Usage

export GEMINI_API_KEY="your-key"
make build
./bin/benchmark -docs test_loan_files/loan_file_1_LN-2024-001847

With Options

./bin/benchmark \
  -docs /path/to/documents \
  -rounds 20 \
  -max-docs 10 \
  -json

CLI Flags

-docs string      Directory containing documents (required)
-rounds int       Number of test rounds per method (default 10)
-max-docs int     Maximum income documents to use (default 6)
-json             Output results as JSON
-income           Only use income documents (default true)

Using Makefile

make benchmark DOCS=test_loan_files/loan_file_1_LN-2024-001847 ROUNDS=10

Understanding Results

Output Structure

PHASE 1: INLINE DOCUMENTS
  Round 1: [shuffled order] -> time, tokens
  ...

PHASE 2: FILE API
  Upload: X seconds (one-time)
  Round 1: [shuffled order] -> time, tokens
  ...

FINAL COMPARISON
  - Total time comparison
  - Average per-query time
  - Token usage
  - Winner & speedup factor
  - Break-even analysis

Key Metrics

Metric	Meaning
Upload time	One-time cost for File API
Total time	Sum of all operations
Avg per round	Mean time per query
Min/Max round	Query time variance
Speedup	How much faster winner is
Break-even	Queries needed for File API to win

Interpreting Results

File API wins when:

Many queries against same documents
Break-even point is low (< 10 queries)
Per-query savings compound

Inline wins when:

Few queries (< break-even)
Different documents each time
Simplicity preferred

Example Output

TIMING COMPARISON
┌─────────────────┬──────────────────┬──────────────────┐
│ Metric          │ File API         │ Inline Docs      │
├─────────────────┼──────────────────┼──────────────────┤
│ Upload (1x)     │           1.976s │              N/A │
│ Total time      │        1m13.733s │        1m14.454s │
│ Avg per round   │           7.176s │           7.445s │
└─────────────────┴──────────────────┴──────────────────┘

BREAK-EVEN ANALYSIS
   Upload overhead:      1.976s
   Savings per query:    270ms
   Break-even at:        7.3 queries

Why Shuffled Order?

Documents are shuffled each round because:

Gemini may cache based on content/order
Shuffling ensures each query is "fresh"
Gives accurate per-query timing
More realistic for production workloads

Test Questions

The benchmark uses varied income-related questions:

Annual/monthly income extraction
Employer information
YTD income calculation
Deductions and withholdings
Income source classification
Tax year coverage
And more...

Recommendations

Scenario	Recommendation
Underwriter iterating on loan	File API
One-off document analysis	Inline
Batch processing same docs	File API
Real-time different docs	Inline

Related Files

cmd/benchmark/main.go - Benchmark implementation
internal/gemini/client.go - Both API approaches
internal/gemini/cache.go - File caching logic

run-benchmark

Install Skill

SKILL.md