benchmark-analyzer

name	benchmark-analyzer
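description	Triggers when the user asks for things like "run the benchmarks", "measure performance", "analyze performance", "measure the speed", or "benchmark this". Runs Go benchmarks, analyzes the results, identifies bottlenecks, and proposes improvements. If no benchmarks exist yet, creates them first.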
allowed-tools	Read, Write, Edit, Grep, Glob, Bash

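Procedure

Identify the benchmark target:
- If the user names a function: benchmark that function
- If not: detect performance-critical functions
Check for existing benchmarks:
- Search *_test.go files for Benchmark functions
- If none exist, generate benchmark functions
Run the benchmarks:
- Run go test -bench=. -benchmem -benchtime=3s
- Run several times to get stable results
Analyze the results:
- ns/op (nanoseconds per operation)
- B/op (bytes allocated per operation)
- allocs/op (number of allocations per operation)
Identify bottlenecks:
- CPU profiling (as needed)
- Memory profiling (when allocations are high)
- go test -bench=. -cpuprofile=cpu.prof
Suggest improvements:
- Present concrete optimizations
- State the trade-offs (readability vs. performance)
Generate the report: output the results and suggestions in Markdown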
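
The "analyze the results" step can be sketched in Go as follows. This is only an illustration: the `BenchResult` type and `ParseBenchLine` function are hypothetical names introduced here, and the sketch assumes the usual `go test -bench -benchmem` line layout of a name, an iteration count, and then "value unit" pairs.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// BenchResult holds the metrics from one benchmark output line.
// The type and its field names are hypothetical, chosen for this sketch.
type BenchResult struct {
    Name        string
    Iterations  int64
    NsPerOp     float64
    BytesPerOp  int64
    AllocsPerOp int64
}

// ParseBenchLine parses a line such as
//   BenchmarkSum-8 50000000 25.3 ns/op 0 B/op 0 allocs/op
// Values are matched by the unit that follows them; unknown units are ignored.
func ParseBenchLine(line string) (BenchResult, error) {
    fields := strings.Fields(line)
    if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
        return BenchResult{}, fmt.Errorf("not a benchmark result line: %q", line)
    }
    n, err := strconv.ParseInt(fields[1], 10, 64)
    if err != nil {
        return BenchResult{}, fmt.Errorf("bad iteration count: %w", err)
    }
    res := BenchResult{Name: fields[0], Iterations: n}
    // The remaining fields come in "value unit" pairs.
    for i := 2; i+1 < len(fields); i += 2 {
        value, unit := fields[i], fields[i+1]
        switch unit {
        case "ns/op":
            res.NsPerOp, err = strconv.ParseFloat(value, 64)
        case "B/op":
            res.BytesPerOp, err = strconv.ParseInt(value, 10, 64)
        case "allocs/op":
            res.AllocsPerOp, err = strconv.ParseInt(value, 10, 64)
        }
        if err != nil {
            return BenchResult{}, fmt.Errorf("bad value for %s: %w", unit, err)
        }
    }
    return res, nil
}
```

Lines that are not benchmark results (for example `PASS` or the `ok` summary line) return an error, so a caller can skip them while scanning the full output.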

Example of generated benchmark functions

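```go
// Place in a _test.go file in the same package as Sum.
import "testing"

// sink receives each result so the compiler cannot discard the Sum call.
var sink int

// BenchmarkSum measures Sum on a small, fixed 10-element slice.
func BenchmarkSum(b *testing.B) {
    nums := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    b.ResetTimer() // exclude setup from the measurement
    for i := 0; i < b.N; i++ {
        sink = Sum(nums)
    }
}

// BenchmarkSumLarge measures Sum on a 10,000-element slice.
func BenchmarkSumLarge(b *testing.B) {
    nums := make([]int, 10000)
    for i := range nums {
        nums[i] = i
    }
    b.ResetTimer() // exclude slice construction from the measurement
    for i := 0; i < b.N; i++ {
        sink = Sum(nums)
    }
}
```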
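
As a possible variation, the separate small and large benchmarks can be folded into one table-driven benchmark with `b.Run`, which makes it easy to add more input sizes. This is a sketch under assumptions: the `Sum` body below is a stand-in for the real function under test, `BenchmarkSumSizes` and `sink` are hypothetical names, and in a real project the benchmark would live in a `_test.go` file rather than in `package main`.

```go
package main

import (
    "fmt"
    "testing"
)

// Sum is a stand-in for the function under test (assumed implementation).
func Sum(nums []int) int {
    total := 0
    for _, n := range nums {
        total += n
    }
    return total
}

// sink keeps the result alive so the compiler cannot drop the call.
var sink int

// BenchmarkSumSizes runs one sub-benchmark per input size.
func BenchmarkSumSizes(b *testing.B) {
    for _, size := range []int{10, 1000, 100000} {
        // Build the input outside b.Run so setup is not measured.
        nums := make([]int, size)
        for i := range nums {
            nums[i] = i
        }
        b.Run(fmt.Sprintf("n=%d", size), func(b *testing.B) {
            b.ReportAllocs() // report B/op and allocs/op even without -benchmem
            for i := 0; i < b.N; i++ {
                sink = Sum(nums)
            }
        })
    }
}
```

Each size then appears as its own result line (for example `BenchmarkSumSizes/n=10-8`), and a single size can be selected with `-bench='SumSizes/n=1000'`.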

Analysis report format

# Benchmark Analysis Report

## Summary
Function: `Sum()`
Date: 2025-10-23

## Benchmark Results

### Small Input (10 elements)

BenchmarkSum-8 50000000 25.3 ns/op 0 B/op 0 allocs/op


### Large Input (10,000 elements)

BenchmarkSumLarge-8 100000 11234 ns/op 0 B/op 0 allocs/op


## Analysis

### Performance Characteristics
- ✅ **Good**: Zero allocations - operates on input slice without copying
- ✅ **Good**: Linear time complexity O(n) as expected
- ⚠️ **Note**: Performance scales linearly with input size

### Bottlenecks
None detected. Implementation is optimal for this use case.

## Optimization Opportunities

### 1. SIMD Optimization (Advanced)
For very large slices (>100k elements), could consider SIMD instructions.
**Trade-off**: Significant complexity increase for marginal gains.
**Recommendation**: Not worth it for this use case.

### 2. Parallel Processing
For slices >10M elements, could use goroutines.
```go
func SumParallel(nums []int) int {
    // Split into chunks, sum in parallel, combine
}
```

**Trade-off**: Overhead makes it slower for typical use cases.
**Recommendation**: Only if consistently processing huge datasets.

## Comparison with Standard Library

| Implementation | Time (ns/op) | Allocs |
|----------------|--------------|--------|
| Current Sum()  | 25.3         | 0      |
| Naive approach | 28.1         | 0      |

Current implementation is 10% faster than naive approach.

## Recommendations

- ✅ **Keep current implementation** - Already optimized
- ✅ **Add benchmark to CI** - Detect performance regressions
- ❌ **Don't optimize further** - Diminishing returns

## Action Items

- Add benchmarks to test suite
- Document performance characteristics in GoDoc
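
The parallel-processing option in the sample report is only outlined as a comment. One way it might be filled in is shown below, as a minimal sketch rather than a recommended implementation: it assumes one chunk per available CPU (via `runtime.GOMAXPROCS`) and gives each goroutine its own result slot to avoid locking.

```go
package main

import (
    "runtime"
    "sync"
)

// SumParallel splits nums into chunks, sums each chunk in its own goroutine,
// and then combines the partial sums.
func SumParallel(nums []int) int {
    workers := runtime.GOMAXPROCS(0)
    if workers > len(nums) {
        workers = len(nums)
    }
    if workers <= 1 {
        // Too little data (or a single CPU): a plain loop is simpler and faster.
        total := 0
        for _, n := range nums {
            total += n
        }
        return total
    }

    partial := make([]int, workers)              // one slot per goroutine, so no locking
    chunk := (len(nums) + workers - 1) / workers // ceiling division

    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        start := w * chunk
        if start >= len(nums) {
            break // ceiling division can leave the last workers without data
        }
        end := start + chunk
        if end > len(nums) {
            end = len(nums)
        }
        wg.Add(1)
        go func(slot int, part []int) {
            defer wg.Done()
            sum := 0
            for _, n := range part {
                sum += n
            }
            partial[slot] = sum
        }(w, nums[start:end])
    }
    wg.Wait()

    total := 0
    for _, s := range partial {
        total += s
    }
    return total
}
```

As the report notes, goroutine startup and synchronization cost usually outweigh the gain on small inputs, so a version like this should be benchmarked against the sequential `Sum` before it is adopted.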


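### Performance optimization considerations

#### Reducing memory allocations
- Avoid unnecessary `append` calls
- Preallocate slice capacity (`make([]T, 0, capacity)`)
- Reuse objects with `sync.Pool`
- Weigh the pointer vs. value trade-off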
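
To illustrate the preallocation point, the sketch below builds the same slice with and without reserved capacity and counts heap allocations using `testing.AllocsPerRun`. The function names (`squaresAppend`, `squaresPrealloc`, `allocsPerCall`) and the `result` variable are hypothetical, introduced only for this example.

```go
package main

import "testing"

// squaresAppend grows the slice on demand, so append reallocates several times.
func squaresAppend(n int) []int {
    var out []int
    for i := 0; i < n; i++ {
        out = append(out, i*i)
    }
    return out
}

// squaresPrealloc reserves capacity up front, so append never has to reallocate.
func squaresPrealloc(n int) []int {
    out := make([]int, 0, n)
    for i := 0; i < n; i++ {
        out = append(out, i*i)
    }
    return out
}

// result keeps the slices reachable so they are heap-allocated as in real use.
var result []int

// allocsPerCall reports the average number of heap allocations per call of build(n).
func allocsPerCall(build func(int) []int, n int) float64 {
    return testing.AllocsPerRun(100, func() { result = build(n) })
}
```

Comparing `allocsPerCall(squaresAppend, 1000)` with `allocsPerCall(squaresPrealloc, 1000)` should show the preallocated version doing fewer allocations. The exact counts depend on the Go version's slice growth policy, so they are not quoted here.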

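#### CPU optimization
- Loop unrolling (the Go compiler does little of this automatically, so unroll by hand only when measurements justify it)
- Keep branch prediction in mind (optimize the hot path)
- Inlining (small functions)
- Avoid unnecessary type conversions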

#### Profiling commands
```bash
# CPU profile
go test -bench=. -cpuprofile=cpu.prof
go tool pprof cpu.prof

# Memory profile
go test -bench=. -memprofile=mem.prof
go tool pprof mem.prof

# Execution trace
go test -bench=. -trace=trace.out
go tool trace trace.out
```

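### Benchmark best practices

- Use `b.ResetTimer()` to exclude setup cost
- Test with multiple input sizes (small, medium, large)
- Use `-benchtime=3s` to get stable results
- Use `-count=5` for statistically meaningful results
- Use dedicated benchmark data (avoid production data)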
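
When a benchmark is repeated with `-count=5`, the five ns/op values still need to be summarized. The sketch below computes their mean and sample standard deviation. `MeanStddev` is a hypothetical helper written for this example, and in practice the `benchstat` tool from `golang.org/x/perf` does this kind of comparison more rigorously.

```go
package main

import "math"

// MeanStddev returns the mean and the sample standard deviation of the given
// ns/op samples, for example the values reported by five runs with -count=5.
func MeanStddev(samples []float64) (mean, stddev float64) {
    if len(samples) == 0 {
        return 0, 0
    }
    for _, s := range samples {
        mean += s
    }
    mean /= float64(len(samples))
    if len(samples) < 2 {
        return mean, 0 // a single sample has no spread to estimate
    }
    var sumSq float64
    for _, s := range samples {
        d := s - mean
        sumSq += d * d
    }
    // Divide by n-1 (Bessel's correction) because these are samples, not a population.
    return mean, math.Sqrt(sumSq / float64(len(samples)-1))
}
```

A large standard deviation relative to the mean suggests the machine was noisy during the run, and that differences of similar size between two implementations should not be trusted.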

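### Warning: premature optimization

- Correctness first, then readability, then performance
- Measure before optimizing (do not guess)
- Identify the bottleneck before addressing it
- Express performance improvements in concrete numbers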
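
As a small illustration of stating improvements in concrete numbers, the sketch below turns two ns/op measurements into a percentage. `PercentFaster` is a hypothetical helper, and it defines "faster" as the reduction in time per operation relative to the old value.

```go
package main

// PercentFaster returns how much less time per operation newNs takes compared
// with oldNs, as a percentage of oldNs. A negative result means a regression.
func PercentFaster(oldNs, newNs float64) float64 {
    if oldNs == 0 {
        return 0 // avoid dividing by zero on a missing baseline
    }
    return (oldNs - newNs) / oldNs * 100
}
```

With the sample report's figures, `PercentFaster(28.1, 25.3)` evaluates (28.1 - 25.3) / 28.1, which is just under 10%.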
