---
name: benchmark
description: Benchmark HBF performance using Apache Bench (ab). Measures static asset serving, versioned filesystem operations, and QuickJS runtime routes. Maintains historical data for performance tracking over time.
---

# HBF Benchmark Skill

This skill provides reproducible performance benchmarking for HBF using Apache Bench (ab). It tests three main categories of operations and maintains historical data for comparison over time.

## Benchmark Categories

### 1. Static Assets (`latest_fs` reads)

Tests serving static files from the embedded SQLite database via the `latest_files` view (example `ab` invocation below).

Endpoints:

- `/static/style.css` - Small CSS file
- `/static/favicon.ico` - Empty file
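
A single category run boils down to a plain `ab` invocation; a minimal sketch, assuming the server is already running on the default port (`-n` is the total request count, `-c` the concurrency):

```bash
# 1,000 requests, 10 concurrent connections, against a static asset
ab -n 1000 -c 10 http://localhost:5309/static/style.css
```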

### 2. QuickJS Runtime Routes

Tests endpoints that require QuickJS execution and JSON serialization.

Endpoints:

- `/hello` - Simple JSON response
- `/user/42` - Parameterized route with path parsing
- `/echo` - Echo endpoint reflecting request details

### 3. Versioned Filesystem Writes (dev mode)

Tests write operations to the versioned filesystem (requires the `--dev` flag); an example invocation follows the endpoint list.

Endpoints:

- `PUT /__dev/api/file?name=static/bench.txt` - Write a new version to the versioned filesystem
- `GET /__dev/api/file?name=static/bench.txt` - Read from the versioned filesystem
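
`ab` can drive the PUT endpoint through its `-u` (file containing the PUT body) and `-T` (Content-Type) flags. A minimal sketch, assuming a dev-mode server is running and using a hypothetical local payload file:

```bash
# Benchmark versioned-filesystem writes (server must be started with --dev)
echo "benchmark payload" > /tmp/bench-payload.txt
ab -n 1000 -c 10 -u /tmp/bench-payload.txt -T "text/plain" \
  "http://localhost:5309/__dev/api/file?name=static/bench.txt"
```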

## Benchmark Parameters

Default test parameters (configurable):

- Requests: 1,000 per endpoint
- Concurrency: 10 concurrent connections
- Port: 5309 (HBF default)

These defaults keep test runs short while still producing stable averages; raise the request count for finer-grained comparisons.

## Historical Data Storage

Results are stored in `/workspaces/hbf/.benchmark/results.db` using SQLite:

```sql
CREATE TABLE IF NOT EXISTS benchmark_runs (
    run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    git_commit TEXT,
    git_branch TEXT,
    build_mode TEXT,
    notes TEXT
);

CREATE TABLE IF NOT EXISTS benchmark_results (
    result_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER NOT NULL,
    category TEXT NOT NULL,
    endpoint TEXT NOT NULL,
    requests INTEGER NOT NULL,
    concurrency INTEGER NOT NULL,
    time_taken REAL NOT NULL,
    requests_per_sec REAL NOT NULL,
    time_per_request REAL NOT NULL,
    transfer_rate REAL NOT NULL,
    failed_requests INTEGER NOT NULL,
    FOREIGN KEY (run_id) REFERENCES benchmark_runs(run_id)
);
```
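
As an illustration of how the run-to-run comparison can be derived from this schema, the following query (a sketch; `show_results.sh` may implement it differently) pulls each endpoint's throughput for the two most recent runs:

```bash
sqlite3 /workspaces/hbf/.benchmark/results.db "
  SELECT endpoint, run_id, requests_per_sec
  FROM benchmark_results
  WHERE run_id IN (SELECT run_id FROM benchmark_runs
                   ORDER BY run_id DESC LIMIT 2)
  ORDER BY endpoint, run_id;
"
```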

## Usage

When you invoke this skill, it will:

  1. Build HBF binary (optimized release mode by default)
  2. Start server in background on port 5309
  3. Run benchmarks for all endpoints across three categories
  4. Store results in historical database
  5. Display summary with comparison to previous runs
  6. Clean up (stop server, save results)

## Commands

### Run Full Benchmark Suite

```bash
# Default: 1,000 requests, concurrency 10
./benchmark.sh

# Custom parameters
./benchmark.sh --requests 50000 --concurrency 50

# With build mode specification
./benchmark.sh --build-mode opt --requests 10000

# Add notes for this run
./benchmark.sh --notes "After QuickJS optimization"
```

### View Historical Results

```bash
# Show recent benchmark runs
./show_results.sh

# Compare specific runs
./show_results.sh --compare run1 run2

# Show trend for a specific endpoint
./show_results.sh --trend /hello
```

## Benchmark Script Implementation

The main `benchmark.sh` script performs:

1. **Environment setup**
   - Check prerequisites (`ab`, `sqlite3`)
   - Initialize the results database
   - Get git metadata (commit, branch)
2. **Build binary**
   - `bazel build //:hbf --compilation_mode=opt`
   - Use consistent build flags for reproducibility
3. **Start server** (sketched below)
   - Launch in the background with `--port 5309`
   - Launch a dev-mode server with `--dev` for the write tests
   - Wait for a health check
4. **Run benchmarks**
   - Category 1: Static assets
   - Category 2: QuickJS routes
   - Category 3: Versioned filesystem ops (with `--dev`)
5. **Parse results**
   - Extract key metrics from `ab` output
   - Store them in the SQLite database
6. **Generate report**
   - Summary table with all endpoints
   - Comparison with the previous run (if available)
   - Highlighted regressions/improvements
7. **Cleanup**
   - Kill background servers
   - Save the database
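
A minimal sketch of the start/stop logic behind steps 3 and 7. This is hypothetical rather than the actual `lib/server.sh`: the binary path assumes Bazel's usual `bazel-bin` layout, and polling `/` as a health check is an assumption about HBF's routes:

```bash
start_server() {
  # Launch the freshly built binary in the background (--dev enables write endpoints)
  ./bazel-bin/hbf --port 5309 --dev &
  SERVER_PID=$!

  # Poll until the server answers, up to ~5 seconds
  for _ in $(seq 1 50); do
    if curl -sf "http://localhost:5309/" > /dev/null; then
      return 0
    fi
    sleep 0.1
  done
  echo "server failed to start" >&2
  return 1
}

stop_server() {
  # Step 7: kill the background server and reap it
  kill "$SERVER_PID" 2>/dev/null
  wait "$SERVER_PID" 2>/dev/null
}
```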

## Output Format

The skill generates a markdown-formatted report:

```markdown
# HBF Benchmark Results

**Run ID:** 42
**Timestamp:** 2025-01-15 10:30:00
**Commit:** abc1234
**Branch:** main
**Build Mode:** opt

## Summary

| Category | Endpoint | Req/sec | Avg Time (ms) | Failed |
|----------|----------|---------|---------------|--------|
| Static | /static/style.css | 45,230 | 0.22 | 0 |
| Runtime | /hello | 38,500 | 0.26 | 0 |
| Runtime | /user/42 | 37,800 | 0.26 | 0 |
| FS Write | PUT /__dev/api/file | 8,500 | 1.18 | 0 |

## Comparison with Previous Run

| Endpoint | Previous | Current | Change |
|----------|----------|---------|--------|
| /hello | 38,200 | 38,500 | +0.8% 📈 |
| /static/style.css | 44,800 | 45,230 | +1.0% 📈 |

## Recommendations

✅ All endpoints performing within expected ranges
⚠️  Consider investigating if any regressions > 5%
```

## Files

This skill includes:

- `SKILL.md` (this file) - Skill documentation
- `benchmark.sh` - Main benchmark runner script
- `show_results.sh` - Historical results viewer
- `lib/db.sh` - Database operations helper
- `lib/server.sh` - Server lifecycle management
- `lib/parser.sh` - Apache Bench output parser (sketched below)
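
`ab` prints its metrics in a stable plain-text format, so parsing can be a handful of `awk` matches. A hypothetical sketch of what `lib/parser.sh` might do (the real script may differ):

```bash
# Extract key metrics from a captured ab run; $1 is a file holding ab's stdout
parse_ab_output() {
  # Example ab line: "Requests per second:    38500.00 [#/sec] (mean)"
  awk '/^Time taken for tests:/        {print "time_taken=" $5}
       /^Requests per second:/         {print "requests_per_sec=" $4}
       /^Time per request:.*\(mean\)$/ {print "time_per_request=" $4}
       /^Transfer rate:/               {print "transfer_rate=" $3}
       /^Failed requests:/             {print "failed_requests=" $3}' "$1"
}
```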

## Prerequisites

- Apache Bench (`ab`), included in `apache2-utils`
- SQLite3 CLI (`sqlite3`)
- Bazel build system
- Git (for commit metadata)

## Reproducibility

For reproducible benchmarks:

1. **Consistent environment:** run on the same hardware/VM
2. **Isolated execution:** close other applications
3. **Fixed parameters:** use the same request count and concurrency
4. **Build mode:** use `--compilation_mode=opt` for release builds
5. **System state:** run when the system is idle (low load)

## Tips

- Run benchmarks multiple times and average the results (see the query sketch below)
- Use `--notes` to document changes between runs
- Compare like build modes (opt vs. opt, dbg vs. dbg)
- Watch for failed requests (the count should always be 0)
- Monitor system resources during the benchmark
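
Averaging across runs (the first tip) can lean on the schema above; a sketch, using `/hello` as an example endpoint:

```bash
sqlite3 /workspaces/hbf/.benchmark/results.db "
  SELECT endpoint,
         ROUND(AVG(requests_per_sec), 1) AS avg_req_per_sec,
         COUNT(*) AS samples
  FROM benchmark_results
  WHERE endpoint = '/hello'
  GROUP BY endpoint;
"
```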

## Example Workflow

```bash
# Initial baseline
./benchmark.sh --notes "Baseline before optimization"

# Make code changes...
# (edit versioned filesystem code)

# Run new benchmark
./benchmark.sh --notes "After index optimization"

# Compare the two most recent runs (IDs pulled straight from the results DB)
./show_results.sh --compare $(sqlite3 .benchmark/results.db "SELECT run_id FROM benchmark_runs ORDER BY run_id DESC LIMIT 2")
```

## Performance Expectations

Typical performance on modern hardware (2020+ CPU):

- Static assets: 40,000-60,000 req/sec
- QuickJS routes: 30,000-50,000 req/sec
- FS writes: 5,000-10,000 req/sec (limited by SQLite WAL)
- FS reads: 35,000-50,000 req/sec

Lower performance may indicate:

- A debug build (use `--compilation_mode=opt`)
- High system load
- Disk I/O bottlenecks
- Memory pressure

## Limitations

- Tests only GET and PUT operations (no DELETE benchmarks)
- Single-threaded HBF server (only one core is utilized)
- Local benchmarking only (no network latency)
- SQLite WAL mode may affect write performance
- Apache Bench cannot exercise SSE endpoints

## Future Enhancements

Potential improvements:

- Add percentile latency measurements (p50, p95, p99) - see the note below
- Test different concurrency levels automatically
- Add memory profiling integration
- Support custom pod benchmarking
- Add a warmup phase before measurements
- Export results to CSV/JSON for external analysis
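
On the percentile item: `ab` can already emit per-percentile latency data via its `-e` flag, which writes a CSV mapping each percentage of requests (1-100) to the time it took to serve them; a possible starting point:

```bash
# Write percentile latency data to a CSV alongside the normal report
ab -n 1000 -c 10 -e /tmp/hello-percentiles.csv http://localhost:5309/hello
```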