---
name: benchmark
description: Benchmark HBF performance using Apache Bench (ab). Measures static asset serving, versioned filesystem operations, and QuickJS runtime routes. Maintains historical data for performance tracking over time.
---

# HBF Benchmark Skill
This skill provides reproducible performance benchmarking for HBF using Apache Bench (ab). It tests three main categories of operations and maintains historical data for comparison over time.
## Benchmark Categories

### 1. Static Assets (`latest_fs` reads)
Tests serving static files from the embedded SQLite database via the latest_files view.
Endpoints:
- `/static/style.css` - Small CSS file
- `/static/favicon.ico` - Empty file
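For example, measuring the CSS route by hand with the default parameters looks like this (a sketch; it assumes the server is already running on port 5309):

```bash
# 1,000 requests, 10 concurrent connections against the small CSS file.
ab -n 1000 -c 10 http://localhost:5309/static/style.css
```

The `Requests per second` and `Failed requests` lines in the output correspond to the metrics the skill records.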
### 2. QuickJS Runtime Routes
Tests endpoints that require QuickJS execution and JSON serialization.
Endpoints:
- `/hello` - Simple JSON response
- `/user/42` - Parameterized route with path parsing
- `/echo` - Echo endpoint reflecting request details
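Before a full run it can help to sanity-check these routes by hand; a quick sketch, assuming the server is listening locally on port 5309:

```bash
# Each route should answer with a JSON body and a 200 status.
curl -s http://localhost:5309/hello
curl -s http://localhost:5309/user/42
curl -s http://localhost:5309/echo
```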
### 3. Versioned Filesystem Writes (dev mode)

Tests write operations to the versioned filesystem (requires the `--dev` flag).
Endpoints:
- `PUT /__dev/api/file?name=static/bench.txt` - Write a new version to the versioned filesystem
- `GET /__dev/api/file?name=static/bench.txt` - Read from the versioned filesystem
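Apache Bench sends PUT requests when given a body file via `-u`. A sketch of how the write benchmark can be driven against a dev-mode server on port 5309 (the payload path is illustrative):

```bash
# Create a small payload, then PUT it repeatedly; each successful request
# writes a new version of static/bench.txt to the versioned filesystem.
echo "benchmark payload" > /tmp/bench_payload.txt
ab -n 1000 -c 10 -u /tmp/bench_payload.txt -T text/plain \
  "http://localhost:5309/__dev/api/file?name=static/bench.txt"
```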
## Benchmark Parameters
Default test parameters (configurable):
- Requests: 1,000 requests per endpoint
- Concurrency: 10 concurrent connections
- Port: 5309 (HBF default)
These defaults keep runs short while still producing stable, comparable throughput numbers.
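Inside `benchmark.sh`, these parameters map directly onto `ab`'s `-n` and `-c` flags; a minimal sketch (the wrapper's variable names are illustrative):

```bash
REQUESTS="${REQUESTS:-1000}"      # -n: total requests per endpoint
CONCURRENCY="${CONCURRENCY:-10}"  # -c: concurrent connections
PORT="${PORT:-5309}"              # HBF default port

ab -n "$REQUESTS" -c "$CONCURRENCY" "http://localhost:${PORT}/hello"
```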
## Historical Data Storage

Results are stored in `/workspaces/hbf/.benchmark/results.db` using SQLite:
```sql
CREATE TABLE IF NOT EXISTS benchmark_runs (
run_id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
git_commit TEXT,
git_branch TEXT,
build_mode TEXT,
notes TEXT
);
CREATE TABLE IF NOT EXISTS benchmark_results (
result_id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id INTEGER NOT NULL,
category TEXT NOT NULL,
endpoint TEXT NOT NULL,
requests INTEGER NOT NULL,
concurrency INTEGER NOT NULL,
time_taken REAL NOT NULL,
requests_per_sec REAL NOT NULL,
time_per_request REAL NOT NULL,
transfer_rate REAL NOT NULL,
failed_requests INTEGER NOT NULL,
FOREIGN KEY (run_id) REFERENCES benchmark_runs(run_id)
);
```
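To inspect stored results directly with the `sqlite3` CLI, join the two tables; the query below is illustrative:

```bash
sqlite3 -header -column /workspaces/hbf/.benchmark/results.db "
  SELECT b.run_id, b.timestamp, r.category, r.endpoint,
         r.requests_per_sec, r.failed_requests
  FROM benchmark_results r
  JOIN benchmark_runs b ON b.run_id = r.run_id
  ORDER BY b.run_id DESC, r.category
  LIMIT 20;"
```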
## Usage
When you invoke this skill, it will:
- Build HBF binary (optimized release mode by default)
- Start server in background on port 5309
- Run benchmarks for all endpoints across three categories
- Store results in historical database
- Display summary with comparison to previous runs
- Clean up (stop server, save results)
## Commands

### Run Full Benchmark Suite
```bash
# Default: 1k requests, concurrency 10
./benchmark.sh
# Custom parameters
./benchmark.sh --requests 50000 --concurrency 50
# With build mode specification
./benchmark.sh --build-mode opt --requests 10000
# Add notes for this run
./benchmark.sh --notes "After QuickJS optimization"
```
### View Historical Results

```bash
# Show recent benchmark runs
./show_results.sh
# Compare specific runs
./show_results.sh --compare run1 run2
# Show trend for specific endpoint
./show_results.sh --trend /hello
```
## Benchmark Script Implementation

The main `benchmark.sh` script performs the following steps:

1. **Environment setup**
   - Check prerequisites (`ab`, `sqlite3`)
   - Initialize the results database
   - Get git metadata (commit, branch)
2. **Build binary**
   - `bazel build //:hbf --compilation_mode=opt`
   - Use consistent build flags for reproducibility
3. **Start server**
   - Launch in background with `--port 5309`
   - Launch dev mode server with `--dev` for write tests
   - Wait for health check
4. **Run benchmarks**
   - Category 1: Static assets
   - Category 2: QuickJS routes
   - Category 3: Versioned filesystem ops (with `--dev`)
5. **Parse results**
   - Extract key metrics from `ab` output (see the sketch after this list)
   - Store them in the SQLite database
6. **Generate report**
   - Summary table with all endpoints
   - Comparison with the previous run (if available)
   - Highlight regressions/improvements
7. **Cleanup**
   - Kill background servers
   - Save the database
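A sketch of two of the helpers referenced above, the health-check wait and the `ab` output parsing; the real `lib/server.sh` and `lib/parser.sh` may be structured differently:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Block until the server answers on a known route (here /hello) or give up.
wait_for_server() {
  local url="http://localhost:5309/hello"
  for _ in $(seq 1 50); do
    curl -fsS "$url" > /dev/null 2>&1 && return 0
    sleep 0.2
  done
  echo "server did not become ready" >&2
  return 1
}

# Extract the metrics stored in benchmark_results from ab's plain-text report.
parse_ab_output() {
  awk '
    /^Time taken for tests:/        { print "time_taken=" $5 }
    /^Requests per second:/         { print "requests_per_sec=" $4 }
    /^Time per request:.*\(mean\)$/ { print "time_per_request=" $4 }
    /^Transfer rate:/               { print "transfer_rate=" $3 }
    /^Failed requests:/             { print "failed_requests=" $3 }
  ' <<< "$1"
}
```

The resulting key=value pairs can be fed straight into an INSERT against `benchmark_results`.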
## Output Format
The skill generates a markdown-formatted report:
```markdown
# HBF Benchmark Results
**Run ID:** 42
**Timestamp:** 2025-01-15 10:30:00
**Commit:** abc1234
**Branch:** main
**Build Mode:** opt
## Summary
| Category | Endpoint | Req/sec | Avg Time (ms) | Failed |
|----------|----------|---------|---------------|--------|
| Static | /static/style.css | 45,230 | 0.22 | 0 |
| Runtime | /hello | 38,500 | 0.26 | 0 |
| Runtime | /user/42 | 37,800 | 0.26 | 0 |
| FS Write | PUT /__dev/api/file | 8,500 | 1.18 | 0 |
## Comparison with Previous Run
| Endpoint | Previous | Current | Change |
|----------|----------|---------|--------|
| /hello | 38,200 | 38,500 | +0.8% 📈 |
| /static/style.css | 44,800 | 45,230 | +1.0% 📈 |
## Recommendations
✅ All endpoints performing within expected ranges
⚠️ Consider investigating if any regressions > 5%
```
## Files

This skill includes:

- `SKILL.md` (this file) - Skill documentation
- `benchmark.sh` - Main benchmark runner script
- `show_results.sh` - Historical results viewer
- `lib/db.sh` - Database operations helper
- `lib/server.sh` - Server lifecycle management
- `lib/parser.sh` - Apache Bench output parser
## Prerequisites

- Apache Bench (`ab`), included in `apache2-utils`
- SQLite3 CLI (`sqlite3`)
- Bazel build system
- Git (for commit metadata)
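The prerequisite check in the environment-setup step can be as small as this sketch:

```bash
# Fail fast if any required tool is missing from PATH.
for tool in ab sqlite3 bazel git; do
  command -v "$tool" > /dev/null 2>&1 || { echo "missing prerequisite: $tool" >&2; exit 1; }
done
```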
## Reproducibility
For reproducible benchmarks:
- Consistent environment: Run on same hardware/VM
- Isolated execution: Close other applications
- Fixed parameters: Use same request count and concurrency
- Build mode: Use `--compilation_mode=opt` for release builds
- System state: Run when the system is idle (low load)
## Tips

- Run benchmarks multiple times and average the results (see the sketch after this list)
- Use `--notes` to document changes between runs
- Compare similar build modes (opt vs opt, dbg vs dbg)
- Watch for failed requests (should always be 0)
- Monitor system resources during benchmark
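For the first tip, one way to average results is to repeat the suite and aggregate over the results database; a sketch (run count and note text are arbitrary):

```bash
# Run the suite three times, then average requests/sec per endpoint
# across the three most recent runs.
for i in 1 2 3; do
  ./benchmark.sh --notes "averaging run $i"
done

sqlite3 -header -column /workspaces/hbf/.benchmark/results.db "
  SELECT endpoint, AVG(requests_per_sec) AS avg_req_per_sec
  FROM benchmark_results
  WHERE run_id IN (SELECT run_id FROM benchmark_runs ORDER BY run_id DESC LIMIT 3)
  GROUP BY endpoint;"
```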
## Example Workflow

```bash
# Initial baseline
./benchmark.sh --notes "Baseline before optimization"
# Make code changes...
# (edit versioned filesystem code)
# Run new benchmark
./benchmark.sh --notes "After index optimization"
# Compare results
./show_results.sh --compare $(sqlite3 .benchmark/results.db "SELECT run_id FROM benchmark_runs ORDER BY run_id DESC LIMIT 2")
```
## Performance Expectations
Typical performance on modern hardware (2020+ CPU):
- Static assets: 40,000-60,000 req/sec
- QuickJS routes: 30,000-50,000 req/sec
- FS writes: 5,000-10,000 req/sec (limited by SQLite WAL)
- FS reads: 35,000-50,000 req/sec
Lower performance may indicate:
- Debug build mode (use `--compilation_mode=opt`)
- High system load
- Disk I/O bottlenecks
- Memory pressure
## Limitations
- Tests only GET and PUT operations (no DELETE benchmarks)
- Single-threaded HBF server (one core utilized)
- Local benchmarking only (no network latency)
- SQLite WAL mode may affect write performance
- Apache Bench limitation: cannot test SSE endpoints
## Future Enhancements
Potential improvements:
- Add percentile latency measurements (p50, p95, p99)
- Test different concurrency levels automatically
- Add memory profiling integration
- Support custom pod benchmarking
- Add warmup phase before measurements
- Export results to CSV/JSON for external analysis
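For the percentile-latency item, `ab` can already export per-percentile timings via its `-e` flag; a sketch of how that data could be captured today:

```bash
# -e writes a CSV of "percentage served, time in ms" from which p50/p95/p99
# can be read off and stored alongside the existing metrics.
ab -n 1000 -c 10 -e /tmp/hello_percentiles.csv http://localhost:5309/hello
grep -E '^(50|95|99),' /tmp/hello_percentiles.csv
```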