| name | browser-benchmarking |
| description | Guidance on benchmarking WHATWG implementations against browser engines |
| license | MIT |
Browser Benchmarking Skill
Purpose
This skill provides guidance on how to benchmark the WHATWG Streams implementation against browser engines, identify optimization opportunities, and measure stream operation performance.
Why Benchmark Against Browsers?
Streams must match browser behavior - both functionally AND performantly.
Key Reasons
Compatibility Verification
- Ensure stream operations match Chrome, Firefox, Safari
- Validate backpressure handling
- Confirm queuing behavior
Performance Targets
- Set realistic performance goals based on browser implementations
- Identify optimization opportunities
- Avoid premature optimization (measure first!)
Regression Detection
- Catch performance regressions early
- Track improvements over time
- Validate optimization effectiveness
Streams Performance Characteristics
What to Measure
| Operation | Why Measure | Browser Typical Performance |
|---|---|---|
| Stream reading | Core hot path | 0.5-2 μs per read |
| Stream writing | Frequent operation | 0.5-2 μs per write |
| Chunk enqueueing | Queue management | 0.1-0.5 μs per chunk |
| Pipe operations | Connecting streams | 2-10 μs per pipe setup |
| Backpressure signaling | Flow control | < 0.1 μs |
| BYOB reading | Zero-copy reads | 0.3-1 μs per read |
Browser Streams Implementations
Chromium (Blink):
- Location: `third_party/blink/renderer/core/streams/`
- Fast path for byte streams (Uint8Array)
- Optimized chunk queuing with ArrayDeque
- Inline storage for small queues
Firefox (Gecko):
- Location: `dom/streams/`
- Lazy stream initialization
- Cached reader/writer objects
- Minimal allocations for backpressure signals
WebKit:
- Location: `Source/WebCore/Modules/streams/`
- Similar patterns to Chromium
- Fast paths for ReadableStream body responses
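These body-response fast paths are observable from script: a `Response` body is itself a `ReadableStream`. A minimal sketch of the read loop browsers optimize (runs in Node 18+ or any modern browser; `readBody` is an illustrative helper, not an engine API):

```javascript
// Drain a Response body stream and count the bytes delivered.
async function readBody(response) {
  const reader = response.body.getReader();
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read(); // value is a Uint8Array
    if (done) break;
    total += value.byteLength;
  }
  return total;
}

// Usage: a synthetic Response, so no network is needed
readBody(new Response("hello, streams")).then((n) => console.log(n)); // logs 14
```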
Benchmarking Strategy
1. Microbenchmarks (Single Operation)
Purpose: Measure individual stream operations in isolation
Example:
```zig
test "benchmark - read from byte stream" {
    const allocator = std.testing.allocator;

    var stream = try ReadableStream.init(allocator, .{
        .pull = testPullFn,
    });
    defer stream.deinit();

    var timer = try std.time.Timer.start();
    const iterations = 100_000;

    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        const reader = try stream.getReader();
        defer reader.releaseLock();
        const result = try reader.read();
        if (result.value) |chunk| allocator.free(chunk);
    }

    const elapsed = timer.read();
    const ns_per_op = elapsed / iterations;
    std.debug.print("stream.read(): {} ns/op\n", .{ns_per_op});

    // Target: < 2000 ns (2 μs) per read for byte streams
    try std.testing.expect(ns_per_op < 2000);
}
```
2. Macrobenchmarks (Real-World Scenarios)
Purpose: Measure realistic streaming workloads
Examples:
- Stream 1000 chunks through pipe chain
- Measure end-to-end transform stream processing
- Test backpressure under high load
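A sketch of what such a macrobenchmark can look like in JavaScript (Node 18+ or a browser; the chunk count, the 1 KiB chunk size, and the identity transform are arbitrary stand-ins for real workloads):

```javascript
// Push N chunks through a pipe chain and time the whole run.
async function pipeBenchmark(chunks) {
  let produced = 0;
  const source = new ReadableStream({
    pull(controller) {
      if (produced === chunks) return controller.close();
      controller.enqueue(new Uint8Array(1024)); // 1 KiB chunk
      produced++;
    },
  });
  // An identity transform stands in for real per-chunk work
  const identity = new TransformStream({
    transform(chunk, controller) { controller.enqueue(chunk); },
  });
  let received = 0;
  const sink = new WritableStream({
    write(chunk) { received += chunk.byteLength; },
  });

  const start = performance.now();
  await source.pipeThrough(identity).pipeTo(sink);
  return { ms: performance.now() - start, received };
}

pipeBenchmark(1000).then(({ ms, received }) => {
  console.log(`${received} bytes in ${ms.toFixed(1)} ms`);
});
```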
3. Comparison Benchmarks (Against Browsers)
Purpose: Compare Zig performance to browser implementations
Approach:
```javascript
// JavaScript (run in browser)
console.time("read-100k-chunks");
const stream = new ReadableStream({
  pull(controller) {
    controller.enqueue(new Uint8Array([1, 2, 3, 4]));
  }
});
const reader = stream.getReader();
for (let i = 0; i < 100000; i++) {
  await reader.read();
}
console.timeEnd("read-100k-chunks");
```
Then compare against Zig implementation.
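The same workload can also be run in Node 18+ and reduced to ns/op, which is easier to diff against the Zig benchmark's output than `console.time` logs (a sketch; absolute numbers vary by machine and engine):

```javascript
// Same workload as the browser snippet, reported as ns per read.
async function readBenchmark(iterations) {
  const stream = new ReadableStream({
    pull(controller) {
      controller.enqueue(new Uint8Array([1, 2, 3, 4]));
    },
  });
  const reader = stream.getReader();

  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    await reader.read();
  }
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1e6) / iterations; // ns per read
}

readBenchmark(100_000).then((ns) => console.log(`read(): ${ns.toFixed(0)} ns/op`));
```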
Streams-Specific Optimization Opportunities
1. Byte Stream Fast Path
Pattern: Optimize for Uint8Array chunks (most common)
Browser implementation (Chromium):
```cpp
// Fast path for byte streams
if (stream->IsByteStream()) {
  return ReadByteStreamChunk(stream);
}
// Slow path for object streams
return ReadGenericChunk(stream);
```
Zig implementation:
```zig
/// Fast path for byte stream reading
pub fn read(self: *ReadableStreamDefaultReader) !ReadResult {
    const stream = self.stream;

    // Fast path: byte stream with queued data
    if (stream.isByteStream() and stream.queue.items.len > 0) {
        const chunk = stream.queue.orderedRemove(0);
        return ReadResult{ .done = false, .value = chunk };
    }

    // Slow path: object stream or empty queue
    return try readSlow(self);
}
```
Benchmark:
```zig
test "benchmark - byte stream reading" {
    const allocator = std.testing.allocator;

    var stream = try ReadableStream.initByteStream(allocator, .{
        .pull = bytePullFn,
    });
    defer stream.deinit();

    const reader = try stream.getReader();
    defer reader.releaseLock();

    var timer = try std.time.Timer.start();
    var i: usize = 0;
    while (i < 100_000) : (i += 1) {
        const result = try reader.read();
        if (result.value) |chunk| allocator.free(chunk);
    }
    const elapsed = timer.read();
    std.debug.print("byte stream read(): {} ns/op\n", .{elapsed / 100_000});
}
```
2. Chunk Queue Optimization
Pattern: Use inline storage for small queues
Implementation:
```zig
pub const StreamQueue = struct {
    allocator: std.mem.Allocator,
    // Inline storage for small queues (most common case)
    small_buffer: [4][]const u8,
    small_len: usize,
    // Heap storage for large queues
    large_buffer: ?std.ArrayList([]const u8),

    pub fn enqueue(self: *StreamQueue, chunk: []const u8) !void {
        // Fast path: use inline buffer
        if (self.small_len < 4 and self.large_buffer == null) {
            self.small_buffer[self.small_len] = chunk;
            self.small_len += 1;
            return;
        }

        // Slow path: spill inline items to an ArrayList, then append
        if (self.large_buffer == null) {
            self.large_buffer = std.ArrayList([]const u8).init(self.allocator);
            for (self.small_buffer[0..self.small_len]) |item| {
                try self.large_buffer.?.append(item);
            }
            self.small_len = 0;
        }
        try self.large_buffer.?.append(chunk);
    }
};
```
Benchmark:
```zig
test "benchmark - small vs large queue" {
    const allocator = std.testing.allocator;

    // Small queue (stays within inline storage)
    var small_queue = StreamQueue.init(allocator); // assumes an init helper
    var timer = try std.time.Timer.start();
    var i: usize = 0;
    while (i < 100_000) : (i += 1) {
        try small_queue.enqueue("chunk");
        _ = small_queue.dequeue();
    }
    const small_time = timer.read();

    // Large queue (spills to ArrayList)
    var large_queue = StreamQueue.init(allocator);
    timer.reset();
    i = 0;
    while (i < 100_000) : (i += 1) {
        try large_queue.enqueue("chunk");
        if (i % 10 == 0) _ = large_queue.dequeue(); // Keep queue large
    }
    const large_time = timer.read();

    std.debug.print("Small queue: {} ns/op, Large queue: {} ns/op\n", .{ small_time / 100_000, large_time / 100_000 });

    // Small queue should be faster
    try std.testing.expect(small_time < large_time);
}
```
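JavaScript arrays are heap-allocated either way, so inline storage has no direct JS analogue; the related cost that is easy to observe is dequeue: `Array.prototype.shift()` reindexes the whole array, while a cursor-based queue dequeues in O(1). A correctness-only sketch (`CursorQueue` is illustrative, not a browser API):

```javascript
// O(1) dequeue via a head cursor instead of Array.prototype.shift().
class CursorQueue {
  constructor() { this.items = []; this.head = 0; }
  enqueue(x) { this.items.push(x); }
  dequeue() {
    if (this.head === this.items.length) return undefined;
    const x = this.items[this.head++];
    // Compact occasionally so consumed slots are reclaimed
    if (this.head > 1024 && this.head * 2 > this.items.length) {
      this.items = this.items.slice(this.head);
      this.head = 0;
    }
    return x;
  }
}

const q = new CursorQueue();
for (let i = 0; i < 5; i++) q.enqueue(i);
const out = [];
let v;
while ((v = q.dequeue()) !== undefined) out.push(v);
console.log(out); // [0, 1, 2, 3, 4]
```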
3. Backpressure Signal Fast Path
Pattern: Quick boolean check before complex logic
Implementation:
```zig
pub fn getDesiredSize(self: *ReadableStream) f64 {
    // Fast path: empty queue, desired size is the full high water mark
    if (self.queue.items.len == 0) {
        return self.strategy.highWaterMark;
    }

    // Slow path: sum the queue's chunk sizes
    var queue_size: f64 = 0;
    for (self.queue.items) |chunk| {
        if (self.strategy.size) |size_fn| {
            queue_size += size_fn(chunk);
        } else {
            queue_size += 1;
        }
    }
    return self.strategy.highWaterMark - queue_size;
}

pub inline fn shouldApplyBackpressure(self: *ReadableStream) bool {
    return self.getDesiredSize() <= 0;
}
```
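The same signal is visible from JavaScript as `writer.desiredSize`. A deterministic sketch (Node 18+ or a browser): a sink whose `write()` never resolves keeps chunks queued, so each `write()` call lowers `desiredSize` by one until it goes negative:

```javascript
// A write() that never resolves, so queued chunks pile up deterministically.
let release;
const gate = new Promise((resolve) => { release = resolve; });

const ws = new WritableStream(
  { write() { return gate; } },
  new CountQueuingStrategy({ highWaterMark: 2 })
);
const writer = ws.getWriter();

console.log(writer.desiredSize); // 2: queue empty
writer.write("a").catch(() => {});
console.log(writer.desiredSize); // 1: one chunk queued
writer.write("b").catch(() => {});
writer.write("c").catch(() => {});
console.log(writer.desiredSize); // -1: backpressure - callers should await writer.ready
release(); // let the stalled writes drain
```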
4. URL Serialization with Capacity Hints
Pattern: Preallocate buffer based on component sizes
Implementation:
```zig
pub fn serialize(self: *const URL, allocator: Allocator) ![]u8 {
    // Calculate the minimum size needed
    var capacity: usize = 0;
    capacity += self.scheme.len + 1; // "scheme:"
    if (self.host) |host| {
        capacity += 2; // "//"
        capacity += estimateHostSize(host);
    }
    if (self.port) |_| {
        capacity += 6; // ":12345"
    }
    capacity += self.path.len;
    if (self.query) |q| capacity += 1 + q.len;
    if (self.fragment) |f| capacity += 1 + f.len;

    // Preallocate (avoids reallocation)
    var result = try std.ArrayList(u8).initCapacity(allocator, capacity);
    errdefer result.deinit();

    // Serialize components
    try result.appendSlice(self.scheme);
    try result.append(':');
    // ... (rest of serialization)

    return result.toOwnedSlice();
}
```
5. Special Scheme Fast Paths
Pattern: Optimize for common schemes (http, https, file)
Implementation:
```zig
pub fn parseURL(allocator: Allocator, input: []const u8) !URL {
    // Detect special schemes early
    const scheme = try extractScheme(input);

    if (isSpecialScheme(scheme)) {
        // Fast path for http, https, file, ftp, ws, wss
        return parseSpecialURL(allocator, input, scheme);
    } else {
        // Generic URL parsing
        return parseGenericURL(allocator, input, scheme);
    }
}

inline fn isSpecialScheme(scheme: []const u8) bool {
    return std.mem.eql(u8, scheme, "http") or
        std.mem.eql(u8, scheme, "https") or
        std.mem.eql(u8, scheme, "file") or
        std.mem.eql(u8, scheme, "ftp") or
        std.mem.eql(u8, scheme, "ws") or
        std.mem.eql(u8, scheme, "wss");
}
```
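Whatever the fast path looks like internally, it must preserve the observable special-scheme behavior of the standard `URL` API, e.g. host lowercasing and default-port elision (runs in Node or any browser):

```javascript
// Special schemes: scheme and host are lowercased, the default port is dropped.
const special = new URL("HTTP://EXAMPLE.com:80/path");
console.log(special.href); // "http://example.com/path"

// Non-special schemes have no default port, so an explicit port is kept.
const generic = new URL("foo://example.com:80/path");
console.log(generic.port); // "80"
```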
Benchmarking Tools
1. Zig Built-in Timer
```zig
const std = @import("std");

pub fn benchmark(comptime name: []const u8, iterations: usize, func: anytype) !void {
    var timer = try std.time.Timer.start();

    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        func();
    }

    const elapsed = timer.read();
    const ns_per_op = elapsed / iterations;
    std.debug.print("{s}: {} ns/op ({d:.2} μs/op)\n", .{ name, ns_per_op, @as(f64, @floatFromInt(ns_per_op)) / 1000.0 });
}
```
2. Browser DevTools (for comparison)
Chrome DevTools:
```javascript
// Microbenchmark
console.time("parse-url");
for (let i = 0; i < 100000; i++) {
  new URL("https://example.com/path?query#fragment");
}
console.timeEnd("parse-url");

// Detailed profiling
performance.mark("start");
for (let i = 0; i < 100000; i++) {
  new URL("https://example.com/path?query#fragment");
}
performance.mark("end");
performance.measure("url-parse", "start", "end");
console.log(performance.getEntriesByName("url-parse"));
```
3. Criterion-like Benchmarking
```zig
// Simple benchmark suite
pub const BenchmarkSuite = struct {
    allocator: Allocator,

    pub fn init(allocator: Allocator) BenchmarkSuite {
        return .{ .allocator = allocator };
    }

    pub fn run(self: BenchmarkSuite, comptime name: []const u8, func: anytype) !void {
        const iterations = 100_000;
        var timer = try std.time.Timer.start();

        var i: usize = 0;
        while (i < iterations) : (i += 1) {
            try func(self.allocator);
        }

        const elapsed = timer.read();
        const ns_per_op = elapsed / iterations;
        std.debug.print("{s}: {} ns/op\n", .{ name, ns_per_op });
    }
};

// Usage
test "benchmark suite" {
    var suite = BenchmarkSuite.init(std.testing.allocator);
    try suite.run("parse-simple-url", parseSimpleURL);
    try suite.run("parse-complex-url", parseComplexURL);
    try suite.run("parse-relative-url", parseRelativeURL);
}
```
Real-World URL Datasets
Test Against Real URLs
Alexa Top 1000:
- Download real-world URLs
- Parse each one
- Measure aggregate performance
Example dataset:
```text
https://www.google.com
https://www.youtube.com
https://www.facebook.com
https://www.amazon.com
https://www.wikipedia.org
...
```
Benchmark:
```zig
test "benchmark - real-world URLs" {
    const allocator = std.testing.allocator;
    const urls = @embedFile("../data/alexa-top-1000.txt");
    var lines = std.mem.split(u8, urls, "\n");

    var timer = try std.time.Timer.start();
    var count: usize = 0;
    while (lines.next()) |line| {
        if (line.len == 0) continue;
        const url = try URL.parse(allocator, line);
        defer url.deinit();
        count += 1;
    }

    const elapsed = timer.read();
    const ns_per_url = elapsed / count;
    std.debug.print("Parsed {} real URLs: {} ns/op\n", .{ count, ns_per_url });
}
```
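A JavaScript counterpart is useful for capturing browser-side numbers for the same idea. This sketch uses a tiny inline list as a stand-in for the real dataset file:

```javascript
// Parse a list of real-world URLs repeatedly and report ns per parse.
const urls = [
  "https://www.google.com",
  "https://www.youtube.com",
  "https://www.wikipedia.org",
];

function parseAll(list, iterations) {
  const start = performance.now();
  let parsed = 0;
  for (let i = 0; i < iterations; i++) {
    for (const u of list) { new URL(u); parsed++; }
  }
  const ms = performance.now() - start;
  return { parsed, nsPerUrl: (ms * 1e6) / parsed };
}

const { parsed, nsPerUrl } = parseAll(urls, 10_000);
console.log(`Parsed ${parsed} URLs: ${nsPerUrl.toFixed(0)} ns/op`);
```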
Performance Targets
Realistic Goals (Based on Browser Performance)
| Operation | Target | Notes |
|---|---|---|
| Stream read | < 2 μs | Byte stream reading |
| Stream write | < 2 μs | Byte stream writing |
| Chunk enqueue | < 500 ns | Adding to queue |
| Chunk dequeue | < 500 ns | Removing from queue |
| Pipe setup | < 10 μs | Connecting streams |
| Backpressure check | < 100 ns | getDesiredSize() |
| BYOB read | < 1 μs | Zero-copy reading |
Note: These are approximate targets. Actual browser performance varies by:
- CPU architecture
- Compiler optimizations
- Chunk characteristics (size, type)
Avoiding Premature Optimization
Optimization Workflow
1. Implement correctly first
- Follow WHATWG Streams spec exactly
- Pass all tests
- Ensure spec compliance
2. Measure baseline
- Benchmark current implementation
- Identify hot spots with profiling
- Set realistic targets
3. Optimize hot paths
- Focus on frequently-called operations
- Measure before and after
- Verify correctness still holds
4. Validate improvements
- Re-run full test suite
- Compare against browsers
- Check for regressions
Red Flags (Don't Optimize Yet)
- ❌ No baseline measurements
- ❌ No clear performance problem
- ❌ Tests don't pass
- ❌ Spec compliance uncertain
Green Lights (OK to Optimize)
- ✅ Tests pass (100% coverage)
- ✅ Spec compliant
- ✅ Baseline measured
- ✅ Clear bottleneck identified
Integration with Other Skills
With whatwg_spec
- Read `specs/url.md` to understand algorithm complexity
- Identify optimization opportunities in spec algorithms
- Ensure optimizations don't break spec compliance
With testing_requirements
- Write performance regression tests
- Ensure optimizations pass all functional tests
- Add benchmark tests for critical paths
With performance_optimization
- Apply general Zig optimization patterns
- Use lookup tables, fast paths, inline functions
- Minimize allocations
Summary
Key Principles:
- Measure before optimizing - Get baseline performance
- Compare against browsers - Set realistic targets
- Focus on hot paths - Stream reads/writes, chunk enqueue/dequeue, backpressure checks
- Fast path for byte streams - Most chunks are Uint8Array
- Inline storage for small queues - Avoid heap allocation in the common case
- Preallocate buffers - Avoid reallocation
- Verify correctness - Never sacrifice spec compliance for speed
Workflow:
- Implement correctly (spec-compliant)
- Measure baseline
- Identify bottlenecks
- Optimize hot paths
- Verify improvements
- Compare against browsers
Remember: Stream behavior must match browsers. Performance matters, but correctness comes first.