| name | fuzz-testing |
| description | Use when testing input validation, discovering edge cases, finding security vulnerabilities, testing parsers/APIs with random inputs, or integrating fuzzing tools (AFL, libFuzzer, Atheris) - provides fuzzing strategies, tool selection, and crash triage workflows |
Fuzz Testing
Overview
Core principle: Fuzz testing feeds random or malformed inputs to a program to find crashes, hangs, and security vulnerabilities that manual tests miss.
Rule: Fuzzing finds bugs you didn't know to test for. Use it for security-critical code (parsers, validators, APIs).
Fuzz Testing vs Other Testing
| Test Type | Input | Goal |
|---|---|---|
| Unit Testing | Known valid/invalid inputs | Verify expected behavior |
| Property-Based Testing | Generated valid inputs | Verify invariants hold |
| Fuzz Testing | Random/malformed inputs | Find crashes, hangs, memory issues |
Fuzzing finds: Buffer overflows, null pointer dereferences, infinite loops, unhandled exceptions
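Fuzzing does NOT find: Logic bugs that return wrong results without crashing (unless the harness asserts an invariant), gradual performance regressions that never become hangs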
When to Use Fuzz Testing
Good candidates:
- Input parsers (JSON, XML, CSV, binary formats)
- Network protocol handlers
- Image/video codecs
- Cryptographic functions
- User input validators (file uploads, form data)
- APIs accepting untrusted data
Poor candidates:
- Business logic (use property-based testing)
- UI interactions (use E2E tests)
- Database queries (use integration tests)
Tool Selection
| Tool | Language | Type | When to Use |
|---|---|---|---|
| Atheris | Python | Coverage-guided | Python applications, libraries |
| AFL++ (maintained successor to American Fuzzy Lop) | C/C++ | Coverage-guided | Native code, high performance |
| libFuzzer | C/C++/Rust | Coverage-guided | Integrated with LLVM/Clang |
| Jazzer | Java/JVM | Coverage-guided | Java applications |
| go-fuzz, or native `go test -fuzz` (Go 1.18+) | Go | Coverage-guided | Go applications |
Coverage-guided: The fuzzer tracks which code paths each input executes and keeps mutating the inputs that reach new paths
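To make the coverage-guided idea concrete, here is a minimal toy sketch using only the Python standard library. It is not how Atheris or AFL are implemented; the `target`, `run_with_coverage`, `mutate`, and `fuzz` names are hypothetical, and line tracing via `sys.settrace` stands in for real instrumentation.

```python
import random
import sys


def target(data: bytes) -> None:
    """Hypothetical target: nested branches that blind random input rarely reaches."""
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("deep bug reached")


def run_with_coverage(func, data: bytes):
    """Run func(data); return (set of executed lines, raised exception or None)."""
    lines = set()

    def tracer(frame, event, arg):
        if event == "line":
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    error = None
    sys.settrace(tracer)
    try:
        func(data)
    except Exception as exc:
        error = exc
    finally:
        sys.settrace(None)
    return lines, error


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random edit: insert, overwrite, or delete a single byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(func, seeds, iterations=200_000, seed=0):
    """Return (crashing input or None, corpus of inputs that reached new lines)."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen = set()
    for item in corpus:
        seen |= run_with_coverage(func, item)[0]
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        lines, error = run_with_coverage(func, candidate)
        if error is not None:
            return candidate, corpus
        if not lines <= seen:  # new line executed: keep this input for further mutation
            seen |= lines
            corpus.append(candidate)
    return None, corpus
```

Calling `fuzz(target, [b""])` builds the crashing prefix one byte at a time, because each input that reaches a new branch is kept and mutated further. Blind random generation would have to guess all three bytes at once.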
Basic Fuzzing Example (Python + Atheris)
Installation
pip install atheris
Simple Fuzz Test
import atheris
import sys
@atheris.instrument_func
def parse_email(data):
    """Function to fuzz - contains a bug we didn't know about."""
    if "@" not in data:
        raise ValueError("Invalid email")
    local, domain = data.split("@", 1)
    if "." not in domain:
        raise ValueError("Invalid domain")
    # BUG: Multiple @ symbols are silently accepted!
    # "user@@example.com" → returns ("user", "@example.com")
    return (local, domain)
@atheris.instrument_func
def TestOneInput(data):
    """Fuzz harness - called repeatedly with random inputs."""
    try:
        local, domain = parse_email(data.decode('utf-8', errors='ignore'))
    except ValueError:
        # Expected exception for rejected input - not a crash
        return
    # Invariant check: any uncaught exception (here AssertionError) = crash found!
    assert "@" not in domain, f"accepted invalid email: {data!r}"
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
Run:
python fuzz_email.py
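Output (simplified for illustration; real status lines follow libFuzzer's format, and the crashing input is also written to a crash-<hash> file in the working directory):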
INFO: Seed: 1234567890
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
#1: NEW coverage: 10 exec/s: 1000
#100: NEW coverage: 15 exec/s: 5000
CRASH: input was 'user@@example.com'
Advanced Fuzzing Patterns
Structured Fuzzing (JSON)
Problem: Random bytes rarely form valid JSON
import atheris
import json
@atheris.instrument_func
def TestOneInput(data):
    try:
        # Parse as JSON
        obj = json.loads(data.decode('utf-8', errors='ignore'))
        # Fuzz your JSON handler
        process_user_data(obj)
    except (json.JSONDecodeError, ValueError, KeyError):
        pass  # Expected for invalid JSON or rejected records
    # TypeError is deliberately not caught: it signals a real bug
def process_user_data(data):
    """Rejects {"name": "", "age": -1} with ValueError (expected).
    Crashes with TypeError when the JSON is not an object, e.g. [] or 42."""
    if len(data["name"]) == 0:
        raise ValueError("Name cannot be empty")
    if data["age"] < 0:
        raise ValueError("Age cannot be negative")
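The harness above still hands raw bytes to `json.loads`, so most inputs are rejected before they reach the handler. One common remedy is to derive a well-formed record from the fuzzer's bytes instead. Atheris ships `atheris.FuzzedDataProvider` for this purpose. The sketch below uses a hypothetical stdlib-only stand-in (`ByteReader`, `build_user_record`, `structured_input`) so it runs without Atheris. The field names and ranges are assumptions chosen for illustration.

```python
import json


class ByteReader:
    """Hypothetical stand-in for a fuzzed-data provider: turns raw bytes into typed values."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def take(self, n: int) -> bytes:
        """Consume up to n bytes; returns fewer (or none) once the input runs out."""
        chunk = self._data[self._pos:self._pos + n]
        self._pos += n
        return chunk

    def int_in_range(self, lo: int, hi: int) -> int:
        """Map the next 4 bytes (zero-padded if short) onto the inclusive range [lo, hi]."""
        raw = int.from_bytes(self.take(4).ljust(4, b"\0"), "big")
        return lo + raw % (hi - lo + 1)

    def text(self, max_len: int) -> str:
        """Consume a length prefix, then that many bytes decoded as UTF-8."""
        length = self.int_in_range(0, max_len)
        return self.take(length).decode("utf-8", errors="replace")


def build_user_record(data: bytes) -> dict:
    """Every byte string maps to a record with the right shape but hostile values."""
    reader = ByteReader(data)
    return {
        "name": reader.text(max_len=16),
        "age": reader.int_in_range(-10, 150),
    }


def structured_input(data: bytes) -> str:
    """Serialize the record so it can be fed to a JSON-consuming handler."""
    return json.dumps(build_user_record(data))
```

With this approach, a harness would call `process_user_data(build_user_record(data))`, so every execution exercises the validation logic rather than the JSON parser's error path.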
Fuzzing with Corpus (Seed Inputs)
Corpus: Collection of valid inputs to start from
import atheris
import sys
import os
# Seed corpus: Valid examples
CORPUS_DIR = "./corpus"
os.makedirs(CORPUS_DIR, exist_ok=True)
# Create seed files
with open(f"{CORPUS_DIR}/valid1.txt", "wb") as f:
    f.write(b"user@example.com")
with open(f"{CORPUS_DIR}/valid2.txt", "wb") as f:
    f.write(b"alice+tag@subdomain.example.org")
@atheris.instrument_func
def TestOneInput(data):
    try:
        # parse_email is the function from the basic example;
        # UnicodeDecodeError subclasses ValueError, so it is caught too
        parse_email(data.decode('utf-8'))
    except ValueError:
        pass
# Corpus directories are passed as libFuzzer command-line arguments,
# so append the directory to argv (or run: python fuzz_email.py ./corpus)
atheris.Setup(sys.argv + [CORPUS_DIR], TestOneInput)
atheris.Fuzz()
Benefits: Faster convergence to interesting inputs
Crash Triage Workflow
1. Reproduce Crash
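# Atheris reports the crash, and libFuzzer saves the input as crash-<hash>
CRASH: input was b'user@@example.com'
# To recreate the input by hand, use printf (echo would append a newline)
printf '%s' 'user@@example.com' > crash.txt
# Passing a file path instead of a corpus directory runs only that input
python fuzz_email.py crash.txt
2. Minimize Input
Find the smallest input that still triggers the crash:
# Original: "user@@example.com" (17 bytes)
# Minimized: "@@." (3 bytes); "@@" alone is rejected for lacking a "."
# libFuzzer offers a -minimize_crash=1 mode; if your Atheris build does not
# support it, shrink the input by hand and re-run the reproduction command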
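The minimization step can also be done by hand with a small greedy reducer. The sketch below is an illustration only, not the algorithm libFuzzer uses. `minimize` and `still_fails` are hypothetical helpers, and `parse_email` is a copy of the unfixed parser from the basic example. The failure condition assumed here is "the parser accepts an input containing more than one @".

```python
def parse_email(text: str):
    """Copy of the unfixed parser from the basic example."""
    if "@" not in text:
        raise ValueError("Invalid email")
    local, domain = text.split("@", 1)
    if "." not in domain:
        raise ValueError("Invalid domain")
    return (local, domain)


def still_fails(data: bytes) -> bool:
    """Assumed failure condition: input with more than one '@' is accepted."""
    text = data.decode("utf-8", errors="ignore")
    try:
        parse_email(text)
    except ValueError:
        return False  # rejected input is the expected behavior
    return text.count("@") > 1


def minimize(data: bytes, fails) -> bytes:
    """Greedily delete chunks, halving the chunk size, while the failure persists.

    The result is small but not guaranteed to be the global minimum.
    """
    chunk = max(len(data) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(data):
            candidate = data[:i] + data[i + chunk:]
            if fails(candidate):
                data = candidate  # keep the smaller input and retry the same offset
            else:
                i += chunk  # this chunk is needed; move on
        chunk //= 2
    return data
```

Running `minimize(b"user@@example.com", still_fails)` shrinks the input to `b"@@."`, which is usually a much clearer starting point for root cause analysis than the original crash file.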
3. Root Cause Analysis
def parse_email(data):
    # Failing input: data = "user@@example.com"
    # Without the check below, split("@", 1) yields
    # local = "user", domain = "@example.com", which contains "."
    # and is therefore silently accepted.
    # FIX: Validate before splitting
    if data.count("@") != 1:
        raise ValueError("Email must have exactly one @")
    local, domain = data.split("@", 1)
    if "." not in domain:
        raise ValueError("Invalid domain")
    return (local, domain)
4. Write Regression Test
import pytest
# parse_email: the fixed function from step 3
def test_email_multiple_at_symbols():
    """Regression test for fuzz-found bug."""
    with pytest.raises(ValueError, match="exactly one @"):
        parse_email("user@@example.com")
Integration with CI/CD
Continuous Fuzzing (GitHub Actions)
# .github/workflows/fuzz.yml
name: Fuzz Testing
on:
  schedule:
    - cron: '0 2 * * *' # Nightly at 02:00 UTC
  workflow_dispatch:
jobs:
  fuzz:
    runs-on: ubuntu-latest
    timeout-minutes: 75 # Headroom above the 1-hour fuzzing budget
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install atheris
      - name: Run fuzzing
        # -max_total_time is a libFuzzer flag (seconds). No "|| true" here:
        # the step must fail on a crash so the upload step below runs
        run: python fuzz_email.py -max_total_time=3600
      - name: Upload crashes
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: fuzz-crashes
          path: crash-*
Why nightly: Fuzzing is CPU-intensive, not suitable for every PR
AFL (C/C++) Example
Installation
# Ubuntu/Debian
sudo apt-get install afl++
# macOS
brew install afl++
Fuzz Target
// fuzz_target.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void parse_command(const char *input) {
    char buffer[64];
    // BUG: Buffer overflow if input is 64 bytes or longer!
    strcpy(buffer, input);
    if (strcmp(buffer, "exit") == 0) {
        exit(0);
    }
}
int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    char buffer[1024];
    // Leave room for the terminator so the harness itself cannot overflow
    size_t len = fread(buffer, 1, sizeof(buffer) - 1, f);
    fclose(f);
    buffer[len] = '\0';
    parse_command(buffer);
    return 0;
}
Compile and Run
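# Compile with AFL instrumentation (afl-cc is the AFL++ compiler driver; classic AFL uses afl-gcc)
afl-cc fuzz_target.c -o fuzz_target
# Create corpus directory
mkdir -p corpus
# printf avoids the trailing newline that echo would add to the seed
printf 'exit' > corpus/input1.txt
# Run fuzzer
afl-fuzz -i corpus -o findings -- ./fuzz_target @@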
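Output (simplified; the real status screen shows many more fields):
american fuzzy lop 4.00a
path : findings/queue
crashes : 1
hangs : 0
execs done : 1000000
Crashes found in: findings/crashes/ (classic AFL) or findings/default/crashes/ (AFL++)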
Anti-Patterns Catalog
❌ Fuzzing Without Sanitizers
Symptom: Memory bugs don't crash, just corrupt silently
Fix: Compile with AddressSanitizer (ASan)
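# C/C++: Compile with ASan (use afl-gcc instead of afl-cc with classic AFL)
afl-cc -fsanitize=address fuzz_target.c -o fuzz_target
# Python: pure-Python code is memory-safe, so ASan matters only when fuzzing native (C/C++) extensions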
What ASan catches: Buffer overflows, use-after-free, memory leaks
❌ Ignoring Hangs
Symptom: Fuzzer reports hangs, not investigated
What hangs mean: Infinite loops, algorithmic complexity attacks
Fix: Investigate and add timeout checks
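import signal
import atheris
# libFuzzer's -timeout=<seconds> flag is an alternative way to flag slow inputs.
# This in-harness alarm is Unix-only (SIGALRM is unavailable on Windows).
def timeout_handler(signum, frame):
    raise TimeoutError("Operation timed out")
@atheris.instrument_func
def TestOneInput(data):
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(1)  # 1-second timeout
    try:
        parse_data(data.decode('utf-8'))  # parse_data: the function under test
    except ValueError:
        pass  # Expected rejection; TimeoutError propagates and is reported as a crash
    finally:
        signal.alarm(0)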
❌ No Regression Tests
Symptom: Same bugs found repeatedly
Fix: Add regression test for every crash
import pytest
# After fuzzing finds a crash on input "user@@example.com"
def test_regression_double_at():
    with pytest.raises(ValueError, match="exactly one @"):
        parse_email("user@@example.com")
Bottom Line
Fuzz testing finds crashes and security vulnerabilities by feeding random/malformed inputs. Use it for security-critical code (parsers, validators, APIs).
Setup:
- Use Atheris (Python), AFL (C/C++), or language-specific fuzzer
- Start with corpus (valid examples)
- Run nightly in CI (1-24 hours)
Workflow:
- Fuzzer finds crash
- Minimize crashing input
- Root cause analysis
- Fix bug
- Add regression test
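The last step of the workflow can be partly automated by replaying every saved crash file against the harness. The sketch below assumes crash inputs are kept in a directory using libFuzzer's `crash-*` naming. `replay_crash_files` is a hypothetical helper, and `harness` is any callable that takes bytes, such as a `TestOneInput` function.

```python
import pathlib


def replay_crash_files(harness, crash_dir):
    """Run harness on every crash-* file; return {filename: exception} for inputs that still fail."""
    failures = {}
    for path in sorted(pathlib.Path(crash_dir).glob("crash-*")):
        try:
            harness(path.read_bytes())
        except Exception as exc:  # any uncaught exception counts as a crash
            failures[path.name] = exc
    return failures
```

A test suite could assert that `replay_crash_files(TestOneInput, "regressions")` returns an empty dict, so a previously fixed crash that reappears fails the build. The `regressions` directory name is only an example.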
If your code accepts untrusted input (files, network data, user input), you should be fuzzing it. Fuzzing finds bugs that manual testing misses.