| name | zstd-compression-engineer |
| description | Expert guide for implementing Zstandard (zstd) compression and decompression. Use when working with zstd library for data compression tasks including simple compression, streaming operations, dictionary-based compression, or when users need help with compression performance optimization, error handling, or choosing the right API for their use case. |
Zstd Compression Engineer
Expert guidance for implementing Zstandard (zstd) compression in any programming language.
Quick Decision Tree
Choose your API based on the use case:
- Simple one-off compression → Use
ZSTD_compress()/ZSTD_decompress() - Large files or unknown sizes → Use streaming API (
ZSTD_compressStream2()/ZSTD_decompressStream()) - Many small similar files → Use dictionary compression (
ZSTD_compress_usingCDict()) - Repeated operations → Reuse contexts (
ZSTD_compressCCtx()/ZSTD_decompressDCtx())
Core Implementation Patterns
Pattern 1: Simple Compression
// Allocate destination buffer
size_t dstCapacity = ZSTD_compressBound(srcSize);
void* dst = malloc(dstCapacity);
// Compress
size_t compressedSize = ZSTD_compress(dst, dstCapacity, src, srcSize, compressionLevel);
// Always check for errors
if (ZSTD_isError(compressedSize)) {
fprintf(stderr, "Compression failed: %s\n", ZSTD_getErrorName(compressedSize));
// Handle error
}
Key points:
- Use
ZSTD_compressBound()to calculate required buffer size - Default compression level is 3 (balance of speed/ratio)
- Levels 1-3: fast, 4-9: balanced, 10-19: high compression, 20-22: ultra (memory intensive)
Pattern 2: Context Reuse for Multiple Operations
// Create context once
ZSTD_CCtx* cctx = ZSTD_createCCtx();
// Use for multiple compressions
for (each file) {
size_t result = ZSTD_compressCCtx(cctx, dst, dstCapacity, src, srcSize, level);
// Process result
}
// Cleanup
ZSTD_freeCCtx(cctx);
Benefits:
- Reuses allocated memory across operations
- Better performance than creating new contexts
- No impact on compression ratio
Pattern 3: Streaming for Large Data
See references/streaming-api.md for complete streaming implementation guide.
Use streaming when:
- Source data doesn't fit in memory
- Decompressed size is unknown
- Processing data incrementally (network streams, pipes)
Buffer size recommendations:
- Input:
ZSTD_CStreamInSize()/ZSTD_DStreamInSize() - Output:
ZSTD_CStreamOutSize()/ZSTD_DStreamOutSize()
Pattern 4: Dictionary Compression
See references/dictionary-compression.md for complete dictionary usage guide.
Use dictionaries when:
- Compressing many small similar files (< 100KB each)
- Data has repeated patterns across files
- Working with structured data (JSON, XML, logs)
Critical rule: Pre-digest dictionaries with ZSTD_createCDict() for repeated use. Loading raw dictionaries repeatedly kills performance.
Error Handling
Always check results:
size_t result = ZSTD_compress(...);
if (ZSTD_isError(result)) {
const char* errMsg = ZSTD_getErrorName(result);
// Handle error
}
Context recovery after errors:
- Contexts may be in undefined state after errors
- Reset before reuse:
ZSTD_CCtx_reset()orZSTD_DCtx_reset()
Untrusted data validation:
- Always validate decompressed sizes from untrusted sources
- Use
ZSTD_getFrameContentSize()to check size before allocating - Implement application-specific size limits
- Prefer streaming decompression for untrusted data
Thread Safety
Per-thread contexts:
- Maintain separate
ZSTD_CCtxper thread - Never share contexts across threads
Shared thread pools (optional):
ZSTD_threadPool* pool = ZSTD_createThreadPool(numThreads);
ZSTD_CCtx_refThreadPool(cctx, pool);
Common Pitfalls
- Forgetting to check
ZSTD_compressBound()→ Buffer overflow - Loading dictionaries repeatedly → Performance degradation
- Not checking
ZSTD_isError()→ Silent failures - Sharing contexts across threads → Undefined behavior
- Trusting decompressed sizes → Memory exhaustion attacks
Performance Tuning
Compression level selection:
- Level 1-3: Real-time compression, minimal CPU
- Level 4-9: General purpose (recommended starting point)
- Level 10-19: Offline compression, archival
- Level 20-22: Maximum compression, high memory usage
Advanced parameters:
- Window log: Controls memory usage and compression ratio
- Strategy: fast, dfast, greedy, lazy, btopt (automatic selection usually best)
- See
references/api-reference.mdfor complete parameter list
Language-Specific Notes
C/C++: Direct library access, use patterns above
Python: Use zstandard package (python-zstandard)
Node.js: Use @mongodb-js/zstd or node-zstd
Go: Use github.com/klauspost/compress/zstd
Rust: Use zstd crate
Java: Use com.github.luben:zstd-jni
All language bindings follow the same conceptual patterns: simple compression, streaming, dictionary support.
Reference Documentation
For detailed API specifications:
- Streaming API guide:
references/streaming-api.md - Dictionary compression:
references/dictionary-compression.md - Complete API reference:
references/api-reference.md - Official docs: https://facebook.github.io/zstd/doc/api_manual_latest.html
Implementation Checklist
When implementing zstd compression:
- Choose correct API (simple/streaming/dictionary)
- Calculate buffer sizes with
ZSTD_compressBound() - Select appropriate compression level
- Implement error checking with
ZSTD_isError() - Reuse contexts for multiple operations
- Handle context reset after errors
- Validate untrusted data sizes
- Test with actual data to verify correctness