name	write-compressor
description	This skill provides guidance for implementing custom compression encoders that must be compatible with existing decoders (especially arithmetic coding). It should be used when the task requires writing a compressor/encoder that produces output compatible with a given decompressor/decoder, or when implementing arithmetic coding or similar bit-level compression schemes.

Write Compressor

Overview

This skill guides the implementation of compression encoders that must produce output compatible with an existing decoder. The key challenge is ensuring encoder state transitions exactly mirror decoder expectations—particularly critical for arithmetic coding where even minor state drift causes decompression failures.

Critical Success Factors

1. Decoder-First Analysis

Before writing any encoder code:

Create a decoder simulator in Python - Reimplement the decoder logic to enable step-by-step tracing
Trace with minimal inputs - Create the smallest possible test cases (single bit, single byte) and manually compute expected values
Map the state machine - Document exactly how fraction, range, low, and other state variables change on each operation
Derive the byte output formula mathematically - Work out the exact relationship between encoder state and output bytes on paper before coding

2. Incremental Complexity Approach

Build encoder functionality in strict order of complexity:

Single bit encoding - Verify encoding a single 0 bit, then a single 1 bit
Single integer encoding - Verify encoding one small integer
Multiple values - Encode a sequence and verify each step
Full compression - Only attempt complete file compression after simpler cases work

Never skip to full file compression before validating simpler cases.

3. Dual Simulation Verification

Maintain both encoder and decoder implementations in Python that can be run side-by-side:

For each encoding operation:
  1. Run encoder to produce output
  2. Run decoder simulator consuming that output
  3. Assert encoder state matches decoder state
  4. Assert decoded value matches original value

4. State Synchronization

For arithmetic coding specifically:

Renormalization must match exactly - The encoder's decision to output a byte must align with when the decoder expects to read one
Count arrays must stay synchronized - Any probability model updates must happen identically in encoder and decoder
Range calculations must use identical formulas - Integer division behavior must match

Verification Strategy

Unit Tests Before Integration

Create explicit unit tests for each encoding primitive:

def test_encode_bit():
    # Test that encode_bit(0) followed by decode produces 0
    # Test that encode_bit(1) followed by decode produces 1

def test_encode_integer():
    # Test encoding/decoding small integers
    # Test boundary cases

def test_state_sync():
    # After each operation, encoder.low, encoder.range
    # must predict decoder.fraction, decoder.range

Debug Output Protocol

Add comprehensive debug output to both encoder and decoder:

Operation: encode_bit(ctx=10, bit=1)
  Before: low=0x00000000 range=0x10000000
  Split:  0x08000000
  After:  low=0x08000000 range=0x08000000
  Output: [none]

Compare encoder debug output against decoder debug output line by line.

Minimal Test Cases

Always start with the simplest possible test:

Empty file - Just header/footer, no data
Single character - Smallest meaningful compression
Repeated character - Tests probability updates
Two-character alternation - Tests context switching

Common Pitfalls

1. Trial-and-Error Implementation

Problem: Creating multiple encoder implementations without understanding why each fails.

Solution: When an implementation fails, systematically determine the root cause:

At which byte does decoder diverge?
What value did encoder output vs. what decoder expected?
Trace backward to find the state divergence point

2. Wrong Byte Output Formula

Problem: The renormalization formula for computing output bytes is often misunderstood.

Solution: If the decoder does:

fraction = fraction * radix + (byte - 1)

Then mathematically derive what byte value steers fraction into the target range [low, low+range). Do not guess—derive the formula.

3. Premature Optimization

Problem: Focusing on compression ratio or file size before achieving correctness.

Solution: Ignore size constraints until:

Single-bit encoding works
Full file compresses and decompresses correctly
Only then optimize for size

4. Testing Full Files Before Simple Cases

Problem: Attempting to compress a large file when basic operations are untested.

Solution: A segmentation fault or corrupted output on a full file provides almost no diagnostic information. Always verify with minimal inputs first.

5. Insufficient Decoder Analysis

Problem: Starting encoder implementation before fully understanding decoder behavior.

Solution: Spend significant time (potentially hours) analyzing the decoder:

Read every line of decoder code
Add debug output to the decoder
Run decoder on known-good compressed files
Document the state machine completely

Workflow

Phase 1: Analysis (Do Not Skip)

Read and annotate the decoder source code completely
Create a Python decoder simulator
Add debug output to trace every state transition
Run decoder on existing compressed files to understand expected behavior
Document the exact byte output formula mathematically

Phase 2: Minimal Implementation

Implement encoder for single-bit encoding only
Add matching debug output to encoder
Verify encoder output decodes correctly
Compare state traces between encoder and decoder

Phase 3: Incremental Extension

Add integer encoding, verify
Add any additional primitives (back-references, etc.), verify each
Implement full compression logic
Verify with progressively larger inputs

Phase 4: Optimization (Only After Correctness)

Profile compression ratio
Adjust parameters for better compression
Verify correctness after each optimization change

Red Flags

Stop and reassess if:

Creating a third encoder implementation without understanding why previous ones failed
Decoder produces segmentation fault (indicates fundamental state corruption)
Debug output shows decoder reading unexpected values
Tempted to "try different values" without mathematical justification
Skipping minimal test cases to test on full file

Resources

references/arithmetic_coding_analysis.md

Detailed guide for analyzing arithmetic coding decoders, including:

State variable identification
Deriving the byte output formula mathematically
Python templates for decoder simulator and encoder
Step-by-step debugging of state divergence
Probability model synchronization

write-compressor

Install Skill

SKILL.md