| name | extract-elf |
| description | Guidance for extracting and processing data from ELF (Executable and Linkable Format) binary files. This skill should be used when tasks involve parsing ELF headers, reading program segments, extracting memory contents, or converting binary data to structured formats like JSON. Applicable to reverse engineering, binary analysis, and memory dump extraction tasks. |
ELF Binary Data Extraction
This skill provides guidance for tasks involving extraction of data from ELF binary files, including reading headers, parsing segments, and converting binary content to structured output formats.
Approach Overview
ELF extraction tasks typically require:
- Parsing the ELF header to understand file structure
- Reading program headers to identify LOAD segments
- Extracting data from segments at correct virtual addresses
- Converting binary data to the required output format
Implementation Steps
Step 1: Validate ELF Header
Before processing, verify the file is a valid ELF binary:
- Check magic bytes at offset 0: 0x7F 'E' 'L' 'F' (hex: 7f 45 4c 46)
- Identify ELF class at offset 4 (1 = 32-bit, 2 = 64-bit)
- Identify endianness at offset 5 (1 = little-endian, 2 = big-endian)
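The three checks above can be sketched in Python as follows. This is a minimal illustration, not a complete validator; the function name `read_ident` and its return shape are assumptions made for this example.

```python
ELF_MAGIC = b"\x7fELF"

def read_ident(data: bytes):
    """Return (bits, struct byte-order prefix) from the first 6 bytes of the file."""
    if len(data) < 6 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file: bad magic bytes")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2):
        raise ValueError(f"unknown ELF class: {ei_class}")
    if ei_data not in (1, 2):
        raise ValueError(f"unknown data encoding: {ei_data}")
    bits = 32 if ei_class == 1 else 64          # offset 4: 1 = 32-bit, 2 = 64-bit
    byte_order = "<" if ei_data == 1 else ">"   # offset 5: 1 = little, 2 = big
    return bits, byte_order
```

Returning the byte order as a `struct` prefix (`"<"` or `">"`) lets later parsing code build format strings without re-checking byte 5.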
Step 2: Parse ELF Header Fields
Extract key header fields based on ELF class:
For 32-bit ELF:
- Program header offset: bytes 28-31
- Program header entry size: bytes 42-43
- Number of program headers: bytes 44-45
For 64-bit ELF:
- Program header offset: bytes 32-39
- Program header entry size: bytes 54-55
- Number of program headers: bytes 56-57
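As a sketch, the byte ranges listed above map onto `struct.unpack_from` calls like this. The function name is hypothetical; `bits` is 32 or 64 and `byte_order` is `"<"` or `">"`, both assumed to have been derived from bytes 4 and 5 of the header.

```python
import struct

def read_phdr_table_info(data: bytes, bits: int, byte_order: str):
    """Return (e_phoff, e_phentsize, e_phnum) for a 32-bit or 64-bit ELF header."""
    if bits == 32:
        # 4-byte offset at bytes 28-31, then two 2-byte fields at 42-43 and 44-45
        (e_phoff,) = struct.unpack_from(byte_order + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(byte_order + "HH", data, 42)
    else:
        # 8-byte offset at bytes 32-39, then two 2-byte fields at 54-55 and 56-57
        (e_phoff,) = struct.unpack_from(byte_order + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(byte_order + "HH", data, 54)
    return e_phoff, e_phentsize, e_phnum
```

All format codes used here (`I`, `Q`, `H`) are unsigned, and the explicit byte-order prefix disables native alignment padding.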
Step 3: Process Program Headers
Iterate through program headers and identify LOAD segments (type = 1):
- Extract virtual address (p_vaddr)
- Extract file offset (p_offset)
- Extract file size (p_filesz)
- Extract memory size (p_memsz)
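One way to iterate the program header table is sketched below. Note that the field order differs between classes: in 64-bit program headers `p_flags` comes directly after `p_type`, while in 32-bit headers it comes after `p_memsz`. The `Segment` tuple and the function name are assumptions made for this example.

```python
import struct
from typing import NamedTuple

PT_LOAD = 1

class Segment(NamedTuple):
    p_vaddr: int
    p_offset: int
    p_filesz: int
    p_memsz: int

def load_segments(data, bits, byte_order, e_phoff, e_phentsize, e_phnum):
    """Return the LOAD segments described by the program header table."""
    segments = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        if bits == 32:
            # Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, ...
            p_type, p_offset, p_vaddr, _paddr, p_filesz, p_memsz = \
                struct.unpack_from(byte_order + "6I", data, base)
        else:
            # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, ...
            p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz = \
                struct.unpack_from(byte_order + "II5Q", data, base)
        if p_type == PT_LOAD:
            segments.append(Segment(p_vaddr, p_offset, p_filesz, p_memsz))
    return segments
```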
Step 4: Extract Segment Data
For each LOAD segment:
- Read data from file at p_offset
- Map data to virtual addresses starting at p_vaddr
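- Handle alignment and padding as specified: when p_memsz exceeds p_filesz, the extra bytes are not stored in the file and are zero-filled in memory (for example, .bss)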
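A minimal sketch of the extraction step, assuming the task wants one unsigned 32-bit word per 4-byte-aligned address (the word size, the function name, and the choice to skip a trailing partial word are all assumptions; adjust them to the task). It covers only the file-backed bytes (p_filesz), not zero-filled memory.

```python
import struct

def segment_words(data: bytes, p_offset: int, p_vaddr: int, p_filesz: int,
                  byte_order: str = "<") -> dict:
    """Map virtual address -> unsigned 32-bit word for one LOAD segment."""
    words = {}
    usable = p_filesz - (p_filesz % 4)   # ignore a trailing partial word
    for rel in range(0, usable, 4):
        # "I" is the unsigned 32-bit format code
        (value,) = struct.unpack_from(byte_order + "I", data, p_offset + rel)
        words[p_vaddr + rel] = value
    return words
```

The same relative offset `rel` is added to both `p_offset` (where to read in the file) and `p_vaddr` (which address the value belongs to), which keeps the file-to-memory mapping explicit.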
Critical Data Type Considerations
Signed vs Unsigned Integers
This is the most common source of errors in binary extraction tasks.
When reading multi-byte integer values from binary data:
- Memory addresses are always unsigned
- Size fields are always unsigned
- Data values should typically be read as unsigned unless the task explicitly requires signed interpretation
Common API distinctions:
- Node.js Buffer: readUInt32LE vs readInt32LE
- Python struct: 'I' (unsigned) vs 'i' (signed)
- C/C++: uint32_t vs int32_t
Verification: If output contains negative numbers but the expected output shows only positive integers, the wrong signedness was used.
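To illustrate the difference, here is a short Python example that decodes the same four bytes both ways. The byte sequence is chosen for this example because its high bit is set.

```python
import struct

raw = bytes.fromhex("f30f1efa")              # high bit set in the last byte

(unsigned_value,) = struct.unpack("<I", raw)  # 4196274163
(signed_value,) = struct.unpack("<i", raw)    # -98693133

# A signed result can be converted back by masking to 32 bits.
recovered = signed_value & 0xFFFFFFFF         # 4196274163
```

The two results differ by exactly 2^32, which is a quick way to recognize a signedness mistake in existing output.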
Endianness
Match the endianness specified in the ELF header:
- Little-endian (used by x86/x64 and most other common targets): Use LE variants
- Big-endian: Use BE variants
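In Python's `struct` module the equivalent of the LE and BE variants is the `<` or `>` prefix on the format string. A small example showing how the same bytes decode under each byte order:

```python
import struct

raw = bytes.fromhex("41424344")

little = struct.unpack("<I", raw)[0]   # 0x44434241: last byte is most significant
big = struct.unpack(">I", raw)[0]      # 0x41424344: first byte is most significant
```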
Integer Sizes
ELF fields vary by class:
- 32-bit ELF: addresses and offsets are 4 bytes
- 64-bit ELF: addresses and offsets are 8 bytes
Verification Strategies
Before Declaring Success
- Validate output format: Ensure JSON is well-formed and that keys and values have the expected types
- Check address ranges: Verify addresses fall within expected segment boundaries
- Sample value verification: Manually compute expected values for a few addresses using hex inspection tools
Manual Verification Commands
Use these tools to verify extracted values:
```shell
# View ELF header information
readelf -h <binary>

# View program headers (segments)
readelf -l <binary>

# Dump section contents in hex
objdump -s <binary>

# View raw hex bytes at specific offset
xxd -s <offset> -l <length> <binary>

# Calculate expected value from hex bytes (little-endian example)
# For bytes: 41 42 43 44 -> value = 0x44434241 = 1145258561
```
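The manual calculation can also be done in Python rather than by hand. The helper below is a hypothetical convenience for this purpose; it takes hex digits as copied from a hex dump (whitespace between bytes is ignored) and returns the unsigned integer they encode.

```python
def expected_word(hex_bytes: str, byteorder: str = "little") -> int:
    """Convert hex digits from a hex dump into the unsigned integer they encode."""
    return int.from_bytes(bytes.fromhex(hex_bytes), byteorder, signed=False)

value = expected_word("41 42 43 44")   # 1145258561
```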
Value Sanity Checks
- If the example output shows only positive integers, verify output contains no negative values
- Compare a few computed values against manual hex calculation
- Verify address coverage matches expected segment ranges
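These checks are easy to automate. The sketch below assumes the extracted data is a dict of address to integer value and that segments are given as `(p_vaddr, p_filesz)` pairs; both shapes and the function name are assumptions for illustration.

```python
def check_words(words: dict, segments: list) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for addr, value in words.items():
        if value < 0:
            problems.append(f"negative value {value} at {addr:#x}")
        # A segment covers the half-open range [start, start + size)
        if not any(start <= addr < start + size for start, size in segments):
            problems.append(f"address {addr:#x} outside every LOAD segment")
    return problems
```

Returning a list of messages instead of raising on the first failure makes it possible to see every issue in one run.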
Common Pitfalls
- Using signed integer reads for unsigned data: Results in negative numbers for values with high bit set (e.g., -98693133 instead of 4196274163)
- Incorrect endianness handling: Produces completely wrong values; verify against ELF header byte 5
- Off-by-one errors in segment boundaries: Carefully track whether sizes are inclusive/exclusive
- Assuming 4-byte alignment: Check if segment sizes are multiples of the read size; handle partial reads at boundaries
- Mixing 32-bit and 64-bit field sizes: Always check ELF class and use appropriate field sizes
- Overconfidence without verification: Never assume "values are read directly from binary, so they should match"; always verify sample values manually
Output Format Considerations
When producing structured output (e.g., JSON):
- Use string keys for addresses if they need to be JSON object keys (JSON requires string keys)
- Ensure integer values stay within the JavaScript safe integer range (at most 2^53 - 1) if the JSON will be parsed by JavaScript; larger values lose precision there
- Consider whether addresses should be decimal or hexadecimal strings based on task requirements
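A minimal sketch of a serializer that follows these points, assuming the extracted data is a dict of integer address to integer value. The function name and the `hex_keys` switch are hypothetical; which key format to use depends on the task.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1

def to_json(words: dict, hex_keys: bool = False) -> str:
    """Serialize address -> value pairs with string keys, in address order."""
    out = {}
    for addr, value in sorted(words.items()):
        if value > MAX_SAFE_INTEGER:
            raise ValueError(f"value at {addr:#x} exceeds the safe integer range")
        key = hex(addr) if hex_keys else str(addr)   # keys are always strings
        out[key] = value
    return json.dumps(out)
```

For example, `to_json({4096: 1145258561})` produces `{"4096": 1145258561}`, and passing `hex_keys=True` would use `"0x1000"` as the key instead.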