name	extract-elf
description	Guidance for extracting and processing data from ELF (Executable and Linkable Format) binary files. This skill should be used when tasks involve parsing ELF headers, reading program segments, extracting memory contents, or converting binary data to structured formats like JSON. Applicable to reverse engineering, binary analysis, and memory dump extraction tasks.

ELF Binary Data Extraction

This skill provides guidance for tasks involving extraction of data from ELF binary files, including reading headers, parsing segments, and converting binary content to structured output formats.

Approach Overview

ELF extraction tasks typically require:

Parsing the ELF header to understand file structure
Reading program headers to identify LOAD segments
Extracting data from segments at correct virtual addresses
Converting binary data to the required output format

Implementation Steps

Step 1: Validate ELF Header

Before processing, verify the file is a valid ELF binary:

Check magic bytes at offset 0: 0x7F 'E' 'L' 'F' (hex: 7f 45 4c 46)
Identify ELF class (32-bit vs 64-bit) at offset 4
Identify endianness at offset 5 (1 = little-endian, 2 = big-endian)

Step 2: Parse ELF Header Fields

Extract key header fields based on ELF class:

For 32-bit ELF:

Program header offset: bytes 28-31
Program header entry size: bytes 42-43
Number of program headers: bytes 44-45

For 64-bit ELF:

Program header offset: bytes 32-39
Program header entry size: bytes 54-55
Number of program headers: bytes 56-57

Step 3: Process Program Headers

Iterate through program headers and identify LOAD segments (type = 1):

Extract virtual address (p_vaddr)
Extract file offset (p_offset)
Extract file size (p_filesz)
Extract memory size (p_memsz)

Step 4: Extract Segment Data

For each LOAD segment:

Read data from file at p_offset
Map data to virtual addresses starting at p_vaddr
Handle alignment and padding as specified

Critical Data Type Considerations

Signed vs Unsigned Integers

This is the most common source of errors in binary extraction tasks.

When reading multi-byte integer values from binary data:

Memory addresses are always unsigned
Size fields are always unsigned
Data values should typically be read as unsigned unless the task explicitly requires signed interpretation

Common API distinctions:

Node.js Buffer: readUInt32LE vs readInt32LE
Python struct: 'I' (unsigned) vs 'i' (signed)
C/C++: uint32_t vs int32_t

Verification: If output contains negative numbers but the expected output shows only positive integers, the wrong signedness was used.

Endianness

Match the endianness specified in the ELF header:

Little-endian (most common on x86/x64): Use LE variants
Big-endian: Use BE variants

Integer Sizes

ELF fields vary by class:

32-bit ELF: addresses and offsets are 4 bytes
64-bit ELF: addresses and offsets are 8 bytes

Verification Strategies

Before Declaring Success

Validate output format: Ensure JSON is well-formed, keys are correct types
Check address ranges: Verify addresses fall within expected segment boundaries
Sample value verification: Manually compute expected values for a few addresses using hex inspection tools

Manual Verification Commands

Use these tools to verify extracted values:

# View ELF header information
readelf -h <binary>

# View program headers (segments)
readelf -l <binary>

# Dump section contents in hex
objdump -s <binary>

# View raw hex bytes at specific offset
xxd -s <offset> -l <length> <binary>

# Calculate expected value from hex bytes (little-endian example)
# For bytes: 41 42 43 44 -> value = 0x44434241 = 1145258561

Value Sanity Checks

If the example output shows only positive integers, verify output contains no negative values
Compare a few computed values against manual hex calculation
Verify address coverage matches expected segment ranges

Common Pitfalls

Using signed integer reads for unsigned data - Results in negative numbers for values with high bit set (e.g., -98693133 instead of 4196274163)
Incorrect endianness handling - Produces completely wrong values; verify against ELF header byte 5
Off-by-one errors in segment boundaries - Carefully track whether sizes are inclusive/exclusive
Assuming 4-byte alignment - Check if segment sizes are multiples of the read size; handle partial reads at boundaries
Mixing 32-bit and 64-bit field sizes - Always check ELF class and use appropriate field sizes
Overconfidence without verification - Never assume "values are read directly from binary, so they should match" - always verify sample values manually

Output Format Considerations

When producing structured output (e.g., JSON):

Use string keys for addresses if they need to be JSON object keys (JSON requires string keys)
Ensure integer values are within JavaScript/JSON safe integer range (2^53 - 1 for full precision)
Consider whether addresses should be decimal or hexadecimal strings based on task requirements

extract-elf

Install Skill

SKILL.md