| name | Binary Analysis and Reverse Engineering |
| description | Systematic approach to analyzing compiled binaries, understanding program behavior, and identifying vulnerabilities without source code access |
| when_to_use | When analyzing closed-source software, malware analysis, understanding proprietary protocols, or assessing compiled applications for security vulnerabilities |
| version | 1.0.0 |
| languages | assembly, c, python |
Binary Analysis and Reverse Engineering
Overview
Binary analysis examines compiled executables without access to source code. This skill combines static analysis (disassembly, decompilation) with dynamic analysis (debugging, tracing) to understand program behavior, identify vulnerabilities, and reverse engineer functionality.
Core principle: Combine static and dynamic analysis. Static reveals structure; dynamic reveals behavior.
When to Use
- Analyzing closed-source software for vulnerabilities
- Analyzing malware to understand its behavior
- Reverse engineering proprietary protocols
- Understanding third-party libraries or dependencies
- CTF challenges and security research
- Verifying security claims of binary-only software
Analysis Workflow
Phase 1: Initial Assessment
Goal: Understand what you're analyzing and gather basic information.
# File type and architecture
file binary
# ELF 64-bit LSB executable, x86-64, dynamically linked
# Strings (quick insight into functionality)
strings binary | less
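# Dependencies (ldd may execute code from the target; use it only on trusted binaries)
ldd binary
# For untrusted samples, list the needed libraries without running anything
objdump -p binary | grep NEEDED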
# Security features
checksec binary
# RELRO, Stack Canary, NX, PIE, FORTIFY
# Basic metadata
readelf -h binary # ELF headers
objdump -p binary # Program headers
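When `file` is unavailable, or many samples need triage from a script, the same basic facts can be read straight from the ELF header. The following is a minimal sketch, not a replacement for `readelf`: `parse_elf_ident` is a hypothetical helper name, and the lookup tables cover only a few common values from the ELF specification.

```python
import struct

# Small subsets of the e_machine and e_type values defined by the ELF spec.
MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}
TYPES = {1: "relocatable", 2: "executable (non-PIE)", 3: "shared object or PIE"}


def parse_elf_ident(data: bytes) -> dict:
    """Decode the first 20 bytes of an ELF file into basic facts."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unrecognized EI_CLASS or EI_DATA")
    byte_order = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after e_ident (16 bytes).
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "type": TYPES.get(e_type, hex(e_type)),
        "machine": MACHINES.get(e_machine, hex(e_machine)),
    }


# Hand-built header for a 64-bit little-endian x86-64 PIE (illustrative bytes).
sample = b"\x7fELF\x02\x01\x01\x00" + bytes(8) + struct.pack("<HH", 3, 0x3E)
print(parse_elf_ident(sample))
```

To apply it to a real file, pass the first 20 bytes, for example `parse_elf_ident(open(path, "rb").read(20))`.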
Phase 2: Static Analysis
Goal: Understand program structure without execution.
Disassembly
# Linear disassembly
objdump -d binary > disassembly.txt
# Intelligent disassembly with Ghidra
# 1. Import binary into Ghidra
# 2. Analyze with default options
# 3. Review function list, strings, cross-references
# IDA Pro (commercial but powerful)
# - Advanced decompilation
# - Cross-references
# - Graph view
Function Analysis
# Using radare2
"""
$ r2 binary
[0x00001000]> aa # Analyze all
[0x00001000]> afl # List functions
[0x00001000]> pdf @ main # Disassemble main
[0x00001000]> VV @ main # Visual graph mode
"""
# Common patterns to look for:
# - Entry point and main function
# - Vulnerable functions (strcpy, gets, sprintf)
# - Cryptographic operations
# - Network operations (socket, connect, send)
# - File operations (fopen, read, write)
# - Privilege operations (setuid, system)
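The pattern list above can be turned into a quick triage pass over a binary's imported symbols. This is a sketch under assumptions: `triage_imports` and the category names are hypothetical, the function lists are just the examples named above, and the symbol list is assumed to come from whatever tool you already use to dump imports.

```python
# Categories and function names taken from the pattern list above.
RISKY_IMPORTS = {
    "unsafe string handling": {"strcpy", "gets", "sprintf"},
    "network": {"socket", "connect", "send"},
    "file": {"fopen", "read", "write"},
    "privilege": {"setuid", "system"},
}


def triage_imports(symbols):
    """Group imported symbol names by the category they fall into."""
    present = set(symbols)
    findings = {}
    for category, names in RISKY_IMPORTS.items():
        hits = sorted(names & present)
        if hits:
            findings[category] = hits
    return findings


# Illustrative import list, not taken from a real binary.
imports = ["puts", "strcpy", "socket", "system", "malloc"]
for category, hits in triage_imports(imports).items():
    print(f"{category}: {', '.join(hits)}")
```

A hit only marks a place to start reading. Whether a given `strcpy` is exploitable still depends on where its arguments come from.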
Control Flow Analysis
"""
Analyze program flow:
1. Identify entry point
2. Follow execution paths
3. Identify decision points (if/else, switch)
4. Map loops and recursion
5. Identify error handling
6. Find return points
Key questions:
- What are the main code paths?
- Where does user input enter?
- What validation occurs?
- Where are dangerous operations?
"""
Phase 3: Dynamic Analysis
Goal: Observe actual program behavior during execution.
Debugging with GDB
# Start GDB
gdb ./binary
# Set breakpoints
(gdb) break main
(gdb) break *0x401234 # Specific address
# Run with arguments
(gdb) run arg1 arg2
# Examine registers
(gdb) info registers
# Examine memory
(gdb) x/10gx $rsp # 10 eight-byte hex values at the stack pointer
(gdb) x/s 0x404000 # String at address
# Step through code
(gdb) stepi # Execute one instruction, stepping into calls
(gdb) nexti # Execute one instruction, stepping over calls
(gdb) continue # Continue to next breakpoint
# Display on each step
(gdb) display/i $pc # Show current instruction
(gdb) display/x $rax # Show RAX register
Enhanced GDB with PEDA/GEF/pwndbg
# Install PEDA
git clone https://github.com/longld/peda.git ~/peda
echo "source ~/peda/peda.py" >> ~/.gdbinit
# Or GEF (recommended)
bash -c "$(curl -fsSL https://gef.blah.cat/sh)"
# Enhanced features:
# - Color coding
# - Automatic display of registers, stack, code
# - Pattern creation/offset calculation
# - ROP gadget search
# - Heap analysis
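Pattern creation and offset calculation rely on a sequence in which every short window is unique, so the bytes that land in a register identify their own offset. The sketch below shows the idea with a de Bruijn sequence over lowercase letters. The names `cyclic` and `cyclic_find` are chosen here for illustration, and the output is not guaranteed to match any particular tool's pattern.

```python
import string
from itertools import islice

ALPHABET = string.ascii_lowercase.encode()


def de_bruijn(alphabet: bytes, n: int):
    """Yield a de Bruijn sequence: every n-byte window occurs exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length: int, n: int = 4) -> bytes:
    """Return the first `length` bytes of the pattern."""
    return bytes(islice(de_bruijn(ALPHABET, n), length))


def cyclic_find(chunk: bytes, max_length: int = 10000, n: int = 4) -> int:
    """Return the offset of an n-byte chunk in the pattern, or -1 if absent."""
    return cyclic(max_length, n).find(chunk[:n])


pattern = cyclic(64)
print(pattern)
print(cyclic_find(pattern[40:44]))
```

A value read from a register on a little-endian target has to be converted back to bytes first, for example `value.to_bytes(4, "little")`, before it is passed to `cyclic_find`.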
System Call Tracing
# Trace system calls
strace ./binary
# Trace with details
strace -v -s 1024 ./binary
# Trace specific syscalls (modern libc usually opens files via openat)
strace -e trace=open,openat,read,write ./binary
# Trace library calls
ltrace ./binary
# Follow child processes
strace -f ./binary
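Raw `strace` output gets long quickly, so a small summary pass can help decide where to look first. The sketch below assumes the default line layout (`name(args) = return`), optionally prefixed by `[pid N]` as printed with `-f`. `summarize_strace` is a hypothetical helper, and unusual lines such as unfinished or resumed calls are simply skipped.

```python
import re
from collections import Counter

# Matches e.g.: openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
LINE_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\((.*)\)\s+=\s+(-?\d+)")
PATH_RE = re.compile(r'"([^"]*)"')


def summarize_strace(lines):
    """Return (syscall counts, files opened successfully, files that failed to open)."""
    counts = Counter()
    opened, missing = [], []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # signals, exit banners, unfinished calls
        name, args, ret = match.group(1), match.group(2), int(match.group(3))
        counts[name] += 1
        if name in ("open", "openat"):
            path = PATH_RE.search(args)
            if path:
                (opened if ret >= 0 else missing).append(path.group(1))
    return counts, opened, missing


# Illustrative trace lines, not captured from a real program.
trace = [
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/opt/app/license.dat", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'write(1, "hello", 5) = 5',
]
counts, opened, missing = summarize_strace(trace)
print(counts.most_common(), opened, missing)
```

Failed opens are often the more interesting half, since they reveal configuration or license files the program looks for.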
Dynamic Instrumentation
# Using Frida for runtime instrumentation
import sys

import frida


def on_message(message, data):
    # Called for every message the injected script sends back
    print(f"[*] {message}")


# Attach to a running process by name
session = frida.attach("target_process")

# JavaScript to inject into the target.
# Note: newer Frida releases replace Module.findExportByName(null, ...)
# with Module.findGlobalExportByName(...).
script_code = """
Interceptor.attach(Module.findExportByName(null, 'strcmp'), {
    onEnter: function (args) {
        console.log('[*] strcmp called');
        console.log('    arg1: ' + args[0].readUtf8String());
        console.log('    arg2: ' + args[1].readUtf8String());
    },
    onLeave: function (retval) {
        console.log('    return: ' + retval);
    }
});
"""
script = session.create_script(script_code)
script.on('message', on_message)
script.load()
sys.stdin.read()  # Keep the script loaded until stdin closes
Phase 4: Vulnerability Identification
Common Vulnerability Patterns:
Buffer Overflow
; Look for unsafe string operations
call strcpy ; No bounds checking
call gets ; Always unsafe
call sprintf ; No bounds checking
; Check for:
; - Fixed-size buffers
; - User-controlled input
; - No size validation
Format String
; User input directly to printf
mov rdi, [user_input]
call printf ; Dangerous if user_input has format specifiers
Use After Free
; Pattern:
call free ; Free memory
; ... later ...
mov rax, [ptr] ; Use freed pointer
Integer Overflow
; Look for:
; - Size calculations
; - Loop counters
; - Memory allocation sizes
imul eax, [count], 8 ; 32-bit multiply can wrap around
mov edi, eax ; Wrapped value becomes the size argument
call malloc ; Allocates a buffer smaller than intended
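The wraparound is easier to see with concrete numbers. This sketch models the 32-bit multiply above in Python, which has unbounded integers, by masking the result to 32 bits. The helper names and the 8-byte element size are illustrative.

```python
MASK32 = 0xFFFFFFFF


def alloc_size_32(count: int, elem_size: int = 8) -> int:
    """Size a 32-bit register would hold after count * elem_size."""
    return (count * elem_size) & MASK32


def wraps(count: int, elem_size: int = 8) -> bool:
    """True when the real product no longer fits in 32 bits."""
    return count * elem_size > MASK32


for count in (16, 0x1FFFFFFF, 0x20000001):
    print(hex(count), hex(alloc_size_32(count)), wraps(count))
```

With `count = 0x20000001` the true product is `0x100000008`, but only the low 32 bits (`0x8`) reach `malloc`, while later code may still iterate `count` times over the buffer.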
Phase 5: Exploit Development
See skills/exploitation/exploit-dev-workflow for detailed exploitation process.
Tool Ecosystem
Disassemblers/Decompilers
Ghidra (Free)
# NSA's reverse engineering suite
# Features:
# - Decompiler (C-like output)
# - Cross-references
# - Scripting (Python/Java)
# - Collaborative analysis
# Download from: https://ghidra-sre.org/
IDA Pro (Commercial)
- Industry standard
- Best-in-class decompiler (Hex-Rays)
- Extensive plugin ecosystem
- IDA Free available with limitations
Binary Ninja (Commercial)
- Modern UI
- Multiple intermediate languages (low-, medium-, and high-level IL)
- Good API for automation
- Active development
radare2 (Free)
# Command-line focused
# Steep learning curve but powerful
# Basic workflow
r2 binary
[0x00001000]> aa # Analyze
[0x00001000]> afl # Functions
[0x00001000]> s main # Seek to main
[0x00001000]> pdf # Disassemble
[0x00001000]> VV # Visual graph
Debuggers
GDB with Extensions
- PEDA - Python Exploit Development Assistance
- GEF - GDB Enhanced Features
- pwndbg - Exploit development focused
WinDbg (Windows)
- Microsoft's debugger
- Kernel and user-mode debugging
- Essential for Windows analysis
x64dbg (Windows)
- Modern UI
- Plugin support
- Good for malware analysis
Dynamic Analysis
Frida
- Dynamic instrumentation
- JavaScript API
- Cross-platform
- Runtime modification
PIN/DynamoRIO
- Dynamic binary instrumentation frameworks
- Research-grade tools
- Performance analysis
Valgrind
# Memory debugging
valgrind --leak-check=full ./binary
# Memory profiling
valgrind --tool=massif ./binary
Common Analysis Scenarios
Scenario 1: Finding Hardcoded Credentials
# Search strings
strings binary | grep -i "password\|secret\|key"
# In Ghidra:
# 1. Window -> Defined Strings
# 2. Search for interesting patterns
# 3. Check cross-references to see usage
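The `strings | grep` step can also be scripted, which helps when the file offset of each hit is needed for follow-up in a disassembler. A minimal sketch, assuming printable ASCII strings of at least four characters. The keyword list mirrors the grep above, and `find_secrets` is a hypothetical name.

```python
import re

KEYWORDS = ("password", "secret", "key")


def extract_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    for match in re.finditer(pattern, data):
        yield match.start(), match.group().decode("ascii")


def find_secrets(data: bytes):
    """Return strings that contain any keyword, case-insensitively."""
    return [
        (offset, text)
        for offset, text in extract_strings(data)
        if any(word in text.lower() for word in KEYWORDS)
    ]


# Illustrative blob standing in for the contents of a binary.
blob = b"\x00\x01admin_password=hunter2\x00\xff\xfeabc\x00API_KEY_HERE\x00harmless text\x00"
for offset, text in find_secrets(blob):
    print(f"0x{offset:x}: {text}")
```

This only catches plain ASCII. Wide-character (UTF-16) strings and anything obfuscated or built at runtime need the dynamic techniques from Phase 3.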
Scenario 2: Understanding Network Protocol
# 1. Trace network calls
strace -e trace=network ./binary
# 2. Analyze send/recv calls in disassembly
# 3. Set breakpoints on socket operations
gdb ./binary
(gdb) break send
(gdb) break recv
# 4. Capture actual packets
tcpdump -i lo -w capture.pcap
# 5. Analyze protocol structure
wireshark capture.pcap
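Once a framing hypothesis emerges from the capture, encoding it as a parser is a quick way to test it against more traffic. The layout below (1-byte message type, 2-byte big-endian length, then payload) is purely hypothetical and stands in for whatever structure the target protocol turns out to use.

```python
import struct

HEADER = struct.Struct(">BH")  # hypothetical: type (1 byte), length (2 bytes, big-endian)


def parse_frames(stream: bytes):
    """Split a byte stream into (type, payload) frames; stop at a partial frame."""
    frames, offset = [], 0
    while offset + HEADER.size <= len(stream):
        msg_type, length = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        payload = stream[start:start + length]
        if len(payload) < length:
            break  # truncated frame: the hypothesis may be wrong or data is missing
        frames.append((msg_type, payload))
        offset = start + length
    return frames


# Illustrative bytes, not taken from a real capture.
captured = b"\x01\x00\x05hello\x02\x00\x02ok"
print(parse_frames(captured))
```

If the parser consumes a whole session cleanly, the hypothesis is plausible. Leftover bytes or absurd lengths suggest a wrong field size or byte order.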
Scenario 3: Bypassing License Check
# 1. Search for error messages
strings binary | grep -i "license\|trial\|expired"
# 2. Find string references in Ghidra
# 3. Understand check logic
# 4. Identify bypass point
# Options:
# - Patch binary (change jump condition)
# - Hook function at runtime (Frida)
# - Modify return value in debugger
Scenario 4: Extracting Encryption Keys
# Dynamic approach - hook crypto functions
"""
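Interceptor.attach(Module.findExportByName('libcrypto.so', 'AES_set_encrypt_key'), {
    onEnter: function (args) {
        // args[0] = raw key bytes, args[1] = key length in bits
        var keyBytes = args[1].toInt32() / 8;
        console.log('[*] AES key (' + keyBytes + ' bytes):');
        console.log(hexdump(args[0], { length: keyBytes }));
    }
});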
"""
# Static approach - look for key initialization
# - Search for crypto constants (S-boxes, magic numbers)
# - Analyze key derivation functions
# - Check for embedded keys
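Searching for crypto constants can be as simple as a byte search for well-known tables. The sketch below looks for the first 16 bytes of the AES S-box and inverse S-box. The function names are hypothetical, and real implementations may store tables in other forms (for example combined T-tables), so a miss does not rule AES out.

```python
# First 16 bytes of the standard AES tables.
SIGNATURES = {
    "AES S-box": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "AES inverse S-box": bytes.fromhex("52096ad53036a538bf40a39e81f3d7fb"),
}


def find_all(data: bytes, needle: bytes):
    """Return every offset at which needle occurs in data."""
    offsets = []
    index = data.find(needle)
    while index != -1:
        offsets.append(index)
        index = data.find(needle, index + 1)
    return offsets


def scan_for_crypto(data: bytes):
    """Map each signature name to the offsets where it was found."""
    hits = {}
    for name, signature in SIGNATURES.items():
        offsets = find_all(data, signature)
        if offsets:
            hits[name] = offsets
    return hits


# Illustrative blob with an S-box embedded at offset 10.
blob = bytes(10) + SIGNATURES["AES S-box"] + bytes(5)
print(scan_for_crypto(blob))
```

Cross-references to a hit usually lead straight to the cipher routine, and from there to the key setup code.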
Practical Tips
Naming Conventions
# In disassemblers, rename variables/functions for clarity
# Bad: FUN_00401234(local_10, DAT_00404000)
# Good: validate_input(user_buffer, key_string)
# Document as you analyze
# Add comments explaining complex logic
# Create structure definitions for data
Cross-Reference Analysis
# Follow data flow:
# 1. Find interesting data (strings, constants)
# 2. Find references (where it's used)
# 3. Understand context of use
# 4. Trace back to source
# Example: Password validation
# "Invalid password" string
# -> Used in check_password()
# -> Called from login()
# -> Gets input from get_user_input()
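Tracing back to the source amounts to walking a call graph in reverse. The sketch below does that over a hand-written graph modeled on the password example. `CALL_GRAPH` and `chains_to` are hypothetical, and in practice the graph would be exported from a disassembler.

```python
# Hypothetical caller -> callees mapping for the password example.
CALL_GRAPH = {
    "main": ["login"],
    "login": ["get_user_input", "check_password"],
    "check_password": ["strcmp"],
}


def invert(graph):
    """Turn caller -> callees into callee -> callers."""
    callers = {}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


def chains_to(graph, target):
    """Return every call chain, outermost caller first, that reaches target."""
    callers = invert(graph)
    chains = []

    def walk(func, path):
        parents = [p for p in callers.get(func, []) if p not in path]
        if not parents:
            chains.append(list(reversed(path)))
            return
        for parent in parents:
            walk(parent, path + [parent])

    walk(target, [target])
    return chains


for chain in chains_to(CALL_GRAPH, "check_password"):
    print(" -> ".join(chain))
```

The `p not in path` check keeps recursive call cycles from looping forever. Chains are reported up to the first function with no known caller.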
Identifying Compiler Artifacts
; Stack canary check (GCC)
mov rax, fs:0x28
mov [rbp-0x8], rax
; ... function body ...
mov rdx, [rbp-0x8]
xor rdx, fs:0x28
je .no_corruption
call __stack_chk_fail
; C++ name mangling
_ZN6Class114memberFunctionEi ; Class1::memberFunction(int)
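In the Itanium C++ ABI scheme used by GCC and Clang, a nested name is `_ZN`, then each component prefixed by its decimal length, then `E`, then the encoded parameter types (`i` is `int`). The sketch below splits only that simple nested form. `split_nested_name` is a hypothetical helper and does not handle templates, qualifiers, or substitutions, for which `c++filt` is the practical tool.

```python
def split_nested_name(mangled: str):
    """Split a simple _ZN...E mangled name into its components."""
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos, parts = 3, []
    while pos < len(mangled) and mangled[pos] != "E":
        start = pos
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"expected a length prefix at index {start}")
        length = int(mangled[start:pos])
        name = mangled[pos:pos + length]
        if len(name) != length:
            raise ValueError("name is shorter than its length prefix")
        parts.append(name)
        pos += length
    if pos >= len(mangled):
        raise ValueError("missing terminating 'E'")
    return parts


print("::".join(split_nested_name("_ZN6Class114memberFunctionEi")))
```

Because each component carries its own length, `6Class1` followed by `14memberFunction` is unambiguous even though the digits run together. A symbol whose prefixes do not match its components fails to parse.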
Anti-Analysis Techniques
Detection:
- Debugger detection (ptrace, IsDebuggerPresent)
- Timing checks
- Code obfuscation
- Anti-disassembly tricks
Countermeasures:
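# Patch debugger checks
# In GDB: stop when ptrace is called, then return to the caller with a fake success value
(gdb) break ptrace
(gdb) run
(gdb) return (long)0
(gdb) continue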
# Use stealthy debuggers
# - ScyllaHide (plugin)
# - Custom tools
# Deobfuscation
# - Symbolic execution (angr)
# - Dynamic unpacking
# - Pattern matching
Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| Only static analysis | Miss runtime behavior | Combine static + dynamic |
| Not documenting findings | Lose context | Take detailed notes |
| Analyzing without goal | Waste time | Define specific objectives |
| Ignoring cross-references | Miss important connections | Follow all references |
| Not checking compiler version | Misinterpret artifacts | Identify compiler/flags used |
Integration with Other Skills
- skills/analysis/zero-day-hunting - Finding vulnerabilities in binaries
- skills/exploitation/exploit-dev-workflow - Exploiting discovered flaws
- skills/analysis/static-vuln-analysis - Source code analysis if available
Legal and Ethical Considerations
Authorization Required:
- Only analyze software you are authorized to analyze
- Respect license agreements
- Don't distribute cracked software
- Follow responsible disclosure
Legitimate Use Cases:
- Security research with permission
- Malware analysis for defense
- Own software assessment
- Educational purposes with legal samples
Success Metrics
- Understanding program functionality
- Identifying security vulnerabilities
- Extracting useful intelligence
- Creating working exploits (if authorized)
- Comprehensive documentation
References and Further Reading
- "Practical Reverse Engineering" by Dang et al.
- "The IDA Pro Book" by Chris Eagle
- "Practical Binary Analysis" by Dennis Andriesse
- Ghidra documentation and training materials
- Malware analysis books (for dynamic analysis techniques)
- Assembly language references (Intel manuals, x86-64 ABI)
- CTF write-ups for practical examples