| name | memory-debugging |
| description | This skill should be used when debugging memory-related issues in the Breenix kernel including page faults, double faults, frame allocation problems, page table issues, heap allocation failures, stack overflows, and virtual memory mapping errors. |
Memory Debugging for Breenix
Debug kernel memory issues including page faults, allocator problems, and page table errors.
Purpose
Memory bugs in kernel development are among the most difficult to debug. This skill provides systematic approaches for diagnosing page faults, double faults, allocator issues, and page table problems specific to Breenix.
When to Use
- Page faults: Accessing unmapped or incorrectly mapped memory
- Double faults: Stack issues or cascading exceptions
- Frame allocation failures: Out of memory or allocator bugs
- Page table problems: Wrong mappings, missing entries, incorrect flags
- Heap allocation issues: OOM, corruption, leaks
- Stack overflows: Exceeding stack size, missing guard pages
- Virtual address conflicts: Processes mapping the same address through shared page table levels
Memory Subsystems in Breenix
1. Physical Memory (Frame Allocator)
Location: kernel/src/memory/frame_allocator.rs
What it does: Manages physical memory in 4 KiB frames
Common issues:
- Running out of frames
- Double allocation of same frame
- Frames not freed properly
- Initialization failures
Debug approach:
// Add logging to allocation/deallocation
log::debug!("Allocating frame at {:?}", frame);
log::debug!("Frame allocator: {} frames used", count);
// Check allocator state
log::info!("Physical memory: {} MiB usable", memory_mb);
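Double allocation and frames that are never freed are easier to catch with explicit bookkeeping than by reading logs. The following is a minimal host-side sketch, not Breenix code: `FrameTracker`, `TrackError`, and the `MAX_FRAMES` limit are hypothetical names chosen for illustration, and the real allocator in frame_allocator.rs may be structured differently.

```rust
/// Size of one physical frame in bytes.
pub const FRAME_SIZE: u64 = 4096;
/// Hypothetical limit: 1024 frames covers 4 MiB. Size this to the real memory map.
const MAX_FRAMES: usize = 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum TrackError {
    OutOfRange,
    DoubleAllocation,
    DoubleFree,
}

/// One bit per frame: set while the frame is handed out.
pub struct FrameTracker {
    bitmap: [u64; MAX_FRAMES / 64],
    in_use: usize,
}

impl FrameTracker {
    pub const fn new() -> Self {
        FrameTracker { bitmap: [0; MAX_FRAMES / 64], in_use: 0 }
    }

    /// Converts a physical address into (bitmap word, bit mask).
    fn locate(phys_addr: u64) -> Result<(usize, u64), TrackError> {
        let index = (phys_addr / FRAME_SIZE) as usize;
        if index >= MAX_FRAMES {
            return Err(TrackError::OutOfRange);
        }
        Ok((index / 64, 1u64 << (index % 64)))
    }

    /// Call right after the allocator hands out a frame.
    pub fn record_alloc(&mut self, phys_addr: u64) -> Result<(), TrackError> {
        let (word, mask) = Self::locate(phys_addr)?;
        if self.bitmap[word] & mask != 0 {
            return Err(TrackError::DoubleAllocation);
        }
        self.bitmap[word] |= mask;
        self.in_use += 1;
        Ok(())
    }

    /// Call right before a frame is returned to the allocator.
    pub fn record_free(&mut self, phys_addr: u64) -> Result<(), TrackError> {
        let (word, mask) = Self::locate(phys_addr)?;
        if self.bitmap[word] & mask == 0 {
            return Err(TrackError::DoubleFree);
        }
        self.bitmap[word] &= !mask;
        self.in_use -= 1;
        Ok(())
    }

    /// Frames currently outstanding; a value that only grows suggests a leak.
    pub fn frames_in_use(&self) -> usize {
        self.in_use
    }
}
```

The tracker uses a fixed-size array so it needs no heap, which matters because the frame allocator typically runs before the heap exists.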
2. Virtual Memory (Page Tables)
Location: kernel/src/memory/ (process_memory.rs, kernel_page_table.rs)
What it does: Maps virtual addresses to physical frames
Common issues:
- Missing page table entries
- Wrong flags (PRESENT, WRITABLE, USER_ACCESSIBLE)
- Shared page table entries causing conflicts
- Kernel mappings not copied to process page tables
Debug approach:
// Log page table operations
log::debug!("Mapping page {:?} to frame {:?} with flags {:?}",
page, frame, flags);
// Verify mappings
let result = page_table.translate_addr(addr);
log::debug!("Address {:?} translates to {:?}", addr, result);
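When a mapping exists but the access still faults, the flags are the usual suspect. The sketch below checks a leaf entry's flag bits against an intended access. The three bit positions are the architectural x86_64 values; `AccessProblem` and `check_access` are hypothetical helpers, and the model is deliberately simplified (see the comments).

```rust
/// x86_64 page table entry flag bits (architectural values).
pub const PRESENT: u64 = 1 << 0;
pub const WRITABLE: u64 = 1 << 1;
pub const USER_ACCESSIBLE: u64 = 1 << 2;

#[derive(Debug, PartialEq, Eq)]
pub enum AccessProblem {
    NotPresent,
    NotWritable,
    NotUserAccessible,
}

/// Returns the first reason an access would fault on an entry carrying
/// `flags`, or `None` if these three flags permit it.
///
/// Simplifications: ignores NO_EXECUTE, SMEP/SMAP, and CR0.WP (kernel writes
/// to read-only pages only fault when CR0.WP is set). On real hardware
/// WRITABLE and USER_ACCESSIBLE must also be set at every level of the walk,
/// so run this check on each level's entry, not only the leaf.
pub fn check_access(flags: u64, write: bool, user: bool) -> Option<AccessProblem> {
    if flags & PRESENT == 0 {
        return Some(AccessProblem::NotPresent);
    }
    if user && flags & USER_ACCESSIBLE == 0 {
        return Some(AccessProblem::NotUserAccessible);
    }
    if write && flags & WRITABLE == 0 {
        return Some(AccessProblem::NotWritable);
    }
    None
}
```

With the x86_64 crate, the raw bits of a `PageTableFlags` value are available through `bits()`, so the same check can be applied to entries read from a live table.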
3. Heap Allocator
Location: Uses Rust's #[global_allocator]
Size: 1024 KiB
Common issues:
- Out of heap memory
- Heap corruption
- Allocations during early boot (before heap init)
Debug approach:
// Check heap size
log::info!("Heap: 1024 KiB");
// Log allocations if needed
// (Note: the allocator's own functions must not allocate, or they recurse into themselves!)
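One way to observe heap usage without allocating inside the allocator is to wrap it and keep counters in atomics. This is a sketch under assumptions: `CountingAlloc` is a hypothetical wrapper, and it is shown against the standard library's `System` allocator so it can run on a host. In the kernel the inner allocator would be whatever Breenix registers with #[global_allocator], and the imports would come from `core` and `alloc`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps an inner allocator and tracks usage with atomics only,
/// so the bookkeeping never allocates or logs from inside `alloc`.
pub struct CountingAlloc<A> {
    inner: A,
    live_bytes: AtomicUsize,
    peak_bytes: AtomicUsize,
    failures: AtomicUsize,
}

impl<A> CountingAlloc<A> {
    pub const fn new(inner: A) -> Self {
        CountingAlloc {
            inner,
            live_bytes: AtomicUsize::new(0),
            peak_bytes: AtomicUsize::new(0),
            failures: AtomicUsize::new(0),
        }
    }

    /// Bytes currently allocated and not yet freed.
    pub fn live_bytes(&self) -> usize {
        self.live_bytes.load(Ordering::Relaxed)
    }

    /// Highest value `live_bytes` has reached.
    pub fn peak_bytes(&self) -> usize {
        self.peak_bytes.load(Ordering::Relaxed)
    }

    /// Number of allocation requests the inner allocator refused.
    pub fn failures(&self) -> usize {
        self.failures.load(Ordering::Relaxed)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for CountingAlloc<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(layout) };
        if ptr.is_null() {
            self.failures.fetch_add(1, Ordering::Relaxed);
        } else {
            let live = self.live_bytes.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            self.peak_bytes.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.live_bytes.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}
```

Reading `peak_bytes()` from normal kernel code (outside the allocator) shows how close the workload comes to the 1024 KiB limit, and a steadily rising `live_bytes()` points at a leak.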
4. Kernel Stacks
Location: kernel/src/memory/kernel_stack.rs
Layout: 8 KiB stacks with 4 KiB guard pages, starting at 0xffffc900_0000_0000
Common issues:
- Stack overflow into guard page
- Kernel stack not mapped in process page table
- IST stack issues for double faults
Debug approach:
// Log stack allocation
log::debug!("Allocated kernel stack {} at {:?}", id, addr);
// Check stack bounds
log::debug!("Stack bottom: {:?}, top: {:?}", bottom, top);
// Verify stack is mapped
log::debug!("Stack bottom maps to {:?}", page_table.translate_addr(bottom));
Common Memory Errors
Error 1: Page Fault
Symptoms:
PAGE FAULT at 0x... Error Code: 0x...
Error Code Decoding:
Bit 0 (P): 0 = Page not present
1 = Protection violation
Bit 1 (W): 0 = Read access
1 = Write access
Bit 2 (U): 0 = Kernel mode
1 = User mode
Bit 3 (R): 1 = Reserved bit set
Bit 4 (I): 1 = Instruction fetch
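The bit layout above can be turned into a small decoder so the fault handler prints a readable cause instead of a raw hex value. This is a sketch: `PageFaultInfo` and `decode_page_fault` are hypothetical names, and only bits 0 to 4 are interpreted. It uses `core::fmt` so the `Display` impl also works in a no_std kernel.

```rust
use core::fmt;

/// Decoded view of bits 0-4 of an x86_64 page fault error code.
#[derive(Debug, PartialEq, Eq)]
pub struct PageFaultInfo {
    pub protection_violation: bool, // bit 0 (P)
    pub write: bool,                // bit 1 (W)
    pub user_mode: bool,            // bit 2 (U)
    pub reserved_bit: bool,         // bit 3 (R)
    pub instruction_fetch: bool,    // bit 4 (I)
}

pub fn decode_page_fault(error_code: u64) -> PageFaultInfo {
    PageFaultInfo {
        protection_violation: error_code & (1 << 0) != 0,
        write: error_code & (1 << 1) != 0,
        user_mode: error_code & (1 << 2) != 0,
        reserved_bit: error_code & (1 << 3) != 0,
        instruction_fetch: error_code & (1 << 4) != 0,
    }
}

impl fmt::Display for PageFaultInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let access = if self.instruction_fetch {
            "instruction fetch from"
        } else if self.write {
            "write to"
        } else {
            "read from"
        };
        let cause = if self.protection_violation {
            "protected page"
        } else {
            "non-present page"
        };
        let mode = if self.user_mode { "user" } else { "kernel" };
        write!(f, "{} {} ({} mode)", access, cause, mode)?;
        if self.reserved_bit {
            write!(f, ", reserved bit set")?;
        }
        Ok(())
    }
}
```

In a handler this could be logged as `log::error!("PAGE FAULT: {}", decode_page_fault(code))`, where `code` is the raw error code value.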
Common Causes:
1. Accessing unmapped memory
// Problem: Address not mapped in page table
let ptr = 0x12345000 as *const u64;
unsafe { *ptr } // PAGE FAULT - not mapped
Diagnosis:
- Check if address should be mapped
- Verify page table has entry for this address
- Confirm physical frame was allocated
Fix: Map the page before accessing
2. Writing to read-only page
// Problem: Page mapped without WRITABLE flag
let ptr = read_only_page as *mut u64;
unsafe { *ptr = 42; } // PAGE FAULT - write to read-only
Diagnosis:
- Check page table flags
- Verify WRITABLE flag is set
- Confirm not writing to kernel code/data
Fix: Add WRITABLE flag or don't write to read-only pages
3. User accessing kernel page
// Problem: Userspace trying to access kernel memory
// (from Ring 3)
let ptr = 0xFFFF_8000_0000_0000 as *const u64; // Kernel address
unsafe { *ptr } // PAGE FAULT - user accessing kernel
Diagnosis:
- Check if address is in kernel space (upper half)
- Verify page doesn't have USER_ACCESSIBLE flag
- Confirm userspace should not access this
Fix: Keep USER_ACCESSIBLE off kernel pages and validate user-supplied pointers in syscalls; a fault here means the protection is working
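Syscall handlers that receive pointers from userspace can reject kernel addresses before dereferencing them. The sketch below checks only the address range, using the user-space boundary listed in this document; `is_user_range` and `USER_SPACE_END` are hypothetical names, and a real check would also confirm that the pages are mapped with the right flags.

```rust
/// Highest user-space address, per the memory regions listed in this document.
pub const USER_SPACE_END: u64 = 0x0000_7FFF_FFFF_FFFF;

/// Returns true when the byte range [addr, addr + len) lies entirely in
/// user space. Overflowing ranges are rejected rather than wrapped.
pub fn is_user_range(addr: u64, len: u64) -> bool {
    if len == 0 {
        return addr <= USER_SPACE_END;
    }
    match addr.checked_add(len - 1) {
        Some(last_byte) => last_byte <= USER_SPACE_END,
        None => false,
    }
}
```

Using `checked_add` matters: a buffer whose start is valid but whose length wraps past the end of the address space would otherwise pass a naive `addr + len` comparison.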
4. Accessing kernel stack not mapped in process page table
// Problem: Kernel stack mapped in kernel PT but not process PT
// Ring 3 -> Ring 0 transition tries to use unmapped kernel stack
// This was the DIRECT_EXECUTION_FIX issue!
Diagnosis:
- Check if kernel stack is mapped in process page table
- Verify TSS RSP0 points to valid kernel stack
- Look for double fault during syscalls (int 0x80)
Fix: Copy kernel stack mappings to process page table
Error 2: Double Fault
Symptoms:
DOUBLE FAULT - Error Code: 0x...
Instruction Pointer: 0x...
Stack Pointer: 0x...
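What it means: A second exception occurred while the CPU was trying to deliver an earlier one (for example, a page fault while pushing the exception frame onto an unmapped stack)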
Common Causes:
1. Kernel stack not mapped during exception
Sequence:
1. Exception occurs (page fault, etc.)
2. CPU tries to switch to kernel stack
3. Kernel stack not mapped in current page table
4. Page fault accessing kernel stack
5. DOUBLE FAULT
Diagnosis:
- Check which exception triggered the double fault
- Verify kernel stack is mapped
- Check TSS RSP0 value
- Look at instruction pointer (where was CPU when it faulted?)
Fix: Ensure kernel stack mapped in all page tables
2. Stack overflow
Sequence:
1. Recursive function or large stack allocation
2. Stack exceeds allocated size
3. Writes into guard page
4. Page fault (guard page not mapped)
5. CPU tries to push the page fault frame onto the same exhausted stack and faults again
6. DOUBLE FAULT
Diagnosis:
- Check stack pointer value
- Compare against stack bounds
- Look for recursive calls
- Check for large stack allocations
Fix: Increase stack size or fix code causing overflow
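Comparing the stack pointer against the stack bounds can be automated in the double fault handler. The sketch below assumes the guard region sits directly below the stack's lowest address, which matches a downward-growing x86_64 stack but is an assumption about Breenix's layout; `StackStatus` and `classify_rsp` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StackStatus {
    /// RSP is inside the stack; `bytes_free` is the room left before the guard.
    InBounds { bytes_free: u64 },
    /// RSP has run into the guard region: a stack overflow.
    InGuardPage,
    /// RSP does not belong to this stack at all (wrong stack or corrupted RSP).
    OutOfRange,
}

/// Classifies `rsp` against a stack occupying [bottom, top) with a guard
/// region of `guard_size` bytes directly below `bottom`.
pub fn classify_rsp(rsp: u64, bottom: u64, top: u64, guard_size: u64) -> StackStatus {
    if rsp >= bottom && rsp <= top {
        // rsp == top is valid: it is the value of an empty stack.
        StackStatus::InBounds { bytes_free: rsp - bottom }
    } else if rsp < bottom && rsp >= bottom.saturating_sub(guard_size) {
        StackStatus::InGuardPage
    } else {
        StackStatus::OutOfRange
    }
}
```

A result of `InGuardPage` points at recursion or a large stack allocation, while `OutOfRange` suggests the wrong stack was loaded, for example a bad TSS RSP0 value.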
3. Exception handler itself faults
Sequence:
1. Exception occurs
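2. Handler tries to access unmapped memory
3. Nested page fault is raised inside the handler
4. The nested fault cannot be delivered, or it recurses until the stack is exhausted: DOUBLE FAULT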
Diagnosis:
- Review exception handler code
- Check what handler was executing
- Verify handler doesn't access invalid addresses
Fix: Fix bug in exception handler
Error 3: Page Already Mapped
Symptoms:
Error: Attempted to map already-mapped page
Common Causes:
1. Shared page table levels
// Problem: Multiple processes share L3 table
// Second process tries to map page in shared table
// This was the PAGE_TABLE_FIX issue!
Diagnosis:
- Check if page table levels are shared between processes
- Verify each process has independent L3/L2/L1 tables
- Look at PML4 entry copying code
Fix: Deep copy page table levels, don't share
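Sharing can be detected by comparing the top-level tables of two processes entry by entry. The sketch below works on raw 64-bit entries copied out of each PML4; the present bit and the address mask are architectural, while `shared_user_entries` is a hypothetical helper. It treats indices 0 to 255 as the user half, consistent with the memory regions listed in this document, and ignores the kernel half, where shared entries are expected.

```rust
/// Bit 0 of a page table entry: the entry is present.
const ENTRY_PRESENT: u64 = 1;
/// Bits 12-51 of an entry hold the physical address of the next-level table.
const ENTRY_ADDR_MASK: u64 = 0x000F_FFFF_FFFF_F000;

/// Returns the user-half PML4 indices (0..256) at which both tables are
/// present and point at the same L3 table frame.
pub fn shared_user_entries(a: &[u64; 512], b: &[u64; 512]) -> Vec<usize> {
    (0..256)
        .filter(|&i| {
            let (ea, eb) = (a[i], b[i]);
            ea & ENTRY_PRESENT != 0
                && eb & ENTRY_PRESENT != 0
                && (ea & ENTRY_ADDR_MASK) == (eb & ENTRY_ADDR_MASK)
        })
        .collect()
}
```

An empty result for two unrelated processes is the expected outcome after a deep copy. Any index that shows up means a mapping made by one process is visible to, and can collide with, the other.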
2. Mapping same address twice
// Problem: Code tries to map a page that's already mapped
page_table.map_to(page, frame, flags, allocator)?;
page_table.map_to(page, frame, flags, allocator)?; // Error!
Diagnosis:
- Check if page is already mapped before mapping
- Look for duplicate mapping calls
- Verify cleanup properly unmaps pages
Fix: Check before mapping or unmap first
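The "check before mapping" fix can be made explicit with an idempotent helper that distinguishes a harmless repeat from a genuine conflict. The following is a toy model, not the x86_64 crate's mapper: `ToyAddressSpace`, `ensure_mapped`, and `MapError` are hypothetical, and pages and frames are plain numbers.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum MapError {
    AlreadyMapped { existing_frame: u64 },
}

/// Toy address space: page number -> frame number.
pub struct ToyAddressSpace {
    mappings: BTreeMap<u64, u64>,
}

impl ToyAddressSpace {
    pub fn new() -> Self {
        ToyAddressSpace { mappings: BTreeMap::new() }
    }

    pub fn translate(&self, page: u64) -> Option<u64> {
        self.mappings.get(&page).copied()
    }

    /// Strict mapping: fails if the page is mapped at all.
    pub fn map_to(&mut self, page: u64, frame: u64) -> Result<(), MapError> {
        match self.translate(page) {
            Some(existing_frame) => Err(MapError::AlreadyMapped { existing_frame }),
            None => {
                self.mappings.insert(page, frame);
                Ok(())
            }
        }
    }

    /// Idempotent mapping: Ok(true) if a new mapping was created, Ok(false)
    /// if the identical mapping already existed, Err on a real conflict.
    pub fn ensure_mapped(&mut self, page: u64, frame: u64) -> Result<bool, MapError> {
        match self.translate(page) {
            Some(existing) if existing == frame => Ok(false),
            Some(existing_frame) => Err(MapError::AlreadyMapped { existing_frame }),
            None => self.map_to(page, frame).map(|_| true),
        }
    }

    /// Removes a mapping and returns the frame it pointed at, if any.
    pub fn unmap(&mut self, page: u64) -> Option<u64> {
        self.mappings.remove(&page)
    }
}
```

Reporting the existing frame in the error is the useful part for debugging: it shows whether the duplicate call is a harmless repeat or two code paths fighting over the same page.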
Error 4: Out of Memory
Symptoms:
Error: Frame allocator out of memory
Common Causes:
1. Too many allocations
// Problem: Allocating too many frames
loop {
allocator.allocate_frame(); // Eventually runs out
}
Diagnosis:
- Log total memory available
- Count allocations vs deallocations
- Check for memory leaks
Fix: Free frames when done, or increase memory
2. Memory leaks
// Problem: Frames allocated but never freed
let frame = allocator.allocate_frame()?;
// ... use frame ...
// Forget to deallocate - LEAK!
Diagnosis:
- Track allocation/deallocation counts
- Look for allocations without corresponding frees
- Use systematic allocation patterns
Fix: Properly free all allocated frames
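One systematic allocation pattern in Rust is to tie each frame to a guard value that returns it on `Drop`, so early returns and `?` cannot skip the free. This is a host-side sketch with hypothetical types (`FramePool`, `FrameGuard`); it uses `std::cell::RefCell` and `Vec` for brevity, where kernel code would need a lock and a heap-free container.

```rust
use std::cell::RefCell;

/// Toy frame pool standing in for the real frame allocator.
pub struct FramePool {
    free: RefCell<Vec<u64>>,
}

impl FramePool {
    pub fn new(frames: &[u64]) -> Self {
        FramePool { free: RefCell::new(frames.to_vec()) }
    }

    /// Hands out a frame wrapped in a guard, or None when the pool is empty.
    pub fn allocate(&self) -> Option<FrameGuard<'_>> {
        let addr = self.free.borrow_mut().pop()?;
        Some(FrameGuard { pool: self, addr })
    }

    /// Number of frames currently available.
    pub fn free_count(&self) -> usize {
        self.free.borrow().len()
    }
}

/// Owns one frame. The frame goes back to the pool when the guard is
/// dropped, so no explicit deallocation call can be forgotten.
pub struct FrameGuard<'a> {
    pool: &'a FramePool,
    addr: u64,
}

impl FrameGuard<'_> {
    pub fn start_address(&self) -> u64 {
        self.addr
    }
}

impl Drop for FrameGuard<'_> {
    fn drop(&mut self) {
        self.pool.free.borrow_mut().push(self.addr);
    }
}
```

Frames that must outlive the current scope, such as those backing a live page mapping, would need an explicit way to release ownership from the guard. That part is left out of this sketch.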
Debugging Techniques
Technique 1: Add Checkpoint Logging
Add logging at critical memory operations:
log::debug!("CHECKPOINT: Before page table operation");
page_table.map_to(page, frame, flags, allocator)?;
log::debug!("CHECKPOINT: After page table operation");
log::debug!("CHECKPOINT: Before memory access");
let _value = unsafe { core::ptr::read_volatile(addr as *const u64) }; // volatile so the read is not optimized away
log::debug!("CHECKPOINT: After memory access");
If crash happens between checkpoints, you know where to focus.
Technique 2: Verify Assumptions
Check assumptions about memory state:
// Verify address is mapped before accessing
match page_table.translate_addr(addr) {
Some(phys) => log::debug!("Address {:?} mapped to {:?}", addr, phys),
None => log::warn!("Address {:?} NOT MAPPED", addr),
}
// Verify frame was allocated
log::debug!("Allocated frame: {:?}", frame);
assert!(frame.start_address().as_u64() > 0);
// Verify flags are correct
log::debug!("Page mapped with flags: {:?}", flags);
assert!(flags.contains(PageTableFlags::PRESENT));
Technique 3: Dump Page Table State
Add functions to dump page table state:
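// Assumes the x86_64 crate: in recent versions `translate_addr` comes from the
// `Translate` trait (implemented by `OffsetPageTable`), not the raw `PageTable` struct.
pub fn dump_page_table_entry(page_table: &impl Translate, addr: VirtAddr) {
let result = page_table.translate_addr(addr);
log::debug!("Address: {:?}", addr);
log::debug!(" Translation: {:?}", result);
// Walk page table levels
// Log each level's entry
}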
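Walking the levels starts with knowing which entry to inspect at each one. The sketch below splits a virtual address into its four table indices and page offset, assuming standard 4-level x86_64 paging with 4 KiB pages; `PageTableIndices` and `split_virt_addr` are hypothetical names.

```rust
/// Table indices and page offset for a 4-level x86_64 virtual address.
#[derive(Debug, PartialEq, Eq)]
pub struct PageTableIndices {
    pub p4: u16,     // PML4 index, address bits 39-47
    pub p3: u16,     // L3 index, address bits 30-38
    pub p2: u16,     // L2 index, address bits 21-29
    pub p1: u16,     // L1 index, address bits 12-20
    pub offset: u16, // byte offset within the 4 KiB page, bits 0-11
}

/// Splits a virtual address into the index used at each paging level.
pub fn split_virt_addr(addr: u64) -> PageTableIndices {
    PageTableIndices {
        p4: ((addr >> 39) & 0x1FF) as u16,
        p3: ((addr >> 30) & 0x1FF) as u16,
        p2: ((addr >> 21) & 0x1FF) as u16,
        p1: ((addr >> 12) & 0x1FF) as u16,
        offset: (addr & 0xFFF) as u16,
    }
}
```

Logging these indices next to a faulting address shows which entry to check at each level. For example, the kernel stack base 0xFFFF_C900_0000_0000 falls under PML4 index 402, so that is the entry that must be present in every process page table.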
Technique 4: Use kernel-debug-loop
Fast iteration for memory issues:
# Test fix quickly
kernel-debug-loop/scripts/quick_debug.py \
--signal "MEMORY OPERATION COMPLETE" \
--timeout 10
Technique 5: Compare Working vs Broken State
If something used to work:
# Run working version
git checkout working_commit
kernel-debug-loop/scripts/quick_debug.py ... > working.log
# Run broken version
git checkout broken_commit
kernel-debug-loop/scripts/quick_debug.py ... > broken.log
# Compare
diff -u working.log broken.log
Memory Issue Patterns
Pattern: Syscall Page Fault
Scenario: Page fault when userspace calls int 0x80
Diagnosis:
- Kernel stack not mapped in process page table
- Ring 3 → Ring 0 transition fails
Fix: copy_kernel_stack_to_process()
Reference: DIRECT_EXECUTION_FIX.md
Pattern: Context Switch Double Fault
Scenario: Double fault during context switch between processes
Diagnosis:
- Wrong page table activated
- Current stack not mapped in new page table
Fix: Ensure kernel stacks globally mapped
Reference: Global kernel page table architecture
Pattern: Process Creation "Already Mapped"
Scenario: Second process creation fails with "page already mapped"
Diagnosis:
- Processes sharing page table levels
- First process's mappings conflict with second
Fix: Deep copy page table levels
Reference: PAGE_TABLE_FIX.md
Pattern: Heap Allocation Panic
Scenario: Panic during heap allocation
Diagnosis:
- Out of heap memory
- Heap not initialized yet
- Heap corruption
Fix: Increase heap size, defer allocation, or fix corruption
Integration with Other Skills
With kernel-debug-loop
# Fast iteration on memory fixes
kernel-debug-loop/scripts/quick_debug.py \
--signal "MEMORY TEST COMPLETE" \
--timeout 15
With systematic-debugging
Document complex memory bugs:
# Problem
Page fault at 0x10001082 during sys_write
# Root Cause
User buffer not mapped in process page table
# Solution
Verify user addresses before access
# Evidence
Before: PAGE FAULT
After: Successful syscall
With log-analysis
# Find memory-related errors
echo '"PAGE FAULT"' > /tmp/log-query.txt
./scripts/find-in-logs
# Find allocation patterns
echo '"Allocated\|Deallocated"' > /tmp/log-query.txt
./scripts/find-in-logs
Best Practices
- Log extensively: Memory operations should be well-logged
- Verify before access: Check addresses are mapped
- Use checkpoints: Narrow down failure location
- Test incrementally: Add small changes, test frequently
- Understand architecture: Know how page tables work
- Reference working code: Look at similar working code
- Document patterns: Save solutions for future reference
Quick Reference
Page Fault Error Codes
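0x0: Read from non-present page (kernel mode)
0x1: Read protection violation (kernel mode)
0x2: Write to non-present page (kernel mode)
0x3: Write protection violation, e.g. write to read-only page (kernel mode)
0x4: Read from non-present page (user mode)
0x5: Read protection violation, e.g. user access to kernel page (user mode)
0x6: Write to non-present page (user mode)
0x7: Write protection violation (user mode)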
Memory Regions
0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF: User space
0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF: Kernel space
0xFFFF_C900_0000_0000: Kernel stacks
Key Files
kernel/src/memory/frame_allocator.rs - Physical memory
kernel/src/memory/process_memory.rs - Process page tables
kernel/src/memory/kernel_page_table.rs - Kernel mappings
kernel/src/memory/kernel_stack.rs - Kernel stack allocator
Summary
Memory debugging requires:
- Understanding memory subsystems (frame allocator, page tables, heap, stacks)
- Systematic diagnosis of page faults and double faults
- Checkpoint logging to isolate failures
- Verification of page table state and mappings
- Reference to past fixes (DIRECT_EXECUTION_FIX, PAGE_TABLE_FIX)
- Integration with fast iteration tools
Memory bugs are complex, but systematic debugging narrows them down step by step.