Claude Code Plugins

Community-maintained marketplace

Feedback

memory-debugging

@ryanbreen/breenix
4
0

This skill should be used when debugging memory-related issues in the Breenix kernel including page faults, double faults, frame allocation problems, page table issues, heap allocation failures, stack overflows, and virtual memory mapping errors.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name memory-debugging
description This skill should be used when debugging memory-related issues in the Breenix kernel including page faults, double faults, frame allocation problems, page table issues, heap allocation failures, stack overflows, and virtual memory mapping errors.

Memory Debugging for Breenix

Debug kernel memory issues including page faults, allocator problems, and page table errors.

Purpose

Memory bugs in kernel development are among the most difficult to debug. This skill provides systematic approaches for diagnosing page faults, double faults, allocator issues, and page table problems specific to Breenix.

When to Use

  • Page faults: Accessing unmapped or incorrectly mapped memory
  • Double faults: Stack issues or cascading exceptions
  • Frame allocation failures: Out of memory or allocator bugs
  • Page table problems: Wrong mappings, missing entries, incorrect flags
  • Heap allocation issues: OOM, corruption, leaks
  • Stack overflows: Exceeding stack size, missing guard pages
  • Virtual address conflicts: Multiple processes mapping same address

Memory Subsystems in Breenix

1. Physical Memory (Frame Allocator)

Location: kernel/src/memory/frame_allocator.rs

What it does: Manages physical memory frames (4KB pages)

Common issues:

  • Running out of frames
  • Double allocation of same frame
  • Frames not freed properly
  • Initialization failures

Debug approach:

// Add logging to allocation/deallocation
log::debug!("Allocating frame at {:?}", frame);
log::debug!("Frame allocator: {} frames used", count);

// Check allocator state
log::info!("Physical memory: {} MiB usable", memory_mb);

2. Virtual Memory (Page Tables)

Location: kernel/src/memory/ (process_memory.rs, kernel_page_table.rs)

What it does: Maps virtual addresses to physical frames

Common issues:

  • Missing page table entries
  • Wrong flags (PRESENT, WRITABLE, USER_ACCESSIBLE)
  • Shared page table entries causing conflicts
  • Kernel mappings not copied to process page tables

Debug approach:

// Log page table operations
log::debug!("Mapping page {:?} to frame {:?} with flags {:?}",
    page, frame, flags);

// Verify mappings
let result = page_table.translate_addr(addr);
log::debug!("Address {:?} translates to {:?}", addr, result);

3. Heap Allocator

Location: Uses Rust's #[global_allocator]

Size: 1024 KiB

Common issues:

  • Out of heap memory
  • Heap corruption
  • Allocations during early boot (before heap init)

Debug approach:

// Check heap size
log::info!("Heap: 1024 KiB");

// Log allocations if needed
// (Note: Can't use allocations in alloc functions!)

4. Kernel Stacks

Location: kernel/src/memory/kernel_stack.rs

Layout: 8KB stacks with 4KB guard pages at 0xffffc900_0000_0000

Common issues:

  • Stack overflow into guard page
  • Kernel stack not mapped in process page table
  • IST stack issues for double faults

Debug approach:

// Log stack allocation
log::debug!("Allocated kernel stack {} at {:?}", id, addr);

// Check stack bounds
log::debug!("Stack bottom: {:?}, top: {:?}", bottom, top);

// Verify stack is mapped

Common Memory Errors

Error 1: Page Fault

Symptoms:

PAGE FAULT at 0x... Error Code: 0x...

Error Code Decoding:

Bit 0 (P): 0 = Page not present
           1 = Protection violation
Bit 1 (W): 0 = Read access
           1 = Write access
Bit 2 (U): 0 = Kernel mode
           1 = User mode
Bit 3 (R): 1 = Reserved bit set
Bit 4 (I): 1 = Instruction fetch

Common Causes:

1. Accessing unmapped memory

// Problem: Address not mapped in page table
let ptr = 0x12345000 as *const u64;
unsafe { *ptr } // PAGE FAULT - not mapped

Diagnosis:

  • Check if address should be mapped
  • Verify page table has entry for this address
  • Confirm physical frame was allocated

Fix: Map the page before accessing

2. Writing to read-only page

// Problem: Page mapped without WRITABLE flag
let ptr = read_only_page as *mut u64;
unsafe { *ptr = 42; } // PAGE FAULT - write to read-only

Diagnosis:

  • Check page table flags
  • Verify WRITABLE flag is set
  • Confirm not writing to kernel code/data

Fix: Add WRITABLE flag or don't write to read-only pages

3. User accessing kernel page

// Problem: Userspace trying to access kernel memory
// (from Ring 3)
let ptr = 0xFFFF_8000_0000_0000 as *const u64; // Kernel address
unsafe { *ptr } // PAGE FAULT - user accessing kernel

Diagnosis:

  • Check if address is in kernel space (upper half)
  • Verify page doesn't have USER_ACCESSIBLE flag
  • Confirm userspace should not access this

Fix: Don't allow userspace to access kernel memory

4. Accessing kernel stack not mapped in process page table

// Problem: Kernel stack mapped in kernel PT but not process PT
// Ring 3 -> Ring 0 transition tries to use unmapped kernel stack
// This was the DIRECT_EXECUTION_FIX issue!

Diagnosis:

  • Check if kernel stack is mapped in process page table
  • Verify TSS RSP0 points to valid kernel stack
  • Look for double fault during syscalls (int 0x80)

Fix: Copy kernel stack mappings to process page table

Error 2: Double Fault

Symptoms:

DOUBLE FAULT - Error Code: 0x...
Instruction Pointer: 0x...
Stack Pointer: 0x...

What it means: Exception occurred while handling another exception

Common Causes:

1. Kernel stack not mapped during exception

Sequence:
1. Exception occurs (page fault, etc.)
2. CPU tries to switch to kernel stack
3. Kernel stack not mapped in current page table
4. Page fault accessing kernel stack
5. DOUBLE FAULT

Diagnosis:

  • Check which exception triggered the double fault
  • Verify kernel stack is mapped
  • Check TSS RSP0 value
  • Look at instruction pointer (where was CPU when it faulted?)

Fix: Ensure kernel stack mapped in all page tables

2. Stack overflow

Sequence:
1. Recursive function or large stack allocation
2. Stack exceeds allocated size
3. Writes into guard page
4. Page fault (guard page not mapped)
5. Page fault handler needs stack
6. DOUBLE FAULT

Diagnosis:

  • Check stack pointer value
  • Compare against stack bounds
  • Look for recursive calls
  • Check for large stack allocations

Fix: Increase stack size or fix code causing overflow

3. Exception handler itself faults

Sequence:
1. Exception occurs
2. Handler tries to access unmapped memory
3. Page fault inside handler
4. DOUBLE FAULT

Diagnosis:

  • Review exception handler code
  • Check what handler was executing
  • Verify handler doesn't access invalid addresses

Fix: Fix bug in exception handler

Error 3: Page Already Mapped

Symptoms:

Error: Attempted to map already-mapped page

Common Causes:

1. Shared page table levels

// Problem: Multiple processes share L3 table
// Second process tries to map page in shared table
// This was the PAGE_TABLE_FIX issue!

Diagnosis:

  • Check if page table levels are shared between processes
  • Verify each process has independent L3/L2/L1 tables
  • Look at PML4 entry copying code

Fix: Deep copy page table levels, don't share

2. Mapping same address twice

// Problem: Code tries to map a page that's already mapped
page_table.map_to(page, frame, flags, allocator)?;
page_table.map_to(page, frame, flags, allocator)?; // Error!

Diagnosis:

  • Check if page is already mapped before mapping
  • Look for duplicate mapping calls
  • Verify cleanup properly unmaps pages

Fix: Check before mapping or unmap first

Error 4: Out of Memory

Symptoms:

Error: Frame allocator out of memory

Common Causes:

1. Too many allocations

// Problem: Allocating too many frames
loop {
    allocator.allocate_frame(); // Eventually runs out
}

Diagnosis:

  • Log total memory available
  • Count allocations vs deallocations
  • Check for memory leaks

Fix: Free frames when done, or increase memory

2. Memory leaks

// Problem: Frames allocated but never freed
let frame = allocator.allocate_frame()?;
// ... use frame ...
// Forget to deallocate - LEAK!

Diagnosis:

  • Track allocation/deallocation counts
  • Look for allocations without corresponding frees
  • Use systematic allocation patterns

Fix: Properly free all allocated frames

Debugging Techniques

Technique 1: Add Checkpoint Logging

Add logging at critical memory operations:

log::debug!("CHECKPOINT: Before page table operation");
page_table.map_to(page, frame, flags, allocator)?;
log::debug!("CHECKPOINT: After page table operation");

log::debug!("CHECKPOINT: Before memory access");
unsafe { *(addr as *const u64) };
log::debug!("CHECKPOINT: After memory access");

If crash happens between checkpoints, you know where to focus.

Technique 2: Verify Assumptions

Check assumptions about memory state:

// Verify address is mapped before accessing
match page_table.translate_addr(addr) {
    Some(phys) => log::debug!("Address {:?} mapped to {:?}", addr, phys),
    None => log::warn!("Address {:?} NOT MAPPED", addr),
}

// Verify frame was allocated
log::debug!("Allocated frame: {:?}", frame);
assert!(frame.start_address().as_u64() > 0);

// Verify flags are correct
log::debug!("Page mapped with flags: {:?}", flags);
assert!(flags.contains(PageTableFlags::PRESENT));

Technique 3: Dump Page Table State

Add functions to dump page table state:

pub fn dump_page_table_entry(page_table: &PageTable, addr: VirtAddr) {
    let result = page_table.translate_addr(addr);
    log::debug!("Address: {:?}", addr);
    log::debug!("  Translation: {:?}", result);

    // Walk page table levels
    // Log each level's entry
}

Technique 4: Use kernel-debug-loop

Fast iteration for memory issues:

# Test fix quickly
kernel-debug-loop/scripts/quick_debug.py \
  --signal "MEMORY OPERATION COMPLETE" \
  --timeout 10

Technique 5: Compare Working vs Broken State

If something used to work:

# Run working version
git checkout working_commit
kernel-debug-loop/scripts/quick_debug.py ... > working.log

# Run broken version
git checkout broken_commit
kernel-debug-loop/scripts/quick_debug.py ... > broken.log

# Compare
diff -u working.log broken.log

Memory Issue Patterns

Pattern: Syscall Page Fault

Scenario: Page fault when userspace calls int 0x80

Diagnosis:

  1. Kernel stack not mapped in process page table
  2. Ring 3 → Ring 0 transition fails

Fix: copy_kernel_stack_to_process()

Reference: DIRECT_EXECUTION_FIX.md

Pattern: Context Switch Double Fault

Scenario: Double fault during context switch between processes

Diagnosis:

  1. Wrong page table activated
  2. Current stack not mapped in new page table

Fix: Ensure kernel stacks globally mapped

Reference: Global kernel page table architecture

Pattern: Process Creation "Already Mapped"

Scenario: Second process creation fails with "page already mapped"

Diagnosis:

  1. Processes sharing page table levels
  2. First process's mappings conflict with second

Fix: Deep copy page table levels

Reference: PAGE_TABLE_FIX.md

Pattern: Heap Allocation Panic

Scenario: Panic during heap allocation

Diagnosis:

  1. Out of heap memory
  2. Heap not initialized yet
  3. Heap corruption

Fix: Increase heap size, defer allocation, or fix corruption

Integration with Other Skills

With kernel-debug-loop

# Fast iteration on memory fixes
kernel-debug-loop/scripts/quick_debug.py \
  --signal "MEMORY TEST COMPLETE" \
  --timeout 15

With systematic-debugging

Document complex memory bugs:

# Problem
Page fault at 0x10001082 during sys_write

# Root Cause
User buffer not mapped in process page table

# Solution
Verify user addresses before access

# Evidence
Before: PAGE FAULT
After: Successful syscall

With log-analysis

# Find memory-related errors
echo '"PAGE FAULT"' > /tmp/log-query.txt
./scripts/find-in-logs

# Find allocation patterns
echo '"Allocated\|Deallocated"' > /tmp/log-query.txt
./scripts/find-in-logs

Best Practices

  1. Log extensively: Memory operations should be well-logged
  2. Verify before access: Check addresses are mapped
  3. Use checkpoints: Narrow down failure location
  4. Test incrementally: Add small changes, test frequently
  5. Understand architecture: Know how page tables work
  6. Reference working code: Look at similar working code
  7. Document patterns: Save solutions for future reference

Quick Reference

Page Fault Error Codes

0x0: Read from unmapped page (kernel mode)
0x1: Read from non-present page (kernel mode)
0x2: Write to unmapped page (kernel mode)
0x3: Write to non-present page (kernel mode)
0x4: Read from unmapped page (user mode)
0x6: Write to unmapped page (user mode)

Memory Regions

0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF: User space
0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF: Kernel space
0xFFFF_C900_0000_0000: Kernel stacks

Key Files

kernel/src/memory/frame_allocator.rs    - Physical memory
kernel/src/memory/process_memory.rs     - Process page tables
kernel/src/memory/kernel_page_table.rs  - Kernel mappings
kernel/src/memory/kernel_stack.rs       - Kernel stack allocator

Summary

Memory debugging requires:

  • Understanding memory subsystems (frame allocator, page tables, heap, stacks)
  • Systematic diagnosis of page faults and double faults
  • Checkpoint logging to isolate failures
  • Verification of page table state and mappings
  • Reference to past fixes (DIRECT_EXECUTION_FIX, PAGE_TABLE_FIX)
  • Integration with fast iteration tools

Memory bugs are complex but systematic debugging always works.