name: systematic-debugging
description: This skill should be used when debugging complex kernel issues requiring systematic investigation and documentation. Use for documenting problem analysis, root cause investigation, solution implementation, and evidence collection following Breenix's Problem→Root Cause→Solution→Evidence pattern.

Systematic Debugging for Breenix

Document-driven debugging workflow for kernel issues.

Purpose

Complex kernel bugs require systematic investigation and documentation. This skill provides the pattern used in Breenix debugging docs like TIMER_INTERRUPT_INVESTIGATION.md, DIRECT_EXECUTION_FIX.md, and PAGE_TABLE_FIX.md.

The Four-Phase Pattern

All debugging documents follow this structure:

Problem: What's broken? Observable symptoms
Root Cause: Why is it broken? Deep analysis
Solution: What fixes it? Implementation details
Evidence: How do you know it's fixed? Before/after proof
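
The four phases can be treated as a small data structure, so a document that skips a phase is easy to spot. The sketch below is illustrative only: `DebugDoc`, `render`, and `missing_phases` are hypothetical names, not part of Breenix or its tooling.

```rust
/// Hypothetical container for the four phases of a debugging document.
/// Not part of Breenix; the names here are assumptions for illustration.
pub struct DebugDoc {
    pub title: String,
    pub problem: String,
    pub root_cause: String,
    pub solution: String,
    pub evidence: String,
}

impl DebugDoc {
    /// Render the four phases, in order, as a Markdown skeleton.
    pub fn render(&self) -> String {
        format!(
            "# {} Fix\n\n## Problem Summary\n\n{}\n\n## Root Cause Analysis\n\n{}\n\n## Solution\n\n{}\n\n## Evidence\n\n{}\n",
            self.title, self.problem, self.root_cause, self.solution, self.evidence
        )
    }

    /// Names of the phases whose text is still empty.
    pub fn missing_phases(&self) -> Vec<&'static str> {
        let phases = [
            ("Problem", &self.problem),
            ("Root Cause", &self.root_cause),
            ("Solution", &self.solution),
            ("Evidence", &self.evidence),
        ];
        phases
            .iter()
            .filter(|(_, body)| body.trim().is_empty())
            .map(|(name, _)| *name)
            .collect()
    }
}
```

Calling `missing_phases()` before filing a document gives a quick check that the Evidence phase, the one most often skipped, has been filled in.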

When to Use

Complex kernel issues: Not simple typos or obvious bugs
Architectural problems: Issues requiring design changes
Recurring failures: Problems that reappear or are hard to reproduce
Learning opportunities: Bugs that teach important lessons
CI investigations: Failed tests requiring deep analysis

Debugging Workflow

Phase 1: Problem Definition

Document observable symptoms:

# Problem Summary

[Brief description of what's failing]

## Symptoms

- What fails? (test, boot, specific operation)
- When does it fail? (always, intermittently, specific conditions)
- Error messages or behavior observed
- What works vs what doesn't

Example from DIRECT_EXECUTION_FIX.md:

# Problem Summary
Direct userspace execution was failing with a double fault at `int 0x80`
instruction (`0x10000019`).

## Symptoms
- Userspace processes boot successfully
- Calling int 0x80 triggers double fault
- Error occurs during Ring 3 → Ring 0 transition

Phase 2: Root Cause Analysis

Investigate systematically:

Reproduce consistently

# Use kernel-debug-loop for fast iteration
kernel-debug-loop/scripts/quick_debug.py --signal "FAILURE_POINT" --timeout 10

Add diagnostic logging

log::debug!("About to perform operation X");
log::debug!("Variable state: {:?}", state);
log::debug!("After operation X");

Narrow down location
- Binary search: add a checkpoint in the middle of the suspect code
- If reached: the problem is after the checkpoint
- If not reached: the problem is at or before the checkpoint
- Repeat until isolated

Analyze state
- What values do the variables hold?
- What should they be?
- What assumptions are violated?
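
The checkpoint bisection described under "Narrow down location" can be sketched as an ordinary binary search. This assumes the checkpoints are numbered in execution order along a single code path, and that once one checkpoint is missed, no later one is reached. The `reached` probe is hypothetical: in practice it would mean rebuilding with a log line at that checkpoint and checking whether it appears in the output.

```rust
/// Find the index of the last checkpoint that is reached.
/// Returns `None` if even the first checkpoint is never reached.
///
/// `reached(i)` is a hypothetical probe: it should report whether
/// checkpoint `i` shows up in the kernel log for one run.
pub fn last_reached<F>(count: usize, mut reached: F) -> Option<usize>
where
    F: FnMut(usize) -> bool,
{
    // Invariant: every checkpoint below `lo` is reached,
    // and every checkpoint at or above `hi` is not.
    let (mut lo, mut hi) = (0usize, count);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if reached(mid) {
            lo = mid + 1; // reached: the problem is after `mid`
        } else {
            hi = mid; // not reached: the problem is at or before `mid`
        }
    }
    // `lo` is now the first unreached checkpoint; step back by one.
    lo.checked_sub(1)
}
```

The failure lies between the returned checkpoint and the next one. Each probe is a full build-and-boot cycle, so the logarithmic number of probes matters: eight checkpoints need at most four runs.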

Document findings:

## Root Cause Analysis

1. **Sequence of Events**:
   - Step 1 happens
   - Step 2 happens
   - Step 3 fails because X

2. **Technical Details**:
   - Specific memory addresses, registers, flags
   - Code paths taken
   - Assumptions violated

3. **Why It Happens**:
   - Fundamental reason for the failure
   - What design assumption was wrong

Phase 3: Solution Implementation

Document the fix:

## Solution

### 1. [Component] Fix
**File**: `path/to/file.rs`
**Lines**: X-Y

[Explanation of what changed and why]

```rust
// Code snippet showing the fix
```

### 2. [Another Component] Fix
**File**: `path/to/another/file.rs`
**Lines**: X-Y

[Explanation]

Example structure:
- Identify all files that need changes
- For each change:
  - File path
  - Line numbers
  - Explanation of change
  - Code snippet
  - Rationale

Phase 4: Evidence Collection

Prove it works:

## Evidence

### Before Fix:

[Log output or error messages showing failure]


### After Fix:

[Log output showing success]


### Test Results:
- Test X: PASS
- Test Y: PASS
- Feature Z: Working as expected

Integration with Tools

With kernel-debug-loop

Fast iteration during investigation:

# Test hypothesis quickly
kernel-debug-loop/scripts/quick_debug.py \
  --signal "CHECKPOINT_AFTER_FIX" \
  --timeout 15

With log-analysis

Extract evidence from logs:

# Find before/after comparison
echo '"Error pattern"' > /tmp/log-query.txt
./scripts/find-in-logs

echo '"Success pattern"' > /tmp/log-query.txt
./scripts/find-in-logs
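
The before/after comparison itself comes down to three checks: the error pattern appears in the old log, it is gone from the new log, and the success pattern appears in the new log. The sketch below is a minimal illustration using plain substring matching. `count_matches` and `evidence_holds` are hypothetical helpers, and nothing here describes how `find-in-logs` works internally.

```rust
/// Count log lines containing `pattern` (plain substring match, not a regex).
pub fn count_matches(log: &str, pattern: &str) -> usize {
    log.lines().filter(|line| line.contains(pattern)).count()
}

/// Hypothetical evidence check: the error pattern is present before the fix
/// and absent after it, and the success pattern is present after it.
pub fn evidence_holds(before: &str, after: &str, error: &str, success: &str) -> bool {
    count_matches(before, error) > 0
        && count_matches(after, error) == 0
        && count_matches(after, success) > 0
}
```

Requiring the error pattern in the "before" log guards against a common mistake: declaring a fix verified when the failure was never actually captured in the first place.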

With ci-failure-analysis

Analyze CI test failures:

ci-failure-analysis/scripts/analyze_ci_failure.py \
  --context target/xtask_*_output.txt

Debug Document Template

# [Issue Name] Fix

Date: [YYYY-MM-DD]

## Problem Summary

[What's broken - one paragraph]

## Symptoms

- Symptom 1
- Symptom 2
- Error messages or behavior

## Root Cause Analysis

### Sequence of Events
1. [Step by step what happens]
2. [Leading to failure]

### Technical Details
- Memory addresses, registers, etc.
- Code paths taken
- State at time of failure

### Why It Happens
[Fundamental explanation]

## Solution

### 1. [First Change]
**File**: `path/to/file.rs`
**Lines**: X-Y

[Explanation]

```rust
// Code change
```

### 2. [Second Change]
**File**: `path/to/file2.rs`
**Lines**: X-Y