| name | Fixing Bugs Systematically |
| description | Diagnose and fix bugs through systematic investigation, root cause analysis, and targeted validation. Use when something is broken, errors occur, performance degrades, or unexpected behavior manifests. |
Fixing Bugs Systematically
Structured protocol for isolating root causes and implementing focused fixes in existing features.
When to Use
- Something is broken and needs diagnosis and repair
- Error messages or unexpected behavior occurs
- Performance degradation in existing functionality
- Intermittent or hard-to-reproduce issues
Core Steps
1. Context & Reproduction
Read relevant documentation:
docs/feature-spec/F-##-*.mdfor affected featuredocs/user-stories/US-###-*.mdfor expected behavior and acceptance criteriadocs/api-contracts.yamlif API-relateddocs/system-design.mdfor architecture context
Document the bug:
- Expected behavior (cite story AC or spec)
- Actual behavior (what's broken)
- Reproduction steps
- Feature ID (F-##) and Story ID (US-###) if known
2. Investigation
Simple bugs (obvious entry point)
Use direct investigation:
- Grep to locate error messages or related code
- Read suspected files to examine implementation
- Trace function calls and data transformations
- Check related files for connected logic
Complex bugs (multiple subsystems or unclear origin)
Delegate to async agents in parallel:
Spawn senior-engineer agents to:
- Trace error flow through specific subsystem
- Analyze related failure patterns
- Investigate runtime conditions
Spawn context-engineer agents to:
- Map data flow across multiple files
- Find all error handling for specific operation
- Locate configuration and integration points
Example: For authentication bug, spawn:
- Agent 1: "Trace auth flow from login endpoint to session creation"
- Agent 2: "Find all error handling and validation in auth module"
- Agent 3: "Locate session storage config and related code"
Wait for results using ./agent-responses/await {agent_id}
3. Root Cause Analysis
Generate hypotheses:
- List 3-8 potential root causes from investigation
- Rank by probability (evidence from code) and impact
- Select most likely cause(s)
Decision point:
- Fix immediately if root cause is obvious and confirmed
- Add validation if multiple plausible causes or runtime-dependent behavior
4. Validation (if needed)
Add minimal debugging:
- Logging at decision points
- Data inspection at boundaries
- Input/output logging at integration points
Test to confirm root cause before proceeding to fix.
5. Implementation
Fix the confirmed root cause:
- Keep changes minimal and focused
- Maintain API stability unless approved
- Follow existing patterns in codebase
Update documentation if needed:
- Add note in feature spec or changelog
- Update
docs/api-contracts.yamlif contract changed (requires approval) - For slash commands:
/manage-project/update/update-featureto correct spec/manage-project/update/update-storyif ACs were ambiguous/manage-project/update/update-apiif API changed (with approval)
6. Validation & Testing
Verify fix against acceptance criteria:
- Test all ACs from affected user stories
- Check 1-2 key edge cases and error states
- Run contract tests if API changed
- Verify events in
docs/data-plan.mdstill fire correctly
7. Cleanup
- Remove all debugging and logging code
- Verify no temporary files remain
Investigation Strategy
For direct investigation:
- Use grep, read_file to understand subsystem
- Trace flows manually through related files
- Focus on specific area where bug manifests
When to validate before fixing:
- Multiple plausible root causes exist
- Runtime-dependent behavior
- Intermittent or hard-to-reproduce issues
For async investigation:
- Each agent investigates independent subsystem
- Run in parallel for speed
- Maximum 6 agents (diminishing returns)
Artifacts
Inputs:
docs/feature-spec/F-##-*.md— Feature specsdocs/user-stories/US-###-*.md— Expected behavior and ACsdocs/api-contracts.yaml— API specsdocs/system-design.md— Architecture context
Outputs:
- Investigation findings (inline notes or agent reports)
- Updated feature spec with bug resolution notes
- Fixed code with accompanying tests
Quick Reference
| Scenario | Approach |
|---|---|
| Single subsystem, obvious entry | Direct investigation → immediate fix |
| Multiple subsystems, unclear origin | Spawn 2-4 agents in parallel → synthesize findings → fix |
| Runtime-dependent or intermittent | Add targeted logging → reproduce → analyze logs → fix |
| Multiple independent fixes needed | Pass investigation results to fix agents via artifact files |