---
name: git-bisect-debugging
description: Use when debugging regressions or identifying which commit introduced a bug - provides systematic workflow for git bisect with automated test scripts, manual verification, or hybrid approaches. Can be invoked from systematic-debugging as a debugging technique, or used standalone when you know the issue is historical.
---
# Git Bisect Debugging

## Overview

Systematically identify which commit introduced a bug or regression using git bisect. This skill provides a structured workflow for automated, manual, and hybrid bisect approaches.

Core principle: Binary search through commit history to find the exact commit that introduced the issue. The main agent orchestrates; subagents execute verification at each step.

Announce at start: "I'm using git-bisect-debugging to find which commit introduced this issue."
## Quick Reference
| Phase | Key Activities | Output |
|---|---|---|
| 1. Setup & Verification | Identify good/bad commits, verify clean state | Confirmed commit range |
| 2. Strategy Selection | Choose automated/manual/hybrid approach | Test script or verification steps |
| 3. Execution | Run bisect with subagents | First bad commit hash |
| 4. Analysis & Handoff | Show commit details, analyze root cause | Root cause understanding |
## MANDATORY Requirements
These are non-negotiable. No exceptions for time pressure, production incidents, or "simple" cases:
✅ ANNOUNCE skill usage at start:
- "I'm using git-bisect-debugging to find which commit introduced this issue."

✅ CREATE TodoWrite checklist immediately (before Phase 1):
- Copy the exact checklist from "The Process" section below
- Update status as you progress through phases
- Mark phases complete ONLY when finished
✅ VERIFY safety checks (Phase 1 - no skipping):
- Working directory MUST be clean (`git status`)
- Good commit MUST be verified (actually good)
- Bad commit MUST be verified (actually bad)
- If ANY check fails → abort and fix before proceeding
✅ USE AskUserQuestion for strategy selection (Phase 2):
- Present all 3 approaches (automated, manual, hybrid)
- Don't default to automated without asking
- User must explicitly choose
✅ LAUNCH subagents for verification (Phase 3):
- Main agent: orchestrates git bisect state
- Subagents: execute verification at each commit (via Task tool)
- NEVER run verification in main context
- Each commit tested in isolated subagent
✅ HANDOFF to systematic-debugging (Phase 4):
- After finding bad commit, announce handoff
- Use superpowers:systematic-debugging skill
- Investigate root cause, not just WHAT changed
## Red Flags - STOP and Follow the Skill
If you catch yourself thinking ANY of these, you're about to violate the skill:
- "User is in a hurry, I'll skip safety checks" → NO. Run all safety checks.
- "This is simple, no need for TodoWrite" → NO. Create the checklist.
- "I'll just use automated approach" → NO. Use AskUserQuestion.
- "I'll run the test in my context" → NO. Launch subagent.
- "Found the commit, that's enough" → NO. Handoff to systematic-debugging.
- "Working directory looks clean" → NO. Run `git status` to verify.
- "I'll verify good/bad commits later" → NO. Verify BEFORE starting bisect.
All of these mean: STOP. Follow the 4-phase workflow exactly.
## The Process

Copy this checklist to track progress:

```
Git Bisect Progress:
- [ ] Phase 1: Setup & Verification (good/bad commits identified)
- [ ] Phase 2: Strategy Selection (approach chosen, script ready)
- [ ] Phase 3: Execution (first bad commit found)
- [ ] Phase 4: Analysis & Handoff (root cause investigation complete)
```
### Phase 1: Setup & Verification
Purpose: Ensure git bisect is appropriate and safe to run.
Steps:

1. Verify prerequisites:
   - Check we're in a git repository
   - Verify working directory is clean (`git status`)
   - If uncommitted changes exist, abort and ask user to commit or stash

2. Identify commit range:
   - Ask user for good commit (where it worked)
     - Suggestions: last release tag, last passing CI, a commit from when it worked
     - Helpful commands: `git log --oneline`, `git tag`, `git log --since="last week"`
   - Ask user for bad commit (where it's broken)
     - Usually `HEAD` or a specific commit where the issue is confirmed
   - Calculate estimated steps: ~log2(commits between good and bad)

3. Verify the range:
   - Checkout bad commit and verify the issue exists
   - Checkout good commit and verify the issue doesn't exist
   - If reversed, offer to swap them
   - Return to original branch/commit

4. Safety checks:
   - Warn if range is >1000 commits (ask for confirmation)
   - Verify good commit is an ancestor of bad commit
   - Note current branch/commit for cleanup later
Output: Confirmed good commit hash, bad commit hash, estimated steps
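The safety checks above map to plain git commands. A runnable sketch, demonstrated against a throwaway repo created in a temp directory (so nothing here touches a real project; the commits are synthesized for illustration):

```shell
#!/bin/bash
# Demo of the Phase 1 pre-flight checks against a throwaway repo.
# In real use, GOOD/BAD come from the user; here they are synthesized.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do
  echo "$i" > file; git add file; git commit -qm "commit $i"
done
GOOD=$(git rev-list --max-parents=0 HEAD)  # first commit, known good
BAD=HEAD

# Check: working tree must be clean
[ -z "$(git status --porcelain)" ] && echo "working tree clean"

# Check: good must be an ancestor of bad
git merge-base --is-ancestor "$GOOD" "$BAD" && echo "good is ancestor of bad"

# Range size (bisect takes ~log2 of this many steps)
count=$(git rev-list --count "$GOOD..$BAD")
echo "commits in range: $count"
```

The same three commands (`git status --porcelain`, `git merge-base --is-ancestor`, `git rev-list --count`) work unchanged in a real repository.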
### Phase 2: Strategy Selection
Purpose: Choose the most efficient bisect approach.
Assessment: Can we write an automated test script that deterministically identifies good vs bad?
MANDATORY: Use AskUserQuestion tool to present these three approaches (do NOT default to automated):
```javascript
AskUserQuestion({
  questions: [{
    question: "Which git bisect approach should we use?",
    header: "Strategy",
    multiSelect: false,
    options: [
      {
        label: "Automated - test script runs automatically",
        description: "Fast, no manual intervention. Best for: test failures, crashes, deterministic behavior. Requires: working test script."
      },
      {
        label: "Manual - you verify each commit",
        description: "Handles subjective issues. Best for: UI/UX changes, complex scenarios. Requires: you can manually check each commit."
      },
      {
        label: "Hybrid - script + manual confirmation",
        description: "Efficient with reliability. Best for: mostly automated but needs judgment. Requires: script for most cases, manual for edge cases."
      }
    ]
  }]
})
```
Three approaches to present:
Approach 1: Automated Bisect
- When to use: Test failure, crash, deterministic behavior
- How it works: Script returns exit 0 (good) or 1 (bad), fully automatic
- Benefits: Fast, no manual intervention, reproducible
- Requirements: Can write a script that runs the test/check
Approach 2: Manual Bisect
- When to use: UI/UX changes, subjective behavior, complex scenarios
- How it works: User verifies at each commit, Claude guides
- Benefits: Handles non-deterministic or subjective issues
- Requirements: User can manually verify each commit
Approach 3: Hybrid Bisect
- When to use: Mostly automatable but needs human judgment
- How it works: Script narrows range, manual verification for final confirmation
- Benefits: Efficiency of automation with reliability of manual check
- Requirements: Can automate most checks, manual for edge cases
If automated or hybrid selected:

Write a test script following this template:

```bash
#!/bin/bash
# Exit codes: 0 = good, 1 = bad, 125 = skip (can't test)

# Setup/build (required for each commit)
npm install --silent 2>/dev/null || exit 125

# Run the actual test
npm test -- path/to/specific-test.js
exit $?
```
Script guidelines:
- Make it deterministic (no random data, use fixed seeds)
- Make it fast (runs ~log2(N) times)
- Exit codes: 0 = good, 1 = bad, 125 = skip
- Include build/setup (each commit might need different deps)
- Test ONE specific thing, not entire suite
- Make it read-only (no data modification)
If manual selected:

Write specific verification steps for the subagent:

Good example:

```
1. Run `npm start`
2. Open browser to http://localhost:3000
3. Click the "Login" button
4. Check if it redirects to /dashboard
5. Respond 'good' if redirect happens, 'bad' if it doesn't
```

Bad example:

```
See if the login works
```
Output: Selected approach, test script (if automated/hybrid), or verification steps (if manual)
### Phase 3: Execution
Architecture: Main agent orchestrates bisect, subagents verify each commit in isolated context.
Main agent responsibilities:
- Manage git bisect state (`start`, `good`, `bad`, `reset`)
- Track progress and communicate remaining steps
- Launch subagents for verification
- Handle errors and cleanup

Subagent responsibilities:
- Execute verification in clean context (no bleeding between commits)
- Report result: "good", "bad", or "skip"
- Provide brief reasoning for result
Execution flow:

1. Main agent: Run `git bisect start <bad> <good>`
2. Loop until bisect completes:
   a. Git checks out a commit to test
   b. Main agent launches subagent using Task tool:
      - For automated:
        Prompt: "Run this test script and report the result: <script content>. Report 'good' if exit code is 0, 'bad' if exit code is 1, 'skip' if exit code is 125. Include the output of the script in your response."
      - For manual:
        Prompt: "We're testing commit <hash> (<message>). Follow these verification steps: <verification steps>. Report 'good' if the issue doesn't exist, 'bad' if it does exist. Explain what you observed."
      - For hybrid:
        Prompt: "Run this test script: <script content>. If exit code is 0 or 1, report that result. If exit code is 125 or the script is ambiguous, perform manual verification: <verification steps>. Report 'good', 'bad', or 'skip' with explanation."
   c. Subagent returns: result ("good", "bad", or "skip") with explanation
   d. Main agent: Run `git bisect good|bad|skip` based on the result
   e. Main agent: Update progress
      - Show the commit that was tested and the result
      - Calculate remaining steps: `git bisect log | grep "# .*step" | tail -1`
      - Example: "Tested commit abc123 (bad). ~4 steps remaining."
   f. Repeat until git bisect identifies the first bad commit
3. Main agent: Run `git bisect reset` to clean up
4. Main agent: Return to original branch/commit
Error handling during execution:
- Subagent timeout/error: Allow user to manually mark as "skip"
- Build failures: Use `git bisect skip`
- Too many skips (>5): Suggest manual investigation, show untestable commits
- Bisect interrupted: Ensure `git bisect reset` runs in cleanup
Output: First bad commit hash, bisect log showing the path taken
### Phase 4: Analysis & Handoff
Purpose: Present findings and analyze root cause.
Steps:

1. Present the identified commit:

   ```
   Found first bad commit: <hash>
   Author: <author>
   Date: <date>

   <commit message>

   Files changed: <list of files from git show --stat>
   ```

2. Show how to view details:
   - View full diff: `git show <hash>`
   - View file at that commit: `git show <hash>:<file>`

3. Handoff to root cause analysis:
   - Announce: "Now that we've found the breaking commit at <hash>, I'm using systematic-debugging to analyze why this change caused the issue."
   - Use the superpowers:systematic-debugging skill to investigate
   - Focus analysis on the changes in the bad commit
   - Identify the specific line/change that caused the issue
   - Explain WHY it broke (not just WHAT changed)
Output: Root cause understanding of why the commit broke functionality
## Safety & Error Handling

### Pre-flight Checks (Phase 1)
- ✅ Working directory is clean
- ✅ In a git repository
- ✅ Good/bad commits exist and are valid
- ✅ Good commit is actually good (issue doesn't exist)
- ✅ Bad commit is actually bad (issue exists)
- ✅ Good is ancestor of bad
- ⚠️ Warn if >1000 commits in range
### During Execution (Phase 3)
- Subagent fails: Log error, allow skip or abort
- Build fails: Use `git bisect skip`, continue
- Ambiguous result: Use `git bisect skip`, max 5 skips
- Can't determine good/bad: Ask user for guidance
### Cleanup & Recovery

- Always run `git bisect reset` when done (success or failure)
- If interrupted, prompt user to run `git bisect reset`
- Return to original branch/commit
- If bisect is running and the skill exits, warn user to clean up
### Failure Modes
- Too many skips: Report untestable commits, suggest narrower range or manual review
- Good/bad reversed: Detect pattern (all results opposite), offer to restart with swapped inputs
- No bad commit found: Verify bad commit is actually bad, check if issue is environmental
## Best Practices

### Optimizing Commit Range

- Narrow the range first if possible:
  - Issue appeared last week? Start from last week, not 6 months ago
  - Use `git log --since="2 weeks ago"` to find a starting point
  - Use tags/releases as good commits when possible
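Date-based narrowing can be done directly with `git rev-list -1 --before=<date>`, which returns the newest commit older than the cutoff: a good candidate for the known-good commit. A throwaway-repo illustration (the two backdated commits are synthesized for the demo):

```shell
#!/bin/bash
# Demo: pick the newest commit before a cutoff date as the "good" candidate.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo
GIT_COMMITTER_DATE="2025-01-01T00:00:00" \
  git commit -q --allow-empty --date="2025-01-01T00:00:00" -m "old (worked)"
GIT_COMMITTER_DATE="2025-03-01T00:00:00" \
  git commit -q --allow-empty --date="2025-03-01T00:00:00" -m "new (broken)"

# Newest commit made before the regression window opened:
good=$(git rev-list -1 --before="2025-02-01" HEAD)
git log -1 --format=%s "$good"
```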
### Writing Good Test Scripts
Do:
- ✅ Test ONE specific thing
- ✅ Make it deterministic (fixed seeds, no random data)
- ✅ Make it fast (runs log2(N) times)
- ✅ Include setup/build in script
- ✅ Use proper exit codes (0=good, 1=bad, 125=skip)
Don't:
- ❌ Run entire test suite (too slow)
- ❌ Depend on external state (databases, APIs)
- ❌ Use random data or timestamps
- ❌ Modify production data
### Manual Verification
Be specific:
- ✅ "API returns 200 for GET /health"
- ✅ "Login button redirects to /dashboard"
- ❌ "See if it works"
- ❌ "Check if login is broken"
Give exact steps:
1. Run server with `npm start`
2. Open browser to http://localhost:3000
3. Click element with id="login-btn"
4. Verify URL changes to /dashboard
## Common Patterns

| Issue Type | Recommended Approach | Script/Steps Example |
|---|---|---|
| Test failure | Automated | `npm test -- failing-test.spec.js` |
| Crash/error | Automated | `node app.js 2>&1 \| …` |
| Performance | Automated | `time npm run benchmark \| awk '{if ($1 > 5.0) exit 1}'` |
| UI/UX change | Manual | "Click X, verify Y appears" |
| Behavior change | Manual or Hybrid | Script to check, manual to confirm subjective aspects |
## Progress Communication

- After each step: "Tested commit abc123 (<result>). ~X steps remaining."
- Show bisect log periodically: `git bisect log`
- Estimate remaining steps: log2(commits in range)
  - Example: 100 commits → ~7 steps, 1000 commits → ~10 steps
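The estimate is just ceil(log2(N)): each good/bad verdict halves the remaining range. A tiny helper makes the arithmetic concrete:

```shell
# Bisect over N commits needs about ceil(log2(N)) good/bad verdicts,
# since each verdict halves the remaining candidate range.
estimate_steps() {
  local n=$1 steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))   # halve the range, rounding up
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

estimate_steps 100    # → 7
estimate_steps 1000   # → 10
```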
## Common Rationalizations (Resist These!)
| Rationalization | Reality | What to Do Instead |
|---|---|---|
| "User is in a hurry, skip safety checks" | Broken bisect from dirty state wastes MORE time | Run all Phase 1 checks. Always. |
| "This is simple, no need for TodoWrite" | You'll skip phases without tracking | Create checklist immediately |
| "I'll just use automated approach" | User might prefer manual for vague issues | Use AskUserQuestion tool |
| "I'll run the test in my context" | Context bleeding between commits breaks bisect | Launch subagent for each verification |
| "Working directory looks clean" | Assumptions cause failures | Run `git status` to verify |
| "I'll verify good/bad commits later" | Starting with wrong good/bad wastes all steps | Verify BEFORE git bisect start |
| "Found the commit, user knows why" | User asked to FIND it, not debug it | Hand off to systematic-debugging |
| "Production incident, no time for process" | Skipping process in incidents causes MORE incidents | Follow workflow. It's faster. |
| "I remember from baseline, no need to test" | Skills evolve, baseline was different session | Test at each commit with subagent |
If you catch yourself rationalizing, STOP. Go back to MANDATORY Requirements section.
## Integration with Other Skills

### Called BY systematic-debugging

When systematic-debugging determines an issue is historical:
```
systematic-debugging detects:
- Issue doesn't exist in commit from 2 weeks ago
- Issue exists now
→ Suggests: "This appears to be a regression. I'm using git-bisect-debugging to find when it was introduced."
→ Invokes: git-bisect-debugging skill
→ Returns: First bad commit for analysis
→ Resumes: systematic-debugging analyzes the breaking change
```
### Calls systematic-debugging

In Phase 4, after finding the bad commit:

```
git-bisect-debugging completes:
→ Announces: "Found commit abc123. Using systematic-debugging to analyze root cause."
→ Invokes: superpowers:systematic-debugging
→ Context: "Focus on changes in commit abc123"
→ Goal: Understand WHY the change broke functionality
```
## Limitations (By Design)

This skill focuses on straightforward scenarios. It does NOT handle:
- ❌ Complex merge commit issues (would need `--first-parent`)
- ❌ Flaky/intermittent test failures (would need statistical approaches)
- ❌ Build system failures across many commits (would need advanced skip strategies)

For these scenarios, manual git bisect with user guidance is recommended.
## Example Workflows

### Example 1: Automated Test Failure
User: "The login test started failing sometime in the last 50 commits."
Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."
[Phase 1: Setup]
- git status → clean
- Good commit: v1.2.0 tag (last release)
- Bad commit: HEAD
- Verify: checkout v1.2.0, run test → passes
- Verify: checkout HEAD, run test → fails
- Range: 47 commits, ~6 steps estimated
[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Automated
Script created:

```bash
#!/bin/bash
npm install --silent 2>/dev/null || exit 125
npm test -- tests/login.spec.js
exit $?
```
[Phase 3: Execution]
Step 1: Launch subagent for commit abc123
Subagent: "Test failed. Exit code 1. bad"
Main: git bisect bad → ~3 steps remaining
Step 2: Launch subagent for commit def456
Subagent: "Test passed. Exit code 0. good"
Main: git bisect good → ~2 steps remaining
Step 3: Launch subagent for commit ghi789
Subagent: "Test failed. Exit code 1. bad"
Main: git bisect bad → ~1 step remaining
Step 4: Launch subagent for commit jkl012
Subagent: "Test passed. Exit code 0. good"
Main: git bisect good
Git bisect identifies: ghi789 is first bad commit
[Phase 4: Analysis]
Found first bad commit: ghi789
Author: John Doe
Date: 2025-11-10
feat: update authentication middleware
Files changed:
src/auth/middleware.js | 12 ++++---
tests/login.spec.js | 3 +-
Using systematic-debugging to analyze why this change broke the test...
[Hands off to systematic-debugging skill]
### Example 2: Manual UI Regression
User: "The dashboard layout looks wrong, but I'm not sure when it broke."
Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."
[Phase 1: Setup]
- git status → clean
- Good commit: 2 weeks ago (user remembers it working)
- Bad commit: HEAD
- Range: 89 commits, ~7 steps estimated
[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Manual
Verification steps created:
1. Run `npm run dev`
2. Open browser to http://localhost:3000/dashboard
3. Check if the sidebar and main content are side-by-side
4. Respond 'good' if layout is correct, 'bad' if sidebar overlaps content
[Phase 3: Execution]
Step 1: Checkout commit abc123
Launch subagent with verification steps
User verifies: "Layout is correct. good"
Main: git bisect good → ~3 steps remaining
Step 2: Checkout commit def456
Launch subagent with verification steps
User verifies: "Sidebar overlaps content. bad"
Main: git bisect bad → ~2 steps remaining
... continues until first bad commit found
[Phase 4: Analysis]
Found first bad commit: mno345
Author: Jane Smith
Date: 2025-11-08
refactor: migrate to CSS Grid layout
Files changed:
src/components/Dashboard.css | 25 +++++++++++----------
Using systematic-debugging to analyze the CSS changes...
[Hands off to systematic-debugging skill]
### Example 3: Hybrid Performance Regression
User: "API response times got slower recently."
Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."
[Phase 1: Setup]
- git status → clean
- Good commit: last release tag v2.1.0
- Bad commit: HEAD
- Range: 120 commits, ~7 steps estimated
[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Hybrid
Script created:

```bash
#!/bin/bash
npm install --silent 2>/dev/null || exit 125

# Run benchmark 3 times, take average
total=0
for i in 1 2 3; do
  t=$(npm run benchmark:api 2>/dev/null | grep "response_time" | awk '{print $2}')
  [ -z "$t" ] && exit 125  # Can't test
  total=$(echo "$total + $t" | bc)
done
avg=$(echo "scale=2; $total / 3" | bc)

# Threshold: 500ms is acceptable
if (( $(echo "$avg > 500" | bc -l) )); then
  exit 1  # bad (too slow)
else
  exit 0  # good (fast enough)
fi
```
Manual fallback steps:
"If script is ambiguous, manually test API and verify response time is <500ms"
[Phase 3: Execution]
Steps proceed with script automation...
If script returns 125 (can't test), subagent asks user to manually verify
[Phase 4: Analysis]
Found first bad commit: pqr678
Author: Bob Johnson
Date: 2025-11-11
feat: add caching layer for user preferences
Files changed:
src/api/middleware/cache.js | 45 ++++++++++++++++++++++++++++++++
Using systematic-debugging to analyze the caching implementation...
[Reveals: Cache lookup is synchronous and blocking, causing slowdown]
## Troubleshooting

### "Good and bad are reversed"
If early results suggest good/bad are swapped:
- Stop bisect
- Verify issue description is correct
- Swap good/bad commits and restart
### "Too many skips, can't narrow down"
If >5 commits skipped:
- Review skipped commits manually
- Check if builds are broken in that range
- Consider narrowing the range or manual investigation
### "Bisect is stuck/interrupted"

If bisect state is corrupted or interrupted:

```bash
git bisect reset   # Clean up bisect state
git checkout main  # Return to main branch
# Restart bisect with better range/script
```
### "Subagent is taking too long"
- Set reasonable timeout for verification
- If automated: optimize test script
- If manual: simplify verification steps
- Consider marking commit as 'skip'
## Summary
When to use: Historical bugs, regressions, "when did this break" questions
Key strengths:
- ✅ Systematic binary search (efficient)
- ✅ Subagent isolation (clean context)
- ✅ Automated + manual + hybrid approaches
- ✅ Integrates with systematic-debugging
Remember:
- Always verify good is good, bad is bad
- Keep test scripts deterministic and fast
- Use subagents for each verification step
- Clean up with `git bisect reset`
- Hand off to systematic-debugging for root cause