Claude Code Plugins

Community-maintained marketplace

quality-debugging-troubleshooting

@vasilyu1983/AI-Agents-public

Systematic debugging methodologies, troubleshooting workflows, logging strategies, error tracking, performance profiling, stack trace analysis, and debugging tools across languages and environments. Covers local debugging, distributed systems, production issues, and root cause analysis.

Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reading through its instructions before using it.

SKILL.md

name: quality-debugging-troubleshooting
description: Systematic debugging methodologies, troubleshooting workflows, logging strategies, error tracking, performance profiling, stack trace analysis, and debugging tools across languages and environments. Covers local debugging, distributed systems, production issues, and root cause analysis.

Debugging & Troubleshooting — Quick Reference

This skill provides execution-ready debugging strategies, troubleshooting workflows, and root cause analysis techniques. Claude should apply these patterns when users encounter bugs, errors, performance issues, or production incidents.

Modern Best Practices (2025): Structured logging (Pino/Winston), distributed tracing (OpenTelemetry), error tracking (Sentry/Rollbar), observability-first debugging, time-travel debugging, AI-assisted error analysis, and proactive monitoring.
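
Structured logging is what makes "search logs by request ID" practical later on. The sketch below uses only Python's standard `logging` module as a stand-in for the Node.js libraries named above (Pino/Winston); the logger name `checkout`, the `request_id` field, and both helper functions are hypothetical names chosen for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is a hypothetical field supplied through `extra=`
            "request_id": getattr(record, "request_id", None),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def find_by_request_id(log_text: str, request_id: str) -> list:
    """Return the parsed log entries that belong to one request."""
    entries = [json.loads(line) for line in log_text.splitlines() if line.strip()]
    return [entry for entry in entries if entry["request_id"] == request_id]


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    log.info("payment started", extra={"request_id": "req-1"})
    log.error("payment failed", extra={"request_id": "req-2"})
    print(find_by_request_id(buffer.getvalue(), "req-2"))
```

Because every line is a self-contained JSON object, the same filter can usually be expressed in whatever log search tool is already in place, rather than in application code.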


Quick Reference

| Symptom | Tool/Technique | Command/Approach | When to Use |
| --- | --- | --- | --- |
| Application crashes | Stack trace analysis | Check error logs, identify first line in your code | Unhandled exceptions |
| Slow performance | Profiling (CPU/memory) | node --prof, Chrome DevTools, cProfile | High CPU, latency issues |
| Memory leak | Heap snapshots | node --inspect, compare snapshots over time | Memory usage grows |
| Database slow | Query profiling | EXPLAIN ANALYZE, slow query log | Slow queries, high DB CPU |
| Production-only bug | Log analysis + feature flags | grep "ERROR", enable verbose logging for user | Can't reproduce locally |
| Distributed system issue | Distributed tracing | OpenTelemetry, Jaeger, trace request ID | Microservices, async workflows |
| Intermittent failures | Logging + monitoring | Add detailed logs, monitor metrics | Race conditions, timeouts |
| Network timeout | Network debugging | curl, Postman, check firewall/DNS | External API failures |
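
For the "Slow performance" row, a minimal sketch of CPU profiling with Python's built-in `cProfile` and `pstats` might look like the following. The functions `slow_sum` and `handler` are hypothetical stand-ins for real application code; only `profile_call` is the part being illustrated.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Deliberately wasteful: a hypothetical hot function for the demo."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def handler(n: int) -> int:
    """Hypothetical entry point that calls the hot function."""
    return slow_sum(n)


def profile_call(func, *args) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # limit the report to 5 rows
    return out.getvalue()


if __name__ == "__main__":
    print(profile_call(handler, 2000))
```

Sorting by cumulative time tends to surface the call path that dominates a request; sorting by `"tottime"` instead highlights the functions that are expensive in their own right.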

Decision Tree: Debugging Strategy

Issue type: [Problem Scenario]
    ├─ Application Behavior?
    │   ├─ Crashes immediately? → Check stack trace, error logs
    │   ├─ Slow/hanging? → CPU/memory profiling
    │   ├─ Intermittent failures? → Add logging, reproduce consistently
    │   └─ Unexpected output? → Binary search (add logs to narrow down)
    │
    ├─ Performance Issues?
    │   ├─ High CPU? → CPU profiler to find hot functions
    │   ├─ Memory leak? → Heap snapshots, track over time
    │   ├─ Slow database? → EXPLAIN ANALYZE, check indexes
    │   ├─ Network latency? → Trace external API calls
    │   └─ Frontend slow? → Lighthouse, Web Vitals profiling
    │
    ├─ Production-Only?
    │   ├─ Can't reproduce? → Analyze logs for patterns
    │   ├─ Environment difference? → Compare configs, data volume
    │   ├─ Need safe debugging? → Feature flags for verbose logging
    │   └─ Recent deployment? → Git bisect to find regression
    │
    ├─ Distributed Systems?
    │   ├─ Multiple services involved? → Distributed tracing (Jaeger)
    │   ├─ Request lost? → Search logs by request ID
    │   ├─ Service dependency? → Check health checks, circuit breakers
    │   └─ Async workflow? → Trace message queue, event logs
    │
    └─ Error Type?
        ├─ TypeError/NullPointer? → Check object existence, defensive coding
        ├─ Network timeout? → Check external service health, retry logic
        ├─ Database error? → Check connection pool, query syntax
        └─ Unknown error? → Systematic debugging workflow (observe, hypothesize, test)
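
The "binary search" and "Git bisect" branches above share one idea: halve the search space on every check. A minimal sketch of that idea, assuming revisions are ordered oldest to newest, the first is known good, the last is known bad, and a regression stays present once introduced; `first_bad` and the `is_bad` callback are hypothetical names, not part of Git.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad is True.

    Assumes revisions[0] is good, revisions[-1] is bad, and every revision
    after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return revisions[hi]


if __name__ == "__main__":
    history = [f"r{i}" for i in range(10)]
    # Hypothetical check: pretend the bug was introduced in r6.
    print(first_bad(history, lambda rev: int(rev[1:]) >= 6))
```

In practice `is_bad` would check out the revision and run a reproducing test; the number of checks grows with the logarithm of the history length, which is why bisection stays practical even across hundreds of commits.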

When to Use This Skill

Claude should invoke this skill when a user reports:

  • Application crashes or errors
  • Unexpected behavior or bugs
  • Performance issues (slow queries, memory leaks, high CPU)
  • Production incidents requiring root cause analysis
  • Stack trace or error message interpretation
  • Debugging strategies for specific scenarios
  • Log analysis and pattern detection
  • Distributed system debugging (microservices, async workflows)
  • Memory leaks and resource exhaustion
  • Race conditions and concurrency issues
  • Network connectivity problems
  • Database query optimization
  • Third-party API integration issues

Operational Deep Dives

See resources/operational-patterns.md for systematic debugging workflows, logging strategy details, stack trace and performance profiling guides, and language-specific tooling checklists.


Templates (Copy-Paste Ready)

Production templates organized by workflow type:


Resources (Deep-Dive Guides)

Operational best practices by domain:


External Resources

See data/sources.json for:

  • Debugging tool documentation
  • Error tracking platforms (Sentry, Rollbar, Bugsnag)
  • Observability platforms (Datadog, New Relic, Honeycomb)
  • Profiling tutorials and guides
  • Production debugging best practices

Quick Decision Matrix

| Symptom | Likely Cause | First Action |
| --- | --- | --- |
| Application crashes | Unhandled exception | Check error logs and stack trace |
| Slow performance | Database/network/CPU bottleneck | Profile with performance tools |
| Memory usage grows | Memory leak | Take heap snapshots over time |
| Intermittent failures | Race condition, network timeout | Add detailed logging around failure |
| Production-only bug | Environment difference, data volume | Compare prod vs dev config/data |
| High CPU usage | Infinite loop, inefficient algorithm | CPU profiler to find hot functions |
| Database slow | Missing index, N+1 queries | Run EXPLAIN ANALYZE on slow queries |
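
For the "Memory usage grows" row, comparing snapshots over time can be sketched with Python's built-in `tracemalloc`, used here as an analogue of the Node heap snapshots mentioned earlier. The leaking `handle_request` function and the module-level `_cache` are hypothetical, written only to give the comparison something to find.

```python
import tracemalloc

_cache = []  # hypothetical module-level list that is never cleared


def handle_request(payload_size: int) -> None:
    """Simulate a handler that leaks by keeping every payload alive."""
    _cache.append(bytearray(payload_size))


def top_growth(workload, limit: int = 3):
    """Run workload between two snapshots and return the largest allocation diffs."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to returns StatisticDiff entries, largest change first
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth(lambda: [handle_request(100_000) for _ in range(50)]):
        print(stat)
```

Each returned entry carries the source line responsible for the growth, which usually points straight at the container that keeps references alive.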

Anti-Patterns to Avoid

  • Random changes - Making changes without hypothesis
  • Inadequate logging - Can't debug what you can't see
  • Debugging in production - Always reproduce locally when possible
  • Ignoring stack traces - The stack trace shows exactly where the error was raised
  • Not writing tests - Fix today, break tomorrow
  • Symptom fixing - Treating symptoms instead of root cause
  • No monitoring - Flying blind in production
  • Skipping postmortems - Not learning from incidents
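
To make the stack trace point concrete: the most useful frame is often the deepest one that belongs to your own code rather than to a library. A minimal sketch using Python's `traceback` module follows; `innermost_app_frame`, its `app_marker` parameter, and `parse_config` are hypothetical names for illustration.

```python
import json
import traceback


def innermost_app_frame(exc: BaseException, app_marker: str):
    """Return the deepest traceback frame whose filename contains app_marker.

    app_marker is a hypothetical convention, e.g. the project directory name.
    Returns None when no frame matches.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    for frame in reversed(frames):  # the last frame is closest to the raise site
        if app_marker in frame.filename:
            return frame
    return None


def parse_config(text: str) -> dict:
    """Hypothetical application function that fails inside a library call."""
    return json.loads(text)


if __name__ == "__main__":
    try:
        parse_config("{not json")
    except ValueError as err:
        # Use this module's own filename as the marker for "our code".
        frame = innermost_app_frame(err, parse_config.__code__.co_filename)
        print(frame.name, frame.lineno)
```

Here the exception is raised inside the `json` package, but the frame returned is the call in `parse_config`, which is where a fix or a guard would normally go.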

Related Skills

This skill works with other skills in the framework:

Development & Operations:

Infrastructure & Platform:

  • ops-devops-platform - CI/CD pipelines, monitoring, incident response, SRE practices, Kubernetes ops
  • ops-database-sql - Database query optimization, EXPLAIN ANALYZE, index tuning, slow query debugging

AI/ML Operations:

  • ai-ml-ops-production - ML model debugging, drift detection, API monitoring, batch pipeline troubleshooting
  • ai-ml-ops-security - Security debugging, jailbreak detection, privacy issues, threat modeling

Success Criteria: Issues are diagnosed systematically, root causes are identified accurately, fixes include regression tests, and debugging knowledge is documented for future reference.