Claude Code Plugins

Community-maintained marketplace

Feedback

incident-analysis

@My3VM/ClaudeSkillDemo
0
0

Analyze and resolve production incidents using systematic investigation, root cause analysis, and autonomous remediation

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name incident-analysis
description Analyze and resolve production incidents using systematic investigation, root cause analysis, and autonomous remediation

Incident Analysis and Resolution Skill

This skill guides you through systematic incident response, from initial alert to resolution and documentation.

When to Use This Skill

Use this skill when:

  • Production system is degraded or failing
  • Users are reporting issues or errors
  • Metrics show abnormal behavior
  • Need to investigate and resolve incidents autonomously

Incident Response Workflow

MANDATORY FIRST STEP:

Before using any tools, use the Read tool to read: .claude/skills/incident-analysis/phases/triage.md

This file contains Phase 1 instructions and will tell you which file to read next.

DO NOT proceed with tool calls until you've read Phase 1.

The complete workflow consists of 6 phases:

  1. Triage & Assessment (2-5 min) → phases/triage.md
  2. Investigation (5-10 min) → phases/investigation.md
  3. Root Cause Analysis (2-5 min) → phases/rca.md
  4. Remediation Planning (2-3 min) → phases/remediation.md
  5. Execution (2-5 min) → phases/execution.md
  6. Communication & Documentation (3-5 min) → phases/documentation.md

Each phase file contains the "Next Step" section that directs you to the next phase file.

Available Tools

MCP servers provide the necessary tools:

  • monitoring-analysis server - System metrics, log analysis, health checks
  • workflow-orchestration server - Incident tickets, remediation, notifications

Tools are available throughout all phases as needed.

Key Principles

Progressive Disclosure: The phase files reveal detailed instructions progressively. Read each phase file in sequence - do not skip ahead or assume you know what to do.

Autonomous Execution: Make decisions based on evidence. Take action. Show reasoning.

TodoWrite: Create todos at start for all 6 phases. Update status as you progress.

Success Criteria

Incident is resolved when:

  • ✅ Root cause identified with 90%+ confidence
  • ✅ Remediation executed and verified
  • ✅ Metrics returned to baseline
  • ✅ No new errors occurring
  • ✅ Team notified and documented