Claude Code Plugins

Community-maintained marketplace

troubleshoot

@X-McKay/kubani

Diagnose and fix cluster issues. Use when pods fail, deployments don't work, or services are unreachable.

Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reviewing its instructions before using it.

SKILL.md

name: troubleshoot
description: Diagnose and fix cluster issues. Use when pods fail, deployments don't work, or services are unreachable.

Troubleshoot Cluster Issues

Diagnose and resolve common cluster problems.

Instructions

1. Identify Problem Pods

# Exported so the kubectl commands below use this kubeconfig
export KUBECONFIG=/home/al/.kube/config

echo "=== Problem Pods ==="
kubectl get pods -A | grep -v Running | grep -v Completed
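
If the grep filtering is too coarse, the same check can be done server-side; this is a sketch assuming a reasonably recent kubectl (Completed pods still need to be dropped by hand):

# Alternative: let the API server filter out running pods
kubectl get pods -A --field-selector=status.phase!=Running | grep -v Completed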

2. Check Pod Details

export KUBECONFIG=/home/al/.kube/config
POD_NAME="k8s-monitor-xxx"   # replace with the actual failing pod name
NAMESPACE="ai-agents"

# Describe pod
kubectl describe pod $POD_NAME -n $NAMESPACE

# Check logs
kubectl logs $POD_NAME -n $NAMESPACE --tail=50

# Check previous container logs (if crash loop)
kubectl logs $POD_NAME -n $NAMESPACE --previous --tail=50
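
Pod-scoped events often pinpoint the failure faster than the full describe output; a small sketch reusing the variables above:

# Events for just this pod, newest last
kubectl get events -n $NAMESPACE --field-selector involvedObject.name=$POD_NAME --sort-by='.lastTimestamp'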

3. Common Issues

ImagePullBackOff

  • Image not pushed to registry
  • Wrong image tag
  • Registry authentication issue

# Check if the image exists locally
docker images | grep k8s-monitor

# Push if missing
docker push registry.almckay.io/k8s-monitor:TAG
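
For the registry authentication case, it helps to confirm the pod actually references a pull secret and that a matching docker-registry secret exists; a minimal sketch reusing the variables from step 2:

# Which pull secrets does the pod reference?
kubectl get pod $POD_NAME -n $NAMESPACE -o jsonpath='{.spec.imagePullSecrets}'

# Are there docker-registry secrets in the namespace?
kubectl get secrets -n $NAMESPACE --field-selector type=kubernetes.io/dockerconfigjson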

CrashLoopBackOff

  • Application error on startup
  • Missing environment variables
  • Failed dependency connections

# Check logs for error
KUBECONFIG=/home/al/.kube/config kubectl logs $POD_NAME -n $NAMESPACE --tail=100

# Check env vars
kubectl get pod $POD_NAME -n $NAMESPACE -o jsonpath='{.spec.containers[0].env}'
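
If a failed dependency connection is suspected, a throwaway debug pod can test DNS and reachability from inside the cluster; the service name and port below are placeholders for whatever the application depends on:

# Temporary busybox shell for in-cluster connectivity checks (removed on exit)
kubectl run debug-shell --rm -it --image=busybox -n $NAMESPACE -- sh

# Inside the shell, for example:
#   nslookup <dependency-service>
#   nc -zv <dependency-service> <port>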

Pending

  • Insufficient resources
  • Node selector mismatch
  • PVC not bound

# Check events
KUBECONFIG=/home/al/.kube/config kubectl get events -n $NAMESPACE --sort-by='.lastTimestamp' | tail -20

# Check node resources
kubectl top nodes
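
The PVC and node-selector causes can be confirmed directly; a short sketch reusing the variables from step 2:

# Any unbound PVCs in the namespace?
kubectl get pvc -n $NAMESPACE

# Does the pod request a node selector that no node satisfies?
kubectl get pod $POD_NAME -n $NAMESPACE -o jsonpath='{.spec.nodeSelector}'
kubectl get nodes --show-labels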

4. Temporal Workflow Issues

# Check worker connectivity
KUBECONFIG=/home/al/.kube/config kubectl logs -n ai-agents -l app.kubernetes.io/name=k8s-monitor --tail=50 | grep -i temporal

# List running workflows
# (use tctl or Temporal UI)
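
A sketch of the CLI route, assuming the legacy tctl client is installed and pointed at the right Temporal frontend (the newer temporal CLI has equivalent commands):

# List open workflow executions
tctl workflow list --open

# Or against an explicit Temporal namespace
tctl --namespace <temporal-namespace> workflow list --open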

5. Restart Deployment

# DEPLOYMENT is the deployment name (e.g. from: kubectl get deployments -n $NAMESPACE)
KUBECONFIG=/home/al/.kube/config kubectl rollout restart deployment/$DEPLOYMENT -n $NAMESPACE
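
To confirm the restart actually converged, a follow-up check:

# Wait for the new pods to become ready
KUBECONFIG=/home/al/.kube/config kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE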

6. Force Pod Deletion

# Skips graceful termination; use only for pods stuck in Terminating
KUBECONFIG=/home/al/.kube/config kubectl delete pod $POD_NAME -n $NAMESPACE --force --grace-period=0
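
After a force delete, it is worth confirming the owning controller scheduled a replacement; the label below is taken from the monitor deployment referenced earlier and may need adjusting:

# Confirm a replacement pod was created
kubectl get pods -n $NAMESPACE -l app.kubernetes.io/name=k8s-monitor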

Escalation

If the issue cannot be resolved, escalate through the following checks (a command sketch follows the list):

  1. Check node health: kubectl get nodes
  2. Check kubelet logs on node
  3. Review recent changes in git history
  4. Check Flux reconciliation status
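
A minimal command sketch for these escalation checks, assuming SSH access to the node and a Flux-managed cluster with the flux CLI installed:

# 1. Node health
kubectl get nodes -o wide

# 2. Kubelet logs on the affected node (requires SSH access)
ssh <node> 'journalctl -u kubelet -n 100 --no-pager'

# 3. Recent changes in the cluster config repository
git log --oneline -10

# 4. Flux reconciliation status
flux get kustomizations
flux get sources git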