| Field | Value |
|---|---|
| name | k8s-troubleshoot |
| description | Kubernetes troubleshooting toolkit: search pods by label and execute diagnostic commands inside containers. Use when the user reports service errors, exceptions, crashes, or timeouts, or needs to check logs, processes, network, or resource usage in K8s pods. |
Kubernetes Troubleshooting Skill
A toolkit for diagnosing Kubernetes applications: find pods by label, then execute commands inside their containers for deeper diagnostics.
When to Use
- User reports service errors, exceptions, failures, timeouts
- Need to check application logs or process status
- Diagnose network, memory, or disk issues
- Keywords: error, exception, failed, timeout, crash, not working, logs, troubleshoot, diagnose, pod, container
Workflow
1. Search pods: find the target pods with a label selector.
2. Execute diagnostics: run commands inside the pods' containers.
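The two steps can be chained from a script. This is a minimal sketch under stated assumptions: both scripts print a single JSON object on stdout with the fields listed in their "Output" lines, and `diagnose` and `run_cli` are hypothetical names, not part of the skill. The command runner is a parameter so the logic can be exercised without a cluster.

```python
import json
import subprocess
from typing import Callable, List

SCRIPTS = ".claude/skills/k8s-troubleshoot/scripts"


def run_cli(argv: List[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def diagnose(label_selector: str, command: str, namespace: str = "default",
             run: Callable[[List[str]], str] = run_cli) -> str:
    """Hypothetical helper: find pods by label, then run `command` in the first Running pod."""
    # Step 1: search pods (assumes search_pods.py prints JSON to stdout).
    found = json.loads(run(["uv", "run", "python", f"{SCRIPTS}/search_pods.py",
                            "-l", label_selector, "-n", namespace]))
    running = [p for p in found.get("pods", []) if p.get("phase") == "Running"]
    if not found.get("success") or not running:
        raise RuntimeError(f"no Running pod matches {label_selector!r} in {namespace!r}")
    pod = running[0]
    # Step 2: execute the command; the pod's own namespace matters when searching with -n all.
    result = json.loads(run(["uv", "run", "python", f"{SCRIPTS}/exec_pod.py",
                             "-p", pod["name"], "-n", pod["namespace"], "-cmd", command]))
    if not result.get("success"):
        raise RuntimeError(f"command failed in {pod['name']}: {result}")
    return result["output"]
```

Passing each argument as a list element means no local shell touches the command string, so pipes such as `ps aux | head -20` reach `exec_pod.py` intact.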
Scripts
1. Search Pods
Find pods by label selector:
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/search_pods.py -l "app=nginx" -n default
```
| Parameter | Required | Description |
|---|---|---|
| `-l`, `--label-selector` | Yes | Label selector, e.g., `app=nginx` or `project-id=123,pipeline-id=456` |
| `-n`, `--namespace` | No | Namespace (default: `default`). Use `all` for all namespaces |
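In Kubernetes equality-based selectors, comma-separated requirements are ANDed, so `project-id=123,pipeline-id=456` matches only pods carrying both labels. A small sketch for building the selector and the `search_pods.py` argument list from a dict; `build_label_selector` and `search_pods_argv` are hypothetical helpers, not part of the skill.

```python
from typing import Dict, List


def build_label_selector(labels: Dict[str, str]) -> str:
    """Join key=value pairs with commas; Kubernetes ANDs comma-separated requirements."""
    if not labels:
        raise ValueError("at least one label is required")
    return ",".join(f"{key}={value}" for key, value in labels.items())


def search_pods_argv(labels: Dict[str, str], namespace: str = "default") -> List[str]:
    """Hypothetical: argv for search_pods.py; pass namespace='all' to search every namespace."""
    return ["uv", "run", "python", ".claude/skills/k8s-troubleshoot/scripts/search_pods.py",
            "-l", build_label_selector(labels), "-n", namespace]


print(build_label_selector({"project-id": "123", "pipeline-id": "456"}))
# project-id=123,pipeline-id=456
```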
Output: JSON with `success`, `podCount`, and `pods` (each entry has `name`, `namespace`, `phase`, `containers`).
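To pick exec targets from this output, a sketch that keeps pods in phase `Running`. The sample JSON is made up, and the shape of `containers` is not documented here, so it is ignored. Note that `Running` is a pod phase, not a health check: a crash-looping pod usually still reports phase `Running`.

```python
import json
from typing import List, Tuple


def running_pods(search_output: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) for each pod whose phase is Running."""
    data = json.loads(search_output)
    if not data.get("success"):
        return []
    return [(pod["namespace"], pod["name"])
            for pod in data.get("pods", [])
            if pod.get("phase") == "Running"]


# Hypothetical search_pods.py output.
sample = """{"success": true, "podCount": 2, "pods": [
  {"name": "nginx-7d9c-abcde", "namespace": "default", "phase": "Running", "containers": []},
  {"name": "nginx-7d9c-fghij", "namespace": "default", "phase": "Pending", "containers": []}]}"""
print(running_pods(sample))  # [('default', 'nginx-7d9c-abcde')]
```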
2. Execute Command in Pod
Run diagnostic commands inside a container:
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p "pod-name" -n default -cmd "tail -n 100 /root/logs/app.log"
```
| Parameter | Required | Description |
|---|---|---|
| `-p`, `--pod` | Yes | Pod name |
| `-n`, `--namespace` | No | Namespace (default: `default`) |
| `-c`, `--container` | No | Container name (for multi-container pods) |
| `-cmd`, `--command` | Yes | Command to execute |
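Commands such as `ps aux | head -20` or `df -h && free -m` need a shell to interpret `|` and `&&`. How `exec_pod.py` handles this is not documented here. As a fallback when the script is unavailable, this sketch builds a roughly equivalent raw `kubectl exec` invocation that wraps the command in `sh -c`; it assumes the image ships `/bin/sh`, and `kubectl_exec_argv` is a hypothetical name.

```python
import shlex
from typing import List, Optional


def kubectl_exec_argv(pod: str, command: str, namespace: str = "default",
                      container: Optional[str] = None) -> List[str]:
    """Build a kubectl exec argv that runs `command` through sh -c inside the pod."""
    argv = ["kubectl", "exec", "-n", namespace, pod]
    if container:  # only needed to pick a container in multi-container pods
        argv += ["-c", container]
    # Everything after "--" goes to the container; sh -c interprets pipes and &&.
    return argv + ["--", "sh", "-c", command]


argv = kubectl_exec_argv("my-pod", "df -h && free -m", container="app")
print(shlex.join(argv))
# kubectl exec -n default my-pod -c app -- sh -c 'df -h && free -m'
```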
Output: JSON with `success`, `pod`, `namespace`, `command`, and `output`.
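A sketch for consuming this output: return `output` when `success` is true, otherwise raise with the whole result (where the script puts error text on failure is not documented). The `matching_lines` filter assumes Java-style logs that mark problems with `ERROR` or `Exception`; both helper names are hypothetical.

```python
import json
from typing import Iterable, List


def exec_output(result_text: str) -> str:
    """Return the `output` field of exec_pod.py's JSON, raising if `success` is false."""
    result = json.loads(result_text)
    if not result.get("success"):
        raise RuntimeError(f"exec failed: {result}")
    return result.get("output", "")


def matching_lines(output: str, needles: Iterable[str] = ("ERROR", "Exception")) -> List[str]:
    """Keep lines containing any needle; the defaults assume Java-style log output."""
    needles = tuple(needles)
    return [line for line in output.splitlines() if any(n in line for n in needles)]


sample = json.dumps({"success": True, "pod": "my-pod", "namespace": "default",
                     "command": "tail -n 3 /root/logs/app.log",
                     "output": "INFO started\nERROR db timeout\njava.lang.NullPointerException\n"})
print(matching_lines(exec_output(sample)))
```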
Common Diagnostic Patterns
View application logs
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "tail -n 100 /root/logs/app.log"
```
Check Nacos config (Dubbo 3 issues)
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "grep nacos /root/logs/nacos/config.log"
```
Check processes
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "ps aux | head -20"
```
Check network
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "netstat -tlnp"
```
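To check that the application listens on its expected port (for example 8080 for HTTP, or 20880, Dubbo's default protocol port), a sketch that extracts listening ports from `netstat -tlnp` output. It assumes the common net-tools column layout shown in the sample; `listening_ports` is a hypothetical helper.

```python
from typing import List, Tuple


def listening_ports(netstat_output: str) -> List[Tuple[int, str]]:
    """Extract (port, pid/program) from `netstat -tlnp` rows in LISTEN state."""
    ports = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto recv-q send-q local-addr foreign-addr state pid/program
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            port = int(fields[3].rsplit(":", 1)[1])  # handles 0.0.0.0:8080 and :::20880
            program = fields[6] if len(fields) > 6 else "-"
            ports.append((port, program))
    return ports


sample = """Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      1/java
tcp6       0      0 :::20880                :::*                    LISTEN      1/java"""
print(listening_ports(sample))  # [(8080, '1/java'), (20880, '1/java')]
```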
Check disk and memory
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "df -h && free -m"
```
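To run several of these patterns against one pod, a sketch that maps pattern names to `exec_pod.py` argument lists. The `PATTERNS` keys and `pattern_argvs` are hypothetical; the command strings are the pattern commands above. Each command stays a single argv element, so no local shell consumes its `|` or `&&`; each list can be passed to `subprocess.run`.

```python
from typing import Dict, List, Optional

EXEC_SCRIPT = ".claude/skills/k8s-troubleshoot/scripts/exec_pod.py"

# Hypothetical names for the diagnostic patterns; log paths follow this project's layout.
PATTERNS: Dict[str, str] = {
    "logs": "tail -n 100 /root/logs/app.log",
    "nacos": "grep nacos /root/logs/nacos/config.log",
    "processes": "ps aux | head -20",
    "network": "netstat -tlnp",
    "disk_memory": "df -h && free -m",
}


def pattern_argvs(pod: str, namespace: str = "default",
                  names: Optional[List[str]] = None) -> Dict[str, List[str]]:
    """Map each selected pattern to its exec_pod.py argv; an unknown name raises KeyError."""
    selected = names if names is not None else list(PATTERNS)
    return {name: ["uv", "run", "python", EXEC_SCRIPT,
                   "-p", pod, "-n", namespace, "-cmd", PATTERNS[name]]
            for name in selected}


for name, argv in pattern_argvs("my-pod", names=["processes", "network"]).items():
    print(name, argv[-1])
```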
Troubleshooting Tips
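| Issue | Diagnostic Step |
|---|---|
| Dubbo 3 "no provider" | Check `/root/logs/nacos/config.log` for the Nacos address |
| Service not responding | Check process status with `ps aux` and review the application logs |
| Connection issues | Check listening ports with `netstat -tlnp` (try `ss -tlnp` if `netstat` is not installed in the image) |
| OOM errors | Check memory with `free -m`, and compare against the container's cgroup memory limit, since `free` inside a container usually reports node memory |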
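`free -m` reads `/proc/meminfo`, which inside a container usually reports the node's memory rather than the container's limit, so it can look healthy just before an OOM kill. On cgroup v2 nodes the container's own limit and usage are in `/sys/fs/cgroup/memory.max` and `/sys/fs/cgroup/memory.current` (cgroup v1 uses `/sys/fs/cgroup/memory/memory.limit_in_bytes` and `memory.usage_in_bytes`). This sketch interprets the output of `cat /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory.current` run through `exec_pod.py`; `memory_usage_percent` is a hypothetical helper and assumes cgroup v2.

```python
from typing import Optional


def memory_usage_percent(cat_output: str) -> Optional[float]:
    """Parse memory.max then memory.current; return usage as a percentage of the limit.

    Returns None when memory.max is the literal 'max', i.e. no cgroup limit is set.
    """
    limit_text, current_text = cat_output.split()[:2]
    if limit_text == "max":
        return None
    return 100.0 * int(current_text) / int(limit_text)


# A 512 MiB limit with 480 MiB in use.
print(memory_usage_percent("536870912\n503316480\n"))  # 93.75
```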