| name | kubernetes-operations |
| description | Assist with Kubernetes interactions including debugging (kubectl logs, describe, exec, port-forward), resource management (deployments, services, configmaps, secrets), and cluster operations (scaling, rollouts, node management). Use when working with kubectl, pods, deployments, services, or troubleshooting Kubernetes issues. |
| allowed-tools | Read, Grep, Glob, Bash |
Kubernetes Operations
Comprehensive kubectl assistance for debugging, resource management, and cluster operations with token-efficient scripts.
BEFORE YOU START
This skill prevents 5 common errors and saves ~70% tokens.
| Metric | Without Skill | With Skill |
|---|---|---|
| Pod Debugging | ~1200 tokens | ~400 tokens |
| Resource Listing | ~800 tokens | ~200 tokens |
| Cluster Health | ~1500 tokens | ~300 tokens |
Known Issues This Skill Prevents
- Running kubectl commands in wrong namespace/context
- Verbose output flooding context with unnecessary data
- Missing critical debugging steps (events, previous logs)
- Exposing secrets in plain text output
- Destructive operations without dry-run verification
Quick Start
Step 1: Verify Context
kubectl config current-context
kubectl config get-contexts
Why this matters: Running commands in the wrong cluster can cause production incidents.
Step 2: Debug a Pod
uv run scripts/debug_pod.py <pod-name> [-n namespace]
Why this matters: The script combines describe, logs, and events into a condensed summary, saving ~800 tokens.
Step 3: Check Cluster Health
uv run scripts/cluster_health.py
Why this matters: Quick overview of node status and unhealthy pods without verbose output.
Critical Rules
Always Do
- Always verify
kubectl config current-contextbefore operations - Always use
-n namespaceto be explicit about target - Always use
--dry-run=client -o yamlbefore applying changes - Always check events when debugging:
kubectl get events --sort-by='.lastTimestamp' - Always use
--previousflag when pod is in CrashLoopBackOff
Never Do
- Never run
kubectl deletewithout--dry-runfirst in production - Never output secrets without filtering: avoid
kubectl get secret -o yaml - Never assume default namespace - always specify
-n - Never ignore resource limits when debugging OOMKilled pods
- Never skip
describewhen logs show no errors
Common Mistakes
Wrong:
kubectl logs my-pod
Correct:
kubectl logs my-pod -n my-namespace --tail=100 --timestamps
Why: Default namespace may not be correct, unlimited logs flood context, timestamps help correlate with events.
Known Issues Prevention
| Issue | Root Cause | Solution |
|---|---|---|
| CrashLoopBackOff | App crash on startup | Check kubectl logs --previous and describe for exit codes |
| ImagePullBackOff | Registry auth or image tag | Verify image exists and check pull secrets |
| Pending pods | No schedulable nodes | Check node resources and pod affinity/tolerations |
| OOMKilled | Memory limit exceeded | Check container limits vs actual usage with kubectl top |
| Connection refused | Service selector mismatch | Verify pod labels match service selector |
Debugging Workflows
Pod Not Starting
# 1. Get pod status and events
kubectl describe pod <name> -n <namespace>
# 2. Check logs (current or previous)
kubectl logs <name> -n <namespace> --tail=100
kubectl logs <name> -n <namespace> --previous # If restarting
# 3. Check events for scheduling issues
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep <name>
# 4. Interactive debugging
kubectl exec -it <name> -n <namespace> -- /bin/sh
Service Connectivity
# 1. Verify service exists and has endpoints
kubectl get svc <name> -n <namespace>
kubectl get endpoints <name> -n <namespace>
# 2. Check pod labels match service selector
kubectl get pods -n <namespace> --show-labels
# 3. Test from within cluster
kubectl run debug --rm -it --image=busybox -- wget -qO- http://<service>:<port>
# 4. Port-forward for local testing
kubectl port-forward svc/<name> 8080:80 -n <namespace>
Resource Management
Deployments
# List deployments
kubectl get deployments -n <namespace>
# Scale
kubectl scale deployment <name> --replicas=3 -n <namespace>
# Rollout status
kubectl rollout status deployment/<name> -n <namespace>
# Rollback
kubectl rollout undo deployment/<name> -n <namespace>
# History
kubectl rollout history deployment/<name> -n <namespace>
ConfigMaps and Secrets
# List
kubectl get configmaps -n <namespace>
kubectl get secrets -n <namespace>
# View ConfigMap data
kubectl get configmap <name> -n <namespace> -o jsonpath='{.data}'
# View Secret keys (NOT values)
kubectl get secret <name> -n <namespace> -o jsonpath='{.data}' | jq 'keys'
# Create from file
kubectl create configmap <name> --from-file=<path> -n <namespace> --dry-run=client -o yaml
Cluster Operations
Node Management
# List nodes with status
kubectl get nodes -o wide
# Node details
kubectl describe node <name>
# Cordon (prevent scheduling)
kubectl cordon <node>
# Drain (evict pods)
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# Uncordon
kubectl uncordon <node>
Resource Usage
# Node resources
kubectl top nodes
# Pod resources
kubectl top pods -n <namespace>
# Sort by memory
kubectl top pods -n <namespace> --sort-by=memory
Bundled Resources
Scripts
Located in scripts/:
debug_pod.py- Comprehensive pod debugging with condensed outputget_resources.py- Resource summary using jsonpath for minimal tokenscluster_health.py- Quick cluster status overview
References
Located in references/:
- `kubectl-cheatsheet.md` - Condensed command reference
- `jsonpath-patterns.md` - Common JSONPath expressions
- `debugging-flowchart.md` - Decision tree for pod issues
Note: For deep dives on specific topics, see the reference files above.
Dependencies
Required
| Package | Version | Purpose |
|---|---|---|
| kubectl | 1.25+ | Kubernetes CLI |
| jq | 1.6+ | JSON parsing for scripts |
Optional
| Package | Version | Purpose |
|---|---|---|
| k9s | 0.27+ | Terminal UI for Kubernetes |
| stern | 1.25+ | Multi-pod log tailing |
Official Documentation
Troubleshooting
kubectl command not found
Symptoms: command not found: kubectl
Solution:
# macOS
brew install kubectl
# Verify
kubectl version --client
Context not set
Symptoms: error: no context is currently set
Solution:
# List available contexts
kubectl config get-contexts
# Set context
kubectl config use-context <context-name>
Permission denied
Symptoms: Error from server (Forbidden)
Solution:
# Check current user
kubectl auth whoami
# Check permissions
kubectl auth can-i get pods -n <namespace>
kubectl auth can-i --list -n <namespace>
Timeout connecting to cluster
Symptoms: Unable to connect to the server: dial tcp: i/o timeout
Solution:
# Check cluster endpoint
kubectl cluster-info
# Verify network connectivity
curl -k https://<cluster-api-endpoint>/healthz
# Check kubeconfig
cat ~/.kube/config
Setup Checklist
Before using this skill, verify:
-
kubectlinstalled (kubectl version --client) - Kubeconfig configured (
~/.kube/configexists) - Context set to correct cluster (
kubectl config current-context) - Permissions verified (
kubectl auth can-i get pods) -
jqinstalled for JSON parsing (jq --version)