Claude Code Plugins

Community-maintained marketplace

Feedback

kubernetes-best-practices

@armanzeroeight/fastagent-plugins
19
0

Provides production-ready Kubernetes manifest guidance including resource management, security, high availability, and configuration best practices. This skill should be used when working with Kubernetes YAML files, deployments, pods, services, or when users mention k8s, container orchestration, or cloud-native applications.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name kubernetes-best-practices
description Provides production-ready Kubernetes manifest guidance including resource management, security, high availability, and configuration best practices. This skill should be used when working with Kubernetes YAML files, deployments, pods, services, or when users mention k8s, container orchestration, or cloud-native applications.

Kubernetes Best Practices

This skill provides guidance for writing production-ready Kubernetes manifests and managing cloud-native applications.

Resource Management

Memory: Set requests and limits to the same value to ensure QoS class and prevent OOM kills.

CPU: Set requests only, omit limits to allow performance bursting and avoid throttling.

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    # No CPU limit

Image Versioning

Always pin specific versions, never use :latest tag unless explicitly requested:

# Good
image: nginx:1.25.3

# Bad
image: nginx:latest

For immutability, consider pinning to specific digests.

Configuration Management

Secrets: Sensitive data (passwords, tokens, certificates) ConfigMaps: Non-sensitive configuration (feature flags, URLs, settings)

env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: database-url
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: log-level

Best practices:

  • Never hardcode secrets in manifests
  • Use external secret management (Sealed Secrets, External Secrets Operator)
  • Rotate secrets regularly
  • Limit access with RBAC

Workload Selection

Choose the appropriate workload type:

  • Deployment: Stateless applications (web servers, APIs, microservices)
  • StatefulSet: Stateful applications (databases, message queues)
  • DaemonSet: Node-level services (log collectors, monitoring agents)
  • Job/CronJob: Batch processing and scheduled tasks

Security Context

Always implement security best practices:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false

Security checklist:

  • Run as non-root user
  • Drop all capabilities by default
  • Use read-only root filesystem
  • Disable privilege escalation
  • Implement network policies
  • Scan images for vulnerabilities

Health Checks

Implement all three probe types:

Liveness: Restart container if unhealthy Readiness: Remove from service endpoints if not ready Startup: Allow slow-starting containers time to initialize

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

startupProbe:
  httpGet:
    path: /startup
    port: 8080
  periodSeconds: 10
  failureThreshold: 30

High Availability

Replica counts: Set minimum 2 for production workloads

Pod Disruption Budgets: Maintain availability during voluntary disruptions

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

Additional HA considerations:

  • Use anti-affinity rules for pod distribution across nodes
  • Configure graceful shutdown periods
  • Implement horizontal pod autoscaling
  • Set appropriate resource requests for scheduling

Namespace Organization

Use namespaces for environment isolation and apply resource quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    persistentvolumeclaims: "10"

Benefits: Logical separation, resource limits, RBAC boundaries, cost tracking

Labels and Annotations

Use consistent, recommended labels:

metadata:
  labels:
    app.kubernetes.io/name: myapp
    app.kubernetes.io/instance: myapp-prod
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/component: backend
    app.kubernetes.io/part-of: ecommerce
    app.kubernetes.io/managed-by: helm

Service Types

  • ClusterIP: Internal cluster communication (default)
  • NodePort: External access via node ports (dev/test)
  • LoadBalancer: Cloud provider load balancer (production)
  • ExternalName: DNS CNAME record (external services)

Storage

Choose appropriate storage class and access mode:

Access Modes:

  • ReadWriteOnce (RWO): Single node read-write
  • ReadOnlyMany (ROX): Multiple nodes read-only
  • ReadWriteMany (RWX): Multiple nodes read-write
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi

Validation and Testing

Always validate before applying to production:

  1. Client-side validation: kubectl apply --dry-run=client -f manifest.yaml
  2. Server-side validation: kubectl apply --dry-run=server -f manifest.yaml
  3. Test in staging: Deploy to non-production environment first
  4. Monitor metrics: Watch resource usage and application health
  5. Gradual rollout: Use rolling updates with health checks

Application Checklist

When creating or reviewing Kubernetes manifests:

  • Resource requests and limits configured
  • Specific image version pinned (not :latest)
  • Secrets and ConfigMaps used for configuration
  • Security context implemented (non-root, dropped capabilities)
  • Health checks configured (liveness, readiness, startup)
  • Pod Disruption Budget defined for HA workloads
  • Consistent labels applied
  • Appropriate workload type selected
  • Namespace and resource quotas configured
  • Validated with dry-run before applying