---
name: kubernetes-specialist
description: Kubernetes orchestration expert for Helm chart development, custom operators and CRDs, service mesh (Istio/Linkerd), auto-scaling strategies (HPA/VPA/Cluster Autoscaler), multi-cluster management, and production-grade deployments. Use when deploying containerized applications to K8s, implementing GitOps workflows, optimizing pod scheduling, or requiring Kubernetes best practices. Handles ingress controllers, persistent volumes, network policies, and security contexts.
category: Cloud Platforms
complexity: Very High
triggers: kubernetes, k8s, helm, operators, crd, istio, linkerd, service mesh, kubectl, kustomize, argocd, flux
---
# Kubernetes Specialist
Expert Kubernetes orchestration for cloud-native applications, with a focus on production-grade deployments.
## Purpose
Comprehensive Kubernetes expertise including Helm charts, custom operators, service mesh, auto-scaling, and GitOps. Ensures K8s deployments are resilient, secure, observable, and cost-effective.
## When to Use
- Deploying microservices to Kubernetes
- Creating Helm charts for reusable deployments
- Implementing auto-scaling (HPA, VPA, Cluster Autoscaler)
- Setting up service mesh for advanced networking
- Building custom operators with Operator SDK
- Implementing GitOps with ArgoCD or Flux
- Optimizing pod scheduling and resource allocation
## Prerequisites
Required: Docker, kubectl, basic K8s concepts (Pods, Services, Deployments)
Agents: system-architect, cicd-engineer, perf-analyzer, security-manager
## Core Workflows

### Workflow 1: Production-Grade Deployment

#### Step 1: Create Deployment Manifest
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        version: v1
    spec:
      containers:
        - name: app
          image: myregistry/my-app:v1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - my-app
                topologyKey: kubernetes.io/hostname
```
#### Step 2: Create Service and Ingress
```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests per second per client IP
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - my-app.example.com
      secretName: my-app-tls
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```
### Workflow 2: Helm Chart Development

#### Step 1: Create Helm Chart

```bash
helm create my-app
cd my-app
```
#### Step 2: Define values.yaml
```yaml
# values.yaml
replicaCount: 3

image:
  repository: myregistry/my-app
  tag: "v1.0.0"
  pullPolicy: IfNotPresent

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: my-app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: my-app-tls
      hosts:
        - my-app.example.com
```
#### Step 3: Template the Deployment
```yaml
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8080
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```
#### Step 4: Install Chart

```bash
helm install my-app . -n production --create-namespace
helm upgrade my-app . -n production
```
### Workflow 3: Horizontal Pod Autoscaler (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
```
### Workflow 4: Service Mesh with Istio

#### Step 1: Install Istio

```bash
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
export PATH=$PWD/bin:$PATH
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled
```
#### Step 2: Define VirtualService for Traffic Management
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app
  http:
    - match:
        - headers:
            canary:
              exact: "true"
      route:
        - destination:
            host: my-app
            subset: v2
          weight: 100
    - route:
        - destination:
            host: my-app
            subset: v1
          weight: 90
        - destination:
            host: my-app
            subset: v2
          weight: 10
```
#### Step 3: Define DestinationRule
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```
## Best Practices

### 1. Always Set Resource Limits

```yaml
# ✅ GOOD: requests and limits defined
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"

# ❌ BAD: no limits (a pod can consume all node resources)
resources: {}
```
### 2. Health Probes

```yaml
# ✅ GOOD: both liveness and readiness probes
livenessProbe:
  httpGet:
    path: /health
    port: 8080
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
```
### 3. Security Contexts

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```
### 4. Pod Disruption Budgets

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```
### 5. Network Policies

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-app-network-policy
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
```
## Quality Criteria
- ✅ All workloads have resource limits
- ✅ Health probes configured
- ✅ Security contexts applied
- ✅ PodDisruptionBudgets for critical apps
- ✅ Network policies restrict traffic
- ✅ Secrets managed with external-secrets or sealed-secrets (see the sketch after this list)
- ✅ Cluster autoscaling enabled
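
As a hedged illustration of the secrets criterion above, a minimal ExternalSecret sketch, assuming the External Secrets Operator is installed and a `ClusterSecretStore` named `aws-secrets-manager` (a hypothetical name) has already been configured:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager          # hypothetical store, configured separately
  target:
    name: my-app-secrets               # Kubernetes Secret the operator will create
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: prod/my-app/database-url  # hypothetical key in the external secret store
```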
## Troubleshooting

Issue: Pods stuck in Pending
Solution: Check node capacity (`kubectl describe node`) and review resource requests; the scheduler cannot place a pod that requests more CPU or memory than any node has free.

Issue: Service not reachable
Solution: Verify that the Service selector matches the pod labels and check that NetworkPolicies allow the traffic.

Issue: High memory usage (OOMKilled)
Solution: Increase memory limits or profile the application's memory usage to find leaks.
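
A minimal diagnostic sequence for these three cases, assuming the `production` namespace and `app: my-app` labels used in the workflows above (pod names are placeholders):

```bash
# Pending pods: inspect scheduling events and node capacity
kubectl get pods -n production -l app=my-app
kubectl describe pod <pending-pod> -n production        # Events show scheduling failures
kubectl describe nodes | grep -A 5 "Allocated resources"

# Service not reachable: confirm the Service has endpoints (selector matches pod labels)
kubectl get endpoints my-app -n production
kubectl get pods -n production -l app=my-app --show-labels

# OOMKilled: check the last termination state and live memory usage (needs metrics-server)
kubectl get pod <pod-name> -n production -o jsonpath='{.status.containerStatuses[0].lastState}'
kubectl top pod -n production -l app=my-app
```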
## Related Skills

- docker-containerization: Container best practices
- aws-specialist: EKS deployment
- terraform-iac: K8s cluster provisioning
- opentelemetry-observability: Distributed tracing in K8s
## Tools
- kubectl: Kubernetes CLI
- Helm: Package manager
- Kustomize: Declarative YAML overlays (template-free customization)
- ArgoCD/Flux: GitOps
- Istio/Linkerd: Service mesh
- Lens: Desktop GUI
## MCP Tools

- `mcp__flow-nexus__sandbox_execute` for kubectl commands
- `mcp__memory-mcp__memory_store` for K8s patterns
## Success Metrics
- Deployment time: <5 minutes
- Pod startup time: <30 seconds
- Cluster utilization: 60-80%
- Zero-downtime deployments: 100%
Skill Version: 1.0.0
Last Updated: 2025-11-02
## Core Principles

The Kubernetes Specialist skill operates on three fundamental principles:
### Principle 1: Declarative Infrastructure Over Imperative Commands
Kubernetes thrives on declarative configuration where you describe the desired state, not the steps to achieve it. This principle ensures reproducibility, version control, and GitOps compatibility.
In practice:
- Define all resources in YAML manifests checked into version control
- Use Helm charts or Kustomize for templating rather than manual kubectl edits
- Implement GitOps workflows with ArgoCD/Flux for automated deployments (see the sketch after this list)
- Avoid imperative commands like kubectl run or kubectl expose in production
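
As one way to realize the GitOps item above, a minimal Argo CD `Application` sketch; the repository URL, path, and sync options are illustrative assumptions rather than part of this skill's workflows:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app-manifests.git  # hypothetical Git repository
    targetRevision: main
    path: helm/my-app
    helm:
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```

Flux users would express the same intent with a `GitRepository` plus `Kustomization` or `HelmRelease` pair.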
### Principle 2: Resource Constraints Are Non-Negotiable
Every container must have explicit resource requests and limits. This principle prevents the noisy neighbor problem, enables efficient bin-packing, and ensures predictable performance across the cluster.
In practice:
- Set CPU/memory requests based on actual application profiling (use VPA recommendations; see the sketch after this list)
- Set limits to prevent runaway processes from consuming entire node resources
- Use HorizontalPodAutoscaler for elastic scaling based on utilization
- Monitor resource utilization with metrics-server and adjust requests/limits accordingly
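
A minimal VerticalPodAutoscaler sketch in recommendation-only mode; this assumes the VPA components are installed in the cluster, since the resource is not part of core Kubernetes:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only; do not evict or resize pods
```

Recommendations then appear in `kubectl describe vpa my-app-vpa` and can be copied into the Deployment's requests.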
### Principle 3: Defense in Depth Through Layered Security
Security is not a single configuration but a composition of multiple protective layers. This principle ensures that even if one layer fails, others provide fallback protection.
In practice:
- Apply security contexts at pod level (runAsNonRoot, readOnlyRootFilesystem, capabilities drop)
- Enforce network policies to restrict pod-to-pod communication
- Use PodSecurityStandards or OPA/Gatekeeper for admission control
- Implement RBAC with least-privilege access for service accounts (see the sketch after this list)
- Scan images with Trivy/Snyk before deployment
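
To make the RBAC item concrete, a hedged sketch of a least-privilege Role and RoleBinding for the application's service account; the names and the read-only ConfigMap access are illustrative assumptions:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]   # read-only, and no access to Secrets
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-reader-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: production
roleRef:
  kind: Role
  name: my-app-reader
  apiGroup: rbac.authorization.k8s.io
```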
## Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| No Resource Limits | Pods can consume all node resources, causing cascading failures and unpredictable scheduling. Without limits, a memory leak or CPU spike can crash the entire node. | Define both requests and limits for every container. Use VPA for recommendations. Set namespace ResourceQuotas to enforce constraints (see the ResourceQuota sketch after this table). |
| Missing Health Probes | Kubernetes cannot detect when containers are unhealthy or not ready to serve traffic. Results in traffic being routed to broken pods, causing user-facing errors. | Implement both liveness (restart if unhealthy) and readiness (remove from service if not ready) probes. Use HTTP checks for web services, TCP for databases. Set appropriate initialDelaySeconds and periodSeconds. |
| Deploying :latest Tag | Image tags are mutable and can change unexpectedly, leading to version drift, inability to rollback, and non-reproducible deployments across environments. | Always use immutable tags with semantic versioning (e.g., v1.2.3 or git commit SHA). Implement image pull policies (IfNotPresent) to reduce registry load. Use image digests for absolute immutability. |
| StatefulSets for Stateless Apps | Using StatefulSets when Deployments suffice adds unnecessary complexity, slower rollouts, and limited horizontal scaling capabilities. | Use Deployments for stateless applications. Reserve StatefulSets for databases or applications requiring stable network identity and persistent storage. Use init containers for initialization tasks. |
| Single Replica in Production | Zero redundancy means any pod failure causes downtime. No high availability, no rolling updates without downtime. | Run minimum 2-3 replicas with PodDisruptionBudgets (minAvailable: 1). Use podAntiAffinity to spread replicas across nodes/zones. Implement HPA for dynamic scaling. |
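
Following up on the first row's suggestion, a minimal namespace ResourceQuota sketch; the quota values are illustrative assumptions and should be sized to the team's actual footprint:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"      # total CPU requests allowed in the namespace
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```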
## Conclusion
The Kubernetes Specialist skill empowers you to build production-grade container orchestration systems that are resilient, secure, and cost-effective. By adhering to the three core principles of declarative infrastructure, resource constraints, and defense-in-depth security, you ensure that your deployments are not only functional but maintainable at scale. The workflows provided cover the full spectrum from basic deployments to advanced service mesh implementations, giving you the tools to handle everything from greenfield microservices to complex multi-cluster architectures.
Key takeaways: Always define resource limits, implement comprehensive health checks, and apply security contexts. Use Helm charts for reusability and GitOps for automated deployments. The anti-patterns table serves as a checklist to avoid common pitfalls that plague Kubernetes deployments. When combined with the troubleshooting section, you have a complete toolkit for diagnosing and resolving production issues.
This skill is particularly valuable when deploying microservices architectures, implementing auto-scaling strategies, or migrating legacy applications to cloud-native platforms. Whether you're setting up a new EKS cluster on AWS or optimizing an existing GKE deployment on Google Cloud, the patterns and best practices documented here will accelerate your journey to production-ready Kubernetes deployments.