SKILL.md

name: deploying-cloud-k8s
description: Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.

Deploying Cloud K8s

Quick Start

  1. Check cluster architecture: kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
  2. Match build platform to cluster (arm64 vs amd64)
  3. Set up GitHub Actions with path filters
  4. Deploy with Helm, passing secrets via --set

Critical: Build-Time vs Runtime Variables

The Problem

Next.js NEXT_PUBLIC_* variables are inlined into the client bundle at build time, not read at runtime:

# WRONG: relying on runtime ENV does nothing for NEXT_PUBLIC_*;
# the value is already inlined into the client bundle during `next build`
ENV NEXT_PUBLIC_API_URL=https://api.example.com

# RIGHT: use a build ARG declared before `next build` and pass the
# per-environment value from CI via --build-arg
ARG NEXT_PUBLIC_API_URL=https://api.example.com
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL

Build-Time (Next.js)

Variable               Purpose
NEXT_PUBLIC_SSO_URL    SSO endpoint for browser OAuth
NEXT_PUBLIC_API_URL    API endpoint for browser fetch
NEXT_PUBLIC_APP_URL    App URL for redirects

Runtime (ConfigMaps/Secrets)

Variable             Source
DATABASE_URL         Secret (Neon/managed DB)
SSO_URL              ConfigMap (internal K8s: http://sso:3001)
BETTER_AUTH_SECRET   Secret
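
A minimal sketch of how these runtime values typically reach the pod, assuming hypothetical object names (web-config, web-secrets) consumed via envFrom:

apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config                 # hypothetical name
data:
  SSO_URL: "http://sso:3001"       # internal K8s service DNS
---
apiVersion: v1
kind: Secret
metadata:
  name: web-secrets                # hypothetical name
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:pass@host/db"   # injected by CI, never committed
  BETTER_AUTH_SECRET: "change-me"
---
# In the Deployment's container spec:
#   envFrom:
#     - configMapRef:
#         name: web-config
#     - secretRef:
#         name: web-secrets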

Architecture Matching

BEFORE ANY DEPLOYMENT, check architecture:

kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Output: arm64 arm64  OR  amd64 amd64

Docker Build

- uses: docker/build-push-action@v5
  with:
    platforms: linux/arm64      # MATCH YOUR CLUSTER!
    provenance: false           # Avoid manifest issues
    no-cache: true              # When debugging

Why provenance: false? Buildx attestation creates complex manifest lists that cause "no match for platform" errors.

GitHub Actions CI/CD

Selective Builds with Path Filters

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: actions/checkout@v4   # paths-filter needs the repo checked out on push events
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'apps/api/**'
            web:
              - 'apps/web/**'

  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
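
A fuller sketch of the gated build job, assuming GHCR as the registry; the context path, image name, and arm64 platform are illustrative and should match your repo and cluster:

  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push (api)
        uses: docker/build-push-action@v5
        with:
          context: ./apps/api
          push: true
          tags: ghcr.io/${{ github.repository }}/api:${{ github.sha }}
          platforms: linux/arm64      # match the cluster architecture
          provenance: false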

Next.js Build Args

- name: Build and push (web)
  uses: docker/build-push-action@v5
  with:
    build-args: |
      NEXT_PUBLIC_SSO_URL=https://sso.${{ vars.DOMAIN }}
      NEXT_PUBLIC_API_URL=https://api.${{ vars.DOMAIN }}

Helm Deployment

- name: Deploy
  run: |
    helm upgrade --install myapp ./helm/myapp \
      --set global.imageTag=${{ github.sha }} \
      --set "secrets.databaseUrl=${{ secrets.DATABASE_URL }}" \
      --set "secrets.authSecret=${{ secrets.BETTER_AUTH_SECRET }}"

Troubleshooting Guide

Quick Diagnosis Flow

Pod not running?
    │
    ├─► ImagePullBackOff
    │       ├─► "not found" ──► Wrong tag or registry
    │       ├─► "unauthorized" ──► Auth/imagePullSecrets
    │       └─► "no match for platform" ──► Architecture mismatch
    │
    ├─► CrashLoopBackOff
    │       ├─► "exec format error" ──► Wrong CPU architecture
    │       ├─► Exit code 1 ──► App startup failure
    │       └─► OOMKilled ──► Memory limits too low
    │
    └─► Pending
            ├─► Insufficient resources ──► Scale cluster
            └─► No matching node ──► Check nodeSelector

Diagnostic Commands

kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -E "(Image:|Failed|Error)"
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
kubectl logs <pod-name> -n <namespace> --tail=50

Error: ImagePullBackOff "not found"

Causes:

  • Tag doesn't exist (short vs full SHA)
  • Wrong registry path
  • Builds skipped by path filters

Fix: Verify image was pushed with exact tag used in deployment
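
One hedged way to check, written as a CI debug step; the deployment, namespace, and image names are illustrative:

- name: Verify deployed tag exists
  run: |
    # what the Deployment currently references
    kubectl get deploy web -n myapp -o jsonpath='{.spec.template.spec.containers[0].image}'; echo
    # does that exact tag exist in the registry? (requires registry login)
    docker manifest inspect ghcr.io/org/repo/web:${{ github.sha }} >/dev/null \
      && echo "tag found" || echo "tag missing"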

Error: "no match for platform in manifest"

Cause: Image built for wrong architecture OR buildx provenance issue

Fix:

platforms: linux/arm64  # Match cluster!
provenance: false       # Simple manifest
no-cache: true          # Force rebuild

Error: "exec format error"

Cause: Binary architecture doesn't match node

Fix: Rebuild with correct platform, use no-cache: true
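
Before redeploying, it can help to confirm what architecture was actually pushed; a hedged check with an illustrative image name:

- name: Check pushed image architecture
  run: |
    # prints the manifest, including the Platform of each entry
    docker buildx imagetools inspect ghcr.io/org/repo/api:${{ github.sha }}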

Error: Helm comma parsing

failed parsing --set data: key "com" has no value

Cause: Helm interprets commas as array separators

Fix: Use heredoc values file:

- name: Deploy
  run: |
    cat > /tmp/overrides.yaml << EOF
    sso:
      env:
        ALLOWED_ORIGINS: "https://a.com,https://b.com"
    EOF
    helm upgrade --install app ./chart --values /tmp/overrides.yaml

Error: Password authentication failed

Cause: Password generated via base64 contains special characters (+, /, =) that break connection strings

Fix: Use hex passwords:

# Wrong
openssl rand -base64 16  # Can have +/=

# Right
openssl rand -hex 16     # Hex digits (0-9, a-f) only

Error: Logout redirects to 0.0.0.0

Cause: Inside the container, request.url reflects the bind address (0.0.0.0), not the public hostname

Fix:

const APP_URL = process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000";
const response = NextResponse.redirect(new URL("/", APP_URL));

Pre-Deployment Checklist

Architecture

  • Checked cluster node architecture
  • Build platform matches cluster

Docker Build

  • provenance: false set
  • platforms: linux/<arch> matches cluster
  • Image tags consistent between build and deploy

CI/CD

  • All NEXT_PUBLIC_* as build args
  • Secrets passed via --set (not in values.yaml)
  • Path filters configured

Helm

  • No commas in --set values
  • Internal K8s service names for inter-service communication
  • Password single source of truth in values.yaml
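
A values.yaml sketch reflecting the checklist above; keys and service names are hypothetical and depend on your chart:

web:
  env:
    SSO_URL: "http://sso:3001"     # internal K8s service DNS for server-to-server calls
secrets:
  databaseUrl: ""                  # left empty here; injected by CI via --set
  authSecret: ""                   # same: never committed to values.yaml
postgres:
  password: ""                     # defined once so app and DB reference the same value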

Production Debugging

Trace Request Path

# 1. Frontend logs
kubectl logs deploy/web -n myapp --tail=50

# 2. API logs
kubectl logs deploy/api -n myapp --tail=100 | grep -i error

# 3. Sidecar logs (Dapr, etc.)
kubectl logs deploy/api -n myapp -c daprd --tail=50

Common Bug Patterns

Error                               Likely Cause
AttributeError: no attribute 'X'    Model/schema mismatch
404 Not Found on internal call      Wrong endpoint URL
Times off by hours                  Timezone handling bug
greenlet_spawn not called           Async SQLAlchemy pattern

GitOps with ArgoCD

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default   # required: the AppProject this Application belongs to
  source:
    repoURL: https://github.com/org/repo.git
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # Delete resources not in Git
      selfHeal: true   # Fix drift automatically

Observability

# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
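
The ServiceMonitor selects a Service (not pods), and endpoints[].port refers to a named Service port, so the matching Service needs the app: myapp label and a port literally named metrics; a minimal sketch:

apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp               # matched by the ServiceMonitor selector
spec:
  selector:
    app: myapp
  ports:
    - name: metrics          # must match endpoints[].port above
      port: 9090
      targetPort: 9090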

Security

# Pod Security Context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
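
With readOnlyRootFilesystem enabled, any path the app writes to (commonly /tmp) needs an explicit writable volume; a minimal sketch, names illustrative:

# In the pod template:
volumes:
  - name: tmp
    emptyDir: {}
# ...and in the container spec, alongside the securityContext above:
volumeMounts:
  - name: tmp
    mountPath: /tmp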

Resilience

# HPA + PDB
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:            # required: which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp

See references/production-patterns.md for full GitOps, observability, security, and resilience patterns.

Verification

Run: python scripts/verify.py

Related Skills

  • containerizing-applications - Docker and Helm charts
  • operating-k8s-local - Local Kubernetes with Minikube
  • building-nextjs-apps - Next.js patterns

References