| name | docker-k8s |
| description | Master containerization and orchestration with security-first approach. Expert in Docker multi-stage builds, Kubernetes zero-trust deployments, security hardening, GitOps workflows, and production-ready patterns for cloud-native applications. Includes 2025 best practices from CNCF and major cloud providers. |
| license | MIT |
Containerization & Kubernetes with Security Hardening
This skill provides comprehensive patterns for containerizing applications and deploying to Kubernetes in 2025, focusing on zero-trust security, multi-stage optimization, production hardening, and cloud-native best practices that work across different cloud providers.
When to Use This Skill
Use this skill when you need to:
- Create secure multi-stage Docker builds
- Deploy applications to Kubernetes with security hardening
- Implement zero-trust security patterns
- Set up GitOps workflows with ArgoCD/Flux
- Optimize container images for production
- Configure cluster security with Pod Security Standards
- Implement secure networking with service meshes
- Set up monitoring and observability
- Deploy to multiple cloud providers (AWS, GCP, Azure, DO)
Secure Multi-Stage Docker Builds
1. Security-First Multi-Stage Builds
# Dockerfile with security hardening
# Build stage
FROM python:3.11-slim AS builder
# Set build-time security arguments
ARG DEBIAN_FRONTEND=noninteractive
ARG DEBCONF_NONINTERACTIVE_SEEN=true
ARG BUILDPLATFORM
ARG TARGETPLATFORM
# Install build dependencies with security updates
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends \
build-essential \
libpq-dev \
ca-certificates \
curl \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
truncate -s 0 /var/log/apt/history.log
# Create non-root user for build
RUN groupadd -r builder && \
useradd -r -g builder builder
# Set working directory
WORKDIR /app
# Copy dependency files
COPY requirements.txt requirements-dev.txt ./
# Install Python dependencies in virtual environment
RUN python -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache-dir -U pip && \
/opt/venv/bin/pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim AS production
# Security labels
LABEL maintainer="security-team@company.com" \
version="1.0.0" \
security.scan="enabled" \
org.opencontainers.image.vendor="Company"
# Install runtime dependencies with security updates
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends \
libpq5 \
ca-certificates \
curl \
dumb-init \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
truncate -s 0 /var/log/apt/history.log
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
# Create non-root user for runtime
RUN groupadd -r appuser && \
useradd -r -g appuser -s /bin/sh appuser
# Set working directory
WORKDIR /app
# Copy application code
COPY --chown=appuser:appuser app/ ./app/
COPY --chown=appuser:appuser alembic/ ./alembic/
COPY --chown=appuser:appuser alembic.ini .
# Set permissions
RUN chmod -R 755 /app && \
chmod -x /app/alembic.ini
# Switch to non-root user
USER appuser
# Set PATH
ENV PATH="/opt/venv/bin:$PATH"
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Image scanning (e.g. Trivy or Grype) runs in CI against the built image,
# not inside the Dockerfile; see the workflow sketch after this Dockerfile
# Expose port
EXPOSE 8000
# Use dumb-init as PID 1
ENTRYPOINT ["dumb-init", "--"]
# Run the application
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers=4", "--access-log=-"]
2. Docker Security Best Practices
# Dockerfile.security
# Base security hardening
FROM python:3.11-slim AS base
# Kernel parameters cannot be set during an image build and do not persist into
# containers; apply hardened sysctls at runtime instead, e.g.
#   docker run --sysctl net.ipv4.conf.all.send_redirects=0 ...
# or via the pod-level securityContext.sysctls field in Kubernetes
# Remove setuid/setgid bits from binaries that do not need them
RUN find / -xdev -type f -perm /6000 -exec chmod a-s {} \; 2>/dev/null || true
# Image vulnerability scanning (e.g. trivy image --severity HIGH,CRITICAL <image>)
# belongs in CI against the built image; Trivy is not packaged in the default
# Debian repositories, so "apt-get install trivy" would fail here
# Production application with security
FROM base AS production
# Add security labels
LABEL \
security.stack.linux.docker.image-version="1" \
security.stack.linux.docker.kernel-version="6.x" \
security.stack.linux.docker.os-version="Debian 12" \
security.stack.cis.docker.version="1.6.0"
# Implement runtime security
# Note: fail2ban, rkhunter and chkrootkit are host-oriented tools; in
# containerized environments a node-level runtime sensor (e.g. Falco) is
# usually preferred
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    apparmor-profiles \
    fail2ban \
    rkhunter \
    chkrootkit \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Configure fail2ban
COPY fail2ban.conf /etc/fail2ban/jail.local
# Configure AppArmor
COPY apparmor-profiles /etc/apparmor.d/
# Build-time security check: fail the build if upgradable packages remain
# (uses a BuildKit heredoc, available with Dockerfile syntax 1.4+)
RUN python3 <<'EOF'
import subprocess
import sys

result = subprocess.run(["apt", "list", "--upgradable"],
                        capture_output=True, text=True)
# Skip the "Listing..." header that apt always prints
upgradable = [line for line in result.stdout.splitlines()
              if "upgradable" in line]
if upgradable:
    print("WARNING: packages need updates:", upgradable)
    sys.exit(1)
EOF
3. Zero-Trust Kubernetes Deployment
# manifests/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: production
  labels:
    name: production
    security-tier: "high"
    environment: "production"
    # Pod Security Standards are enforced via labels, not annotations
    pod-security.kubernetes.io/enforce: "restricted"
    pod-security.kubernetes.io/audit: "restricted"
    pod-security.kubernetes.io/warn: "restricted"
---
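Resource quotas and default limits are a common companion to a hardened namespace; they prevent a single workload from starving the cluster. A minimal sketch (values are illustrative):
# manifests/resource-quota.yaml (sketch)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-defaults
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
---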
# manifests/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: production-apps
rules:
- apiGroups: [""]
resources: ["pods", "services", "configmaps", "secrets"]
verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
resources: ["deployments", "replicasets", "daemonsets"]
verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
resources: ["networkpolicies"]
verbs: ["get", "list", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: production-apps-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: production-apps
subjects:
  # Bind to the dedicated service account rather than the namespace default
  - kind: ServiceAccount
    name: restricted-sa
    namespace: production
---
# manifests/network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
# Deny all ingress and egress by default
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-dns
namespace: production
spec:
podSelector: {}
policyTypes:
- Egress
  egress:
    # DNS resolution via cluster DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Outbound HTTPS; tighten with namespace/pod selectors where possible
    - to: []
      ports:
        - protocol: TCP
          port: 443
---
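Because the default-deny policy blocks all ingress, each workload also needs an explicit allow rule. A sketch admitting traffic to the secure-app pods (defined in the next manifest) only from the Istio ingress gateway namespace, using the automatic kubernetes.io/metadata.name label; the port matches the container port used below:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-secure-app
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: secure-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
      ports:
        - protocol: TCP
          port: 8000
---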
# manifests/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-app
namespace: production
labels:
app: secure-app
tier: application
version: v1
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: secure-app
template:
metadata:
labels:
app: secure-app
tier: application
version: v1
security.stack.company.com/container-security-level: "high"
security.stack.company.com/network-segment: "application"
      annotations:
        # The seccomp and PodSecurityPolicy annotations are deprecated/removed;
        # seccomp is set via securityContext.seccompProfile and admission is
        # enforced by the namespace Pod Security Standards labels
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
seccompProfile:
type: RuntimeDefault
        # readOnlyRootFilesystem and capabilities are container-level fields;
        # they are set on the container securityContext below
containers:
- name: app
image: your-registry/secure-app:v1.0.0
imagePullPolicy: Always
ports:
- containerPort: 8000
name: http
protocol: TCP
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: database-url
resources:
limits:
cpu: "500m"
memory: "512Mi"
requests:
cpu: "100m"
memory: "128Mi"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
successThreshold: 1
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
successThreshold: 1
startupProbe:
httpGet:
path: /health
port: 8000
failureThreshold: 30
periodSeconds: 10
          securityContext:
            allowPrivilegeEscalation: false
            # /tmp and /cache emptyDir mounts are the only writable paths
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /cache
volumes:
- name: tmp
emptyDir: {}
- name: cache
emptyDir: {}
imagePullSecrets:
- name: registry-secret
serviceAccountName: restricted-sa
automountServiceAccountToken: false
---
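The Deployment consumes `app-secrets`; rather than committing that Secret to Git, it can be synced from an external store. A minimal sketch assuming the External Secrets Operator is installed and a ClusterSecretStore named aws-secrets-manager exists (both are assumptions):
# manifests/external-secret.yaml (sketch)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager     # assumed ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: app-secrets             # Secret consumed by the Deployment above
    creationPolicy: Owner
  data:
    - secretKey: database-url
      remoteRef:
        key: production/secure-app/database-url
---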
# manifests/service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: restricted-sa
namespace: production
annotations:
iam.gke.io/gcp-service-account: "production-sa@your-project.iam.gserviceaccount.com"
eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/production-role"
---
# manifests/pod-security-policy.yaml
# NOTE: PodSecurityPolicy was removed in Kubernetes 1.25. On current clusters,
# use the Pod Security Standards labels on the namespace (above) together with
# a policy engine; see the Kyverno sketch below. This manifest is kept only for
# legacy clusters older than 1.25. PSPs are cluster-scoped, so no namespace.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
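Since PSP is gone from current clusters, the same guarantees are usually enforced with a policy engine. A minimal sketch of one such rule as a Kyverno ClusterPolicy, assuming Kyverno is installed (Gatekeeper or a ValidatingAdmissionPolicy would serve the same purpose):
# manifests/kyverno-disallow-privilege-escalation.yaml (sketch)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privilege-escalation
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: deny-privilege-escalation
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - production
      validate:
        message: "allowPrivilegeEscalation must be explicitly set to false."
        pattern:
          spec:
            containers:
              - securityContext:
                  allowPrivilegeEscalation: "false"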
4. Service Mesh with Istio
# istio/mesh-config.yaml
# Istio reads mesh-wide settings from the "istio" ConfigMap (data key "mesh")
# in istio-system. Connection pools, circuit breaking, and outlier detection
# are configured per service in DestinationRules (next manifest), and metrics
# customization via the Telemetry API, not in meshConfig.
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |
    accessLogFile: /dev/stdout
    # Zero-trust default: only allow egress to services known to the mesh
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY
    defaultConfig:
      concurrency: 3
      holdApplicationUntilProxyStarts: true
---
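Mesh-wide settings alone do not enforce mutual TLS between workloads; that is done with a PeerAuthentication policy. A minimal sketch locking the production namespace to STRICT mTLS:
# istio/peer-authentication.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---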
# istio/destination-rule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: secure-app
  namespace: production
spec:
  host: secure-app
  trafficPolicy:
    # With ISTIO_MUTUAL, client certificates are managed by istiod and must
    # not be supplied manually
    tls:
      mode: ISTIO_MUTUAL
    connectionPool:
      tcp:
        maxConnections: 100
        connectTimeout: 30s
        tcpKeepalive:
          time: 7200s
          interval: 75s
      http:
        http2MaxRequests: 1000
        maxRequestsPerConnection: 100
        maxRetries: 3
        idleTimeout: 120s
        h2UpgradePolicy: UPGRADE
    loadBalancer:
      simple: LEAST_REQUEST
    # Circuit breaking in Istio is expressed through connectionPool limits plus
    # outlierDetection; retries and timeouts belong in the VirtualService
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 10
  subsets:
    - name: v1
      labels:
        version: v1
      trafficPolicy:
        connectionPool:
          tcp:
            maxConnections: 10
            connectTimeout: 10s
          http:
            http1MaxPendingRequests: 10
            maxRequestsPerConnection: 2
            maxRetries: 2
            idleTimeout: 60s
        outlierDetection:
          consecutive5xxErrors: 3
          interval: 15s
          baseEjectionTime: 15s
          maxEjectionPercent: 10
    - name: canary
      labels:
        version: canary
---
# istio/virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: secure-app
  namespace: production
spec:
  hosts:
    - secure-app.production.company.com
  gateways:
    - istio-system/ingressgateway
  # A VirtualService has a single http route list; more specific matches
  # (the canary header) must come before the catch-all route
  http:
    - match:
        - headers:
            canary:
              exact: "true"
      route:
        - destination:
            host: secure-app
            subset: canary
    - match:
        - uri:
            prefix: /
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure,refused-stream
      timeout: 10s
      route:
        - destination:
            host: secure-app
            subset: v1
          weight: 90
        - destination:
            host: secure-app
            subset: canary
          weight: 10
      # Shadow 5% of live traffic to the canary subset for comparison
      mirror:
        host: secure-app
        subset: canary
      mirrorPercentage:
        value: 5
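Zero-trust also means the workload only accepts traffic from identities it expects. A sketch of an Istio AuthorizationPolicy that admits requests only from the ingress gateway's service account; the principal below assumes the default istio-ingressgateway ServiceAccount name in istio-system:
# istio/authorization-policy.yaml (sketch)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: secure-app-allow-gateway
  namespace: production
spec:
  selector:
    matchLabels:
      app: secure-app
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"
      to:
        - operation:
            methods: ["GET", "POST"]
            ports: ["8000"]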
5. ArgoCD GitOps Configuration
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: secure-app
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/company/secure-app-k8s
targetRevision: HEAD
path: kubernetes/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- PruneLast=false
- PrunePropagationPolicy=foreground
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
  ignoreDifferences:
    # Ignore replica-count drift (e.g. when an HPA manages replicas); entries
    # require group/kind plus jsonPointers or jqPathExpressions
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
  revisionHistoryLimit: 10
  # There is no spec.plugins field on an Application. Argo CD Image Updater
  # and Argo CD Notifications are configured through metadata annotations,
  # for example:
  #   argocd-image-updater.argoproj.io/image-list: app=your-registry/secure-app
  #   notifications.argoproj.io/subscribe.on-sync-succeeded.slack: deployments
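The Application above targets the built-in `default` project; in production it is usually scoped to a dedicated AppProject that restricts which repositories, destinations, and resource kinds it may touch. A minimal sketch (the project name and blacklist entries are illustrative):
# argocd/appproject.yaml (sketch)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production-apps
  namespace: argocd
spec:
  description: Production applications
  sourceRepos:
    - https://github.com/company/secure-app-k8s
  destinations:
    - server: https://kubernetes.default.svc
      namespace: production
  clusterResourceWhitelist: []        # no cluster-scoped resources allowed
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota             # quotas stay with the platform team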
6. Monitoring and Observability
# monitoring/prometheus.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- "/etc/prometheus/rules/*.yml"
scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Only scrape pods that opt in via the prometheus.io/scrape annotation
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
        metric_relabel_configs:
          # Drop high-cardinality series that are not needed
          - source_labels: [__name__]
            regex: 'go_gc_duration_seconds.*'
            action: drop
      - job_name: 'kubernetes-services'
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
# Security rules
---
apiVersion: v1
kind: Secret
metadata:
name: prometheus-rules
namespace: monitoring
type: Opaque
stringData:
security-alerts.yml: |
groups:
- name: security.rules
rules:
- alert: HighErrorRate
            expr: sum(rate(http_requests_total{job="kubernetes-pods",status=~"5.."}[5m])) / sum(rate(http_requests_total{job="kubernetes-pods"}[5m])) > 0.1
for: 5m
labels:
severity: "critical"
annotations:
summary: "High error rate detected"
description: "Error rate is above 10% for 5 minutes"
runbook_url: "https://runbooks.company.com/high-error-rate"
- alert: UnauthorizedAccess
            expr: sum(rate(http_requests_total{status="401"}[5m])) * 60 > 5
for: 5m
labels:
severity: "warning"
annotations:
summary: "Unauthorized access attempts"
description: "More than 5 unauthorized requests per minute"
runbook_url: "https://runbooks.company.com/unauthorized-access"
- alert: ContainerRestarts
            expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
for: 15m
labels:
severity: "warning"
annotations:
summary: "Container restarts detected"
description: "Container has restarted more than 3 times in 15 minutes"
runbook_url: "https://runbooks.company.com/container-restarts"
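If the Prometheus Operator is used instead of a static scrape config, the same application can be picked up declaratively with a ServiceMonitor. A minimal sketch, assuming a Service for secure-app exposes a port named http and that the Prometheus instance selects monitors by a release label (both are assumptions):
# monitoring/servicemonitor.yaml (sketch)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: secure-app
  namespace: monitoring
  labels:
    release: prometheus        # assumed selector label on the Prometheus instance
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: secure-app
  endpoints:
    - port: http
      path: /metrics
      interval: 15s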
7. Production Deployment Checklist
# deployment/production-checklist.yaml
security:
container_security:
minimal_base_image: true
minimal_packages: true
security_scanning: true
vulnerability_scanning: true
sbom_generation: true
runtime_security:
non_root_user: true
    readonly_filesystem: true
capabilities_dropped_all: true
seccomp_enabled: true
apparmor_enabled: true
selinux_enabled: true
network_security:
network_policies: true
service_mesh: true
mtls_enabled: true
tls_version: "1.3"
certificate_rotation: true
secrets_management:
encrypted_secrets: true
key_rotation: true
secret_versioning: true
access_control: true
rbac:
principle_of_least_privilege: true
service_accounts_limited: true
token_min_ttl: true
token_autorenewal: false
compliance:
cis_benchmark: true
pod_security_standards: true
nist_controls: true
pci_dss: true
hipaa: true
observability:
monitoring:
metrics_collection: true
custom_metrics: true
alerting_enabled: true
dashboard_creation: true
logging:
structured_logging: true
log_aggregation: true
log_retention: true
sensitive_data_redaction: true
tracing:
distributed_tracing: true
span_sampling: true
performance_tracking: true
error_tracking: true
auditing:
api_audit_logging: true
system_event_logging: true
security_event_logging: true
compliance_reporting: true
performance:
resource_management:
resource_limits: true
resource_quotas: true
hpa_enabled: true
cluster_autoscaling: true
optimization:
image_optimization: true
caching_enabled: true
compression: true
connection_pooling: true
batch_processing: true
disaster_recovery:
backup_strategy:
automated_backups: true
backup_encryption: true
cross_region_backup: true
restore_testing: true
high_availability:
multi_zone_deployment: true
health_checks: true
auto_healing: true
graceful_shutdown: true
rolling_updates: true
testing:
security_testing:
penetration_testing: true
vulnerability_scanning: true
compliance_testing: true
chaos_testing: true
performance_testing:
load_testing: true
stress_testing: true
scalability_testing: true
reliability_testing: true
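Two of the checklist items above, hpa_enabled and high availability with rolling updates, map directly onto standard objects. A minimal sketch for the secure-app Deployment (thresholds and replica counts are illustrative):
# deployment/autoscaling-and-pdb.yaml (sketch)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: secure-app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: secure-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: secure-app
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: secure-app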
This comprehensive Docker/Kubernetes skill provides security-first containerization and deployment patterns for 2025, including zero-trust architectures, service mesh integration, GitOps workflows, and production-ready security hardening that works across different cloud providers and meets modern compliance standards.