| name | devops-cloud |
| description | DevOps, cloud infrastructure, and platform engineering. Use when working with AWS, GCP, Azure, Kubernetes, Terraform, CI/CD pipelines, or infrastructure as code. |
DevOps & Cloud Infrastructure
Comprehensive guide for cloud platforms, infrastructure as code, and DevOps practices.
Cloud Platforms
AWS (Amazon Web Services)
Compute:
# EC2 Instance Types
General Purpose: t3, m6i, m7g (ARM)
Compute Optimized: c6i, c7g
Memory Optimized: r6i, x2idn
Storage Optimized: i3, d3
# Auto Scaling
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name my-asg \
--launch-template LaunchTemplateId=lt-xxx \
--min-size 1 --max-size 10 --desired-capacity 2 \
--vpc-zone-identifier "subnet-xxx,subnet-yyy"
Serverless:
// Lambda with TypeScript
import { APIGatewayProxyHandler } from 'aws-lambda';
export const handler: APIGatewayProxyHandler = async (event) => {
return {
statusCode: 200,
body: JSON.stringify({ message: 'Success' }),
};
};
Storage:
| Service | Use Case | Durability |
|---|---|---|
| S3 | Object storage | 99.999999999% |
| EBS | Block storage (EC2) | 99.999% |
| EFS | Shared file system | 99.999999999% |
| FSx | Windows/Lustre | 99.999999999% |
GCP (Google Cloud Platform)
Key Services:
# GKE cluster
gcloud container clusters create my-cluster \
--zone us-central1-a \
--num-nodes 3 \
--machine-type e2-medium \
--enable-autoscaling --min-nodes 1 --max-nodes 10
# Cloud Run (serverless containers)
gcloud run deploy my-service \
--image gcr.io/project/image:tag \
--platform managed \
--region us-central1 \
--allow-unauthenticated
Azure
Key Services:
# AKS cluster
az aks create \
--resource-group myRG \
--name myAKS \
--node-count 3 \
--enable-addons monitoring \
--generate-ssh-keys
# Azure Functions
func init MyFunctionApp --typescript
func new --name HttpTrigger --template "HTTP trigger"
Kubernetes
Core Concepts
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
labels:
app: my-app
spec:
replicas: 3
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: my-app
image: my-app:1.0.0
ports:
- containerPort: 8080
resources:
requests:
memory: "128Mi"
cpu: "250m"
limits:
memory: "256Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
Service Types
# ClusterIP (internal)
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
type: ClusterIP
selector:
app: my-app
ports:
- port: 80
targetPort: 8080
---
# LoadBalancer (external)
apiVersion: v1
kind: Service
metadata:
name: my-service-lb
spec:
type: LoadBalancer
selector:
app: my-app
ports:
- port: 80
targetPort: 8080
Ingress with TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- app.example.com
secretName: app-tls
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-service
port:
number: 80
Helm Charts
# values.yaml
replicaCount: 3
image:
repository: my-app
tag: "1.0.0"
pullPolicy: IfNotPresent
service:
type: ClusterIP
port: 80
ingress:
enabled: true
hosts:
- host: app.example.com
paths: ["/"]
resources:
limits:
cpu: 500m
memory: 256Mi
requests:
cpu: 250m
memory: 128Mi
Terraform
AWS Infrastructure
# main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
}
}
provider "aws" {
region = var.aws_region
}
# VPC
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.0.0"
name = "my-vpc"
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = false
}
# EKS Cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "19.0.0"
cluster_name = "my-cluster"
cluster_version = "1.28"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
eks_managed_node_groups = {
default = {
min_size = 1
max_size = 10
desired_size = 3
instance_types = ["t3.medium"]
}
}
}
Variables and Outputs
# variables.tf
variable "aws_region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
# outputs.tf
output "cluster_endpoint" {
description = "EKS cluster endpoint"
value = module.eks.cluster_endpoint
}
output "cluster_name" {
description = "EKS cluster name"
value = module.eks.cluster_name
}
CI/CD Pipelines
GitHub Actions
# .github/workflows/deploy.yml
name: Deploy
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci
- run: npm test
- run: npm run build
build-and-push:
needs: test
runs-on: ubuntu-latest
if: github.event_name == 'push'
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- name: Log in to registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
deploy:
needs: build-and-push
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v4
- name: Deploy to Kubernetes
uses: azure/k8s-deploy@v4
with:
manifests: k8s/
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
GitLab CI
# .gitlab-ci.yml
stages:
- test
- build
- deploy
variables:
DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
test:
stage: test
image: node:20
script:
- npm ci
- npm test
- npm run build
cache:
paths:
- node_modules/
build:
stage: build
image: docker:24
services:
- docker:24-dind
script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
- docker build -t $DOCKER_IMAGE .
- docker push $DOCKER_IMAGE
only:
- main
deploy:
stage: deploy
image: bitnami/kubectl:latest
script:
- kubectl set image deployment/my-app my-app=$DOCKER_IMAGE
only:
- main
environment:
name: production
Docker
Multi-Stage Builds
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:20-alpine AS production
WORKDIR /app
RUN addgroup -g 1001 -S nodejs && \
adduser -S nextjs -u 1001
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
USER nextjs
EXPOSE 3000
CMD ["node", "dist/main.js"]
Docker Compose
# docker-compose.yml
version: '3.8'
services:
app:
build: .
ports:
- "3000:3000"
environment:
- DATABASE_URL=postgres://user:pass@db:5432/mydb
- REDIS_URL=redis://redis:6379
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
retries: 3
db:
image: postgres:16-alpine
environment:
POSTGRES_USER: user
POSTGRES_PASSWORD: pass
POSTGRES_DB: mydb
volumes:
- postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
interval: 10s
timeout: 5s
retries: 5
redis:
image: redis:7-alpine
volumes:
- redis_data:/data
volumes:
postgres_data:
redis_data:
Monitoring & Observability
Prometheus + Grafana
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
Application Metrics
// metrics.ts
import { Counter, Histogram, Registry } from 'prom-client';
export const register = new Registry();
export const httpRequestsTotal = new Counter({
name: 'http_requests_total',
help: 'Total HTTP requests',
labelNames: ['method', 'path', 'status'],
registers: [register],
});
export const httpRequestDuration = new Histogram({
name: 'http_request_duration_seconds',
help: 'HTTP request duration',
labelNames: ['method', 'path'],
buckets: [0.1, 0.5, 1, 2, 5],
registers: [register],
});
Security Best Practices
Secrets Management
# AWS Secrets Manager
aws secretsmanager create-secret \
--name prod/db-credentials \
--secret-string '{"username":"admin","password":"xxx"}'
# Kubernetes Secrets (external-secrets)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-credentials
spec:
secretStoreRef:
kind: ClusterSecretStore
name: aws-secrets
target:
name: db-credentials
data:
- secretKey: username
remoteRef:
key: prod/db-credentials
property: username
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-network-policy
spec:
podSelector:
matchLabels:
app: api
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: database
ports:
- protocol: TCP
port: 5432
Checklist
Pre-Deployment
- Infrastructure as Code reviewed
- Secrets in secrets manager (not env files)
- Resource limits set
- Health checks configured
- Logging and monitoring enabled
- Network policies defined
- Backup strategy in place
Production Readiness
- Multi-AZ deployment
- Auto-scaling configured
- SSL/TLS enabled
- WAF rules configured
- Disaster recovery tested
- Runbooks documented