Cloud Infrastructure Skill
Quick Reference
| Platform | Market share or segment | Best for | Learning time |
|---|---|---|---|
| AWS | ~32% | General purpose (broadest service catalog) | 3-6 mo |
| Azure | ~24% | Microsoft stack | 3-6 mo |
| GCP | ~11% | Data, ML | 3-6 mo |
| Cloudflare | Edge | CDN, Workers | 2-4 wk |
Learning Paths
AWS
[1] IAM + VPC (1-2 wk)
│ └─ Roles, policies, networking
│
▼
[2] Compute: EC2, Lambda (2-3 wk)
│
▼
[3] Storage: S3, EBS (1-2 wk)
│
▼
[4] Database: RDS, DynamoDB (2-3 wk)
│
▼
[5] Containers: ECS, EKS (3-4 wk)
│
▼
[6] Monitoring: CloudWatch (1-2 wk)
Docker & Containers
[1] Docker Basics (1 wk)
│ └─ Images, containers, Dockerfile
│
▼
[2] Multi-stage Builds (1 wk)
│ └─ Optimization, layer caching
│
▼
[3] Docker Compose (1 wk)
│ └─ Multi-container apps
│
▼
[4] Registry & Security (1 wk)
└─ Push/pull, scanning, non-root
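As a rough illustration of step [3], a Compose file for a two-service app might look like the sketch below. The service names, port, credentials, and the postgres:16 image are assumptions chosen for the example, not part of this skill's material.

```yaml
# compose.yaml: hypothetical web app plus database (all names are illustrative)
services:
  web:
    build: .                      # built from the Dockerfile in this directory
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgres://app:example@db:5432/app
    depends_on:
      db:
        condition: service_healthy   # wait until the db healthcheck passes
  db:
    image: postgres:16            # pinned major version
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: example  # placeholder; do not use outside local dev
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      timeout: 3s
      retries: 5
volumes:
  db-data:
```

Running docker compose up -d in the same directory should start both containers, with web reaching the database at the hostname db.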
Kubernetes
[1] Pods & Deployments (2 wk)
│
▼
[2] Services & Networking (1-2 wk)
│
▼
[3] ConfigMaps & Secrets (1 wk)
│
▼
[4] Helm Charts (2 wk)
│
▼
[5] Production Patterns (ongoing)
└─ HPA, PDB, resource limits
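The production patterns in step [5] could be sketched as follows. This assumes a Deployment named app whose pods carry the label app: app, that the containers set CPU requests, and that a metrics source such as metrics-server is installed; the replica counts and the 70% target are arbitrary example values.

```yaml
# Scale the hypothetical "app" Deployment between 2 and 10 replicas on CPU usage
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU requests
---
# Keep at least one pod running during voluntary disruptions (e.g. node drains)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: app
```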
Terraform (IaC)
[1] Resources & State (1 wk)
│
▼
[2] Variables & Outputs (1 wk)
│
▼
[3] Modules (1-2 wk)
│
▼
[4] Remote State (1 wk)
│
▼
[5] Workspaces & Environments (1 wk)
Kubernetes Quick Reference
| Resource | Purpose | Example |
|---|---|---|
| Pod | Smallest deployable unit | Single container |
| Deployment | Manage replicas and rollouts | Web app |
| Service | Stable network access to pods | ClusterIP, LoadBalancer |
| Ingress | HTTP routing | Path-based routing |
| ConfigMap | Non-sensitive configuration | Environment variables |
| Secret | Sensitive data | Credentials |
| StatefulSet | Stateful apps | Databases |
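To show how several of these resources fit together, here is a minimal sketch with a ConfigMap, a Deployment that reads it as environment variables, and a ClusterIP Service in front. The names, the image tag registry/app:1.0.0, the LOG_LEVEL key, and port 8000 are all assumptions for illustration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app              # must match the selector above and the Service below
    spec:
      containers:
        - name: app
          image: registry/app:1.0.0
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: app-config   # each key becomes an environment variable
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: ClusterIP
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 8000
```

Applying the file with kubectl apply -f should create all three objects; other pods in the namespace can then reach the app at http://app.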
Terraform Structure
project/
├── main.tf # Resources
├── variables.tf # Inputs
├── outputs.tf # Outputs
├── providers.tf # Provider config
├── versions.tf # Version constraints
├── modules/
│ ├── vpc/
│ ├── eks/
│ └── rds/
└── environments/
    ├── dev.tfvars
    ├── staging.tfvars
    └── prod.tfvars
CI/CD Pipeline Template
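# GitHub Actions
# Assumes the runner is already authenticated to the registry and the cluster.
name: CI/CD
on:
  push:
    branches: [main]
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        # Tag with the full registry path so the Push step can find the image
        run: docker build -t registry/app:${{ github.sha }} .
      - name: Test
        run: docker run --rm registry/app:${{ github.sha }} pytest
      - name: Push
        run: docker push registry/app:${{ github.sha }}
      - name: Deploy
        if: github.ref == 'refs/heads/main'
        run: kubectl set image deployment/app app=registry/app:${{ github.sha }}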
Monitoring Stack
┌─────────────────────────────────────────┐
│ OBSERVABILITY STACK │
├─────────────────────────────────────────┤
│ Metrics: Prometheus → Grafana │
│ Logs: Loki / ELK │
│ Traces: Jaeger / Tempo │
│ Alerts: Alertmanager → PagerDuty │
└─────────────────────────────────────────┘
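The Alerts row of this stack might be wired up roughly as below: a Prometheus alerting rule, followed by an Alertmanager route that sends matching alerts to PagerDuty. This is a sketch under assumptions: the restart metric comes from kube-state-metrics, the threshold and the severity label value are arbitrary, and the integration key is a placeholder. The two documents belong in separate files.

```yaml
# Prometheus rule file (hypothetical name: pod-health.rules.yml)
groups:
  - name: pod-health
    rules:
      - alert: PodRestartingFrequently
        # More than 3 container restarts in 15 minutes, sustained for 5 minutes
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} keeps restarting"
---
# Alertmanager configuration (alertmanager.yml)
route:
  receiver: default
  routes:
    - matchers:
        - severity="page"
      receiver: pagerduty
receivers:
  - name: default                 # no integrations: non-paging alerts go nowhere
  - name: pagerduty
    pagerduty_configs:
      - routing_key: "<PAGERDUTY_INTEGRATION_KEY>"   # placeholder, supply your own
```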
Troubleshooting
Container not starting?
├─► docker logs <container>
├─► Check port conflicts
├─► Check image name/tag
└─► Check resource limits
Pod in CrashLoopBackOff?
├─► kubectl describe pod <name>
├─► kubectl logs <pod> --previous
├─► Check resource limits (look for OOMKilled)
├─► Check probes configuration
└─► Check image pull secrets
Terraform apply fails?
├─► Run terraform plan first
├─► Check state lock (terraform force-unlock <LOCK_ID> if stale)
├─► terraform import resources that already exist
└─► Restore state from backup (last resort)
High cloud bill?
├─► Enable cost alerts
├─► Right-size instances
├─► Use spot instances
├─► Delete unused resources
└─► Storage lifecycle policies
Common Failure Modes
| Symptom | Root cause | Recovery |
|---|---|---|
| Pod CrashLoopBackOff | App error or OOM kill | Check logs; raise memory limits if OOM-killed |
| ImagePullBackOff | Wrong image name/tag or registry auth failure | Verify image reference, check image pull secrets |
| Terraform drift | Manual changes outside Terraform | terraform import, or terraform apply to reconcile |
| Slow deploys | Large images | Multi-stage builds, layer caching |
Best Practices
Docker
- Use multi-stage builds
- Run as non-root user
- Use .dockerignore
- Pin base image versions
- Scan for vulnerabilities
Kubernetes
- Set resource requests/limits
- Use readiness/liveness probes
- Store non-sensitive config in ConfigMaps, credentials in Secrets
- Use namespaces for isolation
- Enable network policies
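Several of the Kubernetes practices above can be combined in one manifest. The sketch below assumes a hypothetical namespace shop, an app listening on port 8000 with /ready and /healthz endpoints, and example resource figures; enforcing the NetworkPolicy also requires a network plugin that supports policies.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: shop
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: shop
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry/app:1.0.0
          ports:
            - containerPort: 8000
          resources:
            requests:             # what the scheduler reserves
              cpu: 100m
              memory: 128Mi
            limits:               # hard ceiling; exceeding memory triggers an OOM kill
              cpu: 500m
              memory: 256Mi
          readinessProbe:         # gates traffic from Services
            httpGet:
              path: /ready
              port: 8000
            periodSeconds: 5
          livenessProbe:          # restarts the container when it fails
            httpGet:
              path: /healthz
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 10
---
# Deny all ingress to pods in the namespace; allow traffic with further policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: shop
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```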
Terraform
- Use remote state (S3, GCS)
- Enable state locking on the backend
- Use modules for reuse
- Plan before apply
- Tag all resources
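"Plan before apply" can be enforced in CI. The workflow below is a sketch that runs on pull requests, reusing the environments/dev.tfvars file from the Terraform Structure section. The workflow file name is hypothetical, and it assumes cloud credentials and remote-state access are already available to the runner.

```yaml
# .github/workflows/terraform-plan.yml (hypothetical file name)
name: Terraform plan
on:
  pull_request:
    branches: [main]
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Init
        run: terraform init -input=false
      - name: Format and validate
        run: |
          terraform fmt -check -recursive
          terraform validate
      - name: Plan
        # Shows the proposed changes for review; nothing is applied here
        run: terraform plan -input=false -var-file=environments/dev.tfvars
```

Keeping apply out of this workflow means the plan output can be reviewed on the pull request before anything changes.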
Next Actions
Specify your cloud platform and focus area for detailed guidance.