| name | autoscaling-configuration |
| description | Configure autoscaling for Kubernetes, VMs, and serverless workloads based on metrics, schedules, and custom indicators. |
Autoscaling Configuration
Overview
Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.
When to Use
- Traffic-driven workload scaling
- Time-based scheduled scaling
- Resource utilization optimization
- Cost reduction
- High-traffic event handling
- Batch processing optimization
- Database connection pooling
Implementation Examples
1. Kubernetes Horizontal Pod Autoscaler
# hpa-configuration.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
---
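# Vertical Pod Autoscaler for resource optimization
# Caution: do not run a VPA in "Auto" mode alongside an HPA that scales the
# same workload on CPU or memory; the two controllers work against each other.
# Use updateMode: "Off" for recommendations only, or drive the HPA from
# custom metrics instead.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
        controlledResources:
          - cpu
          - memory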
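To make the HPA targets above concrete, the following sketch applies the documented HPA rule, desired = ceil(currentReplicas × currentMetric / targetMetric), to each metric, takes the largest proposal, and clamps it to minReplicas/maxReplicas. It is a simplified illustration, not the controller's implementation: the tolerance band, pod readiness handling, and the `behavior` policies are left out, and the function names are hypothetical.

```shell
#!/bin/sh
# Simplified HPA replica calculation (integer arithmetic only).

# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
desired_replicas() {
  # (a + b - 1) / b is ceiling division for positive integers
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# clamp VALUE MIN MAX
clamp() {
  if [ "$1" -lt "$2" ]; then echo "$2"
  elif [ "$1" -gt "$3" ]; then echo "$3"
  else echo "$1"
  fi
}

# hpa_decision CURRENT MIN MAX CUR1 TGT1 [CUR2 TGT2 ...]
# With several metrics, the largest proposal wins before clamping.
hpa_decision() {
  current=$1 min=$2 max=$3
  shift 3
  best=0
  while [ "$#" -ge 2 ]; do
    proposal=$(desired_replicas "$current" "$1" "$2")
    if [ "$proposal" -gt "$best" ]; then best=$proposal; fi
    shift 2
  done
  clamp "$best" "$min" "$max"
}

# 4 pods at 90% CPU (target 70) and 60% memory (target 80), bounds 2..20
hpa_decision 4 2 20 90 70 60 80   # prints 6
```

CPU proposes 6 replicas and memory proposes 3, so the CPU metric drives the decision in this example.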
2. AWS Auto Scaling
# aws-autoscaling.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail
    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70
    # Memory has no predefined target-tracking metric; scaling on it requires
    # a customized metric (for example, one published by the CloudWatch agent).
    echo "Creating Auto Scaling Group..."
    # Launch template user data must be base64-encoded
    USER_DATA=$(printf '#!/bin/bash\ncd /app && docker-compose up -d\n' | base64 | tr -d '\n')
    # Create launch template ("|| true" tolerates reruns where the template
    # already exists, but it also hides other errors)
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "'"$USER_DATA"'",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true
    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size "$MIN_SIZE" \
      --max-size "$MAX_SIZE" \
      --desired-capacity "$DESIRED_CAPACITY" \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"
    # Create CPU scaling policy. EC2 Auto Scaling target tracking does not
    # accept ScaleOutCooldown/ScaleInCooldown (those belong to Application
    # Auto Scaling); instance warmup is set with --estimated-instance-warmup.
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --estimated-instance-warmup 60 \
      --target-tracking-configuration '{
        "TargetValue": '"$TARGET_CPU"',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        }
      }'
    echo "Auto Scaling Group created: $ASG_NAME"
---
# A CronJob accepts a single schedule, so scale-up and scale-down are separate
# objects. The pods need AWS credentials that allow autoscaling:SetDesiredCapacity.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-autoscaling-up
  namespace: production
spec:
  # Scale up at 8 AM, Monday to Friday
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 10
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-autoscaling-down
  namespace: production
spec:
  # Scale down at 6 PM, Monday to Friday
  schedule: "0 18 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 3
          restartPolicy: OnFailure
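As an alternative to running the schedule inside the cluster, EC2 Auto Scaling supports scheduled actions natively. The sketch below wraps `aws autoscaling put-scheduled-update-group-action`; the function name and the action names in the usage comment are hypothetical. The recurrence is evaluated in UTC unless `--time-zone` is also passed.

```shell
#!/bin/sh
# schedule_capacity ASG_NAME ACTION_NAME CRON_EXPRESSION DESIRED_CAPACITY
# Registers (or updates) a recurring scheduled action on the group.
schedule_capacity() {
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$1" \
    --scheduled-action-name "$2" \
    --recurrence "$3" \
    --desired-capacity "$4"
}

# Usage (action names are hypothetical):
#   schedule_capacity myapp-asg weekday-scale-up   "0 8 * * 1-5"  10
#   schedule_capacity myapp-asg weekday-scale-down "0 18 * * 1-5" 3
```

The desired capacity must stay within the group's min and max size, otherwise the API rejects the request.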
3. Custom Metrics Autoscaling
# custom-metrics-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 50
  metrics:
    # Queue depth from custom metrics
    - type: Pods
      pods:
        metric:
          name: job_queue_depth
        target:
          type: AverageValue
          averageValue: "100"
    # Request rate from custom metrics
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    # Custom business metric
    - type: Pods
      pods:
        metric:
          name: active_connections
        target:
          type: AverageValue
          averageValue: "500"
---
# Prometheus ServiceMonitor for custom metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
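Pods-type metrics such as `job_queue_depth` only work when a metrics adapter (for example Prometheus Adapter) serves them through the custom metrics API. The sketch below checks that a metric is actually exposed before the HPA depends on it. It assumes the adapter serves the `v1beta1` version of `custom.metrics.k8s.io`; the function names are hypothetical.

```shell
#!/bin/sh
# custom_metric_path NAMESPACE METRIC_NAME
# Builds the custom metrics API path for a per-pod metric.
custom_metric_path() {
  printf '/apis/custom.metrics.k8s.io/v1beta1/namespaces/%s/pods/*/%s\n' "$1" "$2"
}

# check_custom_metric NAMESPACE METRIC_NAME
# Succeeds only if the API server returns the metric.
check_custom_metric() {
  if kubectl get --raw "$(custom_metric_path "$1" "$2")" >/dev/null 2>&1; then
    echo "ok: $2 is served for pods in $1"
  else
    echo "missing: $2 is not available in $1" >&2
    return 1
  fi
}

# Usage:
#   check_custom_metric production job_queue_depth
```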
4. Autoscaling Script
#!/bin/bash
# autoscaling-setup.sh - Complete autoscaling configuration
set -euo pipefail
ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"
echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"
# Create HPA
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
EOF
echo "HPA created successfully"
# Monitor autoscaling
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
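Utilization targets are computed against container resource requests, so an HPA like the one created above cannot compute utilization for a Deployment whose containers omit them. The sketch below is a possible preflight check to run before applying the HPA. It assumes that kubectl's jsonpath output skips containers that lack the requested field; the function names are hypothetical.

```shell
#!/bin/sh
# count_words STRING: number of whitespace-separated words
count_words() {
  set -- $1
  echo "$#"
}

# all_containers_request NAMESPACE DEPLOYMENT RESOURCE   (RESOURCE: cpu or memory)
# Succeeds when every container in the pod template sets a request for RESOURCE.
all_containers_request() {
  names=$(kubectl get deployment "$2" -n "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].name}') || return 1
  requests=$(kubectl get deployment "$2" -n "$1" \
    -o jsonpath="{.spec.template.spec.containers[*].resources.requests.$3}") || return 1
  [ "$(count_words "$names")" -eq "$(count_words "$requests")" ]
}

# Usage:
#   all_containers_request production myapp cpu || echo "set CPU requests first" >&2
```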
5. Monitoring Autoscaling
# autoscaling-monitoring.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    # HPA metric and label names follow kube-state-metrics v2+;
    # v1.x exposed them as kube_hpa_* with an "hpa" label.
    groups:
      - name: autoscaling
        rules:
          - alert: HpaMaxedOut
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_status_desired_replicas
              and
              kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_spec_max_replicas
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} is at maximum replicas"
          - alert: HpaMinedOut
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_status_desired_replicas
              and
              kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_spec_min_replicas
            for: 30m
            labels:
              severity: info
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} is at minimum replicas"
          # ASG metric and label names depend on the CloudWatch exporter in use.
          - alert: AsgCapacityLow
            expr: |
              aws_autoscaling_group_desired_capacity / aws_autoscaling_group_max_size < 0.2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "ASG {{ $labels.auto_scaling_group_name }} has low capacity"
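Alert rules stored in a ConfigMap are easy to break with an indentation or PromQL mistake. The sketch below extracts the rules file and validates it with `promtool check rules`. It assumes `promtool` is installed locally; the function name is hypothetical.

```shell
#!/bin/sh
# validate_alert_rules NAMESPACE CONFIGMAP KEY
# Writes the ConfigMap key to a temp file and runs promtool against it.
validate_alert_rules() {
  rules_file=$(mktemp) || return 1
  # Dots inside a key name must be escaped in kubectl's jsonpath
  escaped_key=$(printf '%s' "$3" | sed 's/\./\\./g')
  if kubectl get configmap "$2" -n "$1" \
      -o jsonpath="{.data.$escaped_key}" > "$rules_file"; then
    promtool check rules "$rules_file"
    status=$?
  else
    status=1
  fi
  rm -f "$rules_file"
  return "$status"
}

# Usage:
#   validate_alert_rules monitoring autoscaling-alerts alerts.yaml
```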
Best Practices
✅ DO
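- Set min/max replicas from measured baseline and peak load
- Check the metric aggregation window so short spikes do not trigger scaling
- Configure cooldown or stabilization windows to prevent flapping
- Combine multiple metrics (for example, CPU plus request rate)
- Load-test scaling behavior before production
- Monitor and alert on scaling events
- Size maximum capacity for peak loads
- Implement fallback strategies for when scaling cannot keep up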
❌ DON'T
- Set min replicas to 1 for services that must stay available
- Scale too aggressively (large steps over short periods)
- Ignore cooldown periods
- Rely on a single metric only
- Forget to test scaling
- Scale below the capacity the baseline load needs
- Neglect monitoring
- Deploy without capacity tests
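As a rough aid for planning peak load, the sketch below estimates how many replicas a peak request rate needs at a given per-pod target plus headroom, which can then be compared with `maxReplicas`. The peak figure and headroom percentage in the example are hypothetical; the per-pod target of 1000 requests per second matches the HPA examples above.

```shell
#!/bin/sh
# replicas_for_peak PEAK_RPS PER_POD_RPS HEADROOM_PERCENT
# ceil(PEAK_RPS * (100 + HEADROOM_PERCENT) / (PER_POD_RPS * 100)), integers only
replicas_for_peak() {
  numerator=$(( $1 * (100 + $3) ))
  denominator=$(( $2 * 100 ))
  echo $(( (numerator + denominator - 1) / denominator ))
}

# 15000 req/s at peak, 1000 req/s per pod, 20% headroom
replicas_for_peak 15000 1000 20   # prints 18
```

A result above `maxReplicas` suggests the ceiling is too low for the expected peak.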
Scaling Metrics
- CPU Utilization: Most common metric
- Memory Utilization: Heap-bound applications
- Request Rate: API-driven scaling
- Queue Depth: Async job processing
- Custom Metrics: Business-specific indicators