| name | infrastructure-cost-optimization |
| description | Optimize cloud infrastructure costs through resource rightsizing, reserved instances, spot instances, and waste reduction strategies. |
# Infrastructure Cost Optimization

## Overview

Reduce infrastructure costs through intelligent resource allocation, reserved instances, spot instances, and continuous optimization without sacrificing performance.

## When to Use

- Cloud cost reduction
- Budget management and tracking (see the budget sketch after this list)
- Resource utilization optimization
- Multi-environment cost allocation
- Waste identification and elimination
- Reserved instance planning
- Spot instance integration
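
For budget management and tracking, it helps to put a spending guardrail in place before optimizing anything. Below is a minimal sketch using the AWS Budgets API via boto3; the budget name, limit, and notification address are placeholders to adapt.

```python
# create-budget.py - minimal sketch: monthly cost budget with an 80% alert
# (budget name, limit amount, and email address are placeholders)
import boto3

budgets = boto3.client('budgets')
account_id = boto3.client('sts').get_caller_identity()['Account']

budgets.create_budget(
    AccountId=account_id,
    Budget={
        'BudgetName': 'monthly-infrastructure-budget',      # placeholder name
        'BudgetLimit': {'Amount': '10000', 'Unit': 'USD'},   # placeholder limit
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST',
    },
    NotificationsWithSubscribers=[{
        'Notification': {
            'NotificationType': 'ACTUAL',
            'ComparisonOperator': 'GREATER_THAN',
            'Threshold': 80.0,              # alert at 80% of the limit
            'ThresholdType': 'PERCENTAGE',
        },
        'Subscribers': [
            {'SubscriptionType': 'EMAIL', 'Address': 'finops@example.com'},  # placeholder
        ],
    }],
)
print('Budget created')
```

If budgets are managed as code instead, the same guardrail can be expressed with Terraform's `aws_budgets_budget` resource.
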
## Implementation Examples

### 1. AWS Cost Optimization Configuration

```yaml
# cost-optimization-setup.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-scripts
  namespace: operations
data:
  analyze-costs.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "=== AWS Cost Analysis ==="

    # Get daily cost trend (no --group-by: grouping moves amounts from Total into Groups)
    echo "Daily costs for last 7 days:"
    aws ce get-cost-and-usage \
      --time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
      --granularity DAILY \
      --metrics "BlendedCost" \
      --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]' \
      --output table

    # Find unattached resources
    echo -e "\n=== Unattached EBS Volumes ==="
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
      --output table

    # Unassociated addresses carry no association-id, so filter in the query
    echo -e "\n=== Unattached Elastic IPs ==="
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
      --output table

    echo -e "\n=== Available RDS Instances (review utilization) ==="
    aws rds describe-db-instances \
      --query 'DBInstances[?DBInstanceStatus==`available`].[DBInstanceIdentifier,DBInstanceClass,Engine,AllocatedStorage]' \
      --output table

    # Estimate savings with Reserved Instances
    echo -e "\n=== Reserved Instance Savings Potential ==="
    aws ce get-reservation-purchase-recommendation \
      --service "Amazon Elastic Compute Cloud - Compute" \
      --lookback-period-in-days THIRTY_DAYS \
      --query 'Recommendations[0].[RecommendationSummary.TotalEstimatedMonthlySavingsAmount,RecommendationSummary.TotalEstimatedMonthlySavingsPercentage]' \
      --output table
  optimize-resources.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "Starting resource optimization..."

    # Remove unattached volumes (destructive: snapshot anything you may need first)
    echo "Removing unattached volumes..."
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].VolumeId' \
      --output text | tr '\t' '\n' | \
    while read -r volume_id; do
      echo "Deleting volume: $volume_id"
      aws ec2 delete-volume --volume-id "$volume_id" 2>/dev/null || true
    done

    # Release unassociated Elastic IPs
    echo "Releasing unused Elastic IPs..."
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==null].AllocationId' \
      --output text | tr '\t' '\n' | \
    while read -r alloc_id; do
      echo "Releasing EIP: $alloc_id"
      aws ec2 release-address --allocation-id "$alloc_id" 2>/dev/null || true
    done

    # Report RDS instances with low average CPU as downsizing candidates
    echo "Analyzing RDS for downsizing..."
    for db in $(aws rds describe-db-instances \
                  --query 'DBInstances[?DBInstanceStatus==`available`].DBInstanceIdentifier' \
                  --output text); do
      avg_cpu=$(aws cloudwatch get-metric-statistics \
        --namespace AWS/RDS --metric-name CPUUtilization \
        --dimensions Name=DBInstanceIdentifier,Value="$db" \
        --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
        --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --period 3600 --statistics Average \
        --query 'avg(Datapoints[].Average)' --output text 2>/dev/null || echo "n/a")
      echo "  $db: ${avg_cpu}% average CPU over 7 days (downsize if consistently low)"
    done

    echo "Optimization complete"
```

```hcl
# Terraform cost optimization

# Use spot capacity for non-critical workloads
resource "aws_instance" "spot" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  instance_market_options {
    market_type = "spot"

    spot_options {
      max_price                      = "0.05" # Maximum hourly price in USD
      spot_instance_type             = "persistent"
      instance_interruption_behavior = "stop" # persistent requests support stop/hibernate
      valid_until                    = "2025-12-31T23:59:59Z"
    }
  }

  tags = {
    Name       = "spot-instance"
    CostCenter = "engineering"
  }
}
# Baseline capacity covered by a Reserved Instance purchase
# (RIs match on instance attributes such as type and region, not on tags)
resource "aws_instance" "reserved" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  tags = {
    Name            = "reserved-instance"
    ReservationType = "reserved"
  }
}
resource "aws_ec2_fleet" "mixed" {
name = "mixed-capacity"
launch_template_configs {
launch_template_specification {
launch_template_id = aws_launch_template.app.id
version = "$Latest"
}
overrides {
instance_type = "t3.medium"
weighted_capacity = "1"
priority = 1 # Reserved
}
overrides {
instance_type = "t3.large"
weighted_capacity = "2"
priority = 2 # Reserved
}
overrides {
instance_type = "t3a.medium"
weighted_capacity = "1"
priority = 3 # Spot
}
overrides {
instance_type = "t3a.large"
weighted_capacity = "2"
priority = 4 # Spot
}
}
target_capacity_specification {
total_target_capacity = 10
on_demand_target_capacity = 6
spot_target_capacity = 4
default_target_capacity_type = "on-demand"
}
fleet_type = "maintain"
}

### 2. Kubernetes Cost Optimization

```yaml
# k8s-cost-optimization.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-policies
  namespace: kube-system
data:
  policies.yaml: |
    # Resource quotas per namespace
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: "200Gi"
        limits.cpu: "200"
        limits.memory: "400Gi"
        pods: "500"
      scopeSelector:
        matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["high", "medium"]
    ---
    # Pod Disruption Budget for cost-effective scaling
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: cost-optimized-pdb
      namespace: production
    spec:
      minAvailable: 1
      selector:
        matchLabels:
          tier: backend
    ---
    # Prioritize spot instances with taints/tolerations
    apiVersion: v1
    kind: Node
    metadata:
      name: spot-node-1
    spec:
      taints:
        - key: cloud.google.com/gke-preemptible
          value: "true"
          effect: NoSchedule
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-optimized-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      # Tolerate spot instances
      tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule
      # Prefer nodes with lower cost
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```

### 3. Cost Monitoring Dashboard

```python
# cost-monitoring.py
import boto3
from datetime import datetime, timedelta


class CostOptimizer:
    def __init__(self):
        self.ce_client = boto3.client('ce')
        self.ec2_client = boto3.client('ec2')
        self.rds_client = boto3.client('rds')

    def get_daily_costs(self, days=30):
        """Get daily costs for the past N days, grouped by service."""
        end_date = datetime.now().date()
        start_date = end_date - timedelta(days=days)
        response = self.ce_client.get_cost_and_usage(
            TimePeriod={
                'Start': str(start_date),
                'End': str(end_date)
            },
            Granularity='DAILY',
            Metrics=['BlendedCost'],
            GroupBy=[
                {'Type': 'DIMENSION', 'Key': 'SERVICE'}
            ]
        )
        return response

    def find_underutilized_instances(self):
        """Find running EC2 instances with low CPU usage."""
        cloudwatch = boto3.client('cloudwatch')
        instances = []
        ec2_instances = self.ec2_client.describe_instances(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )
        for reservation in ec2_instances['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                # Check CPU utilization over the last 7 days
                response = cloudwatch.get_metric_statistics(
                    Namespace='AWS/EC2',
                    MetricName='CPUUtilization',
                    Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                    StartTime=datetime.now() - timedelta(days=7),
                    EndTime=datetime.now(),
                    Period=3600,
                    Statistics=['Average']
                )
                if response['Datapoints']:
                    avg_cpu = sum(d['Average'] for d in response['Datapoints']) / len(response['Datapoints'])
                    if avg_cpu < 10:  # Less than 10% average
                        instances.append({
                            'InstanceId': instance_id,
                            'Type': instance['InstanceType'],
                            'AverageCPU': avg_cpu,
                            'Recommendation': 'Downsize or terminate'
                        })
        return instances

    def estimate_reserved_instance_savings(self):
        """Estimate potential monthly savings from reserved instances."""
        response = self.ce_client.get_reservation_purchase_recommendation(
            Service='Amazon Elastic Compute Cloud - Compute',
            LookbackPeriodInDays='THIRTY_DAYS',
            PageSize=100
        )
        total_savings = 0.0
        for recommendation in response.get('Recommendations', []):
            summary = recommendation['RecommendationSummary']
            total_savings += float(summary['TotalEstimatedMonthlySavingsAmount'])
        return total_savings

    def generate_report(self):
        """Generate a comprehensive cost optimization report."""
        print("=== Cost Optimization Report ===\n")

        # Daily costs (with GroupBy set, amounts are reported per group, not in Total)
        print("Daily Costs:")
        costs = self.get_daily_costs(7)
        for result in costs['ResultsByTime']:
            date = result['TimePeriod']['Start']
            total = sum(
                float(g['Metrics']['BlendedCost']['Amount'])
                for g in result.get('Groups', [])
            )
            print(f"  {date}: ${total:.2f}")

        # Underutilized instances
        print("\nUnderutilized Instances:")
        underutilized = self.find_underutilized_instances()
        for instance in underutilized:
            print(f"  {instance['InstanceId']}: {instance['AverageCPU']:.1f}% CPU - {instance['Recommendation']}")

        # Reserved instance savings
        print("\nReserved Instance Savings Potential:")
        savings = self.estimate_reserved_instance_savings()
        print(f"  Estimated Monthly Savings: ${savings:.2f}")


# Usage
if __name__ == '__main__':
    optimizer = CostOptimizer()
    optimizer.generate_report()
```

## Cost Optimization Strategies

### ✅ DO

- Use reserved instances for baseline
- Leverage spot instances
- Right-size resources
- Monitor cost trends
- Implement auto-scaling
- Compare pricing across regions and availability zones
- Tag resources consistently
- Schedule non-essential resources (see the scheduling sketch after this list)
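
Scheduling non-essential resources can be as simple as stopping opted-in instances outside working hours. A minimal sketch follows, assuming a hypothetical `Schedule=office-hours` tag convention; run it in the evening (cron, EventBridge, or a CI scheduler) with a matching `start_instances` job in the morning, and keep production out of scope by never applying the tag there.

```python
# stop-off-hours.py - minimal sketch: stop running instances tagged for office hours
# (the Schedule=office-hours tag is an assumed convention, not an AWS feature)
import boto3

ec2 = boto3.client('ec2')

# Find running instances that opted in to scheduling via the tag
instance_ids = []
paginator = ec2.get_paginator('describe_instances')
for page in paginator.paginate(
    Filters=[
        {'Name': 'tag:Schedule', 'Values': ['office-hours']},
        {'Name': 'instance-state-name', 'Values': ['running']},
    ]
):
    for reservation in page['Reservations']:
        instance_ids.extend(i['InstanceId'] for i in reservation['Instances'])

if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
    print(f"Stopping {len(instance_ids)} instances: {instance_ids}")
else:
    print("No schedulable instances running")
```
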

### ❌ DON'T

- Over-provision resources
- Ignore unused resources
- Neglect cost monitoring
- Run all on-demand
- Forget to release EIPs
- Mix cost centers without allocation tags (see the cost-center sketch after this list)
- Ignore savings opportunities
- Deploy without budgets
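
To keep cost centers separable, break spend down by a cost-allocation tag. The sketch below assumes a `CostCenter` tag (as in the Terraform example above) that has been activated as a cost allocation tag in the billing console; untagged spend shows up under an empty tag value.

```python
# cost-by-center.py - minimal sketch: last 30 days of spend grouped by the CostCenter tag
# (assumes CostCenter is applied consistently and activated as a cost allocation tag)
import boto3
from datetime import date, timedelta

ce = boto3.client('ce')
end = date.today()
start = end - timedelta(days=30)

response = ce.get_cost_and_usage(
    TimePeriod={'Start': str(start), 'End': str(end)},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'TAG', 'Key': 'CostCenter'}],
)

for result in response['ResultsByTime']:
    for group in result['Groups']:
        tag_value = group['Keys'][0]  # e.g. "CostCenter$engineering"; "CostCenter$" = untagged
        amount = float(group['Metrics']['UnblendedCost']['Amount'])
        print(f"{tag_value}: ${amount:,.2f}")
```
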

## Cost Saving Opportunities

Typical savings compared with on-demand pricing; actual results vary by workload. A blended example follows the list.

- Reserved Instances: 40-70% savings
- Spot Instances: 70-90% savings
- Committed Use Discounts: 25-55% savings
- Right-sizing: 10-30% savings
- Resource cleanup: 5-20% savings
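
As a rough illustration of how these ranges combine, the sketch below blends them across a hypothetical spend profile; the baseline amount, coverage shares, and per-strategy rates are assumptions, not benchmarks.

```python
# blended-savings.py - hypothetical illustration of combining the ranges above
monthly_on_demand_baseline = 50_000  # assumed current monthly spend in USD

portfolio = [
    # (share of spend, assumed savings rate, strategy)
    (0.40, 0.45, "reserved instances"),
    (0.20, 0.75, "spot instances"),
    (0.25, 0.20, "right-sizing"),
    (0.05, 1.00, "resource cleanup (waste removed entirely)"),
    (0.10, 0.00, "left on-demand"),
]

total_savings = sum(
    monthly_on_demand_baseline * share * rate for share, rate, _ in portfolio
)
print(f"Estimated monthly savings: ${total_savings:,.0f} "
      f"({total_savings / monthly_on_demand_baseline:.0%} of baseline)")
# -> Estimated monthly savings: $21,500 (43% of baseline)
```
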