| name | terraform-best-practices |
| description | Terraform infrastructure-as-code best practices for scalable and maintainable cloud infrastructure. Use when writing Terraform modules, managing infrastructure state, or implementing infrastructure automation at scale. |
Terraform Best Practices
Expert guidance for building production-grade Terraform infrastructure with enterprise patterns for module design, state management, security, testing, and multi-environment deployments.
When to Use This Skill
- Writing reusable Terraform modules for teams or organizations
- Setting up secure remote state management and backend configuration
- Designing multi-environment infrastructure (dev/staging/prod)
- Implementing infrastructure CI/CD pipelines with automated validation
- Managing infrastructure at scale across multiple teams or projects
- Migrating from manual infrastructure to infrastructure-as-code
- Refactoring existing Terraform for better maintainability
- Implementing security best practices for infrastructure code
Module Design Patterns
1. Module Structure
Standard module layout:
terraform-aws-vpc/
├── main.tf # Primary resource definitions
├── variables.tf # Input variables
├── outputs.tf # Output values
├── versions.tf # Provider and Terraform version constraints
├── README.md # Documentation with examples
├── examples/
│ ├── simple/ # Minimal example
│ └── complete/ # Full-featured example
└── tests/ # Terratest or validation tests
2. Composition over Monoliths
Child modules for reusability:
# Root module orchestrates child modules
module "vpc" {
source = "./modules/vpc"
cidr_block = var.vpc_cidr
environment = var.environment
}
module "eks_cluster" {
source = "./modules/eks"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
cluster_name = "${var.environment}-cluster"
}
# Benefits: Testable, reusable, maintainable
3. Variable Design
Type constraints and validation:
variable "environment" {
description = "Environment name (dev, staging, prod)"
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
variable "instance_config" {
description = "Instance configuration"
type = object({
instance_type = string
count = number
tags = map(string)
})
default = {
instance_type = "t3.medium"
count = 2
tags = {}
}
}
# Use complex types for structured configuration
4. Output Organization
Well-structured outputs:
output "vpc_id" {
description = "VPC identifier"
value = aws_vpc.main.id
}
output "private_subnet_ids" {
description = "Private subnet identifiers for workload placement"
value = aws_subnet.private[*].id
}
output "connection_info" {
description = "Database connection information"
value = {
endpoint = aws_db_instance.main.endpoint
port = aws_db_instance.main.port
}
sensitive = true # Mark sensitive outputs
}
5. Dynamic Blocks for Flexibility
resource "aws_security_group" "main" {
name = "${var.environment}-sg"
vpc_id = var.vpc_id
dynamic "ingress" {
for_each = var.ingress_rules
content {
from_port = ingress.value.from_port
to_port = ingress.value.to_port
protocol = ingress.value.protocol
cidr_blocks = ingress.value.cidr_blocks
}
}
}
# Enables flexible configuration without code duplication
State Management Best Practices
1. Remote Backend Configuration
S3 with DynamoDB locking:
# backend.tf
terraform {
backend "s3" {
bucket = "company-terraform-state"
key = "projects/myapp/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
# Enable versioning on S3 bucket for state history
# Enable encryption at rest with KMS
}
}
# Required AWS resources (separate bootstrap):
# - S3 bucket with versioning enabled
# - S3 bucket encryption with KMS
# - DynamoDB table with LockID primary key
# - IAM policies for terraform execution role
Terraform Cloud backend:
terraform {
cloud {
organization = "company-name"
workspaces {
name = "myapp-production"
# OR tags = ["myapp", "production"] for dynamic workspaces
}
}
}
# Benefits: Built-in state locking, versioning, collaboration
# Remote execution, policy as code, cost estimation
2. State File Security
# Never commit state files to version control
# .gitignore
*.tfstate
*.tfstate.*
.terraform/
.terraform.lock.hcl
# Encrypt state at rest (S3 KMS encryption)
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.terraform.id
}
}
}
# Restrict state bucket access with strict IAM policies
# Enable MFA delete for production state buckets
3. State Operations
# Import existing resources
terraform import aws_instance.example i-1234567890abcdef0
# Move resources between modules
terraform state mv aws_instance.old aws_instance.new
# Remove resources from state (doesn't destroy)
terraform state rm aws_instance.example
# Refresh state from actual infrastructure
terraform refresh
# List all resources in state
terraform state list
# Show specific resource details
terraform state show aws_instance.example
4. Workspace Strategies
When to use workspaces:
# Same code, different state (dev/staging/prod)
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
# Access workspace name in code
resource "aws_instance" "example" {
tags = {
Environment = terraform.workspace
}
}
# Limitations:
# - All workspaces share same backend configuration
# - Cannot have different provider settings per workspace
# - Better for similar environments, not vastly different ones
Directory-based environments (preferred for production):
project/
├── modules/ # Shared modules
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ └── prod/
│ ├── main.tf
│ ├── backend.tf
│ └── terraform.tfvars
# Benefits: Complete isolation, different backends,
# environment-specific configurations
Workspace & Environment Management
1. Variable Precedence
# Terraform variable precedence (highest to lowest):
# 1. -var or -var-file CLI flags
# 2. *.auto.tfvars or *.auto.tfvars.json (alphabetical)
# 3. terraform.tfvars or terraform.tfvars.json
# 4. Environment variables (TF_VAR_name)
# Example usage:
terraform plan -var="environment=prod" -var-file="prod.tfvars"
# Environment variables
export TF_VAR_region="us-west-2"
export TF_VAR_instance_count=5
2. Environment Configuration
Separate tfvars per environment:
# environments/dev/terraform.tfvars
environment = "dev"
instance_type = "t3.small"
instance_count = 1
enable_monitoring = false
# environments/prod/terraform.tfvars
environment = "prod"
instance_type = "m5.large"
instance_count = 3
enable_monitoring = true
enable_backups = true
3. Terragrunt for DRY Configuration
# terragrunt.hcl (root)
remote_state {
backend = "s3"
config = {
bucket = "company-terraform-state"
key = "${path_relative_to_include()}/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
# environments/prod/vpc/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
terraform {
source = "../../../modules/vpc"
}
inputs = {
environment = "prod"
cidr_block = "10.0.0.0/16"
}
# Benefits: DRY backend config, dependency management,
# automatic remote state handling
Security Best Practices
1. Sensitive Variable Management
variable "database_password" {
description = "Database master password"
type = string
sensitive = true # Prevents output in logs
}
# Use external secret management
data "aws_secretsmanager_secret_version" "db_password" {
secret_id = "prod/db/password"
}
resource "aws_db_instance" "main" {
password = data.aws_secretsmanager_secret_version.db_password.secret_string
# Never hardcode secrets in code
# Use AWS Secrets Manager, HashiCorp Vault, etc.
}
2. State Encryption
# Enable encryption in backend configuration
terraform {
backend "s3" {
encrypt = true # Client-side encryption
kms_key_id = "arn:aws:kms:region:account:key/id"
}
}
# S3 bucket encryption at rest
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.terraform.arn
}
bucket_key_enabled = true
}
}
3. IAM and Access Control
# Principle of least privilege for Terraform execution
data "aws_iam_policy_document" "terraform_execution" {
statement {
actions = [
"ec2:*",
"s3:*",
"rds:*"
]
resources = ["*"]
condition {
test = "StringEquals"
variable = "aws:RequestedRegion"
values = ["us-east-1", "us-west-2"]
}
}
}
# Separate IAM roles for different environments
# terraform-dev, terraform-staging, terraform-prod
4. Security Scanning
# tfsec - Static analysis security scanner
tfsec .
# Checkov - Policy-as-code scanner
checkov -d .
# Terrascan - Compliance and security scanner
terrascan scan
# Integrate in CI/CD pipeline
# Fail builds on critical security issues
5. Resource Tagging
locals {
common_tags = {
Environment = var.environment
ManagedBy = "Terraform"
Project = var.project_name
Owner = var.team_email
CostCenter = var.cost_center
}
}
resource "aws_instance" "example" {
ami = var.ami_id
instance_type = var.instance_type
tags = merge(
local.common_tags,
{
Name = "${var.environment}-web-server"
Role = "web"
}
)
}
# Enables cost tracking, ownership, compliance
Testing & Validation
1. Pre-Commit Validation
# .pre-commit-config.yaml
repos:
- repo: https://github.com/antonbabenko/pre-commit-terraform
hooks:
- id: terraform_fmt
- id: terraform_validate
- id: terraform_docs
- id: terraform_tflint
- id: terraform_tfsec
# Ensures code quality before commits
2. Terraform Validate & Plan
# Always validate before planning
terraform init
terraform validate
# Review plan output thoroughly
terraform plan -out=tfplan
# Save and review plans before applying
terraform show tfplan
# Apply only after approval
terraform apply tfplan
3. Automated Testing with Terratest
// test/vpc_test.go
func TestVPCCreation(t *testing.T) {
terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
TerraformDir: "../examples/simple",
Vars: map[string]interface{}{
"environment": "test",
"cidr_block": "10.0.0.0/16",
},
})
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
vpcID := terraform.Output(t, terraformOptions, "vpc_id")
assert.NotEmpty(t, vpcID)
}
4. Policy as Code
# policy/deny_public_s3_buckets.rego
package terraform.s3
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket"
resource.change.after.acl == "public-read"
msg := sprintf("S3 bucket '%s' has public ACL", [resource.name])
}
# Use Open Policy Agent (OPA) or Sentinel
# Enforce policies in CI/CD pipeline
Best Practices Summary
Code Organization
- Modular design: Break code into reusable modules
- Consistent structure: Follow standard file layouts
- Clear naming: Use descriptive resource and variable names
- DRY principles: Avoid duplication with modules and locals
State Management
- Remote backends: Always use remote state for teams
- State encryption: Enable encryption at rest and in transit
- State locking: Prevent concurrent modifications
- Backup strategy: Enable versioning on state storage
Security
- Sensitive data: Use secret management, never hardcode
- IAM policies: Principle of least privilege
- Security scanning: Integrate tools in CI/CD
- Resource tagging: Enable tracking and compliance
Quality & Testing
- Validation: Run terraform validate in CI/CD
- Static analysis: Use tfsec, checkov, terrascan
- Automated tests: Write Terratest for critical modules
- Code review: Peer review all infrastructure changes
Deployment
- Plan before apply: Always review execution plans
- Incremental changes: Small, frequent updates over large batches
- Rollback strategy: Maintain previous state versions
- Change tracking: Git history for all infrastructure code
Documentation
- README files: Document module usage with examples
- Variable descriptions: Clear, comprehensive descriptions
- Output documentation: Explain output values and usage
- Architecture diagrams: Visual representation of infrastructure
Version Management
- Provider constraints: Pin major versions, allow minor updates
- Module versions: Use semantic versioning for modules
- Terraform version: Specify minimum required version
- Dependency locking: Commit .terraform.lock.hcl
Performance
- Resource parallelism: Use -parallelism flag for large infrastructures
- Targeted operations: Use -target for specific resources when needed
- State optimization: Keep state size manageable, split large projects
- Provider caching: Use plugin cache directory
Resources
- Official Docs: https://developer.hashicorp.com/terraform/docs
- Style Guide: https://developer.hashicorp.com/terraform/language/syntax/style
- Module Registry: https://registry.terraform.io/
- Terragrunt: https://terragrunt.gruntwork.io/
- Terratest: https://terratest.gruntwork.io/
- tfsec: https://aquasecurity.github.io/tfsec/
- Checkov: https://www.checkov.io/
- Best Practices: https://www.terraform-best-practices.com/
- AWS Provider: https://registry.terraform.io/providers/hashicorp/aws/latest/docs