Terraform Best Practices 2025
Terraform is easy to start with but challenging to use well at scale. These practices separate teams that struggle with infrastructure drift and broken pipelines from teams that deploy confidently every day.
1. Use Remote State — Always
Local state (terraform.tfstate in your working directory) is fine for personal experiments. It has no place in team environments:
    # Good: S3 backend with locking
    terraform {
      backend "s3" {
        bucket         = "mycompany-terraform-state"
        key            = "services/auth-service/production/terraform.tfstate"
        region         = "us-east-1"
        dynamodb_table = "terraform-locks"
        encrypt        = true
      }
    }

Remote state enables:
- Team members to share and collaborate on the same state
- State locking (prevents concurrent applies)
- State versioning (recover from mistakes)
- Separation of state by environment and service
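
The versioning and locking benefits listed above depend on how the bucket and lock table themselves are provisioned. A minimal sketch, assuming the bucket and table names from the backend block above and a separate bootstrap configuration that owns them (the resource labels `state` and `locks` are hypothetical):

```hcl
# Bootstrap configuration (kept in its own state) for the backend's storage.
resource "aws_s3_bucket" "state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true # Guard against deleting every team's state
  }
}

# Versioning is what makes "recover from mistakes" possible.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a string partition key named LockID.
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Recent Terraform releases can also lock through S3 itself using the backend's use_lockfile argument, which may make the DynamoDB table unnecessary; check the S3 backend documentation for the version you run.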
2. Never Store Secrets in .tf Files
    # Wrong: password visible in git history forever
    resource "aws_db_instance" "main" {
      password = "mypassword123" # Never do this!
    }

    # Right: use environment variables
    resource "aws_db_instance" "main" {
      password = var.db_password # Supplied via TF_VAR_db_password env var
    }

    # Right: use AWS Secrets Manager or Vault
    data "aws_secretsmanager_secret_version" "db_password" {
      secret_id = "production/rds/master-password"
    }

    resource "aws_db_instance" "main" {
      password = data.aws_secretsmanager_secret_version.db_password.secret_string
    }

Mark sensitive variables with sensitive = true to prevent them from appearing in plan/apply output.
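
As a small illustration of the sensitive flag, the declaration behind var.db_password might look like the sketch below. The description text is an assumption; the variable name matches the TF_VAR_db_password example above.

```hcl
variable "db_password" {
  description = "Master password for the RDS instance" # assumed wording
  type        = string
  sensitive   = true # Redacted as (sensitive value) in plan and apply output
}
```

The flag only redacts CLI output. The value is still written to the state file, which is one more reason to keep state remote and encrypted.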
3. Lock Provider Versions
    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.50" # Allows 5.50 and later 5.x releases, but not 6.0
        }
      }

      required_version = ">= 1.6, < 2.0"
    }

Always commit .terraform.lock.hcl. Without version constraints, a provider update can silently break your configuration: a new major release such as aws v6.0 can introduce breaking changes overnight.
4. Use Modules for Everything Reused More Than Once
    # Instead of copy-pasting VPC config across 3 environments:
    module "vpc" {
      source  = "terraform-aws-modules/vpc/aws"
      version = "~> 5.0"

      name               = "${var.environment}-vpc"
      cidr               = var.vpc_cidr
      azs                = data.aws_availability_zones.available.names
      private_subnets    = var.private_subnet_cidrs
      public_subnets     = var.public_subnet_cidrs
      enable_nat_gateway = true
    }

Write modules for: VPC networking, ECS services, RDS clusters, Kubernetes namespaces, monitoring stacks. The module abstraction dramatically reduces duplication and ensures consistent configuration.
5. Separate State by Environment and Service
    state-bucket/
    ├── services/
    │   ├── auth-service/
    │   │   ├── dev/terraform.tfstate
    │   │   ├── staging/terraform.tfstate
    │   │   └── production/terraform.tfstate
    │   ├── payment-service/
    │   │   └── ...
    └── shared/
        ├── networking/terraform.tfstate   # VPCs, DNS
        └── monitoring/terraform.tfstate   # CloudWatch, Grafana

Never share a state file between multiple environments. A terraform destroy in dev should have zero risk of affecting production.
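
One way to get this layout without duplicating the backend block per environment is partial backend configuration, where the state key is supplied at init time. A sketch under stated assumptions: the bucket, region, and table come from the earlier backend example, and the file name production.s3.tfbackend is hypothetical.

```hcl
# backend.tf: settings shared by every environment. The key is left out
# on purpose and supplied per environment when running terraform init.
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# Contents of production.s3.tfbackend (hypothetical file name):
#   key = "services/auth-service/production/terraform.tfstate"
#
# Selected at init time with:
#   terraform init -backend-config=production.s3.tfbackend
```

Each environment then gets its own small backend file, and no single state key is ever shared between dev and production.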
6. Use for_each Over count for Named Resources
    # count: creates indexed resources (fragile: removing a middle element shifts every index after it)
    resource "aws_iam_user" "devs" {
      count = 3
      name  = "dev-user-${count.index}" # Bad: dev-user-0, dev-user-1, dev-user-2
    }

    # for_each: creates named resources (stable: removing one user removes only that user)
    resource "aws_iam_user" "devs" {
      for_each = toset(["alice", "bob", "carol"])
      name     = each.key
    }

When count indexes into a list of names, removing “bob” from the middle shifts all subsequent indices, so Terraform modifies or destroys and recreates the wrong resources. for_each with named keys is the safer choice for production resources.
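
Switching an existing resource from count to for_each changes every resource address, which Terraform would otherwise plan as destroy and create. A sketch of how moved blocks (Terraform 1.1 and later) can carry the state over instead, assuming the users were previously created with count over the same names in this order; the index-to-name mapping is illustrative:

```hcl
resource "aws_iam_user" "devs" {
  for_each = toset(["alice", "bob", "carol"])
  name     = each.key
}

# Tell Terraform that each old indexed address is the same object
# as the new keyed address, so nothing is destroyed.
moved {
  from = aws_iam_user.devs[0]
  to   = aws_iam_user.devs["alice"]
}

moved {
  from = aws_iam_user.devs[1]
  to   = aws_iam_user.devs["bob"]
}

moved {
  from = aws_iam_user.devs[2]
  to   = aws_iam_user.devs["carol"]
}
```

After one successful apply the plan should show the addresses as moved rather than replaced, and the moved blocks can be removed in a later change.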
7. Always Review terraform plan Before Applying
    # Required workflow: never skip the plan review
    terraform plan -out=tfplan
    # Review output: look for unexpected destroys, replacements, or changes
    terraform apply tfplan

A -/+ (destroy and replace) on a database is catastrophic. A plan review catches it before it happens. In CI/CD, post the plan output as a PR comment so the whole team reviews it.
8. Tag Every Resource
    locals {
      mandatory_tags = {
        Environment = var.environment
        Service     = var.service_name
        Team        = var.team_name
        ManagedBy   = "terraform"
        CostCenter  = var.cost_center
      }
    }

    resource "aws_instance" "app" {
      # ...
      tags = merge(local.mandatory_tags, {
        Name = "${var.service_name}-app-server"
      })
    }

Tags enable cost allocation, security compliance, and operational visibility. Enforce them via a module that merges mandatory tags automatically.
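
An alternative to merging tags in every resource is the AWS provider's default_tags block, which applies a tag map to the taggable resources that provider manages. This is a different mechanism from the module approach described above, shown here as a sketch that reuses local.mandatory_tags; the region is an assumption.

```hcl
provider "aws" {
  region = "us-east-1" # assumed region

  # Applied automatically to taggable resources created through this provider.
  default_tags {
    tags = local.mandatory_tags
  }
}
```

With this in place, individual resources only need their resource-specific tags such as Name. A few resource types handle tags differently, so it is worth verifying coverage in a plan before relying on it for compliance.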
9. Use Data Sources for External References
    # Don't hardcode AMI IDs: they change across regions and become stale
    # Wrong:
    resource "aws_instance" "app" {
      ami = "ami-0abcdef1234567890" # Region-specific, will be outdated
    }

    # Right: data source fetches the current latest AMI
    data "aws_ami" "amazon_linux_2023" {
      most_recent = true
      owners      = ["amazon"]

      filter {
        name   = "name"
        values = ["al2023-ami-2023.*-x86_64"] # Narrow pattern: skips the minimal and ECS image variants
      }
    }

    resource "aws_instance" "app" {
      ami = data.aws_ami.amazon_linux_2023.id
    }

10. Protect Stateful Resources with Lifecycle Rules
    resource "aws_db_instance" "production" {
      # ...
      lifecycle {
        prevent_destroy       = true       # terraform destroy will error
        ignore_changes        = [password] # Password rotated externally
        create_before_destroy = false      # Don't try to create a new DB before destroying
      }
    }

11. Structure Terraform with CI/CD from Day One
    # PR: plan, validate, cost estimate
    # Merge to main: apply automatically

    name: Terraform
    on:
      pull_request:  → terraform plan (post to PR)
      push to main:  → terraform apply (saved plan from PR)

Manual applies to production should require explicit approval gates, not just “someone SSHed into a box and ran it.”
12. Regularly Run terraform plan on Unchanged Infrastructure
Schedule a weekly terraform plan to detect drift — resources changed outside Terraform by humans or other automation. Drift compounds over time and makes the next planned change unpredictable.
    # Cron job or scheduled pipeline
    terraform plan -detailed-exitcode
    # Exit code 0: no changes
    # Exit code 1: error
    # Exit code 2: changes detected (send alert)