Terraform Best Practices 2025

Terraform is easy to start with but challenging to use well at scale. These practices separate teams that struggle with infrastructure drift and broken pipelines from teams that deploy confidently every day.

1. Use Remote State — Always

Local state (terraform.tfstate in your working directory) is fine for personal experiments. It has no place in team environments, where every configuration should declare a remote backend:

# Good: S3 backend with locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "services/auth-service/production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"   # DynamoDB lock table; Terraform 1.10+ can lock natively with use_lockfile = true instead
    encrypt        = true
  }
}

Remote state enables:

Team members to share and collaborate on the same state
State locking (prevents concurrent applies)
State versioning (recover from mistakes, provided versioning is enabled on the state bucket)
Separation of state by environment and service
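
State versioning with the S3 backend depends on versioning being switched on for the bucket itself. The following is a minimal sketch of that setup, assuming the bucket is managed in a separate bootstrap configuration; the resource labels are illustrative, and the bucket name is taken from the backend block above.

```hcl
# Assumption: this lives in its own bootstrap configuration, not in the
# configuration whose state the bucket stores.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true # Guard against deleting the bucket that holds all state
  }
}

# Keep every previous version of each state object so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```

With versioning enabled, an earlier copy of terraform.tfstate can be restored from the bucket's object history if a state file is corrupted or overwritten.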

2. Never Store Secrets in .tf Files

# Wrong — password visible in git history forever
resource "aws_db_instance" "main" {
  password = "mypassword123"   # Never do this!
}

# Right — use environment variables
resource "aws_db_instance" "main" {
  password = var.db_password   # Supplied via TF_VAR_db_password env var
}

# Right — use AWS Secrets Manager or Vault
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/rds/master-password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

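Mark sensitive variables with sensitive = true to keep them out of plan/apply output. This only redacts what the CLI prints: the values are still written to the state file in plain text, which is one more reason to encrypt remote state and restrict access to it.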
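
As a small illustration, the declaration behind var.db_password might look like the sketch below. The description text is an assumption; the variable name matches the example above.

```hcl
variable "db_password" {
  description = "Master password for the RDS instance"
  type        = string
  sensitive   = true # Shown as (sensitive value) in plan and apply output
}
```

No default is set, so Terraform expects the value at run time, for example from the TF_VAR_db_password environment variable in the shell or CI job.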

3. Lock Provider Versions

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50"    # Allows 5.50 and any later 5.x release, but not 6.0
    }
  }
  required_version = ">= 1.6, < 2.0"
}

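Always commit .terraform.lock.hcl, which records the exact provider versions and checksums selected by terraform init so that every workstation and CI run installs the same builds. Without version constraints, a provider update can silently break your configuration: a new major release of the aws provider could introduce breaking changes overnight.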
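
How much the ~> operator allows depends on how many version components are written. The sketch below contrasts a patch-only constraint with an explicit range; the random provider is included purely as an illustration and is not part of the original configuration.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50.0" # Patch releases only: >= 5.50.0 and < 5.51.0
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.5, < 4.0" # Explicit range, equivalent to "~> 3.5"
    }
  }
}
```

A tighter constraint trades fewer surprises for more frequent manual bumps. Either way, the lock file decides which exact version is installed until terraform init -upgrade is run.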

4. Use Modules for Everything Reused More Than Once

# Instead of copy-pasting VPC config across 3 environments:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name               = "${var.environment}-vpc"
  cidr               = var.vpc_cidr
  azs                = data.aws_availability_zones.available.names
  private_subnets    = var.private_subnet_cidrs
  public_subnets     = var.public_subnet_cidrs
  enable_nat_gateway = true
}

Good candidates for modules include VPC networking, ECS services, RDS clusters, Kubernetes namespaces, and monitoring stacks. A module gives each of these a single definition, so a fix or policy change reaches every environment through a version bump rather than through repeated copy-paste edits.
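
The same pattern applies to in-house modules. The sketch below calls a hypothetical local module; the ./modules/ecs-service path and all of its input names are assumptions made for illustration, not an existing module.

```hcl
# Hypothetical in-house module: the path and input names are assumptions.
module "auth_service" {
  source = "./modules/ecs-service"

  service_name  = "auth-service"
  environment   = var.environment
  desired_count = var.environment == "production" ? 3 : 1
  subnet_ids    = module.vpc.private_subnets # Output of the VPC module above
}
```

Each environment calls the module with different inputs, while the task definition, IAM roles, and logging setup inside it stay identical.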

5. Separate State by Environment and Service

state-bucket/
├── services/
│   ├── auth-service/
│   │   ├── dev/terraform.tfstate
│   │   ├── staging/terraform.tfstate
│   │   └── production/terraform.tfstate
│   ├── payment-service/
│   │   └── ...
└── shared/
    ├── networking/terraform.tfstate   # VPCs, DNS
    └── monitoring/terraform.tfstate   # CloudWatch, Grafana

Never share a state file between multiple environments. A terraform destroy in dev should have zero risk of affecting production.
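
Separate state files can still share information. One common approach, sketched here, is for a service configuration to read the outputs of the shared networking state through the terraform_remote_state data source. The bucket, key, and region follow the layout above; the private_subnet_ids output name is an assumption about what the networking configuration exports.

```hcl
# Read-only view of the shared networking state; this configuration cannot modify it.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "shared/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Assumption: the networking configuration declares an output named private_subnet_ids.
locals {
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```

Only root-level outputs of the other configuration are visible this way, which keeps the contract between the two states explicit.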

6. Use for_each Over count for Named Resources

# count: creates indexed resources (fragile: removing a middle element shifts every index after it)
resource "aws_iam_user" "devs" {
  count = length(var.dev_names)       # e.g. ["alice", "bob", "carol"]
  name  = var.dev_names[count.index]  # Bad: each user's address depends on list position
}

# for_each: creates named resources (stable: removing one user removes only that user)
resource "aws_iam_user" "devs" {
  for_each = toset(["alice", "bob", "carol"])
  name     = each.key
}

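With count, removing “bob” from the middle shifts every later index, so Terraform plans changes to users that were never touched: the resource at index 1 is renamed or replaced to become carol, and the resource at index 2 is destroyed. With for_each, each user is addressed by key (aws_iam_user.devs["bob"]), so only that instance is removed. Prefer for_each for any production resource that has a natural name.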
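
Switching an existing resource from count to for_each changes every instance address, which Terraform would otherwise plan as destroy and create. A minimal sketch of avoiding that with moved blocks (available since Terraform 1.1), assuming indices 0 to 2 currently correspond to alice, bob, and carol:

```hcl
# Map each old index-based address to its new key-based address.
# Assumption: the existing users were created in this order.
moved {
  from = aws_iam_user.devs[0]
  to   = aws_iam_user.devs["alice"]
}

moved {
  from = aws_iam_user.devs[1]
  to   = aws_iam_user.devs["bob"]
}

moved {
  from = aws_iam_user.devs[2]
  to   = aws_iam_user.devs["carol"]
}
```

After the next apply the plan should report the instances as moved rather than replaced, and the moved blocks can be removed once every environment has been migrated.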

7. Always Review terraform plan Before Applying

# Required workflow — never skip the plan review
terraform plan -out=tfplan
# Review output — look for unexpected destroys, replacements, or changes
terraform apply tfplan

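A -/+ (destroy, then create a replacement) on a database can mean losing its data. Reviewing the plan catches it before it happens, and applying the saved tfplan file ensures that Terraform performs exactly the actions that were reviewed. In CI/CD, post the plan output as a PR comment so the whole team can review it.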

8. Tag Every Resource

locals {
  mandatory_tags = {
    Environment = var.environment
    Service     = var.service_name
    Team        = var.team_name
    ManagedBy   = "terraform"
    CostCenter  = var.cost_center
  }
}

resource "aws_instance" "app" {
  # ...
  tags = merge(local.mandatory_tags, {
    Name = "${var.service_name}-app-server"
  })
}

Tags enable cost allocation, security compliance, and operational visibility. Enforce them via a module that merges mandatory tags automatically.
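
For AWS specifically, the provider's default_tags block is another way to apply mandatory tags without touching each resource. This is a sketch of that alternative rather than the module approach described above, and it assumes local.mandatory_tags is the map defined earlier.

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource created through this provider.
  # Resource-level tags are merged on top and win on key conflicts.
  default_tags {
    tags = local.mandatory_tags
  }
}
```

Individual resources then only declare the tags that are specific to them, such as Name.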

9. Use Data Sources for External References

# Don't hardcode AMI IDs — they change across regions and become stale
# Wrong:
resource "aws_instance" "app" {
  ami = "ami-0abcdef1234567890"  # Region-specific, will be outdated
}

# Right — data source fetches the current latest AMI
data "aws_ami" "amazon_linux_2023" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]  # Standard AL2023 images only; excludes minimal and ECS-optimized variants
  }
}

resource "aws_instance" "app" {
  ami = data.aws_ami.amazon_linux_2023.id
}

10. Protect Stateful Resources with Lifecycle Rules

resource "aws_db_instance" "production" {
  # ...
  lifecycle {
    prevent_destroy       = true        # Any plan that would destroy this resource fails with an error
    ignore_changes        = [password]  # Password rotated externally
    create_before_destroy = false       # The default: never stand up a second DB before destroying the first
  }
}

11. Structure Terraform with CI/CD from Day One

# PR: plan, validate, cost estimate
# Merge to main: apply automatically

name: Terraform
on:
  pull_request:        # terraform plan, output posted to the PR
  push:
    branches: [main]   # terraform apply, using the plan saved from the PR

Manual applies to production should require explicit approval gates, not just “someone SSHed into a box and ran it.”

12. Regularly Run terraform plan on Unchanged Infrastructure

Schedule a weekly terraform plan to detect drift — resources changed outside Terraform by humans or other automation. Drift compounds over time and makes the next planned change unpredictable.

# Cron job or scheduled pipeline
terraform plan -detailed-exitcode
# Exit code 0: no changes
# Exit code 1: error
# Exit code 2: changes detected (send alert)