Terraform Best Practices (Part 1) - Project Structure, State, and Modules
Terraform is deceptively simple. Write some HCL, run terraform apply, infrastructure appears. But that simplicity hides complexity that only emerges at scale - when you have multiple environments, dozens of team members, and hundreds of resources.
This two-part series covers Terraform best practices learned from managing infrastructure across startups and enterprises. Part 1 focuses on foundations: project structure, state management, and module design. Part 2 covers advanced topics: testing, CI/CD, security, and team workflows.
TL;DR
- Use a consistent directory structure that scales with your team
- Remote state with locking is non-negotiable for teams
- Design modules for reusability, not just organisation
- Use workspaces sparingly - prefer directory separation for environments
- Lock provider versions and use dependency lock files
Code Repository: All code from this post is available at github.com/moabukar/blog-code/terraform-best-practices-part-1
Project Structure
How you organise Terraform files matters more as your infrastructure grows. There’s no single “correct” structure, but some patterns work better than others.
Small Projects: Flat Structure
For small projects or learning, a flat structure works:
```text
terraform/
├── main.tf           # Resources
├── variables.tf      # Input variables
├── outputs.tf        # Output values
├── providers.tf      # Provider configuration
├── terraform.tfvars  # Variable values
└── versions.tf       # Terraform and provider versions
```
When to use: Personal projects, small teams, single environment.
Limitations: Doesn’t scale. Everything in one state file becomes slow and risky.
Medium Projects: Environment Directories
Separate directories per environment:
```text
terraform/
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── backend.tf
└── README.md
```
Why this works:
- Each environment has its own state file
- Shared modules reduce duplication
- Clear separation of concerns
- Easy to understand what changes affect which environment
How it works:
```hcl
# environments/prod/main.tf
module "networking" {
  source      = "../../modules/networking"
  environment = "prod"
  vpc_cidr    = var.vpc_cidr
}

module "compute" {
  source        = "../../modules/compute"
  environment   = "prod"
  vpc_id        = module.networking.vpc_id
  subnet_ids    = module.networking.private_subnet_ids
  instance_type = var.instance_type
}
```
Large Projects: Component-Based Structure
For large organisations, split by component/service:
```text
infrastructure/
├── _modules/              # Shared modules
│   ├── vpc/
│   ├── eks-cluster/
│   ├── rds-instance/
│   └── s3-bucket/
├── networking/            # Network team owns this
│   ├── vpc-main/
│   │   ├── dev/
│   │   ├── staging/
│   │   └── prod/
│   └── transit-gateway/
├── platform/              # Platform team owns this
│   ├── eks-cluster/
│   │   ├── dev/
│   │   ├── staging/
│   │   └── prod/
│   └── shared-services/
├── data/                  # Data team owns this
│   ├── data-lake/
│   └── analytics-cluster/
└── applications/          # App teams own these
    ├── api-service/
    ├── web-frontend/
    └── worker-service/
```
Why this works:
- Team ownership is clear
- Components can evolve independently
- Blast radius is limited (one component’s state doesn’t affect others)
- Different teams can have different deployment cadences
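Components still need to reference each other's resources. One common way to do that without sharing a state file is the `terraform_remote_state` data source. A minimal sketch, assuming the networking component exports `vpc_id` and `private_subnet_ids` as outputs and that the bucket and key names follow the layout above (both are assumptions):

```hcl
# In the platform component: read the networking component's outputs
# from its remote state (bucket/key names are illustrative assumptions)
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "networking/vpc-main/prod/terraform.tfstate"
    region = "eu-west-1"
  }
}

module "eks_cluster" {
  source     = "../../_modules/eks-cluster"
  vpc_id     = data.terraform_remote_state.networking.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```

This keeps the coupling explicit and read-only: the platform component can consume networking outputs but can never mutate networking state.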
State Management
Terraform state is the source of truth for what exists in your infrastructure. Get this wrong and you’ll face:
- State corruption
- Race conditions with concurrent applies
- Lost resources (Terraform thinks they don’t exist)
- Security breaches (state contains secrets)
Remote State: Non-Negotiable
Never use local state for team projects:
```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "networking/vpc/prod/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```
Why remote state matters:
- Collaboration - Team members can work on the same infrastructure
- Locking - Prevents concurrent modifications
- Security - State can be encrypted and access-controlled
- Durability - S3 is more reliable than your laptop
State Locking
Without locking, two people running terraform apply simultaneously can corrupt state:
```hcl
# DynamoDB table for state locking
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Purpose = "Terraform state locking"
  }
}
```
When someone runs terraform apply, they acquire a lock:
```text
Acquiring state lock. This may take a few moments...
```
If someone else tries to apply:
```text
Error: Error acquiring the state lock

Lock Info:
  ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
  Path:      mycompany-terraform-state/networking/vpc/prod/terraform.tfstate
  Operation: OperationTypeApply
  Who:       alice@mycompany.com
  Created:   2025-09-20 10:30:00 UTC
```
S3 Native State Locking (Terraform 1.10+)
As of Terraform 1.10, S3 supports native state locking without DynamoDB. This uses S3’s conditional writes feature, eliminating the need for a separate DynamoDB table.
```hcl
# backend.tf - S3 native locking (no DynamoDB needed)
terraform {
  backend "s3" {
    bucket       = "mycompany-terraform-state"
    key          = "networking/vpc/prod/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true # Enable S3 native locking
  }
}
```
How it works:
S3 native locking creates a .tflock file alongside your state file:
```text
s3://mycompany-terraform-state/
├── networking/vpc/prod/terraform.tfstate
└── networking/vpc/prod/terraform.tfstate.tflock  # Lock file
```
The lock file contains metadata about who holds the lock:
```json
{
  "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "Operation": "OperationTypeApply",
  "Who": "alice@mycompany.com",
  "Created": "2025-09-20T10:30:00Z"
}
```
When to use which:
| Approach | Pros | Cons |
|---|---|---|
| S3 Native | Simpler setup, no DynamoDB costs, fewer resources to manage | Requires Terraform 1.10+, newer feature |
| DynamoDB | Battle-tested, works with older Terraform versions | Extra resource to manage, small cost |
Migration from DynamoDB to S3 native:
```hcl
# Step 1: Update the backend config
terraform {
  backend "s3" {
    bucket       = "mycompany-terraform-state"
    key          = "networking/vpc/prod/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true                  # Add this
    # dynamodb_table = "terraform-locks" # Remove this
  }
}

# Step 2: Run terraform init -reconfigure
```
For new projects on Terraform 1.10+, prefer S3 native locking for simplicity.
State File Organisation
One state file per component per environment:
```text
s3://terraform-state/
├── networking/
│   ├── vpc/
│   │   ├── dev/terraform.tfstate
│   │   ├── staging/terraform.tfstate
│   │   └── prod/terraform.tfstate
│   └── transit-gateway/
│       └── prod/terraform.tfstate
├── platform/
│   ├── eks/
│   │   ├── dev/terraform.tfstate
│   │   ├── staging/terraform.tfstate
│   │   └── prod/terraform.tfstate
│   └── shared-services/
│       └── prod/terraform.tfstate
└── applications/
    └── api/
        ├── dev/terraform.tfstate
        ├── staging/terraform.tfstate
        └── prod/terraform.tfstate
```
Why separate state files:
- Blast radius - A mistake in the API service state doesn’t affect networking
- Performance - Smaller states = faster plans
- Permissions - Different teams can have access to different states
- Parallelism - Teams can apply simultaneously to different components
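The permissions point can be enforced at the bucket level: scope each team's IAM access to its own key prefix. A sketch, assuming the bucket layout above and a hypothetical data-team policy (the policy name and team scoping are illustrative):

```hcl
# Hypothetical policy: the data team can only read/write state under data/
resource "aws_iam_policy" "data_team_state" {
  name = "terraform-state-data-team"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = "arn:aws:s3:::terraform-state/data/*"
      },
      {
        Effect   = "Allow"
        Action   = "s3:ListBucket"
        Resource = "arn:aws:s3:::terraform-state"
      }
    ]
  })
}
```

With per-prefix policies, a leaked credential from one team cannot read another team's state (and the secrets inside it).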
State File Security
State files contain sensitive data (passwords, keys, tokens):
```hcl
# This ends up in the state file in plain text!
resource "aws_db_instance" "main" {
  password = var.db_password # Stored in state
}
```
Security measures:
```hcl
# 1. Encrypt state at rest
terraform {
  backend "s3" {
    encrypt = true # SSE-S3 encryption
    # Or use KMS
    kms_key_id = "arn:aws:kms:eu-west-1:123456789:key/abc-123"
  }
}

# 2. Restrict state bucket access
resource "aws_s3_bucket_policy" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          aws_s3_bucket.terraform_state.arn,
          "${aws_s3_bucket.terraform_state.arn}/*"
        ]
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      }
    ]
  })
}

# 3. Enable versioning for recovery
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```
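A further hardening step worth considering is blocking all public access on the state bucket, so a misconfigured ACL or bucket policy can never expose state. A sketch, assuming the same `aws_s3_bucket.terraform_state` resource as above:

```hcl
# 4. Block all public access to the state bucket
resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true # Reject new public ACLs
  block_public_policy     = true # Reject public bucket policies
  ignore_public_acls      = true # Ignore any existing public ACLs
  restrict_public_buckets = true # Restrict access if a public policy exists
}
```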
Workspaces: Use Sparingly
Terraform workspaces allow multiple states from the same configuration:
```bash
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

terraform workspace select prod
terraform apply
```
The appeal: Less duplication - one set of .tf files for all environments.
The problems:
- All environments share code - you can't have different resources in prod vs dev
- Harder to review - PRs don't show which environment a change affects
- Easy to apply to the wrong environment - nothing stops a terraform apply without first checking the current workspace
- State paths are less clear - everything lives under one bucket path with a workspace suffix
When workspaces make sense:
- Identical ephemeral environments (PR preview environments)
- True multi-tenancy where each tenant is identical
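In those cases, the `terraform.workspace` value lets one configuration derive per-workspace names. A sketch for a PR-preview setup, where the naming scheme and bucket resource are illustrative assumptions:

```hcl
locals {
  # terraform.workspace is "default", "pr-123", etc.,
  # depending on the selected workspace
  env_name = terraform.workspace
}

resource "aws_s3_bucket" "preview_assets" {
  # Hypothetical naming scheme for per-PR preview buckets
  bucket = "myapp-preview-${local.env_name}-assets"
}
```

Because every preview environment is structurally identical, one set of .tf files per workspace is safe here in a way it isn't for dev/staging/prod.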
Prefer directory separation for environments that need to differ (which is almost always).
Module Design
Modules are Terraform’s abstraction mechanism. Good modules are reusable, composable, and encapsulate complexity.
Module Structure
```text
modules/
└── rds-instance/
    ├── main.tf       # Resources
    ├── variables.tf  # Input variables
    ├── outputs.tf    # Output values
    ├── versions.tf   # Required versions
    ├── locals.tf     # Local values
    ├── data.tf       # Data sources
    ├── README.md     # Documentation
    └── examples/
        ├── simple/
        │   └── main.tf
        └── complete/
            └── main.tf
```
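The examples/ directory doubles as documentation and a smoke test. A minimal examples/simple/main.tf might call the module with only its required inputs; the input names here are assumptions for illustration:

```hcl
# examples/simple/main.tf - smallest working invocation of the module
module "database" {
  source = "../../"

  # Hypothetical required inputs
  identifier     = "example-db"
  instance_class = "db.t3.micro"

  network_config = {
    vpc_id             = "vpc-0123456789abcdef0"
    subnet_ids         = ["subnet-aaa111", "subnet-bbb222"]
    security_group_ids = ["sg-ccc333"]
  }
}
```

If the simple example stops applying cleanly, you've broken the module's minimal contract - a cheap signal to catch in CI.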
Variable Design
Use descriptive names and descriptions:
```hcl
# Bad
variable "size" {
  type = string
}

# Good
variable "instance_class" {
  description = "The RDS instance class (e.g., db.t3.micro, db.r5.large)"
  type        = string
  default     = "db.t3.micro"

  validation {
    condition     = can(regex("^db\\.", var.instance_class))
    error_message = "Instance class must start with 'db.' prefix."
  }
}
```
Use object types for related variables:
```hcl
# Instead of many separate variables
variable "vpc_id" {}
variable "subnet_ids" {}
variable "security_group_ids" {}

# Group them
variable "network_config" {
  description = "Network configuration for the database"
  type = object({
    vpc_id             = string
    subnet_ids         = list(string)
    security_group_ids = list(string)
  })
}
```
Provide sensible defaults:
```hcl
variable "backup_retention_period" {
  description = "Number of days to retain backups"
  type        = number
  default     = 7

  validation {
    condition     = var.backup_retention_period >= 0 && var.backup_retention_period <= 35
    error_message = "Backup retention must be between 0 and 35 days."
  }
}
```
Output Design
Output everything consumers might need:
```hcl
# outputs.tf
output "endpoint" {
  description = "The connection endpoint for the database"
  value       = aws_db_instance.main.endpoint
}

output "port" {
  description = "The port the database is listening on"
  value       = aws_db_instance.main.port
}

output "arn" {
  description = "The ARN of the RDS instance"
  value       = aws_db_instance.main.arn
}

output "id" {
  description = "The RDS instance identifier"
  value       = aws_db_instance.main.id
}

# Output as an object for convenience
output "database" {
  description = "All database attributes"
  value = {
    endpoint = aws_db_instance.main.endpoint
    port     = aws_db_instance.main.port
    arn      = aws_db_instance.main.arn
    id       = aws_db_instance.main.id
  }
}
```
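From the caller's side, the object output means one expression carries everything a consumer needs. A sketch, assuming the module above is instantiated as `module "database"` (the connection-string format is an assumption for illustration):

```hcl
# Re-export a single attribute
output "app_database_endpoint" {
  value = module.database.endpoint
}

# Or take the whole object at once
locals {
  db = module.database.database
}

output "app_connection_string" {
  # Hypothetical connection-string format
  value = "postgresql://app@${local.db.endpoint}/mydb"
}
```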
Mark sensitive outputs:
```hcl
output "master_password" {
  description = "The master password (if generated)"
  value       = random_password.master.result
  sensitive   = true
}
```
Module Composition
Build complex infrastructure from simple modules:
```hcl
# A high-level module composes lower-level modules
module "web_application" {
  source = "./modules/web-application"

  name        = "myapp"
  environment = "prod"

  # This module internally uses:
  # - modules/alb
  # - modules/ecs-service
  # - modules/rds-instance
  # - modules/elasticache
}

# modules/web-application/main.tf
module "alb" {
  source = "../alb"

  name       = var.name
  vpc_id     = var.vpc_id
  subnet_ids = var.public_subnet_ids
}

module "database" {
  source = "../rds-instance"

  identifier     = "${var.name}-db"
  instance_class = var.db_instance_class
  subnet_ids     = var.private_subnet_ids
}

module "cache" {
  source = "../elasticache"

  cluster_id = "${var.name}-cache"
  subnet_ids = var.private_subnet_ids
}

module "service" {
  source = "../ecs-service"

  name             = var.name
  cluster_arn      = var.ecs_cluster_arn
  target_group_arn = module.alb.target_group_arn

  environment_variables = {
    DATABASE_URL = module.database.connection_string
    REDIS_URL    = module.cache.endpoint
  }
}
```
Module Versioning
Always version your modules:
```hcl
# Using the Terraform Registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.0" # Pin to a specific version
}

# Using Git
module "vpc" {
  source = "git::https://github.com/myorg/terraform-modules.git//vpc?ref=v2.1.0"
}

# Using a private registry
module "vpc" {
  source  = "app.terraform.io/myorg/vpc/aws"
  version = "~> 2.0" # Allow minor updates
}
```
Version constraints:
```hcl
version = "2.1.0"         # Exact version
version = ">= 2.0"        # Minimum version
version = "~> 2.1"        # Allow 2.1.x, not 2.2.0
version = ">= 2.0, < 3.0" # Range
```
Provider Configuration
Lock Provider Versions
```hcl
# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }
  }
}
```
Dependency Lock File
The .terraform.lock.hcl file pins exact provider versions:
```hcl
# .terraform.lock.hcl (auto-generated)
provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.17.0"
  constraints = "~> 5.0"
  hashes = [
    "h1:abc123...",
    "zh:def456...",
  ]
}
```
Always commit this file to ensure everyone uses identical provider versions.
```bash
# Update the lock file when changing provider constraints
terraform init -upgrade
```
Multiple Provider Configurations
```hcl
# Default provider
provider "aws" {
  region = "eu-west-1"
}

# Aliased provider for a different region
provider "aws" {
  alias  = "us_east"
  region = "us-east-1"
}

# Use in resources
resource "aws_s3_bucket" "eu_bucket" {
  bucket = "my-eu-bucket"
  # Uses the default provider
}

resource "aws_s3_bucket" "us_bucket" {
  provider = aws.us_east
  bucket   = "my-us-bucket"
}
```
Provider Configuration in Modules
Modules should accept provider configurations, not define them:
```hcl
# modules/s3-bucket/main.tf
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      version               = ">= 5.0"
      configuration_aliases = [aws.replica] # Accept an additional provider
    }
  }
}

resource "aws_s3_bucket" "main" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket" "replica" {
  provider = aws.replica
  bucket   = "${var.bucket_name}-replica"
}

# The root module passes providers in
module "bucket" {
  source      = "./modules/s3-bucket"
  bucket_name = "my-bucket"

  providers = {
    aws         = aws
    aws.replica = aws.us_east
  }
}
```
Naming Conventions
Consistent naming makes code readable and maintainable.
Resource Naming
```hcl
# Use descriptive, lowercase names with underscores
resource "aws_security_group" "web_servers" {} # Good
resource "aws_security_group" "sg1" {}         # Bad - not descriptive
resource "aws_security_group" "WebServers" {}  # Bad - inconsistent case

# Include the purpose in the name
resource "aws_iam_role" "lambda_execution" {}
resource "aws_s3_bucket" "application_logs" {}
```
Variable Naming
```hcl
# Use lowercase with underscores
variable "instance_type" {} # Good
variable "instanceType" {}  # Bad - camelCase
variable "instance-type" {} # Bad - hyphens

# Be specific
variable "vpc_cidr_block" {} # Good
variable "cidr" {}           # Bad - ambiguous
```
Output Naming
```hcl
# Match the resource/attribute being output
output "vpc_id" {}
output "private_subnet_ids" {}
output "database_endpoint" {}
```
Module Naming
```hcl
# Name modules by what they create
module "networking" {} # Good
module "module1" {}    # Bad

# Use consistent patterns
module "api_database" {}
module "api_cache" {}
module "api_service" {}
```
Coming in Part 2
Part 2 covers advanced practices:
- Testing Terraform code
- CI/CD pipelines for infrastructure
- Security best practices
- Working with teams
- Drift detection and remediation
- Performance optimisation