Terraform Best Practices (Part 1) - Project Structure, State, and Modules

Terraform is deceptively simple. Write some HCL, run terraform apply, infrastructure appears. But that simplicity hides complexity that only emerges at scale - when you have multiple environments, dozens of team members, and hundreds of resources.

This two-part series covers Terraform best practices learned from managing infrastructure across startups and enterprises. Part 1 focuses on foundations: project structure, state management, and module design. Part 2 covers advanced topics: testing, CI/CD, security, and team workflows.

TL;DR

  • Use a consistent directory structure that scales with your team
  • Remote state with locking is non-negotiable for teams
  • Design modules for reusability, not just organisation
  • Use workspaces sparingly - prefer directory separation for environments
  • Lock provider versions and use dependency lock files

Code Repository: All code from this post is available at github.com/moabukar/blog-code/terraform-best-practices-part-1


Project Structure

How you organise Terraform files matters more as your infrastructure grows. There’s no single “correct” structure, but some patterns work better than others.

Small Projects: Flat Structure

For small projects or learning, a flat structure works:

terraform/
├── main.tf          # Resources
├── variables.tf     # Input variables
├── outputs.tf       # Output values
├── providers.tf     # Provider configuration
├── terraform.tfvars # Variable values
└── versions.tf      # Terraform and provider versions

When to use: Personal projects, small teams, single environment.

Limitations: Doesn’t scale. Everything in one state file becomes slow and risky.

Medium Projects: Environment Directories

Separate directories per environment:

terraform/
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── backend.tf
└── README.md

Why this works:

  • Each environment has its own state file
  • Shared modules reduce duplication
  • Clear separation of concerns
  • Easy to understand what changes affect which environment

How it works:

# environments/prod/main.tf
module "networking" {
  source = "../../modules/networking"
  
  environment = "prod"
  vpc_cidr    = var.vpc_cidr
}

module "compute" {
  source = "../../modules/compute"
  
  environment   = "prod"
  vpc_id        = module.networking.vpc_id
  subnet_ids    = module.networking.private_subnet_ids
  instance_type = var.instance_type
}
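
Each environment also gets its own backend configuration so its state lives at a distinct key. A minimal sketch for dev, reusing the bucket and lock table names from later in this post (the exact names are examples):

```hcl
# environments/dev/backend.tf (illustrative)
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "environments/dev/terraform.tfstate"  # unique per environment
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```

Staging and prod get the same file with only the `key` changed, which is what keeps their state files independent.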

Large Projects: Component-Based Structure

For large organisations, split by component/service:

infrastructure/
├── _modules/                    # Shared modules
│   ├── vpc/
│   ├── eks-cluster/
│   ├── rds-instance/
│   └── s3-bucket/
├── networking/                  # Network team owns this
│   ├── vpc-main/
│   │   ├── dev/
│   │   ├── staging/
│   │   └── prod/
│   └── transit-gateway/
├── platform/                    # Platform team owns this
│   ├── eks-cluster/
│   │   ├── dev/
│   │   ├── staging/
│   │   └── prod/
│   └── shared-services/
├── data/                        # Data team owns this
│   ├── data-lake/
│   └── analytics-cluster/
└── applications/                # App teams own these
    ├── api-service/
    ├── web-frontend/
    └── worker-service/

Why this works:

  • Team ownership is clear
  • Components can evolve independently
  • Blast radius is limited (one component’s state doesn’t affect others)
  • Different teams can have different deployment cadences
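
Within a component directory, each environment sources the shared modules by relative path. A sketch, assuming the tree above:

```hcl
# networking/vpc-main/prod/main.tf (illustrative; variable names are examples)
module "vpc" {
  source = "../../../_modules/vpc"

  environment = "prod"
  cidr_block  = var.vpc_cidr
}
```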

State Management

Terraform state is the source of truth for what exists in your infrastructure. Get this wrong and you’ll face:

  • State corruption
  • Race conditions with concurrent applies
  • Lost resources (Terraform thinks they don’t exist)
  • Security breaches (state contains secrets)

Remote State: Non-Negotiable

Never use local state for team projects:

# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "networking/vpc/prod/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

Why remote state matters:

  1. Collaboration - Team members can work on the same infrastructure
  2. Locking - Prevents concurrent modifications
  3. Security - State can be encrypted and access-controlled
  4. Durability - S3 is more reliable than your laptop
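
The state bucket itself has to exist before any `terraform init` can use it, so it's usually created by a small one-off bootstrap configuration applied with local state. A minimal sketch (bucket name is an example):

```hcl
# bootstrap/main.tf - creates the state bucket; apply once with local state
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

# Versioning lets you recover a previous state file if one gets corrupted
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
```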

State Locking

Without locking, two people running terraform apply simultaneously can corrupt state:

# DynamoDB table for state locking
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Purpose = "Terraform state locking"
  }
}

When someone runs terraform apply, they acquire a lock:

Acquiring state lock. This may take a few moments...

If someone else tries to apply:

Error: Error acquiring the state lock
Lock Info:
  ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
  Path:      mycompany-terraform-state/networking/vpc/prod/terraform.tfstate
  Operation: OperationTypeApply
  Who:       alice@mycompany.com
  Created:   2025-09-20 10:30:00 UTC

S3 Native State Locking (Terraform 1.10+)

As of Terraform 1.10, S3 supports native state locking without DynamoDB. This uses S3’s conditional writes feature, eliminating the need for a separate DynamoDB table.

# backend.tf - S3 native locking (no DynamoDB needed)
terraform {
  backend "s3" {
    bucket       = "mycompany-terraform-state"
    key          = "networking/vpc/prod/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true  # Enable S3 native locking
  }
}

How it works:

S3 native locking creates a .tflock file alongside your state file:

s3://mycompany-terraform-state/
├── networking/vpc/prod/terraform.tfstate
└── networking/vpc/prod/terraform.tfstate.tflock  # Lock file

The lock file contains metadata about who holds the lock:

{
  "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "Operation": "OperationTypeApply",
  "Who": "alice@mycompany.com",
  "Created": "2025-09-20T10:30:00Z"
}

When to use which:

| Approach | Pros | Cons |
| --- | --- | --- |
| S3 Native | Simpler setup, no DynamoDB costs, fewer resources to manage | Requires Terraform 1.10+, newer feature |
| DynamoDB | Battle-tested, works with older Terraform versions | Extra resource to manage, small cost |

Migration from DynamoDB to S3 native:

# Step 1: Update backend config
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "networking/vpc/prod/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    use_lockfile   = true       # Add this
    # dynamodb_table = "terraform-locks"  # Remove this
  }
}

# Step 2: Run terraform init -reconfigure

For new projects on Terraform 1.10+, prefer S3 native locking for simplicity.

State File Organisation

One state file per component per environment:

s3://terraform-state/
├── networking/
│   ├── vpc/
│   │   ├── dev/terraform.tfstate
│   │   ├── staging/terraform.tfstate
│   │   └── prod/terraform.tfstate
│   └── transit-gateway/
│       └── prod/terraform.tfstate
├── platform/
│   ├── eks/
│   │   ├── dev/terraform.tfstate
│   │   ├── staging/terraform.tfstate
│   │   └── prod/terraform.tfstate
│   └── shared-services/
│       └── prod/terraform.tfstate
└── applications/
    └── api/
        ├── dev/terraform.tfstate
        ├── staging/terraform.tfstate
        └── prod/terraform.tfstate

Why separate state files:

  1. Blast radius - A mistake in the API service state doesn’t affect networking
  2. Performance - Smaller states = faster plans
  3. Permissions - Different teams can have access to different states
  4. Parallelism - Teams can apply simultaneously to different components
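
Separate state files can still reference each other's outputs via the `terraform_remote_state` data source. A sketch, assuming the bucket layout above (the module path is hypothetical):

```hcl
# applications/api/prod/data.tf - read networking outputs from its state
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "terraform-state"
    key    = "networking/vpc/prod/terraform.tfstate"
    region = "eu-west-1"
  }
}

module "service" {
  source = "../../../_modules/ecs-service"  # hypothetical shared module

  # Consume outputs published by the networking component
  subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
}
```

This only works if the networking component actually declares those outputs, which is another reason to output everything consumers might need (covered under Module Design).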

State File Security

State files contain sensitive data (passwords, keys, tokens):

# This ends up in state file in plain text!
resource "aws_db_instance" "main" {
  password = var.db_password  # Stored in state
}

Security measures:

# 1. Encrypt state at rest
terraform {
  backend "s3" {
    encrypt = true  # SSE-S3 encryption
    
    # Or use KMS
    kms_key_id = "arn:aws:kms:eu-west-1:123456789:key/abc-123"
  }
}

# 2. Restrict state bucket access
resource "aws_s3_bucket_policy" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Deny"
        Principal = "*"
        Action = "s3:*"
        Resource = [
          aws_s3_bucket.terraform_state.arn,
          "${aws_s3_bucket.terraform_state.arn}/*"
        ]
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      }
    ]
  })
}

# 3. Enable versioning for recovery
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
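
A fourth measure worth adding: block all public access on the state bucket, so a misconfigured bucket policy or ACL can never expose it. A sketch:

```hcl
# 4. Block all public access to the state bucket
resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```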

Workspaces: Use Sparingly

Terraform workspaces allow multiple states from the same configuration:

terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod
terraform apply

The appeal: Less duplication - one set of .tf files for all environments.

The problems:

  1. All environments share code - You can’t have different resources in prod vs dev
  2. Harder to review - PRs don’t show which environment changes
  3. Easy to apply to wrong environment - terraform apply without checking workspace
  4. State paths are less clear - all workspace states sit in the same bucket under a workspace prefix (`env:/` by default for the S3 backend)

When workspaces make sense:

  • Identical ephemeral environments (PR preview environments)
  • True multi-tenancy where each tenant is identical

Prefer directory separation for environments that need to differ (which is almost always).
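
For the ephemeral-environment case, the workspace name can drive resource naming so each preview gets isolated resources. A sketch (the bucket name pattern is an example):

```hcl
# Workspace name (e.g. "pr-1234") keys all per-preview resource names
locals {
  env_name = terraform.workspace
}

resource "aws_s3_bucket" "preview" {
  bucket = "myapp-preview-${local.env_name}"
}
```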


Module Design

Modules are Terraform’s abstraction mechanism. Good modules are reusable, composable, and encapsulate complexity.

Module Structure

modules/
└── rds-instance/
    ├── main.tf           # Resources
    ├── variables.tf      # Input variables
    ├── outputs.tf        # Output values
    ├── versions.tf       # Required versions
    ├── locals.tf         # Local values
    ├── data.tf           # Data sources
    ├── README.md         # Documentation
    └── examples/
        ├── simple/
        │   └── main.tf
        └── complete/
            └── main.tf

Variable Design

Use descriptive names and descriptions:

# Bad
variable "size" {
  type = string
}

# Good
variable "instance_class" {
  description = "The RDS instance class (e.g., db.t3.micro, db.r5.large)"
  type        = string
  default     = "db.t3.micro"

  validation {
    condition     = can(regex("^db\\.", var.instance_class))
    error_message = "Instance class must start with 'db.' prefix."
  }
}

Use object types for related variables:

# Instead of many separate variables
variable "vpc_id" {}
variable "subnet_ids" {}
variable "security_group_ids" {}

# Group them
variable "network_config" {
  description = "Network configuration for the database"
  type = object({
    vpc_id             = string
    subnet_ids         = list(string)
    security_group_ids = list(string)
  })
}
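
Since Terraform 1.3, object attributes can be marked `optional` with a default, so callers only set what they need. A sketch extending the variable above:

```hcl
variable "network_config" {
  description = "Network configuration for the database"
  type = object({
    vpc_id             = string
    subnet_ids         = list(string)
    # Callers may omit this; it defaults to an empty list
    security_group_ids = optional(list(string), [])
  })
}
```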

Provide sensible defaults:

variable "backup_retention_period" {
  description = "Number of days to retain backups"
  type        = number
  default     = 7

  validation {
    condition     = var.backup_retention_period >= 0 && var.backup_retention_period <= 35
    error_message = "Backup retention must be between 0 and 35 days."
  }
}

Output Design

Output everything consumers might need:

# outputs.tf
output "endpoint" {
  description = "The connection endpoint for the database"
  value       = aws_db_instance.main.endpoint
}

output "port" {
  description = "The port the database is listening on"
  value       = aws_db_instance.main.port
}

output "arn" {
  description = "The ARN of the RDS instance"
  value       = aws_db_instance.main.arn
}

output "id" {
  description = "The RDS instance identifier"
  value       = aws_db_instance.main.id
}

# Output as object for convenience
output "database" {
  description = "All database attributes"
  value = {
    endpoint = aws_db_instance.main.endpoint
    port     = aws_db_instance.main.port
    arn      = aws_db_instance.main.arn
    id       = aws_db_instance.main.id
  }
}

Mark sensitive outputs:

output "master_password" {
  description = "The master password (if generated)"
  value       = random_password.master.result
  sensitive   = true
}
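
Sensitive outputs are redacted in plan/apply logs and in the full `terraform output` listing, but can still be read deliberately when needed:

```shell
# Print just the raw value of a single (sensitive) output
terraform output -raw master_password
```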

Module Composition

Build complex infrastructure from simple modules:

# High-level module composes lower-level modules
module "web_application" {
  source = "./modules/web-application"

  name        = "myapp"
  environment = "prod"

  # This module internally uses:
  # - modules/alb
  # - modules/ecs-service
  # - modules/rds-instance
  # - modules/elasticache
}
# modules/web-application/main.tf
module "alb" {
  source = "../alb"
  
  name       = var.name
  vpc_id     = var.vpc_id
  subnet_ids = var.public_subnet_ids
}

module "database" {
  source = "../rds-instance"
  
  identifier     = "${var.name}-db"
  instance_class = var.db_instance_class
  subnet_ids     = var.private_subnet_ids
}

module "cache" {
  source = "../elasticache"
  
  cluster_id = "${var.name}-cache"
  subnet_ids = var.private_subnet_ids
}

module "service" {
  source = "../ecs-service"
  
  name             = var.name
  cluster_arn      = var.ecs_cluster_arn
  target_group_arn = module.alb.target_group_arn
  
  environment_variables = {
    DATABASE_URL = module.database.connection_string
    REDIS_URL    = module.cache.endpoint
  }
}

Module Versioning

Always version your modules:

# Using Terraform Registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.0"  # Pin to specific version
}

# Using Git
module "vpc" {
  source = "git::https://github.com/myorg/terraform-modules.git//vpc?ref=v2.1.0"
}

# Using private registry
module "vpc" {
  source  = "app.terraform.io/myorg/vpc/aws"
  version = "~> 2.0"  # Allow minor updates
}

Version constraints:

version = "2.1.0"      # Exact version
version = ">= 2.0"     # Minimum version
version = "~> 2.1"     # Allow 2.1.x, not 2.2.0
version = ">= 2.0, < 3.0"  # Range

Provider Configuration

Lock Provider Versions

# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }
  }
}

Dependency Lock File

The .terraform.lock.hcl file pins exact provider versions:

# .terraform.lock.hcl (auto-generated)
provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.17.0"
  constraints = "~> 5.0"
  hashes = [
    "h1:abc123...",
    "zh:def456...",
  ]
}

Always commit this file to ensure everyone uses identical provider versions.

# Update lock file when changing provider constraints
terraform init -upgrade
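
CI runners and developer laptops often run different platforms, and the lock file only records hashes for the platforms it was generated on. You can pre-record hashes for every platform your team uses:

```shell
# Add provider hashes for all platforms that will run terraform init
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64 \
  -platform=darwin_amd64
```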

Multiple Provider Configurations

# Default provider
provider "aws" {
  region = "eu-west-1"
}

# Aliased provider for different region
provider "aws" {
  alias  = "us_east"
  region = "us-east-1"
}

# Use in resources
resource "aws_s3_bucket" "eu_bucket" {
  bucket = "my-eu-bucket"
  # Uses default provider
}

resource "aws_s3_bucket" "us_bucket" {
  provider = aws.us_east
  bucket   = "my-us-bucket"
}

Provider Configuration in Modules

Modules should accept provider configurations, not define them:

# modules/s3-bucket/main.tf
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      version               = ">= 5.0"
      configuration_aliases = [aws.replica]  # Accept additional provider
    }
  }
}

resource "aws_s3_bucket" "main" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket" "replica" {
  provider = aws.replica
  bucket   = "${var.bucket_name}-replica"
}
# Root module passes providers
module "bucket" {
  source = "./modules/s3-bucket"

  bucket_name = "my-bucket"

  providers = {
    aws         = aws
    aws.replica = aws.us_east
  }
}

Naming Conventions

Consistent naming makes code readable and maintainable.

Resource Naming

# Use descriptive, lowercase names with underscores
resource "aws_security_group" "web_servers" {}  # Good
resource "aws_security_group" "sg1" {}          # Bad - not descriptive
resource "aws_security_group" "WebServers" {}   # Bad - inconsistent case

# Include purpose in the name
resource "aws_iam_role" "lambda_execution" {}
resource "aws_s3_bucket" "application_logs" {}

Variable Naming

# Use lowercase with underscores
variable "instance_type" {}      # Good
variable "instanceType" {}       # Bad - camelCase
variable "instance-type" {}      # Bad - hyphens

# Be specific
variable "vpc_cidr_block" {}     # Good
variable "cidr" {}               # Bad - ambiguous

Output Naming

# Match the resource/attribute being output
output "vpc_id" {}
output "private_subnet_ids" {}
output "database_endpoint" {}

Module Naming

# Name modules by what they create
module "networking" {}           # Good
module "module1" {}              # Bad

# Use consistent patterns
module "api_database" {}
module "api_cache" {}
module "api_service" {}

Coming in Part 2

Part 2 covers advanced practices:

  • Testing Terraform code
  • CI/CD pipelines for infrastructure
  • Security best practices
  • Working with teams
  • Drift detection and remediation
  • Performance optimisation
