Building Production AMIs with Packer
At a previous company, we managed 200+ EC2 instances across multiple environments. Every deployment was a configuration management nightmare - Ansible runs that took 45 minutes, drift between instances, and “works on my machine” debugging sessions.
Then we switched to immutable infrastructure with Packer-built AMIs. Deploy time dropped to 3 minutes. Rollbacks became instant. Debugging became “which AMI version was running?”
This guide covers everything we learned: the CI pipeline, Terraform integration with ASGs, rollback strategies, AMI maintenance, and the security hardening that passed our SOC 2 audit.
Code Repository: All code from this post is available at github.com/moabukar/blog-code/packer-ami-production
TL;DR
- Build AMIs with Packer in CI - every merge to main produces a versioned AMI
- Terraform references AMIs by tag/filter, not hardcoded ID
- ASG rolling updates with health checks enable zero-downtime deploys
- Keep last 5 AMIs for instant rollbacks, automate cleanup of older ones
- Security: no SSH keys baked in, CIS benchmarks, encrypted root volumes
Architecture Overview
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   GitHub    │────▶│  CI Server  │────▶│   AWS AMI   │
│  (Packer)   │     │   (Build)   │     │  Registry   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Terraform  │────▶│   Launch    │────▶│     ASG     │
│  (Deploy)   │     │  Template   │     │  Instances  │
└─────────────┘     └─────────────┘     └─────────────┘
Flow:
Code Change ──▶ Packer Build ──▶ AMI Created ──▶ Terraform Apply ──▶ ASG Rolling Update
Why Immutable AMIs?
Before diving into implementation, here’s why we made the switch:
APPROACH              DEPLOY TIME   ROLLBACK   DRIFT RISK   DEBUGGING
========              ===========   ========   ==========   =========
Config Management     30-60 min     Rebuild    High         Complex
Container (ECS/K8s)   2-5 min       Instant    None         Medium
Immutable AMI         2-5 min       Instant    None         Simple
Immutable AMIs give you:
- Consistency - Every instance is identical, always
- Fast rollbacks - Just point ASG to previous AMI
- Audit trail - Know exactly what’s running from the AMI tag
- Simplified debugging - Reproduce issues with the exact AMI version
Prerequisites
TOOL        VERSION    PURPOSE
====        =======    =======
Packer      >= 1.9.0   AMI builds
Terraform   >= 1.5.0   Infrastructure deployment
AWS CLI     >= 2.0     Authentication
jq          >= 1.6     JSON parsing in scripts
Directory Structure
infrastructure/
├── packer/
│ ├── base-ami.pkr.hcl # Base AMI template
│ ├── app-ami.pkr.hcl # Application AMI template
│ ├── variables.pkr.hcl # Shared variables
│ ├── scripts/
│ │ ├── base-setup.sh # OS hardening, base packages
│ │ ├── app-install.sh # Application installation
│ │ └── cleanup.sh # Pre-AMI cleanup
│ └── ansible/
│ └── playbook.yml # Optional: Ansible provisioner
├── terraform/
│ ├── modules/
│ │ └── asg/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── environments/
│ ├── staging/
│ │ └── main.tf
│ └── production/
│ └── main.tf
└── .github/
└── workflows/
├── packer-build.yml # AMI build pipeline
└── terraform-deploy.yml # Deployment pipeline
The Packer Template
Here’s our production Packer template. Key decisions explained in comments.
# packer/app-ami.pkr.hcl
packer {
required_plugins {
amazon = {
version = ">= 1.2.0"
source = "github.com/hashicorp/amazon"
}
}
}
# Variables - passed from CI or .auto.pkrvars.hcl
variable "aws_region" {
type = string
default = "eu-west-1"
}
variable "app_version" {
type = string
description = "Application version - typically git SHA or semver"
}
variable "base_ami_name" {
type = string
default = "amzn2-ami-hvm-*-x86_64-gp2"
}
variable "instance_type" {
type = string
default = "t3.medium"
# Use same instance type as production for accurate builds
}
variable "vpc_id" {
type = string
description = "VPC for build instance - use dedicated build VPC"
}
variable "subnet_id" {
type = string
description = "Subnet for build instance - private subnet recommended"
}
# Find the latest Amazon Linux 2 AMI
# HCL2 templates don't interpolate the legacy {{timestamp}} syntax,
# so derive a timestamp with the built-in function instead
locals {
build_time = formatdate("YYYYMMDDhhmmss", timestamp())
}
source "amazon-ebs" "app" {
ami_name = "myapp-${var.app_version}-${local.build_time}"
ami_description = "MyApp AMI - Version ${var.app_version}"
instance_type = var.instance_type
region = var.aws_region
# Source AMI filter - always builds from latest base
source_ami_filter {
filters = {
name = var.base_ami_name
root-device-type = "ebs"
virtualization-type = "hvm"
}
most_recent = true
owners = ["amazon"]
}
# Network configuration
vpc_id = var.vpc_id
subnet_id = var.subnet_id
associate_public_ip_address = false # Private subnet, use NAT
# Security: Use SSM instead of SSH
communicator = "ssh"
ssh_username = "ec2-user"
ssh_interface = "session_manager"
iam_instance_profile = "PackerBuildRole"
# EBS configuration
launch_block_device_mappings {
device_name = "/dev/xvda"
volume_size = 30
volume_type = "gp3"
iops = 3000
throughput = 125
encrypted = true # Always encrypt root volumes
delete_on_termination = true
}
# Tags - critical for Terraform lookups and cost tracking
tags = {
Name = "myapp-${var.app_version}"
Application = "myapp"
Version = var.app_version
BuildTime = timestamp() # RFC 3339; legacy {{timestamp}} is not interpolated in HCL2
Builder = "packer"
Environment = "all" # AMI usable in any environment
}
# Snapshot tags for cost tracking
snapshot_tags = {
Name = "myapp-${var.app_version}"
Application = "myapp"
}
# Build timeout - fail fast if something's wrong
aws_polling {
delay_seconds = 30
max_attempts = 60
}
}
build {
name = "myapp"
sources = ["source.amazon-ebs.app"]
# Base OS setup
provisioner "shell" {
scripts = [
"scripts/base-setup.sh"
]
environment_vars = [
"APP_VERSION=${var.app_version}"
]
}
# Application installation
provisioner "shell" {
script = "scripts/app-install.sh"
environment_vars = [
"APP_VERSION=${var.app_version}"
]
}
# Optional: Ansible for complex configuration
# provisioner "ansible" {
# playbook_file = "ansible/playbook.yml"
# extra_arguments = [
# "--extra-vars", "app_version=${var.app_version}"
# ]
# }
# CRITICAL: Always run cleanup last
provisioner "shell" {
script = "scripts/cleanup.sh"
}
# Output AMI ID for downstream use
post-processor "manifest" {
output = "manifest.json"
strip_path = true
}
}
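The manifest post-processor records the build's artifact_id as "region:ami-id"; the CI pipeline later extracts the AMI ID with jq. The same extraction in Python, as a small sketch (the function name is arbitrary):

```python
import json

def ami_id_from_manifest(manifest_path: str) -> str:
    """Extract the AMI ID of the most recent build from a Packer
    manifest. The amazon-ebs builder records artifact_id as
    "<region>:<ami-id>"."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    artifact_id = manifest["builds"][-1]["artifact_id"]
    return artifact_id.split(":", 1)[1]
```

Useful if a downstream step (a deploy script, a notification) needs the AMI ID without shelling out to jq.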
Provisioning Scripts
Base Setup Script
#!/bin/bash
# scripts/base-setup.sh
set -euo pipefail
echo "=== Starting base setup ==="
# Update system packages
sudo yum update -y
# Install essential packages (the AL2 package name is awscli, not aws-cli)
sudo yum install -y \
awscli \
jq \
htop \
vim \
curl \
wget \
unzip
# Install CloudWatch agent for metrics/logs
sudo yum install -y amazon-cloudwatch-agent
# Install SSM agent (usually pre-installed on Amazon Linux 2)
sudo yum install -y amazon-ssm-agent
sudo systemctl enable amazon-ssm-agent
# Configure time sync (critical for distributed systems)
sudo yum install -y chrony
sudo systemctl enable chronyd
# Security: disable root login and password auth
# (match the directives whether commented out or not)
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# Create application user (non-root)
sudo useradd -m -s /bin/bash appuser
echo "=== Base setup complete ==="
Application Install Script
#!/bin/bash
# scripts/app-install.sh
set -euo pipefail
echo "=== Installing application version ${APP_VERSION} ==="
# Download application artifact from S3
# Using versioned path ensures reproducibility
aws s3 cp "s3://mycompany-artifacts/myapp/${APP_VERSION}/myapp.tar.gz" /tmp/myapp.tar.gz
# Verify checksum (uploaded alongside artifact)
aws s3 cp "s3://mycompany-artifacts/myapp/${APP_VERSION}/myapp.tar.gz.sha256" /tmp/
cd /tmp && sha256sum -c myapp.tar.gz.sha256
# Extract and install
sudo mkdir -p /opt/myapp
sudo tar -xzf /tmp/myapp.tar.gz -C /opt/myapp
sudo chown -R appuser:appuser /opt/myapp
# Install systemd service
# Install systemd service (tee, not "sudo cat >": the redirect must run with root privileges)
sudo tee /etc/systemd/system/myapp.service > /dev/null << 'EOF'
[Unit]
Description=MyApp Service
After=network.target
[Service]
Type=simple
User=appuser
Group=appuser
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/myapp
Restart=always
RestartSec=5
Environment=APP_ENV=production
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/myapp/data /var/log/myapp
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable myapp
# Create log directory
sudo mkdir -p /var/log/myapp
sudo chown appuser:appuser /var/log/myapp
# Store version for debugging
echo "${APP_VERSION}" | sudo tee /opt/myapp/VERSION
echo "=== Application installation complete ==="
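The sha256sum -c step is the guard against a corrupted or tampered artifact. If you also need the check outside the instance (say, in the CI job before uploading), an equivalent in Python looks like this sketch (function name and chunk size are arbitrary):

```python
import hashlib

def verify_sha256(artifact_path: str, checksum_path: str) -> bool:
    """Verify an artifact against a sha256sum-style checksum file
    ("<hex digest>  <filename>"). Streams in 1 MiB chunks so large
    artifacts aren't loaded into memory at once."""
    expected = open(checksum_path).read().split()[0]
    digest = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```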
Cleanup Script
This script is critical - it removes sensitive data before creating the AMI.
#!/bin/bash
# scripts/cleanup.sh
set -euo pipefail
echo "=== Starting pre-AMI cleanup ==="
# Remove SSH host keys (regenerated on first boot)
sudo rm -f /etc/ssh/ssh_host_*
# Remove temporary files
sudo rm -rf /tmp/*
sudo rm -rf /var/tmp/*
# Clean yum cache
sudo yum clean all
sudo rm -rf /var/cache/yum
# Remove shell history
sudo rm -f /root/.bash_history
rm -f ~/.bash_history
history -c
# Remove cloud-init artifacts (forces re-run on new instance)
sudo rm -rf /var/lib/cloud/instances/*
# Remove machine ID (regenerated on boot)
sudo truncate -s 0 /etc/machine-id
# Zero out free space for smaller AMI (optional, adds build time)
# sudo dd if=/dev/zero of=/EMPTY bs=1M || true
# sudo rm -f /EMPTY
# Sync filesystem
sync
echo "=== Cleanup complete ==="
CI/CD Pipeline
GitHub Actions workflow for building AMIs on every merge to main.
# .github/workflows/packer-build.yml
name: Build AMI
on:
push:
branches: [main]
paths:
- 'packer/**'
- 'src/**' # Rebuild AMI when application code changes
workflow_dispatch:
inputs:
version:
description: 'Version tag (defaults to git SHA)'
required: false
env:
AWS_REGION: eu-west-1
PACKER_VERSION: 1.9.4
jobs:
build:
runs-on: ubuntu-latest
permissions:
id-token: write # Required for OIDC
contents: read
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsPackerRole
aws-region: ${{ env.AWS_REGION }}
- name: Setup Packer
uses: hashicorp/setup-packer@main
with:
version: ${{ env.PACKER_VERSION }}
- name: Set version
id: version
run: |
if [ -n "${{ github.event.inputs.version }}" ]; then
echo "version=${{ github.event.inputs.version }}" >> $GITHUB_OUTPUT
else
echo "version=${GITHUB_SHA::8}" >> $GITHUB_OUTPUT
fi
- name: Build application artifact
run: |
# Build your application here
make build VERSION=${{ steps.version.outputs.version }}
# Upload to S3 for Packer to retrieve
aws s3 cp dist/myapp.tar.gz \
s3://mycompany-artifacts/myapp/${{ steps.version.outputs.version }}/myapp.tar.gz
# Upload checksum
sha256sum dist/myapp.tar.gz > dist/myapp.tar.gz.sha256
aws s3 cp dist/myapp.tar.gz.sha256 \
s3://mycompany-artifacts/myapp/${{ steps.version.outputs.version }}/myapp.tar.gz.sha256
- name: Packer init
working-directory: packer
run: packer init .
- name: Packer validate
working-directory: packer
run: |
packer validate \
-var="app_version=${{ steps.version.outputs.version }}" \
-var="vpc_id=${{ secrets.BUILD_VPC_ID }}" \
-var="subnet_id=${{ secrets.BUILD_SUBNET_ID }}" \
app-ami.pkr.hcl
- name: Packer build
working-directory: packer
run: |
packer build \
-var="app_version=${{ steps.version.outputs.version }}" \
-var="vpc_id=${{ secrets.BUILD_VPC_ID }}" \
-var="subnet_id=${{ secrets.BUILD_SUBNET_ID }}" \
-color=false \
app-ami.pkr.hcl
- name: Extract AMI ID
id: ami
working-directory: packer
run: |
AMI_ID=$(jq -r '.builds[-1].artifact_id | split(":")[1]' manifest.json)
echo "ami_id=${AMI_ID}" >> $GITHUB_OUTPUT
echo "AMI created: ${AMI_ID}"
- name: Store AMI ID
run: |
# Store AMI ID in Parameter Store for Terraform
aws ssm put-parameter \
--name "/myapp/ami/latest" \
--value "${{ steps.ami.outputs.ami_id }}" \
--type String \
--overwrite
# Also store with version tag
aws ssm put-parameter \
--name "/myapp/ami/${{ steps.version.outputs.version }}" \
--value "${{ steps.ami.outputs.ami_id }}" \
--type String \
--overwrite
outputs:
ami_id: ${{ steps.ami.outputs.ami_id }}
version: ${{ steps.version.outputs.version }}
# Optional: Trigger deployment to staging
deploy-staging:
needs: build
runs-on: ubuntu-latest
environment: staging
steps:
- name: Trigger Terraform deployment
run: |
# Trigger your deployment pipeline
gh workflow run terraform-deploy.yml \
-f environment=staging \
-f ami_id=${{ needs.build.outputs.ami_id }}
Terraform Integration
ASG Module
This module creates an Auto Scaling Group that dynamically fetches the latest AMI.
# terraform/modules/asg/main.tf
variable "app_name" {
type = string
}
variable "environment" {
type = string
}
variable "ami_version" {
type = string
default = "latest"
description = "AMI version tag or 'latest'"
}
variable "instance_type" {
type = string
default = "t3.medium"
}
variable "min_size" {
type = number
default = 2
}
variable "max_size" {
type = number
default = 10
}
variable "desired_capacity" {
type = number
default = 2
}
variable "vpc_id" {
type = string
}
variable "subnet_ids" {
type = list(string)
}
variable "target_group_arns" {
type = list(string)
default = []
}
# Security groups of the ALB allowed to reach the app port
# (referenced by the app security group's ingress rule below)
variable "alb_security_group_ids" {
type = list(string)
}
# Fetch AMI ID from SSM Parameter Store
# This allows Packer to update the parameter, and Terraform to read it
data "aws_ssm_parameter" "ami_id" {
name = "/myapp/ami/${var.ami_version}"
}
# Alternative: Fetch AMI by tags (useful for cross-account scenarios)
data "aws_ami" "app" {
most_recent = true
owners = ["self"]
filter {
name = "name"
values = ["myapp-*"]
}
filter {
name = "tag:Application"
values = [var.app_name]
}
# Optional: filter by specific version
dynamic "filter" {
for_each = var.ami_version != "latest" ? [1] : []
content {
name = "tag:Version"
values = [var.ami_version]
}
}
}
# Launch template - preferred over launch configurations
resource "aws_launch_template" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
image_id = data.aws_ssm_parameter.ami_id.value
instance_type = var.instance_type
# Use IMDSv2 only (security best practice)
metadata_options {
http_endpoint = "enabled"
http_tokens = "required" # Enforces IMDSv2
http_put_response_hop_limit = 1
}
# IAM role for the instance
iam_instance_profile {
name = aws_iam_instance_profile.app.name
}
# Security groups
vpc_security_group_ids = [aws_security_group.app.id]
# User data for instance-specific configuration
user_data = base64encode(templatefile("${path.module}/user-data.sh", {
environment = var.environment
app_name = var.app_name
}))
# Enable detailed monitoring
monitoring {
enabled = true
}
# Root volume (already encrypted in AMI, but explicit is good)
block_device_mappings {
device_name = "/dev/xvda"
ebs {
encrypted = true
volume_type = "gp3"
volume_size = 30
}
}
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.app_name}-${var.environment}"
Environment = var.environment
Application = var.app_name
}
}
lifecycle {
create_before_destroy = true
}
}
# Auto Scaling Group
resource "aws_autoscaling_group" "app" {
name = "${var.app_name}-${var.environment}"
vpc_zone_identifier = var.subnet_ids
target_group_arns = var.target_group_arns
health_check_type = "ELB" # Use ALB health checks
health_check_grace_period = 300
min_size = var.min_size
max_size = var.max_size
desired_capacity = var.desired_capacity
launch_template {
id = aws_launch_template.app.id
version = "$Latest"
}
# Rolling update configuration
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 75 # Keep 75% healthy during update
instance_warmup = 120 # Wait 2 mins before considering healthy
}
triggers = ["tag"] # Refresh when tags change
}
# Termination policy for predictable scaling
termination_policies = ["OldestInstance"]
# Tags propagated to instances
tag {
key = "Name"
value = "${var.app_name}-${var.environment}"
propagate_at_launch = true
}
tag {
key = "Environment"
value = var.environment
propagate_at_launch = true
}
# AMI version tag for debugging
tag {
key = "AMI-Version"
value = var.ami_version
propagate_at_launch = true
}
lifecycle {
create_before_destroy = true
# Ignore desired_capacity changes from autoscaling
ignore_changes = [desired_capacity]
}
}
# Security group
resource "aws_security_group" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
vpc_id = var.vpc_id
# Allow inbound from ALB only
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = var.alb_security_group_ids
}
# Allow all outbound
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
}
# IAM role for instances
resource "aws_iam_role" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
})
}
# SSM access for Session Manager (no SSH needed)
resource "aws_iam_role_policy_attachment" "ssm" {
role = aws_iam_role.app.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
resource "aws_iam_instance_profile" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
role = aws_iam_role.app.name
}
output "asg_name" {
value = aws_autoscaling_group.app.name
}
output "launch_template_id" {
value = aws_launch_template.app.id
}
Environment Configuration
# terraform/environments/production/main.tf
provider "aws" {
region = "eu-west-1"
}
module "app_asg" {
source = "../../modules/asg"
app_name = "myapp"
environment = "production"
# Pin to specific version in production
# Change this to deploy a new version
ami_version = "v1.2.3" # Or use "latest" for auto-deploy
instance_type = "t3.large"
min_size = 3
max_size = 20
desired_capacity = 5
vpc_id = data.aws_vpc.main.id
subnet_ids = data.aws_subnets.private.ids
target_group_arns = [aws_lb_target_group.app.arn]
}
Rollback Strategy
Instant rollbacks are one of the biggest benefits of immutable AMIs.
Option 1: Terraform Rollback
# Change ami_version in Terraform and apply
# terraform/environments/production/main.tf
# ami_version = "v1.2.2" # Previous version
terraform apply
The ASG instance refresh will automatically roll out the old AMI.
Option 2: Manual ASG Update
For emergency rollbacks without Terraform:
#!/bin/bash
# scripts/rollback.sh
set -euo pipefail
PREVIOUS_AMI="ami-0abc123" # Get from SSM or tags
ASG_NAME="myapp-production"
LAUNCH_TEMPLATE_NAME="myapp-production"
# Create new launch template version with old AMI
aws ec2 create-launch-template-version \
--launch-template-name "${LAUNCH_TEMPLATE_NAME}" \
--source-version '$Latest' \
--launch-template-data "{\"ImageId\":\"${PREVIOUS_AMI}\"}"
# Start instance refresh
aws autoscaling start-instance-refresh \
--auto-scaling-group-name "${ASG_NAME}" \
--preferences '{
"MinHealthyPercentage": 75,
"InstanceWarmup": 120
}'
echo "Rollback initiated. Monitor with:"
echo "aws autoscaling describe-instance-refreshes --auto-scaling-group-name ${ASG_NAME}"
Option 3: Blue-Green with Target Groups
For zero-downtime rollbacks, use blue-green deployments:
# Two ASGs - blue and green
# Switch ALB listener between them
resource "aws_lb_listener_rule" "app" {
listener_arn = aws_lb_listener.https.arn
priority = 100
action {
type = "forward"
# Switch between blue and green target groups
# HCL conditionals must stay on one line (or be wrapped in parentheses)
target_group_arn = var.active_color == "blue" ? aws_lb_target_group.blue.arn : aws_lb_target_group.green.arn
}
condition {
path_pattern {
values = ["/*"]
}
}
}
AMI Maintenance
Keeping AMI inventory clean is crucial for cost and security.
Automated Cleanup
#!/bin/bash
# scripts/cleanup-old-amis.sh
# Run weekly via cron or scheduled Lambda
set -euo pipefail
APP_NAME="myapp"
KEEP_COUNT=5 # Keep last 5 AMIs
echo "=== Cleaning up old AMIs for ${APP_NAME} ==="
# Get all AMIs sorted by creation date
AMIS=$(aws ec2 describe-images \
--owners self \
--filters "Name=tag:Application,Values=${APP_NAME}" \
--query 'sort_by(Images, &CreationDate)[*].[ImageId,CreationDate,Name]' \
--output text)
TOTAL=$(echo "${AMIS}" | wc -l)
DELETE_COUNT=$((TOTAL - KEEP_COUNT))
if [ ${DELETE_COUNT} -le 0 ]; then
echo "Only ${TOTAL} AMIs exist, keeping all"
exit 0
fi
echo "Found ${TOTAL} AMIs, will delete ${DELETE_COUNT}"
# Get AMIs to delete (oldest first)
TO_DELETE=$(echo "${AMIS}" | head -n ${DELETE_COUNT})
echo "${TO_DELETE}" | while read -r ami_id created_at ami_name; do
echo "Deleting: ${ami_id} (${ami_name}, created ${created_at})"
# Get associated snapshots
SNAPSHOTS=$(aws ec2 describe-images \
--image-ids "${ami_id}" \
--query 'Images[0].BlockDeviceMappings[*].Ebs.SnapshotId' \
--output text)
# Deregister AMI
aws ec2 deregister-image --image-id "${ami_id}"
# Delete snapshots
for snapshot in ${SNAPSHOTS}; do
if [ "${snapshot}" != "None" ]; then
echo " Deleting snapshot: ${snapshot}"
aws ec2 delete-snapshot --snapshot-id "${snapshot}"
fi
done
done
echo "=== Cleanup complete ==="
Lambda for Scheduled Cleanup
# lambda/cleanup_amis.py
import boto3
from datetime import datetime, timedelta
def handler(event, context):
ec2 = boto3.client('ec2')
app_name = event.get('app_name', 'myapp')
keep_count = event.get('keep_count', 5)
# Get all AMIs for the application
response = ec2.describe_images(
Owners=['self'],
Filters=[
{'Name': 'tag:Application', 'Values': [app_name]},
{'Name': 'state', 'Values': ['available']}
]
)
# Sort by creation date
amis = sorted(response['Images'], key=lambda x: x['CreationDate'])
# Calculate how many to delete
delete_count = len(amis) - keep_count
if delete_count <= 0:
print(f"Only {len(amis)} AMIs exist, nothing to delete")
return {'deleted': 0}
deleted = 0
for ami in amis[:delete_count]:
ami_id = ami['ImageId']
print(f"Deleting AMI: {ami_id}")
# Get snapshots
snapshots = [
bdm['Ebs']['SnapshotId']
for bdm in ami.get('BlockDeviceMappings', [])
if 'Ebs' in bdm and 'SnapshotId' in bdm['Ebs']
]
# Deregister AMI
ec2.deregister_image(ImageId=ami_id)
# Delete snapshots
for snapshot_id in snapshots:
print(f" Deleting snapshot: {snapshot_id}")
ec2.delete_snapshot(SnapshotId=snapshot_id)
deleted += 1
return {'deleted': deleted}
Security Best Practices
AMI Hardening Checklist
ITEM STATUS NOTES
==== ====== =====
Root login disabled [x] /etc/ssh/sshd_config
Password auth disabled [x] SSH keys only (or no SSH)
No SSH keys baked in [x] Removed in cleanup script
Root volume encrypted [x] Packer template
IMDSv2 enforced [x] Launch template
Non-root application user [x] appuser in install script
Systemd security options [x] NoNewPrivileges, ProtectSystem
Automatic security updates [x] yum-cron or SSM Patch Manager
CloudWatch agent installed [x] Logs and metrics
SSM agent installed [x] No SSH needed
File integrity monitoring [ ] Consider AIDE or Wazuh
CIS benchmark compliance [ ] Use amazon-linux-cis AMI or hardening script
CIS Hardening Script
#!/bin/bash
# scripts/cis-hardening.sh
# Based on CIS Amazon Linux 2 Benchmark
# NOTE: run as root (e.g. a Packer shell provisioner with a sudo
# execute_command) - unlike the other scripts, no per-command sudo here
set -euo pipefail
echo "=== Applying CIS hardening ==="
# 1.1.1 - Disable unused filesystems
for fs in cramfs freevxfs jffs2 hfs hfsplus squashfs udf; do
echo "install ${fs} /bin/true" >> /etc/modprobe.d/CIS.conf
done
# 1.4.1 - Ensure permissions on bootloader config
chmod 600 /boot/grub2/grub.cfg 2>/dev/null || true
# 2.2.x - Remove unnecessary services
for svc in rpcbind cups avahi-daemon; do
systemctl disable ${svc} 2>/dev/null || true
systemctl stop ${svc} 2>/dev/null || true
done
# 3.1.1 - Disable IP forwarding
echo "net.ipv4.ip_forward = 0" >> /etc/sysctl.d/99-cis.conf
# 3.2.2 - Disable ICMP redirects
echo "net.ipv4.conf.all.accept_redirects = 0" >> /etc/sysctl.d/99-cis.conf
echo "net.ipv4.conf.default.accept_redirects = 0" >> /etc/sysctl.d/99-cis.conf
# 4.1.x - Configure auditd
yum install -y audit
systemctl enable auditd
# 5.2.x - SSH hardening (additional)
cat >> /etc/ssh/sshd_config << 'EOF'
Protocol 2
MaxAuthTries 4
IgnoreRhosts yes
HostbasedAuthentication no
PermitEmptyPasswords no
ClientAliveInterval 300
ClientAliveCountMax 0
LoginGraceTime 60
AllowTcpForwarding no
X11Forwarding no
EOF
# 5.4.1 - Password requirements
# (Not needed if using SSM-only access)
# Apply sysctl changes
sysctl -p /etc/sysctl.d/99-cis.conf
echo "=== CIS hardening complete ==="
Secrets Management
Never bake secrets into AMIs. Use these approaches instead:
# Option 1: AWS Secrets Manager (recommended)
# In your application startup script:
DB_PASSWORD=$(aws secretsmanager get-secret-value \
--secret-id myapp/production/db \
--query SecretString --output text | jq -r '.password')
# Option 2: SSM Parameter Store
DB_PASSWORD=$(aws ssm get-parameter \
--name /myapp/production/db-password \
--with-decryption \
--query Parameter.Value --output text)
# Option 3: Instance metadata + IAM role
# Attach IAM role with access to specific secrets
# Application uses AWS SDK to fetch at runtime
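Whichever option you pick, the application ends up holding a SecretString payload to parse at startup. A minimal sketch of the parsing side (the key names are assumptions - match them to however the secret was stored):

```python
import json

def db_credentials(secret_string: str) -> tuple[str, str]:
    """Parse a Secrets Manager SecretString payload (JSON) into
    (username, password). With boto3, this string comes from
    client("secretsmanager").get_secret_value(...)["SecretString"]."""
    secret = json.loads(secret_string)
    return secret["username"], secret["password"]
```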
Troubleshooting
AMI Build Failures
Problem: Build times out waiting for SSH
==> amazon-ebs.app: Waiting for SSH to become available...
==> amazon-ebs.app: Timeout waiting for SSH.
Solution: Check security group allows outbound to SSM endpoints, or use public subnet with internet access.
Problem: Provisioner script fails
Solution: Add -x to bash scripts for verbose output:
provisioner "shell" {
inline = ["set -x", "bash /tmp/script.sh"]
}
Deployment Issues
Problem: New instances fail health checks
Solution:
- Check the security group allows the ALB health check port
- Verify the application starts correctly with journalctl -u myapp
- Increase health_check_grace_period if the app needs warm-up time
Problem: Instance refresh stuck
# Check refresh status
aws autoscaling describe-instance-refreshes \
--auto-scaling-group-name myapp-production
# Cancel if needed
aws autoscaling cancel-instance-refresh \
--auto-scaling-group-name myapp-production
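To watch a refresh from a script instead of eyeballing the JSON, a small helper can summarize the describe-instance-refreshes response. A sketch, assuming the response shape of the EC2 Auto Scaling API:

```python
def refresh_summary(response: dict) -> str:
    """Summarize the most recent instance refresh from a
    describe-instance-refreshes response. The API returns refreshes
    newest-first; Status is a value such as Pending, InProgress,
    Successful, Failed, Cancelling or Cancelled."""
    refreshes = response.get("InstanceRefreshes", [])
    if not refreshes:
        return "no instance refreshes found"
    latest = refreshes[0]
    status = latest["Status"]
    pct = latest.get("PercentageComplete")
    if pct is not None:
        return f"{status} ({pct}% complete)"
    return status
```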
References
- Packer Documentation: https://developer.hashicorp.com/packer/docs
- AWS AMI Best Practices: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html
- CIS Amazon Linux 2 Benchmark: https://www.cisecurity.org/benchmark/amazon_linux
- Terraform ASG Module: https://registry.terraform.io/modules/terraform-aws-modules/autoscaling
- AWS Instance Refresh: https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html