Building Production AMIs with Packer
At a previous company, we managed 200+ EC2 instances across multiple environments. Every deployment was a configuration management nightmare - Ansible runs that took 45 minutes, drift between instances, and “works on my machine” debugging sessions.
Then we switched to immutable infrastructure with Packer-built AMIs. Deploy time dropped to 3 minutes. Rollbacks became instant. Debugging became “which AMI version was running?”
This guide covers everything we learned: the CI pipeline, Terraform integration with ASGs, rollback strategies, AMI maintenance, and the security hardening that passed our SOC 2 audit.
Code Repository: All code from this post is available at github.com/moabukar/blog-code/packer-ami-production
TL;DR
- Build AMIs with Packer in CI - every merge to main produces a versioned AMI
- Terraform references AMIs by tag/filter, not hardcoded ID
- ASG rolling updates with health checks enable zero-downtime deploys
- Keep last 5 AMIs for instant rollbacks, automate cleanup of older ones
- Security: no SSH keys baked in, CIS benchmarks, encrypted root volumes
Architecture Overview
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   GitHub    │────▶│  CI Server  │────▶│   AWS AMI   │
│  (Packer)   │     │   (Build)   │     │  Registry   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Terraform  │────▶│   Launch    │────▶│     ASG     │
│  (Deploy)   │     │  Template   │     │  Instances  │
└─────────────┘     └─────────────┘     └─────────────┘
Flow:
Code Change ──▶ Packer Build ──▶ AMI Created ──▶ Terraform Apply ──▶ ASG Rolling Update
Why Immutable AMIs?
Before diving into implementation, here’s why we made the switch:
APPROACH              DEPLOY TIME   ROLLBACK   DRIFT RISK   DEBUGGING
========              ===========   ========   ==========   =========
Config Management     30-60 min     Rebuild    High         Complex
Container (ECS/K8s)   2-5 min       Instant    None         Medium
Immutable AMI         2-5 min       Instant    None         Simple
Immutable AMIs give you:
- Consistency - Every instance is identical, always
- Fast rollbacks - Just point ASG to previous AMI
- Audit trail - Know exactly what’s running from the AMI tag
- Simplified debugging - Reproduce issues with the exact AMI version
Prerequisites
TOOL        VERSION    PURPOSE
====        =======    =======
Packer      >= 1.9.0   AMI builds
Terraform   >= 1.5.0   Infrastructure deployment
AWS CLI     >= 2.0     Authentication
jq          >= 1.6     JSON parsing in scripts
Directory Structure
infrastructure/
├── packer/
│ ├── base-ami.pkr.hcl # Base AMI template
│ ├── app-ami.pkr.hcl # Application AMI template
│ ├── variables.pkr.hcl # Shared variables
│ ├── scripts/
│ │ ├── base-setup.sh # OS hardening, base packages
│ │ ├── app-install.sh # Application installation
│ │ └── cleanup.sh # Pre-AMI cleanup
│ └── ansible/
│ └── playbook.yml # Optional: Ansible provisioner
├── terraform/
│ ├── modules/
│ │ └── asg/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── environments/
│ ├── staging/
│ │ └── main.tf
│ └── production/
│ └── main.tf
└── .github/
└── workflows/
├── packer-build.yml # AMI build pipeline
└── terraform-deploy.yml # Deployment pipeline
The Packer Template
Here’s our production Packer template. Key decisions explained in comments.
# packer/app-ami.pkr.hcl
packer {
required_plugins {
amazon = {
version = ">= 1.2.0"
source = "github.com/hashicorp/amazon"
}
}
}
# Variables - passed from CI or .auto.pkrvars.hcl
variable "aws_region" {
type = string
default = "eu-west-1"
}
variable "app_version" {
type = string
description = "Application version - typically git SHA or semver"
}
variable "base_ami_name" {
type = string
default = "amzn2-ami-hvm-*-x86_64-gp2"
}
variable "instance_type" {
type = string
default = "t3.medium"
# Use same instance type as production for accurate builds
}
variable "vpc_id" {
type = string
description = "VPC for build instance - use dedicated build VPC"
}
variable "subnet_id" {
type = string
description = "Subnet for build instance - private subnet recommended"
}
# Find the latest Amazon Linux 2 AMI
# HCL2 templates don't interpolate the legacy {{timestamp}} syntax,
# so derive a timestamp with the built-in function instead
locals {
build_time = formatdate("YYYYMMDDhhmmss", timestamp())
}
source "amazon-ebs" "app" {
ami_name = "myapp-${var.app_version}-${local.build_time}"
ami_description = "MyApp AMI - Version ${var.app_version}"
instance_type = var.instance_type
region = var.aws_region
# Source AMI filter - always builds from latest base
source_ami_filter {
filters = {
name = var.base_ami_name
root-device-type = "ebs"
virtualization-type = "hvm"
}
most_recent = true
owners = ["amazon"]
}
# Network configuration
vpc_id = var.vpc_id
subnet_id = var.subnet_id
associate_public_ip_address = false # Private subnet, use NAT
# Security: Use SSM instead of SSH
communicator = "ssh"
ssh_username = "ec2-user"
ssh_interface = "session_manager"
iam_instance_profile = "PackerBuildRole"
# EBS configuration
launch_block_device_mappings {
device_name = "/dev/xvda"
volume_size = 30
volume_type = "gp3"
iops = 3000
throughput = 125
encrypted = true # Always encrypt root volumes
delete_on_termination = true
}
# Tags - critical for Terraform lookups and cost tracking
tags = {
Name = "myapp-${var.app_version}"
Application = "myapp"
Version = var.app_version
BuildTime = timestamp() # RFC 3339; legacy {{timestamp}} is not interpolated in HCL2
Builder = "packer"
Environment = "all" # AMI usable in any environment
}
# Snapshot tags for cost tracking
snapshot_tags = {
Name = "myapp-${var.app_version}"
Application = "myapp"
}
# Build timeout - fail fast if something's wrong
aws_polling {
delay_seconds = 30
max_attempts = 60
}
}
build {
name = "myapp"
sources = ["source.amazon-ebs.app"]
# Base OS setup
provisioner "shell" {
scripts = [
"scripts/base-setup.sh"
]
environment_vars = [
"APP_VERSION=${var.app_version}"
]
}
# Application installation
provisioner "shell" {
script = "scripts/app-install.sh"
environment_vars = [
"APP_VERSION=${var.app_version}"
]
}
# Optional: Ansible for complex configuration
# provisioner "ansible" {
# playbook_file = "ansible/playbook.yml"
# extra_arguments = [
# "--extra-vars", "app_version=${var.app_version}"
# ]
# }
# CRITICAL: Always run cleanup last
provisioner "shell" {
script = "scripts/cleanup.sh"
}
# Output AMI ID for downstream use
post-processor "manifest" {
output = "manifest.json"
strip_path = true
}
}
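The manifest post-processor records the build's artifact_id as "region:ami-id"; the CI pipeline later extracts the AMI ID with jq. The same extraction in Python, as a small sketch (the function name is arbitrary):

```python
import json

def ami_id_from_manifest(manifest_path: str) -> str:
    """Extract the AMI ID of the most recent build from a Packer
    manifest. The amazon-ebs builder records artifact_id as
    "<region>:<ami-id>"."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    artifact_id = manifest["builds"][-1]["artifact_id"]
    return artifact_id.split(":", 1)[1]
```

Useful if a downstream step (a deploy script, a notification) needs the AMI ID without shelling out to jq.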
Provisioning Scripts
Base Setup Script
#!/bin/bash
# scripts/base-setup.sh
set -euo pipefail
echo "=== Starting base setup ==="
# Update system packages
sudo yum update -y
# Install essential packages (the AL2 package name is awscli, not aws-cli)
sudo yum install -y \
awscli \
jq \
htop \
vim \
curl \
wget \
unzip
# Install CloudWatch agent for metrics/logs
sudo yum install -y amazon-cloudwatch-agent
# Install SSM agent (usually pre-installed on Amazon Linux 2)
sudo yum install -y amazon-ssm-agent
sudo systemctl enable amazon-ssm-agent
# Configure time sync (critical for distributed systems)
sudo yum install -y chrony
sudo systemctl enable chronyd
# Security: disable root login and password auth
# (match the directives whether commented out or not)
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# Create application user (non-root)
sudo useradd -m -s /bin/bash appuser
echo "=== Base setup complete ==="
Application Install Script
#!/bin/bash
# scripts/app-install.sh
set -euo pipefail
echo "=== Installing application version ${APP_VERSION} ==="
# Download application artifact from S3
# Using versioned path ensures reproducibility
aws s3 cp "s3://mycompany-artifacts/myapp/${APP_VERSION}/myapp.tar.gz" /tmp/myapp.tar.gz
# Verify checksum (uploaded alongside artifact)
aws s3 cp "s3://mycompany-artifacts/myapp/${APP_VERSION}/myapp.tar.gz.sha256" /tmp/
cd /tmp && sha256sum -c myapp.tar.gz.sha256
# Extract and install
sudo mkdir -p /opt/myapp
sudo tar -xzf /tmp/myapp.tar.gz -C /opt/myapp
sudo chown -R appuser:appuser /opt/myapp
# Install systemd service
# Install systemd service (tee, not "sudo cat >": the redirect must run with root privileges)
sudo tee /etc/systemd/system/myapp.service > /dev/null << 'EOF'
[Unit]
Description=MyApp Service
After=network.target
[Service]
Type=simple
User=appuser
Group=appuser
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/myapp
Restart=always
RestartSec=5
Environment=APP_ENV=production
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/myapp/data /var/log/myapp
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable myapp
# Create log directory
sudo mkdir -p /var/log/myapp
sudo chown appuser:appuser /var/log/myapp
# Store version for debugging
echo "${APP_VERSION}" | sudo tee /opt/myapp/VERSION
echo "=== Application installation complete ==="
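The sha256sum -c step is the guard against a corrupted or tampered artifact. If you also need the check outside the instance (say, in the CI job before uploading), an equivalent in Python looks like this sketch (function name and chunk size are arbitrary):

```python
import hashlib

def verify_sha256(artifact_path: str, checksum_path: str) -> bool:
    """Verify an artifact against a sha256sum-style checksum file
    ("<hex digest>  <filename>"). Streams in 1 MiB chunks so large
    artifacts aren't loaded into memory at once."""
    expected = open(checksum_path).read().split()[0]
    digest = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```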
Cleanup Script
This script is critical - it removes sensitive data before creating the AMI.
#!/bin/bash
# scripts/cleanup.sh
set -euo pipefail
echo "=== Starting pre-AMI cleanup ==="
# Remove SSH host keys (regenerated on first boot)
sudo rm -f /etc/ssh/ssh_host_*
# Remove temporary files
sudo rm -rf /tmp/*
sudo rm -rf /var/tmp/*
# Clean yum cache
sudo yum clean all
sudo rm -rf /var/cache/yum
# Remove shell history
sudo rm -f /root/.bash_history
rm -f ~/.bash_history
history -c
# Remove cloud-init artifacts (forces re-run on new instance)
sudo rm -rf /var/lib/cloud/instances/*
# Remove machine ID (regenerated on boot)
sudo truncate -s 0 /etc/machine-id
# Zero out free space for smaller AMI (optional, adds build time)
# sudo dd if=/dev/zero of=/EMPTY bs=1M || true
# sudo rm -f /EMPTY
# Sync filesystem
sync
echo "=== Cleanup complete ==="
CI/CD Pipeline
GitHub Actions workflow for building AMIs on every merge to main.
# .github/workflows/packer-build.yml
name: Build AMI
on:
push:
branches: [main]
paths:
- 'packer/**'
- 'src/**' # Rebuild AMI when application code changes
workflow_dispatch:
inputs:
version:
description: 'Version tag (defaults to git SHA)'
required: false
env:
AWS_REGION: eu-west-1
PACKER_VERSION: 1.9.4
jobs:
build:
runs-on: ubuntu-latest
permissions:
id-token: write # Required for OIDC
contents: read
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsPackerRole
aws-region: ${{ env.AWS_REGION }}
- name: Setup Packer
uses: hashicorp/setup-packer@main
with:
version: ${{ env.PACKER_VERSION }}
- name: Set version
id: version
run: |
if [ -n "${{ github.event.inputs.version }}" ]; then
echo "version=${{ github.event.inputs.version }}" >> $GITHUB_OUTPUT
else
echo "version=${GITHUB_SHA::8}" >> $GITHUB_OUTPUT
fi
- name: Build application artifact
run: |
# Build your application here
make build VERSION=${{ steps.version.outputs.version }}
# Upload to S3 for Packer to retrieve
aws s3 cp dist/myapp.tar.gz \
s3://mycompany-artifacts/myapp/${{ steps.version.outputs.version }}/myapp.tar.gz
# Upload checksum
sha256sum dist/myapp.tar.gz > dist/myapp.tar.gz.sha256
aws s3 cp dist/myapp.tar.gz.sha256 \
s3://mycompany-artifacts/myapp/${{ steps.version.outputs.version }}/myapp.tar.gz.sha256
- name: Packer init
working-directory: packer
run: packer init .
- name: Packer validate
working-directory: packer
run: |
packer validate \
-var="app_version=${{ steps.version.outputs.version }}" \
-var="vpc_id=${{ secrets.BUILD_VPC_ID }}" \
-var="subnet_id=${{ secrets.BUILD_SUBNET_ID }}" \
app-ami.pkr.hcl
- name: Packer build
working-directory: packer
run: |
packer build \
-var="app_version=${{ steps.version.outputs.version }}" \
-var="vpc_id=${{ secrets.BUILD_VPC_ID }}" \
-var="subnet_id=${{ secrets.BUILD_SUBNET_ID }}" \
-color=false \
app-ami.pkr.hcl
- name: Extract AMI ID
id: ami
working-directory: packer
run: |
AMI_ID=$(jq -r '.builds[-1].artifact_id | split(":")[1]' manifest.json)
echo "ami_id=${AMI_ID}" >> $GITHUB_OUTPUT
echo "AMI created: ${AMI_ID}"
- name: Store AMI ID
run: |
# Store AMI ID in Parameter Store for Terraform
aws ssm put-parameter \
--name "/myapp/ami/latest" \
--value "${{ steps.ami.outputs.ami_id }}" \
--type String \
--overwrite
# Also store with version tag
aws ssm put-parameter \
--name "/myapp/ami/${{ steps.version.outputs.version }}" \
--value "${{ steps.ami.outputs.ami_id }}" \
--type String \
--overwrite
outputs:
ami_id: ${{ steps.ami.outputs.ami_id }}
version: ${{ steps.version.outputs.version }}
# Optional: Trigger deployment to staging
deploy-staging:
needs: build
runs-on: ubuntu-latest
environment: staging
steps:
- name: Trigger Terraform deployment
run: |
# Trigger your deployment pipeline
gh workflow run terraform-deploy.yml \
-f environment=staging \
-f ami_id=${{ needs.build.outputs.ami_id }}
Terraform Integration
ASG Module
This module creates an Auto Scaling Group that dynamically fetches the latest AMI.
# terraform/modules/asg/main.tf
variable "app_name" {
type = string
}
variable "environment" {
type = string
}
variable "ami_version" {
type = string
default = "latest"
description = "AMI version tag or 'latest'"
}
variable "instance_type" {
type = string
default = "t3.medium"
}
variable "min_size" {
type = number
default = 2
}
variable "max_size" {
type = number
default = 10
}
variable "desired_capacity" {
type = number
default = 2
}
variable "vpc_id" {
type = string
}
variable "subnet_ids" {
type = list(string)
}
variable "target_group_arns" {
type = list(string)
default = []
}
# Security groups of the ALB allowed to reach the app port
# (referenced by the app security group's ingress rule below)
variable "alb_security_group_ids" {
type = list(string)
}
# Fetch AMI ID from SSM Parameter Store
# This allows Packer to update the parameter, and Terraform to read it
data "aws_ssm_parameter" "ami_id" {
name = "/myapp/ami/${var.ami_version}"
}
# Alternative: Fetch AMI by tags (useful for cross-account scenarios)
data "aws_ami" "app" {
most_recent = true
owners = ["self"]
filter {
name = "name"
values = ["myapp-*"]
}
filter {
name = "tag:Application"
values = [var.app_name]
}
# Optional: filter by specific version
dynamic "filter" {
for_each = var.ami_version != "latest" ? [1] : []
content {
name = "tag:Version"
values = [var.ami_version]
}
}
}
# Launch template - preferred over launch configurations
resource "aws_launch_template" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
image_id = data.aws_ssm_parameter.ami_id.value
instance_type = var.instance_type
# Use IMDSv2 only (security best practice)
metadata_options {
http_endpoint = "enabled"
http_tokens = "required" # Enforces IMDSv2
http_put_response_hop_limit = 1
}
# IAM role for the instance
iam_instance_profile {
name = aws_iam_instance_profile.app.name
}
# Security groups
vpc_security_group_ids = [aws_security_group.app.id]
# User data for instance-specific configuration
user_data = base64encode(templatefile("${path.module}/user-data.sh", {
environment = var.environment
app_name = var.app_name
}))
# Enable detailed monitoring
monitoring {
enabled = true
}
# Root volume (already encrypted in AMI, but explicit is good)
block_device_mappings {
device_name = "/dev/xvda"
ebs {
encrypted = true
volume_type = "gp3"
volume_size = 30
}
}
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.app_name}-${var.environment}"
Environment = var.environment
Application = var.app_name
}
}
lifecycle {
create_before_destroy = true
}
}
# Auto Scaling Group
resource "aws_autoscaling_group" "app" {
name = "${var.app_name}-${var.environment}"
vpc_zone_identifier = var.subnet_ids
target_group_arns = var.target_group_arns
health_check_type = "ELB" # Use ALB health checks
health_check_grace_period = 300
min_size = var.min_size
max_size = var.max_size
desired_capacity = var.desired_capacity
launch_template {
id = aws_launch_template.app.id
version = "$Latest"
}
# Rolling update configuration
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 75 # Keep 75% healthy during update
instance_warmup = 120 # Wait 2 mins before considering healthy
}
triggers = ["tag"] # Refresh when tags change
}
# Termination policy for predictable scaling
termination_policies = ["OldestInstance"]
# Tags propagated to instances
tag {
key = "Name"
value = "${var.app_name}-${var.environment}"
propagate_at_launch = true
}
tag {
key = "Environment"
value = var.environment
propagate_at_launch = true
}
# AMI version tag for debugging
tag {
key = "AMI-Version"
value = var.ami_version
propagate_at_launch = true
}
lifecycle {
create_before_destroy = true
# Ignore desired_capacity changes from autoscaling
ignore_changes = [desired_capacity]
}
}
# Security group
resource "aws_security_group" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
vpc_id = var.vpc_id
# Allow inbound from ALB only
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = var.alb_security_group_ids
}
# Allow all outbound
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
}
# IAM role for instances
resource "aws_iam_role" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
})
}
# SSM access for Session Manager (no SSH needed)
resource "aws_iam_role_policy_attachment" "ssm" {
role = aws_iam_role.app.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
resource "aws_iam_instance_profile" "app" {
name_prefix = "${var.app_name}-${var.environment}-"
role = aws_iam_role.app.name
}
output "asg_name" {
value = aws_autoscaling_group.app.name
}
output "launch_template_id" {
value = aws_launch_template.app.id
}
Environment Configuration
# terraform/environments/production/main.tf
provider "aws" {
region = "eu-west-1"
}
module "app_asg" {
source = "../../modules/asg"
app_name = "myapp"
environment = "production"
# Pin to specific version in production
# Change this to deploy a new version
ami_version = "v1.2.3" # Or use "latest" for auto-deploy
instance_type = "t3.large"
min_size = 3
max_size = 20
desired_capacity = 5
vpc_id = data.aws_vpc.main.id
subnet_ids = data.aws_subnets.private.ids
target_group_arns = [aws_lb_target_group.app.arn]
}
Rollback Strategy
Instant rollbacks are one of the biggest benefits of immutable AMIs.
Option 1: Terraform Rollback
# Change ami_version in Terraform and apply
# terraform/environments/production/main.tf
# ami_version = "v1.2.2" # Previous version
terraform apply
The ASG instance refresh will automatically roll out the old AMI.
Option 2: Manual ASG Update
For emergency rollbacks without Terraform:
#!/bin/bash
# scripts/rollback.sh
set -euo pipefail
PREVIOUS_AMI="ami-0abc123" # Get from SSM or tags
ASG_NAME="myapp-production"
LAUNCH_TEMPLATE_NAME="myapp-production"
# Create new launch template version with old AMI
aws ec2 create-launch-template-version \
--launch-template-name "${LAUNCH_TEMPLATE_NAME}" \
--source-version '$Latest' \
--launch-template-data "{\"ImageId\":\"${PREVIOUS_AMI}\"}"
# Start instance refresh
aws autoscaling start-instance-refresh \
--auto-scaling-group-name "${ASG_NAME}" \
--preferences '{
"MinHealthyPercentage": 75,
"InstanceWarmup": 120
}'
echo "Rollback initiated. Monitor with:"
echo "aws autoscaling describe-instance-refreshes --auto-scaling-group-name ${ASG_NAME}"
Option 3: Blue-Green with Target Groups
For zero-downtime rollbacks, use blue-green deployments:
# Two ASGs - blue and green
# Switch ALB listener between them
resource "aws_lb_listener_rule" "app" {
listener_arn = aws_lb_listener.https.arn
priority = 100
action {
type = "forward"
# Switch between blue and green target groups
# HCL conditionals must stay on one line (or be wrapped in parentheses)
target_group_arn = var.active_color == "blue" ? aws_lb_target_group.blue.arn : aws_lb_target_group.green.arn
}
condition {
path_pattern {
values = ["/*"]
}
}
}
AMI Maintenance
Keeping AMI inventory clean is crucial for cost and security.
Automated Cleanup
#!/bin/bash
# scripts/cleanup-old-amis.sh
# Run weekly via cron or scheduled Lambda
set -euo pipefail
APP_NAME="myapp"
KEEP_COUNT=5 # Keep last 5 AMIs
echo "=== Cleaning up old AMIs for ${APP_NAME} ==="
# Get all AMIs sorted by creation date
AMIS=$(aws ec2 describe-images \
--owners self \
--filters "Name=tag:Application,Values=${APP_NAME}" \
--query 'sort_by(Images, &CreationDate)[*].[ImageId,CreationDate,Name]' \
--output text)
TOTAL=$(echo "${AMIS}" | wc -l)
DELETE_COUNT=$((TOTAL - KEEP_COUNT))
if [ ${DELETE_COUNT} -le 0 ]; then
echo "Only ${TOTAL} AMIs exist, keeping all"
exit 0
fi
echo "Found ${TOTAL} AMIs, will delete ${DELETE_COUNT}"
# Get AMIs to delete (oldest first)
TO_DELETE=$(echo "${AMIS}" | head -n ${DELETE_COUNT})
echo "${TO_DELETE}" | while read -r ami_id created_at ami_name; do
echo "Deleting: ${ami_id} (${ami_name}, created ${created_at})"
# Get associated snapshots
SNAPSHOTS=$(aws ec2 describe-images \
--image-ids "${ami_id}" \
--query 'Images[0].BlockDeviceMappings[*].Ebs.SnapshotId' \
--output text)
# Deregister AMI
aws ec2 deregister-image --image-id "${ami_id}"
# Delete snapshots
for snapshot in ${SNAPSHOTS}; do
if [ "${snapshot}" != "None" ]; then
echo " Deleting snapshot: ${snapshot}"
aws ec2 delete-snapshot --snapshot-id "${snapshot}"
fi
done
done
echo "=== Cleanup complete ==="
Lambda for Scheduled Cleanup
# lambda/cleanup_amis.py
import boto3
from datetime import datetime, timedelta
def handler(event, context):
ec2 = boto3.client('ec2')
app_name = event.get('app_name', 'myapp')
keep_count = event.get('keep_count', 5)
# Get all AMIs for the application
response = ec2.describe_images(
Owners=['self'],
Filters=[
{'Name': 'tag:Application', 'Values': [app_name]},
{'Name': 'state', 'Values': ['available']}
]
)
# Sort by creation date
amis = sorted(response['Images'], key=lambda x: x['CreationDate'])
# Calculate how many to delete
delete_count = len(amis) - keep_count
if delete_count <= 0:
print(f"Only {len(amis)} AMIs exist, nothing to delete")
return {'deleted': 0}
deleted = 0
for ami in amis[:delete_count]:
ami_id = ami['ImageId']
print(f"Deleting AMI: {ami_id}")
# Get snapshots
snapshots = [
bdm['Ebs']['SnapshotId']
for bdm in ami.get('BlockDeviceMappings', [])
if 'Ebs' in bdm and 'SnapshotId' in bdm['Ebs']
]
# Deregister AMI
ec2.deregister_image(ImageId=ami_id)
# Delete snapshots
for snapshot_id in snapshots:
print(f" Deleting snapshot: {snapshot_id}")
ec2.delete_snapshot(SnapshotId=snapshot_id)
deleted += 1
return {'deleted': deleted}
Security Best Practices
AMI Hardening Checklist
ITEM STATUS NOTES
==== ====== =====
Root login disabled [x] /etc/ssh/sshd_config
Password auth disabled [x] SSH keys only (or no SSH)
No SSH keys baked in [x] Removed in cleanup script
Root volume encrypted [x] Packer template
IMDSv2 enforced [x] Launch template
Non-root application user [x] appuser in install script
Systemd security options [x] NoNewPrivileges, ProtectSystem
Automatic security updates [x] yum-cron or SSM Patch Manager
CloudWatch agent installed [x] Logs and metrics
SSM agent installed [x] No SSH needed
File integrity monitoring [ ] Consider AIDE or Wazuh
CIS benchmark compliance [ ] Use amazon-linux-cis AMI or hardening script
CIS Hardening Script
#!/bin/bash
# scripts/cis-hardening.sh
# Based on CIS Amazon Linux 2 Benchmark
# NOTE: run as root (e.g. a Packer shell provisioner with a sudo
# execute_command) - unlike the other scripts, no per-command sudo here
set -euo pipefail
echo "=== Applying CIS hardening ==="
# 1.1.1 - Disable unused filesystems
for fs in cramfs freevxfs jffs2 hfs hfsplus squashfs udf; do
echo "install ${fs} /bin/true" >> /etc/modprobe.d/CIS.conf
done
# 1.4.1 - Ensure permissions on bootloader config
chmod 600 /boot/grub2/grub.cfg 2>/dev/null || true
# 2.2.x - Remove unnecessary services
for svc in rpcbind cups avahi-daemon; do
systemctl disable ${svc} 2>/dev/null || true
systemctl stop ${svc} 2>/dev/null || true
done
# 3.1.1 - Disable IP forwarding
echo "net.ipv4.ip_forward = 0" >> /etc/sysctl.d/99-cis.conf
# 3.2.2 - Disable ICMP redirects
echo "net.ipv4.conf.all.accept_redirects = 0" >> /etc/sysctl.d/99-cis.conf
echo "net.ipv4.conf.default.accept_redirects = 0" >> /etc/sysctl.d/99-cis.conf
# 4.1.x - Configure auditd
yum install -y audit
systemctl enable auditd
# 5.2.x - SSH hardening (additional)
cat >> /etc/ssh/sshd_config << 'EOF'
Protocol 2
MaxAuthTries 4
IgnoreRhosts yes
HostbasedAuthentication no
PermitEmptyPasswords no
ClientAliveInterval 300
ClientAliveCountMax 0
LoginGraceTime 60
AllowTcpForwarding no
X11Forwarding no
EOF
# 5.4.1 - Password requirements
# (Not needed if using SSM-only access)
# Apply sysctl changes
sysctl -p /etc/sysctl.d/99-cis.conf
echo "=== CIS hardening complete ==="
Secrets Management
Never bake secrets into AMIs. Use these approaches instead:
# Option 1: AWS Secrets Manager (recommended)
# In your application startup script:
DB_PASSWORD=$(aws secretsmanager get-secret-value \
--secret-id myapp/production/db \
--query SecretString --output text | jq -r '.password')
# Option 2: SSM Parameter Store
DB_PASSWORD=$(aws ssm get-parameter \
--name /myapp/production/db-password \
--with-decryption \
--query Parameter.Value --output text)
# Option 3: Instance metadata + IAM role
# Attach IAM role with access to specific secrets
# Application uses AWS SDK to fetch at runtime
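Whichever option you pick, the application ends up holding a SecretString payload to parse at startup. A minimal sketch of the parsing side (the key names are assumptions - match them to however the secret was stored):

```python
import json

def db_credentials(secret_string: str) -> tuple[str, str]:
    """Parse a Secrets Manager SecretString payload (JSON) into
    (username, password). With boto3, this string comes from
    client("secretsmanager").get_secret_value(...)["SecretString"]."""
    secret = json.loads(secret_string)
    return secret["username"], secret["password"]
```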
Troubleshooting
AMI Build Failures
Problem: Build times out waiting for SSH
==> amazon-ebs.app: Waiting for SSH to become available...
==> amazon-ebs.app: Timeout waiting for SSH.
Solution: Check security group allows outbound to SSM endpoints, or use public subnet with internet access.
Problem: Provisioner script fails
Solution: Add -x to bash scripts for verbose output:
provisioner "shell" {
inline = ["set -x", "bash /tmp/script.sh"]
}
Deployment Issues
Problem: New instances fail health checks
Solution:
- Check the security group allows the ALB health check port
- Verify the application starts correctly with journalctl -u myapp
- Increase health_check_grace_period if the app needs warm-up time
Problem: Instance refresh stuck
# Check refresh status
aws autoscaling describe-instance-refreshes \
--auto-scaling-group-name myapp-production
# Cancel if needed
aws autoscaling cancel-instance-refresh \
--auto-scaling-group-name myapp-production
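To watch a refresh from a script instead of eyeballing the JSON, a small helper can summarize the describe-instance-refreshes response. A sketch, assuming the response shape of the EC2 Auto Scaling API:

```python
def refresh_summary(response: dict) -> str:
    """Summarize the most recent instance refresh from a
    describe-instance-refreshes response. The API returns refreshes
    newest-first; Status is a value such as Pending, InProgress,
    Successful, Failed, Cancelling or Cancelled."""
    refreshes = response.get("InstanceRefreshes", [])
    if not refreshes:
        return "no instance refreshes found"
    latest = refreshes[0]
    status = latest["Status"]
    pct = latest.get("PercentageComplete")
    if pct is not None:
        return f"{status} ({pct}% complete)"
    return status
```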
References
- Packer Documentation: https://developer.hashicorp.com/packer/docs
- AWS AMI Best Practices: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html
- CIS Amazon Linux 2 Benchmark: https://www.cisecurity.org/benchmark/amazon_linux
- Terraform ASG Module: https://registry.terraform.io/modules/terraform-aws-modules/autoscaling
- AWS Instance Refresh: https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html