NAT Gateway Alternatives - Cutting Your AWS Bill Without Losing Sleep
NAT Gateways are AWS’s best-kept profit center. They’re easy to set up, fully managed, and quietly drain your budget at $0.045/hour plus $0.045/GB of data processed.
Run the numbers on a moderately busy workload - 1TB of outbound traffic per month - and you’re looking at roughly $77/month ($32.40 in hourly charges plus $45 in data processing). Per NAT Gateway. Per AZ. For something that just routes packets.
In one environment I worked on, NAT Gateway costs were 40% of the total AWS bill. Not compute. Not storage. NAT Gateways.
Let’s fix that.
TL;DR
- NAT Gateways cost $0.045/hour + $0.045/GB - adds up fast
- NAT instances can cut costs 80%+ but require management
- VPC endpoints eliminate NAT entirely for AWS services
- IPv6 removes the need for NAT for many workloads
- The right solution depends on your traffic patterns and team capacity
Understanding the Cost
Before optimising, understand where the money goes:
NAT Gateway Pricing (us-east-1):
- Hourly charge: $0.045/hour = $32.40/month per NAT Gateway
- Data processing: $0.045/GB
Example: 3 AZs, 2TB outbound/month each
- Hourly: 3 × $32.40 = $97.20/month
- Data: 6TB × $0.045 = $270/month
- Total: $367.20/month just for NAT
And that’s before data transfer charges to the internet ($0.09/GB for the first 10TB).
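If you want to sanity-check these numbers against your own traffic, the arithmetic is simple enough to script. A minimal sketch in Python - rates hardcoded from the us-east-1 figures above, and the function name is mine:

```python
# Sketch of the NAT Gateway cost arithmetic above (us-east-1 rates;
# verify current pricing before relying on these numbers).
HOURLY_RATE = 0.045    # $/hour per NAT Gateway
DATA_RATE = 0.045      # $/GB processed
HOURS_PER_MONTH = 720  # 30-day month, matching the $32.40 figure

def nat_gateway_monthly_cost(num_gateways: int, total_gb: float) -> float:
    """Monthly cost: hourly charge per gateway plus per-GB data processing."""
    hourly = num_gateways * HOURLY_RATE * HOURS_PER_MONTH
    data = total_gb * DATA_RATE
    return round(hourly + data, 2)

# The example from the text: 3 AZs, 2TB outbound each (6TB total)
print(nat_gateway_monthly_cost(3, 6000))  # $97.20 hourly + $270.00 data
```

`nat_gateway_monthly_cost(1, 1000)` reproduces the ~$77/month figure from the intro.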
Where does NAT traffic come from?
Most teams are surprised when they analyze their NAT traffic:
- AWS API calls - every aws s3 cp, ECR image pull, Secrets Manager fetch
- Package downloads - npm, pip, apt during builds and deployments
- External APIs - Payment providers, SaaS integrations
- Logging/monitoring - If you’re shipping to external services
- Legitimate application traffic - Your actual workload
The first two categories - AWS API calls and package downloads - often dominate, and they’re the easiest to eliminate.
Solution 1: VPC Endpoints (Gateway & Interface)
Best for: Eliminating NAT traffic to AWS services
VPC Endpoints let private subnets talk directly to AWS services without going through NAT.
Gateway Endpoints (Free)
S3 and DynamoDB have Gateway Endpoints - completely free, just routing table entries.
# Terraform - S3 Gateway Endpoint
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"

  route_table_ids = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id,
  ]

  tags = {
    Name = "s3-gateway-endpoint"
  }
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"

  route_table_ids = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id,
  ]

  tags = {
    Name = "dynamodb-gateway-endpoint"
  }
}
Impact: If you’re pulling container images from ECR (which uses S3), this alone can cut NAT traffic by 50%+.
Interface Endpoints (Paid, but cheaper than NAT)
For other AWS services, Interface Endpoints cost $0.01/hour per endpoint per AZ plus $0.01/GB processed - still significantly cheaper than NAT’s $0.045/GB.
Priority order for Interface Endpoints:
# High-value endpoints - create these first
locals {
  interface_endpoints = [
    "ecr.api",        # Container registry API
    "ecr.dkr",        # Container registry Docker
    "logs",           # CloudWatch Logs
    "secretsmanager", # Secrets Manager
    "ssm",            # Systems Manager
    "ssmmessages",    # Session Manager
    "ec2messages",    # SSM agent
    "sts",            # STS for IAM roles
    "kms",            # KMS for encryption
  ]
}

resource "aws_vpc_endpoint" "interface" {
  for_each = toset(local.interface_endpoints)

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "${each.value}-endpoint"
  }
}

resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
Cost comparison for ECR pulls (1TB/month):
Via NAT Gateway: 1000GB × $0.045 = $45.00
Via Interface EP: 1000GB × $0.01 = $10.00 + ~$7.20/month in hourly charges ($0.01/hour × 720 hours, one endpoint in one AZ)
Savings: ~62%
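The comparison generalizes: an Interface Endpoint wins whenever the $0.035/GB saved outweighs its hourly charge. A rough sketch with the same rate assumptions as above (helper name is mine):

```python
# Sketch comparing NAT Gateway vs Interface Endpoint costs for one
# endpoint in one AZ (us-east-1 rates from the text; names are mine).
NAT_DATA_RATE = 0.045      # $/GB through the NAT Gateway
ENDPOINT_DATA_RATE = 0.01  # $/GB through an Interface Endpoint
ENDPOINT_HOURLY = 0.01     # $/hour per endpoint per AZ
HOURS_PER_MONTH = 720

def endpoint_savings_pct(gb_per_month: float) -> float:
    """Percent saved by routing traffic through an Interface Endpoint."""
    nat_cost = gb_per_month * NAT_DATA_RATE
    ep_cost = gb_per_month * ENDPOINT_DATA_RATE + ENDPOINT_HOURLY * HOURS_PER_MONTH
    return round(100 * (nat_cost - ep_cost) / nat_cost, 1)

print(endpoint_savings_pct(1000))  # ~62% for the 1TB ECR example
```

At these rates a single endpoint in one AZ breaks even around 206GB/month ($7.20 ÷ $0.035/GB); below that, the hourly charge can exceed the data savings.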
Solution 2: NAT Instances
Best for: Teams comfortable with EC2 management, high-throughput workloads
A NAT instance is just an EC2 instance configured to forward traffic. No per-GB charge - just the instance cost.
Modern NAT Instance Setup
# Use the latest Amazon Linux 2023 AMI with NAT configuration
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

resource "aws_instance" "nat" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro" # Start small, monitor
  subnet_id     = var.public_subnet_id

  associate_public_ip_address = true
  source_dest_check           = false # Required for NAT
  iam_instance_profile        = aws_iam_instance_profile.nat.name

  user_data = <<-EOF
    #!/bin/bash
    # Enable IP forwarding now and across reboots
    sysctl -w net.ipv4.ip_forward=1
    echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/99-nat.conf

    # The primary interface name varies by instance type (eth0, ens5, ...)
    # so detect it from the default route instead of hardcoding it
    IFACE=$(ip route show default | awk '{print $5; exit}')

    # Configure iptables for NAT
    dnf install -y iptables-services
    iptables -t nat -A POSTROUTING -o "$IFACE" -j MASQUERADE
    iptables -A FORWARD -i "$IFACE" -o "$IFACE" -j ACCEPT
    service iptables save
    systemctl enable --now iptables
  EOF

  tags = {
    Name = "nat-instance"
  }
}
# Route table for private subnets
resource "aws_route" "nat_instance" {
  route_table_id         = var.private_route_table_id
  destination_cidr_block = "0.0.0.0/0"
  network_interface_id   = aws_instance.nat.primary_network_interface_id
}
Cost Comparison
NAT Gateway (3 AZs, 2TB/month):
- Hourly: 3 × $32.40 = $97.20
- Data: 2000GB × $0.045 = $90.00
- Total: $187.20/month
NAT Instance (t3.small, single AZ):
- Instance: $15.18/month (on-demand)
- Total: $15.18/month
Savings: 92%
The Trade-offs
NAT instances require you to manage:
- High availability - Instance failure = no outbound connectivity
- Scaling - a t3.micro bursts to ~5Gbps, but its sustained baseline bandwidth is far lower
- Patching - It’s your EC2, you patch it
- Monitoring - Network throughput, CPU, connections
HA NAT Instance Architecture
For production, run NAT instances in an Auto Scaling Group:
resource "aws_autoscaling_group" "nat" {
  name                = "nat-asg"
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1
  vpc_zone_identifier = [var.public_subnet_id]

  launch_template {
    id      = aws_launch_template.nat.id
    version = "$Latest"
  }

  health_check_type         = "EC2"
  health_check_grace_period = 120

  tag {
    key                 = "Name"
    value               = "nat-instance"
    propagate_at_launch = true
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Lambda to update the route table when the instance is replaced
resource "aws_lambda_function" "nat_failover" {
  filename      = "nat_failover.zip"
  function_name = "nat-route-failover"
  role          = aws_iam_role.nat_failover.arn
  handler       = "index.handler"
  runtime       = "python3.11"

  environment {
    variables = {
      ROUTE_TABLE_ID = var.private_route_table_id
    }
  }
}
Solution 3: IPv6
Best for: Modern architectures, eliminating NAT entirely
IPv6 addresses are globally routable - no NAT needed. AWS provides them free.
Enabling IPv6
resource "aws_vpc" "main" {
  cidr_block                       = "10.0.0.0/16"
  assign_generated_ipv6_cidr_block = true

  tags = {
    Name = "main-vpc"
  }
}

resource "aws_subnet" "private" {
  vpc_id          = aws_vpc.main.id
  cidr_block      = "10.0.1.0/24"
  ipv6_cidr_block = cidrsubnet(aws_vpc.main.ipv6_cidr_block, 8, 1)

  assign_ipv6_address_on_creation = true

  tags = {
    Name = "private-subnet"
  }
}

# Egress-only internet gateway for IPv6
resource "aws_egress_only_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route" "private_ipv6" {
  route_table_id              = aws_route_table.private.id
  destination_ipv6_cidr_block = "::/0"
  egress_only_gateway_id      = aws_egress_only_internet_gateway.main.id
}
The Catch
Not everything supports IPv6:
- Many third-party APIs are IPv4-only
- Some AWS services don’t have IPv6 endpoints
- Legacy applications may not handle dual-stack
Hybrid approach: Use IPv6 for AWS-to-internet traffic, keep a small NAT Gateway for IPv4-only destinations.
Solution 4: Architectural Changes
Sometimes the best NAT optimization is not needing NAT.
Move builds to public subnets
CI/CD runners pulling packages don’t need to be in private subnets:
# GitLab Runner in public subnet with no private data
[[runners]]
  executor = "docker"

  [runners.docker]
    # Runner in public subnet, direct internet access
    network_mode = "host"
Use ECR pull-through cache
Instead of pulling from Docker Hub (through NAT), cache in ECR:
# Create pull-through cache rule. Docker Hub upstreams require
# authentication: a Secrets Manager secret whose name starts with
# "ecr-pullthroughcache/", passed via --credential-arn
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix docker-hub \
  --upstream-registry-url registry-1.docker.io \
  --credential-arn arn:aws:secretsmanager:us-east-1:123456789:secret:ecr-pullthroughcache/docker-hub
# Pull via ECR (through VPC endpoint, no NAT)
docker pull 123456789.dkr.ecr.us-east-1.amazonaws.com/docker-hub/nginx:latest
Pre-bake AMIs and container images
Don’t download packages at runtime:
# Bad: npm install re-resolves and re-downloads on every build
FROM node:20
WORKDIR /app
COPY package*.json ./
RUN npm install

# Good: deterministic install, dependencies baked into the image once
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
Use S3 for artifact distribution
Instead of downloading from the internet:
# Upload build artifacts to S3 (via gateway endpoint)
aws s3 cp build.zip s3://my-artifacts/
# Download in private subnet (no NAT needed)
aws s3 cp s3://my-artifacts/build.zip .
Decision Framework
| Scenario | Recommendation |
|---|---|
| Mostly AWS API calls | VPC Endpoints (Gateway + Interface) |
| High throughput, ops capacity | NAT Instances |
| New/modern architecture | IPv6 with minimal NAT fallback |
| Cost-critical, low traffic | Single NAT Gateway + VPC Endpoints |
| Multi-AZ HA required | NAT Gateway (accept the cost) |
My Recommended Stack
For most production environments:
# 1. Gateway endpoints (free) - always
resource "aws_vpc_endpoint" "s3" { ... }
resource "aws_vpc_endpoint" "dynamodb" { ... }
# 2. Interface endpoints for heavy AWS services
resource "aws_vpc_endpoint" "ecr_api" { ... }
resource "aws_vpc_endpoint" "ecr_dkr" { ... }
resource "aws_vpc_endpoint" "logs" { ... }
# 3. Single NAT Gateway for remaining traffic
resource "aws_nat_gateway" "main" {
  # One NAT Gateway, not three
  # Accept outbound downtime if its AZ fails
  # (route failover is manual unless you automate it)
  # Use for actual internet-bound traffic only
}
# 4. Enable IPv6 for future flexibility
resource "aws_vpc" "main" {
assign_generated_ipv6_cidr_block = true
}
Result: 60-80% cost reduction with minimal operational overhead.
Monitoring NAT Costs
Set up alerts before costs spiral:
resource "aws_cloudwatch_metric_alarm" "nat_bytes" {
  alarm_name          = "nat-gateway-high-throughput"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "BytesOutToDestination"
  namespace           = "AWS/NATGateway"
  period              = 86400        # Daily
  statistic           = "Sum"
  threshold           = 107374182400 # 100GiB/day

  dimensions = {
    NatGatewayId = aws_nat_gateway.main.id
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
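If it’s easier to reason in dollars than bytes, you can derive the threshold from a monthly budget instead. A small sketch using the data-processing rate from earlier (the function name is mine):

```python
# Sketch: derive a daily alarm threshold in bytes from a monthly NAT
# data-processing budget in dollars (us-east-1 rate from the text).
DATA_RATE = 0.045  # $/GB processed by the NAT Gateway

def daily_byte_threshold(monthly_budget_usd: float, days: int = 30) -> int:
    """Bytes/day that would exactly spend the monthly data-processing budget."""
    gb_per_month = monthly_budget_usd / DATA_RATE
    gb_per_day = gb_per_month / days
    # GiB -> bytes; round to absorb floating-point noise
    return round(gb_per_day * 1024**3)

print(daily_byte_threshold(135))  # 107374182400 bytes (100GiB/day)
```

`daily_byte_threshold(135)` reproduces the 107374182400-byte threshold used in the alarm above - 100GiB/day corresponds to roughly $135/month in data processing.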
Use VPC Flow Logs to identify what’s generating traffic:
# Find traffic flowing through the NAT Gateway's ENI
# (substitute your NAT Gateway's actual network interface ID)
aws logs filter-log-events \
  --log-group-name vpc-flow-logs \
  --filter-pattern '"eni-0123456789abcdef0"' \
  --query 'events[*].message' \
  --output text
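Raw flow-log records are tedious to eyeball, so a short script can rank the internal sources generating the most NAT-bound bytes. A sketch assuming the default 14-field flow-log format (version, account, eni, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, status); the function name is mine:

```python
# Sketch: aggregate VPC Flow Log records by source address to find the
# top talkers behind NAT. Assumes the default flow-log record format;
# custom formats put fields in different positions.
from collections import Counter

def top_talkers(flow_log_lines, n=5):
    """Sum bytes per source address across default-format flow log records."""
    totals = Counter()
    for line in flow_log_lines:
        fields = line.split()
        # Skip malformed records and rejected traffic
        if len(fields) < 14 or fields[-2] != "ACCEPT":
            continue
        srcaddr, nbytes = fields[3], int(fields[9])
        totals[srcaddr] += nbytes
    return totals.most_common(n)
```

Feed it the message strings returned by filter-log-events to see which private IPs are worth investigating first.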
Conclusion
NAT Gateways are convenient but expensive. For most workloads:
- Start with VPC Endpoints - Free for S3/DynamoDB, cheap for other AWS services
- Analyze your traffic - Know what’s going through NAT before optimising
- Consider NAT instances - If you have ops capacity and high throughput
- Enable IPv6 - Future-proof your architecture
The “right” answer depends on your traffic patterns, team capacity, and risk tolerance. But doing nothing and paying $0.045/GB is almost never the right answer.