Backstage on AWS ECS - Production-Ready Deployment
Backstage is an open-source developer portal framework, created at Spotify and now a CNCF project. It unifies your infrastructure tooling, services, and documentation in a single interface. This guide covers deploying Backstage to AWS ECS Fargate, with RDS PostgreSQL for persistence and Cognito for authentication.
          ┌──────────────────────────────────────────┐
          │                AWS Cloud                 │
          │                                          │
Users ────┼──► ALB ──► ECS Fargate (Backstage)       │
          │     │           │                        │
          │     ▼           ▼                        │
          │  Cognito    RDS PostgreSQL               │
          │  (OAuth2)                                │
          └──────────────────────────────────────────┘
TL;DR
Code Repository: All code from this post is available at github.com/moabukar/blog-code/backstage-aws-ecs-production
- Backstage on ECS Fargate (serverless containers)
- PostgreSQL RDS for catalog and app-config storage
- Cognito User Pool for authentication
- ALB with HTTPS termination
- Secrets Manager for credentials
- Terraform for infrastructure
- GitHub Actions for CI/CD
Architecture Overview
COMPONENT           SERVICE            PURPOSE
=========           =======            =======
Compute             ECS Fargate        Serverless container hosting
Database            RDS PostgreSQL     Catalog storage, app state
Auth                Cognito User Pool  OAuth2/OIDC authentication
Load Balancer       Application LB     HTTPS termination, routing
DNS                 Route 53           Custom domain
Secrets             Secrets Manager    DB creds, API keys
Container Registry  ECR                Backstage Docker images
Networking          VPC                Private subnets, NAT Gateway
Monitoring          CloudWatch         Logs, metrics, alarms
Project Structure
backstage-aws/
├── terraform/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── vpc.tf
│ ├── rds.tf
│ ├── ecs.tf
│ ├── alb.tf
│ ├── cognito.tf
│ ├── ecr.tf
│ ├── secrets.tf
│ ├── iam.tf
│ └── cloudwatch.tf
├── backstage/
│ ├── app-config.yaml
│ ├── app-config.production.yaml
│ ├── Dockerfile
│ ├── packages/
│ │ ├── app/
│ │ └── backend/
│ └── package.json
├── .github/
│ └── workflows/
│ └── deploy.yml
└── README.md
Prerequisites
TOOL       VERSION  INSTALLATION
====       =======  ============
Terraform  >= 1.5   brew install terraform
AWS CLI    >= 2.0   brew install awscli
Node.js    >= 18    brew install node@18
Docker     >= 24    Docker Desktop
AWS account with permissions for:
- ECS, ECR, RDS, ALB, Cognito, Secrets Manager, VPC, Route 53, CloudWatch
Part 1: VPC and Networking
First, create the network foundation:
# terraform/vpc.tf
data "aws_availability_zones" "available" {
state = "available"
}
locals {
azs = slice(data.aws_availability_zones.available.names, 0, 3)
}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.1.2"
name = "${var.project_name}-vpc"
cidr = var.vpc_cidr
azs = local.azs
private_subnets = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 4, i)]
public_subnets = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 4, i + 4)]
database_subnets = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 4, i + 8)]
enable_nat_gateway = true
single_nat_gateway = var.environment != "production"
enable_dns_hostnames = true
enable_dns_support = true
create_database_subnet_group = true
tags = {
Environment = var.environment
Project = var.project_name
}
}
# Security Groups
resource "aws_security_group" "alb" {
name = "${var.project_name}-alb-sg"
description = "Security group for ALB"
vpc_id = module.vpc.vpc_id
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTP redirect"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-alb-sg"
}
}
resource "aws_security_group" "ecs" {
name = "${var.project_name}-ecs-sg"
description = "Security group for ECS tasks"
vpc_id = module.vpc.vpc_id
ingress {
description = "Traffic from ALB"
from_port = var.backstage_port
to_port = var.backstage_port
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-ecs-sg"
}
}
resource "aws_security_group" "rds" {
name = "${var.project_name}-rds-sg"
description = "Security group for RDS"
vpc_id = module.vpc.vpc_id
ingress {
description = "PostgreSQL from ECS"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.ecs.id]
}
tags = {
Name = "${var.project_name}-rds-sg"
}
}
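The cidrsubnet() calls above carve the three subnet tiers out of the VPC CIDR. A quick Python sketch of the same arithmetic (assuming the default 10.0.0.0/16) shows the ranges you should expect:

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Rough Python equivalent of Terraform's cidrsubnet() for IPv4."""
    network = ipaddress.ip_network(prefix)
    return str(list(network.subnets(prefixlen_diff=newbits))[netnum])

vpc_cidr = "10.0.0.0/16"  # the default var.vpc_cidr
private  = [cidrsubnet(vpc_cidr, 4, i)     for i in range(3)]
public   = [cidrsubnet(vpc_cidr, 4, i + 4) for i in range(3)]
database = [cidrsubnet(vpc_cidr, 4, i + 8) for i in range(3)]

print(private)   # ['10.0.0.0/20', '10.0.16.0/20', '10.0.32.0/20']
print(public)    # ['10.0.64.0/20', '10.0.80.0/20', '10.0.96.0/20']
print(database)  # ['10.0.128.0/20', '10.0.144.0/20', '10.0.160.0/20']
```

Each tier gets three /20s (one per AZ), and the offsets 0, 4, 8 keep the tiers in non-overlapping blocks with room to add AZs later.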
Part 2: RDS PostgreSQL
Backstage requires PostgreSQL for the catalog database:
# terraform/rds.tf
resource "random_password" "db_password" {
length = 32
special = false
}
resource "aws_secretsmanager_secret" "db_credentials" {
name = "${var.project_name}/db-credentials"
recovery_window_in_days = 7
tags = {
Name = "${var.project_name}-db-credentials"
}
}
resource "aws_secretsmanager_secret_version" "db_credentials" {
secret_id = aws_secretsmanager_secret.db_credentials.id
secret_string = jsonencode({
username = var.db_username
password = random_password.db_password.result
host = module.rds.db_instance_address
port = 5432
database = var.db_name
})
}
module "rds" {
source = "terraform-aws-modules/rds/aws"
version = "6.1.1"
identifier = "${var.project_name}-postgres"
engine = "postgres"
engine_version = "15.4"
family = "postgres15"
major_engine_version = "15"
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
max_allocated_storage = var.db_max_allocated_storage
db_name = var.db_name
username = var.db_username
password = random_password.db_password.result
port = 5432
multi_az = var.environment == "production"
db_subnet_group_name = module.vpc.database_subnet_group_name
vpc_security_group_ids = [aws_security_group.rds.id]
maintenance_window = "Mon:00:00-Mon:03:00"
backup_window = "03:00-06:00"
backup_retention_period = var.environment == "production" ? 30 : 7
enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
performance_insights_enabled = true
performance_insights_retention_period = 7
deletion_protection = var.environment == "production"
skip_final_snapshot = var.environment != "production"
parameters = [
{
name = "log_statement"
value = "all"
},
{
name = "log_min_duration_statement"
value = "1000"
}
]
tags = {
Environment = var.environment
Project = var.project_name
}
}
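Several of the arguments above hang off the same environment conditional. A small sketch makes the per-environment matrix explicit:

```python
def rds_environment_settings(environment: str) -> dict:
    """The environment-driven conditionals from the module call, in one place."""
    prod = environment == "production"
    return {
        "multi_az": prod,
        "backup_retention_period": 30 if prod else 7,  # days
        "deletion_protection": prod,
        "skip_final_snapshot": not prod,
    }

print(rds_environment_settings("staging"))
# {'multi_az': False, 'backup_retention_period': 7, 'deletion_protection': False, 'skip_final_snapshot': True}
```

The pattern keeps one tfvars file per environment instead of separate Terraform configurations.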
Database sizing guidelines:
ENVIRONMENT  INSTANCE CLASS  STORAGE  MULTI-AZ
===========  ==============  =======  ========
Development  db.t3.micro     20 GB    No
Staging      db.t3.small     50 GB    No
Production   db.r6g.large    100 GB   Yes
Part 3: Cognito Authentication
Set up Cognito User Pool for OAuth2/OIDC authentication:
# terraform/cognito.tf
resource "aws_cognito_user_pool" "backstage" {
name = "${var.project_name}-users"
# Password policy
password_policy {
minimum_length = 12
require_lowercase = true
require_numbers = true
require_symbols = true
require_uppercase = true
temporary_password_validity_days = 7
}
# MFA configuration
mfa_configuration = var.environment == "production" ? "ON" : "OPTIONAL"
software_token_mfa_configuration {
enabled = true
}
# Account recovery
account_recovery_setting {
recovery_mechanism {
name = "verified_email"
priority = 1
}
}
# Email configuration
email_configuration {
email_sending_account = "COGNITO_DEFAULT"
}
# Schema attributes
schema {
name = "email"
attribute_data_type = "String"
required = true
mutable = true
string_attribute_constraints {
min_length = 1
max_length = 256
}
}
schema {
name = "name"
attribute_data_type = "String"
required = true
mutable = true
string_attribute_constraints {
min_length = 1
max_length = 256
}
}
# Auto-verified attributes
auto_verified_attributes = ["email"]
# User pool add-ons
user_pool_add_ons {
advanced_security_mode = "ENFORCED"
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_cognito_user_pool_domain" "backstage" {
domain = "${var.project_name}-${var.environment}"
user_pool_id = aws_cognito_user_pool.backstage.id
}
resource "aws_cognito_user_pool_client" "backstage" {
name = "${var.project_name}-client"
user_pool_id = aws_cognito_user_pool.backstage.id
generate_secret = true
# OAuth configuration
allowed_oauth_flows = ["code"]
allowed_oauth_flows_user_pool_client = true
allowed_oauth_scopes = ["email", "openid", "profile"]
callback_urls = [
"https://${var.domain_name}/api/auth/aws-alb-oidc/handler/frame",
"https://${var.domain_name}/api/auth/cognito/handler/frame"
]
logout_urls = [
"https://${var.domain_name}"
]
supported_identity_providers = ["COGNITO"]
# Token validity
access_token_validity = 1 # hours
id_token_validity = 1 # hours
refresh_token_validity = 30 # days
token_validity_units {
access_token = "hours"
id_token = "hours"
refresh_token = "days"
}
# Prevent user existence errors
prevent_user_existence_errors = "ENABLED"
explicit_auth_flows = [
"ALLOW_REFRESH_TOKEN_AUTH",
"ALLOW_USER_SRP_AUTH"
]
}
# Store client secret in Secrets Manager
resource "aws_secretsmanager_secret" "cognito_client_secret" {
name = "${var.project_name}/cognito-client-secret"
recovery_window_in_days = 7
}
resource "aws_secretsmanager_secret_version" "cognito_client_secret" {
secret_id = aws_secretsmanager_secret.cognito_client_secret.id
secret_string = jsonencode({
client_id = aws_cognito_user_pool_client.backstage.id
client_secret = aws_cognito_user_pool_client.backstage.client_secret
user_pool_id = aws_cognito_user_pool.backstage.id
domain = aws_cognito_user_pool_domain.backstage.domain
region = var.aws_region
})
}
# Create admin group
resource "aws_cognito_user_group" "admins" {
name = "admins"
user_pool_id = aws_cognito_user_pool.backstage.id
description = "Backstage administrators"
}
resource "aws_cognito_user_group" "developers" {
name = "developers"
user_pool_id = aws_cognito_user_pool.backstage.id
description = "Backstage developers"
}
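At runtime, the auth backend validates Cognito-issued tokens against the pool's issuer URL and the client ID created above. A minimal sketch of the claim checks (signature verification against the pool's JWKS is deliberately omitted, and names like eu-west-1_EXAMPLE are placeholders):

```python
import base64
import json
import time

def decode_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature. Real validation
    must also check the signature against the user pool's JWKS."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def claims_look_valid(claims: dict, region: str, user_pool_id: str, client_id: str) -> bool:
    """Issuer, audience, and expiry checks for a Cognito ID token."""
    issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    return (claims.get("iss") == issuer
            and claims.get("aud") == client_id
            and claims.get("exp", 0) > time.time())

# Build a throwaway unsigned token to exercise the checks
body = base64.urlsafe_b64encode(json.dumps({
    "iss": "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_EXAMPLE",
    "aud": "example-client-id",
    "exp": time.time() + 3600,
}).encode()).decode().rstrip("=")
token = f"e30.{body}."  # "e30" is base64url for an empty {} header
print(claims_look_valid(decode_claims(token), "eu-west-1", "eu-west-1_EXAMPLE", "example-client-id"))
```

The issuer URL here is exactly the one the app-config in Part 8 builds from AWS_REGION and COGNITO_USER_POOL_ID.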
Part 4: ECR and Docker Image
Create the container registry and Backstage Docker image:
# terraform/ecr.tf
resource "aws_ecr_repository" "backstage" {
name = "${var.project_name}/backstage"
image_tag_mutability = "MUTABLE"
image_scanning_configuration {
scan_on_push = true
}
encryption_configuration {
encryption_type = "AES256"
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_ecr_lifecycle_policy" "backstage" {
repository = aws_ecr_repository.backstage.name
policy = jsonencode({
rules = [
{
rulePriority = 1
description = "Keep last 10 images"
selection = {
tagStatus = "tagged"
tagPrefixList = ["v"]
countType = "imageCountMoreThan"
countNumber = 10
}
action = {
type = "expire"
}
},
{
rulePriority = 2
description = "Remove untagged images older than 7 days"
selection = {
tagStatus = "untagged"
countType = "sinceImagePushed"
countUnit = "days"
countNumber = 7
}
action = {
type = "expire"
}
}
]
})
}
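The two lifecycle rules target different image sets: rule 1 counts only v-prefixed tags, rule 2 ages out untagged layers. A toy simulation of which images they expire (image shapes and "days" are illustrative):

```python
def expired_images(images, keep_tagged=10, untagged_max_age_days=7, today=100):
    """Simulate the two lifecycle rules on images shaped like
    {'tag': 'v1.2.3' or None, 'pushed_day': int}."""
    v_tagged = sorted((i for i in images if i["tag"] and i["tag"].startswith("v")),
                      key=lambda i: i["pushed_day"], reverse=True)
    expired = list(v_tagged[keep_tagged:])          # rule 1: keep newest 10 v-tags
    expired += [i for i in images if i["tag"] is None
                and today - i["pushed_day"] > untagged_max_age_days]  # rule 2
    return expired

images = [{"tag": f"v0.{n}", "pushed_day": n} for n in range(12)]
images += [{"tag": None, "pushed_day": 90},   # 10 days old -> expires
           {"tag": None, "pushed_day": 96}]   # 4 days old  -> kept
print([i["tag"] or "untagged" for i in expired_images(images)])
# ['v0.1', 'v0.0', 'untagged']
```

Note that images tagged without the v prefix (for example the sha tags pushed by CI in Part 9) match neither rule and accumulate until you add a rule for them.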
Backstage Dockerfile:
# backstage/Dockerfile
# Stage 1: Build
FROM node:18-bookworm-slim AS build
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y \
python3 \
g++ \
make \
git \
&& rm -rf /var/lib/apt/lists/*
# Copy package files
COPY package.json yarn.lock ./
COPY packages/app/package.json ./packages/app/
COPY packages/backend/package.json ./packages/backend/
# Install dependencies
RUN yarn install --frozen-lockfile --network-timeout 600000
# Copy source
COPY . .
# Build backend
RUN yarn build:backend
# Stage 2: Production
FROM node:18-bookworm-slim AS production
WORKDIR /app
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
python3 \
g++ \
make \
git \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy built backend
COPY --from=build /app/packages/backend/dist ./packages/backend/dist
COPY --from=build /app/yarn.lock ./
COPY --from=build /app/package.json ./
# Copy backend package.json
COPY --from=build /app/packages/backend/package.json ./packages/backend/
# Install production dependencies
RUN yarn install --frozen-lockfile --production --network-timeout 600000
# Copy app-config
COPY app-config.yaml app-config.production.yaml ./
# Set environment
ENV NODE_ENV=production
# Create non-root user
RUN groupadd -r backstage && useradd -r -g backstage backstage
RUN chown -R backstage:backstage /app
USER backstage
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:7007/healthcheck || exit 1
EXPOSE 7007
CMD ["node", "packages/backend", "--config", "app-config.yaml", "--config", "app-config.production.yaml"]
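Backstage can take a minute to boot, hence the generous start-period in the HEALTHCHECK. A back-of-envelope for how long a hung container can survive before Docker flags it (a sketch; exact scheduling varies by runtime version):

```python
def worst_case_unhealthy_seconds(interval=30, timeout=10, retries=3, start_period=60):
    """Rough upper bound: grace period plus `retries` consecutive probes,
    each of which may take up to interval + timeout seconds."""
    return start_period + retries * (interval + timeout)

print(worst_case_unhealthy_seconds())  # 180
```

This is worth keeping in mind when tuning the ALB health check and the ECS health-check grace period so they don't kill tasks that are still starting.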
Part 5: ECS Fargate Deployment
Deploy Backstage to ECS Fargate:
# terraform/ecs.tf
resource "aws_ecs_cluster" "backstage" {
name = "${var.project_name}-cluster"
setting {
name = "containerInsights"
value = "enabled"
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_ecs_cluster_capacity_providers" "backstage" {
cluster_name = aws_ecs_cluster.backstage.name
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
default_capacity_provider_strategy {
base = 1
weight = 100
capacity_provider = var.environment == "production" ? "FARGATE" : "FARGATE_SPOT"
}
}
resource "aws_cloudwatch_log_group" "backstage" {
name = "/ecs/${var.project_name}"
retention_in_days = var.environment == "production" ? 90 : 30
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_ecs_task_definition" "backstage" {
family = var.project_name
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.ecs_cpu
memory = var.ecs_memory
execution_role_arn = aws_iam_role.ecs_execution.arn
task_role_arn = aws_iam_role.ecs_task.arn
container_definitions = jsonencode([
{
name = "backstage"
image = "${aws_ecr_repository.backstage.repository_url}:${var.image_tag}"
essential = true
portMappings = [
{
containerPort = var.backstage_port
protocol = "tcp"
}
]
environment = [
{
name = "NODE_ENV"
value = "production"
},
{
name = "APP_CONFIG_app_baseUrl"
value = "https://${var.domain_name}"
},
{
name = "APP_CONFIG_backend_baseUrl"
value = "https://${var.domain_name}"
},
{
name = "APP_CONFIG_backend_cors_origin"
value = "https://${var.domain_name}"
},
{
name = "AWS_REGION"
value = var.aws_region
}
]
secrets = [
{
name = "POSTGRES_HOST"
valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:host::"
},
{
name = "POSTGRES_PORT"
valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:port::"
},
{
name = "POSTGRES_USER"
valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:username::"
},
{
name = "POSTGRES_PASSWORD"
valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:password::"
},
{
name = "COGNITO_CLIENT_ID"
valueFrom = "${aws_secretsmanager_secret.cognito_client_secret.arn}:client_id::"
},
{
name = "COGNITO_CLIENT_SECRET"
valueFrom = "${aws_secretsmanager_secret.cognito_client_secret.arn}:client_secret::"
},
{
name = "COGNITO_USER_POOL_ID"
valueFrom = "${aws_secretsmanager_secret.cognito_client_secret.arn}:user_pool_id::"
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.backstage.name
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "backstage"
}
}
healthCheck = {
command = ["CMD-SHELL", "curl -f http://localhost:${var.backstage_port}/healthcheck || exit 1"]
interval = 30
timeout = 10
retries = 3
startPeriod = 60
}
}
])
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_ecs_service" "backstage" {
name = var.project_name
cluster = aws_ecs_cluster.backstage.id
task_definition = aws_ecs_task_definition.backstage.arn
desired_count = var.ecs_desired_count
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
# No launch_type here: omitting it lets the cluster's default capacity
# provider strategy apply (FARGATE_SPOT outside production)
platform_version = "LATEST"
health_check_grace_period_seconds = 120
network_configuration {
security_groups = [aws_security_group.ecs.id]
subnets = module.vpc.private_subnets
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.backstage.arn
container_name = "backstage"
container_port = var.backstage_port
}
deployment_circuit_breaker {
enable = true
rollback = true
}
lifecycle {
ignore_changes = [task_definition, desired_count]
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
# Auto-scaling
resource "aws_appautoscaling_target" "backstage" {
max_capacity = var.ecs_max_count
min_capacity = var.ecs_min_count
resource_id = "service/${aws_ecs_cluster.backstage.name}/${aws_ecs_service.backstage.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "cpu" {
name = "${var.project_name}-cpu-scaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.backstage.resource_id
scalable_dimension = aws_appautoscaling_target.backstage.scalable_dimension
service_namespace = aws_appautoscaling_target.backstage.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = 70.0
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
resource "aws_appautoscaling_policy" "memory" {
name = "${var.project_name}-memory-scaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.backstage.resource_id
scalable_dimension = aws_appautoscaling_target.backstage.scalable_dimension
service_namespace = aws_appautoscaling_target.backstage.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageMemoryUtilization"
}
target_value = 80.0
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
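The secrets entries in the task definition use ECS's JSON-key syntax: the secret ARN, then a key into the secret's JSON body, then empty version-stage and version-id fields (the trailing "::" means the current AWSCURRENT version). A sketch of the resolution, with an illustrative ARN and secret body:

```python
import json

def resolve_value_from(value_from: str, secrets_by_arn: dict) -> str:
    """Mimic ECS resolution of a Secrets Manager reference of the form
    <secret-arn>:<json-key>:<version-stage>:<version-id>."""
    arn, json_key, _stage, _version = value_from.rsplit(":", 3)
    secret_string = secrets_by_arn[arn]
    return json.loads(secret_string)[json_key] if json_key else secret_string

# Hypothetical ARN and body matching the db-credentials secret from Part 2
arn = "arn:aws:secretsmanager:eu-west-1:123456789012:secret:backstage/db-credentials-AbCdEf"
store = {arn: json.dumps({"username": "backstage", "password": "s3cret",
                          "host": "db.internal", "port": 5432, "database": "backstage"})}
print(resolve_value_from(f"{arn}:host::", store))  # db.internal
```

Because the ARN itself contains colons, the reference has to be parsed from the right, which is also why the execution role only needs GetSecretValue on the whole secret, not per key.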
ECS sizing guidelines:
ENVIRONMENT  CPU   MEMORY  DESIRED  MIN  MAX
===========  ===   ======  =======  ===  ===
Development  512   1024    1        1    2
Staging      1024  2048    2        1    4
Production   2048  4096    3        2    10
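The target-tracking policies move the desired count between the min and max bounds above. Roughly, the scaler picks the task count that would bring the average metric back to the target:

```python
import math

def target_tracking_desired(current_tasks: int, metric_value: float, target: float,
                            min_tasks: int = 2, max_tasks: int = 10) -> int:
    """Approximate target-tracking math: choose the task count that would
    return the average metric to the target, clamped to the service bounds."""
    desired = math.ceil(current_tasks * metric_value / target)
    return max(min_tasks, min(max_tasks, desired))

print(target_tracking_desired(3, 90.0, 70.0))  # 4: scale out at high CPU
print(target_tracking_desired(4, 30.0, 70.0))  # 2: scale in, after the cooldown
```

The asymmetric cooldowns (60s out, 300s in) bias the service toward scaling out quickly and scaling in cautiously.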
Part 6: Application Load Balancer
# terraform/alb.tf
# S3 bucket for ALB access logs; the regional ELB service account needs
# s3:PutObject under the log prefix
data "aws_elb_service_account" "main" {}
resource "aws_s3_bucket" "alb_logs" {
bucket = "${var.project_name}-alb-logs" # bucket names are global; add a unique suffix if taken
force_destroy = var.environment != "production"
}
resource "aws_s3_bucket_policy" "alb_logs" {
bucket = aws_s3_bucket.alb_logs.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { AWS = data.aws_elb_service_account.main.arn }
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.alb_logs.arn}/alb/AWSLogs/*"
}]
})
}
resource "aws_lb" "backstage" {
name = "${var.project_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = module.vpc.public_subnets
enable_deletion_protection = var.environment == "production"
access_logs {
bucket = aws_s3_bucket.alb_logs.id
prefix = "alb"
enabled = true
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_lb_target_group" "backstage" {
name = "${var.project_name}-tg"
port = var.backstage_port
protocol = "HTTP"
vpc_id = module.vpc.vpc_id
target_type = "ip"
health_check {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 3
interval = 30
matcher = "200"
path = "/healthcheck"
port = "traffic-port"
protocol = "HTTP"
timeout = 10
}
stickiness {
type = "lb_cookie"
cookie_duration = 86400
enabled = true
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
# HTTPS listener
resource "aws_lb_listener" "https" {
load_balancer_arn = aws_lb.backstage.arn
port = 443
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
certificate_arn = aws_acm_certificate_validation.backstage.certificate_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.backstage.arn
}
}
# HTTP to HTTPS redirect
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.backstage.arn
port = 80
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
# Route 53 hosted zone (assumes the zone name equals var.domain_name;
# point this at the parent zone if Backstage lives on a subdomain)
data "aws_route53_zone" "main" {
name = var.domain_name
private_zone = false
}
# ACM Certificate
resource "aws_acm_certificate" "backstage" {
domain_name = var.domain_name
validation_method = "DNS"
lifecycle {
create_before_destroy = true
}
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_route53_record" "cert_validation" {
for_each = {
for dvo in aws_acm_certificate.backstage.domain_validation_options : dvo.domain_name => {
name = dvo.resource_record_name
record = dvo.resource_record_value
type = dvo.resource_record_type
}
}
allow_overwrite = true
name = each.value.name
records = [each.value.record]
ttl = 60
type = each.value.type
zone_id = data.aws_route53_zone.main.zone_id
}
resource "aws_acm_certificate_validation" "backstage" {
certificate_arn = aws_acm_certificate.backstage.arn
validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}
# Route 53 record
resource "aws_route53_record" "backstage" {
zone_id = data.aws_route53_zone.main.zone_id
name = var.domain_name
type = "A"
alias {
name = aws_lb.backstage.dns_name
zone_id = aws_lb.backstage.zone_id
evaluate_target_health = true
}
}
Part 7: IAM Roles
# terraform/iam.tf
data "aws_caller_identity" "current" {}
# ECS Execution Role (for pulling images, writing logs)
resource "aws_iam_role" "ecs_execution" {
name = "${var.project_name}-ecs-execution"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_iam_role_policy_attachment" "ecs_execution" {
role = aws_iam_role.ecs_execution.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
resource "aws_iam_role_policy" "ecs_execution_secrets" {
name = "${var.project_name}-secrets-access"
role = aws_iam_role.ecs_execution.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue"
]
Resource = [
aws_secretsmanager_secret.db_credentials.arn,
aws_secretsmanager_secret.cognito_client_secret.arn
]
}
]
})
}
# ECS Task Role (for application permissions)
resource "aws_iam_role" "ecs_task" {
name = "${var.project_name}-ecs-task"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
tags = {
Environment = var.environment
Project = var.project_name
}
}
# Add permissions for Backstage integrations
resource "aws_iam_role_policy" "ecs_task_permissions" {
name = "${var.project_name}-task-permissions"
role = aws_iam_role.ecs_task.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:ListBucket"
]
Resource = [
"arn:aws:s3:::${var.project_name}-*"
]
},
{
Effect = "Allow"
Action = [
"ssm:GetParameter",
"ssm:GetParameters",
"ssm:GetParametersByPath"
]
Resource = [
"arn:aws:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter/${var.project_name}/*"
]
}
]
})
}
Part 8: Backstage Configuration
Production app-config for Backstage:
# backstage/app-config.production.yaml
app:
title: Backstage Developer Portal
baseUrl: ${APP_CONFIG_app_baseUrl}
organization:
name: Your Company
backend:
baseUrl: ${APP_CONFIG_backend_baseUrl}
listen:
port: 7007
host: 0.0.0.0
cors:
origin: ${APP_CONFIG_backend_cors_origin}
methods: [GET, HEAD, PATCH, POST, PUT, DELETE]
credentials: true
database:
client: pg
connection:
host: ${POSTGRES_HOST}
port: ${POSTGRES_PORT}
user: ${POSTGRES_USER}
password: ${POSTGRES_PASSWORD}
database: backstage
ssl:
require: true
rejectUnauthorized: false
cache:
store: memory
csp:
connect-src: ["'self'", 'http:', 'https:']
img-src: ["'self'", 'data:', 'https:']
script-src: ["'self'", "'unsafe-eval'"]
auth:
environment: production
providers:
cognito:
production:
clientId: ${COGNITO_CLIENT_ID}
clientSecret: ${COGNITO_CLIENT_SECRET}
issuer: https://cognito-idp.${AWS_REGION}.amazonaws.com/${COGNITO_USER_POOL_ID}
signIn:
resolvers:
- resolver: emailMatchingUserEntityProfileEmail
integrations:
github:
- host: github.com
token: ${GITHUB_TOKEN}
catalog:
import:
entityFilename: catalog-info.yaml
pullRequestBranchName: backstage-integration
rules:
- allow: [Component, System, API, Resource, Location, Domain, Group, User]
locations:
- type: file
target: ../catalog-info.yaml
- type: url
target: https://github.com/your-org/software-catalog/blob/main/all.yaml
rules:
- allow: [Component, System, API, Resource, Domain]
techdocs:
builder: 'local'
generator:
runIn: 'docker'
publisher:
type: 'awsS3'
awsS3:
bucketName: ${TECHDOCS_BUCKET}
region: ${AWS_REGION}
kubernetes:
serviceLocatorMethod:
type: 'multiTenant'
clusterLocatorMethods:
- type: 'config'
clusters: []
scaffolder:
defaultAuthor:
name: Backstage Scaffolder
email: scaffolder@company.com
defaultCommitMessage: 'Initial commit from Backstage'
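Backstage resolves the ${VAR} references in app-config from the container's environment at load time, which is how the Secrets Manager values injected by ECS reach the config. In essence (a sketch; this version leaves unknown variables in place rather than failing):

```python
import re

def substitute_env(config_text: str, env: dict) -> str:
    """Replace ${VAR} placeholders in a config snippet from an env mapping."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: str(env.get(m.group(1), m.group(0))), config_text)

snippet = "host: ${POSTGRES_HOST}\nport: ${POSTGRES_PORT}"
print(substitute_env(snippet, {"POSTGRES_HOST": "db.internal", "POSTGRES_PORT": 5432}))
# host: db.internal
# port: 5432
```

This is why every ${VAR} in the file must have a matching environment or secrets entry in the ECS task definition.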
Part 9: CI/CD Pipeline
GitHub Actions workflow for deployment:
# .github/workflows/deploy.yml
name: Deploy Backstage
on:
push:
branches: [main]
workflow_dispatch:
inputs:
environment:
description: 'Environment to deploy'
required: true
default: 'staging'
type: choice
options:
- staging
- production
env:
AWS_REGION: eu-west-1
ECR_REPOSITORY: backstage/backstage
permissions:
id-token: write
contents: read
jobs:
build:
name: Build and Push
runs-on: ubuntu-latest
outputs:
image_tag: ${{ steps.meta.outputs.version }}
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
aws-region: ${{ env.AWS_REGION }}
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v2
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}
tags: |
type=sha,prefix=
type=ref,event=branch
type=semver,pattern={{version}}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build and push
uses: docker/build-push-action@v5
with:
context: ./backstage
file: ./backstage/Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
deploy:
name: Deploy to ECS
needs: build
runs-on: ubuntu-latest
environment: ${{ github.event.inputs.environment || 'staging' }}
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
aws-region: ${{ env.AWS_REGION }}
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v2
- name: Download task definition
run: |
aws ecs describe-task-definition \
--task-definition backstage \
--query taskDefinition > task-definition.json
- name: Update task definition
id: task-def
uses: aws-actions/amazon-ecs-render-task-definition@v1
with:
task-definition: task-definition.json
container-name: backstage
image: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ needs.build.outputs.image_tag }}
- name: Deploy to Amazon ECS
uses: aws-actions/amazon-ecs-deploy-task-definition@v1
with:
task-definition: ${{ steps.task-def.outputs.task-definition }}
service: backstage
cluster: backstage-cluster
wait-for-service-stability: true
wait-for-minutes: 10
- name: Notify on failure
if: failure()
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "Backstage deployment failed!",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Backstage Deployment Failed* :x:\nEnvironment: ${{ github.event.inputs.environment || 'staging' }}\nCommit: ${{ github.sha }}"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
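The render-task-definition step only swaps one container's image in the JSON downloaded from ECS; everything else in the task definition is preserved. In essence (the image URI is illustrative):

```python
import copy

def render_task_definition(task_def: dict, container_name: str, image: str) -> dict:
    """Essence of aws-actions/amazon-ecs-render-task-definition: swap one
    container's image, leave everything else untouched."""
    rendered = copy.deepcopy(task_def)
    for container in rendered["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = image
    return rendered

task_def = {"family": "backstage",
            "containerDefinitions": [{"name": "backstage", "image": "old:tag"}]}
new = render_task_definition(
    task_def, "backstage",
    "123456789012.dkr.ecr.eu-west-1.amazonaws.com/backstage/backstage:abc1234")
print(new["containerDefinitions"][0]["image"])
```

Because the pipeline registers a fresh revision on every deploy, the Terraform service has ignore_changes on task_definition so the two don't fight.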
Part 10: Variables and Outputs
# terraform/variables.tf
variable "project_name" {
description = "Project name"
type = string
default = "backstage"
}
variable "environment" {
description = "Environment (development, staging, production)"
type = string
}
variable "aws_region" {
description = "AWS region"
type = string
default = "eu-west-1"
}
variable "vpc_cidr" {
description = "VPC CIDR block"
type = string
default = "10.0.0.0/16"
}
variable "domain_name" {
description = "Domain name for Backstage"
type = string
}
variable "db_name" {
description = "Database name"
type = string
default = "backstage"
}
variable "db_username" {
description = "Database username"
type = string
default = "backstage"
}
variable "db_instance_class" {
description = "RDS instance class"
type = string
default = "db.t3.small"
}
variable "db_allocated_storage" {
description = "RDS allocated storage (GB)"
type = number
default = 20
}
variable "db_max_allocated_storage" {
description = "RDS max allocated storage (GB)"
type = number
default = 100
}
variable "ecs_cpu" {
description = "ECS task CPU units"
type = number
default = 1024
}
variable "ecs_memory" {
description = "ECS task memory (MB)"
type = number
default = 2048
}
variable "ecs_desired_count" {
description = "ECS desired task count"
type = number
default = 2
}
variable "ecs_min_count" {
description = "ECS minimum task count"
type = number
default = 1
}
variable "ecs_max_count" {
description = "ECS maximum task count"
type = number
default = 10
}
variable "backstage_port" {
description = "Backstage container port"
type = number
default = 7007
}
variable "image_tag" {
description = "Docker image tag"
type = string
default = "latest"
}
# terraform/outputs.tf
output "alb_dns_name" {
description = "ALB DNS name"
value = aws_lb.backstage.dns_name
}
output "backstage_url" {
description = "Backstage URL"
value = "https://${var.domain_name}"
}
output "cognito_user_pool_id" {
description = "Cognito User Pool ID"
value = aws_cognito_user_pool.backstage.id
}
output "cognito_domain" {
description = "Cognito domain"
value = "https://${aws_cognito_user_pool_domain.backstage.domain}.auth.${var.aws_region}.amazoncognito.com"
}
output "ecr_repository_url" {
description = "ECR repository URL"
value = aws_ecr_repository.backstage.repository_url
}
output "rds_endpoint" {
description = "RDS endpoint"
value = module.rds.db_instance_endpoint
sensitive = true
}
output "ecs_cluster_name" {
description = "ECS cluster name"
value = aws_ecs_cluster.backstage.name
}
Deployment
# Initialize Terraform
cd terraform
terraform init
# Plan changes
terraform plan -var-file=production.tfvars
# Apply infrastructure
terraform apply -var-file=production.tfvars
# Build and push Docker image
cd ../backstage
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin <account>.dkr.ecr.eu-west-1.amazonaws.com
docker build -t backstage .
docker tag backstage:latest <account>.dkr.ecr.eu-west-1.amazonaws.com/backstage/backstage:latest
docker push <account>.dkr.ecr.eu-west-1.amazonaws.com/backstage/backstage:latest
# Force new deployment
aws ecs update-service --cluster backstage-cluster --service backstage --force-new-deployment
Monitoring and Alerts
# terraform/cloudwatch.tf
resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
alarm_name = "${var.project_name}-cpu-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/ECS"
period = 300
statistic = "Average"
threshold = 85
alarm_description = "ECS CPU utilization is high"
dimensions = {
ClusterName = aws_ecs_cluster.backstage.name
ServiceName = aws_ecs_service.backstage.name
}
alarm_actions = [aws_sns_topic.alerts.arn]
ok_actions = [aws_sns_topic.alerts.arn]
}
resource "aws_cloudwatch_metric_alarm" "rds_cpu_high" {
alarm_name = "${var.project_name}-rds-cpu-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/RDS"
period = 300
statistic = "Average"
threshold = 80
alarm_description = "RDS CPU utilization is high"
dimensions = {
DBInstanceIdentifier = module.rds.db_instance_identifier
}
alarm_actions = [aws_sns_topic.alerts.arn]
}
resource "aws_cloudwatch_metric_alarm" "alb_5xx_errors" {
alarm_name = "${var.project_name}-5xx-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "HTTPCode_ELB_5XX_Count"
namespace = "AWS/ApplicationELB"
period = 300
statistic = "Sum"
threshold = 10
alarm_description = "ALB 5xx errors are high"
dimensions = {
LoadBalancer = aws_lb.backstage.arn_suffix
}
alarm_actions = [aws_sns_topic.alerts.arn]
}
resource "aws_sns_topic" "alerts" {
name = "${var.project_name}-alerts"
}
Cost Estimation
Monthly cost estimate for production:
RESOURCE         SIZE              COST/MONTH (USD)
========         ====              ================
ECS Fargate      2x (2 vCPU, 4GB)  ~$140
RDS PostgreSQL   db.r6g.large      ~$180
NAT Gateway      1x                ~$45
ALB              1x                ~$25
Route 53         1 zone            ~$0.50
Secrets Manager  2 secrets         ~$1
CloudWatch       Logs + metrics    ~$20
ECR              ~5GB storage      ~$0.50
------------------------------------------------
TOTAL                              ~$412/month
For non-production, use Fargate Spot and smaller instances:
RESOURCE          SIZE              COST/MONTH (USD)
========          ====              ================
ECS Fargate Spot  1x (1 vCPU, 2GB)  ~$25
RDS PostgreSQL    db.t3.micro       ~$15
NAT Gateway       1x                ~$45
ALB               1x                ~$25
------------------------------------------------
TOTAL                               ~$110/month
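The totals are plain sums of ballpark list prices (the line-item names below are mine); worth re-running against current pricing for your region before budgeting:

```python
production = {
    "ECS Fargate (2x 2 vCPU, 4GB)": 140.0,
    "RDS db.r6g.large": 180.0,
    "NAT Gateway": 45.0,
    "ALB": 25.0,
    "Route 53": 0.50,
    "Secrets Manager": 1.0,
    "CloudWatch": 20.0,
    "ECR": 0.50,
}
non_production = {
    "ECS Fargate Spot (1x 1 vCPU, 2GB)": 25.0,
    "RDS db.t3.micro": 15.0,
    "NAT Gateway": 45.0,
    "ALB": 25.0,
}
print(f"production: ~${sum(production.values()):.0f}/month")
print(f"non-production: ~${sum(non_production.values()):.0f}/month")
```

The NAT Gateway is the stubborn fixed cost in the non-production bill; replacing it with VPC endpoints for ECR, Secrets Manager, and CloudWatch can shrink it if traffic patterns allow.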
Troubleshooting
ECS task failing to start:
# Check task stopped reason
aws ecs describe-tasks --cluster backstage-cluster --tasks <task-id>
# Check CloudWatch logs
aws logs tail /ecs/backstage --follow
Database connection issues:
# Test from bastion/local
psql -h <rds-endpoint> -U backstage -d backstage
# Check security groups allow traffic
aws ec2 describe-security-groups --group-ids <sg-id>
Cognito authentication failing:
# Verify callback URLs match exactly
aws cognito-idp describe-user-pool-client \
--user-pool-id <pool-id> \
--client-id <client-id>
Health check failing:
# Test health endpoint locally
curl http://localhost:7007/healthcheck
# Check ALB target health
aws elbv2 describe-target-health --target-group-arn <tg-arn>
References
- Backstage Documentation: https://backstage.io/docs
- AWS ECS Best Practices: https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide
- Terraform AWS Modules: https://registry.terraform.io/namespaces/terraform-aws-modules
- Cognito Developer Guide: https://docs.aws.amazon.com/cognito/latest/developerguide