
Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

Tags: AWS, Platform Engineering


Backstage is an open-source developer portal framework created at Spotify, now a CNCF project. It unifies your services, infrastructure tooling, and documentation behind a single interface. This guide covers deploying Backstage to AWS ECS Fargate, with RDS PostgreSQL for persistence and Cognito for authentication.

                        ┌─────────────────────────────────────────┐
                        │              AWS Cloud                  │
                        │                                         │
    Users ──────────────┤──► ALB ──► ECS Fargate (Backstage)     │
         │              │              │           │              │
         │              │              ▼           ▼              │
         │              │           Cognito    RDS PostgreSQL     │
         │              │              │                          │
         └──────────────┤──────────────┘                          │
           (OAuth2)     │                                         │
                        └─────────────────────────────────────────┘

TL;DR

Code Repository: All code from this post is available at github.com/moabukar/blog-code/backstage-aws-ecs-production

  • Backstage on ECS Fargate (serverless containers)
  • PostgreSQL RDS for catalog and application state storage
  • Cognito User Pool for authentication
  • ALB with HTTPS termination
  • Secrets Manager for credentials
  • Terraform for infrastructure
  • GitHub Actions for CI/CD

Architecture Overview

COMPONENT              SERVICE                 PURPOSE
=========              =======                 =======
Compute                ECS Fargate             Serverless container hosting
Database               RDS PostgreSQL          Catalog storage, app state
Auth                   Cognito User Pool       OAuth2/OIDC authentication
Load Balancer          Application LB          HTTPS termination, routing
DNS                    Route 53                Custom domain
Secrets                Secrets Manager         DB creds, API keys
Container Registry     ECR                     Backstage Docker images
Networking             VPC                     Private subnets, NAT Gateway
Monitoring             CloudWatch              Logs, metrics, alarms

Project Structure

backstage-aws/
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── vpc.tf
│   ├── rds.tf
│   ├── ecs.tf
│   ├── alb.tf
│   ├── cognito.tf
│   ├── ecr.tf
│   ├── secrets.tf
│   ├── iam.tf
│   └── cloudwatch.tf
├── backstage/
│   ├── app-config.yaml
│   ├── app-config.production.yaml
│   ├── Dockerfile
│   ├── packages/
│   │   ├── app/
│   │   └── backend/
│   └── package.json
├── .github/
│   └── workflows/
│       └── deploy.yml
└── README.md

Prerequisites

TOOL                   VERSION         INSTALLATION
====                   =======         ============
Terraform              >= 1.5          brew install terraform
AWS CLI                >= 2.0          brew install awscli
Node.js                >= 18           brew install node@18
Docker                 >= 24           Docker Desktop

AWS account with permissions for:

  • ECS, ECR, RDS, ALB, Cognito, Secrets Manager, VPC, Route 53, CloudWatch

Part 1: VPC and Networking

First, create the network foundation:

# terraform/vpc.tf

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.2"

  name = "${var.project_name}-vpc"
  cidr = var.vpc_cidr

  azs              = local.azs
  private_subnets  = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 4, i)]
  public_subnets   = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 4, i + 4)]
  database_subnets = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 4, i + 8)]

  enable_nat_gateway     = true
  single_nat_gateway     = var.environment != "production"
  enable_dns_hostnames   = true
  enable_dns_support     = true

  create_database_subnet_group = true

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

# Security Groups
resource "aws_security_group" "alb" {
  name        = "${var.project_name}-alb-sg"
  description = "Security group for ALB"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP redirect"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-alb-sg"
  }
}

resource "aws_security_group" "ecs" {
  name        = "${var.project_name}-ecs-sg"
  description = "Security group for ECS tasks"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "Traffic from ALB"
    from_port       = var.backstage_port
    to_port         = var.backstage_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-ecs-sg"
  }
}

resource "aws_security_group" "rds" {
  name        = "${var.project_name}-rds-sg"
  description = "Security group for RDS"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "PostgreSQL from ECS"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs.id]
  }

  tags = {
    Name = "${var.project_name}-rds-sg"
  }
}
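
The three `cidrsubnet` expressions above carve the /16 into /20 blocks (newbits = 4), with netnum offsets of 0, 4, and 8 so the subnet tiers never overlap. A quick way to preview the resulting CIDRs locally, using Python's `ipaddress` module as a stand-in for Terraform's `cidrsubnet`:

```python
import ipaddress

# cidrsubnet(var.vpc_cidr, 4, n) == the nth /20 inside the /16
vpc = ipaddress.ip_network("10.0.0.0/16")
blocks = list(vpc.subnets(prefixlen_diff=4))  # sixteen /20 subnets

private  = [str(blocks[i]) for i in range(3)]      # netnums 0-2
public   = [str(blocks[i + 4]) for i in range(3)]  # netnums 4-6
database = [str(blocks[i + 8]) for i in range(3)]  # netnums 8-10

print(private)   # ['10.0.0.0/20', '10.0.16.0/20', '10.0.32.0/20']
print(public)    # ['10.0.64.0/20', '10.0.80.0/20', '10.0.96.0/20']
print(database)  # ['10.0.128.0/20', '10.0.144.0/20', '10.0.160.0/20']
```

Each /20 holds 4,091 usable addresses, which leaves plenty of headroom for Fargate tasks and future workloads.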

Part 2: RDS PostgreSQL

Backstage requires PostgreSQL for the catalog database:

# terraform/rds.tf

resource "random_password" "db_password" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "db_credentials" {
  name                    = "${var.project_name}/db-credentials"
  recovery_window_in_days = 7

  tags = {
    Name = "${var.project_name}-db-credentials"
  }
}

resource "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id = aws_secretsmanager_secret.db_credentials.id
  secret_string = jsonencode({
    username = var.db_username
    password = random_password.db_password.result
    host     = module.rds.db_instance_address
    port     = 5432
    database = var.db_name
  })
}

module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "6.1.1"

  identifier = "${var.project_name}-postgres"

  engine               = "postgres"
  engine_version       = "15.4"
  family               = "postgres15"
  major_engine_version = "15"
  instance_class       = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage

  db_name  = var.db_name
  username = var.db_username
  password = random_password.db_password.result
  port     = 5432

  multi_az               = var.environment == "production"
  db_subnet_group_name   = module.vpc.database_subnet_group_name
  vpc_security_group_ids = [aws_security_group.rds.id]

  maintenance_window      = "Mon:00:00-Mon:03:00"
  backup_window           = "03:00-06:00"
  backup_retention_period = var.environment == "production" ? 30 : 7

  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  performance_insights_enabled          = true
  performance_insights_retention_period = 7

  deletion_protection = var.environment == "production"
  skip_final_snapshot = var.environment != "production"

  parameters = [
    {
      name  = "log_statement"
      value = "all"
    },
    {
      name  = "log_min_duration_statement"
      value = "1000"
    }
  ]

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

Database sizing guidelines:

ENVIRONMENT     INSTANCE CLASS      STORAGE     MULTI-AZ
===========     ==============      =======     ========
Development     db.t3.micro         20 GB       No
Staging         db.t3.small         50 GB       No
Production      db.r6g.large        100 GB      Yes
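
The secret created above stores a single JSON document, from which ECS later extracts individual keys. For reference, this sketch shows how a client would assemble a PostgreSQL connection string from that payload (sample values only, not real credentials):

```python
import json

# Sample payload mirroring the jsonencode() structure in rds.tf
raw = json.dumps({
    "username": "backstage",
    "password": "example-only",
    "host": "backstage-postgres.example.eu-west-1.rds.amazonaws.com",
    "port": 5432,
    "database": "backstage",
})

creds = json.loads(raw)
dsn = "postgresql://{username}:{password}@{host}:{port}/{database}".format(**creds)
print(dsn)
```

Backstage itself consumes the individual fields (host, port, user, password) rather than a DSN, but the same payload serves both styles.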

Part 3: Cognito Authentication

Set up Cognito User Pool for OAuth2/OIDC authentication:

# terraform/cognito.tf

resource "aws_cognito_user_pool" "backstage" {
  name = "${var.project_name}-users"

  # Password policy
  password_policy {
    minimum_length                   = 12
    require_lowercase                = true
    require_numbers                  = true
    require_symbols                  = true
    require_uppercase                = true
    temporary_password_validity_days = 7
  }

  # MFA configuration
  mfa_configuration = var.environment == "production" ? "ON" : "OPTIONAL"

  software_token_mfa_configuration {
    enabled = true
  }

  # Account recovery
  account_recovery_setting {
    recovery_mechanism {
      name     = "verified_email"
      priority = 1
    }
  }

  # Email configuration
  email_configuration {
    email_sending_account = "COGNITO_DEFAULT"
  }

  # Schema attributes
  schema {
    name                = "email"
    attribute_data_type = "String"
    required            = true
    mutable             = true

    string_attribute_constraints {
      min_length = 1
      max_length = 256
    }
  }

  schema {
    name                = "name"
    attribute_data_type = "String"
    required            = true
    mutable             = true

    string_attribute_constraints {
      min_length = 1
      max_length = 256
    }
  }

  # Auto-verified attributes
  auto_verified_attributes = ["email"]

  # User pool add-ons (advanced security requires Cognito's advanced security pricing tier)
  user_pool_add_ons {
    advanced_security_mode = "ENFORCED"
  }

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_cognito_user_pool_domain" "backstage" {
  domain       = "${var.project_name}-${var.environment}"
  user_pool_id = aws_cognito_user_pool.backstage.id
}

resource "aws_cognito_user_pool_client" "backstage" {
  name         = "${var.project_name}-client"
  user_pool_id = aws_cognito_user_pool.backstage.id

  generate_secret = true

  # OAuth configuration
  allowed_oauth_flows                  = ["code"]
  allowed_oauth_flows_user_pool_client = true
  allowed_oauth_scopes                 = ["email", "openid", "profile"]

  callback_urls = [
    "https://${var.domain_name}/api/auth/aws-alb-oidc/handler/frame",
    "https://${var.domain_name}/api/auth/cognito/handler/frame"
  ]

  logout_urls = [
    "https://${var.domain_name}"
  ]

  supported_identity_providers = ["COGNITO"]

  # Token validity
  access_token_validity  = 1   # hours
  id_token_validity      = 1   # hours
  refresh_token_validity = 30  # days

  token_validity_units {
    access_token  = "hours"
    id_token      = "hours"
    refresh_token = "days"
  }

  # Prevent user existence errors
  prevent_user_existence_errors = "ENABLED"

  explicit_auth_flows = [
    "ALLOW_REFRESH_TOKEN_AUTH",
    "ALLOW_USER_SRP_AUTH"
  ]
}

# Store client secret in Secrets Manager
resource "aws_secretsmanager_secret" "cognito_client_secret" {
  name                    = "${var.project_name}/cognito-client-secret"
  recovery_window_in_days = 7
}

resource "aws_secretsmanager_secret_version" "cognito_client_secret" {
  secret_id = aws_secretsmanager_secret.cognito_client_secret.id
  secret_string = jsonencode({
    client_id     = aws_cognito_user_pool_client.backstage.id
    client_secret = aws_cognito_user_pool_client.backstage.client_secret
    user_pool_id  = aws_cognito_user_pool.backstage.id
    domain        = aws_cognito_user_pool_domain.backstage.domain
    region        = var.aws_region
  })
}

# Create admin group
resource "aws_cognito_user_group" "admins" {
  name         = "admins"
  user_pool_id = aws_cognito_user_pool.backstage.id
  description  = "Backstage administrators"
}

resource "aws_cognito_user_group" "developers" {
  name         = "developers"
  user_pool_id = aws_cognito_user_pool.backstage.id
  description  = "Backstage developers"
}
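
Backstage's auth configuration (Part 8) needs the pool's OIDC issuer URL, which Cognito derives from the region and user pool ID. A sketch of the URLs involved, using a placeholder pool ID (the real value comes from Terraform outputs):

```python
region = "eu-west-1"                # var.aws_region
user_pool_id = "eu-west-1_EXAMPLE"  # placeholder; real value is a Terraform output

issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
discovery = f"{issuer}/.well-known/openid-configuration"

print(issuer)
print(discovery)
```

The discovery endpoint publishes the authorization, token, and JWKS URLs, so any OIDC client only needs the issuer to bootstrap itself.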

Part 4: ECR and Docker Image

Create the container registry and Backstage Docker image:

# terraform/ecr.tf

resource "aws_ecr_repository" "backstage" {
  name                 = "${var.project_name}/backstage"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_ecr_lifecycle_policy" "backstage" {
  repository = aws_ecr_repository.backstage.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep last 10 images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["v"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 2
        description  = "Remove untagged images older than 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}
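
Rule 1 above keeps only the ten newest images whose tags start with `v`. A toy model of that selection logic, with hypothetical tags and push order:

```python
# Hypothetical image list: 13 "v"-tagged images, pushed in order
images = [{"tag": f"v{i}", "pushed_at": i} for i in range(1, 14)]

# Rule 1: imageCountMoreThan 10 -> expire everything beyond the 10 newest
newest_first = sorted(images, key=lambda img: img["pushed_at"], reverse=True)
keep, expire = newest_first[:10], newest_first[10:]

print([img["tag"] for img in expire])  # ['v3', 'v2', 'v1']
```

Untagged layers from superseded builds fall under rule 2 instead, which ages them out after seven days.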

Backstage Dockerfile:

# backstage/Dockerfile

# Stage 1: Build
FROM node:18-bookworm-slim AS build

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    g++ \
    make \
    git \
    && rm -rf /var/lib/apt/lists/*

# Copy package files
COPY package.json yarn.lock ./
COPY packages/app/package.json ./packages/app/
COPY packages/backend/package.json ./packages/backend/

# Install dependencies
RUN yarn install --frozen-lockfile --network-timeout 600000

# Copy source
COPY . .

# Build backend
RUN yarn build:backend

# Stage 2: Production
FROM node:18-bookworm-slim AS production

WORKDIR /app

# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    g++ \
    make \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy built backend
COPY --from=build /app/packages/backend/dist ./packages/backend/dist
COPY --from=build /app/yarn.lock ./
COPY --from=build /app/package.json ./

# Copy backend package.json
COPY --from=build /app/packages/backend/package.json ./packages/backend/

# Install production dependencies
RUN yarn install --frozen-lockfile --production --network-timeout 600000

# Copy app-config
COPY app-config.yaml app-config.production.yaml ./

# Set environment
ENV NODE_ENV=production

# Create non-root user
RUN groupadd -r backstage && useradd -r -g backstage backstage
RUN chown -R backstage:backstage /app
USER backstage

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:7007/healthcheck || exit 1

EXPOSE 7007

CMD ["node", "packages/backend", "--config", "app-config.yaml", "--config", "app-config.production.yaml"]

Part 5: ECS Fargate Deployment

Deploy Backstage to ECS Fargate:

# terraform/ecs.tf

resource "aws_ecs_cluster" "backstage" {
  name = "${var.project_name}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_ecs_cluster_capacity_providers" "backstage" {
  cluster_name = aws_ecs_cluster.backstage.name

  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    base              = 1
    weight            = 100
    capacity_provider = var.environment == "production" ? "FARGATE" : "FARGATE_SPOT"
  }
}

resource "aws_cloudwatch_log_group" "backstage" {
  name              = "/ecs/${var.project_name}"
  retention_in_days = var.environment == "production" ? 90 : 30

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_ecs_task_definition" "backstage" {
  family                   = var.project_name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.ecs_cpu
  memory                   = var.ecs_memory
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name  = "backstage"
      image = "${aws_ecr_repository.backstage.repository_url}:${var.image_tag}"

      essential = true

      portMappings = [
        {
          containerPort = var.backstage_port
          protocol      = "tcp"
        }
      ]

      environment = [
        {
          name  = "NODE_ENV"
          value = "production"
        },
        {
          name  = "APP_CONFIG_app_baseUrl"
          value = "https://${var.domain_name}"
        },
        {
          name  = "APP_CONFIG_backend_baseUrl"
          value = "https://${var.domain_name}"
        },
        {
          name  = "APP_CONFIG_backend_cors_origin"
          value = "https://${var.domain_name}"
        },
        {
          name  = "AWS_REGION"
          value = var.aws_region
        }
      ]

      secrets = [
        {
          name      = "POSTGRES_HOST"
          valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:host::"
        },
        {
          name      = "POSTGRES_PORT"
          valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:port::"
        },
        {
          name      = "POSTGRES_USER"
          valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:username::"
        },
        {
          name      = "POSTGRES_PASSWORD"
          valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:password::"
        },
        {
          name      = "COGNITO_CLIENT_ID"
          valueFrom = "${aws_secretsmanager_secret.cognito_client_secret.arn}:client_id::"
        },
        {
          name      = "COGNITO_CLIENT_SECRET"
          valueFrom = "${aws_secretsmanager_secret.cognito_client_secret.arn}:client_secret::"
        },
        {
          name      = "COGNITO_USER_POOL_ID"
          valueFrom = "${aws_secretsmanager_secret.cognito_client_secret.arn}:user_pool_id::"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.backstage.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "backstage"
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:${var.backstage_port}/healthcheck || exit 1"]
        interval    = 30
        timeout     = 10
        retries     = 3
        startPeriod = 60
      }
    }
  ])

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_ecs_service" "backstage" {
  name                               = var.project_name
  cluster                            = aws_ecs_cluster.backstage.id
  task_definition                    = aws_ecs_task_definition.backstage.arn
  desired_count                      = var.ecs_desired_count
  deployment_minimum_healthy_percent = 50
  deployment_maximum_percent         = 200
  launch_type                        = "FARGATE"
  platform_version                   = "LATEST"
  health_check_grace_period_seconds  = 120

  network_configuration {
    security_groups  = [aws_security_group.ecs.id]
    subnets          = module.vpc.private_subnets
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.backstage.arn
    container_name   = "backstage"
    container_port   = var.backstage_port
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  lifecycle {
    ignore_changes = [task_definition, desired_count]
  }

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

# Auto-scaling
resource "aws_appautoscaling_target" "backstage" {
  max_capacity       = var.ecs_max_count
  min_capacity       = var.ecs_min_count
  resource_id        = "service/${aws_ecs_cluster.backstage.name}/${aws_ecs_service.backstage.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "${var.project_name}-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.backstage.resource_id
  scalable_dimension = aws_appautoscaling_target.backstage.scalable_dimension
  service_namespace  = aws_appautoscaling_target.backstage.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

resource "aws_appautoscaling_policy" "memory" {
  name               = "${var.project_name}-memory-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.backstage.resource_id
  scalable_dimension = aws_appautoscaling_target.backstage.scalable_dimension
  service_namespace  = aws_appautoscaling_target.backstage.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    target_value       = 80.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

ECS sizing guidelines:

ENVIRONMENT     CPU         MEMORY      DESIRED     MIN     MAX
===========     ===         ======      =======     ===     ===
Development     512         1024        1           1       2
Staging         1024        2048        2           1       4
Production      2048        4096        3           2       10
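
Fargate only accepts specific CPU-to-memory pairings, and the rows above were chosen to match them. A quick sanity check against a subset of the valid combinations (CPU in CPU units, memory in MiB):

```python
# Subset of valid Fargate CPU -> memory (MiB) combinations
valid_memory = {
    512:  [1024, 2048, 3072, 4096],
    1024: list(range(2048, 8193, 1024)),
    2048: list(range(4096, 16385, 1024)),
}

sizing = {
    "development": (512, 1024),
    "staging":     (1024, 2048),
    "production":  (2048, 4096),
}

for env, (cpu, mem) in sizing.items():
    assert mem in valid_memory[cpu], f"{env}: {cpu}/{mem} is not a valid Fargate pairing"

print("all sizing rows are valid Fargate combinations")
```

An invalid pairing is rejected at task registration time, so it pays to validate before `terraform apply`.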

Part 6: Application Load Balancer

# terraform/alb.tf

resource "aws_lb" "backstage" {
  name               = "${var.project_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnets

  enable_deletion_protection = var.environment == "production"

  access_logs {
    # Assumes an aws_s3_bucket.alb_logs bucket with the ALB log-delivery policy (defined elsewhere)
    bucket  = aws_s3_bucket.alb_logs.id
    prefix  = "alb"
    enabled = true
  }

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_lb_target_group" "backstage" {
  name        = "${var.project_name}-tg"
  port        = var.backstage_port
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 30
    matcher             = "200"
    path                = "/healthcheck"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 10
  }

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 86400
    enabled         = true
  }

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

# HTTPS listener
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.backstage.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.backstage.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backstage.arn
  }
}

# HTTP to HTTPS redirect
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.backstage.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

# ACM Certificate
resource "aws_acm_certificate" "backstage" {
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.backstage.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = data.aws_route53_zone.main.zone_id
}

resource "aws_acm_certificate_validation" "backstage" {
  certificate_arn         = aws_acm_certificate.backstage.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

# Route 53 record
resource "aws_route53_record" "backstage" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_lb.backstage.dns_name
    zone_id                = aws_lb.backstage.zone_id
    evaluate_target_health = true
  }
}
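
The timing values above interact with the ECS service settings from Part 5: a new task needs `healthy_threshold x interval` = 60 s of passing checks before the ALB routes to it, and the service's 120 s `health_check_grace_period_seconds` is sized to cover the container's 60 s start period plus that window. The arithmetic:

```python
interval = 30                 # ALB health_check interval (s)
healthy_threshold = 2
unhealthy_threshold = 3
container_start_period = 60   # startPeriod in the task definition
grace_period = 120            # health_check_grace_period_seconds on the service

time_to_healthy = interval * healthy_threshold      # passing checks before traffic
time_to_unhealthy = interval * unhealthy_threshold  # failures before deregistration

# The grace period should cover startup plus the first healthy window,
# otherwise ECS may kill slow-starting tasks before they ever pass a check
assert grace_period >= container_start_period + time_to_healthy

print(time_to_healthy, time_to_unhealthy)  # 60 90
```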

Part 7: IAM Roles

# terraform/iam.tf

# ECS Execution Role (for pulling images, writing logs)
resource "aws_iam_role" "ecs_execution" {
  name = "${var.project_name}-ecs-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_role_policy" "ecs_execution_secrets" {
  name = "${var.project_name}-secrets-access"
  role = aws_iam_role.ecs_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "secretsmanager:GetSecretValue"
        ]
        Resource = [
          aws_secretsmanager_secret.db_credentials.arn,
          aws_secretsmanager_secret.cognito_client_secret.arn
        ]
      }
    ]
  })
}

# ECS Task Role (for application permissions)
resource "aws_iam_role" "ecs_task" {
  name = "${var.project_name}-ecs-task"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}

# Add permissions for Backstage integrations
resource "aws_iam_role_policy" "ecs_task_permissions" {
  name = "${var.project_name}-task-permissions"
  role = aws_iam_role.ecs_task.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:ListBucket"
        ]
        Resource = [
          "arn:aws:s3:::${var.project_name}-*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "ssm:GetParameter",
          "ssm:GetParameters",
          "ssm:GetParametersByPath"
        ]
        Resource = [
          "arn:aws:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter/${var.project_name}/*"
        ]
      }
    ]
  })
}

Part 8: Backstage Configuration

Production app-config for Backstage:

# backstage/app-config.production.yaml

app:
  title: Backstage Developer Portal
  baseUrl: ${APP_CONFIG_app_baseUrl}

organization:
  name: Your Company

backend:
  baseUrl: ${APP_CONFIG_backend_baseUrl}
  listen:
    port: 7007
    host: 0.0.0.0
  cors:
    origin: ${APP_CONFIG_backend_cors_origin}
    methods: [GET, HEAD, PATCH, POST, PUT, DELETE]
    credentials: true

  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
      database: backstage
      ssl:
        require: true
        # Skips CA verification; for stricter setups, pin the RDS CA bundle instead
        rejectUnauthorized: false

  cache:
    store: memory

  csp:
    connect-src: ["'self'", 'http:', 'https:']
    img-src: ["'self'", 'data:', 'https:']
    script-src: ["'self'", "'unsafe-eval'"]

auth:
  environment: production
  providers:
    cognito:
      production:
        clientId: ${COGNITO_CLIENT_ID}
        clientSecret: ${COGNITO_CLIENT_SECRET}
        issuer: https://cognito-idp.${AWS_REGION}.amazonaws.com/${COGNITO_USER_POOL_ID}
        signIn:
          resolvers:
            - resolver: emailMatchingUserEntityProfileEmail

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

catalog:
  import:
    entityFilename: catalog-info.yaml
    pullRequestBranchName: backstage-integration
  rules:
    - allow: [Component, System, API, Resource, Location, Domain, Group, User]
  locations:
    - type: file
      target: ../catalog-info.yaml
    - type: url
      target: https://github.com/your-org/software-catalog/blob/main/all.yaml
      rules:
        - allow: [Component, System, API, Resource, Domain]

techdocs:
  builder: 'local'
  generator:
    runIn: 'docker'
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: ${TECHDOCS_BUCKET}
      region: ${AWS_REGION}

kubernetes:
  serviceLocatorMethod:
    type: 'multiTenant'
  clusterLocatorMethods:
    - type: 'config'
      clusters: []

scaffolder:
  defaultAuthor:
    name: Backstage Scaffolder
    email: scaffolder@company.com
  defaultCommitMessage: 'Initial commit from Backstage'

Part 9: CI/CD Pipeline

GitHub Actions workflow for deployment:

# .github/workflows/deploy.yml

name: Deploy Backstage

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy'
        required: true
        default: 'staging'
        type: choice
        options:
          - staging
          - production

env:
  AWS_REGION: eu-west-1
  ECR_REPOSITORY: backstage/backstage

permissions:
  id-token: write
  contents: read

jobs:
  build:
    name: Build and Push
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.version }}

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch
            type=semver,pattern={{version}}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: ./backstage
          file: ./backstage/Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    name: Deploy to ECS
    needs: build
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment || 'staging' }}

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Download task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition backstage \
            --query taskDefinition > task-definition.json

      - name: Update task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: backstage
          image: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ needs.build.outputs.image_tag }}

      - name: Deploy to Amazon ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: backstage
          cluster: backstage-cluster
          wait-for-service-stability: true
          wait-for-minutes: 10

      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Backstage deployment failed!",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Backstage Deployment Failed* :x:\nEnvironment: ${{ github.event.inputs.environment || 'staging' }}\nCommit: ${{ github.sha }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Part 10: Variables and Outputs

# terraform/variables.tf

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "backstage"
}

variable "environment" {
  description = "Environment (development, staging, production)"
  type        = string
}

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "eu-west-1"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "domain_name" {
  description = "Domain name for Backstage"
  type        = string
}

variable "db_name" {
  description = "Database name"
  type        = string
  default     = "backstage"
}

variable "db_username" {
  description = "Database username"
  type        = string
  default     = "backstage"
}

variable "db_instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.small"
}

variable "db_allocated_storage" {
  description = "RDS allocated storage (GB)"
  type        = number
  default     = 20
}

variable "db_max_allocated_storage" {
  description = "RDS max allocated storage (GB)"
  type        = number
  default     = 100
}

variable "ecs_cpu" {
  description = "ECS task CPU units"
  type        = number
  default     = 1024
}

variable "ecs_memory" {
  description = "ECS task memory (MB)"
  type        = number
  default     = 2048
}

variable "ecs_desired_count" {
  description = "ECS desired task count"
  type        = number
  default     = 2
}

variable "ecs_min_count" {
  description = "ECS minimum task count"
  type        = number
  default     = 1
}

variable "ecs_max_count" {
  description = "ECS maximum task count"
  type        = number
  default     = 10
}

variable "backstage_port" {
  description = "Backstage container port"
  type        = number
  default     = 7007
}

variable "image_tag" {
  description = "Docker image tag"
  type        = string
  default     = "latest"
}

# terraform/outputs.tf

output "alb_dns_name" {
  description = "ALB DNS name"
  value       = aws_lb.backstage.dns_name
}

output "backstage_url" {
  description = "Backstage URL"
  value       = "https://${var.domain_name}"
}

output "cognito_user_pool_id" {
  description = "Cognito User Pool ID"
  value       = aws_cognito_user_pool.backstage.id
}

output "cognito_domain" {
  description = "Cognito domain"
  value       = "https://${aws_cognito_user_pool_domain.backstage.domain}.auth.${var.aws_region}.amazoncognito.com"
}

output "ecr_repository_url" {
  description = "ECR repository URL"
  value       = aws_ecr_repository.backstage.repository_url
}

output "rds_endpoint" {
  description = "RDS endpoint"
  value       = module.rds.db_instance_endpoint
  sensitive   = true
}

output "ecs_cluster_name" {
  description = "ECS cluster name"
  value       = aws_ecs_cluster.backstage.name
}
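
The plan and apply commands in the next section pass a per-environment tfvars file. A minimal production.tfvars sketch (the domain is a placeholder, and the instance sizes mirror the production cost estimate further down; adjust for your account):

```hcl
# terraform/production.tfvars (illustrative values)

environment       = "production"
aws_region        = "eu-west-1"
domain_name       = "backstage.example.com" # placeholder domain

db_instance_class = "db.r6g.large"
ecs_cpu           = 2048 # 2 vCPU
ecs_memory        = 4096 # 4 GB
ecs_desired_count = 2
```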

Deployment

# Initialize Terraform
cd terraform
terraform init

# Plan changes
terraform plan -var-file=production.tfvars

# Apply infrastructure
terraform apply -var-file=production.tfvars

# Build and push Docker image
cd ../backstage
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin <account>.dkr.ecr.eu-west-1.amazonaws.com
docker build -t backstage .
docker tag backstage:latest <account>.dkr.ecr.eu-west-1.amazonaws.com/backstage/backstage:latest
docker push <account>.dkr.ecr.eu-west-1.amazonaws.com/backstage/backstage:latest

# Force new deployment
aws ecs update-service --cluster backstage-cluster --service backstage --force-new-deployment
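
After forcing a new deployment, it's worth confirming the rollout actually converged. A short check using the cluster and service names from this guide (a sketch; assumes the AWS CLI is configured with credentials for the target account):

```shell
# Block until the deployment settles, then compare desired vs running tasks
aws ecs wait services-stable \
  --cluster backstage-cluster \
  --services backstage

aws ecs describe-services \
  --cluster backstage-cluster \
  --services backstage \
  --query 'services[0].{desired:desiredCount,running:runningCount,status:status}' \
  --output table
```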

Monitoring and Alerts

# terraform/cloudwatch.tf

resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
  alarm_name          = "${var.project_name}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 300
  statistic           = "Average"
  threshold           = 85
  alarm_description   = "ECS CPU utilization is high"

  dimensions = {
    ClusterName = aws_ecs_cluster.backstage.name
    ServiceName = aws_ecs_service.backstage.name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
  ok_actions    = [aws_sns_topic.alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "rds_cpu_high" {
  alarm_name          = "${var.project_name}-rds-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "RDS CPU utilization is high"

  dimensions = {
    DBInstanceIdentifier = module.rds.db_instance_identifier
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "alb_5xx_errors" {
  alarm_name          = "${var.project_name}-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_ELB_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 300
  statistic           = "Sum"
  threshold           = 10
  alarm_description   = "ALB 5xx errors are high"

  dimensions = {
    LoadBalancer = aws_lb.backstage.arn_suffix
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

resource "aws_sns_topic" "alerts" {
  name = "${var.project_name}-alerts"
}
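
The alerts topic above has no subscribers, so the alarms would fire silently. A minimal email subscription sketch (the address is a placeholder; email subscriptions also require manual confirmation via the link AWS sends):

```hcl
# terraform/cloudwatch.tf (continued)

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "platform-team@example.com" # placeholder address
}
```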

Cost Estimation

Monthly cost estimate for production:

RESOURCE                    SIZE                COST/MONTH (USD)
========                    ====                ================
ECS Fargate                 2x (2 vCPU, 4GB)    ~$140
RDS PostgreSQL              db.r6g.large        ~$180
NAT Gateway                 1x                  ~$45
ALB                         1x                  ~$25
Route 53                    1 zone              ~$0.50
Secrets Manager             2 secrets           ~$1
CloudWatch                  Logs + metrics      ~$20
ECR                         ~5GB storage        ~$0.50
------------------------------------------------
TOTAL                                           ~$412/month
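
The Fargate line can be sanity-checked from the two pricing dimensions (per-vCPU-hour and per-GB-hour). A quick calculation using approximate eu-west-1 on-demand rates (assumed figures, not quotes; check the current AWS pricing page):

```shell
# Approximate monthly Fargate cost for the production sizing (2 tasks, 2 vCPU, 4 GB)
tasks=2; vcpu=2; mem_gb=4; hours=730
vcpu_rate=0.04048   # USD per vCPU-hour (assumed eu-west-1 rate)
gb_rate=0.004445    # USD per GB-hour (assumed eu-west-1 rate)

awk -v t="$tasks" -v v="$vcpu" -v m="$mem_gb" -v h="$hours" \
    -v vr="$vcpu_rate" -v gr="$gb_rate" \
    'BEGIN { printf "~$%.0f/month\n", t * (v * vr + m * gr) * h }'
# prints ~$144/month, in line with the ~$140 estimate
```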

For non-production, use Fargate Spot and smaller instances:

RESOURCE                    SIZE                COST/MONTH (USD)
========                    ====                ================
ECS Fargate Spot            1x (1 vCPU, 2GB)    ~$25
RDS PostgreSQL              db.t3.micro         ~$15
NAT Gateway                 1x                  ~$45
ALB                         1x                  ~$25
------------------------------------------------
TOTAL                                           ~$110/month

Troubleshooting

ECS task failing to start:

# Check task stopped reason
aws ecs describe-tasks --cluster backstage-cluster --tasks <task-id>

# Check CloudWatch logs
aws logs tail /ecs/backstage --follow

Database connection issues:

# Test from bastion/local
psql -h <rds-endpoint> -U backstage -d backstage

# Check security groups allow traffic
aws ec2 describe-security-groups --group-ids <sg-id>

Cognito authentication failing:

# Verify callback URLs match exactly
aws cognito-idp describe-user-pool-client \
  --user-pool-id <pool-id> \
  --client-id <client-id>

Health check failing:

# Test health endpoint locally
curl http://localhost:7007/healthcheck

# Check ALB target health
aws elbv2 describe-target-health --target-group-arn <tg-arn>

Backstage on AWS ECS: production-ready, scalable, and secure.
