ECS Task Sets: Blue/Green Deployments Without CodeDeploy

ECS has a feature that most engineers never touch: Task Sets. They let you run multiple versions of a service simultaneously with fine-grained traffic control – essentially giving you blue/green or canary deployments without CodeDeploy.

I explored this at a previous company when we wanted more control over deployment rollouts than the standard ECS rolling update provides. CodeDeploy felt heavyweight for what we needed, and we wanted to understand exactly what was happening during a deployment rather than trusting a black box.

Task Sets give you that control. But they come with trade-offs.

What Are Task Sets?

A Task Set is a subset of tasks within an ECS service. Instead of a service having one homogeneous group of tasks all running the same task definition, you can have multiple task sets – each potentially running a different version.

The mental model:

ECS Service
├── Task Set "blue"  (v1.2.3) ──► 80% traffic
└── Task Set "green" (v1.2.4) ──► 20% traffic

Each task set has:

Its own task definition (version)
Its own scale, expressed as a percentage of the service’s desired count
Its own network configuration
A stability status (STEADY_STATE or STABILIZING)

One task set is designated as the primary. This is the “default” version – the one that remains if you delete others.

Why Use Task Sets?

1. Explicit version control

With rolling deployments, ECS gradually replaces old tasks with new ones. You don’t have two distinct versions running – you have a mix that’s constantly shifting. Task sets let you maintain two complete, stable deployments side by side.

2. Instant rollback

If the green deployment is broken, you delete the task set. Done. No waiting for a rollback deployment to propagate. The blue task set is still running, unchanged.

3. Traffic splitting without a service mesh

Combined with a load balancer and target groups, you can route percentages of traffic to each task set. Canary deployments become possible without Istio or App Mesh.

4. Testing in production (carefully)

You can run a new version at 5% traffic, monitor it, then scale up. Or route specific headers/paths to the new version for internal testing before public release.

The Trade-Offs (Be Honest About These)

1. Complexity overhead

Standard ECS deployments are simple: update the task definition, ECS handles the rest. Task sets require you to manage the lifecycle explicitly – create, scale, promote, delete. More moving parts, more to get wrong.

2. No native CI/CD integration

CodeDeploy has hooks, alarms, automatic rollback. Task sets are manual (or require custom automation). Your pipeline needs to handle the orchestration.

3. Double the running tasks during deployment

Blue/green means both versions run simultaneously. You’re paying for 2x capacity during the transition window. For large services, this isn’t trivial.

4. Load balancer configuration

Traffic splitting requires weighted target groups or ALB rules. This adds infrastructure complexity and another thing to manage/debug.

5. External deployment controller is all-or-nothing

Once you set deployment_controller = EXTERNAL, ECS won’t manage deployments at all. No rolling updates, no circuit breakers. You own it entirely.

Setting It Up

Prerequisites

ECS cluster (Fargate or EC2)
VPC with subnets and security groups configured
A task definition registered
(Optional) ALB with target groups for traffic splitting

Step 1: Create the Service with External Deployment Controller

The key is --deployment-controller type=EXTERNAL. This tells ECS you’ll manage task sets yourself.

aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --desired-count 4 \
  --deployment-controller type=EXTERNAL \
  --scheduling-strategy REPLICA \
  --deployment-configuration maximumPercent=200,minimumHealthyPercent=100

Note: with the external controller, the service-level desired-count still matters. Each task set’s scale is a percentage of it, so a desired count of 0 would leave every task set with no tasks.
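Everything that follows depends on the controller type, so it can be worth confirming it before creating task sets. Here is a minimal sketch of such a check, assuming the AWS CLI is installed and configured; the function name require_external_controller is made up for illustration.

```shell
# Hypothetical helper: fail fast unless the service uses the EXTERNAL controller.
# Usage: require_external_controller my-cluster my-service
require_external_controller() (
  cluster=$1
  service=$2
  # describe-services reports deploymentController.type for each service
  controller=$(aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[0].deploymentController.type' \
    --output text) || exit 1
  if [ "$controller" != "EXTERNAL" ]; then
    echo "service $service uses controller '$controller', expected EXTERNAL" >&2
    exit 1
  fi
  echo "service $service uses the EXTERNAL deployment controller"
)
```

The body runs in a subshell, so the variables stay local and exit only ends the function call, not your script.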

Step 2: Create the Blue Task Set

aws ecs create-task-set \
  --cluster my-cluster \
  --service my-service \
  --external-id blue \
  --task-definition my-app:42 \
  --launch-type FARGATE \
  --scale unit=PERCENT,value=100 \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123,subnet-def456],securityGroups=[sg-123456],assignPublicIp=ENABLED}"

The --scale unit=PERCENT,value=100 means this task set runs 100% of the service’s desired count. The --external-id is your label; use it to track which task set is blue and which is green.
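Scale is not fixed at creation time: aws ecs update-task-set changes it on a running task set. The wrapper below is a sketch under a few assumptions. The name scale_task_set is hypothetical, and it accepts only whole numbers for simplicity, although the API itself takes a floating-point percentage.

```shell
# Hypothetical helper: change a task set's scale (a percentage of the
# service's desired count) with update-task-set.
# Usage: scale_task_set my-cluster my-service <task-set-arn> 50
scale_task_set() (
  cluster=$1
  service=$2
  task_set=$3
  percent=$4
  # Simplification: accept whole numbers only
  case "$percent" in
    ''|*[!0-9]*) echo "scale must be a whole number, got '$percent'" >&2; exit 2 ;;
  esac
  if [ "$percent" -gt 100 ]; then
    echo "scale must be between 0 and 100, got $percent" >&2
    exit 2
  fi
  aws ecs update-task-set \
    --cluster "$cluster" \
    --service "$service" \
    --task-set "$task_set" \
    --scale "unit=PERCENT,value=$percent"
)
```

For example, bringing green up at 25% before committing to a full second copy keeps the extra capacity cost down while you validate.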

Step 3: Set the Primary Task Set

aws ecs update-service-primary-task-set \
  --cluster my-cluster \
  --service my-service \
  --primary-task-set arn:aws:ecs:eu-west-1:123456789:task-set/my-cluster/my-service/ecs-svc/1234567890

Get the task set ARN from the create-task-set response or:

aws ecs describe-task-sets \
  --cluster my-cluster \
  --service my-service
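In a pipeline you usually want the ARN for one specific task set rather than the whole describe output. One way to do that, sketched here, is to filter on the external ID with a JMESPath query. The function name task_set_arn is an assumption for illustration, and the lookup only works if you gave each task set a distinct --external-id.

```shell
# Hypothetical helper: print the ARN of the task set with a given external ID.
# Usage: green_arn=$(task_set_arn my-cluster my-service green)
task_set_arn() (
  cluster=$1
  service=$2
  external_id=$3
  arn=$(aws ecs describe-task-sets \
    --cluster "$cluster" \
    --service "$service" \
    --query "taskSets[?externalId=='$external_id'].taskSetArn | [0]" \
    --output text) || exit 1
  # With --output text, the CLI prints "None" when the query matches nothing
  if [ -z "$arn" ] || [ "$arn" = "None" ]; then
    echo "no task set with external ID '$external_id'" >&2
    exit 1
  fi
  echo "$arn"
)
```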

Step 4: Deploy Green (New Version)

aws ecs create-task-set \
  --cluster my-cluster \
  --service my-service \
  --external-id green \
  --task-definition my-app:43 \
  --launch-type FARGATE \
  --scale unit=PERCENT,value=100 \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123,subnet-def456],securityGroups=[sg-123456],assignPublicIp=ENABLED}"

Now you have two task sets running simultaneously. Both at 100% scale means double capacity – adjust based on your needs.

Step 5: Validate and Promote

Once green is healthy and validated:

# Promote green to primary
aws ecs update-service-primary-task-set \
  --cluster my-cluster \
  --service my-service \
  --primary-task-set arn:aws:ecs:eu-west-1:123456789:task-set/my-cluster/my-service/ecs-svc/9876543210

# Delete blue
aws ecs delete-task-set \
  --cluster my-cluster \
  --service my-service \
  --task-set arn:aws:ecs:eu-west-1:123456789:task-set/my-cluster/my-service/ecs-svc/1234567890 \
  --force

The --force flag deletes even if tasks are still running. Without it, you’d need to scale down first.
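The order matters here: promote first, delete second. A small wrapper can enforce that order and refuse to act while the new task set is still stabilizing. This is a sketch, not a complete deployment tool; promote_and_retire is a hypothetical name, and it only checks stabilityStatus, not your own application-level validation.

```shell
# Hypothetical helper: promote a new task set to primary, then delete the old one.
# Refuses to act unless the new task set reports STEADY_STATE.
# Usage: promote_and_retire my-cluster my-service <green-arn> <blue-arn>
promote_and_retire() (
  cluster=$1
  service=$2
  new_arn=$3
  old_arn=$4
  stability=$(aws ecs describe-task-sets \
    --cluster "$cluster" --service "$service" \
    --task-sets "$new_arn" \
    --query 'taskSets[0].stabilityStatus' --output text) || exit 1
  if [ "$stability" != "STEADY_STATE" ]; then
    echo "not promoting: new task set is $stability" >&2
    exit 1
  fi
  # Promote first, so the old task set is no longer primary when it is deleted
  aws ecs update-service-primary-task-set \
    --cluster "$cluster" --service "$service" \
    --primary-task-set "$new_arn" >/dev/null || exit 1
  aws ecs delete-task-set \
    --cluster "$cluster" --service "$service" \
    --task-set "$old_arn" --force >/dev/null || exit 1
  echo "promoted $new_arn, deleted $old_arn"
)
```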

Rollback

If green is broken:

# Just delete green, blue is still primary and running
aws ecs delete-task-set \
  --cluster my-cluster \
  --service my-service \
  --task-set arn:aws:ecs:eu-west-1:123456789:task-set/my-cluster/my-service/ecs-svc/9876543210 \
  --force

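That’s it. Blue continues serving traffic. No deployment, no waiting. If you are splitting traffic at the ALB, shift green’s weight back to 0 before deleting it, so the listener stops forwarding requests to a target group that is about to empty.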

Terraform Configuration

Here’s the equivalent in Terraform:

# Service with external deployment controller
resource "aws_ecs_service" "main" {
  name            = "my-service"
  cluster         = aws_ecs_cluster.main.id
  
  # Don't set task_definition here - it's managed per task set

  # Each task set's scale is a percentage of this count
  desired_count = 4

  deployment_controller {
    type = "EXTERNAL"
  }

  scheduling_strategy = "REPLICA"
}

# Blue task set
resource "aws_ecs_task_set" "blue" {
  service         = aws_ecs_service.main.id
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app_v1.arn
  
  external_id = "blue"
  
  launch_type = "FARGATE"

  scale {
    unit  = "PERCENT"
    value = 100
  }

  network_configuration {
    subnets          = var.private_subnets
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  # Optional: register with load balancer
  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "app"
    container_port   = 8080
  }

  lifecycle {
    ignore_changes = [scale]  # Scale might be adjusted manually
  }
}

# Green task set (created during deployment)
resource "aws_ecs_task_set" "green" {
  count = var.deploy_green ? 1 : 0

  service         = aws_ecs_service.main.id
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app_v2.arn
  
  external_id = "green"
  
  launch_type = "FARGATE"

  scale {
    unit  = "PERCENT"
    value = 100
  }

  network_configuration {
    subnets          = var.private_subnets
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.green.arn
    container_name   = "app"
    container_port   = 8080
  }
}

# Cluster capacity providers (not related to the primary task set)
resource "aws_ecs_cluster_capacity_providers" "main" {
  # ... capacity provider config
}

# Primary task set designation
# Note: the AWS provider has no aws_ecs_service_primary_task_set resource
# You'll need to use a null_resource with local-exec or handle this in CI/CD
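resource "null_resource" "set_primary" {
  depends_on = [aws_ecs_task_set.blue]

  # Re-run the provisioner whenever the blue task set is replaced
  triggers = {
    task_set_arn = aws_ecs_task_set.blue.arn
  }

  # Pass the ARN: the resource's id is a composite of task set ID, service and cluster
  provisioner "local-exec" {
    command = <<-EOT
      aws ecs update-service-primary-task-set \
        --cluster ${aws_ecs_cluster.main.name} \
        --service ${aws_ecs_service.main.name} \
        --primary-task-set ${aws_ecs_task_set.blue.arn}
    EOT
  }
}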

Traffic Splitting with ALB

For weighted traffic between blue and green:

resource "aws_lb_listener_rule" "weighted" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = var.blue_weight  # e.g., 90
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = var.green_weight  # e.g., 10
      }

      stickiness {
        enabled  = true
        duration = 600
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}

Adjust blue_weight and green_weight to control traffic split. Start at 90/10, validate, move to 50/50, then 0/100.
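Each step of that progression is one Terraform apply with a different pair of weights. The sketch below wraps a single step, assuming blue_weight and green_weight are input variables of the configuration in the current directory. The name shift_traffic is hypothetical, and -auto-approve skips the confirmation prompt, so drop it if you want to review the plan first.

```shell
# Hypothetical helper: apply one step of the traffic shift through Terraform.
# Assumes blue_weight and green_weight are input variables, as in the
# listener rule above.
# Usage: shift_traffic 10   # 90% blue, 10% green
shift_traffic() (
  green=$1
  # Reject non-numbers and leading zeros (which shell arithmetic reads as octal)
  case "$green" in
    ''|*[!0-9]*|0?*) echo "green weight must be a whole number from 0 to 100" >&2; exit 2 ;;
  esac
  if [ "$green" -gt 100 ]; then
    echo "green weight must be a whole number from 0 to 100" >&2
    exit 2
  fi
  blue=$((100 - green))
  terraform apply -auto-approve \
    -var "blue_weight=$blue" \
    -var "green_weight=$green"
)
```

Deriving blue from green means the two weights always sum to 100, which removes one way to get the split wrong.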

Monitoring During Deployment

Start with the task sets’ own status and stability:

# Task set stability
aws ecs describe-task-sets \
  --cluster my-cluster \
  --service my-service \
  --query 'taskSets[*].{Id:externalId,Status:status,Stability:stabilityStatus,Running:runningCount,Pending:pendingCount}'

# Output:
# [
#   {"Id": "blue", "Status": "ACTIVE", "Stability": "STEADY_STATE", "Running": 4, "Pending": 0},
#   {"Id": "green", "Status": "ACTIVE", "Stability": "STABILIZING", "Running": 2, "Pending": 2}
# ]

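Wait for green to reach STEADY_STATE before promoting. A task set is steady when:

Running count matches the computed desired count
No pending tasks
No tasks running on container instances that are DRAINING
Health checks passing (load balancer, service discovery, and container checks, where configured)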
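In automation, waiting for STEADY_STATE usually means polling describe-task-sets. A minimal polling loop might look like the following. The name wait_for_steady_state and the default attempt count and delay are assumptions; tune them to how long your tasks take to start and pass health checks.

```shell
# Hypothetical helper: poll until a task set reports STEADY_STATE, or give up.
# Usage: wait_for_steady_state my-cluster my-service <task-set-arn> [attempts] [delay-seconds]
wait_for_steady_state() (
  cluster=$1
  service=$2
  task_set=$3
  attempts=${4:-40}   # assumed default: 40 polls
  delay=${5:-15}      # assumed default: 15 seconds between polls
  i=1
  while [ "$i" -le "$attempts" ]; do
    stability=$(aws ecs describe-task-sets \
      --cluster "$cluster" --service "$service" \
      --task-sets "$task_set" \
      --query 'taskSets[0].stabilityStatus' --output text) || exit 1
    if [ "$stability" = "STEADY_STATE" ]; then
      echo "steady after $i poll(s)"
      exit 0
    fi
    echo "poll $i/$attempts: $stability" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for STEADY_STATE" >&2
  exit 1
)
```

A non-zero exit on timeout lets a pipeline stop before the promote step rather than promoting a task set that never settled.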

When to Use Task Sets vs Alternatives

Scenario	Recommendation
Simple rolling updates are fine	Don’t use task sets – unnecessary complexity
Need instant rollback	Task sets or CodeDeploy
Want traffic splitting/canary	Task sets + ALB, or App Mesh
Require deployment hooks/alarms	CodeDeploy (it’s built for this)
Full control, custom orchestration	Task sets
GitOps/declarative deployments	Task sets with careful state management

Task sets make sense when you need the control and are willing to build the automation. If CodeDeploy does what you need, use it – it’s less to maintain.

Gotchas

1. Task set ARNs are not predictable

You can’t construct them ahead of time. Always capture the ARN from the create response or describe call.

2. Deleting the primary task set fails

You must promote another task set to primary first, or delete the entire service.

3. Scale percentages are relative to the service’s desired count

A scale of 100% on two task sets means each runs the full desired count: 200% in total. Plan your capacity accordingly.

4. No built-in health gate

Unlike CodeDeploy, task sets don’t automatically roll back on health check failures. You need external monitoring and automation.
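One way to approximate a health gate is to check a CloudWatch alarm during the bake period and delete green if it fires. The sketch below assumes you already have an alarm that tracks the new version (an error-rate alarm on green's target group, for example). The function name gate_on_alarm and the alarm name in the usage line are hypothetical, and it assumes green is not the primary task set.

```shell
# Hypothetical helper: roll back green if a CloudWatch alarm is firing.
# Usage: gate_on_alarm my-cluster my-service <green-arn> green-5xx-rate
gate_on_alarm() (
  cluster=$1
  service=$2
  green_arn=$3
  alarm_name=$4
  # StateValue is one of OK, ALARM, or INSUFFICIENT_DATA
  state=$(aws cloudwatch describe-alarms \
    --alarm-names "$alarm_name" \
    --query 'MetricAlarms[0].StateValue' --output text) || exit 1
  if [ "$state" = "ALARM" ]; then
    echo "alarm $alarm_name is firing, deleting green" >&2
    aws ecs delete-task-set \
      --cluster "$cluster" --service "$service" \
      --task-set "$green_arn" --force >/dev/null || exit 1
    exit 1
  fi
  echo "alarm $alarm_name is $state, keeping green"
)
```

Run it on a schedule or in a loop during the rollout. The non-zero exit after a rollback lets the pipeline mark the deployment as failed.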

5. Terraform state can drift

If you modify task sets via CLI, Terraform won’t know. Consider managing deployments outside Terraform or use ignore_changes liberally.
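A cheap way to notice drift on resources Terraform does manage is terraform plan with -detailed-exitcode, which exits 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The sketch below wraps that; check_drift is a hypothetical name. It has limits: attributes listed in ignore_changes are skipped, task sets created entirely through the CLI are invisible to it, and exit code 2 also covers config changes you have not applied yet.

```shell
# Hypothetical helper: report whether Terraform sees pending changes.
# Usage: check_drift   (run from the Terraform working directory)
check_drift() (
  code=0
  terraform plan -detailed-exitcode -input=false >/dev/null || code=$?
  case "$code" in
    0) echo "no drift" ;;
    2) echo "drift detected"; exit 2 ;;
    *) echo "terraform plan failed" >&2; exit 1 ;;
  esac
)
```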

Summary

ECS Task Sets give you low-level control over blue/green deployments without CodeDeploy’s abstractions. You get explicit version management, instant rollback, and traffic splitting capabilities – but you also take on the orchestration burden.

Use them when you need that control. Stick with rolling deployments or CodeDeploy when you don’t.

Using task sets in production or found edge cases I missed? Let me know on LinkedIn.