AWS Control Tower Account Factory - The Gotchas Nobody Tells You

AWS Control Tower’s Account Factory sounds straightforward. You define an OU structure, wire up Service Catalog, and Terraform handles the rest. New accounts on demand.

In practice, it’s a minefield of silent failures, IAM permission gaps, and timing issues that aren’t in the documentation. I recently automated account provisioning for a client’s multi-account setup and hit every single one of these.

This post isn’t a setup guide. It’s the list of things that broke, why they broke, and how to fix them - so you don’t waste the same days I did.

TL;DR

- Service Catalog products silently hang if portfolio associations are wrong
- The AWSControlTowerExecution role can get deleted by failed provisioned product terminations
- StackSets have eventual consistency - your automation will race against them
- IAM session duration limits will bite you mid-provisioning
- SSO access isn’t automatic after enrollment - you need to wire it yourself
- Always verify your actual IAM role names against what’s in your templates

Architecture Context

The setup in question:

Management Account
├── Control Tower (landing zone)
├── Service Catalog (Account Factory product)
├── CloudFormation StackSets (baseline deployment)
│
├── Platform OU
│   └── DevOps Sandbox (existing)
│
├── Sandbox OU
│   └── New accounts provisioned here
│
├── Staging OU
├── Prod OU
│
└── Security OU
    ├── Log Archive
    └── Audit

Terraform provisions new accounts through the aws_servicecatalog_provisioned_product resource, which triggers Account Factory under the hood. A CloudFormation StackSet auto-deploys IAM roles into every new account.

Simple on paper. Brutal in practice.

Gotcha 1: Service Catalog Portfolio Associations

This one cost the most time because the failure mode is completely silent.

When you provision a product through Service Catalog, the IAM role making the API call must be a principal on the portfolio that contains the product. Not just on the product - on the portfolio.

If the association is missing, Terraform doesn’t throw an error. The aws_servicecatalog_provisioned_product resource just… hangs. No timeout. No failure. It sits at UNDER_CHANGE until you kill it.

# This is the bit most people forget
resource "aws_servicecatalog_principal_portfolio_association" "provisioning_role" {
  portfolio_id   = data.aws_servicecatalog_portfolio.account_factory.id
  principal_arn  = aws_iam_role.provisioning.arn
  principal_type = "IAM"
}

You need this association BEFORE any provisioned product resource runs. Put it in a separate module or use depends_on explicitly.

resource "aws_servicecatalog_provisioned_product" "new_account" {
  name                     = "platform-engineering-sandbox"
  product_id               = data.aws_servicecatalog_product.account_factory.id
  provisioning_artifact_id = data.aws_servicecatalog_product.account_factory.default_provisioning_artifact_id

  provisioning_parameters {
    key   = "AccountName"
    value = "platform-engineering-sandbox"
  }

  provisioning_parameters {
    key   = "AccountEmail"
    value = "aws+pe-sandbox@company.com"
  }

  provisioning_parameters {
    key   = "ManagedOrganizationalUnit"
    value = "Sandbox"
  }

  provisioning_parameters {
    key   = "SSOUserEmail"
    value = "admin@company.com"
  }

  provisioning_parameters {
    key   = "SSOUserFirstName"
    value = "Platform"
  }

  provisioning_parameters {
    key   = "SSOUserLastName"
    value = "Admin"
  }

  # Critical - ensure portfolio association exists first
  depends_on = [
    aws_servicecatalog_principal_portfolio_association.provisioning_role
  ]
}

How to check: In the AWS Console, go to Service Catalog > Portfolios > your Account Factory portfolio > Access tab. Your provisioning role should be listed there.
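
The same check can be scripted as a pre-flight step so the pipeline fails fast instead of hanging. This is a minimal sketch: `principal_is_associated` is a hypothetical helper, and the portfolio ID and role ARN in the usage comment are placeholders.

```shell
# principal_is_associated ROLE_ARN
# Reads principal ARNs (separated by spaces, tabs or newlines) on stdin
# and succeeds only if ROLE_ARN is one of them.
# Hypothetical helper, not part of the AWS CLI.
principal_is_associated() {
  wanted=$1
  # tr puts each ARN on its own line; grep -Fx is an exact whole-line match
  tr -s ' \t' '\n\n' | grep -Fxq -- "$wanted"
}

# Assumed usage (placeholder portfolio ID and account number):
#   aws servicecatalog list-principals-for-portfolio \
#     --portfolio-id "port-xxxxxxxxxxxxx" \
#     --query "Principals[].PrincipalARN" --output text \
#     | principal_is_associated "arn:aws:iam::111111111111:role/terraform-provisioning" \
#     || { echo "Provisioning role is not on the portfolio" >&2; exit 1; }
```

The exact match matters: a substring check would happily accept terraform-provisioning-old when you meant terraform-provisioning.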

Gotcha 2: The AWSControlTowerExecution Role Deletion Trap

When you provision an account through Account Factory, Control Tower creates the AWSControlTowerExecution role in the new account. This role is how Control Tower manages the account going forward - baseline deployments, guardrails, drift detection.

Here’s the trap: if a provisioned product enters a failed state and you terminate it, the termination process can delete this role from the account. But the account itself still exists in your organisation. Now you have an account that Control Tower can’t manage.

SEQUENCE OF PAIN
================

1. Provision account via Service Catalog        → Account created
2. Provisioning fails halfway (timeout, perms)  → Status: TAINTED/ERROR
3. Terminate the provisioned product             → Role deleted
4. Try to re-provision or enroll the account     → Fails (no execution role)
5. Try to create the role manually               → Needs access to the account
6. Can't access the account                      → No SSO, no role, nothing

The fix is to NEVER terminate a failed provisioned product if the account was actually created. Instead:

1. Check if the account exists in AWS Organizations
2. If it does, enroll it through the Control Tower console (not Terraform)
3. Import the state into Terraform after enrollment

# Check if the account was actually created
aws organizations list-accounts \
  --query "Accounts[?Name=='platform-engineering-sandbox']"

# If it exists, enroll via console, then import
terraform import \
  aws_servicecatalog_provisioned_product.new_account \
  pp-xxxxxxxxxxxxx
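
It helps to make that decision mechanical so nobody terminates on reflex. A minimal sketch: `next_step_for_failed_product` is a hypothetical helper that takes whatever account ID the Organizations lookup returned (empty, or `None` when nothing matched) and prints the safe next action.

```shell
# next_step_for_failed_product ACCOUNT_ID
# ACCOUNT_ID is the result of the Organizations lookup. It is empty or
# "None" (the AWS CLI's text rendering of null) when no account exists.
next_step_for_failed_product() {
  case "${1:-}" in
    "" | None)
      # No account was created, so there is no execution role to lose
      echo "terminate"
      ;;
    *)
      # The account exists: terminating could remove AWSControlTowerExecution
      echo "enroll-and-import"
      ;;
  esac
}

# Assumed usage:
#   id=$(aws organizations list-accounts \
#     --query "Accounts[?Name=='platform-engineering-sandbox'].Id | [0]" \
#     --output text)
#   next_step_for_failed_product "$id"
```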

Gotcha 3: IAM Session Duration Limits

Most CI/CD platforms have a maximum session duration. If your pipeline assumes an IAM role to run Terraform, that session has a clock on it.

Account Factory provisioning is not fast. Depending on the complexity of your baseline StackSet, it can take 30-50 minutes end to end. If your session expires before provisioning completes, Terraform’s AWS API calls start failing mid-apply and the provisioned product sits in limbo.

TIMING BREAKDOWN
================

Step                              Time
====                              ====
Service Catalog product launch    ~2 min
Account creation in Organisations ~5 min
Control Tower baseline deploy     ~15-30 min
StackSet instance deployment      ~5-10 min
SSO configuration                 ~2-5 min
────────────────────────────────────────
Total                             ~30-50 min

The default max session for most IAM roles is 1 hour. That sounds like enough until you add Terraform plan time, state locking, and any other resources in the same run.

Fix:

# In your IAM role CloudFormation template
ProvisioningRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: terraform-provisioning
    MaxSessionDuration: 7200  # 2 hours
    AssumeRolePolicyDocument:
      # ... your trust policy

And in Terraform:

resource "aws_iam_role" "provisioning" {
  name                 = "terraform-provisioning"
  max_session_duration = 7200  # seconds
  assume_role_policy   = data.aws_iam_policy_document.trust.json
}

Also set appropriate timeouts on the Terraform resource:

resource "aws_servicecatalog_provisioned_product" "new_account" {
  # ... config ...

  timeouts {
    create = "60m"
    update = "60m"
    delete = "60m"
  }
}
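
To sanity-check a session limit against the timing breakdown above, add up the worst-case step times plus whatever else runs in the same pipeline. A rough sketch: `session_is_long_enough` is a hypothetical helper, and the 10 minutes of overhead in the usage comment is an assumed figure, not a measurement.

```shell
# session_is_long_enough SESSION_SECONDS OVERHEAD_MINUTES STEP_MINUTES...
# Succeeds when the session outlasts the summed step times plus overhead.
session_is_long_enough() {
  session_seconds=$1
  total_minutes=$2
  shift 2
  for step in "$@"; do
    total_minutes=$((total_minutes + step))
  done
  echo "needed: ${total_minutes}m, session: $((session_seconds / 60))m"
  [ $((total_minutes * 60)) -le "$session_seconds" ]
}

# Worst-case figures from the breakdown: 2 + 5 + 30 + 10 + 5 = 52 minutes.
# With an assumed 10 minutes of plan and lock overhead that is 62 minutes:
#   session_is_long_enough 3600 10 2 5 30 10 5   # fails: 62m > 60m
#   session_is_long_enough 7200 10 2 5 30 10 5   # succeeds: 62m <= 120m
```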

Gotcha 4: StackSet Eventual Consistency

CloudFormation StackSets deploy asynchronously. When Control Tower enrolls an account, it triggers StackSet deployments for guardrails and baseline configurations. These don’t complete instantly.

If your Terraform automation tries to interact with the new account immediately after provisioning (create IAM integrations, deploy resources, configure providers), it will race against the StackSet deployments.

Common symptoms:

SYMPTOMS OF STACKSET RACES
===========================

- Resources exist momentarily then disappear (StackSet overwrites them)
- IAM roles have different permissions than expected (baseline hasn't applied yet)
- aws_caller_identity returns the account but STS calls fail
- Random AccessDenied errors that work 5 minutes later

Fix: Add explicit waits or use a two-stage pipeline.

Stage 1 provisions the account. Stage 2 runs separately (triggered by a delay or webhook) and configures the account.

# Stage 1: Provision account
resource "aws_servicecatalog_provisioned_product" "account" {
  # ... provisioning config ...
}

# Stage 2: Wait for StackSet baseline (separate Terraform workspace)
# Stage 1 state is not visible here, so the account ID arrives as a variable
variable "account_id" {
  type = string
}

data "aws_cloudformation_stack_set" "baseline" {
  name = "AWSControlTowerBP-BASELINE-ROLES"
}

# Verify the stack instance exists in the new account
# (confirm your AWS provider version offers this data source before relying on it)
data "aws_cloudformation_stack_set_instance" "baseline_check" {
  stack_set_name = data.aws_cloudformation_stack_set.baseline.name
  account_id     = var.account_id
  region         = "eu-west-1"
}

In practice, I ended up just sleeping for 5 minutes between stages. Ugly but reliable:

#!/bin/bash
# provision.sh - two-stage account provisioning
set -euo pipefail  # stop if stage 1 fails instead of configuring a half-built account

echo "Stage 1: Provisioning account..."
cd terraform/account-provisioning
terraform apply -auto-approve

ACCOUNT_ID=$(terraform output -raw account_id)
echo "Account $ACCOUNT_ID created. Waiting for baseline deployment..."

# StackSets need time to propagate
sleep 300

echo "Stage 2: Configuring account..."
cd ../account-configuration
terraform apply -auto-approve -var="account_id=$ACCOUNT_ID"
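
If the fixed sleep ever proves too short, a polling loop is a small step up. A minimal sketch, with `wait_until` as a hypothetical helper that reruns any check command until it succeeds or a deadline passes. The StackSet check in the usage comment is an assumption about how you might query instance status, so verify the query against your own CLI output first.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Reruns COMMAND until it exits 0. Returns 1 once TIMEOUT_SECONDS of
# waiting has been used up without a success.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Gave up after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Assumed usage in provision.sh, replacing `sleep 300`:
#   baseline_ready() {
#     aws cloudformation list-stack-instances \
#       --stack-set-name "AWSControlTowerBP-BASELINE-ROLES" \
#       --stack-instance-account "$ACCOUNT_ID" \
#       --query "Summaries[0].Status" --output text | grep -qx "CURRENT"
#   }
#   wait_until 1800 30 baseline_ready
```

The timeout only counts time spent sleeping, not time spent inside the check itself, so treat it as a lower bound on how long the loop can run.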

Gotcha 5: SSO Access Isn’t Automatic

When Account Factory creates an account, it creates an SSO user and assigns it to the account. But if you’re using IAM Identity Center with an external identity provider (Azure AD, Okta, etc.), or you want to assign permission sets to existing groups, that doesn’t happen automatically.

You need to explicitly create permission set assignments after the account is provisioned.

# outputs is a set of key/value objects, so pick AccountId out by key
locals {
  new_account_id = one([
    for o in aws_servicecatalog_provisioned_product.account.outputs :
    o.value if o.key == "AccountId"
  ])
}

# Assign admin permission set to your platform team group
resource "aws_ssoadmin_account_assignment" "platform_admin" {
  instance_arn       = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  permission_set_arn = aws_ssoadmin_permission_set.admin.arn

  principal_id   = data.aws_identitystore_group.platform_team.group_id
  principal_type = "GROUP"

  target_id   = local.new_account_id
  target_type = "AWS_ACCOUNT"
}

resource "aws_ssoadmin_account_assignment" "developer_readonly" {
  instance_arn       = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  permission_set_arn = aws_ssoadmin_permission_set.read_only.arn

  principal_id   = data.aws_identitystore_group.developers.group_id
  principal_type = "GROUP"

  target_id   = local.new_account_id
  target_type = "AWS_ACCOUNT"
}

Without this, new accounts are provisioned but nobody can actually log into them through SSO. People find out when they click the account in the AWS access portal and get a blank page.

Gotcha 6: Wrong IAM Role Names in Templates

This one sounds obvious but catches everyone at least once.

CloudFormation StackSet templates that deploy IAM roles into member accounts reference a role name. If the actual role used by your CI/CD platform or Terraform runner doesn’t match the name in the template, the StackSet deploys a role that nothing uses, and your automation fails with AccessDenied.

# What's in the StackSet template
Resources:
  DeployRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: terraform-deploy  # <-- This name matters

# What your CI/CD is actually assuming
# terraform-provisioning  <-- WRONG NAME

The template says terraform-deploy. Your CI/CD assumes terraform-provisioning. Everything deploys cleanly. Nothing works.

Fix: Before writing any automation, verify the ACTUAL role name in two places:

# 1. What your CI/CD platform is configured to assume
aws sts get-caller-identity --query Arn --output text
# Returns: arn:aws:sts::XXXX:assumed-role/ACTUAL-ROLE-NAME/session

# 2. What your StackSet template creates
aws cloudformation describe-stack-set \
  --stack-set-name "your-member-account-roles" \
  --query "StackSet.TemplateBody" | jq -r . | grep RoleName

These two values must match. If they don’t, fix the template, not the CI/CD config - changing role names in CI/CD has knock-on effects everywhere.
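
Comparing the two by eye is error-prone, so it is worth pulling the role name out of the ARN in a script. A minimal sketch: `role_name_from_arn` is a hypothetical helper, and `terraform-deploy` in the usage comment is the example name from the template above.

```shell
# role_name_from_arn ARN
# Prints the role name from either an assumed-role session ARN
# (arn:aws:sts::ID:assumed-role/NAME/SESSION) or a plain role ARN
# (arn:aws:iam::ID:role/NAME, optionally with an IAM path before NAME).
role_name_from_arn() {
  case "$1" in
    *:assumed-role/*/*)
      rest=${1#*:assumed-role/}   # drop everything up to the role name
      echo "${rest%%/*}"          # drop the trailing /SESSION
      ;;
    *:role/*)
      rest=${1#*:role/}
      echo "${rest##*/}"          # drop any IAM path before the name
      ;;
    *)
      echo "Not a role ARN: $1" >&2
      return 1
      ;;
  esac
}

# Assumed usage:
#   actual=$(role_name_from_arn "$(aws sts get-caller-identity --query Arn --output text)")
#   [ "$actual" = "terraform-deploy" ] \
#     || echo "CI/CD assumes $actual, but the template creates terraform-deploy"
```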

Gotcha 7: The SCP and Permissions Boundary Dance

Service Control Policies (SCPs) are organisation-level guardrails. Permissions boundaries are per-principal limits, attached to individual IAM users or roles. When you use both (and you should), the interaction between them creates a restrictive intersection that’s hard to debug.

EFFECTIVE PERMISSIONS
=====================

Identity Policy (what the role CAN do)
     ∩
Permissions Boundary (maximum the role COULD do)
     ∩
SCP (maximum the account COULD do)
     =
What actually works

Real example: your StackSet deploys a role with AdministratorAccess. Your permissions boundary allows iam:*, s3:*, ec2:*. Your SCP denies iam:CreateUser. Result: the role can’t create IAM users even though both the identity policy and boundary allow it.

The debugging nightmare is that CloudTrail shows the denial from the SCP, but the error message to the user just says AccessDenied with no indication that an SCP is involved.
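
The intersection rule can be sketched as a toy evaluator, which is handy for reasoning about a denial before you go digging in CloudTrail. This is a deliberate simplification and an assumption on my part: it treats the SCP as a plain deny-list, matches actions case-sensitively, and ignores resources and conditions. `matches` and `is_effectively_allowed` are hypothetical helpers, and the three lists mirror the example above.

```shell
# matches ACTION PATTERN...
# Succeeds if ACTION matches any of the glob-style patterns (e.g. "iam:*").
matches() {
  action=$1
  shift
  for pattern in "$@"; do
    # An unquoted variable in a case pattern is treated as a glob
    case "$action" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# is_effectively_allowed ACTION
# Reads three space-separated pattern lists set by the caller:
# IDENTITY_ALLOW, BOUNDARY_ALLOW and SCP_DENY.
# The body runs in a subshell so that `set -f` does not leak out.
is_effectively_allowed() (
  set -f  # keep "*" in the lists from expanding to filenames
  matches "$1" $IDENTITY_ALLOW &&
    matches "$1" $BOUNDARY_ALLOW &&
    ! matches "$1" $SCP_DENY
)

# The example from the text
IDENTITY_ALLOW="*"                  # AdministratorAccess
BOUNDARY_ALLOW="iam:* s3:* ec2:*"
SCP_DENY="iam:CreateUser"

for action in iam:CreateUser iam:ListRoles rds:DescribeDBInstances; do
  if is_effectively_allowed "$action"; then
    echo "$action: allowed"
  else
    echo "$action: denied"
  fi
done
```

In this model iam:CreateUser is denied by the SCP, iam:ListRoles passes all three layers, and rds:DescribeDBInstances fails because it falls outside the boundary.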

# Check effective SCPs for an account
aws organizations list-policies-for-target \
  --target-id "ACCOUNT_ID" \
  --filter "SERVICE_CONTROL_POLICY" \
  --query "Policies[].{Name:Name, Id:Id}"

# Get the policy content
aws organizations describe-policy \
  --policy-id "p-xxxxxxxxxx" \
  --query "Policy.Content" | jq -r . | jq .

Tip: Always test IAM operations from a new account before handing it to a team. Run a quick smoke test:

# Smoke test for new accounts: one read-only call per service
# S3 bucket listing lives under "s3api", not "s3"
COMMANDS=("sts get-caller-identity" "s3api list-buckets" "ec2 describe-regions" "iam list-roles")

for cmd in "${COMMANDS[@]}"; do
  echo -n "$cmd: "
  # $cmd is left unquoted so it splits into service and operation
  if aws $cmd > /dev/null 2>&1; then
    echo "OK"
  else
    echo "FAILED (rerun without the redirect to see the error)"
  fi
done

Gotcha 8: Control Tower Drift and Manual Console Changes

Control Tower has a concept called “drift” - when the actual state of your landing zone diverges from what Control Tower expects. Making manual changes in the console (even well-intentioned ones) can trigger drift detection and block further operations.

Common drift triggers:

THINGS THAT CAUSE CONTROL TOWER DRIFT
======================================

- Moving accounts between OUs via the Organisations console (not CT)
- Deleting or modifying the AWSControlTowerExecution role
- Changing SCP attachments directly in Organisations
- Modifying Control Tower managed StackSet instances
- Deleting CloudTrail trails in member accounts
- Removing Config rules deployed by guardrails

When drift is detected, Account Factory stops working entirely. You can’t provision new accounts, update existing ones, or change OU assignments until drift is resolved.

# Check landing zone drift status (DRIFTED or IN_SYNC)
aws controltower get-landing-zone \
  --landing-zone-identifier "arn:aws:controltower:eu-west-1:XXXX:landingzone/XXXX" \
  --query "landingZone.driftStatus.status"

# If you need to resolve drift
aws controltower reset-landing-zone \
  --landing-zone-identifier "arn:aws:controltower:eu-west-1:XXXX:landingzone/XXXX"

Rule: Never make changes to Control Tower-managed resources through the Organisations console, CloudFormation console, or CLI directly. Always go through Control Tower or Terraform with the appropriate provider resources.

Gotcha 9: Email Addresses Are Forever

Every AWS account needs a unique email address. Once used, that email can never be used for another account - even if the original account is closed.

At scale, this becomes an email management problem. Most teams use plus-addressing:

ACCOUNT EMAIL STRATEGY
======================

Pattern: aws+{ou}-{name}@company.com

Examples:
  aws+sandbox-dev1@company.com
  aws+prod-api@company.com
  aws+staging-data@company.com

But here’s the catch: if you close an account and want to recreate it with the same name, you need a different email. Keep a registry.

# Keep a local map of all account emails
locals {
  account_emails = {
    "pe-sandbox"      = "aws+pe-sandbox@company.com"
    "sandbox-dev1"    = "aws+sandbox-dev1@company.com"
    "sandbox-dev2"    = "aws+sandbox-dev2@company.com"
    "prod-api"        = "aws+prod-api@company.com"
    # Recreated after closure - note the v2 suffix
    "sandbox-testing" = "aws+sandbox-testing-v2@company.com"
  }
}

Also: the root email for each account can receive important AWS notifications (billing alerts, abuse reports, account recovery). Make sure these go to a monitored mailbox or mailing list, not someone’s personal inbox.
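
The naming pattern and the v2 suffix convention can be generated rather than typed, which keeps the registry consistent. A minimal sketch: `account_email` is a hypothetical helper, and company.com is the placeholder domain used throughout this post.

```shell
# account_email OU NAME [GENERATION]
# Builds aws+{ou}-{name}@company.com. A GENERATION above 1 adds a -vN
# suffix for accounts recreated after closure.
account_email() {
  ou=$1
  name=$2
  generation=${3:-1}
  suffix=""
  if [ "$generation" -gt 1 ]; then
    suffix="-v${generation}"
  fi
  echo "aws+${ou}-${name}${suffix}@company.com"
}

# Usage:
#   account_email sandbox dev1        # first account with this name
#   account_email sandbox testing 2   # recreated once after closure
```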

Putting It All Together

Here’s the provisioning flow with all the gotchas accounted for:

1. Verify portfolio association exists           (Gotcha 1)
2. Set session duration to 2h+                   (Gotcha 3)
3. Provision account via Service Catalog          
4. Wait for StackSet baseline completion          (Gotcha 4)
5. Verify execution role exists in new account    (Gotcha 2)
6. Assign SSO permission sets                     (Gotcha 5)
7. Validate IAM role names match templates        (Gotcha 6)
8. Run permission smoke test                      (Gotcha 7)
9. Verify no drift detected                       (Gotcha 8)

The full module structure:

account-provisioning/
├── modules/
│   ├── account/           # Service Catalog provisioned product
│   ├── baseline-check/    # Waits for StackSet completion
│   ├── sso-assignment/    # Permission set assignments
│   └── smoke-test/        # Post-provision validation
├── ou/
│   ├── sandbox/
│   ├── staging/
│   ├── prod/
│   └── platform/
├── bootstrap/
│   ├── member-role.yaml   # StackSet template for member IAM roles
│   └── permissions/       # Boundary policies per OU
└── accounts.tf            # Account definitions

Each new account is a single block:

module "pe_sandbox" {
  source = "./modules/account"

  name          = "platform-engineering-sandbox"
  email         = local.account_emails["pe-sandbox"]
  ou            = "Sandbox"
  sso_groups    = ["PlatformTeam", "Developers"]
  permission_sets = {
    "PlatformTeam" = "AdministratorAccess"
    "Developers"   = "ReadOnlyAccess"
  }
}

What I’d Do Differently

- Two-stage pipeline from day one. Don’t try to provision and configure in a single Terraform run. The timing issues aren’t worth fighting.
- Test with a throwaway account first. Don’t learn these lessons in a production OU. Create a sandbox account, break it, delete it, try again.
- Keep a manual runbook alongside Terraform. When Service Catalog hangs or drift blocks you, knowing how to fix it through the console is faster than debugging Terraform state.
- Use moved blocks aggressively. When you refactor your module structure (and you will), Terraform moved blocks save you from destroying and recreating accounts.
- Monitor StackSet operations. Set up CloudWatch alarms on StackSet failures. Silent StackSet failures mean accounts exist without proper baselines - a security risk you won’t notice until an audit.
