Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

At a large company with 500+ microservices, we had a common pattern: event sourcing. Services would append events to SQL Server or Oracle tables, building up an audit trail of every state change. The problem? These tables grew massive – hundreds of millions of records – queries slowed down, and the on-prem databases became bottlenecks.

The solution was migrating event data to DynamoDB – purpose-built for high-throughput, append-heavy workloads with predictable latency at any scale.

I was a Senior Platform Engineer supporting the DBA team on this migration. My role was building the DMS infrastructure in Terraform, designing the replication pipelines, and making sure we could migrate hundreds of millions of records without impacting production.

This post covers the technical implementation – the Terraform modules, the DMS configuration, the tricks we used to parallelise migrations, and the lessons learned.

The Migration Architecture

We had two main migration paths:

SQL Server → DynamoDB: Order service events (e-commerce domain)
Oracle → DynamoDB: Customer authentication data

Both used AWS Database Migration Service (DMS) with a key architectural decision: database views as the migration source.

┌─────────────────────┐
│   SQL Server        │
│   (on-prem)         │
│                     │
│  ┌───────────────┐  │
│  │ Source Tables │  │
│  └───────┬───────┘  │
│          │          │
│  ┌───────▼───────┐  │
│  │  Migration    │  │     ┌─────────────────┐     ┌─────────────────┐
│  │  View (VW)    │──────►│  DMS Instance   │────►│   DynamoDB      │
│  └───────────────┘  │     │  (replication)  │     │   (target)      │
└─────────────────────┘     └─────────────────┘     └─────────────────┘
                                    │
                            Partitioned by ID range
                            (parallel tasks)

Why Views Instead of Tables?

DMS can migrate tables directly, but we used views for several reasons:

Pre-transformation: The view transforms SQL data into DynamoDB’s PK/SK format before DMS touches it
Column filtering: Exclude columns that shouldn’t migrate
Data enrichment: Join related tables to denormalise
Partitioning: Add computed columns for range filtering

The DBAs created views like vw_Dynamodbmigration that did the heavy lifting:

-- Simplified example of the migration view
CREATE VIEW dbo.vw_Dynamodbmigration AS
SELECT 
    Id,
    CONCAT('agg#', AggregateId) AS PK,
    CONCAT('evt#', CONVERT(VARCHAR, EventTimestamp, 126)) AS SK,
    'event' AS itemtype,
    Version AS Ver,
    ConversationId AS ConvId,
    CONVERT(VARCHAR, CreatedAt, 126) AS created,
    EventData AS itemdata,
    DATEDIFF(SECOND, '1970-01-01', DATEADD(YEAR, 7, CreatedAt)) AS expiry,
    1 AS Migrated
FROM dbo.OrderEvents;

This meant DMS saw clean, DynamoDB-ready data. No complex transformation rules in DMS itself.

Terraform Infrastructure

Core Module Structure

dms-migration/
├── main.tf              # Provider, context module
├── endpoints.tf         # Source and target endpoints
├── replication.tf       # DMS instance and tasks
├── secrets.tf           # Database credentials
├── task_settings.json   # DMS task configuration
└── mappings/
    ├── table_mappings1.json
    ├── table_mappings2.json
    ├── table_mappings3.json
    └── table_mappings4.json

Network and Security Setup

# main.tf
locals {
  tags            = module.context.tags
  vpc_id          = data.aws_vpc.vpc.id
  subnet_ids      = data.aws_subnets.private.ids
  security_groups = data.aws_security_groups.service.ids
  execution_role  = data.aws_iam_role.execution_role.arn
  dms_subnet_id   = aws_dms_replication_subnet_group.dms_subnet_group.id
}

data "aws_vpc" "vpc" {
  tags = {
    Name = "vpc-${module.context.environment}"
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.vpc.id]
  }
  filter {
    name   = "tag:Name"
    values = ["sn-${module.context.environment}-*-private*"]
  }
}

data "aws_security_groups" "service" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.vpc.id]
  }
  tags = {
    Type = "custom"
    Role = module.context.component_name
  }
}

# DMS needs a subnet group spanning multiple AZs
resource "aws_dms_replication_subnet_group" "dms_subnet_group" {
  replication_subnet_group_description = "DMS replication subnet group"
  replication_subnet_group_id          = "dms-sn-${module.context.environment}-${module.context.component_name}"
  subnet_ids                           = local.subnet_ids
}

Secrets Management (The Right Way)

The initial approach from the DBAs looked like this:

# ❌ What the DBAs initially did - DON'T DO THIS
resource "random_password" "db_password" {
  length           = 24
  special          = true
  override_special = "!#$%&'()*+,-.:<=>?[\\]^_`{|}~"
}

resource "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id     = aws_secretsmanager_secret.db_credentials.id
  secret_string = random_password.db_password.result
}

I had to flag this in code review. The problem: Terraform stores the password in state. Even though the state file is encrypted at rest, the secret is still visible in plaintext to anyone with state access. It also appears in plan output, logs, and CI/CD pipelines.

Two solutions I proposed:

Option 1: SOPS encryption

Encrypt secrets with SOPS before committing, decrypt at apply time. Works well but adds tooling complexity.

Option 2: Bootstrap manually, read with data block ✅ (what we chose)

Create the secret once manually (or via a separate bootstrap script), then reference it in Terraform:

# ✅ The correct approach - secret exists outside Terraform
# Bootstrap step (run once, outside Terraform):
# aws secretsmanager create-secret \
#   --name "prod-ordersvc-dms-credentials" \
#   --secret-string '{"username":"DMSMigrationUser","password":"<secure-password>"}'

# Then in Terraform, just read it:
data "aws_secretsmanager_secret" "db_credentials" {
  name = "${local.env}-${local.service_name}-dms-credentials"
}

data "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id = data.aws_secretsmanager_secret.db_credentials.id
}

locals {
  db_creds = jsondecode(data.aws_secretsmanager_secret_version.db_credentials.secret_string)
}

# Use in endpoint
resource "aws_dms_endpoint" "source" {
  # ...
  username = local.db_creds["username"]
  password = local.db_creds["password"]
}

We went with Option 2. The bootstrap is a one-time operation – the DBA creates the secret manually in Secrets Manager (or via a separate privileged pipeline), and Terraform only ever reads it. The password never touches Terraform state.

For the bootstrap, I gave the DBAs a simple script:

#!/bin/bash
# bootstrap-dms-secret.sh
# Run once per environment to create the DMS credentials secret

ENV=${1:-sandbox}
SERVICE=${2:-ordersvc}
SECRET_NAME="${ENV}-${SERVICE}-dms-credentials"

# Prompt for credentials (don't pass as arguments - they'd appear in shell history)
read -p "DMS Username: " DMS_USER
read -sp "DMS Password: " DMS_PASS
echo

aws secretsmanager create-secret \
  --name "$SECRET_NAME" \
  --description "DMS migration credentials for ${SERVICE} in ${ENV}" \
  --secret-string "{\"username\":\"${DMS_USER}\",\"password\":\"${DMS_PASS}\"}"

echo "✅ Secret created: $SECRET_NAME"

This keeps secrets out of version control, out of Terraform state, and out of CI/CD logs.

SQL Server Source Endpoint

# endpoints.tf
locals {
  env             = module.context.environment
  source_instance = "prod"
  service_name    = "ordersvc"
  source_server   = "10.251.6.91"          # On-prem SQL Server IP
  source_db       = "ECommerce_Order"
  source_engine   = "sqlserver"
  source_port     = 1433
  source_login    = "DMSMigrationUser"
}

resource "aws_dms_endpoint" "source" {
  endpoint_id   = "${local.env}-${local.source_instance}-source-${local.source_engine}-dms-endpoint"
  endpoint_type = "source"
  engine_name   = local.source_engine
  
  server_name   = local.source_server
  port          = local.source_port
  database_name = local.source_db
  username      = local.db_creds["username"]
  password      = local.db_creds["password"]
  
  ssl_mode      = "none"  # Internal network, SSL handled at network layer
}

Oracle Source Endpoint

For Oracle migrations, the configuration differs slightly:

resource "aws_dms_endpoint" "oracle_source" {
  endpoint_id   = "source-dms-${module.context.environment}-${local.source_instance}-${module.context.component_name}"
  endpoint_type = "source"
  engine_name   = "oracle"
  
  server_name   = "oracle-prod.internal.example.com"
  port          = 1521
  database_name = "ORCL"
  username      = "dms_user"
  password      = aws_secretsmanager_secret_version.oracle_credentials.secret_string
  
  ssl_mode      = "none"
  
  # Oracle-specific: Use Binary Reader instead of LogMiner for better performance
  extra_connection_attributes = "useLogMinerReader=N;useBfile=Y"
}

The useLogMinerReader=N;useBfile=Y setting is important for Oracle – Binary Reader is faster and has fewer permission requirements than LogMiner.

DynamoDB Target Endpoint

locals {
  target_engine   = "dynamodb"
  dms_service_role = "arn:aws:iam::123456789012:role/dms-dynamodb-access-role"
}

resource "aws_dms_endpoint" "target" {
  endpoint_id         = "${local.env}-${local.target_instance}-target-${local.target_engine}-dms-endpoint"
  endpoint_type       = "target"
  engine_name         = local.target_engine
  
  # DynamoDB doesn't use connection strings – it uses IAM
  service_access_role = local.dms_service_role
}

The IAM role needs permissions to write to the target DynamoDB table:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable",
        "dynamodb:DeleteItem",
        "dynamodb:UpdateItem",
        "dynamodb:BatchWriteItem"
      ],
      "Resource": "arn:aws:iam::123456789012:table/order-events-*"
    }
  ]
}

Replication Instance

resource "aws_dms_replication_instance" "main" {
  replication_instance_id      = "${local.env}-${local.service_name}-repl-instance"
  replication_instance_class   = "dms.t3.medium"  # Size based on data volume
  allocated_storage            = 50
  
  engine_version               = "3.5.2"
  multi_az                     = false            # Single AZ for cost, enable for prod
  publicly_accessible          = false
  
  vpc_security_group_ids       = local.security_groups
  replication_subnet_group_id  = local.dms_subnet_id
  
  apply_immediately            = true
  auto_minor_version_upgrade   = true
  preferred_maintenance_window = "sun:10:30-sun:14:30"
}

Instance sizing guidelines:

dms.t3.medium: < 100GB, low throughput
dms.r5.large: 100GB–1TB, medium throughput
dms.r5.2xlarge: 1TB+, high throughput or many parallel tasks

Partitioned Replication Tasks

Here’s the key insight: migrating hundreds of millions of records in a single task is slow. We split the migration into parallel tasks by ID range.

resource "aws_dms_replication_task" "migration" {
  count = 4  # Four parallel tasks
  
  replication_task_id       = "${local.source_instance}-${local.service_name}-repl-task-${count.index + 1}"
  migration_type            = "full-load"
  
  replication_instance_arn  = aws_dms_replication_instance.main.replication_instance_arn
  source_endpoint_arn       = aws_dms_endpoint.source.endpoint_arn
  target_endpoint_arn       = aws_dms_endpoint.target.endpoint_arn
  
  replication_task_settings = file("task_settings.json")
  table_mappings            = file("mappings/table_mappings${count.index + 1}.json")
}

Each task has its own table mapping file with a different ID range filter:

mappings/table_mappings1.json (IDs 1–100M)

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "select-events-range-1",
      "object-locator": {
        "schema-name": "dbo",
        "table-name": "vw_Dynamodbmigration",
        "table-type": "view"
      },
      "rule-action": "include",
      "filters": [
        {
          "filter-type": "source",
          "column-name": "Id",
          "filter-conditions": [
            {
              "filter-operator": "between",
              "start-value": "1",
              "end-value": "100000000"
            }
          ]
        }
      ]
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "map-to-dynamodb",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "dbo",
        "table-name": "vw_Dynamodbmigration",
        "table-type": "view"
      },
      "target-table-name": "order-events-prod",
      "mapping-parameters": {
        "partition-key-name": "PK",
        "sort-key-name": "SK",
        "exclude-columns": ["Id"],
        "attribute-mappings": [
          {
            "target-attribute-name": "PK",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "agg#${PK}"
          },
          {
            "target-attribute-name": "SK",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${SK}"
          },
          {
            "target-attribute-name": "itemtype",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${itemtype}"
          },
          {
            "target-attribute-name": "Ver",
            "attribute-type": "scalar",
            "attribute-sub-type": "number",
            "value": "${Ver}"
          },
          {
            "target-attribute-name": "ConvId",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${ConvId}"
          },
          {
            "target-attribute-name": "created",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${created}"
          },
          {
            "target-attribute-name": "itemdata",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${itemdata}"
          },
          {
            "target-attribute-name": "expiry",
            "attribute-type": "scalar",
            "attribute-sub-type": "number",
            "value": "${expiry}"
          },
          {
            "target-attribute-name": "Migrated",
            "attribute-type": "scalar",
            "attribute-sub-type": "number",
            "value": "${Migrated}"
          }
        ]
      }
    }
  ]
}

mappings/table_mappings2.json (IDs 100M–200M)

Same structure, different filter:

"filter-conditions": [
  {
    "filter-operator": "between",
    "start-value": "100000001",
    "end-value": "200000000"
  }
]

And so on for tasks 3 and 4. This gave us 4x parallelism on the source database.

Task Settings

The task_settings.json controls DMS behaviour:

{
  "Logging": {
    "EnableLogging": true,
    "EnableLogContext": true,
    "LogComponents": [
      {"Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "TRANSFORMATION"},
      {"Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "SOURCE_UNLOAD"},
      {"Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "TARGET_LOAD"},
      {"Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "SOURCE_CAPTURE"},
      {"Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "TARGET_APPLY"},
      {"Severity": "LOGGER_SEVERITY_DEFAULT", "Id": "TASK_MANAGER"}
    ]
  },
  "FullLoadSettings": {
    "CommitRate": 10000,
    "MaxFullLoadSubTasks": 8,
    "TransactionConsistencyTimeout": 600,
    "TargetTablePrepMode": "DO_NOTHING"
  },
  "TargetMetadata": {
    "SupportLobs": true,
    "LimitedSizeLobMode": true,
    "LobMaxSize": 32,
    "ParallelLoadThreads": 10,
    "ParallelLoadBufferSize": 100
  },
  "ErrorBehavior": {
    "DataErrorPolicy": "LOG_ERROR",
    "DataErrorEscalationPolicy": "SUSPEND_TABLE",
    "TableErrorPolicy": "SUSPEND_TABLE",
    "TableErrorEscalationPolicy": "STOP_TASK",
    "ApplyErrorInsertPolicy": "LOG_ERROR",
    "ApplyErrorUpdatePolicy": "LOG_ERROR",
    "ApplyErrorDeletePolicy": "IGNORE_RECORD",
    "FullLoadIgnoreConflicts": true,
    "FailOnNoTablesCaptured": true
  },
  "ChangeProcessingTuning": {
    "CommitTimeout": 1,
    "BatchApplyMemoryLimit": 500,
    "MemoryLimitTotal": 1024,
    "MemoryKeepTime": 60,
    "StatementCacheSize": 50
  }
}

Key settings explained:

Setting	Value	Why
`CommitRate`	10000	Batch size for commits – higher = faster but more memory
`MaxFullLoadSubTasks`	8	Parallel threads within each task
`ParallelLoadThreads`	10	Target-side parallelism
`TargetTablePrepMode`	DO_NOTHING	Don’t drop/recreate DynamoDB table
`FullLoadIgnoreConflicts`	true	Skip duplicate key errors (idempotent)
`LimitedSizeLobMode`	true	Handle large text/blob columns efficiently

Customer Password Migration (Oracle → DynamoDB)

A separate migration handled customer authentication data from Oracle:

locals {
  cpm_source_instance      = "oracle-prod"
  cpm_destination_instance = "auth-service"
  cpm_port                 = 1521
}

resource "aws_dms_endpoint" "cpm_source" {
  endpoint_id                 = "source-dms-${module.context.environment}-oracle-${module.context.component_name}"
  endpoint_type               = "source"
  engine_name                 = "oracle"
  
  server_name                 = "oracle.internal.example.com"
  port                        = local.cpm_port
  database_name               = "AUTH_DB"
  username                    = "dms_user"
  password                    = aws_secretsmanager_secret_version.cpm.secret_string
  
  ssl_mode                    = "none"
  extra_connection_attributes = "useLogMinerReader=N;useBfile=Y"
}

resource "aws_dms_endpoint" "cpm_target" {
  endpoint_id         = "destination-dms-${module.context.environment}-dynamodb-${module.context.component_name}"
  endpoint_type       = "target"
  engine_name         = "dynamodb"
  service_access_role = local.execution_role
}

resource "aws_dms_replication_task" "cpm_task" {
  migration_type            = "full-load"
  replication_task_id       = "replication-task-${module.context.environment}-oracle-${module.context.component_name}"
  
  replication_instance_arn  = aws_dms_replication_instance.cpm.replication_instance_arn
  source_endpoint_arn       = aws_dms_endpoint.cpm_source.endpoint_arn
  target_endpoint_arn       = aws_dms_endpoint.cpm_target.endpoint_arn
  
  replication_task_settings = file("task-settings/customer-migration.json")
  table_mappings            = file("mappings/customer-password-migration.json")
}

The mapping for this was simpler – just a partition key, no sort key:

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "select-customer-passwords",
      "object-locator": {
        "schema-name": "AUTH",
        "table-name": "CUSTOMER_PASSWORD_MIGRATION_VW",
        "table-type": "view"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "map-to-dynamodb",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "AUTH",
        "table-name": "CUSTOMER_PASSWORD_MIGRATION_VW",
        "table-type": "view"
      },
      "target-table-name": "customer-credentials-prod",
      "mapping-parameters": {
        "partition-key-name": "PK",
        "attribute-mappings": [
          {
            "target-attribute-name": "PK",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${PK}"
          }
        ]
      }
    }
  ]
}

Running the Migration

Sandbox Testing First

We always tested in a sandbox environment before production:

# Deploy to sandbox
cd environments/sandbox
terraform init
terraform plan
terraform apply

# Monitor in AWS Console
# DMS → Database migration tasks → Select task → Table statistics

Monitoring During Migration

DMS provides CloudWatch metrics. Key ones to watch:

CDCLatencySource      # Lag behind source (for CDC migrations)
CDCLatencyTarget      # Time to apply changes to target
FullLoadThroughputBandwidthTarget  # MB/s to target
FullLoadThroughputRowsTarget       # Rows/s to target

We set up CloudWatch alarms:

resource "aws_cloudwatch_metric_alarm" "dms_task_failed" {
  alarm_name          = "dms-task-failed-${local.service_name}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "TaskStatus"
  namespace           = "AWS/DMS"
  period              = 60
  statistic           = "Maximum"
  threshold           = 0
  alarm_description   = "DMS task has failed"
  
  dimensions = {
    ReplicationInstanceIdentifier = aws_dms_replication_instance.main.replication_instance_id
    ReplicationTaskIdentifier     = aws_dms_replication_task.migration[0].replication_task_id
  }
  
  alarm_actions = [aws_sns_topic.alerts.arn]
}

Production Cutover

The cutover process:

Stop application writes to source database
Wait for DMS tasks to complete (status = “Load complete”)
Validate row counts – source view vs DynamoDB item count
Switch application to read from DynamoDB
Monitor for errors, latency, throughput
Decommission DMS resources after stability period

# Check task status
aws dms describe-replication-tasks \
  --filters Name=replication-task-id,Values=prod-ordersvc-repl-task-1 \
  --query 'ReplicationTasks[0].Status'

# Get table statistics
aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:eu-west-1:123456789012:task:XXX \
  --query 'TableStatistics[*].{Table:TableName,Inserts:Inserts,Updates:Updates,Deletes:Deletes,FullLoadRows:FullLoadRows}'

Problems We Hit

1. View Performance

The migration view initially did a full table scan on every DMS read. We added an index specifically for the migration:

CREATE INDEX IX_OrderEvents_Migration 
ON dbo.OrderEvents (Id) 
INCLUDE (AggregateId, EventTimestamp, Version, ConversationId, CreatedAt, EventData);

2. DynamoDB Throttling

Initial migrations hit write throttling. Solutions:

Switch to on-demand capacity mode during migration
Pre-warm the table with provisioned capacity before migration
Reduce CommitRate in task settings to slow down writes

3. LOB Column Handling

Large EventData columns (JSON blobs) caused issues. The fix was enabling LimitedSizeLobMode with an appropriate LobMaxSize:

"TargetMetadata": {
  "LimitedSizeLobMode": true,
  "LobMaxSize": 32  // KB - increase if you have larger blobs
}

4. Network Timeouts

On-prem to AWS connectivity over Direct Connect occasionally timed out. We increased DMS buffer settings:

"StreamBufferSettings": {
  "StreamBufferCount": 3,
  "StreamBufferSizeInMB": 8,
  "CtrlStreamBufferSizeInMB": 5
}

5. Duplicate Records

Running tasks multiple times (after failures) created duplicates. DynamoDB’s PutItem is idempotent for the same PK/SK, but we added a Migrated flag to track:

{
  "target-attribute-name": "Migrated",
  "attribute-type": "scalar",
  "attribute-sub-type": "number",
  "value": "1"
}

Lessons Learned

1. Views Are Your Friend

Pre-transforming data in database views is far easier than complex DMS transformation rules. The DBAs know SQL – let them handle the transformation.

2. Partition by ID Range

Parallel tasks dramatically speed up migration. We went from 48 hours to 12 hours by using 4 parallel tasks.

3. Test in Sandbox First

Always. Every time. We caught countless issues in sandbox that would have been production incidents.

4. DynamoDB On-Demand for Migrations

Provisioned capacity during migration means constant throttling adjustments. Switch to on-demand, migrate, then switch back if needed for cost.

5. Keep DMS Resources Separate

Don’t share replication instances across unrelated migrations. Isolation prevents one bad task from affecting others.

6. Log Everything

Enable all DMS logging components. When something fails at 3am, you’ll want those logs.

Summary

Migrating event store data from SQL Server and Oracle to DynamoDB required:

Database views for pre-transformation
Partitioned DMS tasks for parallelism
Careful task settings for performance and error handling
Terraform for reproducible infrastructure
Sandbox testing before every production migration

The result: hundreds of millions of records migrated with minimal downtime, and services now reading from DynamoDB with single-digit millisecond latency instead of struggling with aging SQL Server instances.

Planning a DMS migration or hit issues I didn’t cover? Find me on LinkedIn.