
Migrating ClickHouse From EC2 to ClickHouse Cloud - Every Approach We Tried and Why Most Failed

TL;DR

  • Tried four different approaches to migrate ClickHouse from EC2 to ClickHouse Cloud
  • BACKUP/RESTORE via S3 failed due to version mismatch (v25.12 → v25.8) and SharedMergeTree engine requirements
  • Direct exports OOM’d on a memory-constrained t3.medium
  • The approach that worked: SSM port-forward + pipe SELECT FORMAT Native | INSERT FORMAT Native through a laptop, partitioned by table
  • ClickHouse Cloud’s version lag and engine restrictions are the biggest gotchas nobody warns you about

The Setup

Production ClickHouse running on a single t3.medium EC2 instance in eu-west-2. Private subnet, no public IP, no NAT gateway. About 500 MB of data across 7 tables - a mix of time-series data, pre-aggregated rollups, and a large replica table with 13M rows.

The target: ClickHouse Cloud, same region, SharedMergeTree engine under the hood.

Should be straightforward, right? ~500 MB of data. A few tables. Same region. How hard can it be?


Attempt 1: Direct Connectivity

First instinct - connect Cloud to EC2 and use remoteSecure() to pull data directly.

curl -s --max-time 5 https://<cloud-host>:8443/ping; echo $?
# 35

Exit code 35: TLS handshake failure. The EC2 instance is in a private subnet with no internet egress. There’s some route out (the request didn’t time out), but something - a proxy, firewall, or security group - is stripping TLS on non-standard ports.
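When a probe like this fails, curl’s exit code narrows things down quickly. A small helper along these lines would have saved some head-scratching (illustrative only - the code meanings come from curl’s documented EXIT CODES):

```shell
# Map common curl exit codes to their likely network-level cause.
# Codes per curl's manual: 6 = couldn't resolve, 7 = couldn't connect,
# 28 = timed out, 35 = SSL/TLS handshake error.
explain_curl_exit() {
  case "$1" in
    6)  echo "DNS resolution failed - check VPC DNS settings" ;;
    7)  echo "TCP connect failed - security group or NACL blocking" ;;
    28) echo "timed out - no route out, likely missing NAT/endpoint" ;;
    35) echo "TLS handshake failed - something is intercepting TLS" ;;
    *)  echo "other curl error ($1) - see EXIT CODES in man curl" ;;
  esac
}

# Usage, with the placeholder host from above:
#   curl -s --max-time 5 https://<cloud-host>:8443/ping; explain_curl_exit $?
```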

Lesson: Don’t assume private subnet EC2 instances can reach ClickHouse Cloud. You need either a NAT gateway, a VPC endpoint (PrivateLink), or a different approach entirely.

We’d later set up a VPC Interface Endpoint (PrivateLink) for post-migration production traffic, but that wasn’t ready yet.


Attempt 2: S3 BACKUP/RESTORE

The EC2 had an IAM role with S3 access to a dedicated backup bucket. ClickHouse v25.12 supports native BACKUP TO S3(). This felt like the clean path.

Problem 1: Missing IAM Permission

BACKUP TABLE db.table1, TABLE db.table2, ...
TO S3('https://s3.eu-west-2.amazonaws.com/my-backup-bucket/migration/')

Failed with:

s3:DeleteObject on resource ".../.lock" because no identity-based policy allows the s3:DeleteObject action

The IAM policy only had PutObject, GetObject, ListBucket. ClickHouse’s backup process creates a .lock file and tries to delete it on completion.

You need s3:DeleteObject in your IAM policy for ClickHouse S3 backups. This isn’t documented clearly anywhere.
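A minimal policy sketch covering the four actions involved (bucket name is a placeholder; your setup may need more):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-backup-bucket"
    }
  ]
}
```

Note that ListBucket applies to the bucket ARN itself, while the object-level actions apply to `bucket/*` - mixing those up produces equally confusing AccessDenied errors.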

Fixed the policy, backup succeeded.

Problem 2: Version Mismatch

-- On ClickHouse Cloud (v25.8)
RESTORE TABLE ... FROM S3('...')
Code: 246. DB::Exception: Unknown version of serialization infos (1). Should be less or equal than 0

The EC2 was running v25.12. Cloud was on v25.8. The backup’s internal serialization.json format changed between versions and isn’t backwards-compatible.

BACKUP/RESTORE does not work when the source runs a newer ClickHouse version than the target. The backup format is tightly coupled to the server version, and there’s no migration-path flag that fully downgrades the format for an older server.
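This constraint is easy to gate on up front. A pre-flight check might look like this, assuming the version strings were fetched with SELECT version() on each side (the function name is ours, not a ClickHouse tool; sort -V is GNU coreutils version sort):

```shell
# Succeeds only if a backup taken on $1 (source) can plausibly be
# restored on $2 (target): the source must not be newer than the target.
backup_restore_compatible() {
  src="$1"; dst="$2"
  newest=$(printf '%s\n%s\n' "$src" "$dst" | sort -V | tail -n1)
  [ "$newest" = "$dst" ]
}

backup_restore_compatible "25.8.1" "25.12.1" && echo "ok"       # older -> newer: fine
backup_restore_compatible "25.12.1" "25.8.1" || echo "blocked"  # our case: newer -> older
```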

Problem 3: SharedMergeTree

Even after trying SETTINGS compatibility='25.8.1' on the backup:

Code: 36. DB::Exception: Tables in a Shared database must use engines that do not store data on disk. Attempted to create a table with engine 'MergeTree', which stores data on disk.

ClickHouse Cloud requires SharedMergeTree. You can pre-create tables (Cloud silently converts MergeTree → SharedMergeTree), but the backup’s part-level format was still incompatible.

Three separate failures in the BACKUP/RESTORE path:

  1. IAM permissions (fixable)
  2. Version mismatch (not fixable without upgrading Cloud)
  3. Engine restriction (not fixable without pre-creating tables AND having a compatible backup format)

Attempt 3: Export to S3 as Parquet/Native/CSV

Fine. No backup/restore. Just INSERT INTO FUNCTION s3(...) from the EC2.

INSERT INTO FUNCTION s3(
  'https://s3.eu-west-2.amazonaws.com/my-bucket/export/table.parquet',
  'Parquet'
)
SELECT * FROM db.my_table

This worked for the small tables. But the largest table (13M rows, 310 MB) OOM’d every time:

Code: 241. DB::Exception: (total) memory limit exceeded: would use 3.37 GiB, current RSS: 2.12 GiB, maximum: 3.37 GiB

A t3.medium has 4 GB RAM. The ClickHouse server process was already using ~1.9 GB (NATS engine tables consuming memory for live streaming ingestion). That left barely enough for the export.

What we tried:

  • Smaller chunk sizes with LIMIT/OFFSET → OOM’d on OFFSET scan
  • --max_block_size=1024 → still OOM’d (server-level memory, not per-query)
  • --max_threads=1 → still OOM’d
  • Streaming to stdout with FORMAT CSVWithNames | gzip → still OOM’d
  • clickhouse-local to bypass the server → directory locked by running server

What we couldn’t do:

  • Restart the server to free memory - this is production
  • Detach the NATS tables - they’re actively ingesting live data
  • Drop OS caches - tried echo 3 > /proc/sys/vm/drop_caches, didn’t help enough

The fundamental problem: on a memory-constrained instance with a live workload, you can’t export large tables through the ClickHouse server without competing for memory.


Attempt 4: The Approach That Actually Worked

Dumb simple. Pipe data through a laptop.

Setup

Terminal 1: SSM port-forward to make EC2 ClickHouse available on localhost:

aws ssm start-session \
  --target i-xxxxxxxxxxxx \
  --region eu-west-2 \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["9000"],"localPortNumber":["9000"]}'

Terminal 2: Pipe data from source to target:

clickhouse client --host 127.0.0.1 --port 9000 \
  --user default --password '***' \
  --query "SELECT * FROM db.my_table FORMAT Native" \
| clickhouse client --host <cloud-host> --secure \
  --user default --password '***' \
  --query "INSERT INTO db.my_table FORMAT Native"

Your laptop acts as a dumb pipe. Data streams from EC2 → SSM tunnel → your machine → HTTPS → ClickHouse Cloud. No disk buffering, no S3 intermediary.

Handling the Memory-Constrained Tables

Small tables piped directly - no issues.

For the larger tables that OOM’d on the EC2, we split by partition or column value:

# Split by partition (for partitioned tables)
for part in 202001 202002 202003 ... 202602; do
  clickhouse client --host 127.0.0.1 --port 9000 \
    --user default --password '***' \
    --max_threads=1 --max_block_size=65536 \
    --query "SELECT * FROM db.rollups WHERE toYYYYMM(ts_minute) = ${part} FORMAT Native" \
  | clickhouse client --host <cloud-host> --secure \
    --user default --password '***' \
    --query "INSERT INTO db.rollups FORMAT Native"
done

# Split by column value (for unpartitioned tables)
for sym in value_a value_b value_c ...; do
  clickhouse client --host 127.0.0.1 --port 9000 \
    --user default --password '***' \
    --max_threads=1 --max_block_size=65536 \
    --query "SELECT * FROM db.large_table WHERE category = '${sym}' FORMAT Native" \
  | clickhouse client --host <cloud-host> --secure \
    --user default --password '***' \
    --query "INSERT INTO db.large_table FORMAT Native"
done

Each query only reads a slice of data, keeping EC2 memory usage within bounds.
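The hard-coded month list in the loop above can be generated rather than typed out. A small bash sketch (yyyymm_range is our own helper name):

```shell
# Emit every YYYYMM value from $1 to $2 inclusive, e.g. 202001 .. 202602.
yyyymm_range() {
  y=${1%??}             # year part, e.g. 2020
  m=$((10#${1#????}))   # month part, forced base-10 so "08"/"09" don't parse as octal
  while true; do
    cur=$(printf '%d%02d' "$y" "$m")
    [ "$cur" -gt "$2" ] && break
    echo "$cur"
    m=$((m + 1))
    [ "$m" -gt 12 ] && { m=1; y=$((y + 1)); }
  done
}

# Drop-in replacement for the hard-coded list:
#   for part in $(yyyymm_range 202001 202602); do ... done
```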

This worked. All tables migrated.
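Before cutting over, it’s worth confirming row counts match on both sides. A sketch of the check we’d run (compare_counts is our own helper; hosts, credentials, and table names are placeholders):

```shell
# Compare a per-table row count from source and target and report.
compare_counts() {
  table="$1"; src="$2"; dst="$3"
  if [ "$src" = "$dst" ]; then
    echo "OK   $table: $src rows"
  else
    echo "DIFF $table: src=$src dst=$dst"
  fi
}

# In practice each count comes from a query against both sides, e.g.:
#   src=$(clickhouse client --host 127.0.0.1 --port 9000 \
#         --query "SELECT count() FROM db.$table")
#   dst=$(clickhouse client --host <cloud-host> --secure \
#         --query "SELECT count() FROM db.$table")
#   compare_counts "$table" "$src" "$dst"
```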


Things Nobody Tells You About ClickHouse Cloud Migration

1. Version Mismatch Kills BACKUP/RESTORE

ClickHouse Cloud manages its own version and you can’t control it. If your self-hosted version is newer than Cloud’s version, BACKUP/RESTORE simply won’t work. There’s no compatibility layer that fully handles this.

Check versions before you plan anything:

-- Source
SELECT version() -- e.g. 25.12.1

-- Target (Cloud)
SELECT version() -- e.g. 25.8.1

2. SharedMergeTree Changes Everything

Cloud uses SharedMergeTree internally. You can write CREATE TABLE ... ENGINE = MergeTree in DDL and Cloud converts it, but the on-disk part format is different. Backup files contain raw parts with the original engine’s format - Cloud can’t ingest them.

3. NATS Engine Doesn’t Exist on Cloud

If you’re using ClickHouse’s built-in NATS engine for streaming ingestion, there’s no equivalent on Cloud. You need an external consumer that subscribes to NATS and inserts into Cloud via HTTPS.

The materialized view chain still works - if your MVs trigger on INSERT to a base table, they’ll fire regardless of whether the insert came from NATS or an HTTP client. You just need to replace the source.

4. IAM Needs DeleteObject for S3 Backups

ClickHouse creates and deletes a .lock file during backup. Your IAM policy needs s3:PutObject, s3:GetObject, s3:ListBucket, and s3:DeleteObject.

5. Memory-Constrained Instances Can’t Export Large Tables

On a t3.medium (4 GB), if the server is already using 2 GB for live workloads, you don’t have headroom for exporting tables that need to decompress columns into memory. Even streaming to stdout OOMs because the server buffers the read, not the client.

Partition your exports. Or use a bigger instance for the migration window.

6. Private Subnets Need PrivateLink

If your EC2 is in a private subnet (no NAT), you need a VPC Interface Endpoint (PrivateLink) to reach ClickHouse Cloud. This is also the “reverse private endpoint” you’ll see referenced in ClickHouse docs - it’s how your VPC talks to Cloud without traversing the public internet.

Set this up before the migration, not during.


The Migration Checklist I Wish I Had

Before starting any ClickHouse → Cloud migration:

  1. Compare versions - if source > target, BACKUP/RESTORE won’t work
  2. Check table engines - NATS, Kafka, MySQL engines won’t migrate to Cloud
  3. Check instance memory - can it handle concurrent reads during export?
  4. Set up PrivateLink first - you’ll need it for migration AND production
  5. IAM policy - ensure s3:DeleteObject if using S3 as intermediary
  6. Pre-create tables on Cloud - Cloud auto-converts to SharedMergeTree
  7. Plan MV recreation order - create target tables first, then MVs
  8. Have a pipe-through-laptop fallback - it’s ugly but it works

Final Thoughts

We tried four different approaches. Three failed due to version mismatches, engine restrictions, and memory constraints. The one that worked was the simplest: pipe data through a laptop using SSM port forwarding and FORMAT Native.

For ~500 MB of data, the whole migration took about an hour of actual data transfer (most of the time was spent figuring out why the “proper” approaches didn’t work).

If ClickHouse Cloud let you control the version, or if BACKUP/RESTORE had a real cross-version compatibility mode, this would’ve been a 10-minute job. Instead, it was a full afternoon of debugging.

The takeaway: always check versions first. And keep a simple fallback plan - sometimes the “wrong” approach is the only one that works.
