Elastic Cloud Setup Guide - From Zero to Production
Running your own Elasticsearch cluster is powerful but operationally heavy. Upgrades, security patches, scaling, backups - it adds up. Elastic Cloud handles all of that, letting you focus on using the stack rather than managing it.
This guide walks through setting up Elastic Cloud properly - not just clicking through the wizard, but configuring it for real production use with proper security, lifecycle management, and cost optimization.
Why Elastic Cloud?
Before diving in, here’s why you might choose Elastic Cloud over self-managed:
Pros:
- Fully managed upgrades (one-click)
- Automated backups and snapshots
- Built-in security (TLS, RBAC, SSO)
- Cross-cloud deployment (AWS, GCP, Azure)
- Autoscaling options
- Elastic’s support team
- Early access to the latest features
Cons:
- Higher cost than self-managed (roughly 2-3x)
- Less control over infrastructure
- Data residency concerns (though many regions available)
- Vendor lock-in
For most teams, the operational overhead savings outweigh the cost difference.
Step 1: Create Your Elastic Cloud Account
- Go to cloud.elastic.co
- Sign up (email or SSO with Google/Microsoft)
- Verify your email
- You get a 14-day free trial with $400 credit
Step 2: Create Your First Deployment
Choosing a Deployment Template
Elastic Cloud offers several pre-configured templates:
| Template | Best For | Components |
|---|---|---|
| General Purpose | Most workloads | ES + Kibana balanced |
| Observability | Logs, metrics, APM | Optimized for time-series |
| Security | SIEM, threat detection | Elastic Security features |
| Vector Search | AI/ML, embeddings | ML nodes included |
| Enterprise Search | Web/app search | App Search + Workplace Search |
For this guide, we’ll use Observability - the most common use case.
Deployment Configuration
Click Create deployment and configure:
1. Name: Choose something meaningful
prod-logs-eu-west-1
staging-observability
2. Cloud Provider & Region:
- Choose based on where your data sources are
- Lower latency = better ingestion performance
- Consider data residency requirements
3. Hardware Profile:
For a production observability deployment, I recommend starting with:
Elasticsearch:
- Hot tier: 2 zones × 4GB RAM (8GB total)
- Warm tier: 2 zones × 2GB RAM (4GB total) - optional initially
- Cold tier: None initially
Kibana:
- 1 zone × 1GB RAM
Integrations Server (APM + Fleet):
- 1 zone × 1GB RAM
You can scale up later - Elastic Cloud makes this easy.
4. Version:
- Always choose the latest stable version (8.x)
- Avoid pre-release versions for production
Advanced Settings
Expand Advanced settings for more control:
Snapshot Repository:
- Enabled by default (Elastic's built-in found-snapshots repository)
- Snapshots taken every 30 minutes
- Retained for up to 100 snapshots (roughly two days at the default interval)
Plugins:
- Most plugins are pre-installed
- Custom plugins require support ticket
Click Create deployment and wait 5-10 minutes.
Step 3: Save Your Credentials
When deployment completes, you’ll see:
Elasticsearch endpoint: https://my-deployment.es.eu-west-1.aws.found.io:9243
Kibana endpoint: https://my-deployment.kb.eu-west-1.aws.found.io:9243
Username: elastic
Password: <generated-password>
Save these immediately - the password is only shown once.
If you lose it:
- Go to deployment → Security
- Reset the elastic user password
Step 4: Initial Kibana Setup
Access Kibana
- Click the Kibana link or navigate to your Kibana endpoint
- Log in with the elastic superuser
- You’ll see the Kibana home page
Create Your First Space
Spaces let you organize dashboards and access by team:
- Go to Stack Management → Kibana → Spaces
- Create spaces like:
production-logs
security-team
platform-team
Set Up Index Patterns (Data Views)
Before you can visualize data, you need data views:
- Go to Stack Management → Kibana → Data Views
- Click Create data view
- For logs: logs-* or filebeat-*
- For metrics: metrics-* or metricbeat-*
- Select @timestamp as the time field
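Data views can also be created via the Kibana API, which is useful for automating setup. A sketch (the endpoint and credentials are placeholders; the kbn-xsrf header is required by Kibana's HTTP APIs):

```
curl -X POST "https://my-deployment.kb.eu-west-1.aws.found.io:9243/api/data_views/data_view" \
-u elastic:password \
-H 'kbn-xsrf: true' \
-H 'Content-Type: application/json' -d'
{
  "data_view": {
    "title": "logs-*",
    "timeFieldName": "@timestamp"
  }
}'
```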
Step 5: Security Configuration
Create Service Accounts
Never use the elastic superuser for applications. Create dedicated accounts:
Via Kibana:
- Stack Management → Security → Users
- Create users for each service:
Username: logstash_writer
Role: logstash_writer (custom role - the Logstash docs recommend creating one like this)
Password: <strong-password>
Username: beats_writer
Role: beats_writer (custom role)
Password: <strong-password>
Username: apm_writer
Role: apm_user (built-in, but deprecated in 8.x - prefer API keys)
Password: <strong-password>
Via API (for automation):
# Create a custom role
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/role/logs_writer" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"cluster": ["monitor", "manage_index_templates", "manage_ilm"],
"indices": [
{
"names": ["logs-*", "filebeat-*"],
"privileges": ["create_index", "write", "create", "auto_configure"]
}
]
}'
# Create a user with that role
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/user/logs_writer" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"password": "your-secure-password",
"roles": ["logs_writer"],
"full_name": "Logs Writer Service Account"
}'
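Before wiring the new account into anything, verify it can authenticate and carries the expected roles (the endpoint is a placeholder):

```
# Confirm the service account works - returns the username and its roles
curl -u logs_writer:your-secure-password \
"https://your-deployment.es.region.aws.found.io:9243/_security/_authenticate"
```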
API Keys (Recommended)
For machine-to-machine auth, API keys are better than passwords:
# Create an API key for Filebeat
curl -X POST "https://your-deployment.es.region.aws.found.io:9243/_security/api_key" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"name": "filebeat-prod-servers",
"role_descriptors": {
"filebeat_writer": {
"cluster": ["monitor", "read_ilm"],
"indices": [
{
"names": ["filebeat-*", "logs-*"],
"privileges": ["create_index", "create_doc", "auto_configure"]
}
]
}
},
"expiration": "365d"
}'
Response:
{
"id": "abc123",
"name": "filebeat-prod-servers",
"api_key": "xyz789...",
"encoded": "YWJjMTIzOnh5ejc4OS4u" // Base64(id:api_key) - use this
}
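The encoded field is simply Base64 of id and api_key joined by a colon. A quick sketch with illustrative (non-real) values, in case you ever need to rebuild it from the two parts:

```python
import base64

# Illustrative stand-ins for the "id" and "api_key" fields in the response
api_key_id = "abc123"
api_key_secret = "xyz789"

# Elasticsearch expects Base64("id:api_key") in the "Authorization: ApiKey ..." header
encoded = base64.b64encode(f"{api_key_id}:{api_key_secret}".encode()).decode()
print(encoded)  # → YWJjMTIzOnh5ejc4OQ==
```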
Use the encoded value in your Beats config:
output.elasticsearch:
hosts: ["https://your-deployment.es.region.aws.found.io:9243"]
api_key: "YWJjMTIzOnh5ejc4OS4u"
Enable SSO (Optional but Recommended)
For team access, configure SAML or OIDC:
- Deployment → Security → User authentication
- Configure your identity provider (Okta, Azure AD, Google)
- Map groups to Elastic roles
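Group-to-role mapping can also be scripted with the role mapping API. A hedged example (the mapping name, role, and group value are assumptions - the actual group attribute depends on your identity provider configuration):

```
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/role_mapping/sso-platform-team" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
  "roles": ["editor"],
  "enabled": true,
  "rules": { "field": { "groups": "platform-team" } }
}'
```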
Step 6: Index Lifecycle Management (ILM)
ILM automatically manages index rollover, tiering, and deletion. This is critical for cost control.
Understanding Data Tiers
Hot Tier   →  Warm Tier  →  Cold Tier   →  Frozen Tier  →  Delete
(fast SSD)    (cheaper)     (cheapest)     (S3-backed)
0-7 days      7-30 days     30-90 days     90-365 days      365+ days
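As a sanity check, the timeline above can be expressed as a tiny helper - handy when reasoning about where an index of a given age should live (the boundaries are this guide's example values, not Elastic defaults):

```python
def tier_for_age(age_days: int) -> str:
    """Map index age to the data tier used in this guide's example timeline."""
    if age_days < 7:
        return "hot"       # fast SSD, most expensive
    if age_days < 30:
        return "warm"
    if age_days < 90:
        return "cold"
    if age_days < 365:
        return "frozen"    # searchable snapshots in object storage
    return "delete"

print(tier_for_age(3), tier_for_age(45), tier_for_age(400))  # → hot cold delete
```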
Create an ILM Policy
Via Kibana:
- Stack Management → Index Lifecycle Policies
- Click Create policy
Via API:
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_ilm/policy/logs-policy" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_size": "50gb",
"max_age": "1d",
"max_docs": 100000000
},
"set_priority": {
"priority": 100
}
}
},
"warm": {
"min_age": "7d",
"actions": {
"set_priority": {
"priority": 50
},
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
},
"allocate": {
"number_of_replicas": 1
}
}
},
"cold": {
"min_age": "30d",
"actions": {
"set_priority": {
"priority": 0
},
"allocate": {
"number_of_replicas": 0
}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}'
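After the policy is in place and attached to indices via a template, the ILM explain API shows which lifecycle phase each index is in (and surfaces any step errors):

```
curl -X GET "https://your-deployment.es.region.aws.found.io:9243/logs-*/_ilm/explain" \
-u elastic:password
```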
Apply ILM to Index Templates
Create an index template that uses your ILM policy:
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_index_template/logs-template" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "logs-policy"
}
},
"composed_of": [],
"priority": 200,
"data_stream": {}
}'
Step 7: Data Ingestion Setup
Option A: Elastic Agent (Recommended)
Elastic Agent is the unified way to collect all data types:
1. Kibana → Fleet → Add agent
2. Create an agent policy (e.g., “Production Servers”)
3. Add integrations:
- System (CPU, memory, disk)
- Custom logs
- Docker/Kubernetes
- Cloud provider metrics
4. Install on your servers:
# Download
curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz
tar xzvf elastic-agent-8.12.0-linux-x86_64.tar.gz
cd elastic-agent-8.12.0-linux-x86_64
# Enroll (Fleet URL and token from Kibana)
sudo ./elastic-agent install \
--url=https://your-fleet-server.es.region.aws.found.io:443 \
--enrollment-token=YOUR_ENROLLMENT_TOKEN
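Once installed, confirm the agent enrolled and is healthy (these are standard elastic-agent subcommands):

```
# Overall agent health and the state of each component
sudo elastic-agent status
```

The agent should also show as Healthy under Kibana → Fleet → Agents.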
Option B: Filebeat (Logs Only)
For simpler log collection:
# filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/nginx/*.log
fields:
environment: production
service: nginx
output.elasticsearch:
hosts: ["https://your-deployment.es.region.aws.found.io:9243"]
api_key: "your-api-key"
index: "logs-nginx-%{+yyyy.MM.dd}"  # note: ignored while ILM is enabled - events go via the rollover alias
setup.ilm.enabled: true
setup.ilm.rollover_alias: "logs-nginx"
setup.ilm.policy_name: "logs-policy"
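Before starting the service, Filebeat can validate both the config file and the connection to Elasticsearch:

```
# Validate filebeat.yml syntax
filebeat test config -c /etc/filebeat/filebeat.yml

# Verify Filebeat can reach and authenticate to the Elasticsearch output
filebeat test output -c /etc/filebeat/filebeat.yml
```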
Option C: Logstash (Complex Processing)
For advanced transformations:
# logstash.conf
input {
beats {
port => 5044
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
geoip {
source => "clientip"
}
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
}
}
output {
elasticsearch {
hosts => ["https://your-deployment.es.region.aws.found.io:9243"]
api_key => "your-api-key"
data_stream => true
data_stream_type => "logs"
data_stream_dataset => "nginx"
data_stream_namespace => "production"
}
}
Step 8: Monitoring Your Deployment
Deployment Metrics
In Elastic Cloud Console:
- Click your deployment
- Go to Monitoring
- View:
- CPU/Memory usage
- Disk usage
- Request rate
- Search/Index latency
Stack Monitoring in Kibana
For deeper insights:
- Kibana → Stack Monitoring
- Enable self-monitoring if prompted
- View:
- Cluster health
- Node metrics
- Index stats
- Logstash pipeline metrics
Set Up Alerts
Via Kibana:
- Stack Management → Rules and Connectors
- Create rules for:
- Cluster health is not green
- Disk usage > 80%
- CPU usage > 90% for 5 minutes
- No data received in 10 minutes
- Search latency > 500ms
Notification channels:
- Slack
- PagerDuty
- Webhook
Step 9: Cost Optimization
Right-Sizing Your Deployment
Start small and scale up. Monitor for 2 weeks, then adjust:
If CPU consistently < 30%: Scale down
If CPU consistently > 70%: Scale up
If memory pressure high: Add more RAM
If disk > 80%: Add storage or review ILM
Autoscaling (Recommended)
Enable autoscaling to handle traffic spikes:
- Deployment → Edit
- Enable autoscaling for hot tier
- Set min/max bounds
Example:
Hot tier:
Min: 4GB
Max: 32GB
Scale up when: Memory pressure > 75%
Scale down when: Memory pressure < 50%
Data Tiering Strategy
Move old data to cheaper tiers:
| Age | Tier | Approximate Cost |
|---|---|---|
| 0-7 days | Hot | $$$$ |
| 7-30 days | Warm | $$$ |
| 30-90 days | Cold | $$ |
| 90+ days | Frozen | $ |
Frozen tier uses searchable snapshots - data lives in object storage (S3/GCS) but remains searchable.
Reserved Capacity
If your usage is predictable, commit to reserved capacity for discounts:
- 1-year: ~30% discount
- 3-year: ~50% discount
Step 10: Backup and Disaster Recovery
Automated Snapshots
Elastic Cloud takes automatic snapshots:
- Every 30 minutes
- Stored in Elastic’s secure repository
- Retained based on your plan
Cross-Cluster Replication (CCR)
For true DR, replicate to another region:
- Create a secondary deployment in another region
- Stack Management → Remote Clusters
- Add your primary cluster as remote
- Set up replication. The follow API takes a single concrete index, so for a wildcard pattern like logs-* use an auto-follow pattern on the secondary instead:
curl -X PUT "https://secondary.es.region.aws.found.io:9243/_ccr/auto_follow/logs" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"remote_cluster": "primary-cluster",
"leader_index_patterns": ["logs-*"]
}'
Manual Snapshots
For compliance or long-term retention:
# Create a custom repository (requires support to enable)
PUT _snapshot/my-s3-repo
{
"type": "s3",
"settings": {
"bucket": "my-elasticsearch-backups",
"region": "eu-west-1"
}
}
# Take a snapshot
PUT _snapshot/my-s3-repo/snapshot-2024-01?wait_for_completion=true
{
"indices": "logs-*",
"include_global_state": false
}
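Restoring works through the same repository. Renaming on restore (via rename_pattern / rename_replacement) avoids colliding with live indices - a sketch:

```
# Restore from the snapshot, renaming indices to avoid clashes
POST _snapshot/my-s3-repo/snapshot-2024-01/_restore
{
  "indices": "logs-*",
  "include_global_state": false,
  "rename_pattern": "logs-(.+)",
  "rename_replacement": "restored-logs-$1"
}
```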
Step 11: Common Integrations
AWS Integration
Collect CloudWatch logs and metrics:
- Kibana → Integrations → AWS
- Configure:
- Access Key / Secret Key (or IAM role)
- Regions to monitor
- Services: CloudWatch, S3, ELB, EC2, etc.
Kubernetes Integration
For K8s observability:
- Deploy Elastic Agent as a DaemonSet. If you manage the stack with the ECK operator, install it first (CRDs plus the operator itself):
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml
- Or use Helm:
helm repo add elastic https://helm.elastic.co
helm install elastic-agent elastic/elastic-agent \
--set kubernetes.enabled=true \
--set outputs.default.type=elasticsearch \
--set outputs.default.hosts='["https://your-deployment.es.region.aws.found.io:9243"]' \
--set outputs.default.api_key='your-api-key'
APM (Application Performance Monitoring)
- Kibana → APM → Add agent
- Install agent for your language:
Node.js:
const apm = require('elastic-apm-node').start({
serviceName: 'my-api',
serverUrl: 'https://your-apm.apm.region.aws.found.io:443',
secretToken: 'your-secret-token',
environment: 'production'
});
Python:
from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)
apm = ElasticAPM(app,
service_name='my-api',
server_url='https://your-apm.apm.region.aws.found.io:443',
secret_token='your-secret-token',
environment='production'
)
Troubleshooting
Deployment Won’t Start
- Check deployment activity log
- Common causes:
- Invalid configuration
- Quota exceeded
- Region capacity issues
Can’t Connect
# Test connectivity
curl -v https://your-deployment.es.region.aws.found.io:9243
# Test auth
curl -u elastic:password https://your-deployment.es.region.aws.found.io:9243/_cluster/health
Common issues:
- Wrong credentials
- IP allowlist blocking you
- Network/firewall issues
Slow Queries
- Check Stack Monitoring → Indices
- Look for:
- Large shards (>50GB)
- Many small shards
- Missing replicas
Fixes:
- Add more hot tier capacity
- Optimize ILM for faster rollover
- Review query patterns
High Costs
- Deployment → Usage
- Identify cost drivers:
- Over-provisioned tiers
- Too many replicas
- Data not moving to cheaper tiers
- Retaining data too long
Production Checklist
## Initial Setup
- [ ] Create deployment with appropriate template
- [ ] Save elastic password securely
- [ ] Enable 2FA on Elastic Cloud account
## Security
- [ ] Create service accounts (don't use elastic user)
- [ ] Generate API keys for applications
- [ ] Configure SSO for team access
- [ ] Set up IP allowlist if needed
- [ ] Review and restrict default roles
## Data Management
- [ ] Configure ILM policies
- [ ] Set appropriate retention periods
- [ ] Enable data tiering (warm/cold/frozen)
- [ ] Test index rollover
## Ingestion
- [ ] Set up Elastic Agent or Beats
- [ ] Verify data is flowing
- [ ] Check index patterns/data views
## Monitoring
- [ ] Enable Stack Monitoring
- [ ] Set up alerting rules
- [ ] Configure notification channels
## Backup/DR
- [ ] Verify automated snapshots
- [ ] Test restore process
- [ ] Consider CCR for critical data
## Cost
- [ ] Enable autoscaling with reasonable bounds
- [ ] Review ILM to move data to cheaper tiers
- [ ] Consider reserved capacity for stable workloads
Key Takeaways
- Start small, scale up - Elastic Cloud makes scaling easy
- Use API keys, not passwords - More secure, easier to rotate
- ILM is critical - Without it, costs spiral and performance degrades
- Data tiering saves money - Hot data is expensive, archive aggressively
- Monitor your monitoring - Set up alerts for your Elastic deployment itself
- Autoscaling is your friend - Handles spikes without over-provisioning
Elastic Cloud removes the operational burden of running Elasticsearch, but you still need to configure it properly. Get security, ILM, and tiering right from the start, and you’ll have a production-ready observability platform.
Questions about Elastic Cloud? Find me on LinkedIn or GitHub.