
Elastic Cloud Setup Guide - From Zero to Production

Observability · AWS

Running your own Elasticsearch cluster is powerful but operationally heavy. Upgrades, security patches, scaling, backups - it adds up. Elastic Cloud handles all of that, letting you focus on using the stack rather than managing it.

This guide walks through setting up Elastic Cloud properly - not just clicking through the wizard, but configuring it for real production use with proper security, lifecycle management, and cost optimization.

Why Elastic Cloud?

Before diving in, here’s why you might choose Elastic Cloud over self-managed:

Pros:

  • Fully managed upgrades (one-click)
  • Automated backups and snapshots
  • Built-in security (TLS, RBAC, SSO)
  • Cross-cloud deployment (AWS, GCP, Azure)
  • Autoscaling options
  • Elastic’s support team
  • Early access to the latest features

Cons:

  • Higher cost than self-managed (roughly 2-3x)
  • Less control over infrastructure
  • Data residency concerns (though many regions available)
  • Vendor lock-in

For most teams, the operational overhead savings outweigh the cost difference.


Step 1: Create Your Elastic Cloud Account

  1. Go to cloud.elastic.co
  2. Sign up (email or SSO with Google/Microsoft)
  3. Verify your email
  4. You get a 14-day free trial with $400 credit

Step 2: Create Your First Deployment

Choosing a Deployment Template

Elastic Cloud offers several pre-configured templates:

| Template          | Best For               | Components                    |
|-------------------|------------------------|-------------------------------|
| General Purpose   | Most workloads         | ES + Kibana, balanced         |
| Observability     | Logs, metrics, APM     | Optimized for time-series     |
| Security          | SIEM, threat detection | Elastic Security features     |
| Vector Search     | AI/ML, embeddings      | ML nodes included             |
| Enterprise Search | Web/app search         | App Search + Workplace Search |

For this guide, we’ll use Observability - the most common use case.

Deployment Configuration

Click Create deployment and configure:

1. Name: Choose something meaningful

prod-logs-eu-west-1
staging-observability

2. Cloud Provider & Region:

  • Choose based on where your data sources are
  • Lower latency = better ingestion performance
  • Consider data residency requirements

3. Hardware Profile:

For a production observability deployment, I recommend starting with:

Elasticsearch:
  - Hot tier: 2 zones × 4GB RAM (8GB total)
  - Warm tier: 2 zones × 2GB RAM (4GB total) - optional initially
  - Cold tier: None initially

Kibana:
  - 1 zone × 1GB RAM

Integrations Server (APM + Fleet):
  - 1 zone × 1GB RAM

You can scale up later - Elastic Cloud makes this easy.

4. Version:

  • Always choose the latest stable version (8.x)
  • Avoid pre-release versions for production

Advanced Settings

Expand Advanced settings for more control:

Snapshot Repository:

  • Enabled by default (the found-snapshots repository)
  • Snapshots taken every 30 minutes
  • Up to 100 snapshots retained (roughly 2 days of history)
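The retention math is straightforward: a snapshot every 30 minutes means 48 per day, so 100 retained snapshots cover roughly two days. A quick sketch:

```python
# Snapshots per day at a 30-minute interval, and the coverage of 100 retained snapshots.
SNAPSHOT_INTERVAL_MIN = 30
RETAINED_SNAPSHOTS = 100

snapshots_per_day = 24 * 60 // SNAPSHOT_INTERVAL_MIN    # 48 per day
coverage_days = RETAINED_SNAPSHOTS / snapshots_per_day  # ~2.08 days

print(snapshots_per_day, round(coverage_days, 2))
```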

Plugins:

  • Most plugins are pre-installed
  • Custom plugins require support ticket

Click Create deployment and wait 5-10 minutes.


Step 3: Save Your Credentials

When deployment completes, you’ll see:

Elasticsearch endpoint: https://my-deployment.es.eu-west-1.aws.found.io:9243
Kibana endpoint: https://my-deployment.kb.eu-west-1.aws.found.io:9243

Username: elastic
Password: <generated-password>

Save these immediately - the password is only shown once.

If you lose it:

  1. Go to deployment → Security
  2. Reset the elastic user password

Step 4: Initial Kibana Setup

Access Kibana

  1. Click the Kibana link or navigate to your Kibana endpoint
  2. Log in with the elastic superuser
  3. You’ll see the Kibana home page

Create Your First Space

Spaces let you organize dashboards and access by team:

  1. Go to Stack Management → Kibana → Spaces
  2. Create spaces like:
    • production-logs
    • security-team
    • platform-team

Set Up Index Patterns (Data Views)

Before you can visualize data, you need data views:

  1. Go to Stack Management → Kibana → Data Views
  2. Click Create data view
  3. For logs: logs-* or filebeat-*
  4. For metrics: metrics-* or metricbeat-*
  5. Select @timestamp as the time field

Step 5: Security Configuration

Create Service Accounts

Never use the elastic superuser for applications. Create dedicated accounts:

Via Kibana:

  1. Stack Management → Security → Users
  2. Create users for each service:
Username: logstash_writer
Role: logstash_writer (a custom write role you define)
Password: <strong-password>

Username: beats_writer
Role: beats_writer (a custom write role you define)
Password: <strong-password>

Username: apm_writer
Role: apm_user (built-in)
Password: <strong-password>

Note that logstash_writer and beats_writer are not built-in roles - create them with the appropriate index write privileges, as in the API example below.

Via API (for automation):

# Create a custom role
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/role/logs_writer" \
  -u elastic:password \
  -H 'Content-Type: application/json' -d'
{
  "cluster": ["monitor", "manage_index_templates", "manage_ilm"],
  "indices": [
    {
      "names": ["logs-*", "filebeat-*"],
      "privileges": ["create_index", "write", "create", "auto_configure"]
    }
  ]
}'

# Create a user with that role
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/user/logs_writer" \
  -u elastic:password \
  -H 'Content-Type: application/json' -d'
{
  "password": "your-secure-password",
  "roles": ["logs_writer"],
  "full_name": "Logs Writer Service Account"
}'

For machine-to-machine auth, API keys are better than passwords:

# Create an API key for Filebeat
curl -X POST "https://your-deployment.es.region.aws.found.io:9243/_security/api_key" \
  -u elastic:password \
  -H 'Content-Type: application/json' -d'
{
  "name": "filebeat-prod-servers",
  "role_descriptors": {
    "filebeat_writer": {
      "cluster": ["monitor", "read_ilm"],
      "indices": [
        {
          "names": ["filebeat-*", "logs-*"],
          "privileges": ["create_index", "create_doc", "auto_configure"]
        }
      ]
    }
  },
  "expiration": "365d"
}'

Response:

{
  "id": "abc123",
  "name": "filebeat-prod-servers",
  "api_key": "xyz789...",
  "encoded": "YWJjMTIzOnhejjc4OS4u"  // Base64(id:api_key) - use this
}
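The encoded field is nothing magic - it is simply Base64 of id:api_key, and it is what goes into an Authorization: ApiKey header for direct HTTP calls. A sketch using the placeholder values from the response above:

```python
import base64

# Build the value for an "Authorization: ApiKey <encoded>" header from the raw id and api_key.
api_id = "abc123"   # placeholder id from the example response
api_key = "xyz789"  # placeholder key from the example response

encoded = base64.b64encode(f"{api_id}:{api_key}".encode()).decode()
print(f"Authorization: ApiKey {encoded}")
```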

Use the encoded value in an Authorization: ApiKey header for direct HTTP calls. Beats, however, expects the raw id:api_key form in its config:

output.elasticsearch:
  hosts: ["https://your-deployment.es.region.aws.found.io:9243"]
  api_key: "abc123:xyz789"

For team access, configure SAML or OIDC:

  1. Deployment → Security → User authentication
  2. Configure your identity provider (Okta, Azure AD, Google)
  3. Map groups to Elastic roles

Step 6: Index Lifecycle Management (ILM)

ILM automatically manages index rollover, tiering, and deletion. This is critical for cost control.

Understanding Data Tiers

Hot Tier   →  Warm Tier  →  Cold Tier  →  Frozen Tier  →  Delete
(fast SSD)    (cheaper)     (cheapest)    (S3-backed)
0-7 days      7-30 days     30-90 days    90-365 days      365+ days
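The timeline above maps an index's age to the tier it should occupy. As a rough sketch (thresholds taken from the diagram, not from any API):

```python
# Map index age in days to the data tier it would occupy under the timeline above.
def tier_for_age(age_days: float) -> str:
    if age_days < 7:
        return "hot"
    if age_days < 30:
        return "warm"
    if age_days < 90:
        return "cold"
    if age_days < 365:
        return "frozen"
    return "delete"

print(tier_for_age(3), tier_for_age(45), tier_for_age(400))
```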

Create an ILM Policy

Via Kibana:

  1. Stack Management → Index Lifecycle Policies
  2. Click Create policy

Via API:

curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_ilm/policy/logs-policy" \
  -u elastic:password \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d",
            "max_docs": 100000000
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "allocate": {
            "number_of_replicas": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          },
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
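The hot-phase rollover in this policy fires as soon as any one of the max_* conditions is met - size, age, or document count, whichever comes first. A sketch of that logic (values mirror the policy above):

```python
# Rollover fires when ANY hot-phase condition is met: 50gb, 1 day, or 100M docs.
def should_rollover(size_gb: float, age_days: float, docs: int) -> bool:
    return size_gb >= 50 or age_days >= 1 or docs >= 100_000_000

print(should_rollover(10, 0.5, 1_000_000))  # no condition met
print(should_rollover(60, 0.5, 1_000_000))  # size condition met
```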

Apply ILM to Index Templates

Create an index template that uses your ILM policy:

curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_index_template/logs-template" \
  -u elastic:password \
  -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy"
    }
  },
  "composed_of": [],
  "priority": 200
}'

Because the template declares a data stream, no rollover alias is required - data streams manage their rollover targets automatically.

Step 7: Data Ingestion Setup

Option A: Elastic Agent (Recommended)

Elastic Agent is the unified way to collect all data types:

  1. Kibana → Fleet → Add agent

  2. Create an agent policy (e.g., “Production Servers”)

  3. Add integrations:

    • System (CPU, memory, disk)
    • Custom logs
    • Docker/Kubernetes
    • Cloud provider metrics
  4. Install on your servers:

# Download
curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz
tar xzvf elastic-agent-8.12.0-linux-x86_64.tar.gz
cd elastic-agent-8.12.0-linux-x86_64

# Enroll (Fleet URL and token from Kibana)
sudo ./elastic-agent install \
  --url=https://your-fleet-server.es.region.aws.found.io:443 \
  --enrollment-token=YOUR_ENROLLMENT_TOKEN

Option B: Filebeat (Logs Only)

For simpler log collection:

# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/*.log
    fields:
      environment: production
      service: nginx

output.elasticsearch:
  hosts: ["https://your-deployment.es.region.aws.found.io:9243"]
  api_key: "your-api-key"
  index: "logs-nginx-%{+yyyy.MM.dd}"

setup.ilm.enabled: true
setup.ilm.rollover_alias: "logs-nginx"
setup.ilm.policy_name: "logs-policy"

Option C: Logstash (Complex Processing)

For advanced transformations:

# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  
  geoip {
    source => "clientip"
  }
  
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

output {
  elasticsearch {
    hosts => ["https://your-deployment.es.region.aws.found.io:9243"]
    api_key => "your-api-key"
    data_stream => true
    data_stream_type => "logs"
    data_stream_dataset => "nginx"
    data_stream_namespace => "production"
  }
}

Step 8: Monitoring Your Deployment

Deployment Metrics

In Elastic Cloud Console:

  1. Click your deployment
  2. Go to Monitoring
  3. View:
    • CPU/Memory usage
    • Disk usage
    • Request rate
    • Search/Index latency

Stack Monitoring in Kibana

For deeper insights:

  1. Kibana → Stack Monitoring
  2. Enable self-monitoring if prompted
  3. View:
    • Cluster health
    • Node metrics
    • Index stats
    • Logstash pipeline metrics

Set Up Alerts

Via Kibana:

  1. Stack Management → Rules and Connectors
  2. Create rules for:
    • Cluster health is not green
    • Disk usage > 80%
    • CPU usage > 90% for 5 minutes
    • No data received in 10 minutes
    • Search latency > 500ms
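Those thresholds translate directly into checks. A sketch of the evaluation logic (the metric field names are illustrative, not the Kibana rule schema):

```python
# Evaluate the alert conditions above against a metrics snapshot (illustrative fields).
def firing_alerts(m: dict) -> list[str]:
    alerts = []
    if m["cluster_health"] != "green":
        alerts.append("cluster not green")
    if m["disk_pct"] > 80:
        alerts.append("disk > 80%")
    if m["cpu_pct"] > 90 and m["cpu_high_minutes"] >= 5:
        alerts.append("sustained high CPU")
    if m["minutes_since_last_doc"] > 10:
        alerts.append("no data received")
    if m["search_latency_ms"] > 500:
        alerts.append("slow searches")
    return alerts

result = firing_alerts({"cluster_health": "yellow", "disk_pct": 85, "cpu_pct": 50,
                        "cpu_high_minutes": 0, "minutes_since_last_doc": 2,
                        "search_latency_ms": 120})
print(result)
```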

Notification channels:

  • Email
  • Slack
  • PagerDuty
  • Webhook

Step 9: Cost Optimization

Right-Sizing Your Deployment

Start small and scale up. Monitor for 2 weeks, then adjust:

If CPU consistently < 30%: Scale down
If CPU consistently > 70%: Scale up
If memory pressure high: Add more RAM
If disk > 80%: Add storage or review ILM
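Those rules of thumb can be expressed as a simple decision function (the 75% memory-pressure cutoff is an assumption, since the guidance above only says "high"):

```python
# Suggest scaling actions from two weeks of averaged metrics (illustrative thresholds).
def sizing_advice(cpu_pct: float, memory_pressure_pct: float, disk_pct: float) -> list[str]:
    advice = []
    if cpu_pct < 30:
        advice.append("scale down")
    elif cpu_pct > 70:
        advice.append("scale up")
    if memory_pressure_pct > 75:   # "high" memory pressure - cutoff is an assumption
        advice.append("add RAM")
    if disk_pct > 80:
        advice.append("add storage or review ILM")
    return advice

print(sizing_advice(cpu_pct=20, memory_pressure_pct=40, disk_pct=85))
```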

Enable autoscaling to handle traffic spikes:

  1. Deployment → Edit
  2. Enable autoscaling for hot tier
  3. Set min/max bounds

Example:

Hot tier:
  Min size: 4GB
  Max size: 32GB

Note that autoscaling for data tiers is driven primarily by storage usage: Elastic Cloud scales the tier up as disk fills, within the bounds you set.

Data Tiering Strategy

Move old data to cheaper tiers:

| Age        | Tier   | Approximate Cost |
|------------|--------|------------------|
| 0-7 days   | Hot    | $$$$             |
| 7-30 days  | Warm   | $$$              |
| 30-90 days | Cold   | $$               |
| 90+ days   | Frozen | $                |

Frozen tier uses searchable snapshots - data lives in object storage (S3/GCS) but remains searchable.
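To see why tiering matters, work out where a day's ingest spends its life. Under the timeline above with a 90-day retention, each slice of data spends 7 days hot, 23 days warm, and 60 days cold - most of its storage life lands on the cheapest hardware. A sketch:

```python
# Days each slice of data spends in every tier under a 90-day retention (from the table above).
boundaries = {"hot": (0, 7), "warm": (7, 30), "cold": (30, 90)}
retention_days = 90

days_in_tier = {tier: min(hi, retention_days) - lo for tier, (lo, hi) in boundaries.items()}
print(days_in_tier)
```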

Reserved Capacity

If your usage is predictable, commit to reserved capacity for discounts:

  • 1-year: ~30% discount
  • 3-year: ~50% discount

Step 10: Backup and Disaster Recovery

Automated Snapshots

Elastic Cloud takes automatic snapshots:

  • Every 30 minutes
  • Stored in Elastic’s secure repository
  • Retained based on your plan

Cross-Cluster Replication (CCR)

For true DR, replicate to another region:

  1. Create a secondary deployment in another region
  2. Stack Management → Remote Clusters
  3. Add your primary cluster as remote
  4. Set up an auto-follow pattern (the follower API takes one concrete index at a time, so wildcards need auto-follow):
curl -X PUT "https://secondary.es.region.aws.found.io:9243/_ccr/auto_follow/logs-pattern" \
  -u elastic:password \
  -H 'Content-Type: application/json' -d'
{
  "remote_cluster": "primary-cluster",
  "leader_index_patterns": ["logs-*"]
}'

Manual Snapshots

For compliance or long-term retention:

# Create a custom repository (requires support to enable)
PUT _snapshot/my-s3-repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-backups",
    "region": "eu-west-1"
  }
}

# Take a snapshot
PUT _snapshot/my-s3-repo/snapshot-2024-01?wait_for_completion=true
{
  "indices": "logs-*",
  "include_global_state": false
}

Step 11: Common Integrations

AWS Integration

Collect CloudWatch logs and metrics:

  1. Kibana → Integrations → AWS
  2. Configure:
    • Access Key / Secret Key (or IAM role)
    • Regions to monitor
    • Services: CloudWatch, S3, ELB, EC2, etc.

Kubernetes Integration

For K8s observability:

  1. Install the ECK operator (Elastic Agent can then be deployed as a DaemonSet):
kubectl create -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml
  2. Or use Helm:
helm repo add elastic https://helm.elastic.co
helm install elastic-agent elastic/elastic-agent \
  --set kubernetes.enabled=true \
  --set outputs.default.type=elasticsearch \
  --set outputs.default.hosts='["https://your-deployment.es.region.aws.found.io:9243"]' \
  --set outputs.default.api_key='your-api-key'

APM (Application Performance Monitoring)

  1. Kibana → APM → Add agent
  2. Install agent for your language:

Node.js:

const apm = require('elastic-apm-node').start({
  serviceName: 'my-api',
  serverUrl: 'https://your-apm.apm.region.aws.found.io:443',
  secretToken: 'your-secret-token',
  environment: 'production'
});

Python:

from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)
apm = ElasticAPM(app,
    service_name='my-api',
    server_url='https://your-apm.apm.region.aws.found.io:443',
    secret_token='your-secret-token',
    environment='production'
)

Troubleshooting

Deployment Won’t Start

  1. Check deployment activity log
  2. Common causes:
    • Invalid configuration
    • Quota exceeded
    • Region capacity issues

Can’t Connect

# Test connectivity
curl -v https://your-deployment.es.region.aws.found.io:9243

# Test auth
curl -u elastic:password https://your-deployment.es.region.aws.found.io:9243/_cluster/health

Common issues:

  • Wrong credentials
  • IP allowlist blocking you
  • Network/firewall issues

Slow Queries

  1. Check Stack Monitoring → Indices
  2. Look for:
    • Large shards (>50GB)
    • Many small shards
    • Missing replicas

Fixes:

  • Add more hot tier capacity
  • Optimize ILM for faster rollover
  • Review query patterns
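The shard checks can be scripted against the output of GET _cat/shards. A sketch that flags oversized and undersized shards (the 50GB ceiling comes from the guidance above; the 1GB "too small" cutoff is an assumption):

```python
# Flag problem shards: >50GB is too large, and many <1GB shards suggest over-sharding.
def shard_report(shards: list[tuple[str, float]]) -> dict:
    large = [name for name, gb in shards if gb > 50]
    small = [name for name, gb in shards if gb < 1]  # 1GB cutoff is an assumption
    return {"too_large": large, "too_small": small}

report = shard_report([("logs-0001", 62.0), ("logs-0002", 12.0), ("logs-0003", 0.2)])
print(report)
```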

High Costs

  1. Deployment → Usage
  2. Identify cost drivers:
    • Over-provisioned tiers
    • Too many replicas
    • Data not moving to cheaper tiers
    • Retaining data too long

Production Checklist

## Initial Setup
- [ ] Create deployment with appropriate template
- [ ] Save elastic password securely
- [ ] Enable 2FA on Elastic Cloud account

## Security
- [ ] Create service accounts (don't use elastic user)
- [ ] Generate API keys for applications
- [ ] Configure SSO for team access
- [ ] Set up IP allowlist if needed
- [ ] Review and restrict default roles

## Data Management
- [ ] Configure ILM policies
- [ ] Set appropriate retention periods
- [ ] Enable data tiering (warm/cold/frozen)
- [ ] Test index rollover

## Ingestion
- [ ] Set up Elastic Agent or Beats
- [ ] Verify data is flowing
- [ ] Check index patterns/data views

## Monitoring
- [ ] Enable Stack Monitoring
- [ ] Set up alerting rules
- [ ] Configure notification channels

## Backup/DR
- [ ] Verify automated snapshots
- [ ] Test restore process
- [ ] Consider CCR for critical data

## Cost
- [ ] Enable autoscaling with reasonable bounds
- [ ] Review ILM to move data to cheaper tiers
- [ ] Consider reserved capacity for stable workloads

Key Takeaways

  1. Start small, scale up - Elastic Cloud makes scaling easy
  2. Use API keys, not passwords - More secure, easier to rotate
  3. ILM is critical - Without it, costs spiral and performance degrades
  4. Data tiering saves money - Hot data is expensive, archive aggressively
  5. Monitor your monitoring - Set up alerts for your Elastic deployment itself
  6. Autoscaling is your friend - Handles spikes without over-provisioning

Elastic Cloud removes the operational burden of running Elasticsearch, but you still need to configure it properly. Get security, ILM, and tiering right from the start, and you’ll have a production-ready observability platform.


Questions about Elastic Cloud? Find me on LinkedIn or GitHub.
