Elastic Cloud Setup Guide - From Zero to Production
Running your own Elasticsearch cluster is powerful but operationally heavy. Upgrades, security patches, scaling, backups - it adds up. Elastic Cloud handles all of that, letting you focus on using the stack rather than managing it.
This guide walks through setting up Elastic Cloud properly - not just clicking through the wizard, but configuring it for real production use with proper security, lifecycle management, and cost optimization.
Why Elastic Cloud?
Before diving in, here’s why you might choose Elastic Cloud over self-managed:
Pros:
- Fully managed upgrades (one-click)
- Automated backups and snapshots
- Built-in security (TLS, RBAC, SSO)
- Cross-cloud deployment (AWS, GCP, Azure)
- Autoscaling options
- Elastic’s support team
- Early access to the latest features
Cons:
- Higher cost than self-managed (roughly 2-3x)
- Less control over infrastructure
- Data residency concerns (though many regions available)
- Vendor lock-in
For most teams, the operational overhead savings outweigh the cost difference.
Step 1: Create Your Elastic Cloud Account
- Go to cloud.elastic.co
- Sign up (email or SSO with Google/Microsoft)
- Verify your email
- You get a 14-day free trial with $400 credit
Step 2: Create Your First Deployment
Choosing a Deployment Template
Elastic Cloud offers several pre-configured templates:
| Template | Best For | Components |
|---|---|---|
| General Purpose | Most workloads | ES + Kibana balanced |
| Observability | Logs, metrics, APM | Optimized for time-series |
| Security | SIEM, threat detection | Elastic Security features |
| Vector Search | AI/ML, embeddings | ML nodes included |
| Enterprise Search | Web/app search | App Search + Workplace Search |
For this guide, we’ll use Observability - the most common use case.
Deployment Configuration
Click Create deployment and configure:
1. Name: Choose something meaningful
prod-logs-eu-west-1
staging-observability
2. Cloud Provider & Region:
- Choose based on where your data sources are
- Lower latency = better ingestion performance
- Consider data residency requirements
3. Hardware Profile:
For a production observability deployment, I recommend starting with:
Elasticsearch:
- Hot tier: 2 zones × 4GB RAM (8GB total)
- Warm tier: 2 zones × 2GB RAM (4GB total) - optional initially
- Cold tier: None initially
Kibana:
- 1 zone × 1GB RAM
Integrations Server (APM + Fleet):
- 1 zone × 1GB RAM
You can scale up later - Elastic Cloud makes this easy.
4. Version:
- Always choose the latest stable version (8.x)
- Avoid pre-release versions for production
Advanced Settings
Expand Advanced settings for more control:
Snapshot Repository:
- Enabled by default (Elastic's built-in found-snapshots repository)
- Snapshots taken every 30 minutes
- Retained for up to 100 snapshots (roughly two days at the default interval)
Plugins:
- Most plugins are pre-installed
- Custom plugins require support ticket
Click Create deployment and wait 5-10 minutes.
Step 3: Save Your Credentials
When deployment completes, you’ll see:
Elasticsearch endpoint: https://my-deployment.es.eu-west-1.aws.found.io:9243
Kibana endpoint: https://my-deployment.kb.eu-west-1.aws.found.io:9243
Username: elastic
Password: <generated-password>
Save these immediately - the password is only shown once.
If you lose it:
- Go to deployment → Security
- Reset the elastic user password
Step 4: Initial Kibana Setup
Access Kibana
- Click the Kibana link or navigate to your Kibana endpoint
- Log in with the elastic superuser
- You’ll see the Kibana home page
Create Your First Space
Spaces let you organize dashboards and access by team:
- Go to Stack Management → Kibana → Spaces
- Create spaces like:
production-logs
security-team
platform-team
Set Up Index Patterns (Data Views)
Before you can visualize data, you need data views:
- Go to Stack Management → Kibana → Data Views
- Click Create data view
- For logs: logs-* or filebeat-*
- For metrics: metrics-* or metricbeat-*
- Select @timestamp as the time field
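Data views can also be created via the Kibana API, which is useful for automating setup. A sketch (the endpoint and credentials are placeholders; the kbn-xsrf header is required by Kibana's HTTP APIs):

```
curl -X POST "https://my-deployment.kb.eu-west-1.aws.found.io:9243/api/data_views/data_view" \
-u elastic:password \
-H 'kbn-xsrf: true' \
-H 'Content-Type: application/json' -d'
{
  "data_view": {
    "title": "logs-*",
    "timeFieldName": "@timestamp"
  }
}'
```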
Step 5: Security Configuration
Create Service Accounts
Never use the elastic superuser for applications. Create dedicated accounts:
Via Kibana:
- Stack Management → Security → Users
- Create users for each service:
Username: logstash_writer
Role: logstash_writer (custom role - the Logstash docs recommend creating one like this)
Password: <strong-password>
Username: beats_writer
Role: beats_writer (custom role)
Password: <strong-password>
Username: apm_writer
Role: apm_user (built-in, but deprecated in 8.x - prefer API keys)
Password: <strong-password>
Via API (for automation):
# Create a custom role
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/role/logs_writer" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"cluster": ["monitor", "manage_index_templates", "manage_ilm"],
"indices": [
{
"names": ["logs-*", "filebeat-*"],
"privileges": ["create_index", "write", "create", "auto_configure"]
}
]
}'
# Create a user with that role
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/user/logs_writer" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"password": "your-secure-password",
"roles": ["logs_writer"],
"full_name": "Logs Writer Service Account"
}'
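Before wiring the new account into anything, verify it can authenticate and carries the expected roles (the endpoint is a placeholder):

```
# Confirm the service account works - returns the username and its roles
curl -u logs_writer:your-secure-password \
"https://your-deployment.es.region.aws.found.io:9243/_security/_authenticate"
```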
API Keys (Recommended)
For machine-to-machine auth, API keys are better than passwords:
# Create an API key for Filebeat
curl -X POST "https://your-deployment.es.region.aws.found.io:9243/_security/api_key" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"name": "filebeat-prod-servers",
"role_descriptors": {
"filebeat_writer": {
"cluster": ["monitor", "read_ilm"],
"indices": [
{
"names": ["filebeat-*", "logs-*"],
"privileges": ["create_index", "create_doc", "auto_configure"]
}
]
}
},
"expiration": "365d"
}'
Response:
{
"id": "abc123",
"name": "filebeat-prod-servers",
"api_key": "xyz789...",
"encoded": "YWJjMTIzOnh5ejc4OS4u" // Base64(id:api_key) - use this
}
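The encoded field is simply Base64 of id and api_key joined by a colon. A quick sketch with illustrative (non-real) values, in case you ever need to rebuild it from the two parts:

```python
import base64

# Illustrative stand-ins for the "id" and "api_key" fields in the response
api_key_id = "abc123"
api_key_secret = "xyz789"

# Elasticsearch expects Base64("id:api_key") in the "Authorization: ApiKey ..." header
encoded = base64.b64encode(f"{api_key_id}:{api_key_secret}".encode()).decode()
print(encoded)  # → YWJjMTIzOnh5ejc4OQ==
```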
Use the encoded value in your Beats config:
output.elasticsearch:
hosts: ["https://your-deployment.es.region.aws.found.io:9243"]
api_key: "YWJjMTIzOnh5ejc4OS4u"
Enable SSO (Optional but Recommended)
For team access, configure SAML or OIDC:
- Deployment → Security → User authentication
- Configure your identity provider (Okta, Azure AD, Google)
- Map groups to Elastic roles
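Group-to-role mapping can also be scripted with the role mapping API. A hedged example (the mapping name, role, and group value are assumptions - the actual group attribute depends on your identity provider configuration):

```
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_security/role_mapping/sso-platform-team" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
  "roles": ["editor"],
  "enabled": true,
  "rules": { "field": { "groups": "platform-team" } }
}'
```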
Step 6: Index Lifecycle Management (ILM)
ILM automatically manages index rollover, tiering, and deletion. This is critical for cost control.
Understanding Data Tiers
Hot Tier   →  Warm Tier  →  Cold Tier   →  Frozen Tier  →  Delete
(fast SSD)    (cheaper)     (cheapest)     (S3-backed)
0-7 days      7-30 days     30-90 days     90-365 days      365+ days
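As a sanity check, the timeline above can be expressed as a tiny helper - handy when reasoning about where an index of a given age should live (the boundaries are this guide's example values, not Elastic defaults):

```python
def tier_for_age(age_days: int) -> str:
    """Map index age to the data tier used in this guide's example timeline."""
    if age_days < 7:
        return "hot"       # fast SSD, most expensive
    if age_days < 30:
        return "warm"
    if age_days < 90:
        return "cold"
    if age_days < 365:
        return "frozen"    # searchable snapshots in object storage
    return "delete"

print(tier_for_age(3), tier_for_age(45), tier_for_age(400))  # → hot cold delete
```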
Create an ILM Policy
Via Kibana:
- Stack Management → Index Lifecycle Policies
- Click Create policy
Via API:
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_ilm/policy/logs-policy" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_size": "50gb",
"max_age": "1d",
"max_docs": 100000000
},
"set_priority": {
"priority": 100
}
}
},
"warm": {
"min_age": "7d",
"actions": {
"set_priority": {
"priority": 50
},
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
},
"allocate": {
"number_of_replicas": 1
}
}
},
"cold": {
"min_age": "30d",
"actions": {
"set_priority": {
"priority": 0
},
"allocate": {
"number_of_replicas": 0
}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}'
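After the policy is in place and attached to indices via a template, the ILM explain API shows which lifecycle phase each index is in (and surfaces any step errors):

```
curl -X GET "https://your-deployment.es.region.aws.found.io:9243/logs-*/_ilm/explain" \
-u elastic:password
```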
Apply ILM to Index Templates
Create an index template that uses your ILM policy:
curl -X PUT "https://your-deployment.es.region.aws.found.io:9243/_index_template/logs-template" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "logs-policy"
}
},
"composed_of": [],
"priority": 200,
"data_stream": {}
}'
Step 7: Data Ingestion Setup
Option A: Elastic Agent (Recommended)
Elastic Agent is the unified way to collect all data types:
1. Kibana → Fleet → Add agent
2. Create an agent policy (e.g., “Production Servers”)
3. Add integrations:
- System (CPU, memory, disk)
- Custom logs
- Docker/Kubernetes
- Cloud provider metrics
4. Install on your servers:
# Download
curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.0-linux-x86_64.tar.gz
tar xzvf elastic-agent-8.12.0-linux-x86_64.tar.gz
cd elastic-agent-8.12.0-linux-x86_64
# Enroll (Fleet URL and token from Kibana)
sudo ./elastic-agent install \
--url=https://your-fleet-server.es.region.aws.found.io:443 \
--enrollment-token=YOUR_ENROLLMENT_TOKEN
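Once installed, confirm the agent enrolled and is healthy (these are standard elastic-agent subcommands):

```
# Overall agent health and the state of each component
sudo elastic-agent status
```

The agent should also show as Healthy under Kibana → Fleet → Agents.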
Option B: Filebeat (Logs Only)
For simpler log collection:
# filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/nginx/*.log
fields:
environment: production
service: nginx
output.elasticsearch:
hosts: ["https://your-deployment.es.region.aws.found.io:9243"]
api_key: "your-api-key"
index: "logs-nginx-%{+yyyy.MM.dd}"  # note: ignored while ILM is enabled - events go via the rollover alias
setup.ilm.enabled: true
setup.ilm.rollover_alias: "logs-nginx"
setup.ilm.policy_name: "logs-policy"
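Before starting the service, Filebeat can validate both the config file and the connection to Elasticsearch:

```
# Validate filebeat.yml syntax
filebeat test config -c /etc/filebeat/filebeat.yml

# Verify Filebeat can reach and authenticate to the Elasticsearch output
filebeat test output -c /etc/filebeat/filebeat.yml
```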
Option C: Logstash (Complex Processing)
For advanced transformations:
# logstash.conf
input {
beats {
port => 5044
}
}
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
geoip {
source => "clientip"
}
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
}
}
output {
elasticsearch {
hosts => ["https://your-deployment.es.region.aws.found.io:9243"]
api_key => "your-api-key"
data_stream => true
data_stream_type => "logs"
data_stream_dataset => "nginx"
data_stream_namespace => "production"
}
}
Step 8: Monitoring Your Deployment
Deployment Metrics
In Elastic Cloud Console:
- Click your deployment
- Go to Monitoring
- View:
- CPU/Memory usage
- Disk usage
- Request rate
- Search/Index latency
Stack Monitoring in Kibana
For deeper insights:
- Kibana → Stack Monitoring
- Enable self-monitoring if prompted
- View:
- Cluster health
- Node metrics
- Index stats
- Logstash pipeline metrics
Set Up Alerts
Via Kibana:
- Stack Management → Rules and Connectors
- Create rules for:
- Cluster health is not green
- Disk usage > 80%
- CPU usage > 90% for 5 minutes
- No data received in 10 minutes
- Search latency > 500ms
Notification channels:
- Slack
- PagerDuty
- Webhook
Step 9: Cost Optimization
Right-Sizing Your Deployment
Start small and scale up. Monitor for 2 weeks, then adjust:
If CPU consistently < 30%: Scale down
If CPU consistently > 70%: Scale up
If memory pressure high: Add more RAM
If disk > 80%: Add storage or review ILM
Autoscaling (Recommended)
Enable autoscaling to handle traffic spikes:
- Deployment → Edit
- Enable autoscaling for hot tier
- Set min/max bounds
Example:
Hot tier:
Min: 4GB
Max: 32GB
Scale up when: Memory pressure > 75%
Scale down when: Memory pressure < 50%
Data Tiering Strategy
Move old data to cheaper tiers:
| Age | Tier | Approximate Cost |
|---|---|---|
| 0-7 days | Hot | $$$$ |
| 7-30 days | Warm | $$$ |
| 30-90 days | Cold | $$ |
| 90+ days | Frozen | $ |
Frozen tier uses searchable snapshots - data lives in object storage (S3/GCS) but remains searchable.
Reserved Capacity
If your usage is predictable, commit to reserved capacity for discounts:
- 1-year: ~30% discount
- 3-year: ~50% discount
Step 10: Backup and Disaster Recovery
Automated Snapshots
Elastic Cloud takes automatic snapshots:
- Every 30 minutes
- Stored in Elastic’s secure repository
- Retained based on your plan
Cross-Cluster Replication (CCR)
For true DR, replicate to another region:
- Create a secondary deployment in another region
- Stack Management → Remote Clusters
- Add your primary cluster as remote
- Set up replication. The follow API takes a single concrete index, so for a wildcard pattern like logs-* use an auto-follow pattern on the secondary instead:
curl -X PUT "https://secondary.es.region.aws.found.io:9243/_ccr/auto_follow/logs" \
-u elastic:password \
-H 'Content-Type: application/json' -d'
{
"remote_cluster": "primary-cluster",
"leader_index_patterns": ["logs-*"]
}'
Manual Snapshots
For compliance or long-term retention:
# Create a custom repository (requires support to enable)
PUT _snapshot/my-s3-repo
{
"type": "s3",
"settings": {
"bucket": "my-elasticsearch-backups",
"region": "eu-west-1"
}
}
# Take a snapshot
PUT _snapshot/my-s3-repo/snapshot-2024-01?wait_for_completion=true
{
"indices": "logs-*",
"include_global_state": false
}
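Restoring works through the same repository. Renaming on restore (via rename_pattern / rename_replacement) avoids colliding with live indices - a sketch:

```
# Restore from the snapshot, renaming indices to avoid clashes
POST _snapshot/my-s3-repo/snapshot-2024-01/_restore
{
  "indices": "logs-*",
  "include_global_state": false,
  "rename_pattern": "logs-(.+)",
  "rename_replacement": "restored-logs-$1"
}
```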
Step 11: Common Integrations
AWS Integration
Collect CloudWatch logs and metrics:
- Kibana → Integrations → AWS
- Configure:
- Access Key / Secret Key (or IAM role)
- Regions to monitor
- Services: CloudWatch, S3, ELB, EC2, etc.
Kubernetes Integration
For K8s observability:
- Deploy Elastic Agent as a DaemonSet. If you manage the stack with the ECK operator, install it first (CRDs plus the operator itself):
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml
- Or use Helm:
helm repo add elastic https://helm.elastic.co
helm install elastic-agent elastic/elastic-agent \
--set kubernetes.enabled=true \
--set outputs.default.type=elasticsearch \
--set outputs.default.hosts='["https://your-deployment.es.region.aws.found.io:9243"]' \
--set outputs.default.api_key='your-api-key'
APM (Application Performance Monitoring)
- Kibana → APM → Add agent
- Install agent for your language:
Node.js:
const apm = require('elastic-apm-node').start({
serviceName: 'my-api',
serverUrl: 'https://your-apm.apm.region.aws.found.io:443',
secretToken: 'your-secret-token',
environment: 'production'
});
Python:
from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)
apm = ElasticAPM(app,
service_name='my-api',
server_url='https://your-apm.apm.region.aws.found.io:443',
secret_token='your-secret-token',
environment='production'
)
Troubleshooting
Deployment Won’t Start
- Check deployment activity log
- Common causes:
- Invalid configuration
- Quota exceeded
- Region capacity issues
Can’t Connect
# Test connectivity
curl -v https://your-deployment.es.region.aws.found.io:9243
# Test auth
curl -u elastic:password https://your-deployment.es.region.aws.found.io:9243/_cluster/health
Common issues:
- Wrong credentials
- IP allowlist blocking you
- Network/firewall issues
Slow Queries
- Check Stack Monitoring → Indices
- Look for:
- Large shards (>50GB)
- Many small shards
- Missing replicas
Fixes:
- Add more hot tier capacity
- Optimize ILM for faster rollover
- Review query patterns
High Costs
- Deployment → Usage
- Identify cost drivers:
- Over-provisioned tiers
- Too many replicas
- Data not moving to cheaper tiers
- Retaining data too long
Production Checklist
## Initial Setup
- [ ] Create deployment with appropriate template
- [ ] Save elastic password securely
- [ ] Enable 2FA on Elastic Cloud account
## Security
- [ ] Create service accounts (don't use elastic user)
- [ ] Generate API keys for applications
- [ ] Configure SSO for team access
- [ ] Set up IP allowlist if needed
- [ ] Review and restrict default roles
## Data Management
- [ ] Configure ILM policies
- [ ] Set appropriate retention periods
- [ ] Enable data tiering (warm/cold/frozen)
- [ ] Test index rollover
## Ingestion
- [ ] Set up Elastic Agent or Beats
- [ ] Verify data is flowing
- [ ] Check index patterns/data views
## Monitoring
- [ ] Enable Stack Monitoring
- [ ] Set up alerting rules
- [ ] Configure notification channels
## Backup/DR
- [ ] Verify automated snapshots
- [ ] Test restore process
- [ ] Consider CCR for critical data
## Cost
- [ ] Enable autoscaling with reasonable bounds
- [ ] Review ILM to move data to cheaper tiers
- [ ] Consider reserved capacity for stable workloads
Key Takeaways
- Start small, scale up - Elastic Cloud makes scaling easy
- Use API keys, not passwords - More secure, easier to rotate
- ILM is critical - Without it, costs spiral and performance degrades
- Data tiering saves money - Hot data is expensive, archive aggressively
- Monitor your monitoring - Set up alerts for your Elastic deployment itself
- Autoscaling is your friend - Handles spikes without over-provisioning
Elastic Cloud removes the operational burden of running Elasticsearch, but you still need to configure it properly. Get security, ILM, and tiering right from the start, and you’ll have a production-ready observability platform.
Questions about Elastic Cloud? Find me on LinkedIn or GitHub.