Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

Your AWS bill tells you that EKS costs £50,000/month and Aurora costs £15,000/month. But what does Customer A cost? What about Customer B who does 10x the transactions? Traditional cloud billing shows you spend by service - it doesn’t show you spend by customer, transaction, or business unit.

This is the unit economics problem, and for multi-tenant SaaS platforms, it’s critical. Without it, you can’t answer:

  • Which customers are profitable?
  • What’s the true margin on each deal?
  • Where should we optimise?
  • How should we price?

I recently helped a client solve this for their multi-tenant platform running on EKS with shared Aurora, DynamoDB, MSK, and Keyspaces backends. This post covers the approach, the tooling, and the gotchas.

The Problem: Shared Infrastructure, Unknown Attribution

Consider this typical multi-tenant architecture:

┌─────────────────────────────────────────────────────────────────────┐
│                           Customers                                  │
│  ┌────────┐    ┌────────┐    ┌────────┐                            │
│  │ Cust A │    │ Cust B │    │ Cust C │                            │
│  └───┬────┘    └───┬────┘    └───┬────┘                            │
│      │             │             │                                   │
│      └─────────────┼─────────────┘                                   │
│                    │                                                  │
│                    ▼                                                  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                      CloudFront                              │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                    │                                                  │
│                    ▼                                                  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    EKS Cluster                               │    │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐        │    │
│  │  │ Login   │  │ Orders  │  │ Payment │  │ Common  │        │    │
│  │  │ Service │  │ Service │  │ Service │  │ Services│        │    │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘        │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                    │                                                  │
│      ┌─────────────┼─────────────────────────────┐                   │
│      │             │             │               │                   │
│      ▼             ▼             ▼               ▼                   │
│  ┌────────┐   ┌────────┐   ┌──────────┐   ┌──────────┐             │
│  │ Aurora │   │DynamoDB│   │ KeySpaces│   │   MSK    │             │
│  │(shared)│   │(shared)│   │ (shared) │   │ (shared) │             │
│  └────────┘   └────────┘   └──────────┘   └──────────┘             │
└─────────────────────────────────────────────────────────────────────┘

The challenge:

  • All customers hit the same EKS pods
  • All customers share the same Aurora cluster
  • All customers write to the same DynamoDB tables
  • Tenants are isolated at the data level, not the infrastructure level

AWS Cost Explorer will tell you Aurora costs £15k/month. It won’t tell you that Customer A costs £8k and Customer B costs £2k.

Unit Economics Defined

Unit economics = Cost to serve one unit of business value

Common units:

  • Cost per customer - Total cost / number of customers
  • Cost per transaction - Total cost / number of transactions
  • Cost per API call - Total cost / number of API requests
  • Cost per user - Total cost / active users
  • Cost per order - Total cost / orders processed

The “right” unit depends on your business model:

  • Per-seat SaaS → Cost per user
  • Transaction platform → Cost per transaction
  • API business → Cost per 1M requests
  • E-commerce → Cost per order
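Whatever unit you pick, the arithmetic is the same: total cost for the period divided by units delivered. A minimal sketch, reusing the illustrative dashboard figures from later in this post:

```python
# Minimal sketch: computing unit costs from monthly totals.
# All figures are illustrative, not from a real bill.

def unit_cost(total_cost: float, units: int) -> float:
    """Cost to serve one unit of business value."""
    return total_cost / units if units else 0.0

monthly_cost = 85_000.0      # total cloud spend (GBP)
customers = 150
transactions = 6_854_839     # transactions processed this month

cost_per_customer = unit_cost(monthly_cost, customers)
cost_per_1k_transactions = unit_cost(monthly_cost, transactions) * 1_000

print(f"Cost per customer: £{cost_per_customer:,.2f}")
print(f"Cost per 1k transactions: £{cost_per_1k_transactions:.2f}")
```

The hard part isn't the division - it's producing a defensible per-tenant numerator, which is what the rest of this post is about.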

The Solution: Multi-Dimensional Cost Attribution

To solve this, we need to:

  1. Tag everything possible at the AWS level
  2. Instrument applications to emit tenant context
  3. Collect resource usage at the tenant level
  4. Allocate shared costs proportionally
  5. Build a cost model that combines direct and allocated costs

Step 1: AWS Tagging Strategy

Start with consistent tagging. Every resource needs:

tenant_id: customer-123        # Direct tenant if applicable
service: orders-api            # Which service
environment: production        # Environment
cost_center: platform          # Business allocation

For shared resources, tag with:

allocation_type: shared
allocation_basis: request_count  # How to split the cost

The problem: Most shared resources can’t be tagged per-tenant because multiple tenants use them simultaneously.

Step 2: Kubernetes Cost Attribution with OpenCost

OpenCost is the CNCF project for Kubernetes cost monitoring. It allocates cluster costs to namespaces, deployments, and labels.

Install OpenCost:

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace \
  --set opencost.prometheus.internal.enabled=true \
  --set opencost.ui.enabled=true

Configure for tenant attribution:

The key is labeling your pods with tenant information when possible, or tracking tenant metrics separately.

For shared pods (most multi-tenant setups), OpenCost gives you cost-per-pod, but you need application-level metrics to split by tenant.

# Example: Pod with tenant label (for tenant-dedicated resources)
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: orders-api
    tenant: customer-123  # Only works for tenant-dedicated pods

For shared pods serving multiple tenants, you need a different approach.
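The core of that approach: take the pod's cost from OpenCost and split it in proportion to per-tenant request counts from your own application metrics. A sketch (the helper and figures are illustrative, not an OpenCost API):

```python
# Sketch: splitting one shared pod's cost across tenants by request
# share. Pod cost comes from OpenCost; per-tenant request counts come
# from your application metrics.

def split_pod_cost(pod_cost: float,
                   tenant_requests: dict[str, int]) -> dict[str, float]:
    """Allocate a shared pod's cost proportionally to tenant requests."""
    total = sum(tenant_requests.values())
    if total == 0:
        return {t: 0.0 for t in tenant_requests}
    return {t: pod_cost * n / total for t, n in tenant_requests.items()}

# £1,200/month orders-api pod, split by requests served
print(split_pod_cost(1200.0, {'cust-a': 750_000, 'cust-b': 250_000}))
# → {'cust-a': 900.0, 'cust-b': 300.0}
```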

Step 3: Application-Level Tenant Metrics

This is where most cost attribution projects fail. You need your application to emit tenant-tagged metrics.

Instrument your services:

# Python example with Prometheus metrics (assumes a Flask app)
from flask import Flask
from prometheus_client import Counter, Histogram

app = Flask(__name__)

# Request counter by tenant
requests_total = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['service', 'tenant_id', 'endpoint']
)

# Request duration by tenant
request_duration = Histogram(
    'http_request_duration_seconds',
    'Request duration',
    ['service', 'tenant_id', 'endpoint']
)

# In your request handler
@app.route('/api/orders')
def handle_orders():
    tenant_id = get_tenant_from_request()  # Extract from JWT, header, etc.
    
    with request_duration.labels(
        service='orders-api',
        tenant_id=tenant_id,
        endpoint='/api/orders'
    ).time():
        # Process request
        result = process_order()
    
    requests_total.labels(
        service='orders-api',
        tenant_id=tenant_id,
        endpoint='/api/orders'
    ).inc()
    
    return result

Key metrics to collect per tenant:

  • Request count
  • CPU time consumed
  • Memory high-water mark
  • Database queries executed
  • Storage bytes read/written
  • Kafka messages produced/consumed
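Request counts alone can understate heavy tenants (one tenant's requests may be far more expensive than another's), so one option is to blend several of these signals into a single allocation weight. A sketch with illustrative weights - the right blend is a judgment call for your workload:

```python
# Sketch: turning raw per-tenant metrics into allocation weights by
# blending normalised shares of several usage signals.

def shares(usage: dict[str, float]) -> dict[str, float]:
    """Normalise raw usage into fractions that sum to 1."""
    total = sum(usage.values())
    return {t: v / total for t, v in usage.items()} if total else {}

def blended_shares(metrics: dict[str, dict[str, float]],
                   weights: dict[str, float]) -> dict[str, float]:
    """Blend several per-tenant usage signals into one share per tenant."""
    per_metric = {name: shares(usage) for name, usage in metrics.items()}
    tenants = {t for usage in metrics.values() for t in usage}
    return {
        t: sum(weights[name] * per_metric[name].get(t, 0.0)
               for name in metrics)
        for t in tenants
    }

metrics = {
    'requests':    {'cust-a': 900, 'cust-b': 100},
    'cpu_seconds': {'cust-a': 300, 'cust-b': 700},
}
weights = {'requests': 0.5, 'cpu_seconds': 0.5}

# cust-a: 0.5*0.9 + 0.5*0.3 = 0.60; cust-b: 0.5*0.1 + 0.5*0.7 = 0.40
print(blended_shares(metrics, weights))
```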

Step 4: Database Cost Attribution

Shared databases are the hardest to attribute. Tenants are isolated at the row/table level, not the instance level.

Aurora/RDS Attribution

Aurora costs have multiple components:

  • Instance hours (compute)
  • Storage (GB-months)
  • I/O requests
  • Backup storage

Attribution approach:

-- Track storage per tenant.
-- Assumes a tenant_tables mapping (tenant_id, schemaname, tablename),
-- e.g. for schema-per-tenant or table-per-tenant layouts. For row-level
-- tenancy in shared tables, approximate with per-tenant row counts instead.
SELECT
    m.tenant_id,
    SUM(pg_total_relation_size(format('%I.%I', m.schemaname, m.tablename))) AS bytes
FROM tenant_tables m
GROUP BY m.tenant_id;

-- Track query activity per tenant (requires pg_stat_statements).
-- Assumes an application-maintained tenant_query_log mapping queryid
-- to tenant_id. On PostgreSQL 12 and earlier, total_exec_time is total_time.
SELECT
    l.tenant_id,
    SUM(s.total_exec_time) AS query_time_ms,
    SUM(s.calls) AS query_count,
    SUM(s.shared_blks_read + s.shared_blks_hit) AS blocks_accessed
FROM pg_stat_statements s
JOIN tenant_query_log l ON l.queryid = s.queryid
GROUP BY l.tenant_id;

For Aurora I/O costs:

  • Track read/write IOPS per tenant via application metrics
  • Use CloudWatch VolumeReadIOPs and VolumeWriteIOPs for total
  • Allocate proportionally based on application-tracked I/O
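Putting the components together, each slice of the Aurora bill gets allocated on its own basis. A minimal sketch - the component split and usage figures below are illustrative assumptions; real numbers come from the CUR, pg_stat_statements, and your application metrics:

```python
# Sketch: allocating Aurora's cost components on different bases -
# compute by query time, storage by bytes, I/O by tracked IOPS.

def allocate(component_cost: float,
             usage: dict[str, float]) -> dict[str, float]:
    """Split one cost component proportionally to a usage basis."""
    total = sum(usage.values())
    return {t: component_cost * v / total for t, v in usage.items()} if total else {}

aurora_costs = {'instance': 9_000.0, 'storage': 3_000.0, 'io': 3_000.0}
bases = {
    'instance': {'cust-a': 600, 'cust-b': 400},   # query time (s)
    'storage':  {'cust-a': 80,  'cust-b': 20},    # GB stored
    'io':       {'cust-a': 500, 'cust-b': 1500},  # tracked IOPS
}

per_tenant: dict[str, float] = {}
for component, cost in aurora_costs.items():
    for tenant, amount in allocate(cost, bases[component]).items():
        per_tenant[tenant] = per_tenant.get(tenant, 0.0) + amount

print(per_tenant)  # per-tenant totals sum back to the £15k Aurora bill
```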

DynamoDB Attribution

DynamoDB billing is simpler - it’s based on:

  • Read Capacity Units (RCU)
  • Write Capacity Units (WCU)
  • Storage (GB)

Enable DynamoDB Contributor Insights:

aws dynamodb update-contributor-insights \
    --table-name YourTable \
    --contributor-insights-action ENABLE

This shows top partition keys (often tenant IDs) and their access patterns.

Custom attribution via application:

# Track DynamoDB operations per tenant
dynamodb_reads = Counter(
    'dynamodb_read_units_total',
    'DynamoDB consumed read units',
    ['table', 'tenant_id']
)

dynamodb_writes = Counter(
    'dynamodb_write_units_total',
    'DynamoDB consumed write units',
    ['table', 'tenant_id']
)

# After each DynamoDB operation. ReturnConsumedCapacity must be set
# explicitly, or ConsumedCapacity is omitted from the response.
response = dynamodb.query(
    TableName='Orders',
    KeyConditionExpression='tenant_id = :tid',
    ExpressionAttributeValues={':tid': {'S': tenant_id}},
    ReturnConsumedCapacity='TOTAL'
)

consumed_rcu = response['ConsumedCapacity']['CapacityUnits']
dynamodb_reads.labels(table='Orders', tenant_id=tenant_id).inc(consumed_rcu)

Step 5: The Cost Attribution Pipeline

Now we combine everything into an attribution pipeline:

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  AWS Cost    │    │  OpenCost    │    │ Application  │
│  & Usage     │    │  (K8s costs) │    │   Metrics    │
│   Report     │    │              │    │  (Prometheus)│
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                           ▼
                  ┌────────────────┐
                  │  Cost          │
                  │  Attribution   │
                  │  Engine        │
                  └────────┬───────┘
                           │
                           ▼
                  ┌────────────────┐
                  │  Tenant Cost   │
                  │  Dashboard     │
                  └────────────────┘

Example attribution logic:

def calculate_tenant_costs(period):
    # 1. Get total AWS costs from Cost & Usage Report
    aws_costs = get_cur_costs(period)  # {'eks': 50000, 'aurora': 15000, ...}
    
    # 2. Get tenant resource usage from Prometheus
    tenant_metrics = query_prometheus(f'''
        sum by (tenant_id) (
            rate(http_requests_total{{service=~".+"}}[{period}])
        )
    ''')
    
    total_requests = sum(tenant_metrics.values())
    
    # 3. Get tenant-specific metrics where available
    tenant_db_usage = get_database_usage_by_tenant(period)
    tenant_storage = get_storage_by_tenant(period)
    
    # 4. Calculate allocation ratios
    tenant_costs = {}
    for tenant_id, request_count in tenant_metrics.items():
        request_ratio = request_count / total_requests
        db_ratio = tenant_db_usage.get(tenant_id, 0) / sum(tenant_db_usage.values())
        storage_ratio = tenant_storage.get(tenant_id, 0) / sum(tenant_storage.values())
        
        tenant_costs[tenant_id] = {
            # Allocate EKS costs by request ratio
            'eks': aws_costs['eks'] * request_ratio,
            
            # Allocate Aurora by DB usage
            'aurora': aws_costs['aurora'] * db_ratio,
            
            # Allocate storage by storage ratio
            's3': aws_costs['s3'] * storage_ratio,
            
            # Direct costs (if any tenant-specific resources)
            'direct': get_direct_tenant_costs(tenant_id, period),
        }
        
        tenant_costs[tenant_id]['total'] = sum(tenant_costs[tenant_id].values())
    
    return tenant_costs

Tools Comparison

Several tools can help with this:

OpenCost

  • What: Open-source Kubernetes cost monitoring
  • Good for: Pod/namespace/label cost allocation
  • Limitation: Doesn’t handle non-K8s resources, needs app metrics for tenant split
  • Cost: Free

CloudZero

  • What: SaaS unit economics platform
  • Good for: End-to-end unit cost tracking, pre-built integrations
  • Limitation: SaaS pricing can be high, less customisable
  • Cost: $$$

Kubecost

  • What: Commercial K8s cost monitoring (OpenCost fork)
  • Good for: K8s-focused with better UI, alerting
  • Limitation: Still K8s-centric
  • Cost: Free tier, paid for advanced features

Attrb.io

  • What: Cost attribution sensors for K8s
  • Good for: Works with Karpenter, fine-grained attribution
  • Limitation: Newer tool, less mature
  • Cost: Check pricing

Custom Build

  • What: Build your own with CUR + Prometheus + custom logic
  • Good for: Full control, handles edge cases
  • Limitation: Engineering effort, maintenance burden
  • Cost: Engineering time

Our Recommendation

For most multi-tenant platforms:

  1. Start with OpenCost for K8s visibility
  2. Add application-level tenant metrics (non-negotiable)
  3. Build a custom attribution layer for shared resources
  4. Consider CloudZero if you need quick time-to-value and can afford it

Implementation Checklist

## Tagging
- [ ] Define tenant tagging strategy
- [ ] Tag all AWS resources
- [ ] Label all K8s resources

## Instrumentation
- [ ] Add tenant_id to all application metrics
- [ ] Instrument request counts per tenant
- [ ] Instrument database operations per tenant
- [ ] Instrument storage usage per tenant
- [ ] Instrument queue operations per tenant

## Collection
- [ ] Deploy OpenCost for K8s costs
- [ ] Configure Cost & Usage Report
- [ ] Set up Prometheus for application metrics
- [ ] Enable database monitoring (pg_stat_statements, DynamoDB Contributor Insights)

## Attribution
- [ ] Define cost allocation rules
- [ ] Build attribution pipeline
- [ ] Handle shared resource allocation
- [ ] Handle idle/unattributed costs

## Reporting
- [ ] Build tenant cost dashboard
- [ ] Set up cost anomaly alerting
- [ ] Create margin reports
- [ ] Enable drill-down by service/time/tenant

Common Pitfalls

1. Ignoring Idle Costs

Not all costs map to tenant activity. Idle EKS nodes, standby Aurora replicas, unused reserved capacity - these need a policy:

  • Spread evenly: Divide among all tenants
  • Spread by usage: Allocate proportionally to active tenants
  • Keep separate: Track as “platform overhead”
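The three policies are trivial to express in code - the hard part is picking one and applying it consistently. A sketch with illustrative tenants:

```python
# Sketch of the three idle-cost policies. idle_cost is whatever remains
# after activity-based allocation.

def spread_evenly(idle_cost: float, tenants) -> dict[str, float]:
    """Divide idle cost equally among all tenants."""
    return {t: idle_cost / len(tenants) for t in tenants}

def spread_by_usage(idle_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Allocate idle cost proportionally to each tenant's activity."""
    total = sum(usage.values())
    return {t: idle_cost * v / total for t, v in usage.items()}

def keep_separate(idle_cost: float) -> dict[str, float]:
    """Track idle cost as platform overhead, unattributed to tenants."""
    return {'platform_overhead': idle_cost}

usage = {'cust-a': 3_000, 'cust-b': 1_000}
print(spread_evenly(1000.0, usage))    # → {'cust-a': 500.0, 'cust-b': 500.0}
print(spread_by_usage(1000.0, usage))  # → {'cust-a': 750.0, 'cust-b': 250.0}
print(keep_separate(1000.0))           # → {'platform_overhead': 1000.0}
```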

2. Point-in-Time vs. Averaged

Tenant usage varies. A tenant might spike to 50% of capacity for an hour, then drop to 5%.

Don’t: Take a single measurement.
Do: Average over the billing period, or use peak-based allocation for reserved capacity.
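A time-weighted average captures this: weight each observed usage level by how long it held. A sketch with illustrative samples for the spiky tenant above:

```python
# Sketch: averaging spiky usage over the billing period instead of
# sampling once. Samples are (hours_at_level, share_of_capacity) pairs.

def time_weighted_share(samples: list[tuple[float, float]]) -> float:
    """Average capacity share, weighted by how long each level held."""
    total_hours = sum(h for h, _ in samples)
    return sum(h * share for h, share in samples) / total_hours

# 1 hour at 50% of capacity, then 23 hours at 5%
samples = [(1, 0.50), (23, 0.05)]
print(f"{time_weighted_share(samples):.3f}")  # ≈ 0.069, vs 0.50 point-in-time
```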

3. Forgetting Support and People Costs

Cloud costs aren’t the full picture:

  • Support tickets per tenant
  • Engineering time per tenant
  • Onboarding costs
  • Account management

For true unit economics, you need these too.

4. Over-Engineering Early

Start simple:

  1. Track total costs
  2. Track tenant request counts
  3. Allocate by request ratio

Add complexity (DB-level, storage-level, network-level) only when the simple model is insufficient.

Example Dashboard

A good unit economics dashboard shows:

┌─────────────────────────────────────────────────────────────────┐
│                    Unit Economics Dashboard                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  SUMMARY                           TREND (Last 6 Months)        │
│  ─────────────────────────         ────────────────────────     │
│  Total Cost:     £85,000           [Line chart showing          │
│  Customers:      150                cost per customer trend]    │
│  Avg Cost/Cust:  £567                                           │
│  Cost/1K Trans:  £12.40                                         │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  TOP 10 CUSTOMERS BY COST          COST BREAKDOWN BY SERVICE    │
│  ─────────────────────────         ────────────────────────     │
│  1. BigCorp Inc     £12,400        EKS:        58%             │
│  2. MegaTech Ltd    £8,200         Aurora:     18%             │
│  3. StartupXYZ      £6,100         DynamoDB:   12%             │
│  4. Enterprise Co   £5,800         MSK:         7%             │
│  5. Growth Inc      £4,200         Other:       5%             │
│  ...                                                            │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  MARGIN ANALYSIS                                                 │
│  ─────────────────                                              │
│  Customer     Revenue    Cost    Margin    Margin %             │
│  BigCorp      £25,000    £12,400  £12,600    50.4%             │
│  MegaTech     £10,000    £8,200   £1,800     18.0%  ⚠️         │
│  StartupXYZ   £15,000    £6,100   £8,900     59.3%             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Key Takeaways

  1. AWS billing ≠ business visibility - You need tenant-level attribution
  2. Tag everything - But know that tagging alone isn’t enough for shared resources
  3. Instrument applications - Tenant-aware metrics are essential
  4. Start simple - Request-based allocation is a good first step
  5. Handle shared costs explicitly - Define allocation rules upfront
  6. Include non-cloud costs - Support, engineering, sales for true unit economics
  7. Iterate - Your first model will be wrong; refine based on learnings

Unit economics turns your cloud bill from a mystery into a business tool. You’ll finally know which customers are profitable, where to optimise, and how to price your product.


Building unit economics for your platform? Questions about the approach? Find me on LinkedIn or GitHub.
