FinOps for Engineering Teams - Making Cost Everyone’s Problem

“The cloud bill is too high.”

If you’ve heard this from finance but don’t know what your team specifically costs, you’re not alone. Most engineering teams have zero visibility into their cloud spend. They provision resources, ship features, and assume someone else worries about the bill.

That disconnect is expensive. The people making architectural decisions (engineers) are separated from the financial impact of those decisions. Meanwhile, finance sees a massive AWS bill but can’t tell which team or service is responsible.

FinOps bridges that gap. It’s not about cost-cutting - it’s about making informed trade-offs.

TL;DR

Engineers make decisions that drive 80%+ of cloud costs
Cost visibility must be at the team/service level, not just account level
Tagging is the foundation - enforce it ruthlessly
Build cost awareness into CI/CD and code review
Start with the big wins: right-sizing, unused resources, reserved capacity

Why Engineering Owns Cloud Costs

Finance can negotiate contracts and pay invoices. They can’t:

Choose between Lambda and ECS
Decide if you need 3 replicas or 10
Pick the right instance type for your workload
Design efficient data pipelines
Avoid the N+1 query that scans terabytes

These are engineering decisions with financial consequences. A single architectural choice can be the difference between $1,000/month and $100,000/month.

The old model - engineering builds, finance pays - doesn’t work in the cloud. You need engineers who understand cost as a feature, not an afterthought.

The Foundation: Tagging Strategy

You can’t optimise what you can’t measure. Tagging is how you measure.

Required Tags

# Minimum viable tagging strategy
tags:
  team: "platform"           # Who owns this?
  service: "api-gateway"     # What is it part of?
  environment: "production"  # prod/staging/dev
  cost-center: "eng-001"     # Finance's identifier
  managed-by: "terraform"    # How was it created?
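
A check like the following could run in an audit script or linter to report which of these tags a resource lacks. It is a minimal sketch: `REQUIRED_TAGS` mirrors the list above, and `missing_tags` is a hypothetical helper name, not part of any AWS SDK.

```python
# Hypothetical helper: report which required tags are absent or blank.
REQUIRED_TAGS = ("team", "service", "environment", "cost-center", "managed-by")


def missing_tags(tags: dict[str, str]) -> list[str]:
    """Return required tag keys that are missing or empty, in declaration order."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


if __name__ == "__main__":
    resource_tags = {"team": "platform", "service": "api-gateway", "environment": ""}
    print(missing_tags(resource_tags))
```

Treating a blank value as missing is a design choice here: an empty `team` tag is as unhelpful for cost allocation as no tag at all.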

Enforce Tags with Terraform

# modules/required-tags/main.tf
variable "required_tags" {
  type = map(string)
  validation {
    condition = alltrue([
      contains(keys(var.required_tags), "team"),
      contains(keys(var.required_tags), "service"),
      contains(keys(var.required_tags), "environment"),
    ])
    error_message = "Required tags: team, service, environment"
  }
}

# Use in all resources
resource "aws_instance" "example" {
  # ... config ...
  
  tags = merge(var.required_tags, {
    Name = "my-instance"
  })
}

Enforce Tags with AWS SCPs

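Block creation of resources that lack either tag. Each tag gets its own statement because multiple keys under one condition operator are ANDed, so a single combined statement would only deny requests missing both tags. The resource ARNs scope the check to the resources actually being created:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireTeamTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "elasticloadbalancing:CreateLoadBalancer"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:rds:*:*:db:*",
        "arn:aws:elasticloadbalancing:*:*:loadbalancer/*"
      ],
      "Condition": {
        "Null": { "aws:RequestTag/team": "true" }
      }
    },
    {
      "Sid": "RequireServiceTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "elasticloadbalancing:CreateLoadBalancer"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:rds:*:*:db:*",
        "arn:aws:elasticloadbalancing:*:*:loadbalancer/*"
      ],
      "Condition": {
        "Null": { "aws:RequestTag/service": "true" }
      }
    }
  ]
}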

Visibility: Cost Dashboards

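Tags are useless without dashboards. Engineers need to see their costs. Note that a tag only appears in Cost Explorer and the Cost and Usage Report after it has been activated as a cost allocation tag in the Billing console.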

AWS Cost Explorer by Tag

# Get last month's cost by team
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-02-01 \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --group-by Type=TAG,Key=team \
  --output table

Automated Slack Reports

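# Lambda function for weekly cost reports
import json
import os
import urllib.request
from datetime import date, timedelta

import boto3

# Incoming-webhook URL, supplied as a Lambda environment variable
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


def lambda_handler(event, context):
    ce = boto3.client('ce')

    # Report on the last 7 full days; Cost Explorer treats End as exclusive
    end = date.today()
    start = end - timedelta(days=7)

    response = ce.get_cost_and_usage(
        TimePeriod={
            'Start': start.isoformat(),
            'End': end.isoformat()
        },
        Granularity='DAILY',
        Metrics=['UnblendedCost'],
        GroupBy=[
            {'Type': 'TAG', 'Key': 'team'}
        ]
    )

    # Sum the daily figures per team; group keys look like "team$platform"
    costs_by_team = {}
    for result in response['ResultsByTime']:
        for group in result['Groups']:
            team = group['Keys'][0].replace('team$', '') or 'untagged'
            cost = float(group['Metrics']['UnblendedCost']['Amount'])
            costs_by_team[team] = costs_by_team.get(team, 0) + cost

    message = "📊 *Weekly Cloud Costs by Team*\n"
    for team, cost in sorted(costs_by_team.items(), key=lambda x: -x[1]):
        message += f"• {team}: ${cost:,.2f}\n"

    # Post to Slack with the standard library (requests is not bundled with Lambda)
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)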

Grafana Dashboard

-- Grafana can chart costs from the AWS Cost and Usage Report (CUR),
-- exported to S3 and queried through Athena. CUR data is refreshed
-- at most a few times a day, so this is not real-time.

-- Example Athena query for Grafana
SELECT
  line_item_usage_account_id AS account,
  resource_tags_user_team AS team,
  resource_tags_user_service AS service,
  SUM(line_item_unblended_cost) AS cost
FROM cost_and_usage_report
WHERE
  month = '1'
  AND year = '2026'
GROUP BY 1, 2, 3
ORDER BY cost DESC

Build Cost into CI/CD

Infracost in Pull Requests

Show cost impact before merging:

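# .github/workflows/infracost.yml
name: Infracost
on:
  pull_request:
    paths:
      - '**/*.tf'
      - '**/*.tfvars'

jobs:
  infracost:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write  # needed to post the PR comment
    steps:
      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      # Cost the base branch first so the diff has something to compare against
      - name: Checkout base branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.base.ref }}

      - name: Generate cost baseline
        run: |
          infracost breakdown \
            --path=. \
            --format=json \
            --out-file=/tmp/infracost-base.json

      - name: Checkout PR branch
        uses: actions/checkout@v4

      - name: Generate Infracost diff
        run: |
          infracost diff \
            --path=. \
            --compare-to=/tmp/infracost-base.json \
            --format=json \
            --out-file=/tmp/infracost.json

      - name: Post Infracost comment
        run: |
          infracost comment github \
            --path=/tmp/infracost.json \
            --repo=$GITHUB_REPOSITORY \
            --github-token=${{ github.token }} \
            --pull-request=${{ github.event.pull_request.number }} \
            --behavior=update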

This posts comments like:

💰 Monthly cost will increase by $550 (+367%)

| Resource | Before | After | Change |
|----------|--------|-------|--------|
| aws_instance.api | $50 | $200 | +$150 |
| aws_db_instance.db | $100 | $500 | +$400 |
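
Beyond commenting, a pipeline step could fail the build when the increase passes a limit the team has agreed on. The sketch below assumes the Infracost JSON report exposes the monthly delta as a string under `diffTotalMonthlyCost`; verify that field name against the output of your Infracost version. `exceeds_threshold` and the $500 limit are hypothetical.

```python
import json
from decimal import Decimal


def exceeds_threshold(infracost_report: dict, max_increase_usd: Decimal) -> bool:
    """True when the reported monthly cost increase is above the allowed limit."""
    # Assumption: the delta is a decimal string in "diffTotalMonthlyCost";
    # a missing or null value is treated as no change.
    diff = Decimal(infracost_report.get("diffTotalMonthlyCost") or "0")
    return diff > max_increase_usd


if __name__ == "__main__":
    sample = json.loads('{"diffTotalMonthlyCost": "550.00"}')
    print(exceeds_threshold(sample, Decimal("500")))
```

`Decimal` is used instead of `float` so that currency strings compare exactly at the boundary.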

Cost Budgets as Code

# Terraform budget alerts
resource "aws_budgets_budget" "team_platform" {
  name              = "team-platform-monthly"
  budget_type       = "COST"
  limit_amount      = "5000"
  limit_unit        = "USD"
  time_unit         = "MONTHLY"

  cost_filter {
    name   = "TagKeyValue"
    values = ["user:team$platform"]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["platform-team@company.com"]
    subscriber_sns_topic_arns  = [aws_sns_topic.budget_alerts.arn]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["platform-team@company.com", "finance@company.com"]
  }
}
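
The two notification rules above can be read as simple comparisons against the limit. This sketch mirrors them for a quick sanity check of which alerts a given month would fire; `triggered_alerts` is a hypothetical helper, and AWS Budgets performs the real evaluation.

```python
def triggered_alerts(limit_usd: float, actual_usd: float, forecast_usd: float) -> list[str]:
    """Mirror the budget rules: actual spend above 80% and forecast above 100% of the limit."""
    alerts = []
    # Cross-multiplied form of actual / limit > 80%, which avoids float division
    if actual_usd * 100 > limit_usd * 80:
        alerts.append("ACTUAL > 80%")
    if forecast_usd > limit_usd:
        alerts.append("FORECASTED > 100%")
    return alerts


if __name__ == "__main__":
    print(triggered_alerts(limit_usd=5000, actual_usd=4100, forecast_usd=4900))
```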

Quick Wins: The 80/20 of Cost Optimisation

1. Right-Size Instances

Most instances are over-provisioned. Check utilisation:

# Find under-utilized EC2 instances
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-02-01T00:00:00Z \
  --period 86400 \
  --statistics Average \
  --output table

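If average CPU stays under 20% and peaks are modest, the instance is probably over-provisioned. Check memory and network usage before downsizing, since CPU is not always the constraint.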
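
The rule of thumb can be turned into a small classifier over CloudWatch-style datapoints. This is a sketch under assumptions: the 20% average threshold comes from the text, while the 60% peak threshold and the function name `rightsizing_hint` are illustrative choices, not AWS guidance.

```python
from statistics import mean


def rightsizing_hint(daily_avg_cpu: list[float], daily_max_cpu: list[float],
                     avg_threshold: float = 20.0, peak_threshold: float = 60.0) -> str:
    """Classify an instance from daily average and daily maximum CPU percentages."""
    if not daily_avg_cpu or not daily_max_cpu:
        return "no-data"
    low_average = mean(daily_avg_cpu) < avg_threshold
    if low_average and max(daily_max_cpu) < peak_threshold:
        return "downsize-candidate"
    if low_average:
        # Quiet on average but with real spikes: look closer before resizing
        return "low-average-but-bursty"
    return "keep"


if __name__ == "__main__":
    print(rightsizing_hint([10, 12, 8], [30, 35, 25]))
```

Looking at the maximum as well as the average guards against downsizing a service that idles most of the day but needs its headroom during a daily batch or traffic peak.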

Automated right-sizing with AWS Compute Optimizer:

# Enable Compute Optimizer
resource "aws_computeoptimizer_enrollment_status" "main" {
  status = "Active"
}

# Query recommendations via CLI
# aws compute-optimizer get-ec2-instance-recommendations

2. Delete Unused Resources

The most expensive resource is one you’re not using:

# Unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
  --output table

# Old snapshots (> 90 days)
aws ec2 describe-snapshots \
  --owner-ids self \
  --query 'Snapshots[?StartTime<=`2025-11-01`].[SnapshotId,VolumeSize,StartTime]' \
  --output table

# Unused Elastic IPs
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
  --output table

# Old AMIs
aws ec2 describe-images \
  --owners self \
  --query 'Images[?CreationDate<=`2025-01-01`].[ImageId,Name,CreationDate]' \
  --output table
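
To decide what to clean up first, it helps to put a price on each idle volume. The sketch below takes rows shaped like the unattached-volume query output (`[VolumeId, Size, CreateTime]`) and ranks them by estimated monthly cost. The $0.08 per GB-month rate is an assumption (gp3 in us-east-1); substitute the rate for your volume type and region. `rank_idle_volumes` is a hypothetical helper.

```python
# Assumed price: 0.08 USD per GB-month (gp3, us-east-1); replace with your own rate.
ASSUMED_USD_PER_GB_MONTH = 0.08


def rank_idle_volumes(rows: list[list], usd_per_gb_month: float = ASSUMED_USD_PER_GB_MONTH) -> list[tuple[str, float]]:
    """Return (volume_id, estimated_monthly_usd) pairs, most expensive first."""
    costed = [(vol_id, round(size_gb * usd_per_gb_month, 2)) for vol_id, size_gb, _created in rows]
    return sorted(costed, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rows = [["vol-a", 100, "2025-03-01"], ["vol-b", 500, "2024-11-12"]]
    for vol_id, cost in rank_idle_volumes(rows):
        print(f"{vol_id}: ${cost:.2f}/month")
```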

3. Reserved Instances / Savings Plans

If you have stable baseline usage, commit to it:

On-demand m5.xlarge: $0.192/hour = $140/month
1-year reserved (no upfront): $0.122/hour = $89/month
Savings: 36%

3-year reserved (all upfront): $0.076/hour = $55/month
Savings: 60%
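
The arithmetic behind these figures is worth having as a few functions, since the same calculation applies to any rate pair. The sketch assumes 730 hours per month (8,760 / 12) and uses the hourly rates quoted above, which vary by region and over time.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_cost(hourly_rate: float) -> float:
    """Cost of running one instance for a full month."""
    return hourly_rate * HOURS_PER_MONTH


def savings_pct(on_demand_hourly: float, committed_hourly: float) -> float:
    """Percentage saved by the committed rate relative to on-demand."""
    return (1 - committed_hourly / on_demand_hourly) * 100


def break_even_utilisation(on_demand_hourly: float, committed_hourly: float) -> float:
    """Fraction of the month an instance must run for the commitment to pay off."""
    return committed_hourly / on_demand_hourly


if __name__ == "__main__":
    print(round(monthly_cost(0.192)), round(monthly_cost(0.122)))
    print(round(savings_pct(0.192, 0.122)))
    print(round(break_even_utilisation(0.192, 0.122), 2))
```

The break-even figure is the useful one for the decision: with these rates the 1-year reservation only pays off if the instance runs for roughly 64% of the month or more, because the reservation is billed whether or not the instance is running.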

When to reserve:

Baseline load that’s always running
Databases (usually 24/7)
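Core infrastructure that runs on instances (bastion hosts, monitoring). Managed NAT gateways are billed per hour and per GB and cannot be reserved.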

When NOT to reserve:

Auto-scaled workloads (use Savings Plans instead)
Workloads you might eliminate
Anything you’re not sure about

4. Spot Instances for Fault-Tolerant Workloads

# EKS node group with spot instances
resource "aws_eks_node_group" "spot" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids
  capacity_type   = "SPOT"
  
  instance_types = ["m5.large", "m5a.large", "m5n.large", "m4.large"]
  
  scaling_config {
    desired_size = 3
    max_size     = 10
    min_size     = 1
  }
}

Spot can save 60-90% on compute, but instances can be reclaimed with two minutes' notice. Use for:

CI/CD runners
Batch processing
Stateless web servers (with proper load balancing)
Dev/test environments

5. Data Transfer Costs

Data transfer is the hidden killer:

Same AZ: Free (over private IPs)
Cross-AZ: $0.01/GB in each direction, so each GB moved costs $0.02 in total
To internet: $0.09/GB (first 10TB)
Cross-region: $0.02/GB (typical; varies by region pair)
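
A rough estimator makes it easier to see which path dominates a bill. The rates are copied from the list above and should be treated as illustrative; the path names and `transfer_cost` are hypothetical.

```python
# Rates from the list above (USD per GB); illustrative, check current pricing.
RATES_USD_PER_GB = {
    "same_az": 0.00,
    "cross_az": 0.02,      # 0.01 leaving one AZ plus 0.01 entering the other
    "internet": 0.09,
    "cross_region": 0.02,
}


def transfer_cost(gb_by_path: dict[str, float]) -> float:
    """Estimated monthly cost for the GB moved over each path."""
    return round(sum(RATES_USD_PER_GB[path] * gb for path, gb in gb_by_path.items()), 2)


if __name__ == "__main__":
    # 50 TB of cross-AZ traffic and 2 TB of internet egress
    print(transfer_cost({"cross_az": 50_000, "internet": 2_000}))
```

In this example the cross-AZ traffic accounts for $1,000 of the total against $180 for internet egress, which is why service-to-service chatter is often the larger line item despite the lower per-GB rate.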

Optimisations:

Keep chatty services in the same AZ
Use VPC endpoints (avoid NAT for AWS services)
Compress data before transfer
Cache aggressively (CloudFront, ElastiCache)

Team Cost Reviews

Make cost a regular topic, not a crisis response.

Monthly Cost Review Format

# Platform Team - January 2026 Cost Review

## Summary
- Total spend: $45,231 (+12% from December)
- Budget: $50,000 (90% utilized)
- February forecast: $48,000

## Top 5 Cost Drivers
1. EKS cluster compute: $18,000 (40%)
2. RDS databases: $12,000 (27%)
3. Data transfer: $6,000 (13%)
4. S3 storage: $4,000 (9%)
5. CloudWatch: $2,500 (6%)

## What Changed
- New ML pipeline added $3,000/month
- Scaled API servers for holiday traffic (+$2,000)
- Fixed NAT gateway redundancy (-$1,500)

## Action Items
- [ ] Right-size RDS dev instances (est. savings: $800/month)
- [ ] Enable S3 Intelligent-Tiering (est. savings: $400/month)
- [ ] Investigate CloudWatch cost spike

## Next Month Forecast
- Expecting $48,000 (holiday traffic normalizing)
- New feature launch may add $2,000
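
The percentages in a review like this are easy to generate rather than compute by hand. A minimal sketch, using the sample figures from the template; `cost_shares` and `budget_utilisation` are hypothetical helper names.

```python
def cost_shares(total_usd: float, drivers: dict[str, float]) -> dict[str, int]:
    """Share of total spend per cost driver, as whole percentages."""
    return {name: round(cost / total_usd * 100) for name, cost in drivers.items()}


def budget_utilisation(total_usd: float, budget_usd: float) -> int:
    """Percentage of the budget consumed, rounded to a whole percent."""
    return round(total_usd / budget_usd * 100)


if __name__ == "__main__":
    drivers = {"EKS": 18000, "RDS": 12000, "Data transfer": 6000, "S3": 4000, "CloudWatch": 2500}
    print(cost_shares(45231, drivers))
    print(budget_utilisation(45231, 50000))
```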

Cost Anomaly Detection

Set up automated alerts for unexpected changes:

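resource "aws_ce_anomaly_monitor" "service" {
  name              = "service-anomaly-monitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

# Email subscribers require DAILY or WEEKLY frequency
resource "aws_ce_anomaly_subscription" "email" {
  name             = "cost-anomaly-email"
  frequency        = "DAILY"
  monitor_arn_list = [aws_ce_anomaly_monitor.service.arn]

  subscriber {
    type    = "EMAIL"
    address = "platform-team@company.com"
  }

  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      values        = ["100"]  # Alert if anomaly >= $100
      match_options = ["GREATER_THAN_OR_EQUAL"]
    }
  }
}

# SNS subscribers require IMMEDIATE frequency, so they need their own subscription.
# The topic policy must allow costalerts.amazonaws.com to publish.
resource "aws_ce_anomaly_subscription" "sns" {
  name             = "cost-anomaly-sns"
  frequency        = "IMMEDIATE"
  monitor_arn_list = [aws_ce_anomaly_monitor.service.arn]

  subscriber {
    type    = "SNS"
    address = aws_sns_topic.cost_alerts.arn
  }

  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      values        = ["100"]
      match_options = ["GREATER_THAN_OR_EQUAL"]
    }
  }
}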
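
To build intuition for what an absolute-impact threshold means, here is a deliberately simple stand-in. It is not the model AWS Cost Anomaly Detection uses; it only compares today's spend with a trailing average and applies the same $100 floor. `is_anomalous` is a hypothetical helper.

```python
from statistics import mean


def is_anomalous(history_usd: list[float], today_usd: float, min_impact_usd: float = 100.0) -> bool:
    """Flag today's spend when it exceeds the trailing average by at least min_impact_usd."""
    if not history_usd:
        return False  # nothing to compare against yet
    return today_usd - mean(history_usd) >= min_impact_usd


if __name__ == "__main__":
    print(is_anomalous([200, 210, 190], 320))
```

An absolute floor like this keeps small services from paging anyone over a few dollars, at the price of missing large percentage jumps on cheap resources.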

Culture: Making Cost Part of Engineering

Code Review Checklist

Add cost considerations to your PR template:

## Cost Impact
- [ ] No new AWS resources
- [ ] New resources are right-sized
- [ ] Resources have required tags
- [ ] Considered spot/preemptible instances
- [ ] No hardcoded instance types (use variables)
- [ ] Infracost estimate reviewed

Engineering Scorecards

Include cost metrics alongside reliability and velocity:

| Metric | Target | Actual |
|--------|--------|--------|
| Deployment frequency | 10/week | 12/week ✅ |
| Change failure rate | <5% | 3% ✅ |
| Mean time to recovery | <1hr | 45min ✅ |
| Cost efficiency | <$5/1K requests | $4.20/1K ✅ |
| Resource utilization | >50% CPU avg | 62% ✅ |
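
The cost efficiency row is a unit cost: spend divided by work done. A minimal sketch of the calculation follows; the $42,000 and 10 million request figures are made up for illustration, and `cost_per_thousand_requests` is a hypothetical helper.

```python
def cost_per_thousand_requests(monthly_cost_usd: float, monthly_requests: int) -> float:
    """Unit cost in USD per 1,000 requests served."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return round(monthly_cost_usd / (monthly_requests / 1000), 2)


if __name__ == "__main__":
    # Illustrative numbers only
    print(cost_per_thousand_requests(42_000, 10_000_000))
```

Tracking the unit cost rather than the raw bill means a growing service is not penalised for growing: the bill can rise while the cost per request falls.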

Gamification (Use Carefully)

Some teams create friendly competition:

Monthly “Cost Cutter” award for biggest optimisation
Leaderboard of cost per team (normalised by traffic/value)
Share war stories of wasteful resources found

But don’t over-index on cost at the expense of velocity or reliability.

Tools to Consider

| Tool | Purpose | Cost |
|------|---------|------|
| Infracost | Cost estimates in PRs | Free tier available |
| AWS Cost Explorer | Native AWS cost analysis | Free |
| Kubecost | Kubernetes cost allocation | Free tier available |
| CloudHealth | Multi-cloud FinOps platform | Enterprise |
| Spot.io | Automated spot instance management | Percentage of savings |
| AWS Compute Optimizer | Right-sizing recommendations | Free |

Conclusion

FinOps isn’t about spending less - it’s about spending intentionally. Engineers should know:

What their services cost
Why they cost that much
Whether that cost is reasonable for the value delivered

The goal isn’t the cheapest infrastructure. It’s infrastructure where every dollar is a conscious choice, not an accident.

Start with tagging and visibility. Everything else follows.
