Blog Tags | Mo Abukar

#devops 64 posts

4 Mar 2026

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

#opentelemetry #observability #kubernetes #aws #platform-engineering #monitoring

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #service-catalog #sso #iam-identity-center #scps #platform-engineering #spacelift #security

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #opa #rego #iac #gitops #platform-engineering #modules #policy-as-code

9 Feb 2026

Migrating ClickHouse From EC2 to ClickHouse Cloud - Every Approach We Tried and Why Most Failed

S3 backup/restore, direct connectivity, Parquet exports - none of them worked cleanly. Here's the full war story of migrating a production ClickHouse instance to Cloud, the version mismatch that broke everything, and the dumb-simple approach that actually got the job done.

#clickhouse #aws #migration #database #production

3 Feb 2026

Platform Engineering in 2026 - It's About the Discipline, Not the Tools

Platform engineering has become the most misunderstood role in tech. Everyone's building 'platforms' but few understand what actually makes one successful. Here's what I've learned building platforms for teams of 10 to 500.

#platform-engineering #developer-experience #internal-platforms #idp

1 Feb 2026

Terraform State Surgery - Splitting, Moving, and Refactoring Without Downtime

A practical guide to breaking up monolithic Terraform state files, moving resources between states, and refactoring infrastructure safely. Includes real examples, scripts, and the exact commands we use.

#terraform #state #migration #refactoring #iac

28 Jan 2026

Running Clawdbot 24/7 on a Hetzner VPS – Terraform, Security Hardening, and the Bits the Docs Miss

A production-grade setup for Clawdbot on Hetzner Cloud with Terraform provisioning, proper SSH hardening, fail2ban, UFW, unattended-upgrades, and optional Tailscale – the stuff you actually need in prod.

#clawdbot #hetzner #terraform #vps #security #automation

27 Jan 2026

Clawdbot Manual Setup – Step-by-Step VPS Configuration with WhatsApp Integration

A detailed walkthrough for setting up Clawdbot on a Hetzner VPS from scratch – SSH hardening, firewall configuration, Tailscale, and WhatsApp Business integration using a dedicated number.

#clawdbot #hetzner #vps #whatsapp #security #tutorial

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#gitlab #kubernetes #aks #azure #helm #self-hosted #startup

15 Jan 2026

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

#dora #metrics #engineering-culture #cicd #platform-engineering

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #aws #infrastructure #platform-engineering #lambda #ecs #terraform #networking

10 Jan 2026

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

#mlops #kubernetes #machine-learning #platform-engineering #infrastructure

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #jvm #ec2 #debugging #performance #memory #threads #linux

18 Dec 2025

Pod Topology Spread Constraints - Distributing Workloads Intelligently

Control how pods spread across nodes, zones, and regions. A deep dive into topology spread constraints for high availability and efficient resource utilization.

#kubernetes #scheduling #high-availability #pods

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #ecs #fargate #docker #aws #containers #migration #terraform

5 Dec 2025

The Fast Feedback Loop - Local Development with Kind, LocalStack, and Act

Combine Kind, LocalStack, and Act for a complete local development environment. Test Kubernetes, AWS services, and CI pipelines without leaving your laptop.

#kind #localstack #act #kubernetes #aws #development

20 Nov 2025

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

#localstack #aws #testing #development #docker

19 Nov 2025

GitHub Actions OIDC – Ditch the AWS Access Keys Forever

How to authenticate GitHub Actions to AWS without storing secrets. OIDC federation explained, IAM role setup, and the token claims that control access.

#github-actions #oidc #aws #iam #security #cicd

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #control-tower #account-factory #terraform #service-catalog #organizations #sso #platform-engineering

8 Nov 2025

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

#github-actions #ci-cd #act #testing #automation

25 Oct 2025

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

#aws #terraform #tagging #finops #governance

15 Oct 2025

Migrating 30 Repos from Jenkins to GitHub Actions – The Complete Runbook

A battle-tested playbook for migrating CI/CD pipelines from Jenkins to GitHub Actions at scale. Covers OIDC authentication, parallel running, secrets migration, and the gotchas that will bite you.

#github-actions #jenkins #cicd #migration #aws #oidc

12 Oct 2025

Container Image Signing with Cosign - A Practical Guide

Sign and verify container images without managing keys. A hands-on guide to Cosign, keyless signing, and enforcing signatures in Kubernetes.

#security #cosign #containers #sigstore #kubernetes

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #aws #ecs #rds #cognito #terraform #docker #platform-engineering

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

#kubernetes #postgresql #s3 #backup #cronjob #localstack #databases

28 Sept 2025

Terraform Best Practices (Part 2) - Testing, CI/CD, Security, and Team Workflows

Advanced Terraform practices covering testing strategies, CI/CD pipelines, security hardening, drift detection, and team collaboration patterns for infrastructure as code at scale.

#terraform #iac #cicd #testing #security

25 Sept 2025

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

#etl #python #airflow #postgresql #docker #data-engineering

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#security #soc #elasticsearch #cribl #docker #homelab #siem

20 Sept 2025

Terraform Best Practices (Part 1) - Project Structure, State, and Modules

A comprehensive guide to Terraform best practices covering project organisation, state management, module design, and foundational patterns for scalable infrastructure as code.

#terraform #iac #aws #best-practices

15 Sept 2025

K3s Homelab Setup Guide - Running Kubernetes on Raspberry Pi 5

Build a lightweight Kubernetes cluster on three Raspberry Pi 5 devices. Step-by-step guide covering K3s installation, cluster configuration, and deployment testing.

#kubernetes #k3s #raspberry-pi #homelab #containers

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #oracle #migration #aws #dms #terraform #event-sourcing #platform-engineering

5 Sept 2025

Software Supply Chain Security - Sigstore, SLSA, and Beyond

Your dependencies are an attack vector. Here's how to secure your software supply chain with Sigstore, SLSA frameworks, SBOMs, and admission policies that actually work.

#security #supply-chain #sigstore #slsa #sbom #kubernetes

25 Aug 2025

Serverless Container Framework - Deploy Containers to Lambda and Fargate with Ease

Deploy containerised applications to AWS Lambda or Fargate with a simple YAML config. No infrastructure code required - just define your containers and deploy.

#serverless #containers #aws #lambda #fargate #docker

12 Aug 2025

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

#sre #reliability #on-call #monitoring #incident-management

10 Aug 2025

FinOps for Engineering Teams - Making Cost Everyone's Problem

Cloud cost management isn't just for finance. Here's how engineering teams can build cost awareness into their workflow without slowing down delivery.

#finops #cloud #aws #cost-optimization #engineering

18 Jul 2025

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

#kubernetes #debugging #containers #production #kubectl

19 Jun 2025

Kubernetes Sidecar Startup Order - Making Your Main App Wait

How to ensure sidecar containers are ready before your main app starts. Covers startupProbe, postStart hooks, and why readinessProbe doesn't do what you think.

#kubernetes #sidecars #pods #containers

15 May 2025

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

#interviews #career #kubernetes #aws #terraform #sre

15 May 2025

Incident Management That Actually Works

Most incident processes are theatre. Here's how to build incident management that reduces downtime, prevents recurrence, and doesn't burn out your team.

#incident-management #sre #on-call #post-mortems

15 May 2025

Kubernetes Cluster Upgrades: Production-Ready Guide

Technical guide for upgrading managed Kubernetes clusters across GKE, EKS, and AKS.

#kubernetes #gke #eks #aks #cluster-management

15 Apr 2025

GKE Upgrade Guide and Rollback Strategy: A Production-Ready Approach

Comprehensive guide for safely upgrading GKE clusters with minimal downtime and robust rollback procedures.

#kubernetes #gke #google-cloud #cluster-management

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#ecs #aws #blue-green #deployments #task-sets #fargate #terraform

15 Feb 2025

Lessons From 5 Years of Kubernetes in Production – Cluster Crashes, Ditching Self-Managed, Cost Cuts, and the Tooling That Actually Works

Two major cluster crashes, migrating from kops to EKS, slashing compute costs with Karpenter, and the observability stack we rebuilt three times.

#kubernetes #eks #aws #production #karpenter #observability

21 Jan 2025

Working with Databases in Kubernetes: Connections, Dumps and Data Extraction

A practical guide to connecting to PostgreSQL databases in Kubernetes – exec into pods, VPN access, SOCKS5 proxies, pg_dump, kubectl cp and getting data out when you need it.

#kubernetes #postgresql #database #kubectl #socks5 #pg_dump

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#nginx #incident #log-rotation #linux #on-call #production #war-stories

15 Dec 2024

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

#kubernetes #cost-optimization #resource-management #cloud #finops

20 Nov 2024

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

#kubernetes #service-mesh #istio #linkerd #cilium #networking

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#packer #ami #aws #terraform #ci-cd #immutable-infrastructure #security

22 Jul 2024

Building an Internal Developer Platform

A practical guide to building an IDP that developers actually want to use. Covers the build vs buy decision, Backstage implementation, and the organisational changes required for success.

#platform-engineering #idp #backstage #developer-experience

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #udp #ecs #cloudmap #traefik #service-discovery #aws #networking

5 Apr 2023

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

#kubernetes #startups #architecture #infrastructure #hot-takes

15 Mar 2023

Container Networking Deep Dive Part 1: Single Network Namespace on a VM

In the first part of our Container Networking Deep Dive, we explore how to set up a single network namespace inside a VM and connect it to the host using a veth pair.

#linux #networking #namespaces #containers

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubernetes #kubectl #api-server #etcd #controllers #scheduler #kubelet

15 Oct 2021

How to Increase EBS Disk Size on EC2 (Without Downtime)

Online EBS volume resizing for running instances – the IaC way with Terraform and ASG instance refresh, plus the manual escape hatch when you need it now. No reboot required.

#aws #ebs #ec2 #terraform #disk #storage

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #traefik #blue-green #canary #deployments #aws #sigv4 #ci-cd

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#mtls #traefik #tls #security #certificates #smallstep #pki

15 Feb 2020

The Ultimate Pathway to DevOps Revamped

A practical roadmap into DevOps for engineers starting out: what to learn, in what order, and where the genuine value is vs the hype.

#roadmap #aws #platform #engineering

15 Jan 2020

Deploying Vault with a Custom AMI

An end-to-end guide for baking a Vault AMI using Packer and deploying a Vault EC2 instance on AWS.

#vault #aws #packer #ami

15 Aug 2019

ECS Fargate Deep Dive Part 1: How Fargate Really Works

In the first part of our ECS Fargate Deep Dive, we break down what happens behind the scenes when you run a task on Fargate: Firecracker microVMs, ENIs, IAM and the hidden host fleet.

#aws #ecs #fargate #containers

15 Jul 2019

ECS Fargate Deep Dive Part 2: Firecracker in Action

In the second part of our ECS Fargate Deep Dive, we get hands-on with Firecracker, the lightweight VMM that powers Fargate, and simulate task isolation and networking locally.

#aws #ecs #fargate #firecracker #containers

15 Apr 2019

Helm Atomics: The Flag That Saves Your Production Deploys (And Its Hidden Gotchas)

Deep dive into Helm's --atomic, --wait, and --cleanup-on-fail flags. How they work, when to use them, the CI/CD pipeline trap that catches everyone, and production-ready deployment patterns.

#helm #kubernetes #cicd #deployments #rollback

15 Mar 2019

The Terraform State Chicken-and-Egg Problem – And Why Bootstrapping Is Just Physics

You can't use Terraform to create the S3 bucket that stores Terraform state. Here's how to bootstrap your remote backend properly, plus the philosophical reason this pattern exists everywhere in software.

#terraform #aws #s3 #dynamodb #infrastructure #state-management

15 Feb 2019

Creating a Lab Container

An end-to-end guide for creating a lab container for DevOps training.

#docker #lab #container

15 Jan 2019

The ever-changing landscape of modern applications

A look at the ever-changing landscape of modern applications.

#kubernetes #docker #containers

#kubernetes 61 posts

7 Mar 2026

Building a Production-Grade Homelab with K3s, Vault, and FluxCD

How I built a fully GitOps-managed Kubernetes homelab on a single mini PC - from unboxing to production. Proxmox bare metal install, K3s cluster, HashiCorp Vault secrets, full observability, and Cloudflare Tunnel.

#k3s #homelab #gitops #fluxcd #hashicorp-vault

4 Mar 2026

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

#opentelemetry #observability #aws #devops #platform-engineering #monitoring

6 Feb 2026

Identity Aware Proxy: Zero Trust Access for Internal Applications

Deep dive into Identity Aware Proxies - what they are, how they work, and how to implement them with GCP IAP, Pomerium, and OAuth2-Proxy. Includes Terraform and Kubernetes examples.

#identity-aware-proxy #zero-trust #security #terraform #oauth2

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#gitlab #aks #azure #helm #devops #self-hosted #startup

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#finops #cloud-costs #eks #multi-tenant #saas #unit-economics #aws

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#aws #infrastructure #devops #platform-engineering #lambda #ecs #terraform #networking

10 Jan 2026

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

#mlops #devops #machine-learning #platform-engineering #infrastructure

31 Dec 2025

Dragonfly vs Redis: Modern In-Memory Store Comparison

Compare Dragonfly and Redis for caching and data storage: Dragonfly's multi-threaded architecture versus Redis's single-threaded model.

#dragonfly #redis #caching #database #performance

28 Dec 2025

Vitess for MySQL: Horizontal Sharding Done Right

Scale MySQL horizontally with Vitess. Automatic sharding, online schema changes, and Kubernetes-native deployment for massive scale.

#vitess #mysql #database #sharding #scaling

24 Dec 2025

NATS JetStream: Lightweight Alternative to Kafka

Deploy NATS JetStream for messaging and streaming. Simpler than Kafka, faster than RabbitMQ, with persistence and exactly-once delivery.

#nats #jetstream #messaging #streaming #microservices

20 Dec 2025

VPA + HPA Together: The Right Way to Autoscale Both

Use Vertical Pod Autoscaler and Horizontal Pod Autoscaler together without conflicts. Includes KEDA integration and best practices.

#autoscaling #vpa #hpa #keda #performance

18 Dec 2025

Pod Topology Spread Constraints - Distributing Workloads Intelligently

Control how pods spread across nodes, zones, and regions. A deep dive into topology spread constraints for high availability and efficient resource utilization.

#scheduling #high-availability #pods #devops

16 Dec 2025

FinOps Automation: Kubecost, OpenCost, and Automated Rightsizing

Implement automated cloud cost optimization with Kubecost and OpenCost. Track costs per team, rightsize resources, and automate savings.

#finops #kubecost #opencost #cost-optimization #observability

12 Dec 2025

Spot Instance Patterns: Graceful Handling and Cost Savings

Master AWS Spot Instances in production. Handle interruptions gracefully, use mixed instance groups, and save 60-90% on compute costs.

#aws #spot-instances #cost-optimization #eks #reliability

8 Dec 2025

Karpenter Deep Dive: Node Provisioning That Actually Works

Master Karpenter for Kubernetes node autoscaling. Replace Cluster Autoscaler with faster, smarter provisioning. Includes cost optimization patterns.

#karpenter #autoscaling #aws #eks #cost-optimization

5 Dec 2025

The Fast Feedback Loop - Local Development with Kind, LocalStack, and Act

Combine Kind, LocalStack, and Act for a complete local development environment. Test Kubernetes, AWS services, and CI pipelines without leaving your laptop.

#devops #kind #localstack #act #aws #development

4 Dec 2025

Progressive Delivery with Flagger: Automated Canary Deployments

Implement automated canary deployments with Flagger. Metrics-based promotion, automated rollback, and integration with Istio, Linkerd, and Gateway API.

#flagger #canary #progressive-delivery #gitops #deployment

22 Nov 2025

Chaos Engineering with Litmus: Controlled Failure Injection

Implement chaos engineering in Kubernetes with LitmusChaos. Run pod failures, network chaos, and stress tests to validate system resilience.

#chaos-engineering #litmus #reliability #sre #testing

10 Nov 2025

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

#kyverno #opa #gatekeeper #policy #security

6 Nov 2025

Crossplane Compositions: Build Your Own Cloud API

Create custom cloud APIs with Crossplane Compositions. Abstract away complexity and give developers self-service infrastructure with guardrails.

#crossplane #platform-engineering #infrastructure #gitops

29 Oct 2025

Gateway API Advanced Patterns: Beyond Basic Ingress

Master Gateway API with traffic splitting, header-based routing, cross-namespace references, and TLS passthrough. The future of Kubernetes ingress.

#gateway-api #ingress #networking #traffic-management

21 Oct 2025

Cilium Service Mesh: Sidecar-Free with eBPF

Deploy a service mesh without sidecars using Cilium. Get mTLS, traffic management, and observability powered by eBPF at the kernel level.

#cilium #service-mesh #ebpf #mtls #networking

17 Oct 2025

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

#secretless #security #zero-trust #secrets-management #sidecar

12 Oct 2025

Container Image Signing with Cosign - A Practical Guide

Sign and verify container images without managing keys. A hands-on guide to Cosign, keyless signing, and enforcing signatures in Kubernetes.

#security #cosign #containers #sigstore #devops

12 Oct 2025

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

#opa #gatekeeper #policy-as-code #security #admission-control

8 Oct 2025

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

#databases #postgresql #stateful #operators #storage

7 Oct 2025

eBPF for Security: Kernel-Level Observability Without Agents

Deep dive into eBPF-based security tools - Cilium, Falco, and Tetragon. Learn how to implement runtime security, network policies, and threat detection at the kernel level.

#ebpf #security #cilium #falco #tetragon

3 Oct 2025

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

#spiffe #spire #zero-trust #security #mtls

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

#postgresql #s3 #backup #cronjob #localstack #devops #databases

15 Sept 2025

K3s Homelab Setup Guide - Running Kubernetes on Raspberry Pi 5

Build a lightweight Kubernetes cluster on three Raspberry Pi 5 devices. Step-by-step guide covering K3s installation, cluster configuration, and deployment testing.

#k3s #raspberry-pi #homelab #devops #containers

8 Sept 2025

NetworkPolicy Default Deny – The One Rule We Add to Every Namespace

Why your Kubernetes cluster is wide open by default, and the single NetworkPolicy that changes everything. Copy, paste, deploy, sleep better.

#security #networkpolicy #networking #zero-trust

5 Sept 2025

Software Supply Chain Security - Sigstore, SLSA, and Beyond

Your dependencies are an attack vector. Here's how to secure your software supply chain with Sigstore, SLSA frameworks, SBOMs, and admission policies that actually work.

#security #supply-chain #sigstore #slsa #sbom #devops

10 Aug 2025

Pod Security Standards Enforcement - The PSP Replacement That Actually Works

How to enforce Pod Security Standards using the built-in Pod Security Admission controller. Covers Privileged, Baseline, and Restricted profiles, migration from PSPs, namespace labeling, and exemptions.

#security #pod-security #psp #admission-controller #hardening

18 Jul 2025

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

#debugging #containers #production #kubectl #devops

15 Jul 2025

External Secrets Operator with AWS Secrets Manager - Stop Mounting Secrets in ConfigMaps

How to use External Secrets Operator to sync AWS Secrets Manager secrets to Kubernetes. Covers SecretStore, ExternalSecret, IAM with IRSA, templating, and production patterns.

#external-secrets #aws #secrets-manager #security #gitops

22 Jun 2025

The Kubernetes ndots:5 Problem – Why DNS Lookups Take 15 Seconds

A deep dive into why external DNS resolution in Kubernetes can be painfully slow, how the default ndots:5 setting causes unnecessary lookups, and practical fixes that actually work.

#dns #networking #coredns #performance #debugging

19 Jun 2025

Kubernetes Sidecar Startup Order - Making Your Main App Wait

How to ensure sidecar containers are ready before your main app starts. Covers startupProbe, postStart hooks, and why readinessProbe doesn't do what you think.

#sidecars #pods #containers #devops

15 May 2025

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

#devops #interviews #career #aws #terraform #sre

15 May 2025

Kubernetes Cluster Upgrades: Production-Ready Guide

Technical guide for upgrading managed Kubernetes clusters across GKE, EKS, and AKS.

#gke #eks #aks #cluster-management #devops

15 Apr 2025

GKE Upgrade Guide and Rollback Strategy: A Production-Ready Approach

Comprehensive guide for safely upgrading GKE clusters with minimal downtime and robust rollback procedures.

#gke #google-cloud #devops #cluster-management

18 Mar 2025

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

#opentelemetry #observability #tracing #metrics #logging

15 Mar 2025

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

#gateway-api #ingress #networking #traffic-management #k8s

15 Feb 2025

Lessons From 5 Years of Kubernetes in Production – Cluster Crashes, Ditching Self-Managed, Cost Cuts, and the Tooling That Actually Works

Two major cluster crashes, migrating from kops to EKS, slashing compute costs with Karpenter, and the observability stack we rebuilt three times.

#eks #aws #production #devops #karpenter #observability

21 Jan 2025

Working with Databases in Kubernetes: Connections, Dumps and Data Extraction

A practical guide to connecting to PostgreSQL databases in Kubernetes – exec into pods, VPN access, SOCKS5 proxies, pg_dump, kubectl cp and getting data out when you need it.

#postgresql #database #kubectl #devops #socks5 #pg_dump

15 Dec 2024

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

#cost-optimization #resource-management #devops #cloud #finops

20 Nov 2024

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

#service-mesh #istio #linkerd #cilium #networking #devops

18 Mar 2024

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

#gitops #argocd #cicd #deployment #automation

15 Apr 2023

Cilium in Kubernetes

Hands-on with Cilium CNI on a local kind cluster — installation, eBPF datapath verification, network policies and Hubble observability.

#cilium

5 Apr 2023

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

#startups #architecture #infrastructure #devops #hot-takes

15 Oct 2022

EKS Private Network with Twingate

How to set up a private network for your EKS cluster with Twingate

#eks #twingate #private #network

15 Aug 2022

Secure Gateways: Configuring Mutual TLS using Gateway API on GKE

In this blog, we configure mutual TLS (mTLS) using Gateway API on GKE, securing ingress traffic with client certificate validation.

#gateway-api #mtls #gke #security

15 May 2022

SPIFFE and SPIRE in Kubernetes

Secure Your Kubernetes with SPIFFE + SPIRE: Zero-Trust Identity for Workloads

#spiffe #spire

15 Mar 2022

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

#dns #security #coredns #arp #net_raw #mitm

15 Feb 2022

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

#azure #aks #vpn #twingate #private-cluster #networking

15 Jan 2022

Apache Pulsar Playground: Running Pulsar Locally on kind with Dashboards, Clients, and Admin Tools

In this blog, I'll walk you through setting up a full-featured Apache Pulsar playground using kind (Kubernetes in Docker). Whether you're testing Pulsar for learning or demoing a real pub/sub model with admin tools and monitoring, this setup gives you everything.

#apache-pulsar #kind #helm #messaging #pubsub #devtools

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubectl #api-server #etcd #controllers #scheduler #kubelet #devops

15 May 2021

AWS Controllers for Kubernetes

Manage AWS resources from Kubernetes manifests using AWS Controllers for Kubernetes (ACK). End-to-end demo on kind covering setup, RDS provisioning and the trade-offs vs Terraform.

#aws #ack

15 Apr 2020

Kubernetes Networking: A Deep Dive From First Principles

How packets actually flow in Kubernetes – from veth pairs to CNI plugins to kube-proxy modes. With AWS/EKS context throughout.

#networking #eks #aws #cni #cilium #calico

15 Mar 2020

Serverless containers in Kubernetes with Fargate (Part 2) — Hands-on

A hands-on article on deploying an application on Kubernetes with Fargate.

#fargate #eks #aws

15 Apr 2019

Helm Atomics: The Flag That Saves Your Production Deploys (And Its Hidden Gotchas)

Deep dive into Helm's --atomic, --wait, and --cleanup-on-fail flags. How they work, when to use them, the CI/CD pipeline trap that catches everyone, and production-ready deployment patterns.

#helm #devops #cicd #deployments #rollback

15 Jan 2019

The ever-changing landscape of modern applications

A look at the ever-changing landscape of modern applications

#devops #docker #containers

#aws 55 posts

4 Mar 2026

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

#opentelemetry #observability #kubernetes #devops #platform-engineering #monitoring

24 Feb 2026

AWS Control Tower Account Factory - The Gotchas Nobody Tells You

Real-world lessons from automating AWS account provisioning with Control Tower, Service Catalog, and Terraform. The silent failures, IAM traps, and StackSet timing issues that cost us days.

#control-tower #terraform #service-catalog #iam #platform-engineering #multi-account

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#control-tower #terraform #multi-account #organizations #service-catalog #sso #iam-identity-center #scps #platform-engineering #spacelift #security #devops

9 Feb 2026

Migrating ClickHouse From EC2 to ClickHouse Cloud - Every Approach We Tried and Why Most Failed

S3 backup/restore, direct connectivity, Parquet exports - none of them worked cleanly. Here's the full war story of migrating a production ClickHouse instance to Cloud, the version mismatch that broke everything, and the dumb-simple approach that actually got the job done.

#clickhouse #migration #database #devops #production

2 Feb 2026

Implementing Vertical Autoscaling for Aurora Databases Using Lambda Functions

AWS doesn't offer vertical autoscaling for Aurora – so we built it. CloudWatch Alarms, SNS, Lambda coordination, and the gotchas we hit in production.

#aurora #rds #lambda #autoscaling #terraform #serverless

30 Jan 2026

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

#terraform #iac #migration #s3 #state-management #hcl2

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#finops #cloud-costs #kubernetes #eks #multi-tenant #saas #unit-economics

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #infrastructure #devops #platform-engineering #lambda #ecs #terraform #networking

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #ecs #fargate #docker #containers #migration #terraform #devops

12 Dec 2025

Spot Instance Patterns: Graceful Handling and Cost Savings

Master AWS Spot Instances in production. Handle interruptions gracefully, use mixed instance groups, and save 60-90% on compute costs.

#spot-instances #kubernetes #cost-optimization #eks #reliability

8 Dec 2025

Karpenter Deep Dive: Node Provisioning That Actually Works

Master Karpenter for Kubernetes node autoscaling. Replace Cluster Autoscaler with faster, smarter provisioning. Includes cost optimization patterns.

#karpenter #kubernetes #autoscaling #eks #cost-optimization

5 Dec 2025

The Fast Feedback Loop - Local Development with Kind, LocalStack, and Act

Combine Kind, LocalStack, and Act for a complete local development environment. Test Kubernetes, AWS services, and CI pipelines without leaving your laptop.

#devops #kind #localstack #act #kubernetes #development

20 Nov 2025

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

#localstack #testing #development #devops #docker

19 Nov 2025

GitHub Actions OIDC – Ditch the AWS Access Keys Forever

How to authenticate GitHub Actions to AWS without storing secrets. OIDC federation explained, IAM role setup, and the token claims that control access.

#github-actions #oidc #iam #security #cicd #devops

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#control-tower #account-factory #terraform #service-catalog #organizations #sso #platform-engineering #devops

2 Nov 2025

AWS PrivateLink Deep Dive: Private Connectivity Patterns

Master AWS PrivateLink for private API access, cross-account connectivity, and SaaS integrations. Includes Terraform examples and multi-region patterns.

#privatelink #networking #vpc #terraform #security

25 Oct 2025

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

#terraform #tagging #finops #governance #devops

15 Oct 2025

Migrating 30 Repos from Jenkins to GitHub Actions – The Complete Runbook

A battle-tested playbook for migrating CI/CD pipelines from Jenkins to GitHub Actions at scale. Covers OIDC authentication, parallel running, secrets migration, and the gotchas that will bite you.

#github-actions #jenkins #cicd #devops #migration #oidc

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #ecs #rds #cognito #terraform #docker #devops #platform-engineering

20 Sept 2025

Terraform Best Practices (Part 1) - Project Structure, State, and Modules

A comprehensive guide to Terraform best practices covering project organisation, state management, module design, and foundational patterns for scalable infrastructure as code.

#terraform #iac #devops #best-practices

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #oracle #migration #dms #terraform #event-sourcing #platform-engineering #devops

25 Aug 2025

Serverless Container Framework - Deploy Containers to Lambda and Fargate with Ease

Deploy containerised applications to AWS Lambda or Fargate with a simple YAML config. No infrastructure code required - just define your containers and deploy.

#serverless #containers #lambda #fargate #docker #devops

10 Aug 2025

FinOps for Engineering Teams - Making Cost Everyone's Problem

Cloud cost management isn't just for finance. Here's how engineering teams can build cost awareness into their workflow without slowing down delivery.

#finops #cloud #cost-optimization #devops #engineering

15 Jul 2025

External Secrets Operator with AWS Secrets Manager - Stop Mounting Secrets in ConfigMaps

How to use External Secrets Operator to sync AWS Secrets Manager secrets to Kubernetes. Covers SecretStore, ExternalSecret, IAM with IRSA, templating, and production patterns.

#kubernetes #external-secrets #secrets-manager #security #gitops

15 Jul 2025

Why I replaced AWS NAT Gateway with a NAT Instance - and saved $20 per month

AWS offers NAT Gateways as the default, fully managed solution for letting private subnet resources reach the internet. However, NAT Gateways can be pricey: an hourly cost of ~₹3.75/hour (varies by region), plus a data transfer cost of an additional ₹3.75/GB on top of standard data transfer. For small dev/test environments or personal labs, these costs can add up quickly. In contrast, a NAT Instance is just a normal EC2 instance configured to perform IP forwarding and NAT. It’s typically much cheaper to run a small instance (`t3.micro`) than a NAT Gateway, especially if your traffic volume is modest.

#nat #gateway #instance #cost #savings

22 Jun 2025

NAT Gateway Alternatives - Cutting Your AWS Bill Without Losing Sleep

NAT Gateways are the silent budget killer in AWS. Here's how to reduce costs with NAT instances, VPC endpoints, IPv6, and architectural changes - with real numbers and trade-offs.

#nat #networking #cost-optimization #vpc #finops

15 Jun 2025

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

#eks #networking #cni #ip-exhaustion #prefix-delegation

5 Jun 2025

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

#vpc #privatelink #networking #security #endpoints #terraform

15 May 2025

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

#devops #interviews #career #kubernetes #terraform #sre

10 May 2025

AWS Service Control Policies (SCPs) - Guardrails for Your Organization

How to use SCPs to set permission guardrails across your AWS Organization. Covers SCP evaluation logic, deny vs allow strategies, common patterns, and production-ready Terraform examples.

#organizations #scps #security #iam #governance #terraform

20 Apr 2025

AWS Config Rules with Auto Remediation - Enforce Compliance Automatically

How to use AWS Config Rules to detect compliance violations and automatically remediate them using SSM Automation documents. Covers managed rules, custom rules, remediation actions, and complete Terraform examples.

#aws-config #compliance #security #automation #ssm #terraform

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#ecs #blue-green #deployments #task-sets #fargate #terraform #devops

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#lambda #rds #rds-proxy #serverless #databases #terraform #connection-pooling

15 Feb 2025

Lessons From 5 Years of Kubernetes in Production – Cluster Crashes, Ditching Self-Managed, Cost Cuts, and the Tooling That Actually Works

Two major cluster crashes, migrating from kops to EKS, slashing compute costs with Karpenter, and the observability stack we rebuilt three times.

#kubernetes #eks #production #devops #karpenter #observability

15 Sept 2024

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

#terraform #security #networking #prefix-lists #security-groups #vpc

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#packer #ami #terraform #ci-cd #devops #immutable-infrastructure #security

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #udp #ecs #cloudmap #traefik #service-discovery #networking #devops

15 Jul 2023

How we migrated our CDN to AWS CloudFront at Trainline

Migrating the Trainline CDN to AWS CloudFront — traffic shaping with Lambda@Edge, the cache-hit ratio we landed on, and the production gotchas behind the cutover.

#cdn #cloudfront #trainline

15 Jun 2023

Private API Gateway - Part 2: Secure Cross-VPC Access with PrivateLink and IAM Authentication

Extend your private API Gateway with secure access from other VPCs using PrivateLink and enforce IAM-based authentication.

#api-gateway #vpc #privatelink #security #iam

15 Jun 2022

Route 53 Deep Dive: Multi-Region Latency Routing with Health-Based Failover

A hands-on guide to configuring AWS Route 53 for latency-based routing across multiple regions, incorporating health checks for automatic failover.

#route53 #dns #terraform #failover #latency-routing

15 Apr 2022

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. To overcome these constraints, deploying Calico with IPIP or BGP offers a scalable alternative.

#eks #calico #cni #networking #bgp #ipip

15 Oct 2021

How to Increase EBS Disk Size on EC2 (Without Downtime)

Online EBS volume resizing for running instances – the IaC way with Terraform and ASG instance refresh, plus the manual escape hatch when you need it now. No reboot required.

#ebs #ec2 #terraform #disk #storage #devops

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #traefik #blue-green #canary #deployments #sigv4 #devops #ci-cd

15 May 2021

AWS Controllers for Kubernetes

Manage AWS resources from Kubernetes manifests using AWS Controllers for Kubernetes (ACK). End-to-end demo on kind covering setup, RDS provisioning and the trade-offs vs Terraform.

#kubernetes #ack

15 Jun 2020

Securing APIs in AWS: Private API Gateway + VPC Endpoint Deep Dive

Learn how to deploy a secure, private-only API Gateway inside your VPC using interface endpoints, resource policies, and VPC integration.

#api-gateway #vpc #networking #security #terraform

15 May 2020

AWS PrivateLink with Terraform

A hands-on technical guide to implementing AWS PrivateLink between VPCs using Terraform.

#privatelink #vpc #terraform #networking #security

15 Apr 2020

Kubernetes Networking: A Deep Dive From First Principles

How packets actually flow in Kubernetes – from veth pairs to CNI plugins to kube-proxy modes. With AWS/EKS context throughout.

#kubernetes #networking #eks #cni #cilium #calico

15 Mar 2020

Serverless containers in Kubernetes with Fargate (Part 2) — Hands-on

A hands-on article on deploying an application on Kubernetes with Fargate.

#kubernetes #fargate #eks

15 Feb 2020

The Ultimate Pathway to DevOps Revamped

A practical roadmap into DevOps for engineers starting out — what to learn, in what order, and where the genuine value is vs the hype.

#devops #roadmap #platform #engineering

15 Jan 2020

Deploying Vault with a Custom AMI

An end-to-end guide for baking a Vault AMI using Packer and deploying a Vault EC2 instance on AWS.

#vault #packer #ami #devops

15 Aug 2019

ECS Fargate Deep Dive Part 1: How Fargate Really Works

In the first part of our ECS Fargate Deep Dive, we break down what happens behind the scenes when you run a task on Fargate — Firecracker microVMs, ENIs, IAM and the hidden host fleet.

#ecs #fargate #containers #devops

15 Jul 2019

ECS Fargate Deep Dive Part 2: Firecracker in Action

In the second part of our ECS Fargate Deep Dive, we get hands-on with Firecracker — the lightweight VMM that powers Fargate — and simulate task isolation and networking locally.

#ecs #fargate #firecracker #containers #devops

15 Jun 2019

Solving the AWS OIDC Chicken-and-Egg Problem with GitHub Actions

#github #oidc

15 May 2019

BGP in the Cloud – A Deep Dive into AWS Direct Connect, Routing Pathologies, and What Breaks in Production

A production-focused deep dive into how BGP actually behaves over AWS Direct Connect – route selection, failover, ASN design, MEDs, prepending, blackholing scenarios, and the real-world issues teams hit at scale.

#bgp #direct-connect #networking #hybrid-cloud #production #routing

15 Mar 2019

The Terraform State Chicken-and-Egg Problem – And Why Bootstrapping Is Just Physics

You can't use Terraform to create the S3 bucket that stores Terraform state. Here's how to bootstrap your remote backend properly, plus the philosophical reason this pattern exists everywhere in software.

#terraform #s3 #dynamodb #infrastructure #devops #state-management

#terraform 31 posts

24 Feb 2026

AWS Control Tower Account Factory - The Gotchas Nobody Tells You

Real-world lessons from automating AWS account provisioning with Control Tower, Service Catalog, and Terraform. The silent failures, IAM traps, and StackSet timing issues that cost us days.

#aws #control-tower #service-catalog #iam #platform-engineering #multi-account

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #multi-account #organizations #service-catalog #sso #iam-identity-center #scps #platform-engineering #spacelift #security #devops

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #opa #rego #iac #gitops #platform-engineering #devops #modules #policy-as-code

6 Feb 2026

Identity Aware Proxy: Zero Trust Access for Internal Applications

Deep dive into Identity Aware Proxies - what they are, how they work, and how to implement them with GCP IAP, Pomerium, and OAuth2-Proxy. Includes Terraform and Kubernetes examples.

#identity-aware-proxy #zero-trust #security #kubernetes #oauth2

2 Feb 2026

Implementing Vertical Autoscaling for Aurora Databases Using Lambda Functions

AWS doesn't offer vertical autoscaling for Aurora – so we built it. CloudWatch Alarms, SNS, Lambda coordination, and the gotchas we hit in production.

#aurora #rds #aws #lambda #autoscaling #serverless

1 Feb 2026

Terraform State Surgery - Splitting, Moving, and Refactoring Without Downtime

A practical guide to breaking up monolithic Terraform state files, moving resources between states, and refactoring infrastructure safely. Includes real examples, scripts, and the exact commands we use.

#state #migration #refactoring #iac #devops

30 Jan 2026

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

#iac #migration #aws #s3 #state-management #hcl2

28 Jan 2026

Running Clawdbot 24/7 on a Hetzner VPS – Terraform, Security Hardening, and the Bits the Docs Miss

A production-grade setup for Clawdbot on Hetzner Cloud with Terraform provisioning, proper SSH hardening, fail2ban, UFW, unattended-upgrades, and optional Tailscale – the stuff you actually need in prod.

#clawdbot #hetzner #vps #security #devops #automation

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #aws #infrastructure #devops #platform-engineering #lambda #ecs #networking

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #ecs #fargate #docker #aws #containers #migration #devops

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #control-tower #account-factory #service-catalog #organizations #sso #platform-engineering #devops

2 Nov 2025

AWS PrivateLink Deep Dive: Private Connectivity Patterns

Master AWS PrivateLink for private API access, cross-account connectivity, and SaaS integrations. Includes Terraform examples and multi-region patterns.

#aws #privatelink #networking #vpc #security

25 Oct 2025

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

#aws #tagging #finops #governance #devops

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #aws #ecs #rds #cognito #docker #devops #platform-engineering

28 Sept 2025

Terraform Best Practices (Part 2) - Testing, CI/CD, Security, and Team Workflows

Advanced Terraform practices covering testing strategies, CI/CD pipelines, security hardening, drift detection, and team collaboration patterns for infrastructure as code at scale.

#iac #devops #cicd #testing #security

20 Sept 2025

Terraform Best Practices (Part 1) - Project Structure, State, and Modules

A comprehensive guide to Terraform best practices covering project organisation, state management, module design, and foundational patterns for scalable infrastructure as code.

#iac #devops #aws #best-practices

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #oracle #migration #aws #dms #event-sourcing #platform-engineering #devops

5 Jun 2025

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

#aws #vpc #privatelink #networking #security #endpoints

15 May 2025

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

#devops #interviews #career #kubernetes #aws #sre

10 May 2025

AWS Service Control Policies (SCPs) - Guardrails for Your Organization

How to use SCPs to set permission guardrails across your AWS Organization. Covers SCP evaluation logic, deny vs allow strategies, common patterns, and production-ready Terraform examples.

#aws #organizations #scps #security #iam #governance

20 Apr 2025

AWS Config Rules with Auto Remediation - Enforce Compliance Automatically

How to use AWS Config Rules to detect compliance violations and automatically remediate them using SSM Automation documents. Covers managed rules, custom rules, remediation actions, and complete Terraform examples.

#aws #aws-config #compliance #security #automation #ssm

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#ecs #aws #blue-green #deployments #task-sets #fargate #devops

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#aws #lambda #rds #rds-proxy #serverless #databases #connection-pooling

15 Sept 2024

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

#aws #security #networking #prefix-lists #security-groups #vpc

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#packer #ami #aws #ci-cd #devops #immutable-infrastructure #security

15 Jan 2023

Deploying Kafka on Kubernetes with Strimzi

A step-by-step guide to setting up a Kafka cluster on a local Kind cluster using the Strimzi operator, with optional Terraform provisioning.

#k8s #kafka #strimzi #operator #kind

15 Jun 2022

Route 53 Deep Dive: Multi-Region Latency Routing with Health-Based Failover

A hands-on guide to configuring AWS Route 53 for latency-based routing across multiple regions, incorporating health checks for automatic failover.

#aws #route53 #dns #failover #latency-routing

15 Oct 2021

How to Increase EBS Disk Size on EC2 (Without Downtime)

Online EBS volume resizing for running instances – the IaC way with Terraform and ASG instance refresh, plus the manual escape hatch when you need it now. No reboot required.

#aws #ebs #ec2 #disk #storage #devops

15 Jun 2020

Securing APIs in AWS: Private API Gateway + VPC Endpoint Deep Dive

Learn how to deploy a secure, private-only API Gateway inside your VPC using interface endpoints, resource policies, and VPC integration.

#aws #api-gateway #vpc #networking #security

15 May 2020

AWS PrivateLink with Terraform

A hands-on technical guide to implementing AWS PrivateLink between VPCs using Terraform.

#aws #privatelink #vpc #networking #security

15 Mar 2019

The Terraform State Chicken-and-Egg Problem – And Why Bootstrapping Is Just Physics

You can't use Terraform to create the S3 bucket that stores Terraform state. Here's how to bootstrap your remote backend properly, plus the philosophical reason this pattern exists everywhere in software.

#aws #s3 #dynamodb #infrastructure #devops #state-management

#security 31 posts

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #service-catalog #sso #iam-identity-center #scps #platform-engineering #spacelift #devops

6 Feb 2026

Identity Aware Proxy: Zero Trust Access for Internal Applications

Deep dive into Identity Aware Proxies - what they are, how they work, and how to implement them with GCP IAP, Pomerium, and OAuth2-Proxy. Includes Terraform and Kubernetes examples.

#identity-aware-proxy #zero-trust #kubernetes #terraform #oauth2

28 Jan 2026

Running Clawdbot 24/7 on a Hetzner VPS – Terraform, Security Hardening, and the Bits the Docs Miss

A production-grade setup for Clawdbot on Hetzner Cloud with Terraform provisioning, proper SSH hardening, fail2ban, UFW, unattended-upgrades, and optional Tailscale – the stuff you actually need in prod.

#clawdbot #hetzner #terraform #vps #devops #automation

27 Jan 2026

Clawdbot Manual Setup – Step-by-Step VPS Configuration with WhatsApp Integration

A detailed walkthrough for setting up Clawdbot on a Hetzner VPS from scratch – SSH hardening, firewall configuration, Tailscale, and WhatsApp Business integration using a dedicated number.

#clawdbot #hetzner #vps #whatsapp #devops #tutorial

19 Nov 2025

GitHub Actions OIDC – Ditch the AWS Access Keys Forever

How to authenticate GitHub Actions to AWS without storing secrets. OIDC federation explained, IAM role setup, and the token claims that control access.

#github-actions #oidc #aws #iam #cicd #devops

10 Nov 2025

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

#kyverno #opa #gatekeeper #kubernetes #policy

2 Nov 2025

AWS PrivateLink Deep Dive: Private Connectivity Patterns

Master AWS PrivateLink for private API access, cross-account connectivity, and SaaS integrations. Includes Terraform examples and multi-region patterns.

#aws #privatelink #networking #vpc #terraform

17 Oct 2025

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

#secretless #kubernetes #zero-trust #secrets-management #sidecar

12 Oct 2025

Container Image Signing with Cosign - A Practical Guide

Sign and verify container images without managing keys. A hands-on guide to Cosign, keyless signing, and enforcing signatures in Kubernetes.

#cosign #containers #sigstore #kubernetes #devops

12 Oct 2025

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

#opa #gatekeeper #kubernetes #policy-as-code #admission-control

7 Oct 2025

eBPF for Security: Kernel-Level Observability Without Agents

Deep dive into eBPF-based security tools - Cilium, Falco, and Tetragon. Learn how to implement runtime security, network policies, and threat detection at the kernel level.

#ebpf #cilium #falco #tetragon #kubernetes

3 Oct 2025

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

#spiffe #spire #zero-trust #kubernetes #mtls

28 Sept 2025

Terraform Best Practices (Part 2) - Testing, CI/CD, Security, and Team Workflows

Advanced Terraform practices covering testing strategies, CI/CD pipelines, security hardening, drift detection, and team collaboration patterns for infrastructure as code at scale.

#terraform #iac #devops #cicd #testing

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#soc #elasticsearch #cribl #docker #homelab #devops #siem

8 Sept 2025

NetworkPolicy Default Deny – The One Rule We Add to Every Namespace

Why your Kubernetes cluster is wide open by default, and the single NetworkPolicy that changes everything. Copy, paste, deploy, sleep better.

#kubernetes #networkpolicy #networking #zero-trust

5 Sept 2025

Software Supply Chain Security - Sigstore, SLSA, and Beyond

Your dependencies are an attack vector. Here's how to secure your software supply chain with Sigstore, SLSA frameworks, SBOMs, and admission policies that actually work.

#supply-chain #sigstore #slsa #sbom #kubernetes #devops

10 Aug 2025

Pod Security Standards Enforcement - The PSP Replacement That Actually Works

How to enforce Pod Security Standards using the built-in Pod Security Admission controller. Covers Privileged, Baseline, and Restricted profiles, migration from PSPs, namespace labeling, and exemptions.

#kubernetes #pod-security #psp #admission-controller #hardening

15 Jul 2025

External Secrets Operator with AWS Secrets Manager - Stop Mounting Secrets in ConfigMaps

How to use External Secrets Operator to sync AWS Secrets Manager secrets to Kubernetes. Covers SecretStore, ExternalSecret, IAM with IRSA, templating, and production patterns.

#kubernetes #external-secrets #aws #secrets-manager #gitops

5 Jun 2025

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

#aws #vpc #privatelink #networking #endpoints #terraform

10 May 2025

AWS Service Control Policies (SCPs) - Guardrails for Your Organization

How to use SCPs to set permission guardrails across your AWS Organization. Covers SCP evaluation logic, deny vs allow strategies, common patterns, and production-ready Terraform examples.

#aws #organizations #scps #iam #governance #terraform

20 Apr 2025

AWS Config Rules with Auto Remediation - Enforce Compliance Automatically

How to use AWS Config Rules to detect compliance violations and automatically remediate them using SSM Automation documents. Covers managed rules, custom rules, remediation actions, and complete Terraform examples.

#aws #aws-config #compliance #automation #ssm #terraform

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#clawdbot #google-workspace #github #notion #integrations #oauth #tutorial

10 Feb 2025

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

#ebpf #linux #networking #observability #kernel

15 Sept 2024

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

#aws #terraform #networking #prefix-lists #security-groups #vpc

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#packer #ami #aws #terraform #ci-cd #devops #immutable-infrastructure

15 Jun 2023

Private API Gateway - Part 2: Secure Cross-VPC Access with PrivateLink and IAM Authentication

Extend your private API Gateway with secure access from other VPCs using PrivateLink and enforce IAM-based authentication.

#aws #api-gateway #vpc #privatelink #iam

15 Aug 2022

Secure Gateways: Configuring Mutual TLS using Gateway API on GKE

In this blog, we configure mutual TLS (mTLS) using Gateway API on GKE, securing ingress traffic with client certificate validation.

#kubernetes #gateway-api #mtls #gke

15 Mar 2022

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

#kubernetes #dns #coredns #arp #net_raw #mitm

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#mtls #traefik #tls #certificates #smallstep #pki #devops

15 Jun 2020

Securing APIs in AWS: Private API Gateway + VPC Endpoint Deep Dive

Learn how to deploy a secure, private-only API Gateway inside your VPC using interface endpoints, resource policies, and VPC integration.

#aws #api-gateway #vpc #networking #terraform

15 May 2020

AWS PrivateLink with Terraform

A hands-on technical guide to implementing AWS PrivateLink between VPCs using Terraform.

#aws #privatelink #vpc #terraform #networking

#networking 25 posts

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #aws #infrastructure #devops #platform-engineering #lambda #ecs #terraform

2 Nov 2025

AWS PrivateLink Deep Dive: Private Connectivity Patterns

Master AWS PrivateLink for private API access, cross-account connectivity, and SaaS integrations. Includes Terraform examples and multi-region patterns.

#aws #privatelink #vpc #terraform #security

29 Oct 2025

Gateway API Advanced Patterns: Beyond Basic Ingress

Master Gateway API with traffic splitting, header-based routing, cross-namespace references, and TLS passthrough. The future of Kubernetes ingress.

#gateway-api #kubernetes #ingress #traffic-management

25 Oct 2025

Tailscale in Production: WireGuard Mesh for Hybrid Cloud

Deploy Tailscale for secure connectivity across clouds, offices, and Kubernetes clusters. Zero-config VPN mesh with SSO integration and ACLs.

#tailscale #wireguard #vpn #hybrid-cloud #zero-trust

21 Oct 2025

Cilium Service Mesh: Sidecar-Free with eBPF

Deploy a service mesh without sidecars using Cilium. Get mTLS, traffic management, and observability powered by eBPF at the kernel level.

#cilium #service-mesh #ebpf #kubernetes #mtls

8 Sept 2025

NetworkPolicy Default Deny – The One Rule We Add to Every Namespace

Why your Kubernetes cluster is wide open by default, and the single NetworkPolicy that changes everything. Copy, paste, deploy, sleep better.

#kubernetes #security #networkpolicy #zero-trust

22 Jun 2025

The Kubernetes ndots:5 Problem – Why DNS Lookups Take 15 Seconds

A deep dive into why external DNS resolution in Kubernetes can be painfully slow, how the default ndots:5 setting causes unnecessary lookups, and practical fixes that actually work.

#kubernetes #dns #coredns #performance #debugging

22 Jun 2025

NAT Gateway Alternatives - Cutting Your AWS Bill Without Losing Sleep

NAT Gateways are the silent budget killer in AWS. Here's how to reduce costs with NAT instances, VPC endpoints, IPv6, and architectural changes - with real numbers and trade-offs.

#aws #nat #cost-optimization #vpc #finops

15 Jun 2025

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

#aws #eks #cni #ip-exhaustion #prefix-delegation

5 Jun 2025

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

#aws #vpc #privatelink #security #endpoints #terraform

15 Mar 2025

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

#kubernetes #gateway-api #ingress #traffic-management #k8s

10 Feb 2025

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

#ebpf #linux #security #observability #kernel

20 Nov 2024

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

#kubernetes #service-mesh #istio #linkerd #cilium #devops

15 Sept 2024

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

#aws #terraform #security #prefix-lists #security-groups #vpc

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #udp #ecs #cloudmap #traefik #service-discovery #aws #devops

15 Mar 2023

Container Networking Deep Dive Part 1: Single Network Namespace on a VM

In the first part of our Container Networking Deep Dive, we explore how to set up a single network namespace inside a VM and connect it to the host using a veth pair.

#linux #namespaces #containers #devops

15 Feb 2023

Container Networking Deep Dive Part 2: Two Namespaces on the Same Host

In the second part of our Container Networking Deep Dive, we connect two network namespaces via a bridge on the same Linux host.

#linux #netns #containers #bridge

15 Nov 2022

Deep Dive into EC2 Networking

A deep dive into EC2 networking: ENIs, IP addressing, and deployment architectures.

#ec2 #eni #ip #deployment #architecture

15 Apr 2022

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. To overcome these constraints, deploying Calico with IPIP or BGP offers a scalable alternative.

#aws #eks #calico #cni #bgp #ipip

15 Feb 2022

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

#azure #aks #kubernetes #vpn #twingate #private-cluster

15 Jul 2020

Networking Tools

#tools

15 Jun 2020

Securing APIs in AWS: Private API Gateway + VPC Endpoint Deep Dive

Learn how to deploy a secure, private-only API Gateway inside your VPC using interface endpoints, resource policies, and VPC integration.

#aws #api-gateway #vpc #security #terraform

15 May 2020

AWS PrivateLink with Terraform

A hands-on technical guide to implementing AWS PrivateLink between VPCs using Terraform.

#aws #privatelink #vpc #terraform #security

15 Apr 2020

Kubernetes Networking: A Deep Dive From First Principles

How packets actually flow in Kubernetes – from veth pairs to CNI plugins to kube-proxy modes. With AWS/EKS context throughout.

#kubernetes #eks #aws #cni #cilium #calico

15 May 2019

BGP in the Cloud – A Deep Dive into AWS Direct Connect, Routing Pathologies, and What Breaks in Production

A production-focused deep dive into how BGP actually behaves over AWS Direct Connect – route selection, failover, ASN design, MEDs, prepending, blackholing scenarios, and the real-world issues teams hit at scale.

#bgp #aws #direct-connect #hybrid-cloud #production #routing

#platform-engineering 15 posts

4 Mar 2026

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

#opentelemetry #observability #kubernetes #aws #devops #monitoring

24 Feb 2026

AWS Control Tower Account Factory - The Gotchas Nobody Tells You

Real-world lessons from automating AWS account provisioning with Control Tower, Service Catalog, and Terraform. The silent failures, IAM traps, and StackSet timing issues that cost us days.

#aws #control-tower #terraform #service-catalog #iam #multi-account

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #service-catalog #sso #iam-identity-center #scps #spacelift #security #devops

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #opa #rego #iac #gitops #devops #modules #policy-as-code

3 Feb 2026

Platform Engineering in 2026 - It's About the Discipline, Not the Tools

Platform engineering has become the most misunderstood role in tech. Everyone's building 'platforms' but few understand what actually makes one successful. Here's what I've learned building platforms for teams of 10 to 500.

#devops #developer-experience #internal-platforms #idp

15 Jan 2026

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

#dora #devops #metrics #engineering-culture #cicd

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #aws #infrastructure #devops #lambda #ecs #terraform #networking

10 Jan 2026

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

#mlops #devops #kubernetes #machine-learning #infrastructure

18 Nov 2025

Port and Kratix: Internal Developer Platforms Beyond Backstage

Explore Port and Kratix for building internal developer platforms. Self-service infrastructure, developer workflows, and platform engineering patterns.

#port #kratix #developer-experience #self-service

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #control-tower #account-factory #terraform #service-catalog #organizations #sso #devops

14 Nov 2025

Backstage Plugins: Building Custom Developer Portal Features

Build custom Backstage plugins for your internal developer portal. Create frontend components, backend APIs, and integrate with your existing tools.

#backstage #developer-portal #react #typescript

6 Nov 2025

Crossplane Compositions: Build Your Own Cloud API

Create custom cloud APIs with Crossplane Compositions. Abstract away complexity and give developers self-service infrastructure with guardrails.

#crossplane #kubernetes #infrastructure #gitops

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #aws #ecs #rds #cognito #terraform #docker #devops

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #oracle #migration #aws #dms #terraform #event-sourcing #devops

22 Jul 2024

Building an Internal Developer Platform

A practical guide to building an IDP that developers actually want to use. Covers the build vs buy decision, Backstage implementation, and the organisational changes required for success.

#idp #backstage #developer-experience #devops

#engineering-culture 13 posts

4 Feb 2026

10 Rules for Negotiating Your Job Offer (From 7 Years of Engineering)

Most engineers massively undervalue themselves because no one taught them how to negotiate. Here's everything I've learned from negotiating salaries, contracts, titles, and more.

#career #negotiation #salary #advice

15 Jan 2026

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

#dora #devops #metrics #cicd #platform-engineering

10 Dec 2025

The Real Difference Between Senior, Staff, and Principal Engineer

Everyone wants to know the difference between Senior, Staff, and Principal. After holding all three titles, I can tell you the real differences aren't what most people think. It's not about years - it's about scope.

#career #leadership #principal-engineer #advice

5 Dec 2025

The Principal Engineer Trap

The IC ladder looks appealing until you're at the top. Many senior engineers chase Principal titles without understanding what they're signing up for. Here's what nobody tells you.

#career #leadership #principal-engineer

2 Dec 2025

Startup vs Scale-Up vs Enterprise: Where You'll Actually Learn the Most

After working across all three - tiny startups, hypergrowth scale-ups, and massive enterprises - I can tell you they're completely different jobs. Same title, same tech, completely different experience. Here's what each teaches you.

#career #startups #advice #leadership

22 Nov 2025

Blameless Culture is Harder Than You Think

Everyone claims to have a blameless culture. Few actually do. Here's what real blamelessness looks like and why it's so difficult to achieve.

#post-mortems #incident-management #leadership #psychological-safety

18 Nov 2025

Contract vs Perm: 4 Years of Both and What I'd Choose Now

I've done both. Multiple times. Here are the real trade-offs nobody talks about - the money, the time off problem, the boredom factor, and why your life situation matters more than you think.

#career #contracting #salary #advice

18 Sept 2025

Remote Work Won

The RTO push isn't about productivity. The data is clear: remote work works. What's really happening is a fight over control, real estate, and management's inability to adapt.

#remote-work #productivity #management #career

8 Jul 2025

Why Senior Engineers Should Write Docs

Documentation is often treated as junior work. That's backwards. The most impactful documentation comes from senior engineers, and writing it is a force multiplier for your expertise.

#documentation #leadership #career #technical-writing

15 Jun 2025

The 10x Engineer is a Myth

The idea of the 10x engineer has done more harm than good. What actually matters is team multipliers - engineers who make everyone around them better.

#teams #leadership #career #productivity

3 Apr 2025

The Meeting That Should Have Been a Doc

Most meetings are information broadcasts disguised as collaboration. Learn when to meet, when to write, and how to save everyone's time.

#meetings #productivity #remote-work #documentation

8 Jan 2025

Stop Chasing Certifications

Certifications have become a checkbox exercise. They don't prove competence, and they often distract from what actually matters: building things and solving real problems.

#career #certifications #learning

12 Sept 2023

Standups Are Broken

Daily standups were meant to improve communication. Instead, they've become status meetings that waste time and interrupt deep work. There's a better way.

#productivity #agile #remote-work #team-management

#career 11 posts

4 Feb 2026

10 Rules for Negotiating Your Job Offer (From 7 Years of Engineering)

Most engineers massively undervalue themselves because no one taught them how to negotiate. Here's everything I've learned from negotiating salaries, contracts, titles, and more.

#negotiation #salary #engineering-culture #advice

5 Jan 2026

That Time I Gave Away £50k Worth of Consulting for Free (And What It Taught Me About the Industry)

On interview take-home tests that are suspiciously specific, contractors who get ghosted after detailed proposals, and learning to play the game without becoming bitter about it.

#consulting #interviews #contracting #tech-industry #lessons-learned

10 Dec 2025

The Real Difference Between Senior, Staff, and Principal Engineer

Everyone wants to know the difference between Senior, Staff, and Principal. After holding all three titles, I can tell you the real differences aren't what most people think. It's not about years - it's about scope.

#engineering-culture #leadership #principal-engineer #advice

5 Dec 2025

The Principal Engineer Trap

The IC ladder looks appealing until you're at the top. Many senior engineers chase Principal titles without understanding what they're signing up for. Here's what nobody tells you.

#engineering-culture #leadership #principal-engineer

2 Dec 2025

Startup vs Scale-Up vs Enterprise: Where You'll Actually Learn the Most

After working across all three - tiny startups, hypergrowth scale-ups, and massive enterprises - I can tell you they're completely different jobs. Same title, same tech, completely different experience. Here's what each teaches you.

#startups #engineering-culture #advice #leadership

18 Nov 2025

Contract vs Perm: 4 Years of Both and What I'd Choose Now

I've done both. Multiple times. Here are the real trade-offs nobody talks about - the money, the time off problem, the boredom factor, and why your life situation matters more than you think.

#contracting #salary #advice #engineering-culture

18 Sept 2025

Remote Work Won

The RTO push isn't about productivity. The data is clear: remote work works. What's really happening is a fight over control, real estate, and management's inability to adapt.

#remote-work #engineering-culture #productivity #management

8 Jul 2025

Why Senior Engineers Should Write Docs

Documentation is often treated as junior work. That's backwards. The most impactful documentation comes from senior engineers, and writing it is a force multiplier for your expertise.

#documentation #engineering-culture #leadership #technical-writing

15 Jun 2025

The 10x Engineer is a Myth

The idea of the 10x engineer has done more harm than good. What actually matters is team multipliers - engineers who make everyone around them better.

#engineering-culture #teams #leadership #productivity

15 May 2025

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

#devops #interviews #kubernetes #aws #terraform #sre

8 Jan 2025

Stop Chasing Certifications

Certifications have become a checkbox exercise. They don't prove competence, and they often distract from what actually matters: building things and solving real problems.

#certifications #learning #engineering-culture

#containers 11 posts

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #ecs #fargate #docker #aws #migration #terraform #devops

12 Oct 2025

Container Image Signing with Cosign - A Practical Guide

Sign and verify container images without managing keys. A hands-on guide to Cosign, keyless signing, and enforcing signatures in Kubernetes.

#security #cosign #sigstore #kubernetes #devops

15 Sept 2025

K3s Homelab Setup Guide - Running Kubernetes on Raspberry Pi 5

Build a lightweight Kubernetes cluster on three Raspberry Pi 5 devices. Step-by-step guide covering K3s installation, cluster configuration, and deployment testing.

#kubernetes #k3s #raspberry-pi #homelab #devops

25 Aug 2025

Serverless Container Framework - Deploy Containers to Lambda and Fargate with Ease

Deploy containerised applications to AWS Lambda or Fargate with a simple YAML config. No infrastructure code required - just define your containers and deploy.

#serverless #aws #lambda #fargate #docker #devops

18 Jul 2025

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

#kubernetes #debugging #production #kubectl #devops

19 Jun 2025

Kubernetes Sidecar Startup Order - Making Your Main App Wait

How to ensure sidecar containers are ready before your main app starts. Covers startupProbe, postStart hooks, and why readinessProbe doesn't do what you think.

#kubernetes #sidecars #pods #devops

15 Mar 2023

Container Networking Deep Dive Part 1: Single Network Namespace on a VM

In the first part of our Container Networking Deep Dive, we explore how to set up a single network namespace inside a VM and connect it to the host using a veth pair.

#linux #networking #namespaces #devops

15 Feb 2023

Container Networking Deep Dive Part 2: Two Namespaces on the Same Host

In the second part of our Container Networking Deep Dive, we connect two network namespaces via a bridge on the same Linux host.

#linux #networking #netns #bridge

15 Aug 2019

ECS Fargate Deep Dive Part 1: How Fargate Really Works

In the first part of our ECS Fargate Deep Dive, we break down what happens behind the scenes when you run a task on Fargate — Firecracker microVMs, ENIs, IAM and the hidden host fleet.

#aws #ecs #fargate #devops

15 Jul 2019

ECS Fargate Deep Dive Part 2: Firecracker in Action

In the second part of our ECS Fargate Deep Dive, we get hands-on with Firecracker — the lightweight VMM that powers Fargate — and simulate task isolation and networking locally.

#aws #ecs #fargate #firecracker #devops

15 Jan 2019

The ever-changing landscape of modern applications

A look at the ever-changing landscape of modern applications

#devops #kubernetes #docker

#eks 10 posts

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#finops #cloud-costs #kubernetes #multi-tenant #saas #unit-economics #aws

12 Dec 2025

Spot Instance Patterns: Graceful Handling and Cost Savings

Master AWS Spot Instances in production. Handle interruptions gracefully, use mixed instance groups, and save 60-90% on compute costs.

#aws #spot-instances #kubernetes #cost-optimization #reliability

8 Dec 2025

Karpenter Deep Dive: Node Provisioning That Actually Works

Master Karpenter for Kubernetes node autoscaling. Replace Cluster Autoscaler with faster, smarter provisioning. Includes cost optimization patterns.

#karpenter #kubernetes #autoscaling #aws #cost-optimization

15 Jun 2025

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

#aws #networking #cni #ip-exhaustion #prefix-delegation

15 May 2025

Kubernetes Cluster Upgrades: Production-Ready Guide

Technical guide for upgrading managed Kubernetes clusters across GKE, EKS, and AKS

#kubernetes #gke #aks #cluster-management #devops

15 Feb 2025

Lessons From 5 Years of Kubernetes in Production – Cluster Crashes, Ditching Self-Managed, Cost Cuts, and the Tooling That Actually Works

Two major cluster crashes, migrating from kops to EKS, slashing compute costs with Karpenter, and the observability stack we rebuilt three times.

#kubernetes #aws #production #devops #karpenter #observability

15 Oct 2022

EKS Private Network with Twingate

How to set up a private network for your EKS cluster with Twingate

#kubernetes #twingate #private #network

15 Apr 2022

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. Deploying Calico with IPIP or BGP offers a scalable alternative that overcomes these constraints.

#aws #calico #cni #networking #bgp #ipip

15 Apr 2020

Kubernetes Networking: A Deep Dive From First Principles

How packets actually flow in Kubernetes – from veth pairs to CNI plugins to kube-proxy modes. With AWS/EKS context throughout.

#kubernetes #networking #aws #cni #cilium #calico

15 Mar 2020

Serverless containers in Kubernetes with Fargate (Part 2) — Hands-on

A hands-on article on deploying an application on Kubernetes with Fargate.

#kubernetes #fargate #aws

#observability 10 posts

4 Mar 2026

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

#opentelemetry #kubernetes #aws #devops #platform-engineering #monitoring

3 Feb 2026

ELK Stack Migration: From 6.x to 8.x - The Complete Guide

A comprehensive guide to migrating your Elasticsearch, Logstash, and Kibana stack from version 6.x to 8.x. Covers breaking changes, migration strategies, index compatibility, and zero-downtime approaches.

#elasticsearch #elk #kibana #logstash #migration

28 Jan 2026

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

#elasticsearch #elastic-cloud #logging #saas #managed-services

16 Dec 2025

FinOps Automation: Kubecost, OpenCost, and Automated Rightsizing

Implement automated cloud cost optimization with Kubecost and OpenCost. Track costs per team, rightsize resources, and automate savings.

#finops #kubecost #opencost #kubernetes #cost-optimization

30 Nov 2025

SLO-Based Alerting: Burn Rate Alerts vs Threshold Alerts

Implement SLO-based alerting with burn rate alerts. Move from noisy threshold alerts to meaningful reliability signals using error budgets.

#slo #sre #alerting #prometheus #reliability

26 Nov 2025

OpenTelemetry Collector Pipelines: Transform, Filter, Route Telemetry

Master OpenTelemetry Collector configuration. Build pipelines to transform metrics, filter traces, route logs, and reduce telemetry costs.

#opentelemetry #metrics #traces #logs #collector

18 Mar 2025

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

#opentelemetry #tracing #metrics #logging #kubernetes

15 Feb 2025

Lessons From 5 Years of Kubernetes in Production – Cluster Crashes, Ditching Self-Managed, Cost Cuts, and the Tooling That Actually Works

Two major cluster crashes, migrating from kops to EKS, slashing compute costs with Karpenter, and the observability stack we rebuilt three times.

#kubernetes #eks #aws #production #devops #karpenter

10 Feb 2025

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

#ebpf #linux #networking #security #kernel

15 Sept 2022

Managing Dynatrace Alerts at Scale with Custom Ansible Roles

How we automated Dynatrace alerting configuration using custom Ansible roles - covering alert profiles, problem notifications, metric events, and maintenance windows across multiple environments.

#dynatrace #ansible #monitoring #alerting #automation #iac

#docker 8 posts

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #ecs #fargate #aws #containers #migration #terraform #devops

20 Nov 2025

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

#localstack #aws #testing #development #devops

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #aws #ecs #rds #cognito #terraform #devops #platform-engineering

25 Sept 2025

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

#etl #python #airflow #postgresql #data-engineering #devops

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#security #soc #elasticsearch #cribl #homelab #devops #siem

25 Aug 2025

Serverless Container Framework - Deploy Containers to Lambda and Fargate with Ease

Deploy containerised applications to AWS Lambda or Fargate with a simple YAML config. No infrastructure code required - just define your containers and deploy.

#serverless #containers #aws #lambda #fargate #devops

15 Feb 2019

Creating a Lab Container

An end-to-end guide for creating a lab container for DevOps training.

#devops #lab #container

15 Jan 2019

The ever-changing landscape of modern applications

A look at the ever-changing landscape of modern applications

#devops #kubernetes #containers

#vpc 7 posts

2 Nov 2025

AWS PrivateLink Deep Dive: Private Connectivity Patterns

Master AWS PrivateLink for private API access, cross-account connectivity, and SaaS integrations. Includes Terraform examples and multi-region patterns.

#aws #privatelink #networking #terraform #security

22 Jun 2025

NAT Gateway Alternatives - Cutting Your AWS Bill Without Losing Sleep

NAT Gateways are the silent budget killer in AWS. Here's how to reduce costs with NAT instances, VPC endpoints, IPv6, and architectural changes - with real numbers and trade-offs.

#aws #nat #networking #cost-optimization #finops

5 Jun 2025

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

#aws #privatelink #networking #security #endpoints #terraform

15 Sept 2024

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

#aws #terraform #security #networking #prefix-lists #security-groups

15 Jun 2023

Private API Gateway - Part 2: Secure Cross-VPC Access with PrivateLink and IAM Authentication

Extend your private API Gateway with secure access from other VPCs using PrivateLink and enforce IAM-based authentication.

#aws #api-gateway #privatelink #security #iam

15 Jun 2020

Securing APIs in AWS: Private API Gateway + VPC Endpoint Deep Dive

Learn how to deploy a secure, private-only API Gateway inside your VPC using interface endpoints, resource policies, and VPC integration.

#aws #api-gateway #networking #security #terraform

15 May 2020

AWS PrivateLink with Terraform

A hands-on technical guide to implementing AWS PrivateLink between VPCs using Terraform.

#aws #privatelink #terraform #networking #security

#ecs 7 posts

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #aws #infrastructure #devops #platform-engineering #lambda #terraform #networking

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #fargate #docker #aws #containers #migration #terraform #devops

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #aws #rds #cognito #terraform #docker #devops #platform-engineering

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#aws #blue-green #deployments #task-sets #fargate #terraform #devops

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #udp #cloudmap #traefik #service-discovery #aws #networking #devops

15 Aug 2019

ECS Fargate Deep Dive Part 1: How Fargate Really Works

In the first part of our ECS Fargate Deep Dive, we break down what happens behind the scenes when you run a task on Fargate — Firecracker microVMs, ENIs, IAM and the hidden host fleet.

#aws #fargate #containers #devops

15 Jul 2019

ECS Fargate Deep Dive Part 2: Firecracker in Action

In the second part of our ECS Fargate Deep Dive, we get hands-on with Firecracker — the lightweight VMM that powers Fargate — and simulate task isolation and networking locally.

#aws #fargate #firecracker #containers #devops

#migration 7 posts

9 Feb 2026

Migrating ClickHouse From EC2 to ClickHouse Cloud - Every Approach We Tried and Why Most Failed

S3 backup/restore, direct connectivity, Parquet exports - none of them worked cleanly. Here's the full war story of migrating a production ClickHouse instance to Cloud, the version mismatch that broke everything, and the dumb-simple approach that actually got the job done.

#clickhouse #aws #database #devops #production

3 Feb 2026

ELK Stack Migration: From 6.x to 8.x - The Complete Guide

A comprehensive guide to migrating your Elasticsearch, Logstash, and Kibana stack from version 6.x to 8.x. Covers breaking changes, migration strategies, index compatibility, and zero-downtime approaches.

#elasticsearch #elk #kibana #logstash #observability

1 Feb 2026

Terraform State Surgery - Splitting, Moving, and Refactoring Without Downtime

A practical guide to breaking up monolithic Terraform state files, moving resources between states, and refactoring infrastructure safely. Includes real examples, scripts, and the exact commands we use.

#terraform #state #refactoring #iac #devops

30 Jan 2026

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

#terraform #iac #aws #s3 #state-management #hcl2

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #ecs #fargate #docker #aws #containers #terraform #devops

15 Oct 2025

Migrating 30 Repos from Jenkins to GitHub Actions – The Complete Runbook

A battle-tested playbook for migrating CI/CD pipelines from Jenkins to GitHub Actions at scale. Covers OIDC authentication, parallel running, secrets migration, and the gotchas that will bite you.

#github-actions #jenkins #cicd #devops #aws #oidc

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #oracle #aws #dms #terraform #event-sourcing #platform-engineering #devops

#leadership 6 posts

10 Dec 2025

The Real Difference Between Senior, Staff, and Principal Engineer

Everyone wants to know the difference between Senior, Staff, and Principal. After holding all three titles, I can tell you the real differences aren't what most people think. It's not about years - it's about scope.

#career #engineering-culture #principal-engineer #advice

5 Dec 2025

The Principal Engineer Trap

The IC ladder looks appealing until you're at the top. Many senior engineers chase Principal titles without understanding what they're signing up for. Here's what nobody tells you.

#career #engineering-culture #principal-engineer

2 Dec 2025

Startup vs Scale-Up vs Enterprise: Where You'll Actually Learn the Most

After working across all three - tiny startups, hypergrowth scale-ups, and massive enterprises - I can tell you they're completely different jobs. Same title, same tech, completely different experience. Here's what each teaches you.

#career #startups #engineering-culture #advice

22 Nov 2025

Blameless Culture is Harder Than You Think

Everyone claims to have a blameless culture. Few actually do. Here's what real blamelessness looks like and why it's so difficult to achieve.

#engineering-culture #post-mortems #incident-management #psychological-safety

8 Jul 2025

Why Senior Engineers Should Write Docs

Documentation is often treated as junior work. That's backwards. The most impactful documentation comes from senior engineers, and writing it is a force multiplier for your expertise.

#documentation #engineering-culture #career #technical-writing

15 Jun 2025

The 10x Engineer is a Myth

The idea of the 10x engineer has done more harm than good. What actually matters is team multipliers - engineers who make everyone around them better.

#engineering-culture #teams #career #productivity

#finops 6 posts

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#cloud-costs #kubernetes #eks #multi-tenant #saas #unit-economics #aws

16 Dec 2025

FinOps Automation: Kubecost, OpenCost, and Automated Rightsizing

Implement automated cloud cost optimization with Kubecost and OpenCost. Track costs per team, rightsize resources, and automate savings.

#kubecost #opencost #kubernetes #cost-optimization #observability

25 Oct 2025

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

#aws #terraform #tagging #governance #devops

10 Aug 2025

FinOps for Engineering Teams - Making Cost Everyone's Problem

Cloud cost management isn't just for finance. Here's how engineering teams can build cost awareness into their workflow without slowing down delivery.

#cloud #aws #cost-optimization #devops #engineering

22 Jun 2025

NAT Gateway Alternatives - Cutting Your AWS Bill Without Losing Sleep

NAT Gateways are the silent budget killer in AWS. Here's how to reduce costs with NAT instances, VPC endpoints, IPv6, and architectural changes - with real numbers and trade-offs.

#aws #nat #networking #cost-optimization #vpc

15 Dec 2024

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

#kubernetes #cost-optimization #resource-management #devops #cloud

#gitops 6 posts

7 Mar 2026

Building a Production-Grade Homelab with K3s, Vault, and FluxCD

How I built a fully GitOps-managed Kubernetes homelab on a single mini PC - from unboxing to production. Proxmox bare metal install, K3s cluster, HashiCorp Vault secrets, full observability, and Cloudflare Tunnel.

#kubernetes #k3s #homelab #fluxcd #hashicorp-vault

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #opa #rego #iac #platform-engineering #devops #modules #policy-as-code

4 Dec 2025

Progressive Delivery with Flagger: Automated Canary Deployments

Implement automated canary deployments with Flagger. Metrics-based promotion, automated rollback, and integration with Istio, Linkerd, and Gateway API.

#flagger #canary #progressive-delivery #kubernetes #deployment

6 Nov 2025

Crossplane Compositions: Build Your Own Cloud API

Create custom cloud APIs with Crossplane Compositions. Abstract away complexity and give developers self-service infrastructure with guardrails.

#crossplane #kubernetes #platform-engineering #infrastructure

15 Jul 2025

External Secrets Operator with AWS Secrets Manager - Stop Mounting Secrets in ConfigMaps

How to use External Secrets Operator to sync AWS Secrets Manager secrets to Kubernetes. Covers SecretStore, ExternalSecret, IAM with IRSA, templating, and production patterns.

#kubernetes #external-secrets #aws #secrets-manager #security

18 Mar 2024

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

#argocd #kubernetes #cicd #deployment #automation

#cicd 6 posts

15 Jan 2026

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

#dora #devops #metrics #engineering-culture #platform-engineering

19 Nov 2025

GitHub Actions OIDC – Ditch the AWS Access Keys Forever

How to authenticate GitHub Actions to AWS without storing secrets. OIDC federation explained, IAM role setup, and the token claims that control access.

#github-actions #oidc #aws #iam #security #devops

15 Oct 2025

Migrating 30 Repos from Jenkins to GitHub Actions – The Complete Runbook

A battle-tested playbook for migrating CI/CD pipelines from Jenkins to GitHub Actions at scale. Covers OIDC authentication, parallel running, secrets migration, and the gotchas that will bite you.

#github-actions #jenkins #devops #migration #aws #oidc

28 Sept 2025

Terraform Best Practices (Part 2) - Testing, CI/CD, Security, and Team Workflows

Advanced Terraform practices covering testing strategies, CI/CD pipelines, security hardening, drift detection, and team collaboration patterns for infrastructure as code at scale.

#terraform #iac #devops #testing #security

18 Mar 2024

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

#gitops #argocd #kubernetes #deployment #automation

15 Apr 2019

Helm Atomics: The Flag That Saves Your Production Deploys (And Its Hidden Gotchas)

Deep dive into Helm's --atomic, --wait, and --cleanup-on-fail flags. How they work, when to use them, the CI/CD pipeline trap that catches everyone, and production-ready deployment patterns.

#helm #kubernetes #devops #deployments #rollback

#iac 6 posts

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #opa #rego #gitops #platform-engineering #devops #modules #policy-as-code

1 Feb 2026

Terraform State Surgery - Splitting, Moving, and Refactoring Without Downtime

A practical guide to breaking up monolithic Terraform state files, moving resources between states, and refactoring infrastructure safely. Includes real examples, scripts, and the exact commands we use.

#terraform #state #migration #refactoring #devops

30 Jan 2026

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

#terraform #migration #aws #s3 #state-management #hcl2

28 Sept 2025

Terraform Best Practices (Part 2) - Testing, CI/CD, Security, and Team Workflows

Advanced Terraform practices covering testing strategies, CI/CD pipelines, security hardening, drift detection, and team collaboration patterns for infrastructure as code at scale.

#terraform #devops #cicd #testing #security

20 Sept 2025

Terraform Best Practices (Part 1) - Project Structure, State, and Modules

A comprehensive guide to Terraform best practices covering project organisation, state management, module design, and foundational patterns for scalable infrastructure as code.

#terraform #devops #aws #best-practices

15 Sept 2022

Managing Dynatrace Alerts at Scale with Custom Ansible Roles

How we automated Dynatrace alerting configuration using custom Ansible roles - covering alert profiles, problem notifications, metric events, and maintenance windows across multiple environments.

#dynatrace #ansible #monitoring #alerting #automation #observability

#fargate 6 posts

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#java #ecs #docker #aws #containers #migration #terraform #devops

25 Aug 2025

Serverless Container Framework - Deploy Containers to Lambda and Fargate with Ease

Deploy containerised applications to AWS Lambda or Fargate with a simple YAML config. No infrastructure code required - just define your containers and deploy.

#serverless #containers #aws #lambda #docker #devops

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#ecs #aws #blue-green #deployments #task-sets #terraform #devops

15 Mar 2020

Serverless containers in Kubernetes with Fargate (Part 2) — Hands-on

A hands-on article on deploying an application on Kubernetes with Fargate.

#kubernetes #eks #aws

15 Aug 2019

ECS Fargate Deep Dive Part 1: How Fargate Really Works

In the first part of our ECS Fargate Deep Dive, we break down what happens behind the scenes when you run a task on Fargate — Firecracker microVMs, ENIs, IAM and the hidden host fleet.

#aws #ecs #containers #devops

15 Jul 2019

ECS Fargate Deep Dive Part 2: Firecracker in Action

In the second part of our ECS Fargate Deep Dive, we get hands-on with Firecracker — the lightweight VMM that powers Fargate — and simulate task isolation and networking locally.

#aws #ecs #firecracker #containers #devops

#cost-optimization 6 posts

16 Dec 2025

FinOps Automation: Kubecost, OpenCost, and Automated Rightsizing

Implement automated cloud cost optimization with Kubecost and OpenCost. Track costs per team, rightsize resources, and automate savings.

#finops #kubecost #opencost #kubernetes #observability

12 Dec 2025

Spot Instance Patterns: Graceful Handling and Cost Savings

Master AWS Spot Instances in production. Handle interruptions gracefully, use mixed instance groups, and save 60-90% on compute costs.

#aws #spot-instances #kubernetes #eks #reliability

8 Dec 2025

Karpenter Deep Dive: Node Provisioning That Actually Works

Master Karpenter for Kubernetes node autoscaling. Replace Cluster Autoscaler with faster, smarter provisioning. Includes cost optimization patterns.

#karpenter #kubernetes #autoscaling #aws #eks

10 Aug 2025

FinOps for Engineering Teams - Making Cost Everyone's Problem

Cloud cost management isn't just for finance. Here's how engineering teams can build cost awareness into their workflow without slowing down delivery.

#finops #cloud #aws #devops #engineering

22 Jun 2025

NAT Gateway Alternatives - Cutting Your AWS Bill Without Losing Sleep

NAT Gateways are the silent budget killer in AWS. Here's how to reduce costs with NAT instances, VPC endpoints, IPv6, and architectural changes - with real numbers and trade-offs.

#aws #nat #networking #vpc #finops

15 Dec 2024

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

#kubernetes #resource-management #devops #cloud #finops

#production 5 posts

9 Feb 2026

Migrating ClickHouse From EC2 to ClickHouse Cloud - Every Approach We Tried and Why Most Failed

S3 backup/restore, direct connectivity, Parquet exports - none of them worked cleanly. Here's the full war story of migrating a production ClickHouse instance to Cloud, the version mismatch that broke everything, and the dumb-simple approach that actually got the job done.

#clickhouse #aws #migration #database #devops

18 Jul 2025

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

#kubernetes #debugging #containers #kubectl #devops

15 Feb 2025

Lessons From 5 Years of Kubernetes in Production – Cluster Crashes, Ditching Self-Managed, Cost Cuts, and the Tooling That Actually Works

Two major cluster crashes, migrating from kops to EKS, slashing compute costs with Karpenter, and the observability stack we rebuilt three times.

#kubernetes #eks #aws #devops #karpenter #observability

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#nginx #incident #log-rotation #linux #on-call #devops #war-stories

15 May 2019

BGP in the Cloud – A Deep Dive into AWS Direct Connect, Routing Pathologies, and What Breaks in Production

A production-focused deep dive into how BGP actually behaves over AWS Direct Connect – route selection, failover, ASN design, MEDs, prepending, blackholing scenarios, and the real-world issues teams hit at scale.

#bgp #aws #direct-connect #networking #hybrid-cloud #routing

#infrastructure 5 posts

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #aws #devops #platform-engineering #lambda #ecs #terraform #networking

10 Jan 2026

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

#mlops #devops #kubernetes #machine-learning #platform-engineering

6 Nov 2025

Crossplane Compositions: Build Your Own Cloud API

Create custom cloud APIs with Crossplane Compositions. Abstract away complexity and give developers self-service infrastructure with guardrails.

#crossplane #kubernetes #platform-engineering #gitops

5 Apr 2023

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

#kubernetes #startups #architecture #devops #hot-takes

15 Mar 2019

The Terraform State Chicken-and-Egg Problem – And Why Bootstrapping Is Just Physics

You can't use Terraform to create the S3 bucket that stores Terraform state. Here's how to bootstrap your remote backend properly, plus the philosophical reason this pattern exists everywhere in software.

#terraform #aws #s3 #dynamodb #devops #state-management

#automation 5 posts

28 Jan 2026

Running Clawdbot 24/7 on a Hetzner VPS – Terraform, Security Hardening, and the Bits the Docs Miss

A production-grade setup for Clawdbot on Hetzner Cloud with Terraform provisioning, proper SSH hardening, fail2ban, UFW, unattended-upgrades, and optional Tailscale – the stuff you actually need in prod.

#clawdbot #hetzner #terraform #vps #security #devops

8 Nov 2025

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

#github-actions #ci-cd #act #devops #testing

20 Apr 2025

AWS Config Rules with Auto Remediation - Enforce Compliance Automatically

How to use AWS Config Rules to detect compliance violations and automatically remediate them using SSM Automation documents. Covers managed rules, custom rules, remediation actions, and complete Terraform examples.

#aws #aws-config #compliance #security #ssm #terraform

18 Mar 2024

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

#gitops #argocd #kubernetes #cicd #deployment

15 Sept 2022

Managing Dynatrace Alerts at Scale with Custom Ansible Roles

How we automated Dynatrace alerting configuration using custom Ansible roles - covering alert profiles, problem notifications, metric events, and maintenance windows across multiple environments.

#dynatrace #ansible #monitoring #alerting #observability #iac

#github-actions 5 posts

19 Nov 2025

GitHub Actions OIDC – Ditch the AWS Access Keys Forever

How to authenticate GitHub Actions to AWS without storing secrets. OIDC federation explained, IAM role setup, and the token claims that control access.

#oidc #aws #iam #security #cicd #devops

8 Nov 2025

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

#ci-cd #act #devops #testing #automation

15 Oct 2025

Migrating 30 Repos from Jenkins to GitHub Actions – The Complete Runbook

A battle-tested playbook for migrating CI/CD pipelines from Jenkins to GitHub Actions at scale. Covers OIDC authentication, parallel running, secrets migration, and the gotchas that will bite you.

#jenkins #cicd #devops #migration #aws #oidc

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#traefik #blue-green #canary #deployments #aws #sigv4 #devops #ci-cd

15 Feb 2021

Zero to Production: GitHub Actions CI/CD into GKE with Workload Identity

In this deep dive, we set up a secure, production-ready CI/CD pipeline from GitHub Actions to GKE using Workload Identity Federation – no secrets needed.

#gke #ci-cd #oidc #workload-identity

#sre 5 posts

30 Nov 2025

SLO-Based Alerting: Burn Rate Alerts vs Threshold Alerts

Implement SLO-based alerting with burn rate alerts. Move from noisy threshold alerts to meaningful reliability signals using error budgets.

#slo #alerting #prometheus #observability #reliability

22 Nov 2025

Chaos Engineering with Litmus: Controlled Failure Injection

Implement chaos engineering in Kubernetes with LitmusChaos. Run pod failures, network chaos, and stress tests to validate system resilience.

#chaos-engineering #litmus #kubernetes #reliability #testing

12 Aug 2025

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

#devops #reliability #on-call #monitoring #incident-management

15 May 2025

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

#devops #interviews #career #kubernetes #aws #terraform

15 May 2025

Incident Management That Actually Works

Most incident processes are theatre. Here's how to build incident management that reduces downtime, prevents recurrence, and doesn't burn out your team.

#incident-management #on-call #post-mortems #devops

#cilium 5 posts

21 Oct 2025

Cilium Service Mesh: Sidecar-Free with eBPF

Deploy a service mesh without sidecars using Cilium. Get mTLS, traffic management, and observability powered by eBPF at the kernel level.

#service-mesh #ebpf #kubernetes #mtls #networking

7 Oct 2025

eBPF for Security: Kernel-Level Observability Without Agents

Deep dive into eBPF-based security tools - Cilium, Falco, and Tetragon. Learn how to implement runtime security, network policies, and threat detection at the kernel level.

#ebpf #security #falco #tetragon #kubernetes

20 Nov 2024

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

#kubernetes #service-mesh #istio #linkerd #networking #devops

15 Apr 2023

Cilium in Kubernetes

Hands-on with Cilium CNI on a local kind cluster – installation, eBPF datapath verification, network policies, and Hubble observability.

#kubernetes

15 Apr 2020

Kubernetes Networking: A Deep Dive From First Principles

How packets actually flow in Kubernetes – from veth pairs to CNI plugins to kube-proxy modes. With AWS/EKS context throughout.

#kubernetes #networking #eks #aws #cni #calico

#linux 5 posts

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #jvm #ec2 #debugging #performance #memory #threads #devops

10 Feb 2025

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

#ebpf #networking #security #observability #kernel

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#nginx #incident #log-rotation #on-call #devops #production #war-stories

15 Mar 2023

Container Networking Deep Dive Part 1: Single Network Namespace on a VM

In the first part of our Container Networking Deep Dive, we explore how to set up a single network namespace inside a VM and connect it to the host using a veth pair.

#networking #namespaces #containers #devops

15 Feb 2023

Container Networking Deep Dive Part 2: Two Namespaces on the Same Host

In the second part of our Container Networking Deep Dive, we connect two network namespaces via a bridge on the same Linux host.

#networking #netns #containers #bridge

#dns 5 posts

22 Jun 2025

The Kubernetes ndots:5 Problem – Why DNS Lookups Take 15 Seconds

A deep dive into why external DNS resolution in Kubernetes can be painfully slow, how the default ndots:5 setting causes unnecessary lookups, and practical fixes that actually work.

#kubernetes #networking #coredns #performance #debugging

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#udp #ecs #cloudmap #traefik #service-discovery #aws #networking #devops

15 Sept 2022

Using GKE DNS-based endpoints for Secure cluster access

Use GKE's DNS-based control plane endpoint to reach a private cluster without bastions or VPNs. IAM-gated kubectl access via Cloud DNS, fully private.

#k8s #gke #private #cluster #access

15 Jun 2022

Route 53 Deep Dive: Multi-Region Latency Routing with Health-Based Failover

A hands-on guide to configuring AWS Route 53 for latency-based routing across multiple regions, incorporating health checks for automatic failover.

#aws #route53 #terraform #failover #latency-routing

15 Mar 2022

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

#kubernetes #security #coredns #arp #net_raw #mitm

#gke 5 posts

15 May 2025

Kubernetes Cluster Upgrades: Production-Ready Guide

Technical guide for upgrading managed Kubernetes clusters across GKE, EKS, and AKS.

#kubernetes #eks #aks #cluster-management #devops

15 Apr 2025

GKE Upgrade Guide and Rollback Strategy: A Production-Ready Approach

Comprehensive guide for safely upgrading GKE clusters with minimal downtime and robust rollback procedures.

#kubernetes #google-cloud #devops #cluster-management

15 Sept 2022

Using GKE DNS-based endpoints for Secure cluster access

Use GKE's DNS-based control plane endpoint to reach a private cluster without bastions or VPNs. IAM-gated kubectl access via Cloud DNS, fully private.

#k8s #dns #private #cluster #access

15 Aug 2022

Secure Gateways: Configuring Mutual TLS using Gateway API on GKE

In this post, we configure mutual TLS (mTLS) using Gateway API on GKE, securing ingress traffic with client certificate validation.

#kubernetes #gateway-api #mtls #security

15 Feb 2021

Zero to Production: GitHub Actions CI/CD into GKE with Workload Identity

In this deep dive, we set up a secure, production-ready CI/CD pipeline from GitHub Actions to GKE using Workload Identity Federation – no secrets needed.

#github-actions #ci-cd #oidc #workload-identity

#zero-trust 5 posts

6 Feb 2026

Identity Aware Proxy: Zero Trust Access for Internal Applications

Deep dive into Identity Aware Proxies - what they are, how they work, and how to implement them with GCP IAP, Pomerium, and OAuth2-Proxy. Includes Terraform and Kubernetes examples.

#identity-aware-proxy #security #kubernetes #terraform #oauth2

25 Oct 2025

Tailscale in Production: WireGuard Mesh for Hybrid Cloud

Deploy Tailscale for secure connectivity across clouds, offices, and Kubernetes clusters. Zero-config VPN mesh with SSO integration and ACLs.

#tailscale #wireguard #vpn #networking #hybrid-cloud

17 Oct 2025

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

#secretless #security #kubernetes #secrets-management #sidecar

3 Oct 2025

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

#spiffe #spire #security #kubernetes #mtls

8 Sept 2025

NetworkPolicy Default Deny – The One Rule We Add to Every Namespace

Why your Kubernetes cluster is wide open by default, and the single NetworkPolicy that changes everything. Copy, paste, deploy, sleep better.

#kubernetes #security #networkpolicy #networking

#productivity 4 posts

18 Sept 2025

Remote Work Won

The RTO push isn't about productivity. The data is clear: remote work works. What's really happening is a fight over control, real estate, and management's inability to adapt.

#remote-work #engineering-culture #management #career

15 Jun 2025

The 10x Engineer is a Myth

The idea of the 10x engineer has done more harm than good. What actually matters is team multipliers - engineers who make everyone around them better.

#engineering-culture #teams #leadership #career

3 Apr 2025

The Meeting That Should Have Been a Doc

Most meetings are information broadcasts disguised as collaboration. Learn when to meet, when to write, and how to save everyone's time.

#meetings #engineering-culture #remote-work #documentation

12 Sept 2023

Standups Are Broken

Daily standups were meant to improve communication. Instead, they've become status meetings that waste time and interrupt deep work. There's a better way.

#engineering-culture #agile #remote-work #team-management

#iam 4 posts

24 Feb 2026

AWS Control Tower Account Factory - The Gotchas Nobody Tells You

Real-world lessons from automating AWS account provisioning with Control Tower, Service Catalog, and Terraform. The silent failures, IAM traps, and StackSet timing issues that cost us days.

#aws #control-tower #terraform #service-catalog #platform-engineering #multi-account

19 Nov 2025

GitHub Actions OIDC – Ditch the AWS Access Keys Forever

How to authenticate GitHub Actions to AWS without storing secrets. OIDC federation explained, IAM role setup, and the token claims that control access.

#github-actions #oidc #aws #security #cicd #devops

10 May 2025

AWS Service Control Policies (SCPs) - Guardrails for Your Organization

How to use SCPs to set permission guardrails across your AWS Organization. Covers SCP evaluation logic, deny vs allow strategies, common patterns, and production-ready Terraform examples.

#aws #organizations #scps #security #governance #terraform

15 Jun 2023

Private API Gateway - Part 2: Secure Cross-VPC Access with PrivateLink and IAM Authentication

Extend your private API Gateway with secure access from other VPCs using PrivateLink and enforce IAM-based authentication.

#aws #api-gateway #vpc #privatelink #security

#ci-cd 4 posts

8 Nov 2025

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

#github-actions #act #devops #testing #automation

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#packer #ami #aws #terraform #devops #immutable-infrastructure #security

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #traefik #blue-green #canary #deployments #aws #sigv4 #devops

15 Feb 2021

Zero to Production: GitHub Actions CI/CD into GKE with Workload Identity

In this deep dive, we set up a secure, production-ready CI/CD pipeline from GitHub Actions to GKE using Workload Identity Federation – no secrets needed.

#gke #github-actions #oidc #workload-identity

#testing 4 posts

22 Nov 2025

Chaos Engineering with Litmus: Controlled Failure Injection

Implement chaos engineering in Kubernetes with LitmusChaos. Run pod failures, network chaos, and stress tests to validate system resilience.

#chaos-engineering #litmus #kubernetes #reliability #sre

20 Nov 2025

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

#localstack #aws #development #devops #docker

8 Nov 2025

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

#github-actions #ci-cd #act #devops #automation

28 Sept 2025

Terraform Best Practices (Part 2) - Testing, CI/CD, Security, and Team Workflows

Advanced Terraform practices covering testing strategies, CI/CD pipelines, security hardening, drift detection, and team collaboration patterns for infrastructure as code at scale.

#terraform #iac #devops #cicd #security

#privatelink 4 posts

2 Nov 2025

AWS PrivateLink Deep Dive: Private Connectivity Patterns

Master AWS PrivateLink for private API access, cross-account connectivity, and SaaS integrations. Includes Terraform examples and multi-region patterns.

#aws #networking #vpc #terraform #security

5 Jun 2025

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

#aws #vpc #networking #security #endpoints #terraform

15 Jun 2023

Private API Gateway - Part 2: Secure Cross-VPC Access with PrivateLink and IAM Authentication

Extend your private API Gateway with secure access from other VPCs using PrivateLink and enforce IAM-based authentication.

#aws #api-gateway #vpc #security #iam

15 May 2020

AWS PrivateLink with Terraform

A hands-on technical guide to implementing AWS PrivateLink between VPCs using Terraform.

#aws #vpc #terraform #networking #security

#reliability 4 posts

12 Dec 2025

Spot Instance Patterns: Graceful Handling and Cost Savings

Master AWS Spot Instances in production. Handle interruptions gracefully, use mixed instance groups, and save 60-90% on compute costs.

#aws #spot-instances #kubernetes #cost-optimization #eks

30 Nov 2025

SLO-Based Alerting: Burn Rate Alerts vs Threshold Alerts

Implement SLO-based alerting with burn rate alerts. Move from noisy threshold alerts to meaningful reliability signals using error budgets.

#slo #sre #alerting #prometheus #observability

22 Nov 2025

Chaos Engineering with Litmus: Controlled Failure Injection

Implement chaos engineering in Kubernetes with LitmusChaos. Run pod failures, network chaos, and stress tests to validate system resilience.

#chaos-engineering #litmus #kubernetes #sre #testing

12 Aug 2025

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

#sre #devops #on-call #monitoring #incident-management

#oidc 4 posts

19 Nov 2025

GitHub Actions OIDC – Ditch the AWS Access Keys Forever

How to authenticate GitHub Actions to AWS without storing secrets. OIDC federation explained, IAM role setup, and the token claims that control access.

#github-actions #aws #iam #security #cicd #devops

15 Oct 2025

Migrating 30 Repos from Jenkins to GitHub Actions – The Complete Runbook

A battle-tested playbook for migrating CI/CD pipelines from Jenkins to GitHub Actions at scale. Covers OIDC authentication, parallel running, secrets migration, and the gotchas that will bite you.

#github-actions #jenkins #cicd #devops #migration #aws

15 Feb 2021

Zero to Production: GitHub Actions CI/CD into GKE with Workload Identity

In this deep dive, we set up a secure, production-ready CI/CD pipeline from GitHub Actions to GKE using Workload Identity Federation – no secrets needed.

#gke #github-actions #ci-cd #workload-identity

15 Jun 2019

Solving the AWS OIDC Chicken-and-Egg Problem with GitHub Actions

#aws #github

#mtls 4 posts

21 Oct 2025

Cilium Service Mesh: Sidecar-Free with eBPF

Deploy a service mesh without sidecars using Cilium. Get mTLS, traffic management, and observability powered by eBPF at the kernel level.

#cilium #service-mesh #ebpf #kubernetes #networking

3 Oct 2025

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

#spiffe #spire #zero-trust #security #kubernetes

15 Aug 2022

Secure Gateways: Configuring Mutual TLS using Gateway API on GKE

In this post, we configure mutual TLS (mTLS) using Gateway API on GKE, securing ingress traffic with client certificate validation.

#kubernetes #gateway-api #gke #security

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#traefik #tls #security #certificates #smallstep #pki #devops

#database 4 posts

9 Feb 2026

Migrating ClickHouse From EC2 to ClickHouse Cloud - Every Approach We Tried and Why Most Failed

S3 backup/restore, direct connectivity, Parquet exports - none of them worked cleanly. Here's the full war story of migrating a production ClickHouse instance to Cloud, the version mismatch that broke everything, and the dumb-simple approach that actually got the job done.

#clickhouse #aws #migration #devops #production

31 Dec 2025

Dragonfly vs Redis: Modern In-Memory Store Comparison

Compare Dragonfly and Redis for caching and data storage: Dragonfly's multi-threaded architecture vs Redis's single-threaded model.

#dragonfly #redis #caching #kubernetes #performance

28 Dec 2025

Vitess for MySQL: Horizontal Sharding Done Right

Scale MySQL horizontally with Vitess. Automatic sharding, online schema changes, and Kubernetes-native deployment for massive scale.

#vitess #mysql #sharding #kubernetes #scaling

21 Jan 2025

Working with Databases in Kubernetes: Connections, Dumps and Data Extraction

A practical guide to connecting to PostgreSQL databases in Kubernetes – exec into pods, VPN access, SOCKS5 proxies, pg_dump, kubectl cp and getting data out when you need it.

#kubernetes #postgresql #kubectl #devops #socks5 #pg_dump

#advice 4 posts

4 Feb 2026

10 Rules for Negotiating Your Job Offer (From 7 Years of Engineering)

Most engineers massively undervalue themselves because no one taught them how to negotiate. Here's everything I've learned from negotiating salaries, contracts, titles, and more.

#career #negotiation #salary #engineering-culture

10 Dec 2025

The Real Difference Between Senior, Staff, and Principal Engineer

Everyone wants to know the difference between Senior, Staff, and Principal. After holding all three titles, I can tell you the real differences aren't what most people think. It's not about years - it's about scope.

#career #engineering-culture #leadership #principal-engineer

2 Dec 2025

Startup vs Scale-Up vs Enterprise: Where You'll Actually Learn the Most

After working across all three - tiny startups, hypergrowth scale-ups, and massive enterprises - I can tell you they're completely different jobs. Same title, same tech, completely different experience. Here's what each teaches you.

#career #startups #engineering-culture #leadership

18 Nov 2025

Contract vs Perm: 4 Years of Both and What I'd Choose Now

I've done both. Multiple times. Here are the real trade-offs nobody talks about - the money, the time-off problem, the boredom factor, and why your life situation matters more than you think.

#career #contracting #salary #engineering-culture

#localstack 4 posts

5 Dec 2025

The Fast Feedback Loop - Local Development with Kind, LocalStack, and Act

Combine Kind, LocalStack, and Act for a complete local development environment. Test Kubernetes, AWS services, and CI pipelines without leaving your laptop.

#devops #kind #act #kubernetes #aws #development

20 Nov 2025

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

#aws #testing #development #devops #docker

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

#kubernetes #postgresql #s3 #backup #cronjob #devops #databases

15 Apr 2021

Crossplane and LocalStack

Crossplane + LocalStack on kind: 100% Local AWS Infrastructure-as-Code

#crossplane

#postgresql 4 posts

8 Oct 2025

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

#kubernetes #databases #stateful #operators #storage

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

#kubernetes #s3 #backup #cronjob #localstack #devops #databases

25 Sept 2025

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

#etl #python #airflow #docker #data-engineering #devops

21 Jan 2025

Working with Databases in Kubernetes: Connections, Dumps and Data Extraction

A practical guide to connecting to PostgreSQL databases in Kubernetes – exec into pods, VPN access, SOCKS5 proxies, pg_dump, kubectl cp and getting data out when you need it.

#kubernetes #database #kubectl #devops #socks5 #pg_dump

#kind 4 posts

5 Dec 2025

The Fast Feedback Loop - Local Development with Kind, LocalStack, and Act

Combine Kind, LocalStack, and Act for a complete local development environment. Test Kubernetes, AWS services, and CI pipelines without leaving your laptop.

#devops #localstack #act #kubernetes #aws #development

15 Jan 2023

Deploying Kafka on Kubernetes with Strimzi

A step-by-step guide to setting up a Kafka cluster on a local Kind cluster using the Strimzi operator, with optional Terraform provisioning.

#k8s #kafka #strimzi #operator #terraform

15 Jan 2022

Apache Pulsar Playground: Running Pulsar Locally on kind with Dashboards, Clients, and Admin Tools

In this post, I'll walk you through setting up a full-featured Apache Pulsar playground using kind (Kubernetes in Docker). Whether you're learning Pulsar or demoing a real pub/sub model with admin tools and monitoring, this setup has what you need.

#apache-pulsar #kubernetes #helm #messaging #pubsub #devtools

15 Mar 2021

Falco on K8s (Kind)

Falco Kubernetes Lab: Runtime Threat Detection with Prometheus & Grafana

#falco #prometheus #grafana

#performance 4 posts

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #jvm #ec2 #debugging #memory #threads #linux #devops

31 Dec 2025

Dragonfly vs Redis: Modern In-Memory Store Comparison

Compare Dragonfly and Redis for caching and data storage: Dragonfly's multi-threaded architecture vs Redis's single-threaded model.

#dragonfly #redis #caching #database #kubernetes

20 Dec 2025

VPA + HPA Together: The Right Way to Autoscale Both

Use Vertical Pod Autoscaler and Horizontal Pod Autoscaler together without conflicts. Includes KEDA integration and best practices.

#kubernetes #autoscaling #vpa #hpa #keda

22 Jun 2025

The Kubernetes ndots:5 Problem – Why DNS Lookups Take 15 Seconds

A deep dive into why external DNS resolution in Kubernetes can be painfully slow, how the default ndots:5 setting causes unnecessary lookups, and practical fixes that actually work.

#kubernetes #dns #networking #coredns #debugging

#lambda 4 posts

2 Feb 2026

Implementing Vertical Autoscaling for Aurora Databases Using Lambda Functions

AWS doesn't offer vertical autoscaling for Aurora – so we built it. CloudWatch Alarms, SNS, Lambda coordination, and the gotchas we hit in production.

#aurora #rds #aws #autoscaling #terraform #serverless

15 Jan 2026

7 Years of Infrastructure Decisions: What I'd Do Again and What I Regret

Every infrastructure decision I'd make again – and the ones I wouldn't – after running production workloads across fintech, open-source, IoT, and beyond.

#kubernetes #aws #infrastructure #devops #platform-engineering #ecs #terraform #networking

25 Aug 2025

Serverless Container Framework - Deploy Containers to Lambda and Fargate with Ease

Deploy containerised applications to AWS Lambda or Fargate with a simple YAML config. No infrastructure code required - just define your containers and deploy.

#serverless #containers #aws #fargate #docker #devops

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#aws #rds #rds-proxy #serverless #databases #terraform #connection-pooling

#control-tower 3 posts

24 Feb 2026

AWS Control Tower Account Factory - The Gotchas Nobody Tells You

Real-world lessons from automating AWS account provisioning with Control Tower, Service Catalog, and Terraform. The silent failures, IAM traps, and StackSet timing issues that cost us days.

#aws #terraform #service-catalog #iam #platform-engineering #multi-account

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #terraform #multi-account #organizations #service-catalog #sso #iam-identity-center #scps #platform-engineering #spacelift #security #devops

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #account-factory #terraform #service-catalog #organizations #sso #platform-engineering #devops

#service-catalog 3 posts

24 Feb 2026

AWS Control Tower Account Factory - The Gotchas Nobody Tells You

Real-world lessons from automating AWS account provisioning with Control Tower, Service Catalog, and Terraform. The silent failures, IAM traps, and StackSet timing issues that cost us days.

#aws #control-tower #terraform #iam #platform-engineering #multi-account

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #sso #iam-identity-center #scps #platform-engineering #spacelift #security #devops

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #control-tower #account-factory #terraform #organizations #sso #platform-engineering #devops

#organizations 3 posts

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #service-catalog #sso #iam-identity-center #scps #platform-engineering #spacelift #security #devops

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #control-tower #account-factory #terraform #service-catalog #sso #platform-engineering #devops

10 May 2025

AWS Service Control Policies (SCPs) - Guardrails for Your Organization

How to use SCPs to set permission guardrails across your AWS Organization. Covers SCP evaluation logic, deny vs allow strategies, common patterns, and production-ready Terraform examples.

#aws #scps #security #iam #governance #terraform

#backstage 3 posts

14 Nov 2025

Backstage Plugins: Building Custom Developer Portal Features

Build custom Backstage plugins for your internal developer portal. Create frontend components, backend APIs, and integrate with your existing tools.

#developer-portal #platform-engineering #react #typescript

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#aws #ecs #rds #cognito #terraform #docker #devops #platform-engineering

22 Jul 2024

Building an Internal Developer Platform

A practical guide to building an IDP that developers actually want to use. Covers the build vs buy decision, Backstage implementation, and the organisational changes required for success.

#platform-engineering #idp #developer-experience #devops

#rds 3 posts

2 Feb 2026

Implementing Vertical Autoscaling for Aurora Databases Using Lambda Functions

AWS doesn't offer vertical autoscaling for Aurora – so we built it. CloudWatch Alarms, SNS, Lambda coordination, and the gotchas we hit in production.

#aurora #aws #lambda #autoscaling #terraform #serverless

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #aws #ecs #cognito #terraform #docker #devops #platform-engineering

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#aws #lambda #rds-proxy #serverless #databases #terraform #connection-pooling

#incident-management 3 posts

22 Nov 2025

Blameless Culture is Harder Than You Think

Everyone claims to have a blameless culture. Few actually do. Here's what real blamelessness looks like and why it's so difficult to achieve.

#engineering-culture #post-mortems #leadership #psychological-safety

12 Aug 2025

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

#sre #devops #reliability #on-call #monitoring

15 May 2025

Incident Management That Actually Works

Most incident processes are theatre. Here's how to build incident management that reduces downtime, prevents recurrence, and doesn't burn out your team.

#sre #on-call #post-mortems #devops

#developer-experience 3 posts

3 Feb 2026

Platform Engineering in 2026 - It's About the Discipline, Not the Tools

Platform engineering has become the most misunderstood role in tech. Everyone's building 'platforms' but few understand what actually makes one successful. Here's what I've learned building platforms for teams of 10 to 500.

#platform-engineering #devops #internal-platforms #idp

18 Nov 2025

Port and Kratix: Internal Developer Platforms Beyond Backstage

Explore Port and Kratix for building internal developer platforms. Self-service infrastructure, developer workflows, and platform engineering patterns.

#platform-engineering #port #kratix #self-service

22 Jul 2024

Building an Internal Developer Platform

A practical guide to building an IDP that developers actually want to use. Covers the build vs buy decision, Backstage implementation, and the organisational changes required for success.

#platform-engineering #idp #backstage #devops

#ebpf 3 posts

21 Oct 2025

Cilium Service Mesh: Sidecar-Free with eBPF

Deploy a service mesh without sidecars using Cilium. Get mTLS, traffic management, and observability powered by eBPF at the kernel level.

#cilium #service-mesh #kubernetes #mtls #networking

7 Oct 2025

eBPF for Security: Kernel-Level Observability Without Agents

Deep dive into eBPF-based security tools - Cilium, Falco, and Tetragon. Learn how to implement runtime security, network policies, and threat detection at the kernel level.

#security #cilium #falco #tetragon #kubernetes

10 Feb 2025

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

#linux #networking #security #observability #kernel

#clawdbot 3 posts

28 Jan 2026

Running Clawdbot 24/7 on a Hetzner VPS – Terraform, Security Hardening, and the Bits the Docs Miss

A production-grade setup for Clawdbot on Hetzner Cloud with Terraform provisioning, proper SSH hardening, fail2ban, UFW, unattended-upgrades, and optional Tailscale – the stuff you actually need in prod.

#hetzner #terraform #vps #security #devops #automation

27 Jan 2026

Clawdbot Manual Setup – Step-by-Step VPS Configuration with WhatsApp Integration

A detailed walkthrough for setting up Clawdbot on a Hetzner VPS from scratch – SSH hardening, firewall configuration, Tailscale, and WhatsApp Business integration using a dedicated number.

#hetzner #vps #whatsapp #devops #security #tutorial

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#security #google-workspace #github #notion #integrations #oauth #tutorial

#s3 3 posts

30 Jan 2026

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

#terraform #iac #migration #aws #state-management #hcl2

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

#kubernetes #postgresql #backup #cronjob #localstack #devops #databases

15 Mar 2019

The Terraform State Chicken-and-Egg Problem – And Why Bootstrapping Is Just Physics

You can't use Terraform to create the S3 bucket that stores Terraform state. Here's how to bootstrap your remote backend properly, plus the philosophical reason this pattern exists everywhere in software.

#terraform #aws #dynamodb #infrastructure #devops #state-management

#databases 3 posts

8 Oct 2025

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

#kubernetes #postgresql #stateful #operators #storage

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

#kubernetes #postgresql #s3 #backup #cronjob #localstack #devops

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#aws #lambda #rds #rds-proxy #serverless #terraform #connection-pooling

#k8s 3 posts

15 Mar 2025

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

#kubernetes #gateway-api #ingress #networking #traffic-management

15 Jan 2023

Deploying Kafka on Kubernetes with Strimzi

A step-by-step guide to setting up a Kafka cluster on a local Kind cluster using the Strimzi operator, with optional Terraform provisioning.

#kafka #strimzi #operator #kind #terraform

15 Sept 2022

Using GKE DNS-Based Endpoints for Secure Cluster Access

Use GKE's DNS-based control plane endpoint to reach a private cluster without bastions or VPNs. IAM-gated kubectl access via Cloud DNS, fully private.

#gke #dns #private #cluster #access

#traefik 3 posts

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #udp #ecs #cloudmap #service-discovery #aws #networking #devops

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #blue-green #canary #deployments #aws #sigv4 #devops #ci-cd

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#mtls #tls #security #certificates #smallstep #pki #devops

#metrics 3 posts

15 Jan 2026

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

#dora #devops #engineering-culture #cicd #platform-engineering

26 Nov 2025

OpenTelemetry Collector Pipelines: Transform, Filter, Route Telemetry

Master OpenTelemetry Collector configuration. Build pipelines to transform metrics, filter traces, route logs, and reduce telemetry costs.

#opentelemetry #observability #traces #logs #collector

18 Mar 2025

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

#opentelemetry #observability #tracing #logging #kubernetes

#monitoring 3 posts

4 Mar 2026

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

#opentelemetry #observability #kubernetes #aws #devops #platform-engineering

12 Aug 2025

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

#sre #devops #reliability #on-call #incident-management

15 Sept 2022

Managing Dynatrace Alerts at Scale with Custom Ansible Roles

How we automated Dynatrace alerting configuration using custom Ansible roles - covering alert profiles, problem notifications, metric events, and maintenance windows across multiple environments.

#dynatrace #ansible #alerting #automation #observability #iac

#ec2 3 posts

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #jvm #debugging #performance #memory #threads #linux #devops

15 Nov 2022

Deep Dive into EC2 Networking

Deep Dive into EC2 Networking: ENIs, IP Addressing and Deployment Architectures

#networking #eni #ip #deployment #architecture

15 Oct 2021

How to Increase EBS Disk Size on EC2 (Without Downtime)

Online EBS volume resizing for running instances – the IaC way with Terraform and ASG instance refresh, plus the manual escape hatch when you need it now. No reboot required.

#aws #ebs #terraform #disk #storage #devops

#deployment 3 posts

4 Dec 2025

Progressive Delivery with Flagger: Automated Canary Deployments

Implement automated canary deployments with Flagger. Metrics-based promotion, automated rollback, and integration with Istio, Linkerd, and Gateway API.

#flagger #canary #progressive-delivery #kubernetes #gitops

18 Mar 2024

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

#gitops #argocd #kubernetes #cicd #automation

15 Nov 2022

Deep Dive into EC2 Networking

Deep Dive into EC2 Networking: ENIs, IP Addressing and Deployment Architectures

#ec2 #networking #eni #ip #architecture

#deployments 3 posts

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#ecs #aws #blue-green #task-sets #fargate #terraform #devops

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #traefik #blue-green #canary #aws #sigv4 #devops #ci-cd

15 Apr 2019

Helm Atomics: The Flag That Saves Your Production Deploys (And Its Hidden Gotchas)

Deep dive into Helm's --atomic, --wait, and --cleanup-on-fail flags. How they work, when to use them, the CI/CD pipeline trap that catches everyone, and production-ready deployment patterns.

#helm #kubernetes #devops #cicd #rollback

#cni 3 posts

15 Jun 2025

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

#aws #eks #networking #ip-exhaustion #prefix-delegation

15 Apr 2022

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. Deploying Calico with IPIP or BGP offers a scalable alternative that overcomes these constraints.

#aws #eks #calico #networking #bgp #ipip

15 Apr 2020

Kubernetes Networking: A Deep Dive From First Principles

How packets actually flow in Kubernetes – from veth pairs to CNI plugins to kube-proxy modes. With AWS/EKS context throughout.

#kubernetes #networking #eks #aws #cilium #calico

#elasticsearch 3 posts

3 Feb 2026

ELK Stack Migration: From 6.x to 8.x - The Complete Guide

A comprehensive guide to migrating your Elasticsearch, Logstash, and Kibana stack from version 6.x to 8.x. Covers breaking changes, migration strategies, index compatibility, and zero-downtime approaches.

#elk #kibana #logstash #migration #observability

28 Jan 2026

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

#elastic-cloud #observability #logging #saas #managed-services

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#security #soc #cribl #docker #homelab #devops #siem

#debugging 3 posts

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #jvm #ec2 #performance #memory #threads #linux #devops

18 Jul 2025

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

#kubernetes #containers #production #kubectl #devops

22 Jun 2025

The Kubernetes ndots:5 Problem – Why DNS Lookups Take 15 Seconds

A deep dive into why external DNS resolution in Kubernetes can be painfully slow, how the default ndots:5 setting causes unnecessary lookups, and practical fixes that actually work.

#kubernetes #dns #networking #coredns #performance

#kubectl 3 posts

18 Jul 2025

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

#kubernetes #debugging #containers #production #devops

21 Jan 2025

Working with Databases in Kubernetes: Connections, Dumps and Data Extraction

A practical guide to connecting to PostgreSQL databases in Kubernetes – exec into pods, VPN access, SOCKS5 proxies, pg_dump, kubectl cp and getting data out when you need it.

#kubernetes #postgresql #database #devops #socks5 #pg_dump

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubernetes #api-server #etcd #controllers #scheduler #kubelet #devops

#gateway-api 3 posts

29 Oct 2025

Gateway API Advanced Patterns: Beyond Basic Ingress

Master Gateway API with traffic splitting, header-based routing, cross-namespace references, and TLS passthrough. The future of Kubernetes ingress.

#kubernetes #ingress #networking #traffic-management

15 Mar 2025

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

#kubernetes #ingress #networking #traffic-management #k8s

15 Aug 2022

Secure Gateways: Configuring Mutual TLS using Gateway API on GKE

In this post, we configure mutual TLS (mTLS) using Gateway API on GKE, securing ingress traffic with client certificate validation.

#kubernetes #mtls #gke #security

#helm 3 posts

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#gitlab #kubernetes #aks #azure #devops #self-hosted #startup

15 Jan 2022

Apache Pulsar Playground: Running Pulsar Locally on kind with Dashboards, Clients, and Admin Tools

In this post, I'll walk you through setting up a full-featured Apache Pulsar playground using kind (Kubernetes in Docker). Whether you're learning Pulsar or demoing a real pub/sub model, this setup comes with dashboards, clients, and admin tools.

#apache-pulsar #kubernetes #kind #messaging #pubsub #devtools

15 Apr 2019

Helm Atomics: The Flag That Saves Your Production Deploys (And Its Hidden Gotchas)

Deep dive into Helm's --atomic, --wait, and --cleanup-on-fail flags. How they work, when to use them, the CI/CD pipeline trap that catches everyone, and production-ready deployment patterns.

#kubernetes #devops #cicd #deployments #rollback

#homelab 3 posts

7 Mar 2026

Building a Production-Grade Homelab with K3s, Vault, and FluxCD

How I built a fully GitOps-managed Kubernetes homelab on a single mini PC - from unboxing to production. Proxmox bare metal install, K3s cluster, HashiCorp Vault secrets, full observability, and Cloudflare Tunnel.

#kubernetes #k3s #gitops #fluxcd #hashicorp-vault

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#security #soc #elasticsearch #cribl #docker #devops #siem

15 Sept 2025

K3s Homelab Setup Guide - Running Kubernetes on Raspberry Pi 5

Build a lightweight Kubernetes cluster on three Raspberry Pi 5 devices. Step-by-step guide covering K3s installation, cluster configuration, and deployment testing.

#kubernetes #k3s #raspberry-pi #devops #containers

#on-call 3 posts

12 Aug 2025

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

#sre #devops #reliability #monitoring #incident-management

15 May 2025

Incident Management That Actually Works

Most incident processes are theatre. Here's how to build incident management that reduces downtime, prevents recurrence, and doesn't burn out your team.

#incident-management #sre #post-mortems #devops

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#nginx #incident #log-rotation #linux #devops #production #war-stories

#aks 3 posts

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#gitlab #kubernetes #azure #helm #devops #self-hosted #startup

15 May 2025

Kubernetes Cluster Upgrades: Production-Ready Guide

Technical guide for upgrading managed Kubernetes clusters across GKE, EKS, and AKS.

#kubernetes #gke #eks #cluster-management #devops

15 Feb 2022

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this post, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

#azure #kubernetes #vpn #twingate #private-cluster #networking

#autoscaling 3 posts

2 Feb 2026

Implementing Vertical Autoscaling for Aurora Databases Using Lambda Functions

AWS doesn't offer vertical autoscaling for Aurora – so we built it. CloudWatch Alarms, SNS, Lambda coordination, and the gotchas we hit in production.

#aurora #rds #aws #lambda #terraform #serverless

20 Dec 2025

VPA + HPA Together: The Right Way to Autoscale Both

Use Vertical Pod Autoscaler and Horizontal Pod Autoscaler together without conflicts. Includes KEDA integration and best practices.

#kubernetes #vpa #hpa #keda #performance

8 Dec 2025

Karpenter Deep Dive: Node Provisioning That Actually Works

Master Karpenter for Kubernetes node autoscaling. Replace Cluster Autoscaler with faster, smarter provisioning. Includes cost optimization patterns.

#karpenter #kubernetes #aws #eks #cost-optimization

#opa 3 posts

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #rego #iac #gitops #platform-engineering #devops #modules #policy-as-code

10 Nov 2025

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

#kyverno #gatekeeper #kubernetes #policy #security

12 Oct 2025

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

#gatekeeper #kubernetes #policy-as-code #security #admission-control

#remote-work 3 posts

18 Sept 2025

Remote Work Won

The RTO push isn't about productivity. The data is clear: remote work works. What's really happening is a fight over control, real estate, and management's inability to adapt.

#engineering-culture #productivity #management #career

3 Apr 2025

The Meeting That Should Have Been a Doc

Most meetings are information broadcasts disguised as collaboration. Learn when to meet, when to write, and how to save everyone's time.

#meetings #productivity #engineering-culture #documentation

12 Sept 2023

Standups Are Broken

Daily standups were meant to improve communication. Instead, they've become status meetings that waste time and interrupt deep work. There's a better way.

#engineering-culture #productivity #agile #team-management

#opentelemetry 3 posts

4 Mar 2026

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

#observability #kubernetes #aws #devops #platform-engineering #monitoring

26 Nov 2025

OpenTelemetry Collector Pipelines: Transform, Filter, Route Telemetry

Master OpenTelemetry Collector configuration. Build pipelines to transform metrics, filter traces, route logs, and reduce telemetry costs.

#observability #metrics #traces #logs #collector

18 Mar 2025

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

#observability #tracing #metrics #logging #kubernetes

#serverless 3 posts

2 Feb 2026

Implementing Vertical Autoscaling for Aurora Databases Using Lambda Functions

AWS doesn't offer vertical autoscaling for Aurora – so we built it. CloudWatch Alarms, SNS, Lambda coordination, and the gotchas we hit in production.

#aurora #rds #aws #lambda #autoscaling #terraform

25 Aug 2025

Serverless Container Framework - Deploy Containers to Lambda and Fargate with Ease

Deploy containerised applications to AWS Lambda or Fargate with a simple YAML config. No infrastructure code required - just define your containers and deploy.

#containers #aws #lambda #fargate #docker #devops

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#aws #lambda #rds #rds-proxy #databases #terraform #connection-pooling

#karpenter 2 posts

8 Dec 2025

Karpenter Deep Dive: Node Provisioning That Actually Works

Master Karpenter for Kubernetes node autoscaling. Replace Cluster Autoscaler with faster, smarter provisioning. Includes cost optimization patterns.

#kubernetes #autoscaling #aws #eks #cost-optimization

15 Feb 2025

Lessons From 5 Years of Kubernetes in Production – Cluster Crashes, Ditching Self-Managed, Cost Cuts, and the Tooling That Actually Works

Two major cluster crashes, migrating from kops to EKS, slashing compute costs with Karpenter, and the observability stack we rebuilt three times.

#kubernetes #eks #aws #production #devops #observability

#sso 2 posts

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #service-catalog #iam-identity-center #scps #platform-engineering #spacelift #security #devops

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #control-tower #account-factory #terraform #service-catalog #organizations #platform-engineering #devops

#multi-account 2 posts

24 Feb 2026

AWS Control Tower Account Factory - The Gotchas Nobody Tells You

Real-world lessons from automating AWS account provisioning with Control Tower, Service Catalog, and Terraform. The silent failures, IAM traps, and StackSet timing issues that cost us days.

#aws #control-tower #terraform #service-catalog #iam #platform-engineering

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #organizations #service-catalog #sso #iam-identity-center #scps #platform-engineering #spacelift #security #devops

#scps 2 posts

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #service-catalog #sso #iam-identity-center #platform-engineering #spacelift #security #devops

10 May 2025

AWS Service Control Policies (SCPs) - Guardrails for Your Organization

How to use SCPs to set permission guardrails across your AWS Organization. Covers SCP evaluation logic, deny vs allow strategies, common patterns, and production-ready Terraform examples.

#aws #organizations #security #iam #governance #terraform

#spacelift 2 posts

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #service-catalog #sso #iam-identity-center #scps #platform-engineering #security #devops

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#terraform #opa #rego #iac #gitops #platform-engineering #devops #modules #policy-as-code

#act 2 posts

5 Dec 2025

The Fast Feedback Loop - Local Development with Kind, LocalStack, and Act

Combine Kind, LocalStack, and Act for a complete local development environment. Test Kubernetes, AWS services, and CI pipelines without leaving your laptop.

#devops #kind #localstack #kubernetes #aws #development

8 Nov 2025

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

#github-actions #ci-cd #devops #testing #automation

#governance 2 posts

25 Oct 2025

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

#aws #terraform #tagging #finops #devops

10 May 2025

AWS Service Control Policies (SCPs) - Guardrails for Your Organization

How to use SCPs to set permission guardrails across your AWS Organization. Covers SCP evaluation logic, deny vs allow strategies, common patterns, and production-ready Terraform examples.

#aws #organizations #scps #security #iam #terraform

#bgp 2 posts

15 Apr 2022

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. To overcome these constraints, deploying Calico with IPIP or BGP offers a scalable alternative.

#aws #eks #calico #cni #networking #ipip

15 May 2019

BGP in the Cloud – A Deep Dive into AWS Direct Connect, Routing Pathologies, and What Breaks in Production

A production-focused deep dive into how BGP actually behaves over AWS Direct Connect – route selection, failover, ASN design, MEDs, prepending, blackholing scenarios, and the real-world issues teams hit at scale.

#aws #direct-connect #networking #hybrid-cloud #production #routing

#hybrid-cloud 2 posts

25 Oct 2025

Tailscale in Production: WireGuard Mesh for Hybrid Cloud

Deploy Tailscale for secure connectivity across clouds, offices, and Kubernetes clusters. Zero-config VPN mesh with SSO integration and ACLs.

#tailscale #wireguard #vpn #networking #zero-trust

15 May 2019

BGP in the Cloud – A Deep Dive into AWS Direct Connect, Routing Pathologies, and What Breaks in Production

A production-focused deep dive into how BGP actually behaves over AWS Direct Connect – route selection, failover, ASN design, MEDs, prepending, blackholing scenarios, and the real-world issues teams hit at scale.

#bgp #aws #direct-connect #networking #production #routing

#post-mortems 2 posts

22 Nov 2025

Blameless Culture is Harder Than You Think

Everyone claims to have a blameless culture. Few actually do. Here's what real blamelessness looks like and why it's so difficult to achieve.

#engineering-culture #incident-management #leadership #psychological-safety

15 May 2025

Incident Management That Actually Works

Most incident processes are theatre. Here's how to build incident management that reduces downtime, prevents recurrence, and doesn't burn out your team.

#incident-management #sre #on-call #devops

#idp 2 posts

3 Feb 2026

Platform Engineering in 2026 - It's About the Discipline, Not the Tools

Platform engineering has become the most misunderstood role in tech. Everyone's building 'platforms' but few understand what actually makes one successful. Here's what I've learned building platforms for teams of 10 to 500.

#platform-engineering #devops #developer-experience #internal-platforms

22 Jul 2024

Building an Internal Developer Platform

A practical guide to building an IDP that developers actually want to use. Covers the build vs buy decision, Backstage implementation, and the organisational changes required for success.

#platform-engineering #backstage #developer-experience #devops

#github 2 posts

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#clawdbot #security #google-workspace #notion #integrations #oauth #tutorial

15 Jun 2019

Solving the AWS OIDC Chicken-and-Egg Problem with GitHub Actions

#aws #oidc

#service-mesh 2 posts

21 Oct 2025

Cilium Service Mesh: Sidecar-Free with eBPF

Deploy a service mesh without sidecars using Cilium. Get mTLS, traffic management, and observability powered by eBPF at the kernel level.

#cilium #ebpf #kubernetes #mtls #networking

20 Nov 2024

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

#kubernetes #istio #linkerd #cilium #networking #devops

#hetzner 2 posts

28 Jan 2026

Running Clawdbot 24/7 on a Hetzner VPS – Terraform, Security Hardening, and the Bits the Docs Miss

A production-grade setup for Clawdbot on Hetzner Cloud with Terraform provisioning, proper SSH hardening, fail2ban, UFW, unattended-upgrades, and optional Tailscale – the stuff you actually need in prod.

#clawdbot #terraform #vps #security #devops #automation

27 Jan 2026

Clawdbot Manual Setup – Step-by-Step VPS Configuration with WhatsApp Integration

A detailed walkthrough for setting up Clawdbot on a Hetzner VPS from scratch – SSH hardening, firewall configuration, Tailscale, and WhatsApp Business integration using a dedicated number.

#clawdbot #vps #whatsapp #devops #security #tutorial

#vps 2 posts

28 Jan 2026

Running Clawdbot 24/7 on a Hetzner VPS – Terraform, Security Hardening, and the Bits the Docs Miss

A production-grade setup for Clawdbot on Hetzner Cloud with Terraform provisioning, proper SSH hardening, fail2ban, UFW, unattended-upgrades, and optional Tailscale – the stuff you actually need in prod.

#clawdbot #hetzner #terraform #security #devops #automation

27 Jan 2026

Clawdbot Manual Setup – Step-by-Step VPS Configuration with WhatsApp Integration

A detailed walkthrough for setting up Clawdbot on a Hetzner VPS from scratch – SSH hardening, firewall configuration, Tailscale, and WhatsApp Business integration using a dedicated number.

#clawdbot #hetzner #whatsapp #devops #security #tutorial

#tutorial 2 posts

27 Jan 2026

Clawdbot Manual Setup – Step-by-Step VPS Configuration with WhatsApp Integration

A detailed walkthrough for setting up Clawdbot on a Hetzner VPS from scratch – SSH hardening, firewall configuration, Tailscale, and WhatsApp Business integration using a dedicated number.

#clawdbot #hetzner #vps #whatsapp #devops #security

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#clawdbot #security #google-workspace #github #notion #integrations #oauth

#saas 2 posts

28 Jan 2026

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

#elasticsearch #elastic-cloud #observability #logging #managed-services

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#finops #cloud-costs #kubernetes #eks #multi-tenant #unit-economics #aws

#sigstore 2 posts

12 Oct 2025

Container Image Signing with Cosign - A Practical Guide

Sign and verify container images without managing keys. A hands-on guide to Cosign, keyless signing, and enforcing signatures in Kubernetes.

#security #cosign #containers #kubernetes #devops

5 Sept 2025

Software Supply Chain Security - Sigstore, SLSA, and Beyond

Your dependencies are an attack vector. Here's how to secure your software supply chain with Sigstore, SLSA frameworks, SBOMs, and admission policies that actually work.

#security #supply-chain #slsa #sbom #kubernetes #devops

#contracting 2 posts

5 Jan 2026

That Time I Gave Away £50k Worth of Consulting for Free (And What It Taught Me About the Industry)

On interview take-home tests that are suspiciously specific, contractors who get ghosted after detailed proposals, and learning to play the game without becoming bitter about it.

#career #consulting #interviews #tech-industry #lessons-learned

18 Nov 2025

Contract vs Perm: 4 Years of Both and What I'd Choose Now

I've done both. Multiple times. Here are the real trade-offs nobody talks about - the money, the time off problem, the boredom factor, and why your life situation matters more than you think.

#career #salary #advice #engineering-culture

#salary 2 posts

4 Feb 2026

10 Rules for Negotiating Your Job Offer (From 7 Years of Engineering)

Most engineers massively undervalue themselves because no one taught them how to negotiate. Here's everything I've learned from negotiating salaries, contracts, titles, and more.

#career #negotiation #engineering-culture #advice

18 Nov 2025

Contract vs Perm: 4 Years of Both and What I'd Choose Now

I've done both. Multiple times. Here are the real trade-offs nobody talks about - the money, the time off problem, the boredom factor, and why your life situation matters more than you think.

#career #contracting #advice #engineering-culture

#crossplane 2 posts

6 Nov 2025

Crossplane Compositions: Build Your Own Cloud API

Create custom cloud APIs with Crossplane Compositions. Abstract away complexity and give developers self-service infrastructure with guardrails.

#kubernetes #platform-engineering #infrastructure #gitops

15 Apr 2021

Crossplane and LocalStack

Crossplane + LocalStack on kind: 100% Local AWS Infrastructure-as-Code

#localstack

#storage 2 posts

8 Oct 2025

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

#kubernetes #databases #postgresql #stateful #operators

15 Oct 2021

How to Increase EBS Disk Size on EC2 (Without Downtime)

Online EBS volume resizing for running instances – the IaC way with Terraform and ASG instance refresh, plus the manual escape hatch when you need it now. No reboot required.

#aws #ebs #ec2 #terraform #disk #devops

#kafka 2 posts

15 Jan 2023

Deploying Kafka on Kubernetes with Strimzi

A step-by-step guide to setting up a Kafka cluster on a local Kind cluster using the Strimzi operator, with optional Terraform provisioning.

#k8s #strimzi #operator #kind #terraform

15 Jul 2022

Pulsar vs Kafka in K8s: Battle of Event Streams

Pulsar vs Kafka

#pulsar

#interviews 2 posts

5 Jan 2026

That Time I Gave Away £50k Worth of Consulting for Free (And What It Taught Me About the Industry)

On interview take-home tests that are suspiciously specific, contractors who get ghosted after detailed proposals, and learning to play the game without becoming bitter about it.

#career #consulting #contracting #tech-industry #lessons-learned

15 May 2025

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

#devops #career #kubernetes #aws #terraform #sre

#engineering 2 posts

10 Aug 2025

FinOps for Engineering Teams - Making Cost Everyone's Problem

Cloud cost management isn't just for finance. Here's how engineering teams can build cost awareness into their workflow without slowing down delivery.

#finops #cloud #aws #cost-optimization #devops

15 Feb 2020

The Ultimate Pathway to DevOps Revamped

A practical roadmap into DevOps for engineers starting out – what to learn, in what order, and where the genuine value is vs the hype.

#devops #roadmap #aws #platform

#alerting 2 posts

30 Nov 2025

SLO-Based Alerting: Burn Rate Alerts vs Threshold Alerts

Implement SLO-based alerting with burn rate alerts. Move from noisy threshold alerts to meaningful reliability signals using error budgets.

#slo #sre #prometheus #observability #reliability

15 Sept 2022

Managing Dynatrace Alerts at Scale with Custom Ansible Roles

How we automated Dynatrace alerting configuration using custom Ansible roles - covering alert profiles, problem notifications, metric events, and maintenance windows across multiple environments.

#dynatrace #ansible #monitoring #automation #observability #iac

#falco 2 posts

7 Oct 2025

eBPF for Security: Kernel-Level Observability Without Agents

Deep dive into eBPF-based security tools - Cilium, Falco, and Tetragon. Learn how to implement runtime security, network policies, and threat detection at the kernel level.

#ebpf #security #cilium #tetragon #kubernetes

15 Mar 2021

Falco on K8s (Kind)

Falco Kubernetes Lab: Runtime Threat Detection with Prometheus & Grafana

#kind #prometheus #grafana

#architecture 2 posts

5 Apr 2023

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

#kubernetes #startups #infrastructure #devops #hot-takes

15 Nov 2022

Deep Dive into EC2 Networking

Deep Dive into EC2 Networking: ENIs, IP Addressing and Deployment Architectures

#ec2 #networking #eni #ip #deployment

#java 2 posts

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#jvm #ec2 #debugging #performance #memory #threads #linux #devops

15 Dec 2025

Migrating a Java Application from EC2 to ECS Fargate: A Step-by-Step Guide

The complete journey of containerising a Java JAR running on EC2 and deploying it to ECS Fargate – from local testing to Dockerfile, task definitions, networking, secrets management, and achieving production parity.

#ecs #fargate #docker #aws #containers #migration #terraform #devops

#blue-green 2 posts

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#ecs #aws #deployments #task-sets #fargate #terraform #devops

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #traefik #canary #deployments #aws #sigv4 #devops #ci-cd

#dynamodb 2 posts

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#sql-server #oracle #migration #aws #dms #terraform #event-sourcing #platform-engineering #devops

15 Mar 2019

The Terraform State Chicken-and-Egg Problem – And Why Bootstrapping Is Just Physics

You can't use Terraform to create the S3 bucket that stores Terraform state. Here's how to bootstrap your remote backend properly, plus the philosophical reason this pattern exists everywhere in software.

#terraform #aws #s3 #infrastructure #devops #state-management

#state-management 2 posts

30 Jan 2026

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

#terraform #iac #migration #aws #s3 #hcl2

15 Mar 2019

The Terraform State Chicken-and-Egg Problem – And Why Bootstrapping Is Just Physics

You can't use Terraform to create the S3 bucket that stores Terraform state. Here's how to bootstrap your remote backend properly, plus the philosophical reason this pattern exists everywhere in software.

#terraform #aws #s3 #dynamodb #infrastructure #devops

#twingate 2 posts

15 Oct 2022

EKS Private Network with Twingate

How to set up a private network for your EKS cluster with Twingate

#kubernetes #eks #private #network

15 Feb 2022

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

#azure #aks #kubernetes #vpn #private-cluster #networking

#private 2 posts

15 Oct 2022

EKS Private Network with Twingate

How to set up a private network for your EKS cluster with Twingate

#kubernetes #eks #twingate #network

15 Sept 2022

Using GKE DNS-based Endpoints for Secure Cluster Access

Use GKE's DNS-based control plane endpoint to reach a private cluster without bastions or VPNs. IAM-gated kubectl access via Cloud DNS, fully private.

#k8s #gke #dns #cluster #access

#calico 2 posts

15 Apr 2022

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. To overcome these constraints, deploying Calico with IPIP or BGP offers a scalable alternative.

#aws #eks #cni #networking #bgp #ipip

15 Apr 2020

Kubernetes Networking: A Deep Dive From First Principles

How packets actually flow in Kubernetes – from veth pairs to CNI plugins to kube-proxy modes. With AWS/EKS context throughout.

#kubernetes #networking #eks #aws #cni #cilium

#logging 2 posts

28 Jan 2026

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

#elasticsearch #elastic-cloud #observability #saas #managed-services

18 Mar 2025

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

#opentelemetry #observability #tracing #metrics #kubernetes

#prometheus 2 posts

30 Nov 2025

SLO-Based Alerting: Burn Rate Alerts vs Threshold Alerts

Implement SLO-based alerting with burn rate alerts. Move from noisy threshold alerts to meaningful reliability signals using error budgets.

#slo #sre #alerting #observability #reliability

15 Mar 2021

Falco on K8s (Kind)

Falco Kubernetes Lab: Runtime Threat Detection with Prometheus & Grafana

#falco #kind #grafana

#development 2 posts

5 Dec 2025

The Fast Feedback Loop - Local Development with Kind, LocalStack, and Act

Combine Kind, LocalStack, and Act for a complete local development environment. Test Kubernetes, AWS services, and CI pipelines without leaving your laptop.

#devops #kind #localstack #act #kubernetes #aws

20 Nov 2025

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

#localstack #aws #testing #devops #docker

#cloud 2 posts

10 Aug 2025

FinOps for Engineering Teams - Making Cost Everyone's Problem

Cloud cost management isn't just for finance. Here's how engineering teams can build cost awareness into their workflow without slowing down delivery.

#finops #aws #cost-optimization #devops #engineering

15 Dec 2024

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

#kubernetes #cost-optimization #resource-management #devops #finops

#ingress 2 posts

29 Oct 2025

Gateway API Advanced Patterns: Beyond Basic Ingress

Master Gateway API with traffic splitting, header-based routing, cross-namespace references, and TLS passthrough. The future of Kubernetes ingress.

#gateway-api #kubernetes #networking #traffic-management

15 Mar 2025

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

#kubernetes #gateway-api #networking #traffic-management #k8s

#traffic-management 2 posts

29 Oct 2025

Gateway API Advanced Patterns: Beyond Basic Ingress

Master Gateway API with traffic splitting, header-based routing, cross-namespace references, and TLS passthrough. The future of Kubernetes ingress.

#gateway-api #kubernetes #ingress #networking

15 Mar 2025

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

#kubernetes #gateway-api #ingress #networking #k8s

#cluster-management 2 posts

15 May 2025

Kubernetes Cluster Upgrades: Production-Ready Guide

Technical guide for upgrading managed Kubernetes clusters across GKE, EKS, and AKS

#kubernetes #gke #eks #aks #devops

15 Apr 2025

GKE Upgrade Guide and Rollback Strategy: A Production-Ready Approach

Comprehensive guide for safely upgrading GKE clusters with minimal downtime and robust rollback procedures

#kubernetes #gke #google-cloud #devops

#k3s 2 posts

7 Mar 2026

Building a Production-Grade Homelab with K3s, Vault, and FluxCD

How I built a fully GitOps-managed Kubernetes homelab on a single mini PC - from unboxing to production. Proxmox bare metal install, K3s cluster, HashiCorp Vault secrets, full observability, and Cloudflare Tunnel.

#kubernetes #homelab #gitops #fluxcd #hashicorp-vault

15 Sept 2025

K3s Homelab Setup Guide - Running Kubernetes on Raspberry Pi 5

Build a lightweight Kubernetes cluster on three Raspberry Pi 5 devices. Step-by-step guide covering K3s installation, cluster configuration, and deployment testing.

#kubernetes #raspberry-pi #homelab #devops #containers

#coredns 2 posts

22 Jun 2025

The Kubernetes ndots:5 Problem – Why DNS Lookups Take 15 Seconds

A deep dive into why external DNS resolution in Kubernetes can be painfully slow, how the default ndots:5 setting causes unnecessary lookups, and practical fixes that actually work.

#kubernetes #dns #networking #performance #debugging

15 Mar 2022

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

#kubernetes #dns #security #arp #net_raw #mitm

#pods 2 posts

18 Dec 2025

Pod Topology Spread Constraints - Distributing Workloads Intelligently

Control how pods spread across nodes, zones, and regions. A deep dive into topology spread constraints for high availability and efficient resource utilization.

#kubernetes #scheduling #high-availability #devops

19 Jun 2025

Kubernetes Sidecar Startup Order - Making Your Main App Wait

How to ensure sidecar containers are ready before your main app starts. Covers startupProbe, postStart hooks, and why readinessProbe doesn't do what you think.

#kubernetes #sidecars #containers #devops

#gatekeeper 2 posts

10 Nov 2025

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

#kyverno #opa #kubernetes #policy #security

12 Oct 2025

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

#opa #kubernetes #policy-as-code #security #admission-control

#documentation 2 posts

8 Jul 2025

Why Senior Engineers Should Write Docs

Documentation is often treated as junior work. That's backwards. The most impactful documentation comes from senior engineers, and writing it is a force multiplier for your expertise.

#engineering-culture #leadership #career #technical-writing

3 Apr 2025

The Meeting That Should Have Been a Doc

Most meetings are information broadcasts disguised as collaboration. Learn when to meet, when to write, and how to save everyone's time.

#meetings #productivity #engineering-culture #remote-work

#nat 2 posts

15 Jul 2025

Why I replaced AWS NAT Gateway with a NAT Instance - and saved $20 per month

AWS offers NAT Gateways as the default, fully managed solution for letting private subnet resources reach the internet. However, NAT Gateways can be pricey: an hourly cost of ~₹3.75/hour (varies by region), plus a data transfer cost of an additional ₹3.75/GB on top of standard data transfer. For small dev/test environments or personal labs, these costs can add up quickly. In contrast, a NAT Instance is just a normal EC2 instance configured to perform IP forwarding and NAT. It’s typically much cheaper to run a small instance (`t3.micro`) than a NAT Gateway, especially if your traffic volume is modest.

#aws #gateway #instance #cost #savings

22 Jun 2025

NAT Gateway Alternatives - Cutting Your AWS Bill Without Losing Sleep

NAT Gateways are the silent budget killer in AWS. Here's how to reduce costs with NAT instances, VPC endpoints, IPv6, and architectural changes - with real numbers and trade-offs.

#aws #networking #cost-optimization #vpc #finops

#messaging 2 posts

24 Dec 2025

NATS JetStream: Lightweight Alternative to Kafka

Deploy NATS JetStream for messaging and streaming. Simpler than Kafka, faster than RabbitMQ, with persistence and exactly-once delivery.

#nats #jetstream #streaming #kubernetes #microservices

15 Jan 2022

Apache Pulsar Playground: Running Pulsar Locally on kind with Dashboards, Clients, and Admin Tools

In this blog, I'll walk you through setting up a full-featured Apache Pulsar playground using kind (Kubernetes in Docker). Whether you're testing Pulsar for learning or demoing a real pub/sub model with admin tools and monitoring, this setup gives you everything.

#apache-pulsar #kubernetes #kind #helm #pubsub #devtools

#policy-as-code 2 posts

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #opa #rego #iac #gitops #platform-engineering #devops #modules

12 Oct 2025

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

#opa #gatekeeper #kubernetes #security #admission-control

#packer 2 posts

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#ami #aws #terraform #ci-cd #devops #immutable-infrastructure #security

15 Jan 2020

Deploying Vault with a Custom AMI

An end-to-end guide for baking a Vault AMI using Packer and deploying a Vault EC2 instance on AWS.

#vault #aws #ami #devops

#ami 2 posts

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#packer #aws #terraform #ci-cd #devops #immutable-infrastructure #security

15 Jan 2020

Deploying Vault with a Custom AMI

An end-to-end guide for baking a Vault AMI using Packer and deploying a Vault EC2 instance on AWS.

#vault #aws #packer #devops

#principal-engineer 2 posts

10 Dec 2025

The Real Difference Between Senior, Staff, and Principal Engineer

Everyone wants to know the difference between Senior, Staff, and Principal. After holding all three titles, I can tell you the real differences aren't what most people think. It's not about years - it's about scope.

#career #engineering-culture #leadership #advice

5 Dec 2025

The Principal Engineer Trap

The IC ladder looks appealing until you're at the top. Many senior engineers chase Principal titles without understanding what they're signing up for. Here's what nobody tells you.

#career #engineering-culture #leadership

#api-gateway 2 posts

15 Jun 2023

Private API Gateway - Part 2: Secure Cross-VPC Access with PrivateLink and IAM Authentication

Extend your private API Gateway with secure access from other VPCs using PrivateLink and enforce IAM-based authentication.

#aws #vpc #privatelink #security #iam

15 Jun 2020

Securing APIs in AWS: Private API Gateway + VPC Endpoint Deep Dive

Learn how to deploy a secure, private-only API Gateway inside your VPC using interface endpoints, resource policies, and VPC integration.

#aws #vpc #networking #security #terraform

#azure 2 posts

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#gitlab #kubernetes #aks #helm #devops #self-hosted #startup

15 Feb 2022

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

#aks #kubernetes #vpn #twingate #private-cluster #networking

#vpn 2 posts

25 Oct 2025

Tailscale in Production: WireGuard Mesh for Hybrid Cloud

Deploy Tailscale for secure connectivity across clouds, offices, and Kubernetes clusters. Zero-config VPN mesh with SSO integration and ACLs.

#tailscale #wireguard #networking #hybrid-cloud #zero-trust

15 Feb 2022

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

#azure #aks #kubernetes #twingate #private-cluster #networking

#canary 2 posts

4 Dec 2025

Progressive Delivery with Flagger: Automated Canary Deployments

Implement automated canary deployments with Flagger. Metrics-based promotion, automated rollback, and integration with Istio, Linkerd, and Gateway API.

#flagger #progressive-delivery #kubernetes #gitops #deployment

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #traefik #blue-green #deployments #aws #sigv4 #devops #ci-cd

#spiffe 2 posts

3 Oct 2025

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

#spire #zero-trust #security #kubernetes #mtls

15 May 2022

SPIFFE and SPIRE in Kubernetes

Secure Your Kubernetes with SPIFFE + SPIRE: Zero-Trust Identity for Workloads

#kubernetes #spire

#spire 2 posts

3 Oct 2025

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

#spiffe #zero-trust #security #kubernetes #mtls

15 May 2022

SPIFFE and SPIRE in Kubernetes

Secure Your Kubernetes with SPIFFE + SPIRE: Zero-Trust Identity for Workloads

#kubernetes #spiffe

#startups 2 posts

2 Dec 2025

Startup vs Scale-Up vs Enterprise: Where You'll Actually Learn the Most

After working across all three - tiny startups, hypergrowth scale-ups, and massive enterprises - I can tell you they're completely different jobs. Same title, same tech, completely different experience. Here's what each teaches you.

#career #engineering-culture #advice #leadership

5 Apr 2023

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

#kubernetes #architecture #infrastructure #devops #hot-takes

#teams 1 post

15 Jun 2025

The 10x Engineer is a Myth

The idea of the 10x engineer has done more harm than good. What actually matters is team multipliers - engineers who make everyone around them better.

#engineering-culture #leadership #career #productivity

#account-factory 1 post

15 Nov 2025

AWS Account Provisioning at Scale with Control Tower, Service Catalog, and Terraform

How to build an automated account vending machine using AWS Control Tower Account Factory, Service Catalog, CloudFormation StackSets, and Terraform – from request to fully provisioned account with SSO and IAM roles.

#aws #control-tower #terraform #service-catalog #organizations #sso #platform-engineering #devops

#ack 1 post

15 May 2021

AWS Controllers for Kubernetes

Manage AWS resources from Kubernetes manifests using AWS Controllers for Kubernetes (ACK). End-to-end demo on kind covering setup, RDS provisioning and the trade-offs vs Terraform.

#kubernetes #aws

#prefix-lists 1 post

15 Sept 2024

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

#aws #terraform #security #networking #security-groups #vpc

#security-groups 1 post

15 Sept 2024

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

#aws #terraform #security #networking #prefix-lists #vpc

#mlops 1 post

10 Jan 2026

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

#devops #kubernetes #machine-learning #platform-engineering #infrastructure

#machine-learning 1 post

10 Jan 2026

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

#mlops #devops #kubernetes #platform-engineering #infrastructure

#aws-config 1 post

20 Apr 2025

AWS Config Rules with Auto Remediation - Enforce Compliance Automatically

How to use AWS Config Rules to detect compliance violations and automatically remediate them using SSM Automation documents. Covers managed rules, custom rules, remediation actions, and complete Terraform examples.

#aws #compliance #security #automation #ssm #terraform

#compliance 1 post

20 Apr 2025

AWS Config Rules with Auto Remediation - Enforce Compliance Automatically

How to use AWS Config Rules to detect compliance violations and automatically remediate them using SSM Automation documents. Covers managed rules, custom rules, remediation actions, and complete Terraform examples.

#aws #aws-config #security #automation #ssm #terraform

#ssm 1 post

20 Apr 2025

AWS Config Rules with Auto Remediation - Enforce Compliance Automatically

How to use AWS Config Rules to detect compliance violations and automatically remediate them using SSM Automation documents. Covers managed rules, custom rules, remediation actions, and complete Terraform examples.

#aws #aws-config #compliance #security #automation #terraform

#iam-identity-center 1 post

14 Feb 2026

Building an Automated Multi-Account AWS Architecture with Control Tower and Terraform

A hands-on walkthrough of enabling AWS Control Tower, designing an OU structure, automating account provisioning via Service Catalog, and deploying security baselines - from zero to fully automated account vending in production.

#aws #control-tower #terraform #multi-account #organizations #service-catalog #sso #scps #platform-engineering #spacelift #security #devops

#endpoints 1 post

5 Jun 2025

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

#aws #vpc #privatelink #networking #security #terraform

#cognito 1 post

1 Oct 2025

Backstage on AWS ECS - Production-Ready Deployment with RDS and Cognito

A comprehensive guide to deploying Spotify's Backstage developer portal on AWS ECS Fargate with PostgreSQL RDS, Cognito authentication, and proper production hardening.

#backstage #aws #ecs #rds #terraform #docker #devops #platform-engineering

#developer-portal 1 post

14 Nov 2025

Backstage Plugins: Building Custom Developer Portal Features

Build custom Backstage plugins for your internal developer portal. Create frontend components, backend APIs, and integrate with your existing tools.

#backstage #platform-engineering #react #typescript

#react 1 post

14 Nov 2025

Backstage Plugins: Building Custom Developer Portal Features

Build custom Backstage plugins for your internal developer portal. Create frontend components, backend APIs, and integrate with your existing tools.

#backstage #developer-portal #platform-engineering #typescript

#typescript 1 post

14 Nov 2025

Backstage Plugins: Building Custom Developer Portal Features

Build custom Backstage plugins for your internal developer portal. Create frontend components, backend APIs, and integrate with your existing tools.

#backstage #developer-portal #platform-engineering #react

#direct-connect 1 post

15 May 2019

BGP in the Cloud – A Deep Dive into AWS Direct Connect, Routing Pathologies, and What Breaks in Production

A production-focused deep dive into how BGP actually behaves over AWS Direct Connect – route selection, failover, ASN design, MEDs, prepending, blackholing scenarios, and the real-world issues teams hit at scale.

#bgp #aws #networking #hybrid-cloud #production #routing

#routing 1 post

15 May 2019

BGP in the Cloud – A Deep Dive into AWS Direct Connect, Routing Pathologies, and What Breaks in Production

A production-focused deep dive into how BGP actually behaves over AWS Direct Connect – route selection, failover, ASN design, MEDs, prepending, blackholing scenarios, and the real-world issues teams hit at scale.

#bgp #aws #direct-connect #networking #hybrid-cloud #production

#psychological-safety 1 post

22 Nov 2025

Blameless Culture is Harder Than You Think

Everyone claims to have a blameless culture. Few actually do. Here's what real blamelessness looks like and why it's so difficult to achieve.

#engineering-culture #post-mortems #incident-management #leadership

#cdn 1 post

15 Jul 2023

How we migrated our CDN to AWS CloudFront at Trainline

Migrating the Trainline CDN to AWS CloudFront — traffic shaping with Lambda@Edge, the cache-hit ratio we landed on, and the production gotchas behind the cutover.

#aws #cloudfront #trainline

#cloudfront 1 post

15 Jul 2023

How we migrated our CDN to AWS CloudFront at Trainline

Migrating the Trainline CDN to AWS CloudFront — traffic shaping with Lambda@Edge, the cache-hit ratio we landed on, and the production gotchas behind the cutover.

#cdn #aws #trainline

#trainline 1 post

15 Jul 2023

How we migrated our CDN to AWS CloudFront at Trainline

Migrating the Trainline CDN to AWS CloudFront — traffic shaping with Lambda@Edge, the cache-hit ratio we landed on, and the production gotchas behind the cutover.

#cdn #aws #cloudfront

#chaos-engineering 1 post

22 Nov 2025

Chaos Engineering with Litmus: Controlled Failure Injection

Implement chaos engineering in Kubernetes with LitmusChaos. Run pod failures, network chaos, and stress tests to validate system resilience.

#litmus #kubernetes #reliability #sre #testing

#litmus 1 post

22 Nov 2025

Chaos Engineering with Litmus: Controlled Failure Injection

Implement chaos engineering in Kubernetes with LitmusChaos. Run pod failures, network chaos, and stress tests to validate system resilience.

#chaos-engineering #kubernetes #reliability #sre #testing

#google-workspace 1 post

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#clawdbot #security #github #notion #integrations #oauth #tutorial

#notion 1 post

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#clawdbot #security #google-workspace #github #integrations #oauth #tutorial

#integrations 1 post

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#clawdbot #security #google-workspace #github #notion #oauth #tutorial

#oauth 1 post

14 Mar 2025

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

#clawdbot #security #google-workspace #github #notion #integrations #tutorial

#whatsapp 1 post

27 Jan 2026

Clawdbot Manual Setup – Step-by-Step VPS Configuration with WhatsApp Integration

A detailed walkthrough for setting up Clawdbot on a Hetzner VPS from scratch – SSH hardening, firewall configuration, Tailscale, and WhatsApp Business integration using a dedicated number.

#clawdbot #hetzner #vps #devops #security #tutorial

#clickhouse 1 post

9 Feb 2026

Migrating ClickHouse From EC2 to ClickHouse Cloud - Every Approach We Tried and Why Most Failed

S3 backup/restore, direct connectivity, Parquet exports - none of them worked cleanly. Here's the full war story of migrating a production ClickHouse instance to Cloud, the version mismatch that broke everything, and the dumb-simple approach that actually got the job done.

#aws #migration #database #devops #production

#tagging 1 post

25 Oct 2025

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

#aws #terraform #finops #governance #devops

#cloud-costs 1 post

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#finops #kubernetes #eks #multi-tenant #saas #unit-economics #aws

#multi-tenant 1 post

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#finops #cloud-costs #kubernetes #eks #saas #unit-economics #aws

#unit-economics 1 post

20 Jan 2026

Cloud Unit Economics for Multi-Tenant SaaS - Cost Per Customer, Not Per Service

How to calculate true cost-per-tenant in a shared infrastructure environment. Covers EKS with Karpenter, shared databases (Aurora, DynamoDB), and tools like OpenCost, CloudZero, and custom attribution approaches.

#finops #cloud-costs #kubernetes #eks #multi-tenant #saas #aws

#namespaces 1 post

15 Mar 2023

Container Networking Deep Dive Part 1: Single Network Namespace on a VM

In the first part of our Container Networking Deep Dive, we explore how to set up a single network namespace inside a VM and connect it to the host using a veth pair.

#linux #networking #containers #devops

#netns 1 post

15 Feb 2023

Container Networking Deep Dive Part 2: Two Namespaces on the Same Host

In the second part of our Container Networking Deep Dive, we connect two network namespaces via a bridge on the same Linux host.

#linux #networking #containers #bridge

#bridge 1 post

15 Feb 2023

Container Networking Deep Dive Part 2: Two Namespaces on the Same Host

In the second part of our Container Networking Deep Dive, we connect two network namespaces via a bridge on the same Linux host.

#linux #networking #netns #containers

#cosign 1 post

12 Oct 2025

Container Image Signing with Cosign - A Practical Guide

Sign and verify container images without managing keys. A hands-on guide to Cosign, keyless signing, and enforcing signatures in Kubernetes.

#security #containers #sigstore #kubernetes #devops

#backup 1 post

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with kind and LocalStack.

#kubernetes #postgresql #s3 #cronjob #localstack #devops #databases

#cronjob 1 post

28 Sept 2025

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with kind and LocalStack.

#kubernetes #postgresql #s3 #backup #localstack #devops #databases

#stateful 1 post

8 Oct 2025

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

#kubernetes #databases #postgresql #operators #storage

#operators 1 post

8 Oct 2025

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

#kubernetes #databases #postgresql #stateful #storage

#strimzi 1 post

15 Jan 2023

Deploying Kafka on Kubernetes with Strimzi

A step-by-step guide to setting up a Kafka cluster on a local kind cluster using the Strimzi operator, with optional Terraform provisioning.

#k8s #kafka #operator #kind #terraform

#operator 1 post

15 Jan 2023

Deploying Kafka on Kubernetes with Strimzi

A step-by-step guide to setting up a Kafka cluster on a local kind cluster using the Strimzi operator, with optional Terraform provisioning.

#k8s #kafka #strimzi #kind #terraform

#roadmap 1 post

15 Feb 2020

The Ultimate Pathway to DevOps Revamped

A practical roadmap into DevOps for engineers starting out — what to learn, in what order, and where the genuine value is vs the hype.

#devops #aws #platform #engineering

#platform 1 post

15 Feb 2020

The Ultimate Pathway to DevOps Revamped

A practical roadmap into DevOps for engineers starting out — what to learn, in what order, and where the genuine value is vs the hype.

#devops #roadmap #aws #engineering

#udp 1 post

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #ecs #cloudmap #traefik #service-discovery #aws #networking #devops

#cloudmap 1 post

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #udp #ecs #traefik #service-discovery #aws #networking #devops

#service-discovery 1 post

15 Jan 2024

DNS UDP Truncation: Why Your ECS Tasks Aren't Getting Traffic

How DNS UDP's 512-byte limit caps responses at ~8 A records, breaking service discovery for scaled ECS/CloudMap workloads – and the sidecar solution to bypass it.

#dns #udp #ecs #cloudmap #traefik #aws #networking #devops

#dora 1 post

15 Jan 2026

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

#devops #metrics #engineering-culture #cicd #platform-engineering

#dragonfly 1 post

31 Dec 2025

Dragonfly vs Redis: Modern In-Memory Store Comparison

Compare Dragonfly and Redis for caching and data storage: Dragonfly's multi-threaded architecture vs Redis's single-threaded model.

#redis #caching #database #kubernetes #performance

#redis 1 post

31 Dec 2025

Dragonfly vs Redis: Modern In-Memory Store Comparison

Compare Dragonfly and Redis for caching and data storage: Dragonfly's multi-threaded architecture vs Redis's single-threaded model.

#dragonfly #caching #database #kubernetes #performance

#caching 1 post

31 Dec 2025

Dragonfly vs Redis: Modern In-Memory Store Comparison

Compare Dragonfly and Redis for caching and data storage: Dragonfly's multi-threaded architecture vs Redis's single-threaded model.

#dragonfly #redis #database #kubernetes #performance

#dynatrace 1 post

15 Sept 2022

Managing Dynatrace Alerts at Scale with Custom Ansible Roles

How we automated Dynatrace alerting configuration using custom Ansible roles - covering alert profiles, problem notifications, metric events, and maintenance windows across multiple environments.

#ansible #monitoring #alerting #automation #observability #iac

#ansible 1 post

15 Sept 2022

Managing Dynatrace Alerts at Scale with Custom Ansible Roles

How we automated Dynatrace alerting configuration using custom Ansible roles - covering alert profiles, problem notifications, metric events, and maintenance windows across multiple environments.

#dynatrace #monitoring #alerting #automation #observability #iac

#kernel 1 post

10 Feb 2025

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

#ebpf #linux #networking #security #observability

#tetragon 1 post

7 Oct 2025

eBPF for Security: Kernel-Level Observability Without Agents

Deep dive into eBPF-based security tools - Cilium, Falco, and Tetragon. Learn how to implement runtime security, network policies, and threat detection at the kernel level.

#ebpf #security #cilium #falco #kubernetes

#eni 1 post

15 Nov 2022

Deep Dive into EC2 Networking

Deep Dive into EC2 Networking: ENIs, IP Addressing and Deployment Architectures

#ec2 #networking #ip #deployment #architecture

#ip 1 post

15 Nov 2022

Deep Dive into EC2 Networking

Deep Dive into EC2 Networking: ENIs, IP Addressing and Deployment Architectures

#ec2 #networking #eni #deployment #architecture

#task-sets 1 post

15 Mar 2025

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

#ecs #aws #blue-green #deployments #fargate #terraform #devops

#network 1 post

15 Oct 2022

EKS Private Network with Twingate

How to set up a private network for your EKS cluster with Twingate.

#kubernetes #eks #twingate #private

#ip-exhaustion 1 post

15 Jun 2025

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

#aws #eks #networking #cni #prefix-delegation

#prefix-delegation 1 post

15 Jun 2025

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

#aws #eks #networking #cni #ip-exhaustion

#ipip 1 post

15 Apr 2022

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. To overcome these constraints, deploying Calico with IPIP or BGP offers a scalable alternative.

#aws #eks #calico #cni #networking #bgp

#elastic-cloud 1 post

28 Jan 2026

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

#elasticsearch #observability #logging #saas #managed-services

#managed-services 1 post

28 Jan 2026

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

#elasticsearch #elastic-cloud #observability #logging #saas

#elk 1 post

3 Feb 2026

ELK Stack Migration: From 6.x to 8.x - The Complete Guide

A comprehensive guide to migrating your Elasticsearch, Logstash, and Kibana stack from version 6.x to 8.x. Covers breaking changes, migration strategies, index compatibility, and zero-downtime approaches.

#elasticsearch #kibana #logstash #migration #observability

#kibana 1 post

3 Feb 2026

ELK Stack Migration: From 6.x to 8.x - The Complete Guide

A comprehensive guide to migrating your Elasticsearch, Logstash, and Kibana stack from version 6.x to 8.x. Covers breaking changes, migration strategies, index compatibility, and zero-downtime approaches.

#elasticsearch #elk #logstash #migration #observability

#logstash 1 post

3 Feb 2026

ELK Stack Migration: From 6.x to 8.x - The Complete Guide

A comprehensive guide to migrating your Elasticsearch, Logstash, and Kibana stack from version 6.x to 8.x. Covers breaking changes, migration strategies, index compatibility, and zero-downtime approaches.

#elasticsearch #elk #kibana #migration #observability

#etl 1 post

25 Sept 2025

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

#python #airflow #postgresql #docker #data-engineering #devops

#python 1 post

25 Sept 2025

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

#etl #airflow #postgresql #docker #data-engineering #devops

#airflow 1 post

25 Sept 2025

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

#etl #python #postgresql #docker #data-engineering #devops

#data-engineering 1 post

25 Sept 2025

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

#etl #python #airflow #postgresql #docker #devops

#external-secrets 1 post

15 Jul 2025

External Secrets Operator with AWS Secrets Manager - Stop Mounting Secrets in ConfigMaps

How to use External Secrets Operator to sync AWS Secrets Manager secrets to Kubernetes. Covers SecretStore, ExternalSecret, IAM with IRSA, templating, and production patterns.

#kubernetes #aws #secrets-manager #security #gitops

#secrets-manager 1 post

15 Jul 2025

External Secrets Operator with AWS Secrets Manager - Stop Mounting Secrets in ConfigMaps

How to use External Secrets Operator to sync AWS Secrets Manager secrets to Kubernetes. Covers SecretStore, ExternalSecret, IAM with IRSA, templating, and production patterns.

#kubernetes #external-secrets #aws #security #gitops

#grafana 1 post

15 Mar 2021

Falco on K8s (Kind)

Falco Kubernetes Lab: Runtime Threat Detection with Prometheus & Grafana

#falco #kind #prometheus

#firecracker 1 post

15 Jul 2019

ECS Fargate Deep Dive Part 2: Firecracker in Action

In the second part of our ECS Fargate Deep Dive, we get hands-on with Firecracker — the lightweight VMM that powers Fargate — and simulate task isolation and networking locally.

#aws #ecs #fargate #containers #devops

#kubecost 1 post

16 Dec 2025

FinOps Automation: Kubecost, OpenCost, and Automated Rightsizing

Implement automated cloud cost optimization with Kubecost and OpenCost. Track costs per team, rightsize resources, and automate savings.

#finops #opencost #kubernetes #cost-optimization #observability

#opencost 1 post

16 Dec 2025

FinOps Automation: Kubecost, OpenCost, and Automated Rightsizing

Implement automated cloud cost optimization with Kubecost and OpenCost. Track costs per team, rightsize resources, and automate savings.

#finops #kubecost #kubernetes #cost-optimization #observability

#consulting 1 post

5 Jan 2026

That Time I Gave Away £50k Worth of Consulting for Free (And What It Taught Me About the Industry)

On interview take-home tests that are suspiciously specific, contractors who get ghosted after detailed proposals, and learning to play the game without becoming bitter about it.

#career #interviews #contracting #tech-industry #lessons-learned

#tech-industry 1 post

5 Jan 2026

That Time I Gave Away £50k Worth of Consulting for Free (And What It Taught Me About the Industry)

On interview take-home tests that are suspiciously specific, contractors who get ghosted after detailed proposals, and learning to play the game without becoming bitter about it.

#career #consulting #interviews #contracting #lessons-learned

#lessons-learned 1 post

5 Jan 2026

That Time I Gave Away £50k Worth of Consulting for Free (And What It Taught Me About the Industry)

On interview take-home tests that are suspiciously specific, contractors who get ghosted after detailed proposals, and learning to play the game without becoming bitter about it.

#career #consulting #interviews #contracting #tech-industry

#argocd 1 post

18 Mar 2024

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

#gitops #kubernetes #cicd #deployment #automation

#cluster 1 post

15 Sept 2022

Using GKE DNS-based endpoints for Secure cluster access

Use GKE's DNS-based control plane endpoint to reach a private cluster without bastions or VPNs. IAM-gated kubectl access via Cloud DNS, fully private.

#k8s #gke #dns #private #access

#access 1 post

15 Sept 2022

Using GKE DNS-based endpoints for Secure cluster access

Use GKE's DNS-based control plane endpoint to reach a private cluster without bastions or VPNs. IAM-gated kubectl access via Cloud DNS, fully private.

#k8s #gke #dns #private #cluster

#google-cloud 1 post

15 Apr 2025

GKE Upgrade Guide and Rollback Strategy: A Production-Ready Approach

Comprehensive guide for safely upgrading GKE clusters with minimal downtime and robust rollback procedures.

#kubernetes #gke #devops #cluster-management

#workload-identity 1 post

15 Feb 2021

Zero to Production: GitHub Actions CI/CD into GKE with Workload Identity

In this deep dive, we set up a secure, production-ready CI/CD pipeline from GitHub Actions to GKE using Workload Identity Federation—no secrets needed.

#gke #github-actions #ci-cd #oidc

#rollback 1 post

15 Apr 2019

Helm Atomics: The Flag That Saves Your Production Deploys (And Its Hidden Gotchas)

Deep dive into Helm's --atomic, --wait, and --cleanup-on-fail flags. How they work, when to use them, the CI/CD pipeline trap that catches everyone, and production-ready deployment patterns.

#helm #kubernetes #devops #cicd #deployments

#fluxcd 1 post

7 Mar 2026

Building a Production-Grade Homelab with K3s, Vault, and FluxCD

How I built a fully GitOps-managed Kubernetes homelab on a single mini PC - from unboxing to production. Proxmox bare metal install, K3s cluster, HashiCorp Vault secrets, full observability, and Cloudflare Tunnel.

#kubernetes #k3s #homelab #gitops #hashicorp-vault

#hashicorp-vault 1 post

7 Mar 2026

Building a Production-Grade Homelab with K3s, Vault, and FluxCD

How I built a fully GitOps-managed Kubernetes homelab on a single mini PC - from unboxing to production. Proxmox bare metal install, K3s cluster, HashiCorp Vault secrets, full observability, and Cloudflare Tunnel.

#kubernetes #k3s #homelab #gitops #fluxcd

#identity-aware-proxy 1 post

6 Feb 2026

Identity Aware Proxy: Zero Trust Access for Internal Applications

Deep dive into Identity Aware Proxies - what they are, how they work, and how to implement them with GCP IAP, Pomerium, and OAuth2-Proxy. Includes Terraform and Kubernetes examples.

#zero-trust #security #kubernetes #terraform #oauth2

#oauth2 1 post

6 Feb 2026

Identity Aware Proxy: Zero Trust Access for Internal Applications

Deep dive into Identity Aware Proxies - what they are, how they work, and how to implement them with GCP IAP, Pomerium, and OAuth2-Proxy. Includes Terraform and Kubernetes examples.

#identity-aware-proxy #zero-trust #security #kubernetes #terraform

#ebs 1 post

15 Oct 2021

How to Increase EBS Disk Size on EC2 (Without Downtime)

Online EBS volume resizing for running instances – the IaC way with Terraform and ASG instance refresh, plus the manual escape hatch when you need it now. No reboot required.

#aws #ec2 #terraform #disk #storage #devops

#disk 1 post

15 Oct 2021

How to Increase EBS Disk Size on EC2 (Without Downtime)

Online EBS volume resizing for running instances – the IaC way with Terraform and ASG instance refresh, plus the manual escape hatch when you need it now. No reboot required.

#aws #ebs #ec2 #terraform #storage #devops

#port 1 post

18 Nov 2025

Port and Kratix: Internal Developer Platforms Beyond Backstage

Explore Port and Kratix for building internal developer platforms. Self-service infrastructure, developer workflows, and platform engineering patterns.

#platform-engineering #kratix #developer-experience #self-service

#kratix 1 post

18 Nov 2025

Port and Kratix: Internal Developer Platforms Beyond Backstage

Explore Port and Kratix for building internal developer platforms. Self-service infrastructure, developer workflows, and platform engineering patterns.

#platform-engineering #port #developer-experience #self-service

#self-service 1 post

18 Nov 2025

Port and Kratix: Internal Developer Platforms Beyond Backstage

Explore Port and Kratix for building internal developer platforms. Self-service infrastructure, developer workflows, and platform engineering patterns.

#platform-engineering #port #kratix #developer-experience

#jenkins 1 post

15 Oct 2025

Migrating 30 Repos from Jenkins to GitHub Actions – The Complete Runbook

A battle-tested playbook for migrating CI/CD pipelines from Jenkins to GitHub Actions at scale. Covers OIDC authentication, parallel running, secrets migration, and the gotchas that will bite you.

#github-actions #cicd #devops #migration #aws #oidc

#jvm 1 post

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #ec2 #debugging #performance #memory #threads #linux #devops

#memory 1 post

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #jvm #ec2 #debugging #performance #threads #linux #devops

#threads 1 post

10 Jan 2026

Debugging JVM Thread Exhaustion on EC2: A Contractor War Story

How I diagnosed and fixed a Java application that kept crashing under load – from 'cannot create native thread' errors to properly tuned JVM settings, system limits, and right-sized EC2 instances.

#java #jvm #ec2 #debugging #performance #memory #linux #devops

#raspberry-pi 1 post

15 Sept 2025

K3s Homelab Setup Guide - Running Kubernetes on Raspberry Pi 5

Build a lightweight Kubernetes cluster on three Raspberry Pi 5 devices. Step-by-step guide covering K3s installation, cluster configuration, and deployment testing.

#kubernetes #k3s #homelab #devops #containers

#socks5 1 post

21 Jan 2025

Working with Databases in Kubernetes: Connections, Dumps and Data Extraction

A practical guide to connecting to PostgreSQL databases in Kubernetes – exec into pods, VPN access, SOCKS5 proxies, pg_dump, kubectl cp and getting data out when you need it.

#kubernetes #postgresql #database #kubectl #devops #pg_dump

#pg_dump 1 post

21 Jan 2025

Working with Databases in Kubernetes: Connections, Dumps and Data Extraction

A practical guide to connecting to PostgreSQL databases in Kubernetes – exec into pods, VPN access, SOCKS5 proxies, pg_dump, kubectl cp and getting data out when you need it.

#kubernetes #postgresql #database #kubectl #devops #socks5

#arp 1 post

15 Mar 2022

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

#kubernetes #dns #security #coredns #net_raw #mitm

#net_raw 1 post

15 Mar 2022

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

#kubernetes #dns #security #coredns #arp #mitm

#mitm 1 post

15 Mar 2022

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

#kubernetes #dns #security #coredns #arp #net_raw

#networkpolicy 1 post

8 Sept 2025

NetworkPolicy Default Deny – The One Rule We Add to Every Namespace

Why your Kubernetes cluster is wide open by default, and the single NetworkPolicy that changes everything. Copy, paste, deploy, sleep better.

#kubernetes #security #networking #zero-trust

#sidecars 1 post

19 Jun 2025

Kubernetes Sidecar Startup Order - Making Your Main App Wait

How to ensure sidecar containers are ready before your main app starts. Covers startupProbe, postStart hooks, and why readinessProbe doesn't do what you think.

#kubernetes #pods #containers #devops

#kyverno 1 post

10 Nov 2025

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

#opa #gatekeeper #kubernetes #policy #security

#policy 1 post

10 Nov 2025

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

#kyverno #opa #gatekeeper #kubernetes #security

#lab 1 post

15 Feb 2019

Creating a Lab Container

An end-to-end guide for creating a lab container for DevOps training.

#devops #docker #container

#container 1 post

15 Feb 2019

Creating a Lab Container

An end-to-end guide for creating a lab container for DevOps training.

#devops #docker #lab

#meetings 1 post

3 Apr 2025

The Meeting That Should Have Been a Doc

Most meetings are information broadcasts disguised as collaboration. Learn when to meet, when to write, and how to save everyone's time.

#productivity #engineering-culture #remote-work #documentation

#gateway 1 post

15 Jul 2025

Why I replaced AWS NAT Gateway with a NAT Instance - and saved $20 per month

AWS offers NAT Gateways as the default, fully managed solution for letting private subnet resources reach the internet. However, NAT Gateways can be pricey: the hourly cost is ~₹3.75/hour (varies by region), and data transfer adds a further ₹3.75/GB on top of standard data transfer charges. For small dev/test environments or personal labs, these costs can add up quickly. In contrast, a NAT Instance is just a normal EC2 instance configured to perform IP forwarding and NAT. It’s typically much cheaper to run a small instance (`t3.micro`) than a NAT Gateway, especially if your traffic volume is modest.

#aws #nat #instance #cost #savings

#instance 1 post

15 Jul 2025

Why I replaced AWS NAT Gateway with a NAT Instance - and saved $20 per month

AWS offers NAT Gateways as the default, fully managed solution for letting private subnet resources reach the internet. However, NAT Gateways can be pricey: the hourly cost is ~₹3.75/hour (varies by region), and data transfer adds a further ₹3.75/GB on top of standard data transfer charges. For small dev/test environments or personal labs, these costs can add up quickly. In contrast, a NAT Instance is just a normal EC2 instance configured to perform IP forwarding and NAT. It’s typically much cheaper to run a small instance (`t3.micro`) than a NAT Gateway, especially if your traffic volume is modest.

#aws #nat #gateway #cost #savings

#cost 1 post

15 Jul 2025

Why I replaced AWS NAT Gateway with a NAT Instance - and saved $20 per month

AWS offers NAT Gateways as the default, fully managed solution for letting private subnet resources reach the internet. However, NAT Gateways can be pricey: the hourly cost is ~₹3.75/hour (varies by region), and data transfer adds a further ₹3.75/GB on top of standard data transfer charges. For small dev/test environments or personal labs, these costs can add up quickly. In contrast, a NAT Instance is just a normal EC2 instance configured to perform IP forwarding and NAT. It’s typically much cheaper to run a small instance (`t3.micro`) than a NAT Gateway, especially if your traffic volume is modest.

#aws #nat #gateway #instance #savings

#savings 1 post

15 Jul 2025

Why I replaced AWS NAT Gateway with a NAT Instance - and saved $20 per month

AWS offers NAT Gateways as the default, fully managed solution for letting private subnet resources reach the internet. However, NAT Gateways can be pricey: the hourly cost is ~₹3.75/hour (varies by region), and data transfer adds a further ₹3.75/GB on top of standard data transfer charges. For small dev/test environments or personal labs, these costs can add up quickly. In contrast, a NAT Instance is just a normal EC2 instance configured to perform IP forwarding and NAT. It’s typically much cheaper to run a small instance (`t3.micro`) than a NAT Gateway, especially if your traffic volume is modest.

#aws #nat #gateway #instance #cost

#nats 1 post

24 Dec 2025

NATS JetStream: Lightweight Alternative to Kafka

Deploy NATS JetStream for messaging and streaming. Simpler than Kafka, faster than RabbitMQ, with persistence and exactly-once delivery.

#jetstream #messaging #streaming #kubernetes #microservices

#jetstream 1 post

24 Dec 2025

NATS JetStream: Lightweight Alternative to Kafka

Deploy NATS JetStream for messaging and streaming. Simpler than Kafka, faster than RabbitMQ, with persistence and exactly-once delivery.

#nats #messaging #streaming #kubernetes #microservices

#streaming 1 post

24 Dec 2025

NATS JetStream: Lightweight Alternative to Kafka

Deploy NATS JetStream for messaging and streaming. Simpler than Kafka, faster than RabbitMQ, with persistence and exactly-once delivery.

#nats #jetstream #messaging #kubernetes #microservices

#microservices 1 post

24 Dec 2025

NATS JetStream: Lightweight Alternative to Kafka

Deploy NATS JetStream for messaging and streaming. Simpler than Kafka, faster than RabbitMQ, with persistence and exactly-once delivery.

#nats #jetstream #messaging #streaming #kubernetes

#negotiation 1 post

4 Feb 2026

10 Rules for Negotiating Your Job Offer (From 7 Years of Engineering)

Most engineers massively undervalue themselves because no one taught them how to negotiate. Here's everything I've learned from negotiating salaries, contracts, titles, and more.

#career #salary #engineering-culture #advice

#tools 1 post

15 Jul 2020

Networking Tools

#networking

#nginx 1 post

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#incident #log-rotation #linux #on-call #devops #production #war-stories

#incident 1 post

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#nginx #log-rotation #linux #on-call #devops #production #war-stories

#log-rotation 1 post

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#nginx #incident #linux #on-call #devops #production #war-stories

#war-stories 1 post

15 Jan 2025

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

#nginx #incident #log-rotation #linux #on-call #devops #production

#admission-control 1 post

12 Oct 2025

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

#opa #gatekeeper #kubernetes #policy-as-code #security

#traces 1 post

26 Nov 2025

OpenTelemetry Collector Pipelines: Transform, Filter, Route Telemetry

Master OpenTelemetry Collector configuration. Build pipelines to transform metrics, filter traces, route logs, and reduce telemetry costs.

#opentelemetry #observability #metrics #logs #collector

#logs 1 post

26 Nov 2025

OpenTelemetry Collector Pipelines: Transform, Filter, Route Telemetry

Master OpenTelemetry Collector configuration. Build pipelines to transform metrics, filter traces, route logs, and reduce telemetry costs.

#opentelemetry #observability #metrics #traces #collector

#collector 1 post

26 Nov 2025

OpenTelemetry Collector Pipelines: Transform, Filter, Route Telemetry

Master OpenTelemetry Collector configuration. Build pipelines to transform metrics, filter traces, route logs, and reduce telemetry costs.

#opentelemetry #observability #metrics #traces #logs

#tracing 1 post

18 Mar 2025

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

#opentelemetry #observability #metrics #logging #kubernetes

#immutable-infrastructure 1 post

15 Sept 2024

Building Production AMIs with Packer: CI Pipelines, Terraform Integration, and Security Best Practices

Complete guide to building immutable AMIs with Packer in production - CI/CD pipelines, Terraform ASG integration, rollback strategies, maintenance workflows, and security hardening.

#packer #ami #aws #terraform #ci-cd #devops #security

#internal-platforms 1 post

3 Feb 2026

Platform Engineering in 2026 - It's About the Discipline, Not the Tools

Platform engineering has become the most misunderstood role in tech. Everyone's building 'platforms' but few understand what actually makes one successful. Here's what I've learned building platforms for teams of 10 to 500.

#platform-engineering #devops #developer-experience #idp

#pod-security 1 post

10 Aug 2025

Pod Security Standards Enforcement - The PSP Replacement That Actually Works

How to enforce Pod Security Standards using the built-in Pod Security Admission controller. Covers Privileged, Baseline, and Restricted profiles, migration from PSPs, namespace labeling, and exemptions.

#kubernetes #security #psp #admission-controller #hardening

#psp 1 post

10 Aug 2025

Pod Security Standards Enforcement - The PSP Replacement That Actually Works

How to enforce Pod Security Standards using the built-in Pod Security Admission controller. Covers Privileged, Baseline, and Restricted profiles, migration from PSPs, namespace labeling, and exemptions.

#kubernetes #security #pod-security #admission-controller #hardening

#admission-controller 1 post

10 Aug 2025

Pod Security Standards Enforcement - The PSP Replacement That Actually Works

How to enforce Pod Security Standards using the built-in Pod Security Admission controller. Covers Privileged, Baseline, and Restricted profiles, migration from PSPs, namespace labeling, and exemptions.

#kubernetes #security #pod-security #psp #hardening

#hardening 1 post

10 Aug 2025

Pod Security Standards Enforcement - The PSP Replacement That Actually Works

How to enforce Pod Security Standards using the built-in Pod Security Admission controller. Covers Privileged, Baseline, and Restricted profiles, migration from PSPs, namespace labeling, and exemptions.

#kubernetes #security #pod-security #psp #admission-controller

#scheduling 1 post

18 Dec 2025

Pod Topology Spread Constraints - Distributing Workloads Intelligently

Control how pods spread across nodes, zones, and regions. A deep dive into topology spread constraints for high availability and efficient resource utilization.

#kubernetes #high-availability #pods #devops

#high-availability 1 post

18 Dec 2025

Pod Topology Spread Constraints - Distributing Workloads Intelligently

Control how pods spread across nodes, zones, and regions. A deep dive into topology spread constraints for high availability and efficient resource utilization.

#kubernetes #scheduling #pods #devops

#private-cluster 1 post

15 Feb 2022

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

#azure #aks #kubernetes #vpn #twingate #networking

#flagger 1 post

4 Dec 2025

Progressive Delivery with Flagger: Automated Canary Deployments

Implement automated canary deployments with Flagger. Metrics-based promotion, automated rollback, and integration with Istio, Linkerd, and Gateway API.

#canary #progressive-delivery #kubernetes #gitops #deployment

#progressive-delivery 1 post

4 Dec 2025

Progressive Delivery with Flagger: Automated Canary Deployments

Implement automated canary deployments with Flagger. Metrics-based promotion, automated rollback, and integration with Istio, Linkerd, and Gateway API.

#flagger #canary #kubernetes #gitops #deployment

#apache-pulsar 1 post

15 Jan 2022

Apache Pulsar Playground: Running Pulsar Locally on kind with Dashboards, Clients, and Admin Tools

In this blog, I'll walk you through setting up a full-featured Apache Pulsar playground using kind (Kubernetes in Docker). Whether you're testing Pulsar for learning or demoing a real pub/sub model with admin tools and monitoring, this setup gives you everything.

#kubernetes #kind #helm #messaging #pubsub #devtools

#pubsub 1 post

15 Jan 2022

Apache Pulsar Playground: Running Pulsar Locally on kind with Dashboards, Clients, and Admin Tools

In this blog, I'll walk you through setting up a full-featured Apache Pulsar playground using kind (Kubernetes in Docker). Whether you're testing Pulsar for learning or demoing a real pub/sub model with admin tools and monitoring, this setup gives you everything.

#apache-pulsar #kubernetes #kind #helm #messaging #devtools

#devtools 1 post

15 Jan 2022

Apache Pulsar Playground: Running Pulsar Locally on kind with Dashboards, Clients, and Admin Tools

In this blog, I'll walk you through setting up a full-featured Apache Pulsar playground using kind (Kubernetes in Docker). Whether you're testing Pulsar for learning or demoing a real pub/sub model with admin tools and monitoring, this setup gives you everything.

#apache-pulsar #kubernetes #kind #helm #messaging #pubsub

#pulsar 1 post

15 Jul 2022

Pulsar vs Kafka in K8s: Battle of Event Streams

Pulsar vs Kafka

#kafka

#rds-proxy 1 post

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#aws #lambda #rds #serverless #databases #terraform #connection-pooling

#connection-pooling 1 post

25 Feb 2025

RDS Proxy for Lambda - Solving the Connection Exhaustion Problem

How to use Amazon RDS Proxy to handle database connections from Lambda functions at scale. Covers connection pooling, IAM authentication, Terraform setup, and the gotchas you'll hit in production.

#aws #lambda #rds #rds-proxy #serverless #databases #terraform

#management 1 post

18 Sept 2025

Remote Work Won

The RTO push isn't about productivity. The data is clear: remote work works. What's really happening is a fight over control, real estate, and management inability to adapt.

#remote-work #engineering-culture #productivity #career

#resource-management 1 post

15 Dec 2024

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

#kubernetes #cost-optimization #devops #cloud #finops

#route53 1 post

15 Jun 2022

Route 53 Deep Dive: Multi-Region Latency Routing with Health-Based Failover

A hands-on guide to configuring AWS Route 53 for latency-based routing across multiple regions, incorporating health checks for automatic failover.

#aws #dns #terraform #failover #latency-routing

#failover 1 post

15 Jun 2022

Route 53 Deep Dive: Multi-Region Latency Routing with Health-Based Failover

A hands-on guide to configuring AWS Route 53 for latency-based routing across multiple regions, incorporating health checks for automatic failover.

#aws #route53 #dns #terraform #latency-routing

#latency-routing 1 post

15 Jun 2022

Route 53 Deep Dive: Multi-Region Latency Routing with Health-Based Failover

A hands-on guide to configuring AWS Route 53 for latency-based routing across multiple regions, incorporating health checks for automatic failover.

#aws #route53 #dns #terraform #failover

#secretless 1 post

17 Oct 2025

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

#security #kubernetes #zero-trust #secrets-management #sidecar

#secrets-management 1 post

17 Oct 2025

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

#secretless #security #kubernetes #zero-trust #sidecar

#sidecar 1 post

17 Oct 2025

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

#secretless #security #kubernetes #zero-trust #secrets-management

#gitlab 1 post

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#kubernetes #aks #azure #helm #devops #self-hosted #startup

#self-hosted 1 post

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#gitlab #kubernetes #aks #azure #helm #devops #startup

#startup 1 post

25 Jan 2026

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

#gitlab #kubernetes #aks #azure #helm #devops #self-hosted

#technical-writing 1 post

8 Jul 2025

Why Senior Engineers Should Write Docs

Documentation is often treated as junior work. That's backwards. The most impactful documentation comes from senior engineers, and writing it is a force multiplier for your expertise.

#documentation #engineering-culture #leadership #career

#istio 1 post

20 Nov 2024

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

#kubernetes #service-mesh #linkerd #cilium #networking #devops

#linkerd 1 post

20 Nov 2024

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

#kubernetes #service-mesh #istio #cilium #networking #devops

#slo 1 post

30 Nov 2025

SLO-Based Alerting: Burn Rate Alerts vs Threshold Alerts

Implement SLO-based alerting with burn rate alerts. Move from noisy threshold alerts to meaningful reliability signals using error budgets.

#sre #alerting #prometheus #observability #reliability

#soc 1 post

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#security #elasticsearch #cribl #docker #homelab #devops #siem

#cribl 1 post

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#security #soc #elasticsearch #docker #homelab #devops #siem

#siem 1 post

20 Sept 2025

Build a SOC Homelab with Docker - Elasticsearch, Cribl, and Log Simulation

Set up a Security Operations Center lab environment using Docker. Includes Elasticsearch, Kibana, Cribl Stream for log routing, and simulated log generators for hands-on security analysis practice.

#security #soc #elasticsearch #cribl #docker #homelab #devops

#rego 1 post

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #opa #iac #gitops #platform-engineering #devops #modules #policy-as-code

#modules 1 post

14 Feb 2026

Spacelift from Scratch: Automating Terraform at Scale with Spaces, Stacks, OPA Policies, and a Private Module Registry

A complete guide to setting up Spacelift for multi-team Terraform automation - from zero to production with spaces, dynamic stacks, OPA security policies in Rego, private module registry, and GitOps-driven infrastructure.

#spacelift #terraform #opa #rego #iac #gitops #platform-engineering #devops #policy-as-code

#spot-instances 1 post

12 Dec 2025

Spot Instance Patterns: Graceful Handling and Cost Savings

Master AWS Spot Instances in production. Handle interruptions gracefully, use mixed instance groups, and save 60-90% on compute costs.

#aws #kubernetes #cost-optimization #eks #reliability

#sql-server 1 post

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #oracle #migration #aws #dms #terraform #event-sourcing #platform-engineering #devops

#oracle 1 post

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #migration #aws #dms #terraform #event-sourcing #platform-engineering #devops

#dms 1 post

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #oracle #migration #aws #terraform #event-sourcing #platform-engineering #devops

#event-sourcing 1 post

15 Sept 2025

Migrating Event Store Data from SQL Server and Oracle to DynamoDB with AWS DMS

How we used AWS DMS with database views, partitioned replication tasks, and Terraform to migrate event sourcing data from on-prem SQL Server and Oracle to DynamoDB – the architecture, the gotchas, and production Terraform you can reuse.

#dynamodb #sql-server #oracle #migration #aws #dms #terraform #platform-engineering #devops

#agile 1 post

12 Sept 2023

Standups Are Broken

Daily standups were meant to improve communication. Instead, they've become status meetings that waste time and interrupt deep work. There's a better way.

#engineering-culture #productivity #remote-work #team-management

#team-management 1 post

12 Sept 2023

Standups Are Broken

Daily standups were meant to improve communication. Instead, they've become status meetings that waste time and interrupt deep work. There's a better way.

#engineering-culture #productivity #agile #remote-work

#certifications 1 post

8 Jan 2025

Stop Chasing Certifications

Certifications have become a checkbox exercise. They don't prove competence, and they often distract from what actually matters: building things and solving real problems.

#career #learning #engineering-culture

#learning 1 post

8 Jan 2025

Stop Chasing Certifications

Certifications have become a checkbox exercise. They don't prove competence, and they often distract from what actually matters: building things and solving real problems.

#career #certifications #engineering-culture

#supply-chain 1 post

5 Sept 2025

Software Supply Chain Security - Sigstore, SLSA, and Beyond

Your dependencies are an attack vector. Here's how to secure your software supply chain with Sigstore, SLSA frameworks, SBOMs, and admission policies that actually work.

#security #sigstore #slsa #sbom #kubernetes #devops

#slsa 1 post

5 Sept 2025

Software Supply Chain Security - Sigstore, SLSA, and Beyond

Your dependencies are an attack vector. Here's how to secure your software supply chain with Sigstore, SLSA frameworks, SBOMs, and admission policies that actually work.

#security #supply-chain #sigstore #sbom #kubernetes #devops

#sbom 1 post

5 Sept 2025

Software Supply Chain Security - Sigstore, SLSA, and Beyond

Your dependencies are an attack vector. Here's how to secure your software supply chain with Sigstore, SLSA frameworks, SBOMs, and admission policies that actually work.

#security #supply-chain #sigstore #slsa #kubernetes #devops

#tailscale 1 post

25 Oct 2025

Tailscale in Production: WireGuard Mesh for Hybrid Cloud

Deploy Tailscale for secure connectivity across clouds, offices, and Kubernetes clusters. Zero-config VPN mesh with SSO integration and ACLs.

#wireguard #vpn #networking #hybrid-cloud #zero-trust

#wireguard 1 post

25 Oct 2025

Tailscale in Production: WireGuard Mesh for Hybrid Cloud

Deploy Tailscale for secure connectivity across clouds, offices, and Kubernetes clusters. Zero-config VPN mesh with SSO integration and ACLs.

#tailscale #vpn #networking #hybrid-cloud #zero-trust

#hcl2 1 post

30 Jan 2026

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

#terraform #iac #migration #aws #s3 #state-management

#best-practices 1 post

20 Sept 2025

Terraform Best Practices (Part 1) - Project Structure, State, and Modules

A comprehensive guide to Terraform best practices covering project organisation, state management, module design, and foundational patterns for scalable infrastructure as code.

#terraform #iac #devops #aws

#state 1 post

1 Feb 2026

Terraform State Surgery - Splitting, Moving, and Refactoring Without Downtime

A practical guide to breaking up monolithic Terraform state files, moving resources between states, and refactoring infrastructure safely. Includes real examples, scripts, and the exact commands we use.

#terraform #migration #refactoring #iac #devops

#refactoring 1 post

1 Feb 2026

Terraform State Surgery - Splitting, Moving, and Refactoring Without Downtime

A practical guide to breaking up monolithic Terraform state files, moving resources between states, and refactoring infrastructure safely. Includes real examples, scripts, and the exact commands we use.

#terraform #state #migration #iac #devops

#sigv4 1 post

15 Jul 2021

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

#github-actions #traefik #blue-green #canary #deployments #aws #devops #ci-cd

#tls 1 post

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#mtls #traefik #security #certificates #smallstep #pki #devops

#certificates 1 post

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#mtls #traefik #tls #security #smallstep #pki #devops

#smallstep 1 post

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#mtls #traefik #tls #security #certificates #pki #devops

#pki 1 post

15 Jun 2021

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

#mtls #traefik #tls #security #certificates #smallstep #devops

#vault 1 post

15 Jan 2020

Deploying Vault with a Custom AMI

An end-to-end guide for baking a Vault AMI using Packer and deploying a Vault EC2 instance on AWS.

#aws #packer #ami #devops

#aurora 1 post

2 Feb 2026

Implementing Vertical Autoscaling for Aurora Databases Using Lambda Functions

AWS doesn't offer vertical autoscaling for Aurora – so we built it. CloudWatch Alarms, SNS, Lambda coordination, and the gotchas we hit in production.

#rds #aws #lambda #autoscaling #terraform #serverless

#vitess 1 post

28 Dec 2025

Vitess for MySQL: Horizontal Sharding Done Right

Scale MySQL horizontally with Vitess. Automatic sharding, online schema changes, and Kubernetes-native deployment for massive scale.

#mysql #database #sharding #kubernetes #scaling

#mysql 1 post

28 Dec 2025

Vitess for MySQL: Horizontal Sharding Done Right

Scale MySQL horizontally with Vitess. Automatic sharding, online schema changes, and Kubernetes-native deployment for massive scale.

#vitess #database #sharding #kubernetes #scaling

#sharding 1 post

28 Dec 2025

Vitess for MySQL: Horizontal Sharding Done Right

Scale MySQL horizontally with Vitess. Automatic sharding, online schema changes, and Kubernetes-native deployment for massive scale.

#vitess #mysql #database #kubernetes #scaling

#scaling 1 post

28 Dec 2025

Vitess for MySQL: Horizontal Sharding Done Right

Scale MySQL horizontally with Vitess. Automatic sharding, online schema changes, and Kubernetes-native deployment for massive scale.

#vitess #mysql #database #sharding #kubernetes

#vpa 1 post

20 Dec 2025

VPA + HPA Together: The Right Way to Autoscale Both

Use Vertical Pod Autoscaler and Horizontal Pod Autoscaler together without conflicts. Includes KEDA integration and best practices.

#kubernetes #autoscaling #hpa #keda #performance

#hpa 1 post

20 Dec 2025

VPA + HPA Together: The Right Way to Autoscale Both

Use Vertical Pod Autoscaler and Horizontal Pod Autoscaler together without conflicts. Includes KEDA integration and best practices.

#kubernetes #autoscaling #vpa #keda #performance

#keda 1 post

20 Dec 2025

VPA + HPA Together: The Right Way to Autoscale Both

Use Vertical Pod Autoscaler and Horizontal Pod Autoscaler together without conflicts. Includes KEDA integration and best practices.

#kubernetes #autoscaling #vpa #hpa #performance

#api-server 1 post

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubernetes #kubectl #etcd #controllers #scheduler #kubelet #devops

#etcd 1 post

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubernetes #kubectl #api-server #controllers #scheduler #kubelet #devops

#controllers 1 post

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubernetes #kubectl #api-server #etcd #scheduler #kubelet #devops

#scheduler 1 post

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubernetes #kubectl #api-server #etcd #controllers #kubelet #devops

#kubelet 1 post

15 Nov 2021

What Actually Happens When You kubectl apply – The Full Chain From YAML to Running Pod

The complete journey: client-side vs server-side apply, admission controllers, etcd persistence, controller reconciliation, scheduler binding, and kubelet container creation. Every step traced.

#kubernetes #kubectl #api-server #etcd #controllers #scheduler #devops

#hot-takes 1 post

5 Apr 2023

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

#kubernetes #startups #architecture #infrastructure #devops