K8s 54 posts

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

DevOps

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

Security

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

Security

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

Security

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

Databases

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

Security

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

Databases

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

DevOps

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

Networking

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

DevOps

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

GitOps

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

Career

Deploying Kafka on Kubernetes with Strimzi

A step-by-step guide to setting up a Kafka cluster on a local Kind cluster using the Strimzi operator, with optional Terraform provisioning.

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. To overcome these constraints, deploying Calico with IPIP or BGP offers a scalable alternative.

AWS

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

Security

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

Azure

Falco on K8s (Kind)

Falco Kubernetes Lab: Runtime Threat Detection with Prometheus & Grafana

Security

AWS 48 posts

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

Observability

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

DevOps

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

Terraform

Why I replaced AWS NAT Gateway with a NAT Instance - and saved $20 per month

AWS offers NAT Gateways as the default, fully managed solution for letting private subnet resources reach the internet. However, NAT Gateways can be pricey: the hourly cost is ~₹3.75/hour (varies by region), and data transfer costs an additional ₹3.75/GB on top of standard data transfer. For small dev/test environments or personal labs, these costs can add up quickly. In contrast, a NAT Instance is just a normal EC2 instance configured to perform IP forwarding and NAT. It’s typically much cheaper to run a small instance (`t3.micro`) than a NAT Gateway, especially if your traffic volume is modest.

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

Networking

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

Networking

ECS Task Sets: Blue/Green Deployments Without CodeDeploy

How to use ECS external deployment controllers and task sets for manual blue/green deployments – the setup, the CLI commands, the Terraform, and an honest assessment of when it's worth the complexity.

EKS without VPC CNI: Deploying Calico with IPIP and BGP

AWS EKS defaults to the VPC CNI plugin, assigning VPC IPs to pods via ENIs. While straightforward, this setup limits pod density per node and consumes VPC IPs rapidly. To overcome these constraints, deploying Calico with IPIP or BGP offers a scalable alternative.

K8s

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

CICD

ECS Fargate Deep Dive Part 1: How Fargate Really Works

In the first part of our ECS Fargate Deep Dive, we break down what happens behind the scenes when you run a task on Fargate — Firecracker microVMs, ENIs, IAM and the hidden host fleet.

ECS Fargate Deep Dive Part 2: Firecracker in Action

In the second part of our ECS Fargate Deep Dive, we get hands-on with Firecracker — the lightweight VMM that powers Fargate — and simulate task isolation and networking locally.

DevOps 40 posts

Self-Hosted GitLab on Kubernetes - A Startup's Journey

A detailed guide on deploying GitLab on AKS using Helm charts, with Azure SQL as the database backend. Covers architecture decisions, configuration, lessons learned, and the gotchas we hit in production.

K8s

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

SRE

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

MLOps

LocalStack Deep Dive - AWS on Your Laptop

Run AWS services locally for faster development and testing. A practical guide to LocalStack covering S3, Lambda, DynamoDB, SQS, and integration testing patterns.

AWS

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

CICD

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

Backend

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

SRE

Ephemeral Containers for Production Debugging

Debug distroless and minimal containers in production without redeploying. Ephemeral containers let you attach debugging tools to running pods - here's how to use them effectively.

K8s

Incident Management That Actually Works

Most incident processes are theatre. Here's how to build incident management that reduces downtime, prevents recurrence, and doesn't burn out your team.

SRE

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

Career

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

Observability

Production War Stories: The NGINX Log Rotation That Caused a P1

How a 'safe' AMI upgrade led to traffic drops, zombie log files, and disk exhaustion – and the debugging journey that followed. A real incident from on-call, with technical details and lessons learned.

Right-Sizing Kubernetes Workloads - Stop Burning Money

Most Kubernetes clusters waste 50-70% of their resources. Here's how to measure what you're actually using, fix the worst offenders, and automate the process - without breaking production.

K8s

Building an Internal Developer Platform

A practical guide to building an IDP that developers actually want to use. Covers the build vs buy decision, Backstage implementation, and the organisational changes required for success.

Platform Engineering

Creating a Lab Container

An end-to-end guide for creating a lab container for DevOps training.

Security 29 posts

Kyverno vs OPA: Policy Engines Compared

Detailed comparison of Kyverno and OPA Gatekeeper for Kubernetes policy enforcement. Includes real examples, performance considerations, and migration guidance.

K8s

Secretless Broker: Zero-Secret Applications

Remove secrets from your applications entirely with Secretless Broker. Inject database credentials, API keys, and certificates via sidecar without your app knowing they exist.

K8s

OPA Gatekeeper: Policy as Code for Kubernetes

Implement admission control policies with OPA Gatekeeper. Enforce security standards, naming conventions, resource limits, and compliance requirements at the cluster level.

K8s

SPIFFE and SPIRE: Zero Trust Workload Identity

Deep dive into SPIFFE and SPIRE for workload identity. Replace shared secrets with cryptographic identity for service-to-service authentication. Includes Kubernetes deployment and mTLS examples.

K8s

Securing Your Clawdbot & Setting Up Powerful Integrations

A comprehensive guide to hardening your Clawdbot installation and integrating with Google Workspace, GitHub, and Notion – turning your AI assistant into a productivity powerhouse.

AI

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

Networking

AWS Managed Prefix Lists with Terraform - Stop Hardcoding CIDRs

How to use AWS Managed Prefix Lists to eliminate hardcoded CIDR blocks in security groups and route tables. Covers AWS-managed prefixes, customer-managed lists for data centres, and production Terraform patterns.

AWS

Kubernetes DNS Spoofing: Exploiting NET_RAW and ARP

DNS spoofing in Kubernetes remains a critical threat, enabling attackers to redirect traffic, intercept data, or disrupt services. This article explores how such attacks occur and outlines strategies to prevent them.

K8s

mTLS with Traefik: Hands-On Setup with Step CA

A complete walkthrough of setting up mutual TLS with Traefik and Smallstep CA – from certificate generation to client authentication. Includes local DNS, ACME integration, and a working demo you can deploy.

Falco on K8s (Kind)

Falco Kubernetes Lab: Runtime Threat Detection with Prometheus & Grafana

K8s

Networking 19 posts

Gateway API Advanced Patterns: Beyond Basic Ingress

Master Gateway API with traffic splitting, header-based routing, cross-namespace references, and TLS passthrough. The future of Kubernetes ingress.

K8s

Cilium Service Mesh: Sidecar-Free with eBPF

Deploy a service mesh without sidecars using Cilium. Get mTLS, traffic management, and observability powered by eBPF at the kernel level.

K8s

EKS IP Exhaustion: Running out of IPs, one way to fix it

Running out of IP addresses in AWS EKS can be a subtle yet critical issue. It often manifests as pods stuck in a pending state or nodes failing to join the cluster, leading to deployment bottlenecks and potential downtime. Understanding the root cause and implementing effective solutions is essential for maintaining cluster health and scalability. There are many ways to fix this; this post covers one of them.

AWS

AWS VPC Endpoints - Keep Your Traffic Off the Internet

How to use VPC Endpoints to access AWS services without internet gateways or NAT. Covers Gateway vs Interface endpoints, PrivateLink, endpoint policies, cost optimization, and production Terraform patterns.

AWS

Kubernetes Gateway API vs Ingress - When to Migrate and How

Gateway API is the successor to Ingress, bringing role-oriented design, native traffic splitting, and cross-namespace routing. This post compares both APIs, when to migrate, and practical migration patterns.

K8s

eBPF Deep Dive - Beyond Cilium

eBPF is transforming how we observe, secure, and network Linux systems. This guide covers the fundamentals, practical use cases beyond Cilium, and how to start writing your own eBPF programs.

Security

Service Mesh Comparison - Istio vs Linkerd vs Cilium

Service meshes promise observability, security, and traffic management. But which one should you choose? A practical comparison based on running all three in production.

K8s

Deep Dive into EC2 Networking

Deep Dive into EC2 Networking: ENIs, IP Addressing and Deployment Architectures

AWS

Career 14 posts

The Real Difference Between Senior, Staff, and Principal Engineer

Everyone wants to know the difference between Senior, Staff, and Principal. After holding all three titles, I can tell you the real differences aren't what most people think. It's not about years - it's about scope.

Culture

The Principal Engineer Trap

The IC ladder looks appealing until you're at the top. Many senior engineers chase Principal titles without understanding what they're signing up for. Here's what nobody tells you.

Culture

Contract vs Perm: 4 Years of Both and What I'd Choose Now

I've done both. Multiple times. Here are the real trade-offs nobody talks about - the money, the time off problem, the boredom factor, and why your life situation matters more than you think.

Culture

Remote Work Won

The RTO push isn't about productivity. The data is clear: remote work works. What's really happening is a fight over control, real estate, and management's inability to adapt.

Culture

Why Senior Engineers Should Write Docs

Documentation is often treated as junior work. That's backwards. The most impactful documentation comes from senior engineers, and writing it is a force multiplier for your expertise.

Culture

The 10x Engineer is a Myth

The idea of the 10x engineer has done more harm than good. What actually matters is team multipliers - engineers who make everyone around them better.

Culture

Common DevOps Interview Questions Candidates Fail

The questions that separate senior engineers from those who memorised tutorials. Real interview failures, what interviewers are actually looking for, and how to answer with depth.

DevOps

The Meeting That Should Have Been a Doc

Most meetings are information broadcasts disguised as collaboration. Learn when to meet, when to write, and how to save everyone's time.

Culture

Stop Chasing Certifications

Certifications have become a checkbox exercise. They don't prove competence, and they often distract from what actually matters: building things and solving real problems.

Culture

Standups Are Broken

Daily standups were meant to improve communication. Instead, they've become status meetings that waste time and interrupt deep work. There's a better way.

Culture

Your Startup Doesn't Need Kubernetes

Kubernetes is an incredible technology that solves real problems. But for most startups, it's the wrong tool. Here's how to know when you're ready - and what to use instead.

K8s

Platform Engineering 14 posts

Platform Engineering in 2026 - It's About the Discipline, Not the Tools

Platform engineering has become the most misunderstood role in tech. Everyone's building 'platforms' but few understand what actually makes one successful. Here's what I've learned building platforms for teams of 10 to 500.

DevOps

Crossplane Compositions: Build Your Own Cloud API

Create custom cloud APIs with Crossplane Compositions. Abstract away complexity and give developers self-service infrastructure with guardrails.

K8s

Building an Internal Developer Platform

A practical guide to building an IDP that developers actually want to use. Covers the build vs buy decision, Backstage implementation, and the organisational changes required for success.

DevOps

Crossplane and Localstack

Crossplane + LocalStack on kind: 100% Local AWS Infrastructure-as-Code

Culture 13 posts

The Real Difference Between Senior, Staff, and Principal Engineer

Everyone wants to know the difference between Senior, Staff, and Principal. After holding all three titles, I can tell you the real differences aren't what most people think. It's not about years - it's about scope.

Career

The Principal Engineer Trap

The IC ladder looks appealing until you're at the top. Many senior engineers chase Principal titles without understanding what they're signing up for. Here's what nobody tells you.

Career

Blameless Culture is Harder Than You Think

Everyone claims to have a blameless culture. Few actually do. Here's what real blamelessness looks like and why it's so difficult to achieve.

SRE

Contract vs Perm: 4 Years of Both and What I'd Choose Now

I've done both. Multiple times. Here are the real trade-offs nobody talks about - the money, the time off problem, the boredom factor, and why your life situation matters more than you think.

Career

Remote Work Won

The RTO push isn't about productivity. The data is clear: remote work works. What's really happening is a fight over control, real estate, and management's inability to adapt.

Career

Why Senior Engineers Should Write Docs

Documentation is often treated as junior work. That's backwards. The most impactful documentation comes from senior engineers, and writing it is a force multiplier for your expertise.

Career

The 10x Engineer is a Myth

The idea of the 10x engineer has done more harm than good. What actually matters is team multipliers - engineers who make everyone around them better.

Career

The Meeting That Should Have Been a Doc

Most meetings are information broadcasts disguised as collaboration. Learn when to meet, when to write, and how to save everyone's time.

Career

Stop Chasing Certifications

Certifications have become a checkbox exercise. They don't prove competence, and they often distract from what actually matters: building things and solving real problems.

Career

Standups Are Broken

Daily standups were meant to improve communication. Instead, they've become status meetings that waste time and interrupt deep work. There's a better way.

Career

Terraform 9 posts

Terraform 0.11 to 1.11 Migration - The Full Journey

A detailed guide on migrating Terraform from 0.11 to 1.11, covering HCL2 syntax changes, the S3 bucket resource split, state manipulation, and ensuring zero-drift upgrades.

AWS

Cloud Tagging Strategies That Actually Work

Tagging is the foundation of cloud governance, cost allocation, and automation. Here's how to implement tagging consistently across your infrastructure using context modules, policies, and automation.

AWS

AWS PrivateLink with Terraform

A hands-on technical guide to implementing AWS PrivateLink between VPCs using Terraform.

AWS

SRE 8 posts

DORA Metrics Implementation - Measuring What Matters

DORA metrics are the industry standard for measuring DevOps performance. Here's how to implement them properly, avoid common pitfalls, and actually use them to improve your team's delivery.

DevOps

Blameless Culture is Harder Than You Think

Everyone claims to have a blameless culture. Few actually do. Here's what real blamelessness looks like and why it's so difficult to achieve.

Culture

SRE for Small Teams

You don't need Google's budget to practice SRE. Here's how to implement Site Reliability Engineering principles with a small team and limited resources.

DevOps

Incident Management That Actually Works

Most incident processes are theatre. Here's how to build incident management that reduces downtime, prevents recurrence, and doesn't burn out your team.

DevOps

Backend 8 posts

Build an ETL Pipeline with Python, PostgreSQL, and Airflow

A practical guide to building an ETL pipeline that extracts weather data from OpenWeatherMap, transforms it with pandas, and loads it into PostgreSQL. Includes Airflow orchestration with email notifications.

DevOps

CICD 7 posts

Test GitHub Actions Locally with Act

Stop pushing to test your workflows. Act lets you run GitHub Actions locally with instant feedback. Here's how to set it up and use it effectively.

DevOps

Building a Custom GitHub Action for Traefik Traffic Weighting

How I built a GitHub Action to manage blue/green and canary deployments by dynamically updating Traefik weighted services – with SigV4 authentication, YAML configuration, and a generator API.

AWS

Observability 7 posts

OpenTelemetry Changed How I Think About Observability

A practical, opinionated take on OpenTelemetry - why it matters, what it actually solves, and how to instrument across Kubernetes, Lambda, ECS, and EC2 without losing your mind.

DevOps

ELK Stack Migration: From 6.x to 8.x - The Complete Guide

A comprehensive guide to migrating your Elasticsearch, Logstash, and Kibana stack from version 6.x to 8.x. Covers breaking changes, migration strategies, index compatibility, and zero-downtime approaches.

DevOps

Elastic Cloud Setup Guide - From Zero to Production

A comprehensive guide to setting up Elastic Cloud (Elasticsearch Service), including deployment configuration, security setup, index lifecycle management, integrations, and cost optimization.

AWS

OpenTelemetry from Scratch

OpenTelemetry unifies traces, metrics, and logs under one standard. This guide covers how to instrument your applications, set up collectors, and actually make sense of the data.

DevOps

Databases 6 posts

Database on Kubernetes - When It Makes Sense

Running databases on Kubernetes is controversial. Sometimes it's the right call, sometimes it's a disaster waiting to happen. Here's how to decide, and how to do it properly if you choose to proceed.

K8s

Database Backup to S3 with Kubernetes CronJobs

Build a production-ready database backup system using Kubernetes CronJobs, PostgreSQL, and S3. Includes a complete local testing environment with KIND and LocalStack.

K8s

GCP 2 posts

MLOps 1 post

MLOps for DevOps Engineers - What You Actually Need to Know

MLOps is becoming a critical skill for DevOps engineers. Here's what matters: the infrastructure patterns, tooling, and operational practices that make ML systems work in production - from someone who learned the hard way.

DevOps

AI 1 post

GitOps 1 post

GitOps with ArgoCD - A Practical Setup Guide

A hands-on guide to implementing GitOps with ArgoCD. Covers installation, application management, sync strategies, secrets handling, and the patterns that actually work in production.

K8s

Azure 1 post

Private AKS Cluster with Twingate: Secure API Access Without a Public Endpoint

Running Kubernetes clusters privately is a growing best practice. In this blog, I'll walk you through deploying a private AKS cluster on Azure with no public API endpoint, and enabling secure access via Twingate VPN, which provides identity-based access without opening up your network.

K8s