
Kubernetes Networking: A Deep Dive From First Principles


Kubernetes networking is where theory meets reality – and where most production incidents happen. DNS failures, service discovery issues, CNI misconfigurations, IP exhaustion. I’ve debugged them all.

This guide walks through Kubernetes networking from first principles: how packets actually move between containers, pods, nodes, and the outside world. We’ll cover the networking model, CNI plugins, kube-proxy modes, Services, Ingress, and Network Policies – with AWS/EKS context throughout.

The Kubernetes Networking Model

Kubernetes enforces three foundational networking principles:

  1. Pod-to-Pod Communication: Every Pod can communicate directly with any other Pod across nodes, without NAT
  2. Node-to-Pod Communication: Nodes can reach every Pod, and Pods can reach nodes, without NAT
  3. Pod IP Consistency: A Pod’s IP address is the same whether viewed from inside or outside the Pod

This creates a flat, routable L3 network where every Pod is a first-class network entity. No port translation. No NAT tables to debug. Applications communicate using standard IPs.

Kubernetes Flat Network Model

Why This Matters

The flat network model simplifies microservices communication dramatically. A Pod doesn’t need to know which node another Pod runs on – it just needs the IP address.

This design enables:

  • Network policies at the Pod level (not port mappings)
  • Simple service discovery via DNS
  • Portable applications that don’t embed network topology

But Kubernetes doesn’t implement networking itself – it delegates to CNI plugins.

The Network Stack: From Container to Wire

Before diving into CNI plugins, let’s trace how a packet actually leaves a container.

Container → Pod: The Pause Container

Every Pod has a hidden “pause” container that holds the network namespace. Application containers join this namespace rather than creating their own – the Docker-era mechanism was --net=container:&lt;pause-id&gt;.

┌─────────────────────────────────────────┐
│                  POD                    │
│  ┌─────────────┐    ┌─────────────┐    │
│  │  Container  │    │  Container  │    │
│  │    (app)    │    │  (sidecar)  │    │
│  └──────┬──────┘    └──────┬──────┘    │
│         │                  │           │
│         └────────┬─────────┘           │
│                  │                     │
│           ┌──────┴──────┐              │
│           │    pause    │              │
│           │  (network   │              │
│           │  namespace) │              │
│           └──────┬──────┘              │
│                  │                     │
│           eth0 (Pod IP)                │
└──────────────────┬──────────────────────┘

              veth pair

Containers within a Pod share:

  • The same IP address
  • The same port space
  • Communication via localhost

This is why containers in the same Pod can reach each other on 127.0.0.1 – they’re in the same network namespace.

Pod → Node: Virtual Ethernet Pairs

Pods connect to the node’s network namespace via veth pairs – virtual Ethernet cables with one end in the Pod and one end on the host.

┌─────────────────────────────────────────────────────────────┐
│                          NODE                               │
│                                                             │
│  ┌─────────────┐          ┌─────────────┐                  │
│  │    Pod A    │          │    Pod B    │                  │
│  │             │          │             │                  │
│  │    eth0     │          │    eth0     │                  │
│  └──────┬──────┘          └──────┬──────┘                  │
│         │ veth                   │ veth                    │
│         │                        │                         │
│  ┌──────┴────────────────────────┴──────┐                  │
│  │              cni0 bridge             │                  │
│  └──────────────────┬───────────────────┘                  │
│                     │                                       │
│              ┌──────┴──────┐                               │
│              │    eth0     │                               │
│              │ (node IP)   │                               │
│              └──────┬──────┘                               │
└─────────────────────┼───────────────────────────────────────┘

                Physical Network

Key components:

| Component   | Location       | Purpose                                    |
| ----------- | -------------- | ------------------------------------------ |
| eth0 (Pod)  | Pod namespace  | Pod’s network interface, holds Pod IP      |
| vethXXX     | Host namespace | Host end of veth pair, connects to bridge  |
| cni0        | Host namespace | Virtual bridge connecting all Pods on node |
| eth0 (Node) | Host namespace | Physical/virtual NIC to external network   |

Same-Node Communication

When Pod A talks to Pod B on the same node:

  1. Packet exits Pod A via eth0
  2. Travels through veth pair to cni0 bridge
  3. Bridge forwards to Pod B’s veth pair
  4. Enters Pod B via eth0

No routing required – it’s just a layer 2 switch operation.

Cross-Node Communication

When Pod A on Node 1 talks to Pod C on Node 2:

  1. Packet exits Pod A via eth0
  2. Travels through veth pair to cni0 bridge
  3. Bridge routes to node’s routing table
  4. Encapsulation (VXLAN, IPIP, etc.) wraps packet
  5. Travels over physical network to Node 2
  6. Decapsulation unwraps packet
  7. Routes to Pod C via Node 2’s cni0 bridge

Cross-Node Pod Communication

This encapsulation is the overlay network – making Pod IPs routable across nodes even when the underlying network doesn’t know about them.
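Encapsulation isn’t free: every extra header eats into the underlay MTU, which is why overlay CNIs set a smaller MTU on Pod interfaces. A quick sketch of the arithmetic, using the commonly cited overhead figures (VXLAN: 14-byte outer Ethernet + 20-byte IP + 8-byte UDP + 8-byte VXLAN = 50 bytes; IPIP: one extra 20-byte IP header):

```python
# Effective Pod MTU under common encapsulation modes.
ENCAP_OVERHEAD = {
    "none": 0,    # native routing (BGP, VPC CNI) - no extra headers
    "ipip": 20,   # IP-in-IP: one additional IPv4 header
    "vxlan": 50,  # outer Ethernet + IP + UDP + VXLAN headers
}

def pod_mtu(underlay_mtu: int, encap: str) -> int:
    """MTU the CNI should set on Pod interfaces to avoid fragmentation."""
    return underlay_mtu - ENCAP_OVERHEAD[encap]

for encap in ENCAP_OVERHEAD:
    print(f"{encap:>5}: {pod_mtu(1500, encap)}")
```

This is why Flannel defaults Pod MTU to 1450 on a standard 1500-byte network, and why a cluster on AWS jumbo frames (9001) still loses 50 bytes to VXLAN.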

CNI Plugins: The Network Implementation

The Container Network Interface (CNI) is the specification that defines how container runtimes configure networking. Kubernetes doesn’t care how networking works – it just calls CNI plugins.

How CNI Works

When kubelet creates a Pod:

  1. Container runtime creates network namespace
  2. Runtime calls CNI plugin with namespace details
  3. CNI plugin:
    • Allocates IP address (IPAM)
    • Creates veth pair
    • Configures routes
    • Sets up any overlay/encapsulation
  4. Runtime starts containers in namespace
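The CNI contract itself is simple: the runtime executes the plugin binary with CNI_* environment variables (CNI_COMMAND, CNI_NETNS, CNI_IFNAME, …), feeds the network config as JSON on stdin, and reads a JSON result from stdout. A minimal sketch of that shape – not a real plugin, since it fakes the IP allocation instead of doing IPAM, veth creation, and namespace work:

```python
#!/usr/bin/env python3
"""Sketch of the CNI plugin contract. A real plugin would run IPAM, create a
veth pair, move one end into CNI_NETNS, assign the IP, and program routes;
here we just return a hard-coded (fake) allocation to show the shapes."""
import json, os, sys

def cni_add(conf: dict) -> dict:
    # Result format per the CNI spec: interfaces created plus IPs assigned.
    return {
        "cniVersion": conf.get("cniVersion", "1.0.0"),
        "interfaces": [{"name": os.environ.get("CNI_IFNAME", "eth0")}],
        "ips": [{"address": "10.244.1.10/24", "gateway": "10.244.1.1"}],
    }

def main() -> None:
    command = os.environ.get("CNI_COMMAND", "VERSION")
    if command == "ADD":
        json.dump(cni_add(json.load(sys.stdin)), sys.stdout)
    elif command == "VERSION":
        json.dump({"cniVersion": "1.0.0",
                   "supportedVersions": ["0.4.0", "1.0.0"]}, sys.stdout)
    # DEL/CHECK omitted: a real plugin tears down / verifies there.

if __name__ == "__main__":
    main()
```

Every CNI plugin – VPC CNI, Calico, Cilium – implements this same ADD/DEL/CHECK/VERSION executable interface; they differ only in what happens inside.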

Major CNI Plugins

AWS VPC CNI (EKS Default)

The VPC CNI is AWS’s native plugin for EKS. Instead of overlay networking, it assigns real VPC IP addresses to Pods.

How it works:

  • Each node gets multiple ENIs (Elastic Network Interfaces)
  • Each ENI has multiple secondary IPs
  • Pods get secondary IPs directly from VPC subnet

Advantages:

  • Native VPC networking – no encapsulation overhead
  • Security groups can apply to Pods
  • VPC Flow Logs capture Pod traffic
  • Direct routing – better performance

Disadvantages:

  • IP address exhaustion is real (limited IPs per instance type)
  • Requires VPC CIDR planning
  • Pod density limited by ENI/IP limits
# Check allocatable Pods per node
kubectl get node -o jsonpath='{.items[*].status.allocatable.pods}'

# View ENI attachments
aws ec2 describe-instances --instance-ids i-xxx \
  --query 'Reservations[].Instances[].NetworkInterfaces'

IP Exhaustion Mitigation:

  • Use prefix delegation (assign /28 prefixes instead of individual IPs)
  • Use larger instance types (more ENIs, more IPs)
  • Consider secondary CIDR ranges
  • Or switch to overlay-based CNI
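Those ENI limits translate directly into a hard Pod cap per node. AWS publishes the formula as ENIs × (IPs per ENI − 1) + 2: each ENI’s primary IP is reserved for the node itself, and the +2 covers host-network Pods like aws-node and kube-proxy. A sketch, with ENI/IP limits taken from AWS instance-type documentation:

```python
def max_pods(enis: int, ips_per_eni: int) -> int:
    """AWS max-Pods formula for the VPC CNI: each ENI's primary IP is
    reserved, and +2 accounts for host-network Pods (aws-node, kube-proxy)."""
    return enis * (ips_per_eni - 1) + 2

# (ENIs, IPv4 addresses per ENI) per instance type, from AWS documentation
limits = {"t3.medium": (3, 6), "m5.large": (3, 10), "m5.4xlarge": (8, 30)}
for itype, (enis, ips) in limits.items():
    print(f"{itype:>11}: {max_pods(enis, ips)} pods")  # 17, 29, 234
```

This is the exhaustion trap in miniature: a t3.medium tops out at 17 Pods regardless of CPU/memory headroom. Prefix delegation sidesteps the per-IP limit by attaching /28 blocks instead of individual addresses.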

Calico

Calico is the most popular third-party CNI. It supports multiple modes:

BGP Mode (Native Routing):

  • Uses BGP to advertise Pod CIDRs
  • No encapsulation – best performance
  • Requires BGP-capable network infrastructure
  • Each node becomes a BGP peer

IPIP Mode:

  • Encapsulates packets in IP-in-IP tunnels
  • Works on any network
  • Small performance overhead

VXLAN Mode:

  • Encapsulates in VXLAN
  • Works on any network
  • Slightly higher overhead than IPIP
# Calico IPPool with VXLAN
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-pool
spec:
  cidr: 10.244.0.0/16
  encapsulation: VXLAN
  natOutgoing: true
  nodeSelector: all()

Calico’s killer feature: Network Policies. Native Kubernetes NetworkPolicy support plus Calico’s extended policies for more granular control.

Cilium

Cilium is the modern choice – built on eBPF (extended Berkeley Packet Filter), it operates at the kernel level without iptables.

Advantages:

  • No iptables overhead – kernel-native packet processing
  • Hubble for network observability
  • Native support for L7 policies (HTTP, gRPC, Kafka)
  • Better performance at scale

Architecture:

┌───────────────────────────────────────────┐
│                   Node                    │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │           Cilium Agent              │  │
│  │  (programs eBPF, manages policies)  │  │
│  └──────────────────┬──────────────────┘  │
│                     │                     │
│           ┌─────────┴─────────┐           │
│           │   eBPF Programs   │           │
│           │   (in kernel)     │           │
│           └─────────┬─────────┘           │
│                     │                     │
│  ┌─────────────┐    │    ┌─────────────┐  │
│  │    Pod A    │◄───┴───►│    Pod B    │  │
│  └─────────────┘         └─────────────┘  │
└───────────────────────────────────────────┘

On EKS:

# Replace VPC CNI with Cilium
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.14.0 \
  --namespace kube-system \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set egressMasqueradeInterfaces=eth0 \
  --set routingMode=native

Flannel

The simplest CNI. Uses VXLAN overlay by default. Good for learning, but lacks NetworkPolicy support without Calico integration (Canal).

CNI Comparison

| Feature         | VPC CNI       | Calico          | Cilium      | Flannel |
| --------------- | ------------- | --------------- | ----------- | ------- |
| Overlay         | No (native)   | Optional        | Optional    | Yes     |
| Performance     | Excellent     | Very Good       | Excellent   | Good    |
| NetworkPolicy   | Limited       | Full + Extended | Full + L7   | No      |
| Complexity      | Low           | Medium          | Medium-High | Low     |
| Observability   | VPC Flow Logs | Limited         | Hubble      | Limited |
| EKS Integration | Native        | Manual          | Manual      | Manual  |

Services: Stable Endpoints for Pods

Pods are ephemeral – they come and go, IPs change. Services provide stable endpoints.

Service Types

ClusterIP (Default)

Internal-only virtual IP. Only reachable from within the cluster.

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080

How it works:

  1. Service gets a ClusterIP (e.g., 10.100.0.100)
  2. kube-proxy programs routing rules on every node
  3. Traffic to ClusterIP gets DNAT’d to a backend Pod IP
  4. Load balancing across healthy endpoints
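The DNAT step above can be modeled in a few lines – a toy simulation of the rewrite kube-proxy programs into the kernel, with made-up IPs (the real thing happens in iptables/IPVS, not userspace):

```python
import random

# Toy model: traffic addressed to a Service's virtual IP:port is DNAT'd to
# one of the backing Pod endpoints; anything else passes through untouched.
services = {("10.100.0.100", 80): [("10.244.1.5", 8080), ("10.244.2.7", 8080)]}

def dnat(dst_ip: str, dst_port: int) -> tuple:
    """Rewrite a Service destination to a randomly chosen endpoint."""
    endpoints = services.get((dst_ip, dst_port))
    if endpoints is None:
        return (dst_ip, dst_port)  # not a Service VIP: no rewrite
    return random.choice(endpoints)

print(dnat("10.100.0.100", 80))  # one of the two Pod endpoints
```

The key property: the ClusterIP never appears on any interface. It exists only as a NAT rule, which is why you can’t ping it on many setups – only connect through it.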

NodePort

Exposes service on each node’s IP at a static port (30000-32767).

apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: NodePort
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080

Traffic flow: Node:30080 → kube-proxy → Pod:8080

LoadBalancer

Creates external load balancer (cloud-provider specific).

On EKS, this creates an AWS Classic Load Balancer or NLB:

apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
  - port: 443
    targetPort: 8443

AWS Load Balancer Controller is the modern way to manage ALBs/NLBs:

apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
  selector:
    app: api
  ports:
  - port: 443
    targetPort: 8443

ExternalName

DNS CNAME to external service. No proxying.

apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: mydb.example.com

Service Discovery via DNS

CoreDNS runs in every cluster and provides DNS-based service discovery.

DNS naming:

<service>.<namespace>.svc.cluster.local

Examples:

  • backend.default.svc.cluster.local → ClusterIP of backend service in default namespace
  • backend.default → Short form (works within cluster)
  • backend → Shortest form (works within same namespace)
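The convention is mechanical enough to capture in a helper – a trivial sketch, assuming the default cluster.local zone:

```python
def service_fqdn(service: str, namespace: str = "default",
                 zone: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{zone}"

print(service_fqdn("backend"))            # backend.default.svc.cluster.local
print(service_fqdn("api", "production"))  # api.production.svc.cluster.local
```

The shorter forms work only because of the search domains in each Pod’s resolv.conf – which is where the ndots problem (covered below) comes from.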

Headless Services (no ClusterIP):

apiVersion: v1
kind: Service
metadata:
  name: stateful-app
spec:
  clusterIP: None  # Headless
  selector:
    app: stateful-app

DNS returns Pod IPs directly instead of a single ClusterIP. Used for StatefulSets where clients need to reach specific Pods.

kube-proxy: The Service Implementation

kube-proxy runs on every node and implements Services by programming network rules.

kube-proxy Modes

iptables Mode (Default)

Uses Linux iptables rules for packet filtering and NAT.

How it works:

  1. kube-proxy watches API server for Service/Endpoint changes
  2. Programs iptables rules: KUBE-SERVICES → KUBE-SVC-xxx → KUBE-SEP-xxx
  3. Kernel handles packet routing directly (no userspace proxy)

Rule chain:

PREROUTING
    └── KUBE-SERVICES
            └── KUBE-SVC-XXXXX (per service)
                    ├── KUBE-SEP-AAAA (50% probability)
                    └── KUBE-SEP-BBBB (50% probability)
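The 50% figures above generalize: for n endpoints, kube-proxy emits n jump rules where rule i uses iptables’ statistic match with probability 1/(n−i), and the cascade works out to exactly 1/n per endpoint. A sketch of that arithmetic using exact fractions:

```python
from fractions import Fraction

def endpoint_probabilities(n: int) -> list:
    """Effective per-endpoint probability from kube-proxy's cascading
    `statistic --mode random --probability 1/(n-i)` rules."""
    probs, remaining = [], Fraction(1)
    for i in range(n):
        p = Fraction(1, n - i)       # probability written into rule i
        probs.append(remaining * p)  # chance this rule actually fires
        remaining *= 1 - p           # traffic falling through to later rules
    return probs

print(endpoint_probabilities(3))  # [1/3, 1/3, 1/3]
```

So rule ordering matters for the printed probabilities but not for the resulting balance – which is handy to know when you’re staring at seemingly uneven percentages in iptables output.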

View rules:

# List service rules
iptables -t nat -L KUBE-SERVICES -n -v

# Follow a specific service
iptables -t nat -L KUBE-SVC-XXXXXXX -n -v

Drawbacks:

  • O(n) rule evaluation – scales poorly with many services
  • Rules are sequential – first match wins
  • Debugging is painful

IPVS Mode

Uses Linux Virtual Server for L4 load balancing.

Advantages over iptables:

  • O(1) lookup – hash tables instead of sequential rules
  • Better load balancing algorithms (round-robin, least connections, etc.)
  • Designed for high-performance load balancing
  • Scales to thousands of services

Enable on EKS:

# kube-proxy ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    mode: "ipvs"
    ipvs:
      scheduler: "rr"
# Restart kube-proxy
kubectl -n kube-system rollout restart daemonset kube-proxy

# Verify IPVS rules
ipvsadm -Ln

Output:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.100.0.1:443 rr
  -> 10.0.1.10:443                Masq    1      0          0
TCP  10.100.0.10:53 rr
  -> 10.244.0.5:53                Masq    1      0          0
  -> 10.244.1.3:53                Masq    1      0          0

nftables Mode (Future)

The successor to iptables, with roughly O(1) lookups comparable to IPVS. kube-proxy’s nftables mode is still maturing – beta in recent Kubernetes releases.

eBPF (Cilium)

Cilium can replace kube-proxy entirely with eBPF-based service implementation:

# Cilium values for kube-proxy replacement
kubeProxyReplacement: true
k8sServiceHost: <API_SERVER_IP>
k8sServicePort: 443

Benefits: kernel-native packet processing, no iptables overhead, better observability via Hubble.

kube-proxy Mode Comparison

| Mode          | Complexity              | Performance | Scale            | Status     |
| ------------- | ----------------------- | ----------- | ---------------- | ---------- |
| iptables      | High (long rule chains) | O(n)        | ~5000 services   | Default    |
| IPVS          | Lower                   | O(1)        | 10,000+ services | GA         |
| nftables      | Lower                   | O(1)        | 10,000+ services | Beta       |
| eBPF (Cilium) | Medium                  | O(1)        | 10,000+ services | Production |

Recommendation: For clusters with >1000 services, switch to IPVS or Cilium.

Ingress: HTTP/S Traffic Management

Ingress provides HTTP/S routing from outside the cluster to Services.

Ingress Resource

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls

AWS Load Balancer Controller

The modern way to do Ingress on EKS – creates ALBs directly from Ingress resources.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
spec:
  ingressClassName: alb
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80

Target types:

  • instance: ALB routes to NodePort (extra hop)
  • ip: ALB routes directly to Pod IPs (requires VPC CNI)

Gateway API (The Future)

Gateway API is the evolution of Ingress – more expressive, role-oriented, and extensible.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
spec:
  gatewayClassName: aws-alb
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: app-cert
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-route
spec:
  parentRefs:
  - name: main-gateway
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: api-service
      port: 80

Network Policies: Firewall for Pods

By default, all Pods can communicate with all other Pods. Network Policies restrict this.

Default Deny

Start with deny-all, then whitelist:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}  # Applies to all pods
  policyTypes:
  - Ingress
  - Egress

Allow Specific Traffic

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

Allow External Egress

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-external-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Egress
  egress:
  # Allow DNS
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
  # Allow HTTPS to external
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - protocol: TCP
      port: 443

CNI Support Required

Important: Network Policies require CNI support. The VPC CNI historically didn’t enforce them – you needed Calico or Cilium – though recent VPC CNI versions (v1.14+) add native NetworkPolicy support via an eBPF-based agent.

On EKS with VPC CNI + Calico for policies:

kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/calico-operator.yaml
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/calico-crs.yaml

DNS Deep Dive

CoreDNS is the cluster DNS server.

How DNS Resolution Works

  1. Pod makes DNS query (e.g., backend.default)
  2. Query goes to CoreDNS via the cluster DNS ClusterIP on port 53 (commonly 10.96.0.10; EKS typically uses 10.100.0.10 or 172.20.0.10)
  3. CoreDNS looks up Service → returns ClusterIP
  4. Pod connects to ClusterIP

The ndots Problem

Default resolv.conf in Pods:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

ndots:5 means: if hostname has fewer than 5 dots, append search domains first.

Query for api.external.com:

  1. api.external.com.default.svc.cluster.local → NXDOMAIN
  2. api.external.com.svc.cluster.local → NXDOMAIN
  3. api.external.com.cluster.local → NXDOMAIN
  4. api.external.com → Success

Three wasted DNS queries before the one that succeeds – on every external domain lookup (and parallel A/AAAA lookups double the traffic)!
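The resolver’s expansion logic is easy to reproduce – a sketch of the glibc-style search behavior, assuming the default search list shown above:

```python
def dns_queries(name: str, search: list, ndots: int = 5) -> list:
    """Order in which the resolver tries names, per resolv.conf semantics:
    fewer than `ndots` dots => search domains first, bare name last.
    A trailing dot marks the name fully qualified: no search expansion."""
    if name.endswith("."):
        return [name.rstrip(".")]
    attempts = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + attempts  # absolute-looking name is tried first
    return attempts + [name]

search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for q in dns_queries("api.external.com", search):
    print(q)
```

Running it reproduces exactly the four attempts listed above; passing "api.external.com." (trailing dot) collapses the list to a single query.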

Fix: Lower ndots or use FQDN with trailing dot:

spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"

Or in application code: api.external.com. (trailing dot = FQDN, skip search domains).

CoreDNS Scaling

CoreDNS can become a bottleneck. Signs:

  • High DNS latency
  • Increased 5xx from services
  • CoreDNS Pod CPU saturation

Solutions:

  1. Scale CoreDNS replicas
  2. Enable NodeLocal DNSCache (runs DNS cache on every node)
  3. Tune ndots
# Enable NodeLocal DNSCache on EKS
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

Troubleshooting

Essential Commands

# Pod networking
kubectl exec -it <pod> -- ip addr
kubectl exec -it <pod> -- ip route
kubectl exec -it <pod> -- cat /etc/resolv.conf

# DNS testing
kubectl exec -it <pod> -- nslookup <service>
kubectl exec -it <pod> -- dig <service>.default.svc.cluster.local

# Connectivity testing
kubectl exec -it <pod> -- curl -v <service>:<port>
kubectl exec -it <pod> -- nc -zv <service> <port>

# Node networking
kubectl get nodes -o wide
ssh <node> ip route
ssh <node> iptables -t nat -L KUBE-SERVICES -n

# Service/Endpoints
kubectl get svc,endpoints -n <namespace>
kubectl describe svc <service>

# kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy
kubectl get configmap -n kube-system kube-proxy -o yaml

# CNI
kubectl logs -n kube-system -l k8s-app=aws-node  # VPC CNI
kubectl logs -n kube-system -l k8s-app=calico-node
kubectl logs -n kube-system -l k8s-app=cilium

Common Issues

| Symptom                 | Likely Cause                                | Check                                               |
| ----------------------- | ------------------------------------------- | --------------------------------------------------- |
| Pod can’t resolve DNS   | CoreDNS down, NetworkPolicy blocking        | kubectl get pods -n kube-system -l k8s-app=kube-dns |
| Pod can’t reach Service | kube-proxy misconfigured, missing endpoints | kubectl get endpoints <svc>                         |
| Cross-node Pod failure  | CNI overlay broken, node routing            | ip route, CNI logs                                  |
| External traffic fails  | LoadBalancer health check, security groups  | AWS console, target group health                    |
| Slow DNS                | ndots:5, CoreDNS overload                   | dig +trace, CoreDNS metrics                         |

Network Policy Debugging

# Check if policies exist
kubectl get networkpolicies -A

# Describe policy
kubectl describe networkpolicy <name>

# Test connectivity (deploy debug pod)
kubectl run debug --image=nicolaka/netshoot -it --rm -- bash

Summary

Kubernetes networking is layered:

  1. CNI Plugin: Creates Pod network, assigns IPs, handles routing
  2. kube-proxy: Implements Services via iptables/IPVS/eBPF
  3. CoreDNS: Provides service discovery via DNS
  4. Ingress/Gateway: Routes external HTTP/S traffic
  5. Network Policies: Controls Pod-to-Pod communication

For EKS specifically:

  • VPC CNI is default – native VPC networking, watch for IP exhaustion
  • AWS Load Balancer Controller for ALB/NLB integration
  • Consider Cilium for eBPF benefits and better observability
  • Calico, Cilium, or a recent VPC CNI (v1.14+) required for Network Policies

The networking model is elegant in design but complex in implementation. Understanding the layers – and where packets actually flow – is essential for debugging production issues.


More Kubernetes deep-dives at CoderCo. Connect on LinkedIn for infrastructure patterns and networking war stories.
