Ephemeral Containers for Production Debugging
Your production pods are running distroless images. No shell. No curl. No tcpdump.
Then something breaks.
The old approach: add debugging tools to the image, redeploy, wait for the rollout, reproduce the issue, hope it still happens. By then, the incident is over or the problem has changed.
Ephemeral containers solve this. Attach a debugging container to a running pod, with all the tools you need, without modifying the original container or restarting anything.
TL;DR
- Ephemeral containers attach to running pods without restart
- Perfect for distroless/minimal images that lack debugging tools
- Share process namespace, network, and optionally filesystem
- Available in Kubernetes 1.25+ (stable)
- Use kubectl debug for easy access
What Are Ephemeral Containers?
Ephemeral containers are a special type of container that runs temporarily in an existing pod. Unlike regular containers:
- They can be added to running pods
- They can’t be removed once added (pod must be deleted)
- They don’t have ports, liveness probes, or resource limits
- They’re designed purely for debugging
Think of them as SSH-ing into a container, except you’re bringing your own toolbox.
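Under the hood, kubectl debug works by patching the pod's ephemeralContainers list through a dedicated subresource. A rough sketch of what that patch looks like (pod and container names are placeholders; the commented kubectl line assumes a live cluster and kubectl 1.24+ for --subresource):

```shell
# Sketch of what `kubectl debug` does, not its exact implementation:
# it PATCHes the pod's ephemeralcontainers subresource.
PATCH='{"spec":{"ephemeralContainers":[{"name":"debugger","image":"busybox:1.36","stdin":true,"tty":true,"targetContainerName":"my-container"}]}}'

# Sanity-check the patch body locally before sending it
echo "$PATCH" | python3 -c 'import json,sys; json.load(sys.stdin); print("valid")'

# Against a real cluster (commented out here):
# kubectl patch pod my-pod --subresource=ephemeralcontainers \
#   --type=strategic -p "$PATCH"
```

This is also why ephemeral containers can be added but never removed: the subresource only accepts additions to the list.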
Basic Usage
Debugging a Running Pod
# Attach a debug container with common tools
kubectl debug -it my-pod --image=busybox --target=my-container
# Or use a more complete debugging image
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
The --target flag places the debug container in the target container's process namespace (this requires container runtime support, which the major runtimes have), so you can see and interact with its processes.
What You Get
Once attached, you can:
# See processes from the target container
ps aux
# Check network connections
netstat -tlnp
ss -tlnp
# Debug DNS
nslookup kubernetes.default
dig +short kubernetes.default.svc.cluster.local
# Inspect files (if sharing filesystem)
cat /proc/1/environ
# Capture network traffic
tcpdump -i any -n port 8080
# Check what the app is doing
strace -p 1
Debugging Images
Different debugging scenarios need different tools. Here are images I use:
General Purpose: netshoot
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
Includes: curl, wget, ping, nslookup, dig, tcpdump, netstat, iptables, strace, and more.
Minimal: busybox
kubectl debug -it my-pod --image=busybox:1.36 --target=my-container
Includes: Basic Unix tools, good for filesystem inspection.
Network Heavy: netshoot again
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
The same netshoot image also covers heavier network work: iperf, mtr, nmap, tcptraceroute, and other advanced network tools.
Custom Debug Image
Build your own with exactly what you need:
# Dockerfile.debug
FROM alpine:3.19
RUN apk add --no-cache \
    bash \
    curl \
    wget \
    bind-tools \
    tcpdump \
    strace \
    htop \
    vim \
    jq \
    postgresql-client \
    mysql-client \
    redis
# Add any custom scripts
COPY debug-scripts/ /usr/local/bin/
CMD ["sleep", "infinity"]
# Build and push
docker build -f Dockerfile.debug -t myregistry/debug:latest .
docker push myregistry/debug:latest
# Use it
kubectl debug -it my-pod --image=myregistry/debug:latest --target=my-container
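As an example of a custom script worth baking into debug-scripts/, here is a sketch (the name conncheck is hypothetical, not a real tool): it counts established TCP connections to a port by parsing /proc/net/tcp directly, so it works even in images that lack netstat and ss. The optional file argument exists only to make it testable.

```shell
# conncheck (hypothetical debug-scripts/ helper): count ESTABLISHED TCP
# connections to a given remote port by parsing /proc/net/tcp.
conncheck() {
  port_hex=$(printf '%04X' "$1")   # port in hex, as /proc/net/tcp stores it
  file="${2:-/proc/net/tcp}"       # optional file override for testing
  # Field 3 is rem_address (ip:port in hex); field 4 is state; 01 == ESTABLISHED
  awk -v p=":$port_hex" '$4 == "01" && $3 ~ (p "$") {n++} END {print n+0}' "$file"
}
```

Inside the pod, `conncheck 5432` then gives you a connection count toward your database without any extra tooling.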
Real Debugging Scenarios
Scenario 1: Network Connectivity Issues
App can’t reach an external service. Is it DNS? Firewall? The service itself?
# Attach with network tools
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
# Check DNS resolution
nslookup external-service.com
dig external-service.com
# Check if port is reachable
nc -zv external-service.com 443
curl -v https://external-service.com/health
# Trace the route
mtr external-service.com
# Capture actual traffic
tcpdump -i any host external-service.com -w /tmp/capture.pcap
Scenario 2: Database Connection Pool Exhaustion
App is timing out on database queries. Is it the app or the database?
kubectl debug -it my-pod --image=myregistry/debug:latest --target=my-container
# Check current connections from this pod
netstat -an | grep 5432 | wc -l
# Test direct database connectivity
psql -h db-host -U user -d database -c "SELECT 1"
# Check for connection leaks
watch -n 1 'netstat -an | grep 5432 | grep ESTABLISHED | wc -l'
# Look at app's connection state
ls -l /proc/1/fd 2>/dev/null | grep -c socket
Scenario 3: Memory Issues
Pod is approaching memory limits but you can’t tell what’s consuming it.
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
# Check memory from inside
cat /proc/meminfo
# cgroup v2 (most modern nodes)
cat /sys/fs/cgroup/memory.current
# cgroup v1 (older nodes)
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
# If it's a JVM app, trigger heap dump (if JDK present in target)
# First, find the Java process
ps aux | grep java
# Alternative: check from /proc
ls -la /proc/1/fd | head -20
cat /proc/1/smaps | grep -A 2 heap
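Since the memory file lives at a different path on cgroup v1 and v2 nodes, a tiny helper can read whichever exists. This is my own sketch (the base-directory argument is there purely to make it testable):

```shell
# Read current memory usage regardless of cgroup version.
# cgroup v2 exposes memory.current; v1 exposes memory/memory.usage_in_bytes.
mem_usage() {
  base="${1:-/sys/fs/cgroup}"
  for f in "$base/memory.current" "$base/memory/memory.usage_in_bytes"; do
    if [ -r "$f" ]; then cat "$f"; return 0; fi
  done
  echo "no cgroup memory file found under $base" >&2
  return 1
}
```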
Scenario 4: Filesystem Investigation
Need to check what files an app created or is reading.
# Target the container; sharing its process namespace exposes
# its filesystem via /proc/<pid>/root
kubectl debug -it my-pod \
  --image=busybox \
  --target=my-container
# Now you can see the target's filesystem via /proc
ls /proc/1/root/app/
cat /proc/1/root/app/config/settings.yaml
# Check open files
ls -la /proc/1/fd/
# Watch file access in real-time
# (requires strace in debug image)
strace -e trace=open,read,write -p 1
Advanced Patterns
Debug with Same Network Namespace
Sometimes you need to debug networking from the exact same network context:
kubectl debug -it my-pod \
  --image=nicolaka/netshoot \
  --target=my-container
Ephemeral containers always join the pod's network namespace, and --target adds the target's process namespace. You'll see:
- Same IP address
- Same network interfaces
- Same iptables rules
- Can bind to ports (if not already in use)
Debug a CrashLooping Pod
If a pod keeps crashing, normal debug won’t help. Create a copy that doesn’t run the original command:
# Create a copy of the pod with a different command
kubectl debug my-crashing-pod -it \
  --copy-to=my-pod-debug \
  --container=my-container \
  --image=busybox \
  -- sh
# Now you're in a pod with the same config but running shell
# Check the filesystem, environment, etc.
env
cat /app/config.yaml
Debug a Node
kubectl debug can also target nodes. This doesn't use ephemeral containers - it creates a pod on the node with host namespaces and the host filesystem mounted at /host:
# Debug a node (creates a pod with host namespaces)
kubectl debug node/my-node -it --image=busybox
# Now you have access to the host
chroot /host
# Check host processes
ps aux
# Check host networking
iptables -L -n
ip route
# Check kubelet logs
journalctl -u kubelet -f
Practical Tips
1. Pre-approve Debug Images
Add your debug images to your allowed image list:
# Kyverno policy to allow debug images
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: allow-debug-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: allow-ephemeral-debug
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Debug images must be from approved list"
        pattern:
          spec:
            ephemeralContainers:
              - image: "nicolaka/netshoot* | busybox:* | myregistry/debug:*"
2. Create Debug Aliases
# Add to ~/.bashrc or ~/.zshrc
alias kdebug='kubectl debug -it --image=nicolaka/netshoot'
# Node debugging needs the node/ prefix, so a function works better than an alias
kdebug-node() { kubectl debug -it --image=busybox "node/$1"; }
# Usage
kdebug my-pod --target=my-container
kdebug-node my-node
3. RBAC for Debug Access
Control who can create ephemeral containers:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods/ephemeralcontainers"]
    verbs: ["patch", "update"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
4. Audit Ephemeral Container Usage
Track who’s debugging what:
# Audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/ephemeralcontainers"]
Limitations
Be aware of what ephemeral containers can’t do:
- No removal - Once added, ephemeral containers exist until pod deletion
- No resource limits - You can't set requests or limits; they draw from the pod's allocation
- No restart - If the debug container exits, you need to create a new one
- Security context - Inherits pod's security context (may limit capabilities)
- Pod Security Standards - May block certain debug images (PodSecurityPolicy itself was removed in 1.25)
Comparison with Alternatives
| Method | Restart Required | Production Safe | Tool Flexibility |
|---|---|---|---|
| Ephemeral Containers | No | Yes | High |
| kubectl exec | No | Yes | Limited to image |
| Modify deployment | Yes | Risky | High |
| kubectl cp + run | No | Yes | Medium |
| Node SSH | No | Depends | Very High |
Ephemeral containers hit the sweet spot: no restart, safe for production, full flexibility.
Security Considerations
Ephemeral containers are powerful - treat access seriously:
# Restrict to specific namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: debug-access
  namespace: staging  # Only staging, not production
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-debugger
  apiGroup: rbac.authorization.k8s.io
For production:
- Require approval workflows (via admission webhook)
- Log all ephemeral container creations
- Use read-only debug images where possible
- Time-box debug sessions
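The last two points can be combined in a lightweight shell wrapper. This is my own sketch, not a real tool - kdebug_audited, DEBUG_AUDIT_LOG, and DEBUG_SESSION_MAX are all made-up names:

```shell
# Hypothetical wrapper: logs who debugged which pod, and time-boxes the session.
kdebug_audited() {
  pod="$1"; shift
  log="${DEBUG_AUDIT_LOG:-$HOME/.kdebug.log}"
  printf '%s %s debug %s %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(whoami)" "$pod" "$*" >> "$log"
  # Kill the session after DEBUG_SESSION_MAX seconds (default 15 minutes)
  timeout "${DEBUG_SESSION_MAX:-900}" kubectl debug -it "$pod" "$@"
}
```

A local log is only a start; for real audit trails, rely on the API server audit policy above, which can't be bypassed by skipping the wrapper.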
Quick Reference
# Basic debug
kubectl debug -it POD --image=IMAGE --target=CONTAINER
# Copy pod with different command
kubectl debug POD -it --copy-to=DEBUG-POD --container=CONTAINER -- COMMAND
# Debug node
kubectl debug node/NODE -it --image=IMAGE
# List ephemeral containers in a pod
kubectl get pod POD -o jsonpath='{.spec.ephemeralContainers[*].name}'
# See ephemeral container logs
kubectl logs POD -c EPHEMERAL-CONTAINER-NAME
Conclusion
Ephemeral containers changed how I debug production issues. No more:
- Adding debug tools to production images
- Waiting for rollouts to investigate issues
- SSH-ing to nodes and docker exec-ing into containers
- Losing the reproduction window while deploying debug versions
The next time something breaks in production, don’t rebuild - just kubectl debug.