Ephemeral Containers for Production Debugging
Your production pods are running distroless images. No shell. No curl. No tcpdump.
Then something breaks.
The old approach: add debugging tools to the image, redeploy, wait for the rollout, reproduce the issue, hope it still happens. By then, the incident is over or the problem has changed.
Ephemeral containers solve this. Attach a debugging container to a running pod, with all the tools you need, without modifying the original container or restarting anything.
TL;DR
- Ephemeral containers attach to running pods without restart
- Perfect for distroless/minimal images that lack debugging tools
- Share process namespace, network, and optionally filesystem
- Available in Kubernetes 1.25+ (stable)
- Use kubectl debug for easy access
What Are Ephemeral Containers?
Ephemeral containers are a special type of container that runs temporarily in an existing pod. Unlike regular containers:
- They can be added to running pods
- They can’t be removed once added (pod must be deleted)
- They don’t have ports, liveness probes, or resource limits
- They’re designed purely for debugging
Think of them as SSH-ing into a container, except you’re bringing your own toolbox.
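Under the hood, kubectl debug works by patching the pod's ephemeralContainers list through a dedicated subresource. A rough sketch of what that patch looks like (pod and container names are placeholders; the commented kubectl line assumes a live cluster and kubectl 1.24+ for --subresource):

```shell
# Sketch of what `kubectl debug` does, not its exact implementation:
# it PATCHes the pod's ephemeralcontainers subresource.
PATCH='{"spec":{"ephemeralContainers":[{"name":"debugger","image":"busybox:1.36","stdin":true,"tty":true,"targetContainerName":"my-container"}]}}'

# Sanity-check the patch body locally before sending it
echo "$PATCH" | python3 -c 'import json,sys; json.load(sys.stdin); print("valid")'

# Against a real cluster (commented out here):
# kubectl patch pod my-pod --subresource=ephemeralcontainers \
#   --type=strategic -p "$PATCH"
```

This is also why ephemeral containers can be added but never removed: the subresource only accepts additions to the list.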
Basic Usage
Debugging a Running Pod
# Attach a debug container with common tools
kubectl debug -it my-pod --image=busybox --target=my-container
# Or use a more complete debugging image
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
The --target flag places the debug container in the target container's process namespace (this requires container runtime support, which the major runtimes have), so you can see and interact with its processes.
What You Get
Once attached, you can:
# See processes from the target container
ps aux
# Check network connections
netstat -tlnp
ss -tlnp
# Debug DNS
nslookup kubernetes.default
dig +short kubernetes.default.svc.cluster.local
# Inspect files (if sharing filesystem)
cat /proc/1/environ
# Capture network traffic
tcpdump -i any -n port 8080
# Check what the app is doing
strace -p 1
Debugging Images
Different debugging scenarios need different tools. Here are images I use:
General Purpose: netshoot
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
Includes: curl, wget, ping, nslookup, dig, tcpdump, netstat, iptables, strace, and more.
Minimal: busybox
kubectl debug -it my-pod --image=busybox:1.36 --target=my-container
Includes: Basic Unix tools, good for filesystem inspection.
Network Heavy: netshoot again
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
The same netshoot image also covers heavier network work: iperf, mtr, nmap, tcptraceroute, and other advanced network tools.
Custom Debug Image
Build your own with exactly what you need:
# Dockerfile.debug
FROM alpine:3.19
RUN apk add --no-cache \
    bash \
    curl \
    wget \
    bind-tools \
    tcpdump \
    strace \
    htop \
    vim \
    jq \
    postgresql-client \
    mysql-client \
    redis
# Add any custom scripts
COPY debug-scripts/ /usr/local/bin/
CMD ["sleep", "infinity"]
# Build and push
docker build -f Dockerfile.debug -t myregistry/debug:latest .
docker push myregistry/debug:latest
# Use it
kubectl debug -it my-pod --image=myregistry/debug:latest --target=my-container
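As an example of a custom script worth baking into debug-scripts/, here is a sketch (the name conncheck is hypothetical, not a real tool): it counts established TCP connections to a port by parsing /proc/net/tcp directly, so it works even in images that lack netstat and ss. The optional file argument exists only to make it testable.

```shell
# conncheck (hypothetical debug-scripts/ helper): count ESTABLISHED TCP
# connections to a given remote port by parsing /proc/net/tcp.
conncheck() {
  port_hex=$(printf '%04X' "$1")   # port in hex, as /proc/net/tcp stores it
  file="${2:-/proc/net/tcp}"       # optional file override for testing
  # Field 3 is rem_address (ip:port in hex); field 4 is state; 01 == ESTABLISHED
  awk -v p=":$port_hex" '$4 == "01" && $3 ~ (p "$") {n++} END {print n+0}' "$file"
}
```

Inside the pod, `conncheck 5432` then gives you a connection count toward your database without any extra tooling.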
Real Debugging Scenarios
Scenario 1: Network Connectivity Issues
App can’t reach an external service. Is it DNS? Firewall? The service itself?
# Attach with network tools
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
# Check DNS resolution
nslookup external-service.com
dig external-service.com
# Check if port is reachable
nc -zv external-service.com 443
curl -v https://external-service.com/health
# Trace the route
mtr external-service.com
# Capture actual traffic
tcpdump -i any host external-service.com -w /tmp/capture.pcap
Scenario 2: Database Connection Pool Exhaustion
App is timing out on database queries. Is it the app or the database?
kubectl debug -it my-pod --image=myregistry/debug:latest --target=my-container
# Check current connections from this pod
netstat -an | grep 5432 | wc -l
# Test direct database connectivity
psql -h db-host -U user -d database -c "SELECT 1"
# Check for connection leaks
watch -n 1 'netstat -an | grep 5432 | grep ESTABLISHED | wc -l'
# Look at app's connection state
ls -l /proc/1/fd 2>/dev/null | grep -c socket
Scenario 3: Memory Issues
Pod is approaching memory limits but you can’t tell what’s consuming it.
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
# Check memory from inside
cat /proc/meminfo
# cgroup v2 (most modern nodes)
cat /sys/fs/cgroup/memory.current
# cgroup v1 (older nodes)
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
# If it's a JVM app, trigger heap dump (if JDK present in target)
# First, find the Java process
ps aux | grep java
# Alternative: check from /proc
ls -la /proc/1/fd | head -20
cat /proc/1/smaps | grep -A 2 heap
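Since the memory file lives at a different path on cgroup v1 and v2 nodes, a tiny helper can read whichever exists. This is my own sketch (the base-directory argument is there purely to make it testable):

```shell
# Read current memory usage regardless of cgroup version.
# cgroup v2 exposes memory.current; v1 exposes memory/memory.usage_in_bytes.
mem_usage() {
  base="${1:-/sys/fs/cgroup}"
  for f in "$base/memory.current" "$base/memory/memory.usage_in_bytes"; do
    if [ -r "$f" ]; then cat "$f"; return 0; fi
  done
  echo "no cgroup memory file found under $base" >&2
  return 1
}
```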
Scenario 4: Filesystem Investigation
Need to check what files an app created or is reading.
# Target the container; sharing its process namespace exposes
# its filesystem via /proc/<pid>/root
kubectl debug -it my-pod \
  --image=busybox \
  --target=my-container
# Now you can see the target's filesystem via /proc
ls /proc/1/root/app/
cat /proc/1/root/app/config/settings.yaml
# Check open files
ls -la /proc/1/fd/
# Watch file access in real-time
# (requires strace in debug image)
strace -e trace=open,read,write -p 1
Advanced Patterns
Debug with Same Network Namespace
Sometimes you need to debug networking from the exact same network context:
kubectl debug -it my-pod \
  --image=nicolaka/netshoot \
  --target=my-container
Ephemeral containers always join the pod's network namespace, and --target adds the target's process namespace. You'll see:
- Same IP address
- Same network interfaces
- Same iptables rules
- Can bind to ports (if not already in use)
Debug a CrashLooping Pod
If a pod keeps crashing, normal debug won’t help. Create a copy that doesn’t run the original command:
# Create a copy of the pod with a different command
kubectl debug my-crashing-pod -it \
  --copy-to=my-pod-debug \
  --container=my-container \
  --image=busybox \
  -- sh
# Now you're in a pod with the same config but running shell
# Check the filesystem, environment, etc.
env
cat /app/config.yaml
Debug a Node
kubectl debug can also target nodes. This doesn't use ephemeral containers - it creates a pod on the node with host namespaces and the host filesystem mounted at /host:
# Debug a node (creates a pod with host namespaces)
kubectl debug node/my-node -it --image=busybox
# Now you have access to the host
chroot /host
# Check host processes
ps aux
# Check host networking
iptables -L -n
ip route
# Check kubelet logs
journalctl -u kubelet -f
Practical Tips
1. Pre-approve Debug Images
Add your debug images to your allowed image list:
# Kyverno policy to allow debug images
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: allow-debug-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: allow-ephemeral-debug
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Debug images must be from approved list"
        pattern:
          spec:
            ephemeralContainers:
              - image: "nicolaka/netshoot* | busybox:* | myregistry/debug:*"
2. Create Debug Aliases
# Add to ~/.bashrc or ~/.zshrc
alias kdebug='kubectl debug -it --image=nicolaka/netshoot'
# Node debugging needs the node/ prefix, so a function works better than an alias
kdebug-node() { kubectl debug -it --image=busybox "node/$1"; }
# Usage
kdebug my-pod --target=my-container
kdebug-node my-node
3. RBAC for Debug Access
Control who can create ephemeral containers:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods/ephemeralcontainers"]
    verbs: ["patch", "update"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
4. Audit Ephemeral Container Usage
Track who’s debugging what:
# Audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/ephemeralcontainers"]
Limitations
Be aware of what ephemeral containers can’t do:
- No removal - Once added, ephemeral containers exist until pod deletion
- No resource limits - You can't set requests or limits; they draw from the pod's allocation
- No restart - If the debug container exits, you need to create a new one
- Security context - Inherits pod's security context (may limit capabilities)
- Pod Security Standards - May block certain debug images (PodSecurityPolicy itself was removed in 1.25)
Comparison with Alternatives
| Method | Restart Required | Production Safe | Tool Flexibility |
|---|---|---|---|
| Ephemeral Containers | No | Yes | High |
| kubectl exec | No | Yes | Limited to image |
| Modify deployment | Yes | Risky | High |
| kubectl cp + run | No | Yes | Medium |
| Node SSH | No | Depends | Very High |
Ephemeral containers hit the sweet spot: no restart, safe for production, full flexibility.
Security Considerations
Ephemeral containers are powerful - treat access seriously:
# Restrict to specific namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: debug-access
  namespace: staging  # Only staging, not production
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-debugger
  apiGroup: rbac.authorization.k8s.io
For production:
- Require approval workflows (via admission webhook)
- Log all ephemeral container creations
- Use read-only debug images where possible
- Time-box debug sessions
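The last two points can be combined in a lightweight shell wrapper. This is my own sketch, not a real tool - kdebug_audited, DEBUG_AUDIT_LOG, and DEBUG_SESSION_MAX are all made-up names:

```shell
# Hypothetical wrapper: logs who debugged which pod, and time-boxes the session.
kdebug_audited() {
  pod="$1"; shift
  log="${DEBUG_AUDIT_LOG:-$HOME/.kdebug.log}"
  printf '%s %s debug %s %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(whoami)" "$pod" "$*" >> "$log"
  # Kill the session after DEBUG_SESSION_MAX seconds (default 15 minutes)
  timeout "${DEBUG_SESSION_MAX:-900}" kubectl debug -it "$pod" "$@"
}
```

A local log is only a start; for real audit trails, rely on the API server audit policy above, which can't be bypassed by skipping the wrapper.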
Quick Reference
# Basic debug
kubectl debug -it POD --image=IMAGE --target=CONTAINER
# Copy pod with different command
kubectl debug POD -it --copy-to=DEBUG-POD --container=CONTAINER -- COMMAND
# Debug node
kubectl debug node/NODE -it --image=IMAGE
# List ephemeral containers in a pod
kubectl get pod POD -o jsonpath='{.spec.ephemeralContainers[*].name}'
# See ephemeral container logs
kubectl logs POD -c EPHEMERAL-CONTAINER-NAME
Conclusion
Ephemeral containers changed how I debug production issues. No more:
- Adding debug tools to production images
- Waiting for rollouts to investigate issues
- SSH-ing to nodes and docker exec-ing into containers
- Losing the reproduction window while deploying debug versions
The next time something breaks in production, don’t rebuild - just kubectl debug.