
The Kubernetes ndots:5 Problem – Why DNS Lookups Take 15 Seconds


Your app is slow. Not CPU slow. Not memory slow. DNS slow.

You’ve deployed to Kubernetes, everything works, but external API calls that should take 50ms are taking 5-15 seconds. The culprit? A tiny setting called ndots:5 that’s been silently multiplying your DNS queries.

The Problem

By default, Kubernetes sets ndots:5 in every pod’s /etc/resolv.conf. This innocent-looking setting has massive performance implications.

Here’s what it looks like inside a pod:

$ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

What ndots Actually Does

The ndots setting tells the resolver: “If a hostname has fewer than N dots, try appending the search domains first.”

With ndots:5, when your app tries to resolve api.stripe.com (which has 2 dots), the resolver thinks it might be a relative name. So it tries:

  1. api.stripe.com.default.svc.cluster.local → NXDOMAIN
  2. api.stripe.com.svc.cluster.local → NXDOMAIN
  3. api.stripe.com.cluster.local → NXDOMAIN
  4. api.stripe.com → SUCCESS ✓
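The candidate order above can be sketched in a few lines of shell – a toy reproduction of the resolver's rule (not the real resolver), assuming the search list and ndots:5 from the resolv.conf shown earlier:

```shell
# Toy sketch: the resolver's candidate order for a 2-dot name under ndots:5,
# using the default pod search list.
name="api.stripe.com"
ndots=5
search="default.svc.cluster.local svc.cluster.local cluster.local"

dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)  # count the dots in the name
candidates=""
if [ "$dots" -lt "$ndots" ]; then
  # fewer dots than ndots: search domains are appended and tried first
  for d in $search; do
    candidates="$candidates$name.$d "
  done
fi
candidates="$candidates$name"  # the absolute name is tried last
echo "$candidates"
```

Running it prints the same four names, in the same order, as the list above.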

That’s 4 DNS queries instead of 1 – and since glibc sends both A and AAAA queries by default, it’s really 8 packets instead of 2. Each query might take 1-5ms locally, but factor in:

  • UDP packet loss and retries
  • CoreDNS under load
  • Upstream DNS latency
  • TCP fallback for truncated responses

Suddenly you’re looking at 100ms-15s of DNS overhead per external hostname.

Seeing It In Action

You can watch this happen with tcpdump:

# In one terminal, start capture
kubectl exec -it debug-pod -- tcpdump -n -i eth0 port 53

# In another, make a request
kubectl exec -it debug-pod -- curl https://api.stripe.com/v1/charges

You’ll see something like:

10:23:01.001 IP 10.1.2.3.45678 > 10.96.0.10.53: A? api.stripe.com.default.svc.cluster.local
10:23:01.003 IP 10.96.0.10.53 > 10.1.2.3.45678: NXDOMAIN
10:23:01.004 IP 10.1.2.3.45678 > 10.96.0.10.53: A? api.stripe.com.svc.cluster.local
10:23:01.006 IP 10.96.0.10.53 > 10.1.2.3.45678: NXDOMAIN
10:23:01.007 IP 10.1.2.3.45678 > 10.96.0.10.53: A? api.stripe.com.cluster.local
10:23:01.009 IP 10.96.0.10.53 > 10.1.2.3.45678: NXDOMAIN
10:23:01.010 IP 10.1.2.3.45678 > 10.96.0.10.53: A? api.stripe.com
10:23:01.015 IP 10.96.0.10.53 > 10.1.2.3.45678: A 54.187.174.169

Four queries for one hostname. Now multiply that by every external service your app calls.

The Fixes

Option 1: Use FQDNs (Quick Fix)

Add a trailing dot to force absolute lookups:

# In your app config
API_ENDPOINT: "api.stripe.com."  # Note the trailing dot

The trailing dot tells the resolver “this is a fully qualified domain name – don’t append search domains.”

Pros: Works immediately, no cluster changes.
Cons: You have to update every external hostname in your config, and some libraries and TLS certificate checks handle trailing dots badly.
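If you control how config values are loaded, a tiny helper can normalise endpoints to absolute form. This is a hypothetical sketch, not part of any standard tooling:

```shell
# Hypothetical helper: ensure a hostname ends in a trailing dot,
# so the resolver treats it as absolute and skips the search list.
to_fqdn() {
  case "$1" in
    *.) printf '%s\n' "$1" ;;   # already absolute, leave it alone
    *)  printf '%s.\n' "$1" ;;  # append the trailing dot
  esac
}

to_fqdn "api.stripe.com"    # prints: api.stripe.com.
to_fqdn "api.stripe.com."   # unchanged
```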

Option 2: Lower ndots (Recommended)

Set ndots:2 in your pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  containers:
    - name: app
      image: my-app:latest

With ndots:2, hostnames with 2+ dots (like api.stripe.com) are resolved directly. Internal service names (my-service.default) still work because they have fewer than 2 dots.
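As a sanity check on that cut-off, here's a quick dot-count of the names involved, mirroring the resolver's rule (a name with at least ndots dots is tried as-is first):

```shell
# Which names are resolved directly under ndots:2?
ndots=2
result=""
for name in api.stripe.com my-service my-service.default; do
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)  # dots in this name
  if [ "$dots" -ge "$ndots" ]; then
    result="$result$name:direct "    # tried as an absolute name first
  else
    result="$result$name:search "   # search domains tried first
  fi
done
echo "$result"  # api.stripe.com:direct my-service:search my-service.default:search
```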

For Deployments:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "2"
      containers:
        - name: app
          image: my-app:latest

Option 3: Adjust dnsPolicy

For pods that mostly talk to external services:

spec:
  dnsPolicy: "Default"  # Use node's DNS, not cluster DNS

Or keep cluster DNS but optimise:

spec:
  dnsPolicy: "ClusterFirst"
  dnsConfig:
    options:
      - name: ndots
        value: "1"
      - name: single-request-reopen
        value: ""

The single-request-reopen option works around a glibc resolver race where parallel A and AAAA queries sent from the same socket can interfere with each other. Note that musl-based images (such as Alpine) silently ignore this option.

Option 4: NodeLocal DNSCache (Cluster-Wide Fix)

For cluster-wide improvement, deploy NodeLocal DNSCache. The upstream manifest contains __PILLAR__ placeholders you need to substitute for your cluster (see the NodeLocal DNSCache docs), but the starting point is:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

This runs a DNS cache on every node, dramatically reducing:

  • CoreDNS load
  • Cross-node DNS traffic
  • Lookup latency

Queries hit the local cache first, and NXDOMAIN responses for search domain attempts are cached, making subsequent lookups fast.

The Nuclear Option: Reduce Search Domains

You can override the entire DNS config:

spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 10.96.0.10  # CoreDNS
    searches:
      - default.svc.cluster.local
    options:
      - name: ndots
        value: "2"

This removes svc.cluster.local and cluster.local from the search path. Only do this if you understand the implications – some internal lookups might break.

Debugging DNS Issues

Check Current Settings

kubectl exec -it <pod> -- cat /etc/resolv.conf

Measure DNS Latency

kubectl exec -it <pod> -- time nslookup api.stripe.com

Watch DNS Queries

kubectl exec -it <pod> -- tcpdump -n port 53

Check CoreDNS Logs

kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

CoreDNS Metrics

If you have Prometheus, check:

  • coredns_dns_requests_total – Total queries
  • coredns_dns_responses_total{rcode="NXDOMAIN"} – Failed lookups (the search domain noise)
  • coredns_dns_request_duration_seconds – Latency histogram

A high ratio of NXDOMAIN responses to successful responses indicates the ndots problem.
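That ratio can be expressed as a PromQL query – a starting point, assuming the stock CoreDNS metric names listed above:

```
# Fraction of DNS responses that were NXDOMAIN over the last 5 minutes.
sum(rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m]))
  /
sum(rate(coredns_dns_responses_total[5m]))
```

With default ndots:5 and mostly external traffic, expect this to sit near 0.75 (three NXDOMAINs per successful lookup).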

Why ndots:5?

You might wonder why Kubernetes chose 5 as the default.

It’s because partially qualified internal names can have up to 4 dots. A plain service lookup like my-service.my-namespace.svc has only 2, but SRV lookups take the form:

_https._tcp.my-service.my-namespace.svc
      1    2          3            4

Setting ndots:5 ensures that even these longer internal names get the search-domain treatment by default.

The assumption is that most lookups are internal. For many workloads, that’s wrong.

Our Standard Configuration

After dealing with this across multiple clusters, here’s our go-to configuration:

# deployment.yaml
spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "2"
          - name: single-request-reopen
            value: ""
      containers:
        - name: app
          # ...

Combined with NodeLocal DNSCache on every cluster.

This gives us:

  • Fast external lookups (direct resolution)
  • Working internal lookups (search domains for short names)
  • Cached NXDOMAIN responses (fast subsequent lookups)
  • Reduced CoreDNS load

Summary

Setting             External Queries   Internal Works   Effort
Default (ndots:5)   4 per hostname     Yes              None
Trailing dot        1 per hostname     Yes              Config changes
ndots:2             1 per hostname     Yes              Pod spec change
NodeLocal DNS       1 (cached)         Yes              Cluster addon

The fix is simple. The debugging isn’t. If your app is slow and you’ve ruled out the usual suspects, check your DNS. That ndots:5 might be silently killing your latency budget.


Further reading: The Kubernetes DNS specification and CoreDNS documentation cover more edge cases.
