feat: Node-local DNS cache support #550

Merged

Conversation

@mumoshu (Contributor) commented Feb 18, 2019

TL;DR: This is the smallest change needed to allow enabling the node-local DNS cache on eksctl-created nodes.

What

Add a new field named `clusterDNS` that accepts the IP address of the DNS server used for all internal/external DNS lookups, i.e. the `--cluster-dns` flag of `kubelet`.

```yaml
nodeGroups:
- name: nodegroup1
  clusterDNS: 169.254.20.10
  # snip
```

This, in combination with `k8s-dns-node-cache` deployed as a daemonset on your cluster, allows all DNS lookups from your pods to be routed first to the node-local DNS server, which improves reliability.
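
For reference, here is a minimal sketch (my illustration, not part of this PR) of the kubelet-level setting that the new field maps onto; eksctl passes the value through the `--cluster-dns` kubelet flag, which is equivalent to the following `KubeletConfiguration` snippet:

```yaml
# Illustration only: the kubelet-level equivalent of nodeGroups[].clusterDNS.
# eksctl itself sets this via the --cluster-dns flag rather than a config file.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 169.254.20.10
```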

Notes

The configuration key `clusterDNS` is intentionally made per-nodegroup, not per-cluster, so that you can adopt the node-local DNS selectively. In combination with proper use of node labels/taints, this allows you to test the node-local DNS on only a subset of your workload.
It would also be nice to add `clusterDNS` as a cluster-level config key later, but I believe that isn't a must-have for this change.
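
As a sketch of that selective rollout (the nodegroup names and the `dns-cache` label below are hypothetical, not part of this PR), you could enable the node-local cache on a single canary nodegroup and steer test workloads onto it with a `nodeSelector`:

```yaml
nodeGroups:
- name: dns-cache-canary       # hypothetical nodegroup opting in to the node-local cache
  clusterDNS: 169.254.20.10
  labels:
    dns-cache: enabled         # target this with a nodeSelector on test workloads
- name: default                # the rest of the workload keeps the default cluster DNS
  # no clusterDNS set
```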

Usage

See [cluster/addons/dns/nodelocaldns](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/nodelocaldns) in the upstream repository for more details.

Concrete steps to enable node-local DNS look like the following:

- Decide which IP address to bind the node-local DNS to. Typically this is `169.254.20.10`.
- Add `clusterDNS: 169.254.20.10` to your nodegroup in the cluster config.
- Deploy [nodelocaldns.yaml](https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml), replacing `__PILLAR__LOCAL__DNS__` with `169.254.169.254`, `__PILLAR__DNS__DOMAIN__` with `cluster.local`, and `__PILLAR__DNS__SERVER__` with [`10.100.0.10` or `172.20.0.10`](https://github.com/weaveworks/eksctl/blob/master/pkg/nodebootstrap/userdata.go#L87-L94) according to your VPC CIDR.

Resolves #542

@mumoshu mumoshu force-pushed the node-local-dns-cache-support branch 2 times, most recently from 5eeb11d to cd9341f Compare February 18, 2019 07:05
@mumoshu mumoshu force-pushed the node-local-dns-cache-support branch from cd9341f to 58c4446 Compare February 18, 2019 07:06
@mumoshu mumoshu changed the title feat: Node-local DNS cache support wip: feat: Node-local DNS cache support Feb 18, 2019
@mumoshu (Contributor, Author) commented Feb 18, 2019

For anyone interested, this is how I've verified this to work.

1. Add `clusterDNS: 169.254.20.10` to your nodegroup:

```yaml
nodeGroups:
  - name: nodegroup1
    clusterDNS: "169.254.20.10"
    instanceType: m4.large
    # and whatever you like
```
2. After bringing up your cluster, `kubectl run` a pod and try resolving any host name from it. I used an Ubuntu container and an `apt-get update`, but whatever works.

This should fail: kubelet points the cluster DNS at 169.254.20.10 as we configured in cluster.yaml, but nothing binds that address yet.

3. Deploy node-local-dns. After the deployment, any DNS lookup against 169.254.20.10 should work, as node-local-dns now binds that address:

```console
$ curl -L https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml | sed 's/__PILLAR__DNS__DOMAIN__/cluster.local/g' | sed 's/__PILLAR__LOCAL__DNS__/169.254.20.10/g' | sed 's/__PILLAR__DNS__SERVER__/172.20.0.10/g' > node-local-dns.yaml

$ k apply -f node-local-dns.yaml
```
4. Repeat step 2, but this time expect a successful DNS lookup:

```console
$ kru
If you don't see a command prompt, try pressing enter.
root@xenial-1550475807:/#
root@xenial-1550475807:/# apt-get update -y
Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:2 http://security.ubuntu.com/ubuntu xenial-security InRelease [109 kB]
Get:3 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [785 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:6 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
*snip*
```

@mumoshu mumoshu changed the title wip: feat: Node-local DNS cache support feat: Node-local DNS cache support Feb 18, 2019
@errordeveloper errordeveloper merged commit fe7d351 into eksctl-io:master Feb 20, 2019
@errordeveloper (Contributor) commented:

Thanks @mumoshu, also thanks for keeping this small.

```go
func clusterDNS(spec *api.ClusterConfig, ng *api.NodeGroup) string {
	if ng.ClusterDNS != "" {
		return ng.ClusterDNS
	}
	// Default service network is 10.100.0.0, but it gets set to 172.20.0.0 automatically
	// when the pod network is anywhere within 10.0.0.0/8.
	if spec.VPC.CIDR != nil && spec.VPC.CIDR.IP[0] == 10 {
		return "172.20.0.10"
	}
	return "10.100.0.10"
}
```
As a future improvement, we should probably move this into the struct-defaulting code path and set the field to its default value there, so that the struct fully represents what's going on.

@rsyvarth commented:

Just in case anyone else is implementing DNS caching using the instructions here, there is a typo in mumoshu's instructions: replace `__PILLAR__LOCAL__DNS__` with `169.254.20.10`, not `169.254.169.254`.

@mumoshu (Contributor, Author) commented Feb 21, 2019

@rsyvarth Oh! Good catch! Seems like I was too used to typing 169.254.169.254.

@mumoshu mumoshu deleted the node-local-dns-cache-support branch February 27, 2019 13:10
@mumoshu (Contributor, Author) commented Feb 27, 2019

For anyone interested in this feature: an alternative way to use the node-local DNS is to specify `dnsPolicy: None` and `dnsConfig` in your pod spec.

For example, this:

```yaml
  dnsConfig:
    nameservers:
    - 169.254.20.10
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    - ap-northeast-1.compute.internal
    - us-west-2.compute.internal
    options:
    - name: attempts
      value: "3"
    - name: timeout
      value: "1"
    - name: rotate
  dnsPolicy: None
```

would result in pods having /etc/resolv.conf whose content is:

```console
$ cat /etc/resolv.conf
nameserver 169.254.20.10
search default.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal us-west-2.compute.internal
options attempts:3 timeout:1 rotate
```

whereas the default one, used when I omit `dnsConfig`, in my region (ap-northeast-1) is:

```console
$ cat /etc/resolv.conf
nameserver 172.20.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal us-west-2.compute.internal
options ndots:5
```

The benefit of `dnsConfig` over kubelet's `--cluster-dns` flag is that you can migrate to the node-local DNS cache gradually.

You can even add the default cluster DNS as the secondary DNS server:

```yaml
  dnsConfig:
    nameservers:
    - 169.254.20.10
    - 172.20.0.10
```

This way, even while the node-local DNS is down due to a rolling update or a transient failure, you are unlikely to notice the downtime, because your DNS client (e.g. glibc's resolver) is likely to handle the failure with retries and/or parallel queries (I believe it depends on the DNS client you rely on).
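
Putting those pieces together, here is a minimal pod-spec sketch (my own illustration; the pod name and image are arbitrary) that prefers the node-local cache and falls back to the default cluster DNS:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-test               # arbitrary example name
spec:
  containers:
  - name: ubuntu
    image: ubuntu:16.04
    command: ["sleep", "infinity"]
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 169.254.20.10            # node-local DNS cache
    - 172.20.0.10              # default cluster DNS service IP (depends on your VPC CIDR)
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
```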

@StevenACoffman (Contributor) commented Apr 24, 2019

@mumoshu

Is it possible to add the default cluster DNS as the secondary DNS server for the node this way:

```yaml
nodeGroups:
  - name: nodegroup1
    clusterDNS: "169.254.20.10,172.20.0.10"
    instanceType: m4.large
    # and whatever you like
```

Since kubelet options are:

| Kubelet option | Description |
| --- | --- |
| `--cluster-dns` stringSlice | Comma-separated list of DNS server IP address. |

> This way, even while the node-local DNS is down due to a rolling update or a transient failure, you are unlikely to notice the downtime, because your DNS client (e.g. glibc's resolver) is likely to handle the failure with retries and/or parallel queries (I believe it depends on the DNS client you rely on).

@StevenACoffman (Contributor) commented:

Seems like it works! Hurrah!

@StevenACoffman (Contributor) commented Apr 25, 2019

The KEP warns against this for some reason:

> Populating both the nodelocal cache ip address and kube-dns ip address in resolv.conf is not a reliable option. Depending on underlying implementation, this can result in kube-dns being queried only if cache ip does not respond, or both queried simultaneously.

If musl queries the nodelocal cache via TCP and kube-dns via UDP in parallel, and one responds quickly and the other times out, why is that a problem?

@mumoshu (Contributor, Author) commented May 17, 2019

@StevenACoffman I guess you'll be interested in this line in the updated KEP: https://github.com/kubernetes/enhancements/pull/1005/files#diff-a43ddcc01ee886cc9ca7c60a0900e436R166

The problem:

> so if we use both kube-dns IP as well as the link-local IP used by NodeLocal DNSCache, we could make the DNS query explosion problem worse. More queries means more conntrack entries and more DNATs.

The updated KEP also notes:

> This workaround could be viable for client implementations that do round-robin.

But when I experimented, I saw that on recent distros the resolver queries the servers in parallel.

So my best bet for something that would work TODAY is to run two nodelocal-dns daemonsets, each listening on a different virtual IP. Every application pod that wants to leverage the H/A nodelocal-dns cache must then have its pod spec updated to include a `dnsConfig` section with the two virtual IPs in it (sketched below).
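
A rough sketch of that pod-level `dnsConfig` follows; the second address, `169.254.20.11`, is an assumed example for the second daemonset, not something from this thread:

```yaml
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 169.254.20.10            # first nodelocal-dns daemonset
    - 169.254.20.11            # second nodelocal-dns daemonset (assumed address)
```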

@mumoshu (Contributor, Author) commented May 17, 2019

I have one more alternative: just run two nodelocaldns daemonsets, both listening on the same virtual IP address and port. Surveying GitHub issues and the CoreDNS code, it turns out that CoreDNS sets SO_REUSEPORT where available.

So two or more nodelocal-dns pods sharing the same IP:port should just work. This allows you to use H/A nodelocaldns behind a single IP address for your cluster DNS (set via the kubelet option), which is used automatically when `dnsConfig` is absent from your pod specs.

@StevenACoffman (Contributor) commented:

Thank you! That last alternative is very interesting.
By the way, in order to successfully start a fresh cluster with `clusterDNS: "169.254.20.10"` set from the start:

1. `eksctl create cluster --config-file=$EKSCONFIGFILE --without-nodegroup`
2. Apply nodelocaldns and any aws-node fixes using kubectl
3. `eksctl create nodegroup --config-file=$EKSCONFIGFILE`

If the nodegroups are created before nodelocaldns is applied, they will not come up successfully.
