
Slow DNS resolution failures in alpine:latest attached to a non-default bridge network #313

Closed
tagirb opened this issue Jul 24, 2017 · 5 comments

@tagirb

tagirb commented Jul 24, 2017

When running an alpine:latest container attached to a non-default bridge network, which means the Docker embedded DNS server is enabled, I observe that DNS resolution failures take a very long time to complete.

Steps to reproduce:

  1. Create a new bridged network:
$ docker network create --driver bridge alpine_test
$ docker network inspect alpine_test 
[
    {
        "Name": "alpine_test",
        "Id": "a1bf8d14aa4b6918a6810f93f46dcf953d91a31532544d7eac89760cccfbcdda",
        "Created": "2017-07-24T10:36:56.348996363+02:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.20.0.0/16",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]
  2. Run an alpine:latest container attached to this network:
$ docker run -it --network alpine_test alpine
/ #
  3. Try resolving an unknown hostname and compare with known hostnames:
/ # time getent hosts unknown_host
Command exited with non-zero status 2
real    0m 10.00s
user    0m 0.00s
sys     0m 0.00s
/ # time getent hosts google.com
2a00:1450:4001:820::200e  google.com  google.com
real    0m 0.05s
user    0m 0.00s
sys     0m 0.00s
/ # time getent hosts $(hostname)
172.20.0.2        846b9c66c58f  846b9c66c58f
real    0m 0.00s
user    0m 0.00s
sys     0m 0.00s

Running strace on this command shows multiple DNS requests and SERVFAIL replies during the lookup (see the attached file strace.txt).
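For anyone who wants to reproduce the trace, something along these lines should work (a rough sketch: strace is added from the Alpine repos, and the container needs the SYS_PTRACE capability for ptrace to work):

$ docker run -it --rm --network alpine_test --cap-add SYS_PTRACE alpine sh
/ # apk add --no-cache strace
/ # strace -f -e trace=network getent hosts unknown_host

The -e trace=network filter keeps the output down to the socket/sendto/recvfrom calls, which is where the repeated queries and SERVFAIL replies show up.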

Software versions:

$ docker --version
Docker version 17.05.0-ce, build 89658be
$ uname  -rv
4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017
$ cat /etc/os-release 
NAME="Linux Mint"
VERSION="18.2 (Sonya)"
ID=linuxmint
ID_LIKE=ubuntu
PRETTY_NAME="Linux Mint 18.2"
VERSION_ID="18.2"
HOME_URL="http://www.linuxmint.com/"
SUPPORT_URL="http://forums.linuxmint.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/linuxmint/"
VERSION_CODENAME=sonya
UBUNTU_CODENAME=xenial

This does not happen if the default bridge network is used, where /etc/resolv.conf points to 8.8.8.8 and 8.8.4.4.
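The difference is also visible directly in the containers' /etc/resolv.conf (illustrative output; 127.0.0.11 is the address of Docker's embedded DNS server on user-defined networks, while the default-bridge file simply mirrors the host configuration):

$ docker run --rm alpine cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
$ docker run --rm --network alpine_test alpine cat /etc/resolv.conf
nameserver 127.0.0.11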

@tagirb
Author

tagirb commented Jul 24, 2017

Further observations:

  • Running centos:latest also results in multiple DNS requests; however, these are answered much faster, within about 0.1 s on my system.
  • The same happens when using the --network host Docker option.
  • The centos:latest and ubuntu:latest images do not have this issue.
  • tcpdump also shows some rather strange queries containing apparently random character sequences during container startup and shutdown:
13:52:07.761760 IP 127.0.0.1.32151 > 127.0.1.1.53: 50468+ A? nyqtvpnmnuzwvy. (32)
13:52:07.762060 IP 127.0.0.1.56184 > 127.0.1.1.53: 57235+ A? yuoyizvh. (26)
13:52:07.762296 IP 127.0.1.1.53 > 127.0.0.1.32151: 50468 ServFail 0/0/0 (32)
13:52:07.762399 IP 127.0.0.1.13377 > 127.0.1.1.53: 31969+ A? lbinbfgjcfssof. (32)
13:52:07.762533 IP 127.0.0.1.52391 > 127.0.1.1.53: 5419+ A? nyqtvpnmnuzwvy. (32)
13:52:07.762537 IP 127.0.1.1.53 > 127.0.0.1.56184: 57235 ServFail 0/0/0 (26)
13:52:07.762607 IP 127.0.0.1.39877 > 127.0.1.1.53: 56596+ A? yuoyizvh. (26)
13:52:07.762963 IP 127.0.1.1.53 > 127.0.0.1.13377: 31969 ServFail 0/0/0 (32)
13:52:07.763021 IP 127.0.1.1.53 > 127.0.0.1.52391: 5419 ServFail 0/0/0 (32)
13:52:07.763045 IP 127.0.1.1.53 > 127.0.0.1.39877: 56596 ServFail 0/0/0 (26)
13:52:07.763141 IP 127.0.0.1.37716 > 127.0.1.1.53: 61231+ A? lbinbfgjcfssof. (32)
13:52:07.763547 IP 127.0.0.1.53119 > 127.0.1.1.53: 33293+ A? nyqtvpnmnuzwvy. (32)
13:52:07.763657 IP 127.0.1.1.53 > 127.0.0.1.37716: 61231 ServFail 0/0/0 (32)
13:52:07.763671 IP 127.0.0.1.52634 > 127.0.1.1.53: 14470+ A? yuoyizvh. (26)
13:52:07.763994 IP 127.0.0.1.53713 > 127.0.1.1.53: 5245+ A? lbinbfgjcfssof. (32)
13:52:07.764196 IP 127.0.1.1.53 > 127.0.0.1.53119: 33293 ServFail 0/0/0 (32)
13:52:07.764245 IP 127.0.1.1.53 > 127.0.0.1.52634: 14470 ServFail 0/0/0 (26)
13:52:07.764341 IP 127.0.0.1.43291 > 127.0.1.1.53: 33293+ A? nyqtvpnmnuzwvy. (32)
13:52:07.764381 IP 127.0.0.1.60569 > 127.0.1.1.53: 14470+ A? yuoyizvh. (26)
13:52:07.764494 IP 127.0.1.1.53 > 127.0.0.1.53713: 5245 ServFail 0/0/0 (32)
13:52:07.764632 IP 127.0.0.1.59036 > 127.0.1.1.53: 5245+ A? lbinbfgjcfssof. (32)
13:52:07.764934 IP 127.0.1.1.53 > 127.0.0.1.43291: 33293 ServFail 0/0/0 (32)
13:52:07.764981 IP 127.0.1.1.53 > 127.0.0.1.60569: 14470 ServFail 0/0/0 (26)
13:52:07.765089 IP 127.0.1.1.53 > 127.0.0.1.59036: 5245 ServFail 0/0/0 (32)

Looks like musl's DNS resolver is going crazy for some reason.
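(For reference, the capture above was taken on the host loopback; a command along the lines of the following should show the same traffic, with 127.0.1.1 being the local DNS stub on this machine:)

$ sudo tcpdump -ni lo udp port 53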

@thomaspeitz

thomaspeitz commented Aug 2, 2017

We are observing the same issue, running Kubernetes 1.5/1.6/1.7 with Calico or Weave; all of them show the problem with Alpine images (Alpine 3.4/3.5/3.6). Switching to another distro image such as Debian solves it for us.
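As a quick sanity check that it is the image rather than the network, the getent timing from the original report can be run against both images on the same user-defined bridge (a rough sketch using the alpine_test network from above; the Debian base image ships getent and bash):

$ docker run --rm --network alpine_test alpine sh -c 'time getent hosts unknown_host'
$ docker run --rm --network alpine_test debian bash -c 'time getent hosts unknown_host'

On an affected setup the Alpine run takes around 10 s while the Debian run fails quickly.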

@tagirb
Author

tagirb commented May 15, 2018

Seems to have resolved itself; it is no longer happening with the latest Alpine. Thanks anyway :)

@riturajb

riturajb commented Jun 8, 2018

Facing the same issue even on Alpine 3.7

@inter169

inter169 commented Aug 2, 2018

Alpine Linux uses musl libc for DNS resolution, which by default sends the A and AAAA queries concurrently.
In my case there was no need to query AAAA by default, so I coded a fix that drops the default AAAA query (for the AF_UNSPEC family) and issues only the A query; that worked for me.
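The two parallel queries are easy to see from inside the container; a rough sketch (tcpdump added from the Alpine repos; 127.0.0.11 is Docker's embedded DNS on a user-defined network):

/ # apk add --no-cache tcpdump
/ # tcpdump -ni any host 127.0.0.11 &
/ # getent hosts unknown_host

One A and one AAAA query for unknown_host should appear in the capture almost simultaneously, which is the behaviour the fix changes for AF_UNSPEC lookups.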

I posted the solution here kubernetes/kubernetes#56903 (comment)

Hope it helps.

Thanks,
harper
