
DNS does not work in containers if host uses local server #3277

Closed
nertpinx opened this issue Jun 7, 2019 · 32 comments · Fixed by #3305
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. rootless

Comments

@nertpinx

nertpinx commented Jun 7, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Steps to reproduce the issue:

  1. Be in a network that prohibits external DNS queries, disable external DNS communication, or just use a hostname in step 3 that is only resolvable locally.

  2. Set up a local DNS server/forwarder (e.g. systemd-resolved) so that a local address is in /etc/resolv.conf

  3. Start any container (without --network host) and try to resolve a hostname (e.g. podman run --rm -it fedora curl -v ifconfig.me)

Describe the results you received:
curl: (6) Could not resolve host: ifconfig.me

Describe the results you expected:
No error (some IP address)

Additional information you deem important (e.g. issue happens only occasionally):
The contents of /etc/resolv.conf are:

search virt
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 2001:4860:4860::8888
nameserver 2001:4860:4860::8844
nameserver 10.0.2.3
options edns0

This would normally work (although I might not want my DNS requests sent elsewhere, since I might have services that are only resolvable on the local network), but I am in a network that prohibits external DNS queries, so it doesn't.

If I leave just the slirp4netns nameserver there (echo nameserver 10.0.2.3 >/etc/resolv.conf) it works in a VM where I am trying to reproduce this issue. However, on my original host, where I discovered this, 10.0.2.3 is still inaccessible (even though the version and the command line of slirp4netns are identical, apart from the PID argument).

Output of podman version:

Version:            1.3.1
RemoteAPI Version:  1
Go Version:         go1.12.2
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.2
  podman version: 1.3.1
host:
  BuildahVersion: 1.8.2
  Conmon:
    package: podman-1.3.1-1.git7210727.fc30.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.12.0-dev, commit: c9a4c48d1bff85033b7fc9b62d25961dd5048689'
  Distribution:
    distribution: fedora
    version: "30"
  MemFree: 2884521984
  MemTotal: 4133556224
  OCIRuntime:
    package: runc-1.0.0-93.dev.gitb9b6cc6.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: e3b4c1108f7d1bf0d09ab612ea09927d9b59b4e3
      spec: 1.0.1-dev
  SwapFree: 644870144
  SwapTotal: 644870144
  arch: amd64
  cpus: 4
  hostname: fedora30.virt
  kernel: 5.0.9-301.fc30.x86_64
  os: linux
  rootless: true
  uptime: 19m 40.53s
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/nert/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /home/nert/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 1
  RunRoot: /tmp/1000
  VolumePath: /home/nert/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):
I am trying this in a Fedora 30 VM, clean install, as that is the easiest and cleanest reproducer I can get. I cannot reproduce the issue specific to my local environment there.

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 7, 2019
@nertpinx

nertpinx commented Jun 7, 2019

@giuseppe This is the issue we were talking about, I hope it has all the information that is related, feel free to ask for any additional info, I will gladly provide it.

@mheon

mheon commented Jun 7, 2019

For root containers, if you have 127.0.0.1 in resolv.conf, we remove it (and add default nameservers if there are none remaining). We can't expect to be able to connect to a DNS server running on the host's localhost address; it might not be listening on the bridge we created (this last bit isn't really relevant for rootless containers, so it could be safe there?).
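A rough sketch of that behaviour in Python (illustrative only, not Podman's actual Go code; the fallback list here is an assumption based on the Google servers seen in the reporter's container):

```python
import ipaddress

# Hypothetical fallback list; the real defaults are whatever Podman ships.
DEFAULT_NAMESERVERS = ["8.8.8.8", "8.8.4.4"]

def filter_nameservers(host_nameservers):
    """Drop loopback nameservers copied from the host's resolv.conf;
    if nothing usable remains, fall back to the defaults."""
    kept = [ns for ns in host_nameservers
            if not ipaddress.ip_address(ns).is_loopback]
    return kept or DEFAULT_NAMESERVERS

print(filter_nameservers(["127.0.0.53"]))            # ['8.8.8.8', '8.8.4.4']
print(filter_nameservers(["127.0.0.1", "1.1.1.1"]))  # ['1.1.1.1']
```

This is exactly the step that discards a host-local stub resolver such as systemd-resolved's 127.0.0.53.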

@nertpinx

@mheon Well, clearly slirp4netns, as a userspace process running in the default network namespace, should be able to connect to the same nameservers as any other process in the default net namespace, so I see no reason for having Google DNS servers in /etc/resolv.conf. I just don't know whether that is done by slirp4netns or podman.

@rhatdan

rhatdan commented Jun 10, 2019

That would most certainly be podman doing the resolv.conf.

@nertpinx

That would most certainly be podman doing the resolv.conf.

Great! So did I miss any reasoning behind providing the default Google nameservers with slirp4netns instead of just using 10.0.2.3?

@rhatdan

rhatdan commented Jun 10, 2019

@giuseppe Any ideas?

@mheon

mheon commented Jun 10, 2019

@nertpinx So, to clarify - what does your host system's resolv.conf look like? 127.0.0.1, then 10.0.2.3? And we're dropping 127.0.0.1 in favor of the Google DNS servers, despite 10.0.2.3 being in resolv.conf already?

@nertpinx

No, 10.0.2.3 is slirp4netns's built-in DNS, provided on the emulated network stack, and that one works. On my machine resolv.conf has only ::1; on the clean Fedora install with systemd-resolved properly applied it has only 127.0.0.53.

@mheon

mheon commented Jun 10, 2019

Aha. Alright, I think we're probably seeing a bad interaction between our resolv.conf handling and the handling in slirp4netns, then.

@giuseppe

I think the issue is that we block slirp4netns from accessing the loopback device for security reasons. If you check, you'll see that we are passing an explicit --disable-host-loopback to the slirp4netns process.

@nertpinx

I would guess that forbids accessing the host directly (through 10.0.2.2) from the child namespace, not from the slirp4netns process itself. And trying it out, that really is the case: if I leave only 10.0.2.3 in resolv.conf it works nicely (on the Fedora reproducer VM).

@mheon mheon added the rootless label Jun 11, 2019
giuseppe added a commit to giuseppe/libpod that referenced this issue Jun 12, 2019
When using slirp4netns, be sure the built-in DNS server is the first
one to be used.

Closes: containers#3277

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe

We could probably just drop all the other DNS servers and use only 10.0.2.3, but let's be safe and keep them around. I've changed it so that 10.0.2.3 is now first in the list:

#3305

@nertpinx, does it work if you place it as the first entry in the /etc/resolv.conf file?
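The reordering can be sketched like this (illustrative Python, not the actual code from the PR):

```python
# 10.0.2.3 is the built-in DNS forwarder address slirp4netns provides.
SLIRP_DNS = "10.0.2.3"

def order_nameservers(nameservers):
    """Put the slirp4netns DNS first, keeping the rest as fallbacks."""
    rest = [ns for ns in nameservers if ns != SLIRP_DNS]
    return [SLIRP_DNS] + rest

print(order_nameservers(["8.8.8.8", "8.8.4.4", "10.0.2.3"]))
# ['10.0.2.3', '8.8.8.8', '8.8.4.4']
```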

@nertpinx

@giuseppe Yes it does (at least the main issue), thank you!

@intelfx

intelfx commented Mar 12, 2020

@giuseppe

does it work if you place it as the first entry in the /etc/resolv.conf file?

This does not always work. The nameserver is chosen at random (or at least it is not defined how it is chosen), so chances are that a generic nameserver will be chosen instead of 10.0.2.3.

This has just hit me. Host resolv.conf:

$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0
search bu1.loc

Container resolv.conf:

/ # cat /etc/resolv.conf
search bu1.loc
nameserver 10.0.2.3
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 2001:4860:4860::8888
nameserver 2001:4860:4860::8844
options edns0

Host is connected to a corporate VPN which provides a custom nameserver that resolves several internal-only domains. The "try all nameservers" semantics are provided by systemd-resolved on the host side. However, connections to internal domains from the container can (and do) arbitrarily fail. Manually changing the container's resolv.conf to include only 10.0.2.3 resolves this issue.

we could probably just drop all the other DNS servers and use only 10.0.2.3

As far as I can see, this would be the fully correct behavior. Please reconsider.

@mheon

mheon commented Mar 12, 2020

Chosen at random? What? That does not sound correct. Per the resolv.conf manpage:

If there are multiple servers, the resolver library queries them in the order listed.

Queried in order should be a safe assumption. How are you making DNS queries / what libc are you using?

@mheon

mheon commented Mar 12, 2020

Ah, I see. systemd-resolved has apparently decided that the rules for everyone else don't apply to it, and is doing its own thing. So that's fun.

@intelfx

intelfx commented Mar 12, 2020

@mheon

How are you making DNS queries / what libc are you using?

curl in an alpine container (so, musl).

Ah, I see. systemd-resolved has apparently decided that the rules for everyone else don't apply to it, and is doing its own thing.

No, systemd-resolved is on the host, not in the container, and it's actually doing the right thing here (as tempting as it is to blame systemd for everything).
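For context: musl's resolver is commonly reported to query all configured nameservers in parallel and take the first reply, rather than trying them strictly in order like glibc. A toy simulation of why that breaks split-horizon setups (the delays and answers are made up; this is not musl's actual implementation):

```python
import concurrent.futures
import time

# Made-up lookup table: the slirp4netns forwarder (slower round-trip, but it
# knows the internal name via the host) vs. a public server (fast, NXDOMAIN).
FAKE_SERVERS = {
    "10.0.2.3": {"delay": 0.2,  "answer": "10.20.30.40"},
    "8.8.8.8":  {"delay": 0.01, "answer": None},  # None = NXDOMAIN
}

def query(server):
    # Simulate one query round-trip to `server`.
    time.sleep(FAKE_SERVERS[server]["delay"])
    return server, FAKE_SERVERS[server]["answer"]

def race_resolve(servers):
    # musl-style: send to every nameserver at once and take the first reply,
    # regardless of the server's position in resolv.conf.
    with concurrent.futures.ThreadPoolExecutor(len(servers)) as pool:
        futures = [pool.submit(query, s) for s in servers]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

print(race_resolve(["10.0.2.3", "8.8.8.8"]))
# with these delays the public server's negative answer wins: ('8.8.8.8', None)
```

Whether the positive or the negative answer wins then depends on network timing, which matches the "arbitrarily fail" behaviour reported above.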

@rhatdan

rhatdan commented Mar 12, 2020

If the resolv.conf is properly formatted, there is little podman can do to fix that.
Is the issue that the processes inside the container cannot reach the nameserver?

@intelfx

intelfx commented Mar 14, 2020

@rhatdan

If the resolv.conf is properly formatted, there is little podman can do to fix that.

I'm not quite sure what you mean by that, but earlier in this thread @giuseppe reordered the nameservers in the generated resolv.conf, which means my suggestion is also possible.

Is the issue that the processes inside the container cannot reach the nameserver?

No, the issue is that the processes inside the container reach the wrong nameserver: instead of contacting systemd-resolved on the host, they try to contact the upstream nameservers directly.

@giuseppe

If you'd like to have only 10.0.2.3, you can force it with --dns 10.0.2.3

@bulhoes

bulhoes commented Nov 6, 2020

I had the same issue on my setup, but I was able to overcome it with a simple fix.
The issue might be related to the ACL on the named server.
Can you please let me know if you have the container network allowed in the named ACL?
That might be the issue.

@fbezdeka

fbezdeka commented Dec 4, 2020

A /etc/resolv.conf generated by systemd-resolved looks like this:

nameserver 127.0.0.53
options edns0
search some.dom

As a result, podman seems to remove the nameserver line and add the "upstream" DNS servers directly.
Bypassing systemd-resolved on the host may work in some scenarios, but it breaks others.
Consider a corporate VPN connection where "upstream" is not well defined. (We have at least two "upstreams" when connected.)

Bypassing the host's systemd-resolved has at least the following problems:

  • Upstream/Internet DNS servers do not know about corporate specific DNS entries
  • Corporate DNS servers may deliver different addresses than public DNS servers
  • Corporate DNS servers may not deliver results for public stuff
  • Corporate DNS servers may change, so fixed IP addresses cannot be used

So whichever DNS server I choose by setting --dns=<ip>, I will never get the same results as talking to the systemd-resolved instance running on the host.

How to fix that?
Well, I guess it's not possible at all. podman would have to replace nameserver 127.0.0.53 with something that is forwarded to or hosted on the host. But systemd-resolved listens on the loopback interface only and does not (AFAIK) allow that to be changed or configured.

[Edit]
The combination of systemd-resolved and podman is the default for Fedora users.
So privileged containers are quite unusable when corporate VPNs are in the game.
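One possible escape hatch, for what it's worth: newer systemd versions (v247 and later, if I recall correctly) added a DNSStubListenerExtra= option that makes the stub resolver listen on an additional, non-loopback address. If that address is reachable from the container network, it could be handed to containers via --dns. A sketch (the address is purely illustrative and must be one the host actually owns):

```ini
# /etc/systemd/resolved.conf.d/stub-extra.conf (illustrative drop-in)
[Resolve]
DNSStubListenerExtra=192.0.2.10
```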

@rhatdan

rhatdan commented Dec 7, 2020

Docker has the same issue.

Although I think some new features have been added to allow users to share the host's localhost network, at least in rootless mode.

@mheon @giuseppe WDYT?

@mheon

mheon commented Dec 7, 2020

We do have a (limited) ability to do our own DNS via dnsname; maybe using that as a forwarder to the systemd-resolved server would be sufficient?

@rhatdan

rhatdan commented Dec 7, 2020

That would work, @baude WDYT

@SISheogorath

I would love to see a DNS proxy, since my upstream DNS servers are DoT-only. Currently I pass --dns 1.1.1.1 to work around the problem, but it's not correct to assume that systemd-resolved's configured name servers are Do53 servers.

@lfarkas

lfarkas commented Feb 24, 2021

So on Fedora (the main development platform for the podman people) and also on RHEL 8, systemd-resolved is the default. So a properly configured systemd-resolved on the host can't be used as a name server for containers run by root!? And the only solution is to use --net host!

Why did you close this issue???

@patrickbkr

@rhatdan Can this issue be reopened?

From what I read above, podman is still unreliable when there are non-trivial DNS setups on the host.

@rhatdan

rhatdan commented Jun 1, 2021

Please open a new issue, describing your exact issues.

@brianjmurrell

So what is the new (still open, since I still see this problem) issue that covers a localhost (127.0.0.1) DNS resolver and rootful containers?

@mheon

mheon commented Dec 8, 2022

There is none, to the best of my knowledge, though I suspect this will be supported from Podman 4.4 onwards via the DNS changes in Aardvark.

@jiridanek

jiridanek commented Jan 11, 2023

Please open a new issue, describing your exact issues.

@patrickbkr, @brianjmurrell reported as #17075

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 4, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023