fedora-toolbox:39 has a dangling /etc/resolv.conf symlink on hosts without systemd-resolved(8) #1410

Closed
ianw opened this issue Nov 22, 2023 · 15 comments
Labels
1. Bug Something isn't working

Comments

@ianw

ianw commented Nov 22, 2023

The 39 (and 40) tags appear to have /etc/resolv.conf in the container as a dangling symlink to ../run/systemd/resolve/stub-resolv.conf:

# in 39 toolbox
$ podman run --dns=none -it registry.fedoraproject.org/fedora-toolbox:39 /bin/bash -c "ls -l /etc/resolv.conf; rpm -qa | grep systemd-resolv"
lrwxrwxrwx. 1 root root 39 Nov  7 07:55 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
systemd-resolved-254.5-2.fc39.x86_64

# not in 38 toolbox
$ podman run --dns=none -it registry.fedoraproject.org/fedora-toolbox:38 /bin/bash -c "ls -l /etc/resolv.conf; rpm -qa | grep systemd-resolv"
-rw-r--r--. 1 root root 21 Oct  6 06:48 /etc/resolv.conf

# still in 40 toolbox
$ podman run --dns=none -it registry.fedoraproject.org/fedora-toolbox:40 /bin/bash -c "ls -l /etc/resolv.conf; rpm -qa | grep systemd-resolv"
lrwxrwxrwx. 1 root root 39 Nov 20 07:34 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
systemd-resolved-254.5-2.fc40.x86_64

This breaks name resolution on toolbox entry for people on systems like RHEL 9 that are not using systemd-resolved.

@ianw ianw added the 1. Bug Something isn't working label Nov 22, 2023
@ianw
Author

ianw commented Nov 22, 2023

I imagine this is also the cause of the problems in #1406, although that issue doesn't specify what system the toolbox is being run on.

@ianw ianw changed the title Fedora toolbox 39 has a dangling /etc/resolv.conf symlink Fedora toolbox 39 has a dangling /etc/resolv.conf symlink for systemd-resolved Nov 22, 2023
@debarshiray debarshiray changed the title Fedora toolbox 39 has a dangling /etc/resolv.conf symlink for systemd-resolved fedora-toolbox:39 has a dangling /etc/resolv.conf symlink for systemd-resolved Dec 1, 2023
@debarshiray
Member

debarshiray commented Dec 1, 2023

This looks like a regression from the Fedora 39 Change that started building the fedora-toolbox images as part of the nightly composes, and ported them from a Dockerfile to fedora-kickstarts and pungi-fedora.

Prior to that, the fedora-toolbox images had a simple /etc/resolv.conf:

[rishi@topinka ~]$ podman run --rm --interactive --tty --env TERM=$TERM registry.fedoraproject.org/fedora-toolbox:38 /bin/bash
[root@cf4e0ff025dc /]# ls -l /etc/resolv.conf
-rw-r--r--. 1 root root 100 Dec  1 16:10 /etc/resolv.conf
[root@cf4e0ff025dc /]# cat /etc/resolv.conf
search redhat.com
nameserver 10.0.2.3
nameserver 192.168.0.1
nameserver fe80::7a98:e8ff:fe55:4180%4
[root@cf4e0ff025dc /]# 

... that got turned into a symbolic link to the host's /etc/resolv.conf (at /run/host/etc/resolv.conf) by the Toolbx container's entry point.

The logic in the entry point looks like this:

if _, err := os.Readlink("/etc/resolv.conf"); err != nil {
    if err := redirectPath("/etc/resolv.conf", "/run/host/etc/resolv.conf", false); err != nil {
        return err
    }
}

Unfortunately, os.Readlink doesn't return an error if the target of the symlink is absent, which is what leads to this situation. While we fix this, you can fix your own containers by making sure that the container's /etc/resolv.conf points to /run/host/etc/resolv.conf.

@debarshiray
Member

I added some tests to ensure that DNS works inside the containers: #1414

It's not as exhaustive as I would like it to be, because we aren't currently running all the tests on CentOS Stream 9 and Ubuntu 22.10. However, it's a start.

We should also probably check /etc/resolv.conf itself, but that would require this bug to be fixed first.

@debarshiray
Member

One way to fix this bug would be to always reset the container's /etc/resolv.conf to a known good value on container start. This could cause problems in cases where the user deliberately changed it inside the container, because the customization will get overwritten whenever the container is restarted.

However, we don't preserve such modifications elsewhere either, so this won't make it any worse.

I suppose we need to introduce some sort of a stamp file to mark a container as already initialized, and to not reset any configuration changes made by the user.

@ianw
Author

ianw commented Dec 2, 2023

This looks like a regression from the Fedora 39 Change that started building the fedora-toolbox images as part of the nightly composes, and ported them from a Dockerfile to fedora-kickstarts and pungi-fedora.

Prior to that, the fedora-toolbox images had a simple /etc/resolv.conf:

One thing that we noticed (we being mostly @wackrat :) is that this symlink is being created by systemd-resolved, which appears to be a weak dependency of systemd ... so I think this was not being brought in until a recent change https://pagure.io/fedora-kickstarts/c/49306cb6eada8777eafc2fa7f93f16008c2e93a5?branch=main which starts to pull in weak dependencies.

I don't think that systemd-resolved is required in the container at all? Name resolution seems like a thing for the host. Perhaps this package should be put on a block list?

debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 2, 2023
@debarshiray
Member

One thing that we noticed (we being mostly @wackrat :) is that this symlink is being created by systemd-resolved, which appears to be a weak dependency of systemd ... so I think this was not being brought in until a recent change https://pagure.io/fedora-kickstarts/c/49306cb6eada8777eafc2fa7f93f16008c2e93a5?branch=main which starts to pull in weak dependencies.

Yes, that's exactly it! I didn't notice that we are explicitly pulling in systemd through the fedora-container-toolbox.ks Kickstart.

It's a mistake that crept in during the move to build the images as part of the nightly composes. Earlier, systemd was only one of those packages that we re-installed, if they were already part of the fedora base image, because those had their documentation and translations stripped out.

The last time systemd was part of the fedora base image was Fedora 36, and even though it pulled in systemd-resolved, the /etc/resolv.conf symbolic link was correct.

Anyway, do you want to submit a pull request to fedora-kickstarts to fix the fedora-toolbox image? Something like:

$ git diff
diff --git a/fedora-container-toolbox.ks b/fedora-container-toolbox.ks
index 89e8ee9d38b9..5e86e5f4cba7 100644
--- a/fedora-container-toolbox.ks
+++ b/fedora-container-toolbox.ks
@@ -82,7 +82,7 @@ shadow-utils
 -shared-mime-info
 -sssd-client
 sudo
-systemd
+-systemd-resolved
 tar # https://bugzilla.redhat.com/show_bug.cgi?id=1409920
 tcpdump
 time

I don't think that systemd-resolved is required in the container at all? Name resolution seems like a thing for the host. Perhaps this package should be put on a block list?

Yes, you are right. We point (or at least try to) the container's /etc/resolv.conf to the host's version of the file, anyway.

@debarshiray
Member

One way to fix this bug would be to always reset the container's /etc/resolv.conf to a known good value on container start. This could cause problems in cases where the user deliberately changed it inside the container, because the customization will get overwritten whenever the container is restarted.

However, we don't preserve such modifications elsewhere either, so this won't make it any worse.

Apart from fixing the fedora-toolbox image, I think we should also make the entry point more aggressive in resetting the container's /etc/resolv.conf to a known good value on container start. Ultimately, toolbox(1) can be used with so many unknown images out there in the wild that we can't possibly ensure that all the images are correct in every detail.

Pull requests welcome. :)

@ianw
Author

ianw commented Dec 4, 2023

Anyway, do you want to submit a pull request to fedora-kickstarts to fix the fedora-toolbox image?

https://pagure.io/fedora-kickstarts/pull-request/1010

@ianw
Author

ianw commented Dec 4, 2023

Apart from fixing the fedora-toolbox image, I think we should also make the entry point more aggressive in resetting the container's /etc/resolv.conf to a known good value on container start. Ultimately, toolbox(1) can be used with so many unknown images out there in the wild that we can't possibly ensure that all the images are correct in every detail.

I see that the container starts with --dns=none (passed as "--dns", "none"), meaning podman doesn't get in the way.

The current logic seems to boil down to

  • if it's a file in the container; symlink it to the bind mount of the host /etc/resolv.conf
  • if it's a symlink in the container, leave it alone; based on the assumption somebody configured this

I think the assumption we broke here was that, in the default case, there was a symlink, so toolbox assumed it was set up and left it alone?

I think perhaps asserting in the container that the resolv.conf is a plain file might be enough?

@debarshiray
Member

debarshiray commented Dec 7, 2023

Anyway, do you want to submit a pull request to fedora-kickstarts to fix the fedora-toolbox image?

https://pagure.io/fedora-kickstarts/pull-request/1010

Thanks, @ianw !

The current logic seems to boil down to

  • if it's a file in the container; symlink it to the bind mount of the host /etc/resolv.conf
  • if it's a symlink in the container, leave it alone; based on the assumption somebody configured this

Yes, correct.

I think the assumption we broke here was that, in the default case, there was a symlink, so toolbox assumed it was set up and left it alone?

Yes, correct.

I think perhaps asserting in the container that the resolv.conf is a plain file might be enough?

Umm... what do you mean by asserting? You mean leaving the current logic as it is?

I was considering making the current logic more aggressive like this:

if resolvConfTarget, err := os.Readlink("/etc/resolv.conf"); err != nil || resolvConfTarget != "/run/host/etc/resolv.conf" {
    if err := redirectPath("/etc/resolv.conf", "/run/host/etc/resolv.conf", false); err != nil {
        return err
    }
}

This has the downside of overwriting some custom user-made modifications to the container's /etc/resolv.conf. However, I think that we need a more explicit way to support those. One idea is to have the entry point create a stamp file for the container after the first run that will survive container restarts. As long as this stamp file is present, subsequent runs of the entry point won't touch the configuration.

On the plus side, being more aggressive with enforcing the known good configuration will protect us from the subtleties of unknown host and image combinations. We can defend against the unknown by further improving our CI by running the full test suite on more host operating systems (e.g., we only run a few tests on CentOS Stream 9 and Ubuntu 22.04 today), but there are only so many combinations that we will be able to test.

debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 17, 2023
debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 17, 2023
@debarshiray
Member

@ianw Does this fix the problem when using the faulty fedora-toolbox images on RHEL 9 hosts: #1425 ? It seemed to work for me, but a confirmation will be good. :)

debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 18, 2023
On some Toolbx images with systemd-resolved.service(8), like the
fedora-toolbox image for Fedora 39 onwards, /etc/resolv.conf can end up
being a symbolic link inside the container that expects the host
operating system to also use systemd-resolved.service(8):
  $ ls -l /etc/resolv.conf
  lrwxrwxrwx. 1 root root 39 Nov 28 08:50 /etc/resolv.conf ->
    ../run/systemd/resolve/stub-resolv.conf

This happens because /etc/resolv.conf is already a symbolic link inside
the image, and, hence, the container's entry point doesn't change it to
point at the host's copy of the file at /run/host/etc/resolv.conf.

If the host OS doesn't use systemd-resolved.service(8), like Red Hat
Enterprise Linux 9, then this leads to a dangling symbolic link and
breaks DNS queries.

Note that the presence of systemd-resolved.service(8) in the recent
fedora-toolbox is a regression arising from the ToolbxReleaseBlocker
Change [1] for Fedora 39 where the image was rewritten in terms of
fedora-kickstarts and pungi-fedora instead of a Container/Dockerfile.
By mistake, systemd crept in as an RPM needed by the image [2], which
in turn pulled in the systemd-resolved RPM as a weak dependency [3].

Hopefully, that will get fixed.  However, it's also not practical to
keep track of all the Toolbx images out there in the wild, so it's
wise to make toolbox(1) more resilient to such things.

This will have the downside of overwriting some custom user-made
modifications to the container's /etc/resolv.conf.  While that's
unfortunate, it's more important to have Toolbx images produce working
containers on a wide range of host OSes.  It will be better to come up
with a more explicit way to support custom user-made modifications to
the container's configuration.  Perhaps with a persistent stamp file.

[1] https://fedoraproject.org/wiki/Changes/ToolbxReleaseBlocker

[2] fedora-kickstarts commit 48e2c3b5598de32f
    https://pagure.io/fedora-kickstarts/c/48e2c3b5598de32f

[3] fedora-kickstarts commit 49306cb6eada8777
    https://pagure.io/fedora-kickstarts/c/49306cb6eada8777

containers#1410
debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 18, 2023
On some Toolbx images with systemd-resolved(8), like the fedora-toolbox
images for Fedora 39 onwards, /etc/resolv.conf can end up being a
symbolic link inside the container that expects the host operating
system to also use systemd-resolved(8):
  $ ls -l /etc/resolv.conf
  lrwxrwxrwx. 1 root root 39 Nov 28 08:50 /etc/resolv.conf ->
    ../run/systemd/resolve/stub-resolv.conf

This happens because systemd-resolved(8) already makes /etc/resolv.conf
a symbolic link inside the image, and, hence, the container's entry
point doesn't change it to point at the host's copy of the file at
/run/host/etc/resolv.conf.  Instead, it's left pointing to the host's
copy of the files maintained by systemd-resolved(8) under
/run/systemd/resolve, which happen to be also available inside the
container [1].

If the host OS doesn't use systemd-resolved(8), like Red Hat Enterprise
Linux 9, then this leads to a dangling symbolic link and breaks DNS
queries.

Note that the presence of systemd-resolved(8) in the recent
fedora-toolbox images is a regression caused by the ToolbxReleaseBlocker
Change [2] for Fedora 39 where the image was rewritten in terms of
fedora-kickstarts and pungi-fedora instead of a Container/Dockerfile.
By mistake, systemd crept in as an RPM needed by the image [3], which
in turn pulled in the systemd-resolved RPM as a weak dependency [4].

Hopefully, that will get fixed.  However, it's also not practical to
keep track of all the Toolbx images out there in the wild, so it's
wise to make toolbox(1) more resilient to such things.

This will have the downside of overwriting some custom user-made
modifications to the container's /etc/resolv.conf.  While that's
unfortunate, it's more important to have Toolbx images produce working
containers on a wide range of host OSes.  It will be better to come up
with a more explicit way to support custom user-made modifications to
the container's configuration.  Perhaps with a persistent stamp file.

[1] Commit af602c7
    containers@af602c7d227617d2
    containers#707

[2] https://fedoraproject.org/wiki/Changes/ToolbxReleaseBlocker

[3] fedora-kickstarts commit 48e2c3b5598de32f
    https://pagure.io/fedora-kickstarts/c/48e2c3b5598de32f

[4] fedora-kickstarts commit 49306cb6eada8777
    https://pagure.io/fedora-kickstarts/c/49306cb6eada8777

containers#1410
@debarshiray debarshiray changed the title fedora-toolbox:39 has a dangling /etc/resolv.conf symlink for systemd-resolved fedora-toolbox:39 has a dangling /etc/resolv.conf symlink on hosts without systemd-resolved(8) Dec 18, 2023
@debarshiray
Member

The fedora-toolbox:40 image has now been fixed, and toolbox(1) itself has been made more resilient.

Once this has gotten some more testing, we can arrange for some backports for the fedora-toolbox:39 image.

@debarshiray
Member

Thanks for your help getting this fixed, @ianw !

@dmitpv

dmitpv commented Dec 23, 2023

fedora-toolbox-39 works! Version: toolbox-0.0.99.5-1. Thank you!

@vwbusguy

I hit this on Fedora 39 today. Looks like it might be fixed in rawhide, but not yet for 39. https://bugzilla.redhat.com/show_bug.cgi?id=2258648

offsoc pushed a commit to offsoc/Fedora-kickstarts that referenced this issue Oct 21, 2024
Since change 48e2c3b this kickstart
is pulling in systemd.

This was noticed because, since
b5fc5fd started bringing in
weak dependencies, we started installing systemd-resolved, which
created a symlinked /etc/resolv.conf in the image.  Toolbox will not
currently reset this on container start, as it is a symlink (this
behaviour is a bit complicated; see [1]).  This leads to an
incompatibility when running the toolbox on *non* systemd-resolved hosts
(e.g. RHEL9); you are left with a dangling symlink and no
name-resolution in the toolbox.

We do not want systemd in the toolbox image by default; remove it
from the list.  Exclude systemd-resolved specifically, so if something
else brings in systemd we still don't include this.

[1] containers/toolbox#1410

https://pagure.io/fedora-kickstarts/pull-request/1027