
systemd-resolved: 'DNSStubListener=no' not working because of SELinux denial #679

Closed
vrutkovs opened this issue Nov 23, 2020 · 17 comments

@vrutkovs
Member

vrutkovs commented Nov 23, 2020

@dustymabe: 2020-12-07: UPDATED BUG DESCRIPTION

The bug is due to an SELinux denial: see #679 (comment)

Describe the bug
FCOS from testing/next-devel comes with the coreos-migrate-to-systemd-resolved service, which unconditionally links /etc/resolv.conf to /run/systemd/resolve/stub-resolv.conf. In some situations this may not be desirable (e.g. OKD disables this setting for kubelet / CoreDNS compatibility).

systemd-resolved controls the symlink location itself; perhaps this service is not needed?

So when a user updates from F32 stable to the F33 OKD content, the service is disabled and systemd-resolved correctly manages the resolv.conf symlink. However, if FCOS testing is used, the symlink is already set to stub-resolv.conf and systemd is unable to switch it back.

See okd-project/okd#380.
OKD OS content - https://github.com/openshift/okd-machine-os/tree/release-4.6/overlay/etc/systemd
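Which of these cases a node is in can be read off the /etc/resolv.conf symlink target. A minimal bash sketch, assuming the target names used in this thread (stub-resolv.conf when going through the stub listener, resolv.conf when listing the uplink servers directly); the function name `resolv_conf_mode` and its output labels are made up for illustration:

```shell
# Hypothetical helper: report how a resolv.conf path is managed, judging only
# by what it is and where it points. Defaults to /etc/resolv.conf.
resolv_conf_mode() {
    local path=${1:-/etc/resolv.conf}
    if [[ -L "$path" ]]; then
        # Match on the path suffix so both relative (../run/...) and
        # absolute (/run/...) targets are recognized.
        case "$(readlink "$path")" in
            */systemd/resolve/stub-resolv.conf) echo "stub" ;;
            */systemd/resolve/resolv.conf)      echo "uplink" ;;
            *)                                  echo "other-symlink" ;;
        esac
    elif [[ -e "$path" ]]; then
        # A plain file, e.g. one written by NetworkManager on older systems.
        echo "static-file"
    else
        echo "missing"
    fi
}
```

On a node, calling `resolv_conf_mode` with no argument checks /etc/resolv.conf itself.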

@dustymabe
Member

> Describe the bug
> FCOS from testing/next-devel comes with coreos-migrate-to-systemd-resolved service,

Correction. All streams have this service. It quietly exits on any
system not yet on a 33 base, though.

https://github.com/coreos/fedora-coreos-config/blob/993697e74c21fc640138c192f4c84c38df807025/overlay.d/15fcos/usr/libexec/coreos-migrate-to-systemd-resolved#L10-L15

> which unconditionally links /etc/resolv.conf to /run/systemd/resolve/stub-resolv.conf.

It's not unconditional. See the conditions:

https://github.com/coreos/fedora-coreos-config/blob/993697e74c21fc640138c192f4c84c38df807025/overlay.d/15fcos/usr/libexec/coreos-migrate-to-systemd-resolved#L17-L25

> In some situations this may not be desirable (i.e. OKD disables this setting for kubelet / CoreDNS compatibility).
>
> Systemd-resolved controls symlink location itself, perhaps this service is not needed?

The coreos-migrate-to-systemd-resolved service is only there to migrate
existing systems to use resolved if they hadn't otherwise configured
the machine (i.e. if they were already using the defaults). The
service runs once and disables itself. It will be removed from FCOS
very soon.

> So when a user boots from F32 Stable to F33 OKD content resolv.conf the service is disabled and systemd-resolved correctly manages the symlink.

Which service are you referencing here: coreos-migrate-to-systemd-resolved.service or systemd-resolved.service?

coreos-migrate-to-systemd-resolved.service will not disable itself until it has run once on an F33 system.

> However if FCOS testing is used the symlink is already set to stub-resolv.conf and systemd is unable to switch it back

FCOS testing is currently on 33.20201116.2.0. My understanding is that
coreos-migrate-to-systemd-resolved has no real effect on an FCOS system that was
originally created on F33+ (i.e. it should be no different than running
Fedora 33 Server with respect to systemd-resolved). On those systems the systemd-tmpfiles
entry will create /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
early on first boot:

https://github.com/systemd/systemd/blob/5b639090d0b4a49d77ba58bebe180b2a6f8da322/tmpfiles.d/etc.conf.m4#L16

And coreos-migrate-to-systemd-resolved will
just exit and not do anything because of the conditional logic. Here's
the output from the service on the first boot of a machine:

$ journalctl -u coreos-migrate-to-systemd-resolved
-- Logs begin at Mon 2020-11-23 19:42:07 UTC, end at Mon 2020-11-23 19:52:58 UTC. --
Nov 23 19:42:13 fedora systemd[1]: Starting CoreOS Migrate to Systemd Resolved...
Nov 23 19:42:13 fedora coreos-migrate-to-systemd-resolved[902]: Removed /etc/systemd/system/multi-user.target.wants/coreos-migrate-to-systemd-resolved.service.
Nov 23 19:42:13 fedora systemd[1]: Finished CoreOS Migrate to Systemd Resolved.

The service would have logged messages if it had
actually done anything. It didn't. All it did was
disable itself so it never runs again.

Maybe the real issue we should dig into is why the user is getting
"Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied" error messages.

@LorbusChris
Contributor

LorbusChris commented Nov 23, 2020

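Ok this is confusing to me:

  • tmpfiles will create the symlink at boot only (per L!), but unconditionally.
  • systemd-resolved will create the symlink itself, but only if DNSStubListener=no is not set.
  • The migration service, while not taking DNSStubListener=no into account, looks like it was disabled in this case and didn't do anything (from openshift/okd#380 (comment))

Maybe this is just systemd-resolved fighting tmpfiles in this case?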

@dustymabe
Member

dustymabe commented Nov 24, 2020

> Ok this is confusing to me:
>
>   • tmpfiles will create the symlink at boot only (per L!), but unconditionally.

The ! means the line is only applied during boot. The L by itself will only create the file (symlink) if it doesn't already exist. On systems created before systemd-resolved was enabled, NetworkManager was managing /etc/resolv.conf, so that file already existed on F32-based FCOS and the tmpfiles.d entry does nothing.
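The `L` behaviour described here (create the symlink only if nothing exists at the path yet) can be mimicked in a few lines of bash to see why a pre-existing NetworkManager file survives. This is a simplified sketch, not systemd-tmpfiles itself: it ignores the `!` boot-only modifier and the `L+` force variant, and the function name `tmpfiles_L` is hypothetical.

```shell
# Hypothetical emulation of a tmpfiles.d "L" line:
#   create PATH -> TARGET only if nothing exists at PATH yet.
# An existing regular file (e.g. one written by NetworkManager) is left alone.
tmpfiles_L() {
    local path=$1 target=$2
    # -e is false for a dangling symlink, so check -L as well.
    if [[ -e "$path" || -L "$path" ]]; then
        echo "skip: $path already exists"
        return 0
    fi
    ln -s "$target" "$path"
    echo "created: $path -> $target"
}
```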

>   • systemd-resolved will create the symlink itself, but only if DNSStubListener=no is not set.

I guess :) - I'm not super familiar with the code.

>   • The migration service, while not taking DNSStubListener=no into account, looks like it was disabled in this case and didn't do anything (from openshift/okd#380 (comment))

If the user started from an F33 base, the coreos-migrate-to-systemd-resolved shouldn't do anything.

> Maybe this is just systemd-resolved fighting tmpfiles in this case?

Maybe. It would be great to have a small reproducer for being able to investigate this further.

@vrutkovs
Member Author

> If the user started from an F33 base, the coreos-migrate-to-systemd-resolved shouldn't do anything.

Ah, yes, this is the case we're hitting here. Probably coreos-migrate is wrongly accused and the real issue is tmpfiles.d creating a symlink which cannot be changed by systemd-resolved.

I'll work on a reproducer and update this ticket.

@dustymabe
Member

@vrutkovs any reproducer yet?

@vrutkovs
Member Author

vrutkovs commented Dec 4, 2020

Right, unable to reproduce:

  • Started with FCOS next (F33): disabling the DNS stub worked flawlessly here.
  • When I started at F32, added no-dns-stub.conf and updated to next, I still got the stub-resolv.conf permission denied error.

So it's not the coreos-migrate service.
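For anyone trying to reproduce this, the drop-in can be written as below. A sketch assuming the file name `no-dns-stub.conf` from this comment and the `[Resolve]` / `DNSStubListener=no` contents quoted later in the issue; the directory is a parameter so it can be tried outside /etc, and the function name `write_no_stub_dropin` is hypothetical.

```shell
# Write a systemd-resolved drop-in that disables the stub listener.
# DIR defaults to the standard drop-in directory; override it for testing.
write_no_stub_dropin() {
    local dir=${1:-/etc/systemd/resolved.conf.d}
    mkdir -p "$dir"
    printf '[Resolve]\nDNSStubListener=no\n' > "$dir/no-dns-stub.conf"
}
# On a real node, follow with: systemctl restart systemd-resolved.service
```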

@vrutkovs vrutkovs closed this as completed Dec 4, 2020
@dustymabe
Member

dustymabe commented Dec 4, 2020

> Right, unable to reproduce:
>
>   • Started with FCOS next F33, no dns stub worked flawlessly here

Which version were you using? 33.20201130.1.0?

>   • When I started at F32, added no-dns-stub.conf and updated to next I still get stub-resolv.conf permission denied.

I'm kind of interested in what is going on here. Do you mind sharing the FCCT or Ignition file you used so I can reproduce?

@fortinj66

fortinj66 commented Dec 5, 2020

> Maybe the real issue we should dig in to is why the user is getting
> Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied error messages.

This seems to be an SELinux issue...

Dec 05 20:52:02 fedora systemd[1]: Starting Network Name Resolution...
Dec 05 20:52:03 fedora systemd-resolved[31573]: Positive Trust Anchors:
Dec 05 20:52:03 fedora systemd-resolved[31573]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Dec 05 20:52:03 fedora systemd-resolved[31573]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 2>
Dec 05 20:52:03 fedora systemd-resolved[31573]: Using system hostname 'fedora'.
Dec 05 20:52:03 fedora systemd-resolved[31573]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied
Dec 05 20:52:03 fedora systemd[1]: Started Network Name Resolution.

Here is the really odd thing: if I run setenforce 0 and then systemctl restart systemd-resolved.service, I get the following:

     CGroup: /system.slice/systemd-resolved.service
             └─33835 /usr/lib/systemd/systemd-resolved

Dec 05 20:54:49 fedora systemd[1]: Starting Network Name Resolution...
Dec 05 20:54:49 fedora systemd-resolved[33835]: Positive Trust Anchors:
Dec 05 20:54:49 fedora systemd-resolved[33835]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Dec 05 20:54:49 fedora systemd-resolved[33835]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 2>
Dec 05 20:54:49 fedora systemd-resolved[33835]: Using system hostname 'fedora'.
Dec 05 20:54:49 fedora systemd[1]: Started Network Name Resolution.

But look at this:

[root@fedora resolve]# ls -lZ /run/systemd/resolve
total 4
drwx------. 2 systemd-resolve systemd-resolve system_u:object_r:systemd_resolved_var_run_t:s0  60 Dec  5 20:14 netif
-rw-r--r--. 1 systemd-resolve systemd-resolve system_u:object_r:net_conf_t:s0                 692 Dec  5 20:54 resolv.conf
lrwxrwxrwx. 1 systemd-resolve systemd-resolve system_u:object_r:systemd_resolved_var_run_t:s0  11 Dec  5 20:54 stub-resolv.conf -> resolv.conf

Why is stub-resolv.conf a symlink to resolv.conf, and why does it have the context system_u:object_r:systemd_resolved_var_run_t:s0?

On a Fedora 33 system it looks like this:

[root@dnsd01 resolve]# ls -lZ
total 8
drwx------. 2 systemd-resolve systemd-resolve system_u:object_r:systemd_resolved_var_run_t:s0  60 Dec  5 14:04 netif
-rw-r--r--. 1 systemd-resolve systemd-resolve system_u:object_r:net_conf_t:s0                 682 Dec  5 15:49 resolv.conf
-rw-r--r--. 1 systemd-resolve systemd-resolve system_u:object_r:net_conf_t:s0                 762 Dec  5 15:49 stub-resolv.conf
[root@dnsd01 resolve]#

The security context is different, and stub-resolv.conf is a regular file rather than a link to resolv.conf.
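The difference between the two listings comes down to whether stub-resolv.conf is a symlink (the FCOS node) or a regular file (the Fedora 33 host). A small bash sketch that reports which case a resolve directory is in; the function name `describe_stub` is hypothetical, and it deliberately leaves the SELinux label to `ls -lZ` as used above.

```shell
# Hypothetical helper: describe stub-resolv.conf inside a resolve directory
# (defaults to /run/systemd/resolve).
describe_stub() {
    local f=${1:-/run/systemd/resolve}/stub-resolv.conf
    if [[ -L "$f" ]]; then
        echo "symlink -> $(readlink "$f")"
    elif [[ -f "$f" ]]; then
        echo "regular file"
    else
        echo "absent"
    fi
}
```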

Here is the error:

[root@fedora ~]#  dmesg | grep -i -e type=1300 -e type=1400
[  242.622151] audit: type=1400 audit(1607202306.382:211): avc:  denied  { create } for  pid=3289 comm="systemd-resolve" name=".#stub-resolv.conf345509e1f79c00b5" scontext=system_u:system_r:systemd_resolved_t:s0 tcontext=system_u:object_r:systemd_resolved_var_run_t:s0 tclass=lnk_file permissive=0

It fails when creating a link file (tclass=lnk_file)... why is it making a link at all?
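When reading denials like this one, the useful fields are the denied permission, the source and target contexts, and the object class. A rough bash sketch that pulls them out of a single AVC line with sed; it assumes the `denied { perm } ... scontext=... tcontext=... tclass=...` layout shown above, the function name `avc_summary` is hypothetical, and it is no substitute for the audit tooling.

```shell
# Extract the denied permission and key fields from one AVC log line.
# Assumes the key=value layout of the dmesg line above.
avc_summary() {
    local line=$1
    local perm scontext tcontext tclass
    perm=$(sed -n 's/.*denied  *{ \([^}]*\) }.*/\1/p' <<<"$line")
    scontext=$(sed -n 's/.*scontext=\([^ ]*\).*/\1/p' <<<"$line")
    tcontext=$(sed -n 's/.*tcontext=\([^ ]*\).*/\1/p' <<<"$line")
    tclass=$(sed -n 's/.*tclass=\([^ ]*\).*/\1/p' <<<"$line")
    echo "perm=$perm tclass=$tclass"
    echo "source=$scontext"
    echo "target=$tcontext"
}
```

For the denial above this prints `perm=create tclass=lnk_file`, followed by the systemd_resolved_t source and the systemd_resolved_var_run_t target. To scan a whole boot: `dmesg | grep 'avc:  denied' | while read -r l; do avc_summary "$l"; done`.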

@fortinj66

fortinj66 commented Dec 5, 2020

So because of the patch described above, when DNSStubListener=no is set, systemd-resolved tries to create a symbolic link. However, the SELinux context is wrong now that it is a link...

Unfortunately, semanage is no longer available on CoreOS, so I can't run:
semanage fcontext -f l -a -t net_conf_t /var/run/systemd/resolve/stub-resolv.conf

So I guess for this process to work, the SELinux context needs to be fixed for the link...

@fortinj66

And there is also /etc/NetworkManager/dispatcher.d/30-resolv-prepender, which seems to overwrite /etc/resolv.conf too. It's a popular target!

I commented out most of the code in it so that, instead of generating a new /etc/resolv.conf, it deletes the file and creates a link to /run/systemd/resolve/resolv.conf:

[root@fedora ~]# cat /etc/NetworkManager/dispatcher.d/30-resolv-prepender
#!/bin/bash
set -eo pipefail
IFACE=$1
STATUS=$2


# If $DHCP6_FQDN_FQDN is not empty and is not localhost.localdomain
[[ -n "$DHCP6_FQDN_FQDN" && "$DHCP6_FQDN_FQDN" != "localhost.localdomain" && "$DHCP6_FQDN_FQDN" =~ "." ]] && hostnamectl set-hostname --static --transient $DHCP6_FQDN_FQDN
case "$STATUS" in
    up|down|dhcp4-change|dhcp6-change)
    logger -s "NM resolv-prepender triggered by ${1} ${2}."

    # Ensure resolv.conf exists before we try to run podman
    if [[ ! -e /etc/resolv.conf ]] || ! grep -q nameserver /etc/resolv.conf; then
        #cp /var/run/NetworkManager/resolv.conf /etc/resolv.conf
        # -f: under set -e, a plain rm would abort the script if the file is missing
        rm -f /etc/resolv.conf
        ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
    fi

#    NAMESERVER_IP=$(/usr/bin/podman run --rm \
#        --authfile /var/lib/kubelet/config.json \
#        --net=host \
#        registry.svc.ci.openshift.org/origin/4.6-2020-12-04-130559@sha256:f07aa22828d1fd5089a60ddd9118fa0eb2e5afb4a5097a65ba89a1382c71dad0 \
#        node-ip \
#        show \
#        "10.102.5.2" \
#        "10.102.5.3")
#    DOMAIN="dev-c1v4.os.maeagle.corp"
#    if [[ -n "$NAMESERVER_IP" ]]; then
#        logger -s "NM resolv-prepender: Prepending 'nameserver $NAMESERVER_IP' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf)"
#        sed -e "/^search/d" \
#            -e "/Generated by/c# Generated by KNI resolv prepender NM dispatcher script\nsearch $DOMAIN\nnameserver $NAMESERVER_IP" \
#            /var/run/NetworkManager/resolv.conf > /etc/resolv.tmp
#    fi
#    mv -f /etc/resolv.tmp /etc/resolv.conf
    ;;
    *)
    ;;
esac
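The part of the trimmed-down dispatcher script that still does anything is the "missing or no nameserver" check followed by a relink. Pulled out into a function with the paths as parameters, it can be exercised against a scratch directory. A sketch only: `relink_if_unusable` is a hypothetical name, and it uses `ln -sfn` in place of the separate `rm` and `ln -s`, so a missing file does not abort it.

```shell
# Replace RESOLV with a symlink to TARGET if RESOLV is missing or lacks a
# "nameserver" line; otherwise leave it untouched.
relink_if_unusable() {
    local resolv=$1 target=$2
    if [[ ! -e "$resolv" ]] || ! grep -q nameserver "$resolv"; then
        # -f replaces an existing file or dangling link; no separate rm needed.
        ln -sfn "$target" "$resolv"
        echo "relinked"
    else
        echo "kept"
    fi
}
```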

I also added a file /etc/systemd/resolved.conf.d/static_dns.conf with the following:

 cat /etc/systemd/resolved.conf.d/static_dns.conf
[Resolve]
DNS=10.102.5.122
Domains=dev-c1v4.os.maeagle.corp

which are the local IP and domain. This results in:

[root@fedora ~]# ls -al /etc/resolv.conf
lrwxrwxrwx. 1 root root 32 Dec  6 00:02 /etc/resolv.conf -> /run/systemd/resolve/resolv.conf

[root@fedora ~]# cat  /run/systemd/resolve/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.102.5.122
nameserver 10.99.111.1
nameserver 10.99.111.2
search dev-c1v4.os.maeagle.corp maeagle.corp

[root@fedora ~]# ping api-int
PING api-int.dev-c1v4.os.maeagle.corp (10.102.5.2) 56(84) bytes of data.
64 bytes from api-int.dev-c1v4.os.maeagle.corp (10.102.5.2): icmp_seq=1 ttl=64 time=0.215 ms
64 bytes from api-int.dev-c1v4.os.maeagle.corp (10.102.5.2): icmp_seq=2 ttl=64 time=0.188 ms

--- api-int.dev-c1v4.os.maeagle.corp ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1059ms
rtt min/avg/max/mdev = 0.188/0.201/0.215/0.013 ms

and DNS resolves properly...

And this is with the standard /etc/nsswitch.conf, with no changes to the hosts line:

 cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# Valid databases are: aliases, ethers, group, gshadow, hosts,
# initgroups, netgroup, networks, passwd, protocols, publickey,
# rpc, services, and shadow.
#
# Valid service provider entries include (in alphabetical order):
#
#       compat                  Use /etc files plus *_compat pseudo-db
#       db                      Use the pre-processed /var/db files
#       dns                     Use DNS (Domain Name Service)
#       files                   Use the local files in /etc
#       hesiod                  Use Hesiod (DNS) for user lookups
#
# See `info libc 'NSS Basics'` for more information.
#
# Commonly used alternative service providers (may need installation):
#
#       ldap                    Use LDAP directory server
#       myhostname              Use systemd host names
#       mymachines              Use systemd machine names
#       mdns*, mdns*_minimal    Use Avahi mDNS/DNS-SD
#       resolve                 Use systemd resolved resolver
#       sss                     Use System Security Services Daemon (sssd)
#       systemd                 Use systemd for dynamic user option
#       winbind                 Use Samba winbind support
#       wins                    Use Samba wins support
#       wrapper                 Use wrapper module for testing
#
# Notes:
#
# 'sssd' performs its own 'files'-based caching, so it should generally
# come before 'files'.
#
# WARNING: Running nscd with a secondary caching service like sssd may
#          lead to unexpected behaviour, especially with how long
#          entries are cached.
#
# Installation instructions:
#
# To use 'db', install the appropriate package(s) (provide 'makedb' and
# libnss_db.so.*), and place the 'db' in front of 'files' for entries
# you want to be looked up first in the databases, like this:
#
# passwd:    db files
# shadow:    db files
# group:     db files

# In order of likelihood of use to accelerate lookup.
passwd: sss files altfiles systemd
shadow:     files
group: sss files altfiles systemd
hosts: files resolve [!UNAVAIL=return] myhostname dns
services:   files sss
netgroup:   sss
automount:  files sss

aliases:    files
ethers:     files
gshadow:    files
# Allow initgroups to default to the setting for group.
# initgroups: files
networks:   files dns
protocols:  files
publickey:  files
rpc:        files
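The line that matters here is `hosts:`. With `resolve [!UNAVAIL=return]`, lookups go to systemd-resolved first and only fall through to `myhostname` and `dns` when resolved is unavailable. A small bash sketch that lists the hosts sources in order from an nsswitch.conf-style file; the function name `hosts_sources` is hypothetical, and it simply skips bracketed action items, assuming each one (like `[!UNAVAIL=return]`) contains no spaces.

```shell
# Print the lookup sources on the "hosts:" line of an nsswitch.conf file,
# one per line, skipping [ACTION] items such as [!UNAVAIL=return].
# Assumes bracketed actions contain no spaces.
hosts_sources() {
    local file=${1:-/etc/nsswitch.conf}
    awk '$1 == "hosts:" {
        for (i = 2; i <= NF; i++) if ($i !~ /^\[/) print $i
        exit
    }' "$file"
}
```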

So I think it is quite feasible to use systemd-resolved almost out of the box, given the fix for the SELinux context described above.

--John

@dustymabe
Member

To me it looks like when using:

[Resolve]
DNSStubListener=no

systemd-resolved tries to symlink /run/systemd/resolve/stub-resolv.conf -> resolv.conf (where that resolv.conf is the one in the same directory), and it's failing because of SELinux permissions. I guess it's trying to make that symlink because it already knows (assumes?) that the /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf symlink exists, and it's just symlinking the symlink so that it effectively isn't using the stub resolver (since we said DNSStubListener=no). So it's trying to get us to /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf -> ./resolv.conf.
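The chain described above can be rebuilt in a scratch directory to see where a read of /etc/resolv.conf ends up. A sketch with a hypothetical `root` argument standing in for `/` and a hypothetical function name `demo_chain`; `readlink -f` (GNU coreutils) follows every hop and prints the final file.

```shell
# Build etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf -> resolv.conf
# under a scratch root and print the file the chain finally resolves to.
demo_chain() {
    local root=$1
    mkdir -p "$root/etc" "$root/run/systemd/resolve"
    echo "nameserver 192.0.2.1" > "$root/run/systemd/resolve/resolv.conf"
    # The link systemd-resolved tries to create with DNSStubListener=no
    # (per the analysis above):
    ln -sfn resolv.conf "$root/run/systemd/resolve/stub-resolv.conf"
    # The link the tmpfiles.d entry creates on first boot:
    ln -sfn ../run/systemd/resolve/stub-resolv.conf "$root/etc/resolv.conf"
    readlink -f "$root/etc/resolv.conf"
}
```

Usage: `cat "$(demo_chain "$(mktemp -d)")"` shows the uplink file's contents, i.e. the stub is bypassed.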

I found this bug and added a comment. Let's see if we can get it fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1896796

@dustymabe dustymabe changed the title "[next] coreos-migrate-to-systemd-resolved is not compatible with resolved 'DNSStubListener=no'" to "systemd-resolved: 'DNSStubListener=no' not working because of SELinux denial" on Dec 7, 2020
@dustymabe dustymabe reopened this Dec 7, 2020
@fortinj66

And yes, that is exactly what is happening...

> I found this bug and added a comment. Let's see if we can get it fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1896796

Awesome... Thank you

--John

@zyclonite

It seems this bug breaks instances that have already gone through several update cycles and are now taking the latest CoreOS stable update (the first 33 release).

For "newer" instances running one of the latest 32 releases, the upgrade works.

It would be great to find a future-proof solution that does not break again on the next update.

@zyclonite

My current quick fix was to just create the symlink /run/systemd/resolve/stub-resolv.conf -> /run/systemd/resolve/resolv.conf,
but systemd-resolved still complains, as it tries to create that link on startup.

@dustymabe
Member

@zyclonite - Any chance you are hitting the SELinux issue mentioned in https://discussion.fedoraproject.org/t/fedora-coreos-rebasing-to-fedora-33-features-and-known-issues/25474 ?

@zyclonite

@dustymabe Brilliant hint, that was exactly my problem: only that node had an accidentally changed policy that was never cleaned up.

@dustymabe
Member

The fix for this went into testing stream release 33.20201201.2.1 and stable stream release 33.20201201.3.0.
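To check whether a node already carries the fix, its version can be compared against the release named above for its stream. A hedged sketch using GNU `sort -V`; the function name `has_resolved_fix` is hypothetical, the stream-to-release mapping comes only from this comment, and the current version string (for example as shown by `rpm-ostree status`) is assumed to be passed in by hand.

```shell
# Succeed (exit 0) if VERSION is at or past the release that carries the fix
# for STREAM, per the release numbers in this issue. Returns 2 for other streams.
has_resolved_fix() {
    local stream=$1 version=$2 fixed
    case "$stream" in
        testing) fixed=33.20201201.2.1 ;;
        stable)  fixed=33.20201201.3.0 ;;
        *) echo "unknown stream: $stream" >&2; return 2 ;;
    esac
    # sort -C exits 0 when its input is already in order, i.e. fixed <= version.
    printf '%s\n' "$fixed" "$version" | sort -V -C
}
```

Versions are only compared within one stream, since release numbers from different streams are not meant to be ordered against each other.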

5 participants