Container doesn't start because user.Current() fails inside the entry point #1001

Open
patriziobrunops opened this issue Jan 25, 2022 · 19 comments
Labels
1. Bug Something isn't working 2. Container Initialization Related to setting up the container environment, libc-compatibiltiy and such

Comments

@patriziobrunops

Describe the bug
When trying to enter the container fedora-toolbox-35 created using

toolbox create --distro fedora --release 35

I get the error

$ toolbox enter fedora-toolbox-35
Error: invalid entry point PID of container fedora-toolbox-35

To get a better error message, I tried to attach to the container using podman instead of toolbox and got this error:

$ podman start --attach --interactive fedora-toolbox-35
Error: failed to get the current user: user: lookup userid 0: invalid argument

This error is returned by toolbox whenever Golang's user.Current() gets an error when calling getpwuid_r.
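
For anyone who wants to check the lookup outside of toolbox, here is a minimal Go sketch. The helper name describeLookup is made up for illustration; only the os/user calls come from the standard library. Whether the lookup ends up in getpwuid_r depends on the binary being built with cgo, so treat the result as a hint rather than a faithful reproduction.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
)

// describeLookup is a hypothetical helper: it looks up a numeric UID with
// os/user and returns a one-line summary instead of failing.
func describeLookup(uid string) string {
	u, err := user.LookupId(uid)
	if err != nil {
		return fmt.Sprintf("error: lookup of UID %s failed: %v", uid, err)
	}
	return fmt.Sprintf("ok: UID %s is %q (home %s)", uid, u.Username, u.HomeDir)
}

func main() {
	// user.Current() is the call that this report identifies as failing.
	if u, err := user.Current(); err != nil {
		fmt.Fprintf(os.Stderr, "user.Current() failed: %v\n", err)
	} else {
		fmt.Printf("user.Current(): %s (UID %s)\n", u.Username, u.Uid)
	}

	// The failing lookup in the error message is for UID 0.
	fmt.Println(describeLookup("0"))
}
```

Running it once on the host and once inside the container makes it easier to see whether the failure depends on the environment or on how the binary was linked.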

Steps how to reproduce the behaviour

  1. toolbox create --distro fedora --release 35
  2. toolbox enter fedora-toolbox-35

Expected behaviour
toolbox should start the default user shell inside a container.

Actual behaviour
toolbox exits with error code 1 and prints the error Error: invalid entry point PID of container fedora-toolbox-35.

Output of toolbox --version (v0.0.90+)

toolbox version 0.0.99.3

Toolbox package info (rpm -q toolbox)

toolbox-0.0.99.3-2.fc34.x86_64

Output of podman version

Version:      3.4.2
API Version:  3.4.2
Go Version:   go1.16.8
Built:        Sun Nov 14 00:16:48 2021
OS/Arch:      linux/amd64

Podman package info (rpm -q podman)

podman-3.4.2-1.fc34.x86_64

Info about your OS
Fedora Silverblue 34

Additional context
If the issue is about operating with containers/images (creating, using, deleting,..), share here what image you used. If you're unsure, share here the output of toolbox list -i (shows all toolbox images on your system).

IMAGE ID      IMAGE NAME                                    CREATED
ab8bc106d4a7  <none>                                        2 months ago
d8a734db8c5c  registry.fedoraproject.org/fedora-toolbox:34  7 months ago
40b181c70b73  registry.fedoraproject.org/fedora-toolbox:35  4 weeks ago

@patriziobrunops patriziobrunops added the 1. Bug Something isn't working label Jan 25, 2022
@angiglesias

angiglesias commented Jan 30, 2022

Hi @patriziobrunops, I've been tracing this bug, and it looks like the problem you are experiencing is this error:

[root@toolbox /]# toolbox --log-level=trace init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link
Error: failed to get the current user: user: lookup userid 0: invalid argument
[root@toolbox /]# 

This error originates on line 316 of cmd/root.go, where user.Current from the Go standard library is called. After some tinkering, it looks like the error comes from some strange interaction between the mounted toolbox binary, which uses the libs from the host (based on Fedora 34), and the container environment (based on Fedora 35).

In addition, toolbox is built to use the libs from the host instead of the ones provided in the container, so I suppose what I tried next is not the intended use:

[root@toolbox /]# ldd /usr/bin/toolbox 
	linux-vdso.so.1 (0x00007ffc35fa7000)
	libpthread.so.0 => /run/host/usr/lib64/libpthread.so.0 (0x00007f99f3d68000)
	libc.so.6 => /run/host/usr/lib64/libc.so.6 (0x00007f99f3b99000)
	/run/host/usr/lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f99f4577000)
[root@toolbox /]# 
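
As a complement to ldd, the fields that decide where the libraries come from can be read straight from the ELF headers. The following is a minimal sketch using Go's debug/elf package. The names linkInfo and inspectELF are made up for illustration, and the default path /usr/bin/toolbox is only an assumption taken from the output above.

```go
package main

import (
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strings"
)

// linkInfo holds the parts of an ELF binary that decide where its
// libraries are loaded from.
type linkInfo struct {
	Interp  string   // PT_INTERP: path of the dynamic linker
	RunPath []string // DT_RUNPATH entries
	RPath   []string // DT_RPATH entries (the older mechanism)
	Needed  []string // DT_NEEDED entries (libraries named at link time)
}

// inspectELF is a hypothetical helper that reads the fields above from path.
func inspectELF(path string) (linkInfo, error) {
	var info linkInfo

	f, err := elf.Open(path)
	if err != nil {
		return info, err
	}
	defer f.Close()

	// The interpreter path is stored as a NUL-terminated string in the
	// PT_INTERP program header. Statically linked binaries have none.
	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		data, err := io.ReadAll(p.Open())
		if err != nil {
			return info, err
		}
		info.Interp = strings.TrimRight(string(data), "\x00")
	}

	if info.RunPath, err = f.DynString(elf.DT_RUNPATH); err != nil {
		return info, err
	}
	if info.RPath, err = f.DynString(elf.DT_RPATH); err != nil {
		return info, err
	}
	if info.Needed, err = f.DynString(elf.DT_NEEDED); err != nil {
		return info, err
	}
	return info, nil
}

func main() {
	path := "/usr/bin/toolbox" // assumed default, override with an argument
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	info, err := inspectELF(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "inspect:", err)
		os.Exit(1)
	}
	fmt.Println("interpreter:", info.Interp)
	fmt.Println("DT_RUNPATH: ", info.RunPath)
	fmt.Println("DT_RPATH:   ", info.RPath)
	fmt.Println("DT_NEEDED:  ", info.Needed)
}
```

If the binary was built with the host-pointing linker flags, the interpreter and the run path would be expected to start with /run/host.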

I tried to play with LD_PRELOAD to load the libs from the container with no success

[root@toolbox /]# LD_PRELOAD=/lib64/libc.so.6:/lib64/libpthread.so.0:/lib64/ld-linux-x86-64.so.2  /usr/bin/toolbox --help
Floating point exception (core dumped)

Then I tried something else: I built toolbox in a Fedora 35 container, but without the external linker flags, so that it uses the libs in the container. After inspecting the properties of the container created by toolbox, I recreated one manually, mounting my modified binary:

# The podman create command 
podman create --dns none --env TOOLBOX_PATH=/usr/bin/toolbox --env XDG_RUNTIME_DIR=/run/user/1000 --hostname toolbox --ipc host --label com.github.containers.toolbox=true --mount type=devpts,destination=/dev/pts --name fedora-toolbox-35-manual --network host --no-hosts --pid host --privileged --security-opt label=disable --ulimit host --userns keep-id --user root:root --volume /:/run/host:rslave --volume /dev:/dev:rslave --volume /run/dbus/system_bus_socket:/run/dbus/system_bus_socket --volume /var/home/test-vm:/var/home/test-vm:rslave --volume /home/test-vm/Documentos/toolbox/builddir/src/toolbox:/usr/bin/toolbox:ro --volume /run/user/1000:/run/user/1000 --volume /run/avahi-daemon/socket:/run/avahi-daemon/socket --volume /run/.heim_org.h5l.kcm-socket:/run/.heim_org.h5l.kcm-socket --volume /run/pcscd/pcscd.comm:/run/pcscd/pcscd.comm --volume /run/media:/run/media:rslave --volume /etc/profile.d/toolbox.sh:/etc/profile.d/toolbox.sh:ro registry.fedoraproject.org/fedora-toolbox:35 toolbox init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link
832f1b233412cd1a7d1209f79da0faae1c0d5b4aa979d193dacd9fa6ecaf8d9a

The only change here is --volume /home/test-vm/Documentos/toolbox/builddir/src/toolbox:/usr/bin/toolbox:ro instead of --volume /usr/bin/toolbox:/usr/bin/toolbox:ro and voilà, success!

[test-vm@fedora ~]$ toolbox list
IMAGE ID      IMAGE NAME                                    CREATED
8335be7293c7  localhost/fedora-toolbox:35-patched           50 minutes ago
40b181c70b73  registry.fedoraproject.org/fedora-toolbox:35  5 weeks ago

CONTAINER ID  CONTAINER NAME            CREATED         STATUS      IMAGE NAME
e1e65371943d  fedora-toolbox-35         4 hours ago     exited      registry.fedoraproject.org/fedora-toolbox:35
832f1b233412  fedora-toolbox-35-manual  36 seconds ago  configured  registry.fedoraproject.org/fedora-toolbox:35
[test-vm@fedora ~]$ toolbox enter fedora-toolbox-35
fedora-toolbox-35         fedora-toolbox-35-manual  
[test-vm@fedora ~]$ toolbox enter fedora-toolbox-35-manual 

Welcome to the Toolbox; a container where you can install and run
all your tools.

 - Use DNF in the usual manner to install command line tools.
 - To create a new tools container, run 'toolbox create'.

For more information, see the documentation.

⬢[test-vm@toolbox ~]$ 

If you want to give it a try, this is the change I made to the build script to stop the linker from pointing the binary at the libs from the host. Here is also the binary in a tar.gz.

diff --git a/src/go-build-wrapper b/src/go-build-wrapper
index ef4aafc..0d01ad4 100755
--- a/src/go-build-wrapper
+++ b/src/go-build-wrapper
@@ -70,10 +70,15 @@ fi
 dynamic_linker="/run/host$dynamic_linker_canonical_dirname/$dynamic_linker_basename"
 
 # shellcheck disable=SC2086
+#go build \
+#        $tags \
+#        -trimpath \
+#        -ldflags "-extldflags '-Wl,-dynamic-linker,$dynamic_linker -Wl,-rpath,/run/host$libc_dir_canonical_dirname' -linkmode external -X github.com/containers/toolbox/pkg/version.currentVersion=$3" \
+#        -o "$2/toolbox"
 go build \
         $tags \
         -trimpath \
-        -ldflags "-extldflags '-Wl,-dynamic-linker,$dynamic_linker -Wl,-rpath,/run/host$libc_dir_canonical_dirname' -linkmode external -X github.com/containers/toolbox/pkg/version.currentVersion=$3" \
+        -ldflags "-X github.com/containers/toolbox/pkg/version.currentVersion=$3" \
         -o "$2/toolbox"
 
 exit "$?"

@angiglesias

This is a manifestation of #832

@debarshiray
Member

Thanks for digging into this, @angiglesias

However, we can't change how the toolbox binary is built so easily. We need to understand exactly what's causing this before we can actually make some changes:

user: lookup userid 0: invalid argument

Are you running toolbox as root or as a normal user? It looks like the latter, but still, to be sure.

If it's the latter, then is your username on the host coming from /etc/passwd? If so, what does it look like?

@debarshiray debarshiray changed the title toolbox unable to start container fedora-toolbox-35 Container doesn't start because user.Current() fails inside the entry point Oct 14, 2022
@angiglesias

@debarshiray I ran toolbox as available out of the box on Fedora Silverblue 34 to reproduce the original issue.

To test the container, I created a new container with podman with the same arguments used by toolbox (using podman inspect on the container made by toolbox). The only difference should be that I didn't use the default toolbox init-container as the container cmdline and instead used an interactive shell and invoked toolbox there.
As far as I know, the container starts as root and then init-container pivots to the user's uid, is that right?

Regarding the changes made to the build flags, of course, this was just a quick and dirty test to explore what was failing.
I'll add the blog post to my bookmarks. I find this library compatibility problem very interesting.

@patriziobrunops
Author

patriziobrunops commented Oct 14, 2022

Thanks for digging into this, @angiglesias

However, we can't change how the toolbox binary is built so easily. We need to understand exactly what's causing this before we can actually make some changes:

@angiglesias Thank you for the help with this. At this stage I cannot spend that kind of time experimenting with toolbox. I have since updated to SB F36 and the problem has obviously disappeared.

user: lookup userid 0: invalid argument

Are you running toolbox as root or as a normal user? It looks like the latter, but still, to be sure.

normal user

If it's the latter, then is your username on the host coming from /etc/passwd? If so, what does it look like?

it comes from /etc/passwd and it's patriziobruno

@debarshiray
Member

@patriziobrunops did something go wrong with your comment? :)

@debarshiray
Member

As far as I know, the container starts as root and then init-container
pivots to the user's uid, is that right?

The toolbox init-container entry point is the first process started inside the container by podman start as root:root in terms of the container's user namespace. It keeps running as root:root, and doesn't change to something else.

The interactive shells offered by toolbox enter are invoked separately through podman exec and those always run as the same user on the host, which is most likely not root.

@debarshiray
Member

To test the container, I created a new container with podman with
the same arguments used by toolbox (using podman inspect on the
container made by toolbox). The only difference should be that I
didn't use the default toolbox init-container as the container cmdline
and instead used an interactive shell and invoked toolbox there.

Yes, that's a smart way to test!

We could conditionalize this line to only run on the host:

currentUser, err = user.Current()

... because we don't use currentUser inside the container. However, before changing anything, I want to understand exactly what's going on here.
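
To make the idea concrete, here is a minimal sketch of such a condition. It is not the toolbox code: the helper names insideContainer and currentUserOnHost are made up, and detecting the container through the marker files /run/.containerenv and /run/.toolboxenv is an assumption for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"path/filepath"
)

// insideContainer reports whether root looks like the file system of a
// container. Checking these marker files is an assumption of this sketch.
func insideContainer(root string) bool {
	for _, marker := range []string{"run/.containerenv", "run/.toolboxenv"} {
		if _, err := os.Stat(filepath.Join(root, marker)); err == nil {
			return true
		}
	}
	return false
}

// currentUserOnHost is a hypothetical wrapper: it skips the lookup inside
// a container and returns (nil, nil) there.
func currentUserOnHost(root string) (*user.User, error) {
	if insideContainer(root) {
		return nil, nil
	}

	u, err := user.Current()
	if err != nil {
		return nil, fmt.Errorf("failed to get the current user: %w", err)
	}
	return u, nil
}

func main() {
	u, err := currentUserOnHost("/")
	switch {
	case err != nil:
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(1)
	case u == nil:
		fmt.Println("inside a container, skipped the user lookup")
	default:
		fmt.Println("running on the host as", u.Username)
	}
}
```

This would only hide the symptom: any other lookup made inside the container through the same libc would still hit the underlying problem.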

It makes sense that it's trying to look up the user for UID 0, because toolbox init-container is running as root:root inside the container. The question is why user.Current() thinks that it's an invalid argument.

Given that this is rootless Toolbx, UID 0 inside the container isn't the same UID 0 as on the host. There's a user namespace and some other non-0 UID from the host has been mapped to 0 inside the container. I am wondering if there's something wrong there.

Could you please show me the /etc/subuid and /etc/subgid files from your host?

And the UID/GID mappings for the namespace? Inside the container:

$ cat /proc/self/uid_map
...
$ cat /proc/self/gid_map
...

... and from the host:

$ cat /proc/<PID of entry point>/uid_map
...
$ cat /proc/<PID of entry point>/gid_map
...
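
For reading those files, a small translation helper can be handy. This is a minimal sketch: the names idRange, parseIDMap and toOutside are made up for illustration. Each line of a uid_map or gid_map file has three numbers: the first ID inside the namespace, the first ID it maps to outside, and the length of the range.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// idRange is one line of a uid_map or gid_map file.
type idRange struct {
	Inside, Outside, Length int
}

// parseIDMap parses the text of a uid_map or gid_map file.
func parseIDMap(text string) ([]idRange, error) {
	var ranges []idRange

	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}

		var r idRange
		if _, err := fmt.Sscan(line, &r.Inside, &r.Outside, &r.Length); err != nil {
			return nil, fmt.Errorf("malformed map line %q: %w", line, err)
		}
		ranges = append(ranges, r)
	}
	return ranges, scanner.Err()
}

// toOutside translates an ID inside the namespace into the ID given by the
// second column of the map. It returns false if the ID is not mapped.
func toOutside(ranges []idRange, id int) (int, bool) {
	for _, r := range ranges {
		if id >= r.Inside && id < r.Inside+r.Length {
			return r.Outside + (id - r.Inside), true
		}
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	ranges, err := parseIDMap(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	if outside, ok := toOutside(ranges, 0); ok {
		fmt.Printf("UID 0 in this namespace maps to UID %d outside\n", outside)
	} else {
		fmt.Println("UID 0 is not mapped in this namespace")
	}
}
```

Note that the second column is relative to the namespace of the process reading the file, so the same map can show different numbers from inside the container and from the host.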

@debarshiray
Member

debarshiray commented Oct 25, 2022

As far as I can make out, we are going through current() in go.git/src/os/user/cgo_lookup_unix.go, which ends up calling getpwuid_r (0, ...). So, you could try to write a small C program that calls getpwuid_r (0, ...) and run it as root:root inside the container and see what's going on.

@angiglesias

angiglesias commented Oct 25, 2022

@debarshiray Here's what I found:

  • /etc/subuid and /etc/subgid on the host:

    # /etc/subuid
    test-vm:100000:65536
    # /etc/subgid
    test-vm:100000:65536
    
  • /proc/self/uid_map and /proc/self/gid_map inside the container:

    [root@toolbox /]# cat /proc/self/uid_map 
             0          1       1000
          1000          0          1
          1001       1001      64536
    [root@toolbox /]# cat /proc/self/gid_map 
             0          1       1000
          1000          0          1
          1001       1001      64536
    
  • Test getpwuid_r inside the container with a small program:

    #include <errno.h>
    #include <pwd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
            struct passwd *result;
            struct passwd pwd;
            size_t bufsize;
            long size_max;
            char *buf;
            int ret;

            /* sysconf() returns a long; -1 means the value is indeterminate */
            size_max = sysconf(_SC_GETPW_R_SIZE_MAX);
            if (size_max == -1) {
                    bufsize = 16384; /* fallback */
                    printf("WARNING: Assigning fallback buffer size\n");
            } else {
                    bufsize = (size_t) size_max;
            }

            buf = malloc(bufsize);
            if (buf == NULL) {
                    perror("malloc");
                    exit(EXIT_FAILURE);
            }

            /* Same lookup that fails in toolbox: UID 0 */
            ret = getpwuid_r(0, &pwd, buf, bufsize, &result);
            if (result == NULL) {
                    if (ret == 0)
                            printf("UID 0 not found!\n");
                    else {
                            errno = ret;
                            perror("getpwuid_r");
                    }
                    free(buf);
                    exit(EXIT_FAILURE);
            }

            printf("UID 0 found, Name %s\n", pwd.pw_name);
            free(buf);
            return 0;
    }

    The result:

    [root@toolbox root]# gcc main.c -o test
    [root@toolbox root]# ./test 
    UID 0 found, Name root
    # toolbox error still present
    [root@toolbox root]# toolbox --log-level debug init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link
    Error: failed to get the current user: user: lookup userid 0: invalid argument

    For additional information, here's the strace dump for toolbox
    I see in the dump that toolbox is opening some libnss subsets on the host. Could the problem be some strange interaction between different versions of the host dynamic linked libs and the libs on the container?

Update:
Looking at the strace dump, I've got a hunch that the issue is in how glibc dynamically loads the NSS libs inside the container, coming from an older version with different flags. I'll see whether I can use LD_LIBRARY_PATH to make it look for these libraries in /run/host/...

@angiglesias

angiglesias commented Oct 27, 2022

@debarshiray With the LD_LIBRARY_PATH trick it no longer fails there, but it gives a new error a bit later, because I force loading the libs from the host and toolbox seems to be invoking binaries inside the container:

[root@toolbox /]# LD_LIBRARY_PATH=/run/host/lib64/:/lib64 toolbox --log-level debug init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link
DEBU Running as real user ID 0                    
DEBU Resolved absolute path to the executable as /usr/bin/toolbox 
DEBU TOOLBOX_PATH is /usr/bin/toolbox             
DEBU Migrating to newer Podman                    
DEBU Setting up configuration                     
DEBU Setting up configuration: file /etc/containers/toolbox.conf not found 
DEBU Setting up configuration: file /var/home/test-vm/.config/containers/toolbox.conf not found 
DEBU Resolving image name                         
DEBU Distribution (CLI): ''                       
DEBU Image (CLI): ''                              
DEBU Release (CLI): ''                            
DEBU Resolved image name                          
DEBU Image: 'fedora-toolbox:35'                   
DEBU Release: '35'                                
DEBU Resolving container name                     
DEBU Container: ''                                
DEBU Image: 'fedora-toolbox:35'                   
DEBU Release: '35'                                
DEBU Resolved container name                      
DEBU Container: 'fedora-toolbox-35'               
DEBU Creating /run/.toolboxenv                    
DEBU Monitoring host                              
DEBU Path /run/host/etc exists                    
DEBU Resolved /etc/localtime to /run/host/usr/share/zoneinfo/Europe/Madrid 
DEBU Creating regular file /etc/machine-id        
DEBU Binding /etc/machine-id to /run/host/etc/machine-id 
mount: /run/host/lib64/libc.so.6: version `GLIBC_2.34' not found (required by mount)
Error: failed to bind /etc/machine-id to /run/host/etc/machine-id

@angiglesias

@debarshiray Another update: preloading libnss_files.so.2 from the host does the trick and works well:

[root@toolbox /]# LD_PRELOAD=/run/host/lib64/libnss_files.so.2 toolbox --log-level debug init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link
DEBU Running as real user ID 0                    
DEBU Resolved absolute path to the executable as /usr/bin/toolbox 
DEBU TOOLBOX_PATH is /usr/bin/toolbox             
DEBU Migrating to newer Podman                    
DEBU Setting up configuration                     
DEBU Setting up configuration: file /etc/containers/toolbox.conf not found 
DEBU Setting up configuration: file /var/home/test-vm/.config/containers/toolbox.conf not found 
DEBU Resolving image name                         
DEBU Distribution (CLI): ''                       
DEBU Image (CLI): ''                              
DEBU Release (CLI): ''                            
DEBU Resolved image name                          
DEBU Image: 'fedora-toolbox:35'                   
DEBU Release: '35'                                
DEBU Resolving container name                     
DEBU Container: ''                                
DEBU Image: 'fedora-toolbox:35'                   
DEBU Release: '35'                                
DEBU Resolved container name                      
DEBU Container: 'fedora-toolbox-35'               
DEBU Creating /run/.toolboxenv                    
DEBU Monitoring host                              
DEBU Path /run/host/etc exists                    
DEBU Preparing to redirect /etc/host.conf to /run/host/etc/host.conf 
DEBU /run/host/etc/host.conf isn't a symbolic link 
DEBU Redirecting /etc/host.conf to /run/host/etc/host.conf 
DEBU Preparing to redirect /etc/hosts to /run/host/etc/hosts 
DEBU /run/host/etc/hosts isn't a symbolic link    
DEBU Redirecting /etc/hosts to /run/host/etc/hosts 
DEBU Preparing to redirect /etc/localtime to /run/host/etc/localtime 
DEBU /run/host/etc/localtime is a symbolic link   
DEBU Redirecting /etc/localtime to /run/host/etc/localtime 
DEBU Resolved /etc/localtime to /run/host/usr/share/zoneinfo/Europe/Madrid 
DEBU Preparing to redirect /etc/resolv.conf to /run/host/etc/resolv.conf 
DEBU /run/host/etc/resolv.conf isn't a symbolic link 
DEBU Redirecting /etc/resolv.conf to /run/host/etc/resolv.conf 
DEBU Creating regular file /etc/machine-id        
DEBU Binding /etc/machine-id to /run/host/etc/machine-id 
DEBU Creating directory /run/systemd/journal      
DEBU Binding /run/systemd/journal to /run/host/run/systemd/journal 
DEBU Creating directory /run/systemd/resolve      
DEBU Binding /run/systemd/resolve to /run/host/run/systemd/resolve 
DEBU Creating directory /run/udev/data            
DEBU Binding /run/udev/data to /run/host/run/udev/data 
DEBU Creating directory /tmp                      
DEBU Binding /tmp to /run/host/tmp                
DEBU Creating directory /var/lib/flatpak          
DEBU Binding /var/lib/flatpak to /run/host/var/lib/flatpak 
DEBU Creating directory /var/lib/systemd/coredump 
DEBU Binding /var/lib/systemd/coredump to /run/host/var/lib/systemd/coredump 
DEBU Creating directory /var/log/journal          
DEBU Binding /var/log/journal to /run/host/var/log/journal 
DEBU Creating directory /var/mnt                  
DEBU Binding /var/mnt to /run/host/var/mnt        
DEBU Creating directory /sys/fs/selinux           
DEBU Binding /sys/fs/selinux to /usr/share/empty  
DEBU Preparing to redirect /media to /run/media   
DEBU /run/media isn't a symbolic link             
DEBU Redirecting /media to /run/media             
DEBU Preparing to redirect /mnt to /var/mnt       
DEBU /var/mnt isn't a symbolic link               
DEBU Redirecting /mnt to /var/mnt                 
DEBU Preparing to redirect /home to /var/home     
DEBU /var/home isn't a symbolic link              
DEBU Redirecting /home to /var/home               
DEBU Looking up group for sudo                    
DEBU Group for sudo is wheel                      
DEBU Modifying user test-vm with UID 1000:        
DEBU usermod                                      
DEBU --append                                     
DEBU --groups                                     
DEBU wheel                                        
DEBU --home                                       
DEBU /var/home/test-vm                            
DEBU --shell                                      
DEBU /bin/bash                                    
DEBU --uid                                        
DEBU 1000                                         
DEBU test-vm                                      
DEBU Removing password for user test-vm           
DEBU Removing password for user root              
passwd: Note: deleting a password also unlocks the password.
DEBU Setting KCM as the default Kerberos credential cache 
DEBU Configuring RPM to ignore bind mounts        
DEBU Setting up daily ticker                      
DEBU Setting up watches for file system events    
DEBU Finished initializing container              
DEBU Creating runtime directory /run/user/1000/toolbox 
DEBU Creating initialization stamp /run/user/1000/toolbox/container-initialized-8016 
DEBU Listening to file system and ticker events   
WARN Failed to run updatedb(8): updatedb(1) not found

@angiglesias

@debarshiray By any chance, did you have time to look at these dumps? Were they helpful? If it helps, I can dig up more data. I just didn't want to add more noise to this issue. If there's anything I can help with, please let me know.

@debarshiray
Member

Hey, @angiglesias, my apologies, I didn't have time to get back to this, but thanks for continuing to dig into it.

This is expected:

[root@toolbox /]# LD_LIBRARY_PATH=/run/host/lib64/:/lib64 toolbox --log-level debug init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link
...
...     
DEBU Binding /etc/machine-id to /run/host/etc/machine-id 
mount: /run/host/lib64/libc.so.6: version `GLIBC_2.34' not found (required by mount)
Error: failed to bind /etc/machine-id to /run/host/etc/machine-id

... because Fedora 35 inside the container uses glibc-2.34. Binaries, in this case mount(8), that were built against it can't be run against the Fedora 34 host with glibc-2.33.
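
The rule behind that mount(8) error can be sketched in a few lines of Go. This is a simplification under stated assumptions: real glibc symbol versioning is checked per symbol, while this sketch compares only the newest version tag a libc provides with the tag a binary asks for. The function names parseGlibcVersion and provides are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGlibcVersion turns a version tag such as "GLIBC_2.34" into its
// numeric components, here 2 and 34.
func parseGlibcVersion(tag string) ([]int, error) {
	if !strings.HasPrefix(tag, "GLIBC_") {
		return nil, fmt.Errorf("not a glibc version tag: %q", tag)
	}

	var parts []int
	for _, field := range strings.Split(strings.TrimPrefix(tag, "GLIBC_"), ".") {
		n, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("not a glibc version tag: %q", tag)
		}
		parts = append(parts, n)
	}
	return parts, nil
}

// provides reports whether a libc whose newest version tag is available
// can satisfy a binary that requires the tag required.
func provides(available, required string) (bool, error) {
	a, err := parseGlibcVersion(available)
	if err != nil {
		return false, err
	}
	r, err := parseGlibcVersion(required)
	if err != nil {
		return false, err
	}

	// Compare component by component, treating missing components as 0.
	for i := 0; i < len(a) || i < len(r); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(r) {
			y = r[i]
		}
		if x != y {
			return x > y, nil
		}
	}
	return true, nil
}

func main() {
	// Host libc (Fedora 34) against a binary from the Fedora 35 container.
	ok, err := provides("GLIBC_2.33", "GLIBC_2.34")
	fmt.Println(ok, err) // prints "false <nil>"
}
```

The reverse direction works: a binary that needs GLIBC_2.33 can run against a libc that provides GLIBC_2.34, which is why an older host binary generally runs against a newer libc but not the other way round.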

@debarshiray
Member

Thanks for the strace(1) log!

I see that toolbox(1) inside the container is linking against libpthread.so and libc.so from the host:

openat(AT_FDCWD, "/run/host/usr/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
...
openat(AT_FDCWD, "/run/host/usr/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3

... but it's using the Name Service Switch (or NSS) plugins from the container:

openat(AT_FDCWD, "/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3

... which is a problem. I suspect this is another fallout from DT_RUNPATH not being transitive. The toolbox(1) binary's DT_RUNPATH doesn't affect libc.so loading those NSS plugins.

And the sss plugin seems to be missing from the container:

openat(AT_FDCWD, "/usr/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

It makes me curious if the sss plugin was introduced in Fedora 35 and wonder if there are changes in /etc/nsswitch.conf between Fedora 34 and 35.
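
To compare the two releases, it helps to list which plugins a given nsswitch.conf makes glibc look for. The sketch below derives the libnss_<service>.so.2 names from a database line. The function name nssPlugins is made up for illustration, and the sample configuration in main is invented, not the actual Fedora file.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nssPlugins returns the shared object names that would be looked up for
// database (for example "passwd"), given the text of an nsswitch.conf file.
func nssPlugins(conf, database string) []string {
	var plugins []string

	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		line := scanner.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}

		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 || strings.TrimSpace(parts[0]) != database {
			continue
		}

		inAction := false
		for _, token := range strings.Fields(parts[1]) {
			// Skip actions such as [NOTFOUND=return]; they are not services.
			if strings.HasPrefix(token, "[") {
				inAction = true
			}
			if inAction {
				if strings.HasSuffix(token, "]") {
					inAction = false
				}
				continue
			}
			plugins = append(plugins, "libnss_"+token+".so.2")
		}
	}
	return plugins
}

func main() {
	// Invented sample; read the real /etc/nsswitch.conf to compare releases.
	sample := "passwd: sss files systemd\ngroup: files\n"
	for _, plugin := range nssPlugins(sample, "passwd") {
		fmt.Println(plugin)
	}
}
```

Running it against /etc/nsswitch.conf from the host and from the container would show whether the two ask for different plugins.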

@debarshiray
Member

We can always use DT_RPATH instead of DT_RUNPATH because it's transitive. However, it makes me cringe because DT_RPATH is supposed to be deprecated in favour of DT_RUNPATH.

@angiglesias

Thanks for the strace(1) log!

I see that toolbox(1) inside the container is linking against libpthread.so and libc.so from the host:

openat(AT_FDCWD, "/run/host/usr/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
...
openat(AT_FDCWD, "/run/host/usr/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3

... but it's using the Name Service Switch (or NSS) plugins from the container:

openat(AT_FDCWD, "/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3

... which is a problem. I suspect this is another fallout from DT_RUNPATH not being transitive. The toolbox(1) binary's DT_RUNPATH doesn't affect libc.so loading those NSS plugins.

And the sss plugin seems to be missing from the container:

openat(AT_FDCWD, "/usr/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

It makes me curious if the sss plugin was introduced in Fedora 35 and wonder if there are changes in /etc/nsswitch.conf between Fedora 34 and 35.

From here, it looks like there was a package providing this plugin in Fedora 34, but maybe it wasn't part of the default installation.

@angiglesias

angiglesias commented Mar 21, 2023

@debarshiray I've been poking around a little bit. When I force the use of the libs found in /run/host/... at runtime:

LD_LIBRARY_PATH=/run/host/lib64/:/run/host/lib toolbox --log-level debug init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link

Then the commands invoked from initContainer.go, such as mount and usermod, fail because ld tries to load the older glibc from the host. Would it be reasonable to implement a fallback that tries to exec those commands from the host when the container's local command fails?

@angiglesias

@debarshiray @HarryMichal Coming back to this issue: after reviewing ld.so(8), I think we can solve this problem using LD_LIBRARY_PATH with minimal changes. That avoids the issues derived from RUNPATH not being transitive, without resorting to the deprecated RPATH:

--- a/src/pkg/shell/shell.go
+++ b/src/pkg/shell/shell.go
@@ -22,6 +22,7 @@ import (
        "io"
        "os"
        "os/exec"
+       "strings"
 
        "github.com/sirupsen/logrus"
 )
@@ -48,6 +49,17 @@ func RunWithExitCode(name string, stdin io.Reader, stdout, stderr io.Writer, arg
        cmd.Stdout = stdout
        cmd.Stderr = stderr
 
+       // Get current process environment
+       cmd.Env = os.Environ()
+       // Delete injected LD_LIBRARY_PATH to use local container runtime environment
+       for i, entry := range cmd.Env {
+               if strings.HasPrefix(entry, "LD_LIBRARY_PATH=") {
+                       copy(cmd.Env[i:], cmd.Env[i+1:])
+                       cmd.Env = cmd.Env[:len(cmd.Env)-1]
+                       break
+               }
+       }
+
        if err := cmd.Run(); err != nil {
                if errors.Is(err, exec.ErrNotFound) {
                        return 1, fmt.Errorf("%s(1) not found", name)

And invoking init-container specifying the LD_LIBRARY_PATH pointing to the host libs:

test-vm@silverblue34 ~]$ podman run -it --rm --dns none --env TOOLBOX_PATH=/usr/bin/toolbox --env XDG_RUNTIME_DIR=/run/user/1000 --hostname toolbox --ipc host --label com.github.containers.toolbox=true --mount type=devpts,destination=/dev/pts --name fedora-toolbox-35-manual --network host --no-hosts --pid host --privileged --security-opt label=disable --ulimit host --userns keep-id --user root:root --volume /:/run/host:rslave --volume /dev:/dev:rslave --volume /run/dbus/system_bus_socket:/run/dbus/system_bus_socket --volume /var/home/test-vm:/var/home/test-vm:rslave --volume /var/home/test-vm/Documentos/toolbox/builddir/src/toolbox:/usr/bin/toolbox:ro --volume /run/user/1000:/run/user/1000 --volume /run/avahi-daemon/socket:/run/avahi-daemon/socket --volume /run/.heim_org.h5l.kcm-socket:/run/.heim_org.h5l.kcm-socket --volume /run/pcscd/pcscd.comm:/run/pcscd/pcscd.comm --volume /run/media:/run/media:rslave --volume /etc/profile.d/toolbox.sh:/etc/profile.d/toolbox.sh:ro registry.fedoraproject.org/fedora-toolbox:35 /bin/bash
bash-5.1# LD_LIBRARY_PATH=/run/host/lib64/:/run/host/lib toolbox --log-level debug init-container --gid 1000 --home /var/home/test-vm --shell /bin/bash --uid 1000 --user test-vm --monitor-host --home-link --media-link --mnt-link
DEBU Running as real user ID 0                    
DEBU Resolved absolute path to the executable as /usr/bin/toolbox 
DEBU TOOLBOX_PATH is /usr/bin/toolbox             
DEBU Migrating to newer Podman                    
DEBU Setting up configuration                     
DEBU Setting up configuration: file /etc/containers/toolbox.conf not found 
DEBU Setting up configuration: file //.config/containers/toolbox.conf not found 
DEBU Resolving image name                         
DEBU Distribution (CLI): ''                       
DEBU Image (CLI): ''                              
DEBU Release (CLI): ''                            
DEBU Resolved image name                          
DEBU Image: 'fedora-toolbox:35'                   
DEBU Release: '35'                                
DEBU Resolving container name                     
DEBU Container: ''                                
DEBU Image: 'fedora-toolbox:35'                   
DEBU Release: '35'                                
DEBU Resolved container name                      
DEBU Container: 'fedora-toolbox-35'               
DEBU Creating /run/.toolboxenv                    
DEBU Monitoring host                              
DEBU Path /run/host/etc exists                    
DEBU Preparing to redirect /etc/host.conf to /run/host/etc/host.conf 
DEBU /run/host/etc/host.conf isn't a symbolic link 
DEBU Redirecting /etc/host.conf to /run/host/etc/host.conf 
DEBU Preparing to redirect /etc/hosts to /run/host/etc/hosts 
DEBU /run/host/etc/hosts isn't a symbolic link    
DEBU Redirecting /etc/hosts to /run/host/etc/hosts 
DEBU Preparing to redirect /etc/localtime to /run/host/etc/localtime 
DEBU /run/host/etc/localtime is a symbolic link   
DEBU Redirecting /etc/localtime to /run/host/etc/localtime 
DEBU Resolved /etc/localtime to /run/host/usr/share/zoneinfo/Europe/Madrid 
DEBU Preparing to redirect /etc/resolv.conf to /run/host/etc/resolv.conf 
DEBU /run/host/etc/resolv.conf isn't a symbolic link 
DEBU Redirecting /etc/resolv.conf to /run/host/etc/resolv.conf 
DEBU Creating regular file /etc/machine-id        
DEBU Binding /etc/machine-id to /run/host/etc/machine-id 
DEBU Creating directory /run/systemd/journal      
DEBU Binding /run/systemd/journal to /run/host/run/systemd/journal 
DEBU Creating directory /run/systemd/resolve      
DEBU Binding /run/systemd/resolve to /run/host/run/systemd/resolve 
DEBU Creating directory /run/udev/data            
DEBU Binding /run/udev/data to /run/host/run/udev/data 
DEBU Creating directory /tmp                      
DEBU Binding /tmp to /run/host/tmp                
DEBU Creating directory /var/lib/flatpak          
DEBU Binding /var/lib/flatpak to /run/host/var/lib/flatpak 
DEBU Creating directory /var/lib/systemd/coredump 
DEBU Binding /var/lib/systemd/coredump to /run/host/var/lib/systemd/coredump 
DEBU Creating directory /var/log/journal          
DEBU Binding /var/log/journal to /run/host/var/log/journal 
DEBU Creating directory /var/mnt                  
DEBU Binding /var/mnt to /run/host/var/mnt        
DEBU Creating directory /sys/fs/selinux           
DEBU Binding /sys/fs/selinux to /usr/share/empty  
DEBU Preparing to redirect /media to /run/media   
DEBU /run/media isn't a symbolic link             
DEBU Redirecting /media to /run/media             
DEBU Preparing to redirect /mnt to /var/mnt       
DEBU /var/mnt isn't a symbolic link               
DEBU Redirecting /mnt to /var/mnt                 
DEBU Preparing to redirect /home to /var/home     
DEBU /var/home isn't a symbolic link              
DEBU Redirecting /home to /var/home               
DEBU Looking up group for sudo                    
DEBU Group for sudo is wheel                      
DEBU Modifying user test-vm with UID 1000:        
DEBU usermod                                      
DEBU --append                                     
DEBU --groups                                     
DEBU wheel                                        
DEBU --home                                       
DEBU /var/home/test-vm                            
DEBU --shell                                      
DEBU /bin/bash                                    
DEBU --uid                                        
DEBU 1000                                         
DEBU test-vm                                      
DEBU Removing password for user test-vm           
DEBU Removing password for user root              
passwd: Note: deleting a password also unlocks the password.
DEBU Setting KCM as the default Kerberos credential cache 
DEBU Setting up daily ticker                      
DEBU Setting up watches for file system events    
DEBU Finished initializing container              
DEBU Creating runtime directory /run/user/1000/toolbox 
DEBU Creating initialization stamp /run/user/1000/toolbox/container-initialized-19860 
DEBU Listening to file system and ticker events   
WARN Failed to run updatedb(8): updatedb(1) not found
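
The environment filtering from the patch above can also be written as a standalone helper, which makes it easy to test on its own. This is a minimal sketch, not the toolbox code: the name withoutEnv is made up, and env(1) is only used to display what the child process receives.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withoutEnv returns a copy of env with every "name=value" entry for name
// removed. Matching on "name=" keeps variables that merely share the
// prefix, and the input slice is left untouched.
func withoutEnv(env []string, name string) []string {
	prefix := name + "="

	kept := make([]string, 0, len(env))
	for _, entry := range env {
		if !strings.HasPrefix(entry, prefix) {
			kept = append(kept, entry)
		}
	}
	return kept
}

func main() {
	// The child sees the parent's environment minus LD_LIBRARY_PATH, so it
	// resolves its libraries the way the container normally would.
	cmd := exec.Command("env")
	cmd.Env = withoutEnv(os.Environ(), "LD_LIBRARY_PATH")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Building a new slice instead of editing cmd.Env in place costs one allocation, but it also removes duplicate entries and avoids modifying a slice while iterating over it.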

@HarryMichal HarryMichal added the 2. Container Initialization Related to setting up the container environment, libc-compatibiltiy and such label Dec 3, 2023