Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/usr/bin/toolbox linked against glibc-2.32 doesn't run on older glibc #529

Closed
juanje opened this issue Aug 14, 2020 · 7 comments
Closed
Labels
1. Bug Something isn't working

Comments

@juanje
Copy link
Contributor

juanje commented Aug 14, 2020

Describe the bug

Usually, you can run another Fedora release with toolbox by doing:

# On Fedora 32
$ toolbox create --release 31
Created container: fedora-toolbox-31
Enter with: toolbox enter --release 31
$ toolbox enter --release 31

But at Fedora Rawhide you get the following error:

# On Fedora 33
$ toolbox create --release 32
Created container: fedora-toolbox-32
Enter with: toolbox enter --release 32
$ toolbox enter --release 32
Error: invalid entry point PID of container fedora-toolbox-32

The full debug output:

$ toolbox -v enter -r 32
DEBU Running as real user ID 1000                 
DEBU Resolved absolute path to the executable as /usr/local/bin/toolbox 
DEBU Running on a cgroups v2 host                 
DEBU Checking if /etc/subgid and /etc/subuid have entries for user vagrant 
DEBU TOOLBOX_PATH is /usr/local/bin/toolbox       
DEBU Toolbox config directory is /home/vagrant/.config/toolbox 
DEBU Current Podman version is 2.1.0-dev          
DEBU Old Podman version is 2.1.0-dev              
DEBU Migration not needed: Podman version 2.1.0-dev is unchanged 
DEBU Resolving container and image names          
DEBU Container: ''                                
DEBU Image: ''                                    
DEBU Release: '32'                                
DEBU Resolved container and image names           
DEBU Container: 'fedora-toolbox-32'               
DEBU Image: 'fedora-toolbox:32'                   
DEBU Release: '32'                                
DEBU Checking if container fedora-toolbox-32 exists 
DEBU Calling org.freedesktop.Flatpak.SessionHelper.RequestSession 
DEBU Starting container fedora-toolbox-32         
DEBU Inspecting entry point of container fedora-toolbox-32 
DEBU Entry point PID is a float64                 
DEBU Entry point of container fedora-toolbox-32 is toolbox (PID=0) 
Error: invalid entry point PID of container fedora-toolbox-32

Steps how to reproduce the behaviour

  1. Enter into a Fedora Rawhide/33 system.
  2. Compile and install the current Toolbox versions from the source.
  3. Create a new container from the previoues release (32) with:
$ toolbox create --release 32
  1. Try to enter at that container by typing:
$ toolbox enter --release 32
  1. It fails.

Expected behaviour
That toolbox would enter the container normally.

Actual behaviour
It fails to perform enter on the container.

Output of toolbox --version (v0.0.90+)
toolbox version 0.0.93

Toolbox package info (rpm -q toolbox)
It was installed from the sources, no toolbox package installed.

Output of podman version

Version:      2.1.0-dev
API Version:  1
Go Version:   go1.15rc2
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64

Podman package info (rpm -q podman)
podman-2.1.0-0.169.dev.git162625f.fc33.x86_64`

Info about your OS
I tested in a virtual machine with Vagrant. The OS is Fedora 33.
It was a fresh installation from today (August 14th) and with all the packages updated.

Additional context
I tried with the releases 31 and 32. The release 33 (the same that the host) was working fine. Well, apart for the bug: #523
Also, I tied the same at another VM with Fedora 32 and it worked fine. I tried at that F32 VM with the releases 29, 31, 32 and 33, with no problems.
I notice though, that the entry point PID, the one from the error, was different. At the Rawhide system the PID was always 0, but at my system (Silverblue 32) and the VM (Fedora 32) were always a non-zero value. Something like PID=32612 and such.

For example (inside Fedora 32 VM):

$ toolbox -v enter -r 29
DEBU Running as real user ID 1000                 
DEBU Resolved absolute path to the executable as /usr/local/bin/toolbox 
DEBU Running on a cgroups v2 host                 
DEBU Checking if /etc/subgid and /etc/subuid have entries for user vagrant 
DEBU TOOLBOX_PATH is /usr/local/bin/toolbox       
DEBU Toolbox config directory is /home/vagrant/.config/toolbox 
DEBU Current Podman version is 2.0.2              
DEBU Old Podman version is 2.0.2                  
...
DEBU Starting container fedora-toolbox-29         
DEBU Inspecting entry point of container fedora-toolbox-29 
DEBU Entry point PID is a float64                 
DEBU Entry point of container fedora-toolbox-29 is toolbox (PID=33068) 
DEBU Waiting for container fedora-toolbox-29 to finish initializing 
...
DEBU --                                           
DEBU -c                                           
DEBU exec "$@"                                    
DEBU /bin/sh                                      
DEBU /bin/bash                                    
DEBU -l
@juanje juanje added the 1. Bug Something isn't working label Aug 14, 2020
@debarshiray
Copy link
Member

DEBU Entry point of container fedora-toolbox-32 is toolbox (PID=0)
Error: invalid entry point PID of container fedora-toolbox-32

This looks like the container failed to start.

Once you have attempted to enter and it failed, could you please try this command and post the logs from it:

$ podman start --attach fedora-toolbox-32

@juanje
Copy link
Contributor Author

juanje commented Aug 19, 2020

I did and this is the error message:

[vagrant@ci-node-33 ~]$ podman start --attach fedora-toolbox-32
toolbox: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by toolbox)

Here is with the debug output:

[vagrant@ci-node-33 ~]$ podman --log-level debug start --attach fedora-toolbox-32
INFO[0000] podman filtering at log level debug          
DEBU[0000] Called start.PersistentPreRunE(podman --log-level debug start --attach fedora-toolbox-32) 
DEBU[0000] Ignoring libpod.conf EventsLogger setting "/home/vagrant/.config/containers/containers.conf". Use "journald" if you want to change this setting and remove libpod.conf files. 
DEBU[0000] Reading configuration file "/usr/share/containers/containers.conf" 
DEBU[0000] Merged system config "/usr/share/containers/containers.conf": &{Containers:{Devices:[] Volumes:[] ApparmorProfile:containers-default-0.18.0 Annotations:[] CgroupNS:private Cgroups:enabled DefaultCapabilities:[CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] DefaultSysctls:[] DefaultUlimits:[] DefaultMountsFile: DNSServers:[] DNSOptions:[] DNSSearches:[] EnableLabeling:true Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] EnvHost:false HTTPProxy:false Init:false InitPath: IPCNS:private LogDriver:k8s-file LogSizeMax:-1 NetNS:slirp4netns NoHosts:false PidsLimit:2048 PidNS:private SeccompProfile:/usr/share/containers/seccomp.json ShmSize:65536k TZ: Umask:0022 UTSNS:private UserNS:host UserNSSize:65536} Engine:{CgroupCheck:true CgroupManager:systemd ConmonEnvVars:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] ConmonPath:[/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] DetachKeys:ctrl-p,ctrl-q EnablePortReservation:true Env:[] EventsLogFilePath:/run/user/1000/libpod/tmp/events/events.log EventsLogger:file HooksDir:[/usr/share/containers/oci/hooks.d] ImageDefaultTransport:docker:// InfraCommand:/pause InfraImage:k8s.gcr.io/pause:3.2 InitPath:/usr/libexec/podman/catatonit LockType:shm Namespace: NetworkCmdPath: NoPivotRoot:false NumLocks:2048 OCIRuntime:/usr/bin/crun OCIRuntimes:map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] PullPolicy:missing Remote:false RemoteURI: RemoteIdentity: ActiveService: ServiceDestinations:map[] RuntimePath:[] RuntimeSupportsJSON:[crun runc] RuntimeSupportsNoCgroups:[crun] RuntimeSupportsKVM:[kata kata-runtime kata-qemu kata-fc] SetOptions:{StorageConfigRunRootSet:false StorageConfigGraphRootSet:false StorageConfigGraphDriverNameSet:false StaticDirSet:false VolumePathSet:false TmpDirSet:false} SignaturePolicyPath:/etc/containers/policy.json SDNotify:false StateType:3 StaticDir:/home/vagrant/.local/share/containers/storage/libpod StopTimeout:10 TmpDir:/run/user/1000/libpod/tmp VolumePath:/home/vagrant/.local/share/containers/storage/volumes} Network:{CNIPluginDirs:[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] DefaultNetwork:podman NetworkConfigDir:/home/vagrant/.config/cni/net.d}} 
DEBU[0000] Using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /home/vagrant/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /home/vagrant/.local/share/containers/storage 
DEBU[0000] Using run root /run/user/1000/containers     
DEBU[0000] Using static dir /home/vagrant/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp      
DEBU[0000] Using volume path /home/vagrant/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] overlay: mount_program=/usr/bin/fuse-overlayfs 
DEBU[0000] backingFs=xfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false 
DEBU[0000] Initializing event backend file              
DEBU[0000] using runtime "/usr/bin/runc"                
DEBU[0000] using runtime "/usr/bin/crun"                
WARN[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument 
DEBU[0000] using runtime "/usr/bin/crun"                
INFO[0000] Setting parallel job count to 7              
DEBU[0000] overlay: mount_data=lowerdir=/home/vagrant/.local/share/containers/storage/overlay/l/55ZKQ2LAMO7JLCJ4FVOE6E6TBG:/home/vagrant/.local/share/containers/storage/overlay/l/2SIOJSIO6LJDSZJHEL6FUA34V2,upperdir=/home/vagrant/.local/share/containers/storage/overlay/3109f3facec37f0bec076d921731107d23a36c918a3eb24bb03685f10179801f/diff,workdir=/home/vagrant/.local/share/containers/storage/overlay/3109f3facec37f0bec076d921731107d23a36c918a3eb24bb03685f10179801f/work,context="system_u:object_r:container_file_t:s0:c14,c464" 
DEBU[0000] mounted container "e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e" at "/home/vagrant/.local/share/containers/storage/overlay/3109f3facec37f0bec076d921731107d23a36c918a3eb24bb03685f10179801f/merged" 
DEBU[0000] Created root filesystem for container e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e at /home/vagrant/.local/share/containers/storage/overlay/3109f3facec37f0bec076d921731107d23a36c918a3eb24bb03685f10179801f/merged 
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret 
DEBU[0000] Setting CGroups for container e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e to user.slice:libpod:e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e 
DEBU[0000] set root propagation to "rslave"             
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d 
DEBU[0000] Created OCI spec for container e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e at /home/vagrant/.local/share/containers/storage/overlay-containers/e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e/userdata/config.json 
DEBU[0000] /usr/bin/conmon messages will be logged to syslog 
DEBU[0000] running conmon: /usr/bin/conmon               args="[--api-version 1 -c e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e -u e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e -r /usr/bin/crun -b /home/vagrant/.local/share/containers/storage/overlay-containers/e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e/userdata -p /run/user/1000/containers/overlay-containers/e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e/userdata/pidfile -n fedora-toolbox-32 --exit-dir /run/user/1000/libpod/tmp/exits --socket-dir-path /run/user/1000/libpod/tmp/socket -s -l k8s-file:/home/vagrant/.local/share/containers/storage/overlay-containers/e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e/userdata/ctr.log --log-level debug --syslog --conmon-pidfile /run/user/1000/containers/overlay-containers/e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/vagrant/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg /usr/bin/crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e]"
INFO[0000] Running conmon under slice user.slice and unitName libpod-conmon-e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e.scope 
[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied

DEBU[0000] Received: 1684                               
INFO[0000] Got Conmon PID as 1681                       
DEBU[0000] Created container e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e in OCI runtime 
DEBU[0000] Attaching to container e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e 
DEBU[0000] connecting to socket /run/user/1000/libpod/tmp/socket/e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e/attach 
DEBU[0000] Starting container e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e with command [toolbox --verbose init-container --home /home/vagrant --monitor-host --shell /bin/bash --uid 1000 --user vagrant] 
DEBU[0000] Started container e54f97de47f50f5ffc98d514629170dce4decd308861aee4e6bccc017abfd24e 
DEBU[0000] Enabling signal proxying                     
toolbox: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by toolbox)
DEBU[0000] Called start.PersistentPostRunE(podman --log-level debug start --attach fedora-toolbox-32) 

@HarryMichal
Copy link
Member

Yep, this is not good. Basically some library pulled in by the Toolbox code uses C code requiring the use of cgo. When cgo is used during the build and a libc library is found on the system, the binary is dynamically linked to the libc library. This is normally ok but since Toolbox is used as the entry-point of containers, different versions of libc (resp. glibc) can break it. I have a proposal for a fix (#531) but I need to discuss it more with @debarshiray.

@debarshiray
Copy link
Member

Yes, as @HarryMichal mentioned, this is the problematic part:

[vagrant@ci-node-33 ~]$ podman start --attach fedora-toolbox-32
toolbox: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by toolbox)

It means that a /usr/bin/toolbox binary linked against glibc-2.32 from Fedora 33 on the host, doesn't work with an older glibc from older Fedora releases inside the container.

We end up in this situation because we bind mount /usr/bin/toolbox from the host into the container so that it can be used as the container's entry-point. This worked fine in the past when Toolbox was implemented in POSIX shell, and it would be picked up the environment's /bin/sh. However, it doesn't work so well when it's an ELF binary.

@debarshiray
Copy link
Member

I grabbed the Fedora 33 binary, unpacked it and poked at it a bit.

$ rpm2cpio ./toolbox-0.0.93-2.fc33.x86_64.rpm | cpio -idmv
...
...
$ objdump -T ./usr/bin/toolbox | grep GLIBC_2.32
0000000000000000      DO *UND*	0000000000000000  GLIBC_2.32  pthread_sigmask

Looks like there's a new implementation of pthread_sigmask in glibc-2.32.

@debarshiray debarshiray changed the title Toolbox at Rawhide (Fedora 33) doesn't run other releases /usr/bin/toolbox linked against glibc-2.32 doesn't run on older glibc Aug 20, 2020
debarshiray added a commit to debarshiray/toolbox that referenced this issue Aug 21, 2020
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available.

The Go implementation also mostly worked so far because it's largely
statically linked, with the notable exception of the standard C
library. However, recently glibc-2.32, which is used by Fedora 33
onwards, added a new version of the pthread_sigmask symbol [1] as part
of the libpthread removal project.

This means that /usr/bin/toolbox binaries built against glibc-2.32 on
newer Fedoras pick up the latest version of the symbol and fail to run
against older glibcs in older Fedoras.

One way to fix this is to disable the use of any C code from Go by
using the CGO_ENABLED environment variable [2]. However, this can
negatively impact packages like "os/user" [3] and "net" [4], where the
more featureful glibc APIs will be replaced by more limited
equivalents written only in Go.

Instead, since glibc uses symbol versioning, it's better to tell the
Go toolchain to avoid linking against any symbols from glibc-2.32.

This was accomplished by a few linker tricks:

  * The GNU ld linker's --wrap flag was used when building the Go code
    to divert pthread_sigmask invocations from Go to another function
    called __wrap_pthread_sigmask.

  * A static library was added to provide this __wrap_pthread_sigmask
    function, which forwards calls to the actual pthread_sigmask API in
    glibc. This library itself was not linked with --wrap, and
    specifies the latest permissible version of the pthread_sigmask
    symbol from glibc for each architecture. Currently, the list of
    architectures covers the ones that Fedora builds for.

  * The Go cmd/link linker was switched to external mode [5]. This
    ensures that the final object file containing all the Go code gets
    linked to the standard C library and the wrapper static library by
    the GNU ld linker for the --wrap flag to kick in.

Based on ideas from Ondřej Míchal.

[1] glibc commit c6663fee4340291c
    https://sourceware.org/git/?p=glibc.git;a=commit;h=c6663fee4340291c

[2] https://golang.org/cmd/cgo/

[3] https://golang.org/pkg/os/user/

[4] https://golang.org/pkg/net/

[5] https://golang.org/src/cmd/cgo/doc.go

containers#529
debarshiray added a commit to debarshiray/toolbox that referenced this issue Aug 21, 2020
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available.

The Go implementation also mostly worked so far because it's largely
statically linked, with the notable exception of the standard C
library. However, recently glibc-2.32, which is used by Fedora 33
onwards, added a new version of the pthread_sigmask symbol [1] as part
of the libpthread removal project:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.32
  0000000000000000      DO *UND*	0000000000000000  GLIBC_2.32
    pthread_sigmask

This means that /usr/bin/toolbox binaries built against glibc-2.32 on
newer Fedoras pick up the latest version of the symbol and fail to run
against older glibcs in older Fedoras.

One way to fix this is to disable the use of any C code from Go by
using the CGO_ENABLED environment variable [2]. However, this can
negatively impact packages like "os/user" [3] and "net" [4], where the
more featureful glibc APIs will be replaced by more limited
equivalents written only in Go.

Instead, since glibc uses symbol versioning, it's better to tell the
Go toolchain to avoid linking against any symbols from glibc-2.32.

This was accomplished by a few linker tricks:

  * The GNU ld linker's --wrap flag was used when building the Go code
    to divert pthread_sigmask invocations from Go to another function
    called __wrap_pthread_sigmask.

  * A static library was added to provide this __wrap_pthread_sigmask
    function, which forwards calls to the actual pthread_sigmask API in
    glibc. This library itself was not linked with --wrap, and
    specifies the latest permissible version of the pthread_sigmask
    symbol from glibc for each architecture. Currently, the list of
    architectures covers the ones that Fedora builds for.

  * The Go cmd/link linker was switched to external mode [5]. This
    ensures that the final object file containing all the Go code gets
    linked to the standard C library and the wrapper static library by
    the GNU ld linker for the --wrap flag to kick in.

Based on ideas from Ondřej Míchal.

[1] glibc commit c6663fee4340291c
    https://sourceware.org/git/?p=glibc.git;a=commit;h=c6663fee4340291c

[2] https://golang.org/cmd/cgo/

[3] https://golang.org/pkg/os/user/

[4] https://golang.org/pkg/net/

[5] https://golang.org/src/cmd/cgo/doc.go

containers#529
debarshiray added a commit to debarshiray/toolbox that referenced this issue Aug 21, 2020
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available.

The Go implementation also mostly worked so far because it's largely
statically linked, with the notable exception of the standard C
library. However, recently glibc-2.32, which is used by Fedora 33
onwards, added a new version of the pthread_sigmask symbol [1] as part
of the libpthread removal project:
  $ objdump -T /usr/bin/toolbox | grep GLIBC_2.32
  0000000000000000      DO *UND*	0000000000000000  GLIBC_2.32
    pthread_sigmask

This means that /usr/bin/toolbox binaries built against glibc-2.32 on
newer Fedoras pick up the latest version of the symbol and fail to run
against older glibcs in older Fedoras.

One way to fix this is to disable the use of any C code from Go by
using the CGO_ENABLED environment variable [2]. However, this can
negatively impact packages like "os/user" [3] and "net" [4], where the
more featureful glibc APIs will be replaced by more limited
equivalents written only in Go.

Instead, since glibc uses symbol versioning, it's better to tell the
Go toolchain to avoid linking against any symbols from glibc-2.32.

This was accomplished by a few linker tricks:

  * The GNU ld linker's --wrap flag was used when building the Go code
    to divert pthread_sigmask invocations from Go to another function
    called __wrap_pthread_sigmask.

  * A static library was added to provide this __wrap_pthread_sigmask
    function, which forwards calls to the actual pthread_sigmask API in
    glibc. This library itself was not linked with --wrap, and
    specifies the latest permissible version of the pthread_sigmask
    symbol from glibc for each architecture. Currently, the list of
    architectures covers the ones that Fedora builds for.

  * The Go cmd/link linker was switched to external mode [5]. This
    ensures that the final object file containing all the Go code gets
    linked to the standard C library and the wrapper static library by
    the GNU ld linker for the --wrap flag to kick in.

Based on ideas from Ondřej Míchal.

[1] glibc commit c6663fee4340291c
    https://sourceware.org/git/?p=glibc.git;a=commit;h=c6663fee4340291c

[2] https://golang.org/cmd/cgo/

[3] https://golang.org/pkg/os/user/

[4] https://golang.org/pkg/net/

[5] https://golang.org/src/cmd/cgo/doc.go

containers#529
@debarshiray
Copy link
Member

I found a way to tell the Go toolchain to avoid the new version of the pthread_sigmask symbol from glibc-2.32. See: #534

Testing welcome.

debarshiray added a commit to debarshiray/toolbox that referenced this issue Aug 21, 2020
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available.

The Go implementation also mostly worked so far because it's largely
statically linked, with the notable exception of the standard C
library. However, recently glibc-2.32, which is used by Fedora 33
onwards, added a new version of the pthread_sigmask symbol [1] as part
of the libpthread removal project:
  $ objdump -T /usr/bin/toolbox | grep GLIBC_2.32
  0000000000000000      DO *UND*	0000000000000000  GLIBC_2.32
    pthread_sigmask

This means that /usr/bin/toolbox binaries built against glibc-2.32 on
newer Fedoras pick up the latest version of the symbol and fail to run
against older glibcs in older Fedoras.

One way to fix this is to disable the use of any C code from Go by
using the CGO_ENABLED environment variable [2]. However, this can
negatively impact packages like "os/user" [3] and "net" [4], where the
more featureful glibc APIs will be replaced by more limited
equivalents written only in Go.

Instead, since glibc uses symbol versioning, it's better to tell the
Go toolchain to avoid linking against any symbols from glibc-2.32.

This was accomplished by a few linker tricks:

  * The GNU ld linker's --wrap flag was used when building the Go code
    to divert pthread_sigmask invocations from Go to another function
    called __wrap_pthread_sigmask.

  * A static library was added to provide this __wrap_pthread_sigmask
    function, which forwards calls to the actual pthread_sigmask API in
    glibc. This library itself was not linked with --wrap, and
    specifies the latest permissible version of the pthread_sigmask
    symbol from glibc for each architecture. Currently, the list of
    architectures covers the ones that Fedora builds for.

  * The Go cmd/link linker was switched to external mode [5]. This
    ensures that the final object file containing all the Go code gets
    linked to the standard C library and the wrapper static library by
    the GNU ld linker for the --wrap flag to kick in.

Based on ideas from Ondřej Míchal.

[1] glibc commit c6663fee4340291c
    https://sourceware.org/git/?p=glibc.git;a=commit;h=c6663fee4340291c

[2] https://golang.org/cmd/cgo/

[3] https://golang.org/pkg/os/user/

[4] https://golang.org/pkg/net/

[5] https://golang.org/src/cmd/cgo/doc.go

containers#529
@debarshiray
Copy link
Member

Closing.

Please feel free to leave a comment on the PRs or here if you think that it's still broken.

Thanks for the testing, by the way. Much appreciated!

debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 7, 2022
Fedora 32 reached End of Life on 25th May 2021:
https://docs.fedoraproject.org/en-US/releases/eol/

That's quite old because right now Fedora 35 is nearing its End of Life.

Since the tests are intended for Toolbx, not the Fedora infrastructure,
it will be better to use a newer image, because images that are too old
can get lost from registry.fedoraproject.org.  The fedora-toolbox:34
image can be a drop-in replacement for the fedora-toolbox:32 image for
the purposes of this test suite, and has the advantage of being newer.

Note that fedora-toolbox:34 is also old enough to test that the toolbox
binary runs against it's build-time ABI from the host, and not the
Toolbx container's ABI, when it's invoked as the entry point of the
container [1,2].  This is important because the subsequent commit will
add a test to ensure that.

[1] Commit 6063eb2
    containers#821

[2] Commit 6ad9c63
    containers#529

containers#1187
debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 7, 2022
Commit ae43560 had added a test with a similar intention.  When
the test suite is run on a Fedora Rawhide host, it tests whether the
containers for the two previous stable Fedora releases start or not.
Fedora N-2 reaches End of Life four weeks after Fedora N is released.
So, testing the containers for Fedora Rawhide and the two previous
stable releases on a Fedora Rawhide host is a decent test of general
backwards compatibility.

However, as seen recently [1], this isn't enough to catch some known
ABI compatibility issues [2,3].  These involve toolbox binaries built
on hosts with newer toolchains that aren't meant to be run against
containers with older runtimes.  A targeted test is needed to defend
against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be
unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] containers#1180

[2] Commit 6063eb2
    containers#821

[3] Commit 6ad9c63
    containers#529

https://docs.fedoraproject.org/en-US/releases/

containers#1187
debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 7, 2022
Commit ae43560 had added a test with a similar intention.  When
the test suite is run on a Fedora Rawhide host, it tests whether the
containers for the two previous stable Fedora releases start or not.
Fedora N-2 reaches End of Life four weeks after Fedora N is released.
So, testing the containers for Fedora Rawhide and the two previous
stable releases on a Fedora Rawhide host is a decent test of general
backwards compatibility.

However, as seen recently [1], this isn't enough to catch some known
ABI compatibility issues [2,3].  These involve toolbox binaries built
on hosts with newer toolchains that aren't meant to be run against
containers with older runtimes.  A targeted test is needed to defend
against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be
unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] containers#1180

[2] Commit 6063eb2
    containers#821

[3] Commit 6ad9c63
    containers#529

https://docs.fedoraproject.org/en-US/releases/

containers#1187
debarshiray added a commit to debarshiray/toolbox that referenced this issue Dec 7, 2022
Commit ae43560 had added a test with a similar intention.  When
the test suite is run on a Fedora Rawhide host, it tests whether the
containers for the two previous stable Fedora releases start or not.
Fedora N-2 reaches End of Life 4 weeks after Fedora N is released [1].
So, testing the containers for Fedora Rawhide and the two previous
stable releases on a Fedora Rawhide host is a decent test of general
backwards compatibility.

However, as seen recently [2], this isn't enough to catch some known
ABI compatibility issues [3,4].  These involve toolbox binaries built
on hosts with newer toolchains that aren't meant to be run against
containers with older runtimes.  A targeted test is needed to defend
against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be
unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] https://docs.fedoraproject.org/en-US/releases/

[2] containers#1180

[3] Commit 6063eb2
    containers#821

[4] Commit 6ad9c63
    containers#529

containers#1187
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1. Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants