Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Ensure binaries built on Fedora 33 run on Fedoras 32 & 31 #534

Merged

Conversation

debarshiray
Copy link
Member

@debarshiray debarshiray commented Aug 21, 2020

The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available.

The Go implementation also mostly worked so far because it's largely
statically linked, with the notable exception of the standard C
library. However, recently glibc-2.32, which is used by Fedora 33
onwards, added a new version of the pthread_sigmask symbol [1] as part
of the libpthread removal project:

  $ objdump -T /usr/bin/toolbox | grep GLIBC_2.32
  0000000000000000      DO *UND*        0000000000000000  GLIBC_2.32
    pthread_sigmask

This means that /usr/bin/toolbox binaries built against glibc-2.32 on
newer Fedoras pick up the latest version of the symbol and fail to run
against older glibcs in older Fedoras.

One way to fix this is to disable the use of any C code from Go by
using the CGO_ENABLED environment variable [2]. However, this can
negatively impact packages like "os/user" [3] and "net" [4], where the
more featureful glibc APIs will be replaced by more limited
equivalents written only in Go.

Instead, since glibc uses symbol versioning, it's better to tell the
Go toolchain to avoid linking against any symbols from glibc-2.32.

This was accomplished by a few linker tricks:

  • The GNU ld linker's --wrap flag was used when building the Go code
    to divert pthread_sigmask invocations from Go to another function
    called __wrap_pthread_sigmask.

  • A static library was added to provide this __wrap_pthread_sigmask
    function, which forwards calls to the actual pthread_sigmask API in
    glibc. This library itself was not linked with --wrap, and
    specifies the latest permissible version of the pthread_sigmask
    symbol from glibc for each architecture. Currently, the list of
    architectures covers the ones that Fedora builds for.

  • The Go cmd/link linker was switched to external mode [5]. This
    ensures that the final object file containing all the Go code gets
    linked to the standard C library and the wrapper static library by
    the GNU ld linker for the --wrap flag to kick in.

Based on ideas from Ondřej Míchal.

[1] glibc commit c6663fee4340291c
https://sourceware.org/git/?p=glibc.git;a=commit;h=c6663fee4340291c

[2] https://golang.org/cmd/cgo/

[3] https://golang.org/pkg/os/user/

[4] https://golang.org/pkg/net/

[5] https://golang.org/src/cmd/cgo/doc.go

#529

@softwarefactory-project-zuul
Copy link

Build failed.

@debarshiray
Copy link
Member Author

Here's a scratch build for Fedora 34: https://koji.fedoraproject.org/koji/taskinfo?taskID=49760645

I extracted the /usr/bin/toolbox binary from the Fedora 34 build (against glibc master) and was able to use it on a Fedora 31 host to create and run containers. I also built the code on Fedora 31 (against glibc-2.31) and used it on the same host to create and run containers.

The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available.

The Go implementation also mostly worked so far because it's largely
statically linked, with the notable exception of the standard C
library. However, recently glibc-2.32, which is used by Fedora 33
onwards, added a new version of the pthread_sigmask symbol [1] as part
of the libpthread removal project:
  $ objdump -T /usr/bin/toolbox | grep GLIBC_2.32
  0000000000000000      DO *UND*	0000000000000000  GLIBC_2.32
    pthread_sigmask

This means that /usr/bin/toolbox binaries built against glibc-2.32 on
newer Fedoras pick up the latest version of the symbol and fail to run
against older glibcs in older Fedoras.

One way to fix this is to disable the use of any C code from Go by
using the CGO_ENABLED environment variable [2]. However, this can
negatively impact packages like "os/user" [3] and "net" [4], where the
more featureful glibc APIs will be replaced by more limited
equivalents written only in Go.

Instead, since glibc uses symbol versioning, it's better to tell the
Go toolchain to avoid linking against any symbols from glibc-2.32.

This was accomplished by a few linker tricks:

  * The GNU ld linker's --wrap flag was used when building the Go code
    to divert pthread_sigmask invocations from Go to another function
    called __wrap_pthread_sigmask.

  * A static library was added to provide this __wrap_pthread_sigmask
    function, which forwards calls to the actual pthread_sigmask API in
    glibc. This library itself was not linked with --wrap, and
    specifies the latest permissible version of the pthread_sigmask
    symbol from glibc for each architecture. Currently, the list of
    architectures covers the ones that Fedora builds for.

  * The Go cmd/link linker was switched to external mode [5]. This
    ensures that the final object file containing all the Go code gets
    linked to the standard C library and the wrapper static library by
    the GNU ld linker for the --wrap flag to kick in.

Based on ideas from Ondřej Míchal.

[1] glibc commit c6663fee4340291c
    https://sourceware.org/git/?p=glibc.git;a=commit;h=c6663fee4340291c

[2] https://golang.org/cmd/cgo/

[3] https://golang.org/pkg/os/user/

[4] https://golang.org/pkg/net/

[5] https://golang.org/src/cmd/cgo/doc.go

containers#529
@debarshiray debarshiray force-pushed the wip/rishi/avoid-glibc-2.32 branch from 3996f46 to 6ad9c63 Compare August 21, 2020 14:30
@debarshiray debarshiray merged commit 6ad9c63 into containers:master Aug 21, 2020
@debarshiray debarshiray deleted the wip/rishi/avoid-glibc-2.32 branch August 21, 2020 14:31
@debarshiray
Copy link
Member Author

I merged this because it seems that Rawhide is pretty turbulent right now (there's also containers/podman#7389), so it's proving difficult to get people to test.

Let's see how it holds up in the wild.

@juanje
Copy link
Contributor

juanje commented Aug 21, 2020

I just tried on Fedora 34 and I can confirm that it works.

[vagrant@ci-node-34 toolbox]$ toolbox create -r 32
Image required to create toolbox container.
Download registry.fedoraproject.org/f32/fedora-toolbox:32 (500MB)? [y/N]: y
Created container: fedora-toolbox-32
Enter with: toolbox enter --release 32
[vagrant@ci-node-34 toolbox]$ toolbox enter --release 32
/usr/bin/id: cannot find name for group ID 1000
⬢[vagrant@toolbox toolbox]$

The error before entering the container (/usr/bin/id: cannot find name for group ID 1000) is already known (#523), so can be ignored, but now I can enter into an earlier version.

The toolbox's version is the current one from master.

Thanks for the fix 😄

@debarshiray
Copy link
Member Author

Thanks for the feedback, @juanje !

debarshiray added a commit to debarshiray/toolbox that referenced this pull request Oct 21, 2021
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.

The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    __libc_start_main
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_detach
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_create
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_attr_getstacksize

This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.

Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.

Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.

Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.

[1] Commit 6ad9c63
    containers#534

[2] glibc commit 035c012e32c11e84
    https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
    https://sourceware.org/bugzilla/show_bug.cgi?id=23323

containers#821
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Oct 21, 2021
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.

The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    __libc_start_main
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_detach
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_create
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_attr_getstacksize

This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.

Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.

Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.

Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.

[1] Commit 6ad9c63
    containers#534

[2] glibc commit 035c012e32c11e84
    https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
    https://sourceware.org/bugzilla/show_bug.cgi?id=23323

containers#821
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Oct 21, 2021
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.

The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    __libc_start_main
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_detach
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_create
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_attr_getstacksize

This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.

Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.

Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.

Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.

[1] Commit 6ad9c63
    containers#534

[2] glibc commit 035c012e32c11e84
    https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
    https://sourceware.org/bugzilla/show_bug.cgi?id=23323

containers#821
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Oct 21, 2021
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.

The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    __libc_start_main
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_detach
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_create
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_attr_getstacksize

This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.

Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.

Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.

Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.

[1] Commit 6ad9c63
    containers#534

[2] glibc commit 035c012e32c11e84
    https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
    https://sourceware.org/bugzilla/show_bug.cgi?id=23323

containers#821
HarryMichal pushed a commit to HarryMichal/toolbox that referenced this pull request Oct 21, 2021
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.

The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    __libc_start_main
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_detach
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_create
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_attr_getstacksize

This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.

Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.

Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.

Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.

[1] Commit 6ad9c63
    containers#534

[2] glibc commit 035c012e32c11e84
    https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
    https://sourceware.org/bugzilla/show_bug.cgi?id=23323

containers#821
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Oct 21, 2021
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.

The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    __libc_start_main
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_detach
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_create
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_attr_getstacksize

This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.

Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.

Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.

Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.

Based on ideas from Alexander Larsson and Ray Strode.

[1] Commit 6ad9c63
    containers#534

[2] glibc commit 035c012e32c11e84
    https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
    https://sourceware.org/bugzilla/show_bug.cgi?id=23323

containers#821
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Oct 21, 2021
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.

In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.

The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
  $ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    __libc_start_main
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_detach
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_create
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.34
    pthread_attr_getstacksize

This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.

Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.

Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.

Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.

Based on ideas from Alexander Larsson and Ray Strode.

[1] Commit 6ad9c63
    containers#534

[2] glibc commit 035c012e32c11e84
    https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84
    https://sourceware.org/bugzilla/show_bug.cgi?id=23323

containers#821
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants