Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdenv: make stage3.{gmp,mpfr,mpc,isl} do what the comment says #169378

Merged
merged 5 commits into from Jun 30, 2022
Merged

stdenv: make stage3.{gmp,mpfr,mpc,isl} do what the comment says #169378

merged 5 commits into from Jun 30, 2022

Conversation

ghost
Copy link

@ghost ghost commented Apr 20, 2022

This PR consists of three commits, which should be reviewed independently (but sequentially). You can merge any prefix of the three-commit series without breaking anything.

Motivation

The comments in stdenv/linux/default.nix disagree with what the code actually does. In particular, the comments claim that the special static builds of libgmp/libmpfr/libmpc/libisl in stage3 are "only used by GCC". This not quite correct: the "special" stage3.gmp ends up linked into the ephemeral coreutils which is built in stage4 (stage4.coreutils).

This PR fixes the code to agree with the comment and intent.

This also closes #168983 which was caused by a conflict between statically-linked libraries (libgmp.a) and dynamically-linked libraries (everything else) which were both compiled with -fstack-protector. Mixing static and shared libraries in the presence of hardening flags is not generally well-supported, and is certainly not well-tested. In particular, on some platforms -fstack-protector induces a link-time dependency on libssp, which is implemented differently for statically (libssp_nonshared.a) vs dynamically (libssp.so) linked objects; this causes link-time failures in the ephemeral stage4.coreutils.

Description of changes

The first commit c527af1 forces gmp to rebuild in stage4, using the stage4 stdenv, rather than recycling stage3.gmp.

The second commit cad68ef causes the implementation of stage3 to agree with the intent and descriptive comment, by restricting the visibility of the specially-built libraries so they are visible only to gcc-unwrapped. This will cause stage4.coreutils to use the ordinary stage4.gmp rather than the customized staticified stage3.gcc-unwrapped.gmp; because of this cad68ef requires c527af1.

The final commit 9dd7908 is strictly ergonomic; it appends "-stage4" to the name of the ephemeral coreutils that is used only for stdenv bootstrapping, so it can be distinguished easily from the final coreutils. I found this to be extremely helpful when untangling what was going on here.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • powerpc64le-linux
    • mips64el-linux
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: stdenv Standard environment label Apr 20, 2022
@ghost ghost changed the title stdenv: make it do what it says stdenv: make stage3.{gmp,mpfr,mpc,isl} it do what the comment says Apr 20, 2022
@ghost ghost changed the title stdenv: make stage3.{gmp,mpfr,mpc,isl} it do what the comment says stdenv: make stage3.{gmp,mpfr,mpc,isl} do what the comment says Apr 20, 2022
@ghost

This comment was marked as resolved.

@ghost

This comment was marked as resolved.

@ghost

This comment was marked as resolved.

@ofborg ofborg bot added 10.rebuild-linux-stdenv This PR causes stdenv to rebuild 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 501+ 10.rebuild-linux: 5001+ labels Apr 20, 2022
Copy link
Member

@sternenseemann sternenseemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall sounds sensible.

pkgs/stdenv/linux/default.nix Outdated Show resolved Hide resolved
@ghost

This comment was marked as duplicate.

1 similar comment
@ghost
Copy link
Author

ghost commented Apr 23, 2022

@ofborg eval

@ofborg ofborg bot requested review from edolstra and dasJ April 24, 2022 00:27
@ghost
Copy link
Author

ghost commented Apr 27, 2022

@ofborg eval

@ghost
Copy link
Author

ghost commented Apr 27, 2022

The aarch64-darwin builder timeout appears to not be functioning correctly.

@ghost
Copy link
Author

ghost commented Apr 27, 2022

Last commit extends the -stageX marker to the other packages that get built built twice: libmpc, libmpfr, gmp, and isl.

Copy link
Contributor

@Mindavi Mindavi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this makes sense. I personally like the -stageX indicators to easily spot any errors where these packages leak in.

Needs rebase.

Adam Joseph and others added 4 commits June 5, 2022 00:35
As explained in the comment, this ensures that stage4-coreutils does
not leak a reference to the bootstrap-files by way of libgmp.  This
will allow the next patch in this series to build stage4-coreutils
using a dynamically-linked (rather than statically-linked) libgmp.
The usage of `makeStaticLibraries` in stdenv/linux/default.nix is
prefaced by this comment:

  # Link GCC statically against GMP etc.  This makes sense because
  # these builds of the libraries are only used by GCC, so it
  # reduces the size of the stdenv closure.

However "these builds of the libraries are only used by GCC" is not
actually true.  As currently written, the stage4 coreutils links
against these customized, static-ified libraries.

Beside the fact that the code doesn't actually do what it says, this
causes other problems as well.  One example is #168983, which arises
because have a dynamically-linked binary (coreutils) which is built
from statically-linked libraries (libgmp.a); doing this causes mayhem
on platforms where `-fstack-protector` needs an auxiliary
`libssp.{so,a}` library; we end up with link failures because some
parts of the resulting binary want `libssp.so` and other parts want
`libssp_nonshared.a`.

Let's make the code actually do what the comment says, by moving these
definitions into the `gcc-unwrapped` override.  This will cause the
stage4-coreutils to link against libgmp dynamically, rather than
statically.  For this reason this commit depends on the previous
commit, which allows that to be done without creating a forbidden
reference from stdenv-final to the bootstrap-files.
During stdenv bootstrapping, coreutils is built twice.  This makes
troubleshooting very difficult, because both packages have
name="coreutils", so it is a hassle to figure out "which coreutils am
I using / is not building"?

The first of these builds is used only in stage4, and is not part of
the final stdenv.  Let's label that one with a different `name`
attribute to make it obvious which is which.
@ghost ghost marked this pull request as ready for review June 5, 2022 07:47
@ofborg ofborg bot added 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin labels Jun 5, 2022
@ghost ghost requested a review from sternenseemann June 12, 2022 20:26
@ghost
Copy link
Author

ghost commented Jun 28, 2022

Ping...

This is the last PR needed in order for stdenv to build on powerpc64le-linux.

@Mindavi
Copy link
Contributor

Mindavi commented Jun 29, 2022

Tried building libwebsockets curl gmp isl boost without any issues. Now also rebuilding without this patchset, with ca-derivations to see if it's bit-by-bit the same (it may not be for unrelated reasons, but I'm curious). In principle good to go in my opinion.

@ghost
Copy link
Author

ghost commented Jun 29, 2022

Now also rebuilding without this patchset, with ca-derivations to see if it's bit-by-bit the same (it may not be for unrelated reasons, but I'm curious).

All the RPATHs will change due to the outpath of stdenv changing. You'll definitely need to run nuke-refs on both sides before doing the comparison.

@ghost
Copy link
Author

ghost commented Jun 30, 2022

I'll try the nuke-refs approach once my build gets caught up, might not be until tomorrow.

I don't think ca-refs helps here because this PR will not leave ${stdenv.cc} bit-for-bit exactly the same as before the PR. So the outpaths of ${stdenv.cc}, and therefore of stdenv, and anything with a reference to it, will change even with ca-refs.

A nuke-refs that operates on a nar piped to standard input would be the right tool here. I might write one. I can see how being able to prove that a PR changes nothing but outpaths would improve mergeability of PRs that change things deep within stdenv.

@Mindavi
Copy link
Contributor

Mindavi commented Jun 30, 2022

The outputs created before and after applying this

before

  "old hashes (f529aef3dd8daff2a8d342736974d97e1b9adde8)",
  [
    {
      "drvPath": "/nix/store/6z3cjric4i9l7rdjyn1k9c69sh8phwzn-boost-1.77.0.drv",
      "outputs": {
        "dev": "/nix/store/89azyhlgn9qc5qgdq3fijvqfwnkky0k4-boost-1.77.0-dev",
        "out": "/nix/store/is2w1z69jn37cyn5dk0dwsx7vmi5fgkd-boost-1.77.0"
      }
    },
    {
      "drvPath": "/nix/store/iqsr3dxvv3851wwnfcj57d7kqnj2bj3x-curl-7.83.1.drv",
      "outputs": {
        "bin": "/nix/store/34syrjdr1971hgv32anz6xwkwrsandry-curl-7.83.1-bin",
        "man": "/nix/store/a8qc805dn1q520mxxh4bq3iwqd4ybk0d-curl-7.83.1-man"
      }
    },
    {
      "drvPath": "/nix/store/ijr6902bvqzdkddwb8sm572mizk73msn-gmp-with-cxx-6.2.1.drv",
      "outputs": {
        "out": "/nix/store/04amblxwih41nzslcscs6jaqzr2gjdn4-gmp-with-cxx-6.2.1"
      }
    },
    {
      "drvPath": "/nix/store/fic9vh40xcikp1s75fa76l75bxzw7b59-isl-0.20.drv",
      "outputs": {
        "out": "/nix/store/2xlhbbsh8q58xmldqzsad8cr8xwl2n1j-isl-0.20"
      }
    },
    {
      "drvPath": "/nix/store/1dxmddd93df80qdd6ljkhhv2vjympafx-libwebsockets-4.3.1.drv",
      "outputs": {
        "out": "/nix/store/w610n13mv6aplf4906wqg9vkrqaq6aan-libwebsockets-4.3.1"
      }
    }
  ]

after:

  "new hashes (02630180fad510ee877fa51112b7c7b230ef2f13)",
  [
    {
      "drvPath": "/nix/store/v4dbh7p18pka11cvd7b14w75dqm9md63-boost-1.77.0.drv",
      "outputs": {
        "out": "/nix/store/is2w1z69jn37cyn5dk0dwsx7vmi5fgkd-boost-1.77.0"
      }
    },
    {
      "drvPath": "/nix/store/pq28fqdld2chxsf2cmv9nq3jara0cbbi-curl-7.83.1.drv",
      "outputs": {
        "bin": "/nix/store/52vflqgbqqris07cscqifr920llvk80k-curl-7.83.1-bin",
        "man": "/nix/store/a8qc805dn1q520mxxh4bq3iwqd4ybk0d-curl-7.83.1-man"
      }
    },
    {
      "drvPath": "/nix/store/bc1i51b70srkcavc3gmlwydk0rcl1ja3-gmp-with-cxx-6.2.1.drv",
      "outputs": {
        "out": "/nix/store/04amblxwih41nzslcscs6jaqzr2gjdn4-gmp-with-cxx-6.2.1"
      }
    },
    {
      "drvPath": "/nix/store/78d5aqk0vm5ic6yij6ynygijqfymjhcd-isl-0.20.drv",
      "outputs": {
        "out": "/nix/store/2xlhbbsh8q58xmldqzsad8cr8xwl2n1j-isl-0.20"
      }
    },
    {
      "drvPath": "/nix/store/qig24v5jsjc9mf57a523cm66hvhaddqc-libwebsockets-4.3.1.drv",
      "outputs": {
        "out": "/nix/store/dcsk0imymx88lsqrkx0mz9hdc8zdpv0n-libwebsockets-4.3.1"
      }
    }
  ],

I did a diffoscope of curl and I think this is a case of #151475, since the only difference I see is the Build-ID that's injected. Libwebsockets differs because openssl differs, which also has separateDebugInfo and also has a different Build-ID.

Not sure about the extra boost output that's listed... But otherwise, boost, gmp, isl all give the same output, so that's reassuring.

Note that I indeed did not diff stdenv.cc, maybe that could be interesting to see too.

@Mindavi
Copy link
Contributor

Mindavi commented Jun 30, 2022

I don't see a big risk anymore that this will cause regressions, so let's go for it. Both your testing and mine is sufficient to give me confidence that this is ok and an improvement. Thanks for pushing through!

@Mindavi Mindavi merged commit 2fcdf54 into NixOS:staging Jun 30, 2022
@Mindavi
Copy link
Contributor

Mindavi commented Jul 2, 2022

I see this in perl:

old

····#·pwd·in·their·path·but·it·is·not·/nix/store/hm0rii2ps75absqv8nlb3lcl8gn2a8xl-coreutils-9.1/bin/pwd·or·/usr/nix/store/hm0rii2ps75absqv8nlb3lcl8gn2a8xl-coreutils-9.1/bin/pwd?

new:

····#·pwd·in·their·path·but·it·is·not·/nix/store/bb76llghh056xyf0jzg2nvalyaaiy5c0-coreutils-stage4-9.1/bin/pwd·or·/usr/nix/store/bb76llghh056xyf0jzg2nvalyaaiy5c0-coreutils-stage4-9.1/bin/pwd?

Seems like stage4 is leaking into stdenv? Those annotations are surely useful to find these kinds of issues :)

Path is perl-5.34.1/lib/perl5/5.34.1/x86_64-linux-thread-multi/Cwd.pm

Edit: Oh! I think that's because of the inherit in the last stage... oops?

# stage5.gcc -> stage4.coreutils -> stage3.glibc -> bootstrap
gmp = lib.makeOverridable (super.gmp.override { stdenv = self.stdenv; }).overrideAttrs (a: { pname = "${a.pname}-stage4"; });

# coreutils gets rebuilt both here and also in the final stage; we rename this one to avoid confusion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not true that it gets rebuilt in the final stage.

Copy link
Author

@ghost ghost Jul 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct.

I'm unable to reproduce the situation I was experiencing two months ago, where coreutils was getting built twice.

Unfortunately I don't have much more time today, so I think the best course of action is to simply revert the commit that added the "-stage4" marker. If it is still necessary (which seems unlikely) I will resubmit it as a standalone PR. I'll have more time to look at this tomorrow morning.

I'm starting to suspect that the problem may have been caused by my own site-specific overlays, which have changed a lot in the last two months. I maintain them as extra commits on top of nixpkgs/master with frequent rebasing, so I don't have a version history for them. I need to start keeping one.

Copy link
Author

@ghost ghost Jul 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out the reason. We should still revert the commit (i.e. we should merge #179961).

I will explain in more detail tomorrow.

@ghost
Copy link
Author

ghost commented Jul 2, 2022

Investigating.

@ghost
Copy link
Author

ghost commented Jul 2, 2022

#179961

@ghost
Copy link
Author

ghost commented Jul 3, 2022

#179961

Postmortem in case anybody is curious why this happened.

@ghost ghost deleted the stdenv-makeStaticLibraries-do-what-comment-says branch January 23, 2024 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants