CI for deterministic / reproducible builds #75362

Open
infinity0 opened this issue Aug 10, 2020 · 14 comments
Labels
A-reproducibility Area: Reproducible / deterministic builds A-testsuite Area: The testsuite used to check the correctness of rustc C-enhancement Category: An issue proposing an enhancement or a PR with one. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@infinity0
Contributor

infinity0 commented Aug 10, 2020

#34902 was finally closed as we got a positive result on tests.r-b.org for rustc 1.44.1 on Debian Unstable, where we test for build-path-independent reproducibility. However, for rustc 1.45.0 the test turned negative again. (The codenamed suites, e.g. "bullseye" and "buster", only test for build-path-dependent reproducibility, i.e. they are less strict and less ideal.)

On the same bug report, @jgalenson had reported build-path-independent reproducibility for a configuration closer to rust's own builds, but this also occasionally regresses e.g. #69352.

The Debian builds and Rust's own builds differ slightly; probably the most significant difference is that Debian uses the system LLVM, so that part is excluded from the reproducibility tests. Other than that, it is important to set remap-debuginfo in config.toml, as well as -ffile-prefix-map and -fdebug-prefix-map for the C parts of the build, including in CFLAGS / CXXFLAGS.
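As a rough illustration of the C-side flags mentioned above, the sketch below builds the two prefix-map options for a given build directory. The helper name prefix_map_flags and the replacement path /usr/src/rustc are assumptions made for this example, not something Debian's packaging or Rust's CI is known to use. On the Rust side, the corresponding setting would be remap-debuginfo = true under the [rust] section of config.toml.

```shell
# prefix_map_flags FROM TO
# Hypothetical helper: prints the GCC/Clang options that rewrite the
# build directory FROM to the fixed path TO in debuginfo and __FILE__.
prefix_map_flags() {
    printf '%s\n' "-ffile-prefix-map=$1=$2 -fdebug-prefix-map=$1=$2"
}

# Example use from the top of a source checkout (paths are illustrative):
#   CFLAGS="$(prefix_map_flags "$PWD" /usr/src/rustc)"
#   CXXFLAGS="$CFLAGS"
#   export CFLAGS CXXFLAGS
```

Mapping every build directory to the same fixed path is what makes the output independent of where the build ran.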

Since many/most contributors are not aware of all of the details needed to retain build-path-independent reproducibility, it would be good to have some CI to ensure this property in the long run. Running a full build twice is costly, but perhaps some other solution would be just as effective, e.g. running a stage1 build twice, or running it for the beta channel, and/or running with a pre-built LLVM.

@jonas-schievink jonas-schievink added A-testsuite Area: The testsuite used to check the correctness of rustc C-enhancement Category: An issue proposing an enhancement or a PR with one. A-reproducibility Area: Reproducible / deterministic builds labels Aug 10, 2020
@joshtriplett joshtriplett added the T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. label Aug 10, 2020
@joshtriplett
Member

joshtriplett commented Aug 10, 2020

@infinity0 I would love to see this. Yes, it'll require some extra CI time, but it's worth doing. Reproducible builds are important.

Right now, the critical path in bors is actually macOS builds. I think we could do two builds on every other platform in the time it takes to do one on macOS, without slowing down CI.

In an ideal world, if this test fails, we'd provide diffoscope output as part of the CI log.

@Mark-Simulacrum
Member

I don't think we can afford to double-build on every other platform. Even just doubling CI time on the Linux builders would be pretty unfortunate: although they're not currently the ones blocking auto builds, having them run longer reduces the space in the pool for PR builders and such. (Unfortunately, we don't currently have metrics to know how much space we have on average.)

That said, I do think we can do this on at least some of the dist builders to start. I would pick ones we deem most important (e.g., perhaps x86_64 linux to start).

With regard to implementation, dist builders should already have most of the remapping etc. turned on, I believe, and anything that isn't should be reasonable to add (though it could be hard; we'd have to see, especially because most dist builders use fairly old toolchains).

Before commenting on how I would suggest going about implementing this in our CI, I would like to hear more about "the details needed to retain build-path-independent reproducibility" -- what would this mean for us? What sort of tooling, environment differences, etc, would we want?

I would also suggest that we start by checking in CI that a smaller crate's reproducibility holds, e.g., std built by the stage 2 compiler. Building std is fast -- usually only 2-3 minutes. But maybe in practice reproducibility in std is rarely broken? Do we have some idea of what we would need/want to test in the long run (e.g., rustc and std only? Cargo and the other tools as well?)

@mati865
Contributor

mati865 commented Aug 10, 2020

What about a job that would run biweekly or similar when the Homu queue is empty?

@Mark-Simulacrum
Member

We have no precedent for such a job, and personally I think it's much less useful; it also wouldn't necessarily need to be in-tree.

I also don't think it's out of the picture to double up our builds or so; I just want a better understanding of the requirements before (possibly) agreeing to it.

@RalfJung
Member

An alternative, at least initially, would be to do something like https://github.com/RalfJung/miri-test-libstd: out-of-tree CI which tests the latest nightly every morning.

But I am not sure if there is a free CI service that actually provides enough time budget for building rustc twice.

@pietroalbini
Member

During this week's infrastructure team meeting we agreed to dedicate a builder for this! The builder will replace the existing full-bootstrap one. The main requirement we have is that this builder shouldn't take longer than the current slowest builder.

@infinity0
Contributor Author

I would like to hear more about "the details needed to retain build-path-independent reproducibility" -- what would this mean for us?

It really depends on what feature someone is adding. It's much harder to track it down after the fact, as the debugging on #59542 shows. Off the top of my head, some issues that have affected rustc reproducibility in the past:

To reiterate, I expect there will be other issues in the future completely unrelated to those in the above list. It really has to be done on a case-by-case basis for each PR.

It's also important to record which input dependencies (crates, build tools) are used, along with their versions. For example, #59542 was a bug in cargo, not in rustc itself.

@Mark-Simulacrum
Member

Hm, so that's not quite what I meant, but it is still helpful.

I guess the right question is - would you (or someone else) be up for submitting a PR with config for testing reproducible builds? If not, what would need to be done to make that possible?

I think the infra team doesn't have anyone who has the background knowledge to actually implement these tests properly, but we'd love to help someone do it.

@infinity0
Contributor Author

@Mark-Simulacrum a basic test that just builds in two different paths, then compares the resulting artifacts, would already be useful and cover most regressions. The two top-level source directories should have different names - very occasionally builds could depend on that inappropriately.
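A minimal sketch of the comparison step of such a basic test, assuming the two builds have already been run in differently named directories and that sha256sum is available. The function names tree_manifest and compare_trees are hypothetical and not part of rustc's tooling.

```shell
# tree_manifest DIR
# Prints "<sha256>  <relative path>" for every regular file under DIR,
# sorted by path so that two manifests can be compared line by line.
tree_manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# compare_trees DIR_A DIR_B
# Returns 0 if both trees contain the same files with the same contents;
# otherwise prints the differing manifest lines and returns non-zero.
compare_trees() {
    manifest_a=$(mktemp) || return 2
    manifest_b=$(mktemp) || { rm -f "$manifest_a"; return 2; }
    tree_manifest "$1" > "$manifest_a"
    tree_manifest "$2" > "$manifest_b"
    rc=0
    diff "$manifest_a" "$manifest_b" || rc=$?
    rm -f "$manifest_a" "$manifest_b"
    return "$rc"
}

# Example (directory names are placeholders):
#   compare_trees /tmp/build-a/out /tmp/build-b/out || echo "not reproducible"
```

Because paths in the manifest are relative to each tree, differently named top-level directories do not cause spurious differences by themselves. A tool such as diffoscope could then be run on the files that differ to explain why.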

To get more sophisticated you can vary things like system time, username, timezone, locale, hostname, etc. An in-depth list is here. Many of these are just simple envvar changes; some of them can be done with libc interceptors like libfaketime, or via superuser access, or if running the builds in separate VMs.
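The simple environment-variable variations can be sketched as a wrapper that runs the same command under two different settings. This covers only timezone and locale. The function name run_variant and the specific values are arbitrary choices for illustration, and the sketch assumes the C.UTF-8 locale exists on the build machine.

```shell
# run_variant N CMD [ARGS...]
# Runs CMD under one of two deliberately different environments.
# The specific values are arbitrary choices for this sketch.
run_variant() {
    variant=$1
    shift
    case "$variant" in
        1) env TZ=UTC LC_ALL=C "$@" ;;
        2) env TZ=Pacific/Kiritimati LC_ALL=C.UTF-8 "$@" ;;
        *) echo "run_variant: unknown variant '$variant'" >&2; return 2 ;;
    esac
}

# Example (the build command and directories are placeholders):
#   (cd /tmp/build-a/src && run_variant 1 ./build.sh)
#   (cd /tmp/build-b/other-name && run_variant 2 ./build.sh)
```

Variations such as system time, username, or hostname cannot be changed this way and would need the interceptor, superuser, or VM approaches mentioned above.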

@chemsaf3

I saw these on Reddit; they might be related here too, as paths will cause differing hashes:

rust-lang/cargo#9311
#40552
#75263

@jonhoo
Contributor

jonhoo commented May 13, 2022

To leave a breadcrumb here for others, note that libLLVM, libcompiler_builtins, and librustc_driver all also embed the path to libc and libstdc++'s include directories in their debuginfo:

$ cd ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib
$ strings libLLVM-14-rust-1.60.0-stable.so | grep -F '/include' | sort -u
/checkout/src/llvm-project/llvm/include/llvm/ADT/GenericCycleImpl.h
/tmp/gcc-5.5.0/libstdc++-v3/../include
/tmp/gcc-build/gcc/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/backward
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits
/usr/include
/usr/include/bits
/usr/include/sys
$ strings librustc_driver-75e5f32fc3580f6c.so | grep -F '/include' | sort -u
/tmp/gcc-5.5.0/libstdc++-v3/../include
/tmp/gcc-build/gcc/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/backward
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/parallel
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/tr1
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits
/usr/include
/usr/include/bits
/usr/include/sys
$ strings rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-acf5ff6e9595d982.rlib | grep -F '/include' | sort -u
/rustroot/lib/clang/13.0.0/include

We may want to make remap-debuginfo = true also remap those paths.
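The manual check above could be turned into an automated one along these lines. This is a sketch that assumes a grep supporting the -a and -o options (GNU, BSD, and BusyBox grep do); the function name embedded_paths and the file name in the usage comment are hypothetical.

```shell
# embedded_paths FILE PREFIX
# Prints each distinct run of printable, non-space characters in FILE that
# starts with PREFIX. PREFIX is used as a basic regular expression, so a
# "." in it matches any character; that is acceptable for a coarse check.
embedded_paths() {
    LC_ALL=C grep -a -o -- "$2[[:graph:]]*" "$1" | LC_ALL=C sort -u
}

# Example: fail a CI step if anything under /tmp/gcc-build is embedded.
#   leaks=$(embedded_paths libexample.so /tmp/gcc-build)
#   [ -z "$leaks" ] || { echo "$leaks"; exit 1; }
```

Unlike strings, this only reports matches that begin with the given prefix, so it is best used with the known build and toolchain directories as prefixes.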

@jonhoo
Contributor

jonhoo commented May 19, 2022

And some additional bits of info: note that Clang 14 has a bug where prefix remappings do not get applied to files passed in using absolute paths, which happens in compiler_builtins for assembly files like floatundisf.S. The fix is here, but hasn't landed in an actual Clang release yet. Earlier Clang versions don't have this problem, but they lack this fix, which means they miss other paths that need remapping (though I don't have concrete examples where this matters for the Rust build).

@cbeuw
Contributor

cbeuw commented Jun 15, 2022

I made a CI test suite that runs Debian's reprotest on compiling the most downloaded crates from crates.io (currently top 100): https://github.com/cbeuw/lotus

This is not quite CI for reproducibly bootstrapping rustc, though clearly Rust and Cargo being able to reproducibly build arbitrary things is a necessary condition for that.

The good news is, out of the top 100 crates, 91 are fully reproducible. 7 are non-reproducible, but they all run third-party code at build time (typenum's build script is a major culprit as it uses OUT_DIR extensively): https://github.com/cbeuw/lotus/runs/6907339891?check_suite_focus=true#step:7:29732

This includes the output binary (executable or library) and rmeta files. We're actually very close to having the entire target directory be reproducible: there are only 3 places that aren't, and they are not files whose reproducibility really matters.

So it's really the build process, such as the bootstrap crate, that we need to focus on to make rustc bootstrapping reproducible. That is hardly trivial, but it's better than cargo and rustc themselves having issues.

@sundeep-kokkonda
Contributor

To leave a breadcrumb here for others, note that libLLVM, libcompiler_builtins, and librustc_driver all also embed the path to libc and libstdc++'s include directories in their debuginfo:

We may want to make remap-debuginfo = true also remap those paths.

Hi @jonhoo

While working on reproducible builds, I observed that librustc_driver differs when the build path is changed. I set remap-debuginfo = true, but the generated binaries are still different. There are some hints from the community in issue #102299 to add --remap-path-prefix. So, I added these flags as below (in ~/.cargo/config) along with remap-debuginfo, but the binaries are still different.

[build]
rustflags = ["--remap-path-prefix=/home/temp/rust/buildA=~", "--remap-path-prefix=/home/temp/rust/buildB=~"]

Is ~/.cargo/config the right place to set rustflags, or should I use a specific .rs file? Can you help me with its usage?

LeanderBB pushed a commit to LeanderBB/you-have-mail that referenced this issue May 21, 2023
Rustc still does not have complete support for fully reproducible builds
(rust-lang/rust#75362). Attempt to narrow the
gap by following the advice from
(rust-lang/rust#102299) and overwrite all
panic paths with a different, consistent value.
LeanderBB pushed a commit to LeanderBB/you-have-mail that referenced this issue May 22, 2023
Rustc still does not have complete support for fully reproducible builds
(rust-lang/rust#75362). Attempt to narrow the
gap by following the advice from
(rust-lang/rust#102299) and overwrite all
panic paths with a different, consistent value.

Changes:

* Use --remap-path-prefixes to make paths consistent.
* Set build path to /tmp/build-yhm.
* Include Cargo.lock files to compile the same dependencies.