CI for deterministic / reproducible builds #75362
@infinity0 I would love to see this. Yes, it'll require some extra CI time, but it's worth doing. Reproducible builds are important. Right now, the critical path in bors is actually macOS builds. I think we could do two builds on every other platform in the time it takes to do one on macOS, without slowing down CI. In an ideal world, if this test fails, we'd provide diffoscope output as part of the CI log. |
I don't think we can afford to double build on every other platform -- even just doubling CI time on the linux builders would be pretty unfortunate, even though they're not currently the ones blocking auto builds, having them run longer reduces spaces in the pool for PR builders and such too. (Unfortunately we don't currently have metrics to know how much space we have on average). That said, I do think we can do this on at least some of the dist builders to start. I would pick ones we deem most important (e.g., perhaps x86_64 linux to start).

With regards to implementation, dist builders should already have most of the remapping etc turned on, I believe, and anything that isn't should be reasonable to add (though could be hard, we'd have to see, especially because most dist builders use fairly old toolchains).

Before commenting on how I would suggest going about implementing this in our CI, I would like to hear more about "the details needed to retain build-path-independent reproducibility" -- what would this mean for us? What sort of tooling, environment differences, etc, would we want?

I would also suggest that we start by checking in CI that a smaller crate's reproducibility holds, e.g., std built by the stage 2 compiler. Building std is fast -- usually only 2-3 minutes. But maybe in practice reproducibility in std is rarely broken? Do we have some idea of what we would need/want to test in the long run (e.g., rustc and std only? Cargo and the other tools as well?) |
What about job that would run biweekly or similar when Homu queue is empty? |
We have no precedent for such and personally I think it's much less useful, nor would that need to be in-tree necessarily. I also don't think it's out of the picture to double up our builds or so, I just want a better understanding of the requirements before (possibly) agreeing to such. |
An alternative, at least initially, would be to do something like https://github.com/RalfJung/miri-test-libstd: out-of-tree CI which tests the latest nightly every morning. But I am not sure if there is a free CI service that actually provides enough time budget for building rustc twice. |
During this week's infrastructure team meeting we agreed to dedicate a builder for this! The builder will replace the existing |
It really depends on what feature someone is adding. It's much harder to track it down after the fact, as the debugging on #59542 shows. Off the top of my head, some issues that have affected rustc reproducibility in the past:
To re-iterate, I expect there will be other issues in the future completely unrelated to those in the above list. It really has to be done on a case-by-case basis for each PR. It's also important to record the input dependencies (crates, build tools) and their versions. For example, #59542 was a bug in cargo, not rustc itself. |
Hm so that's not quite what I meant but is still helpful. I guess the right question is - would you (or someone else) be up for submitting a PR with config for testing reproducible builds? If not, what would need to be done to make that possible? I think the infra team doesn't have anyone who has the background knowledge to actually implement these tests properly, but we'd love to help someone do it. |
@Mark-Simulacrum a basic test that just builds in two different paths, then compares the resulting artifacts, would already be useful and cover most regressions. The two top-level source directories should have different names - very occasionally builds could depend on that inappropriately. To get more sophisticated you can vary things like system time, username, timezone, locale, hostname, etc. An in-depth list is here. Many of these are just simple envvar changes; some of them can be done with libc interceptors like libfaketime, or via superuser access, or if running the builds in separate VMs. |
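The basic test described above (build in two differently named directories, then compare the artifacts) could be sketched roughly as follows. This is a minimal illustration, not the project's CI code: it assumes the two builds have already been run and only compares the resulting output trees. The function names `hash_tree` and `diff_trees` are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(first, second):
    """Return the sorted relative paths that are missing from one tree
    or whose contents differ between the two trees."""
    a, b = hash_tree(first), hash_tree(second)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result from `diff_trees` would mean the two output trees are byte-for-byte identical. Any paths it returns are the ones worth inspecting further, for example with diffoscope.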
Saw these on Reddit; they might be related here too, as paths will cause different hashes |
To leave a breadcrumb here for others, note that
$ cd ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib
$ strings libLLVM-14-rust-1.60.0-stable.so | grep -F '/include' | sort -u
/checkout/src/llvm-project/llvm/include/llvm/ADT/GenericCycleImpl.h
/tmp/gcc-5.5.0/libstdc++-v3/../include
/tmp/gcc-build/gcc/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/backward
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits
/usr/include
/usr/include/bits
/usr/include/sys
$ strings librustc_driver-75e5f32fc3580f6c.so | grep -F '/include' | sort -u
/tmp/gcc-5.5.0/libstdc++-v3/../include
/tmp/gcc-build/gcc/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/backward
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/parallel
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/tr1
/tmp/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits
/usr/include
/usr/include/bits
/usr/include/sys
$ strings rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-acf5ff6e9595d982.rlib | grep -F '/include' | sort -u
/rustroot/lib/clang/13.0.0/include
We may want to make |
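For CI use, the `strings ... | grep -F '/include' | sort -u` check shown above could be approximated in a script. The following is a rough sketch under stated assumptions: it treats runs of at least four printable ASCII bytes as strings, which is close to but not identical to what `strings` does, and the names `path_strings` and `scan_file` are hypothetical.

```python
import re
from pathlib import Path


def path_strings(data, needle=b"/include", min_len=4):
    """Return the sorted, de-duplicated printable-ASCII runs in `data`
    that contain `needle` (an approximation of strings | grep -F | sort -u)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return sorted({run.decode("ascii") for run in pattern.findall(data) if needle in run})


def scan_file(path, needle=b"/include"):
    """Apply path_strings to the contents of one artifact on disk."""
    return path_strings(Path(path).read_bytes(), needle)
```

A reproducibility job could then fail when `scan_file` reports an unexpected prefix such as `/tmp/` or `/checkout/` in a shipped library.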
And some additional bits of info, note that Clang 14 has a bug where prefix remappings do not get applied to files passed in using absolute paths, which happens in |
I made a CI test suite that runs Debian's This is not quite CI for reproducibly bootstrapping rustc, though clearly Rust and Cargo being able to reproducibly build arbitrary things is a necessary condition for that. The good news is, out of the top 100 crates, 91 are fully reproducible. 7 are non-reproducible but they all run third-party code at build time ( This includes the output binary (executable or library) and So it's really the build process like the |
Hi @jonhoo While working on reproducible builds, I observed that
Is ~/.cargo/config the right place to set |
Rustc still does not have complete support for fully reproducible builds (rust-lang/rust#75362). Attempt to narrow the gap by following the advice from (rust-lang/rust#102299) and overwrite all panic paths with a different, consistent value.
Rustc still does not have complete support for fully reproducible builds (rust-lang/rust#75362). Attempt to narrow the gap by following the advice from (rust-lang/rust#102299) and overwrite all panic paths with a different, consistent value. Changes:
* Use --remap-path-prefix to make paths consistent.
* Set build path to /tmp/build-yhm.
* Include Cargo.lock files to compile the same dependencies.
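The path remapping mentioned in these commit messages could look roughly like the sketch below, which assembles a `RUSTFLAGS` value from rustc's `--remap-path-prefix=FROM=TO` option. The replacement prefixes `/build` and `/cargo`, the example Cargo home, and the helper name `remap_rustflags` are assumptions made for illustration; they are not taken from the referenced commits.

```python
def remap_rustflags(build_dir, cargo_home, extra=""):
    """Build a RUSTFLAGS string that rewrites machine-specific path prefixes.

    The targets "/build" and "/cargo" are placeholder values (an assumption);
    any fixed, machine-independent prefix would serve the same purpose.
    """
    mappings = [(build_dir, "/build"), (cargo_home, "/cargo")]
    flags = [f"--remap-path-prefix={src}={dst}" for src, dst in mappings]
    # RUSTFLAGS is split on spaces, so paths containing spaces need another
    # mechanism; this sketch does not handle them.
    return " ".join(part for part in [extra, *flags] if part)
```

For instance, `dict(os.environ, RUSTFLAGS=remap_rustflags("/tmp/build-yhm", "/home/user/.cargo"))` could be passed as the `env` argument of `subprocess.run` when invoking cargo.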
#34902 was finally closed as we got a positive result on tests.r-b.org for rustc 1.44.1 on Debian Unstable, where we test for build-path-independent reproducibility. However for rustc 1.45.0 the test turned negative again. (The codenamed suites e.g. "bullseye", "buster", only test for build-path-dependent reproducibility, i.e. are less strict / less ideal)
On the same bug report, @jgalenson had reported build-path-independent reproducibility for a configuration closer to rust's own builds, but this also occasionally regresses e.g. #69352.
The Debian builds and rust's own builds are slightly different; probably the most significant difference is that Debian uses the system LLVM, so that part is excluded from reproducibility tests. Other than that, it is important to set remap-debuginfo in config.toml as well as -ffile-prefix-map and -fdebug-prefix-map for the C parts of the build, including in CFLAGS/CXXFLAGS.

Since many/most contributors are not aware of all of the details needed to retain build-path-independent reproducibility, it would be good to have some CI to ensure this property in the long run. Running a full build twice is costly, but perhaps some other solution would be just as effective, e.g. running a stage1 build twice, or running it for the beta channel, and/or running with a pre-built LLVM.
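As an illustration of the C-side settings mentioned above, the following sketch appends `-ffile-prefix-map` and `-fdebug-prefix-map` to `CFLAGS` and `CXXFLAGS` in a copy of an environment. It assumes a compiler new enough to accept `-ffile-prefix-map`; the replacement prefix `.` and the helper names are hypothetical choices, and this is not how rustc's own build system sets these flags.

```python
def prefix_map_flags(source_dir, replacement="."):
    """Return C/C++ compiler flags that rewrite source_dir in embedded paths."""
    return [
        f"-ffile-prefix-map={source_dir}={replacement}",
        f"-fdebug-prefix-map={source_dir}={replacement}",
    ]


def c_build_env(base_env, source_dir):
    """Return a copy of base_env with the prefix-map flags appended to
    CFLAGS and CXXFLAGS, keeping any flags that were already set."""
    env = dict(base_env)
    extra = " ".join(prefix_map_flags(source_dir))
    for var in ("CFLAGS", "CXXFLAGS"):
        env[var] = f"{env.get(var, '')} {extra}".strip()
    return env
```

Existing flags are kept in front so that the remapping is added rather than substituted; `base_env` itself is left unmodified.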