-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up bors cycle time #8456
Comments
Maybe! But I'm also interested in figuring out the sources of the slowdown in the first place. |
The purpose of the -all builds is primarily to ensure that the build system continues to support cross-compiling, both in cross-target and multi-host configurations. In the multi-host configuration where you are building a compiler for both 64 and 32-bit targets there's no easy way to avoid building 4 complete sets of libraries. We could make the multi-host build more efficient in a few ways, primarily by not building all three stages on the secondary hosts - just jumping straight to stage2, and also by bootstrapping the secondary hosts from an earlier stage of the primary. We could also not run the tests or run fewer tests in the cross builds - what's most important is that the build system continues to function. The valgrind builds could run single-threaded if they are not already. #7425 is about optimizing the compiler on the nopt bots. |
Would it be possible to add the *-crossXX builders which actually run tests, but only one host -> target architecture, and then also have the *-all builders but not running any tests. That way we could make sure that the cross-build system works along with running tests against the builds, but in parallel instead of serialized. Also, #8450 should close #7425 after the configuration is changed on the bots. |
#4572 / generally reorganizing the metadata-read path for small files is likely the next biggest win. |
After implementing suggestion 1, and some runtime optimizations, we're down from 3h to 1h40m at the longest time. Currently the longest builders are the *-all-opt builders. The next slowest is *-opt-vg clocking in at 1h20m followed by linux-32-nopt-t clocking in around 1h10m. For the next speedup win, it looks like the -all-opt builders are running tests in configurations which are already tested by the *-opt builders. Specifically:
Basically these combinations are being run:
Oddly enough, these combinations are not being run
@brson, would it be possible to eliminate the T-x86_64-H-x86_64 tests from being run on the grounds that the functionality is already tested elsewhere? Also, is it an oversight or is it intended that the H-i686 tests aren't running? |
I'll look into this today. I also think there's reason to believe the tests are not parallelizing due to the new scheduler, and possibly there are negative interactions with valgrind on the -vg bots. |
There are other strategic moves in terms of the buildbot that we could do here, though I am hesitant to do any ahead of simply fixing the speed bottlenecks in the compiler and testsuite. For reference sake:
|
@alexcrichton the reason all four configurations are not tested is essentially an historical accident - I never paid too much attention to what Under the cross configuration the tests could mostly be eliminated, but I would like to continue to ensure that the makefiles work (tests.mk is crazy complicated) so would like to keep some subset of the cross tests running. Actually though when thinking of the most important use case here of ensure that the android cross tests work we should actually be doing the cross tests differently, testing the build arch compiling the cross target arch (e.g. x86_64->i686, x86_64->x86_64), not all host archs testing the build arch (e.g. x86_64->x86_64, i686->x86_64). |
Closing this for now. The cycle time is around 1hr 10min now, which is very tolerable. The queue does not have a huge backup nowadays, so I don't think that this is a pressing matter any more. More specific issues may be opened in the future, but I don't think that this blanket issue is necessary any more. |
Distributed build systems for Rust: http://discuss.rust-lang.org/t/distributed-build-systems-for-rust/400 |
check `use_self` in `pat` fixes rust-lang#6955 changelog: check `use_self` in `pat`
Right now the cycle time for a passing pull request is 3+ hours, which is up from about 2 hours a week ago, and up from ~1hr if I remember correctly from a few months ago. This isn't really an artifact of rust getting slower, but rather @bors working even harder at testing. It does look like there's some easy low-hanging fruit which could speed up cycle time by at least an hour or two.
Currently the slowest builds are *-nopt (3+ hours), *-all (~2 hours), and *-vg (~1.5 hours). The baseline build time seems to be ~1 hour, and I think that we can get back to that level of speed with these suggestions:
For *-nopt builds, build the compiler with optimizations, but don't run tests with optimizations. This could be enabled by Allow disabling optimizations in tests only #8450, and I believe that with the runtime now unoptimized for *-nopt builds we're seeing a huge slowdown.
Parallelize the *-all builds. From what I understand, these builds essentially target 4 separate builds on linux/mac: H64 -> T32, H64 -> T64, H32 -> T64, H32 -> T32 (where H == host word size, T == target word size). Two of these are already tested by other builders (H64 -> T64, H32 -> T32), and the remaining two don't really need to run on the same builder, they could be built in parallel.
Basically I think we could get the same test coverage by removing the *-all builders and introducing *-cross32 and *-cross64 builders instead (where the host is different from the target architecture)
Either resolve Investigate running tests under Address Sanitizer #749 (should run a lot faster than valgrind), or run only a subset of the tests under valgrind. This is kind of tricky to do either of these, but I think that in the immediate future it's more reasonable to run a subset of the tests in valgrind. I'm not sure if others would agree, but I'd think that we could run only the run-pass, libXtest, and run-fail suites on valgrind bots and get mostly the same signal as running valgrind over the entire build.
Perhaps the snapshot builders could run valgrind for the entire suite, but inevitably we'd get a failure which would then be difficult to debug (but I think would be worth it).
I think that with these three possibilities combined, we could get the cycle time back down to an hour and start processing a lot more builds in one day.
@graydon, @brson, would it be possible to configure the bots this way?
The text was updated successfully, but these errors were encountered: