Base CGU estimates on basic block count, not statement count. #113047
@bors try @rust-timer queue |
⌛ Trying commit 64ee4e1 with merge 57da7fc899d24430772acd1c0df9059d593714e0... |
☀️ Try build successful - checks-actions |
Finished benchmarking commit (57da7fc899d24430772acd1c0df9059d593714e0): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 662.496s -> 666.596s (0.62%) |
Interesting, run https://github.com/rust-lang/rust/actions/runs/5375275677 is still spinning, although it should have completed a while ago. https://github.com/rust-lang/rust/actions/runs/5375275677/jobs/9751558060#step:25:1784
|
Seems like some kind of GH bug, all the workflow jobs have actually completed, but the workflow run isn't marked as completed yet. |
@klensy: I got a "PR run failed" email saying three jobs had failed, but when I visited here they were still spinning. |
Some explanation: after much effort, I concluded that all efforts to improve CGU formation are basically doomed because the CGU size estimates aren't very good. There's no point doing anything clever if the estimates are regularly off by a factor of 2 or more in both directions (underestimating and overestimating).
So I wrote code to compare the CGU estimates to the actual compile times, and computed an error value based on the mean absolute deviation. The goal was not to come up with a perfect measure of error, but something reasonably good that would let me evaluate alternative CGU size estimation functions.
I tried quite a few different estimation functions, and found that basing the estimates on basic block counts rather than statement counts was a big win. This makes a certain amount of sense, because basic blocks are pretty easy for compilers to handle, and a basic block with 20 statements is not that much harder than one with 5 statements, for example, because all the complexity comes from the control flow between basic blocks. (Picture me waving my hands right now.)
Here is the data showing the error measure for the old (statement-based) and new (basic-block based) estimate functions. The error for the new function is better in every case except two (marked with
This PR implements the new estimator, but the results are worse than I'd hoped.
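For readers who want something concrete, the comparison described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Body`, `estimate_by_statements`, `estimate_by_blocks`, and `mean_abs_error` are hypothetical names, and normalizing estimates and times to shares of their totals before taking the mean absolute difference is an assumption, since the exact error formula isn't spelled out here.

```rust
/// Hypothetical stand-in for a function body: one entry per basic block,
/// holding the number of statements in that block.
struct Body {
    statements_per_block: Vec<usize>,
}

/// Statement-based estimate: every statement counts, plus one per block
/// for its terminator (an assumed convention for this sketch).
fn estimate_by_statements(body: &Body) -> usize {
    body.statements_per_block.iter().map(|n| n + 1).sum()
}

/// Basic-block-based estimate: only the number of blocks matters, so a
/// block with 20 statements costs the same as one with 5.
fn estimate_by_blocks(body: &Body) -> usize {
    body.statements_per_block.len()
}

/// Assumed error measure: convert estimates and measured times to shares
/// of their respective totals (they are in different units), then take
/// the mean absolute difference between the shares.
fn mean_abs_error(estimates: &[f64], times: &[f64]) -> f64 {
    assert_eq!(estimates.len(), times.len());
    assert!(!estimates.is_empty());
    let est_total: f64 = estimates.iter().sum();
    let time_total: f64 = times.iter().sum();
    let sum: f64 = estimates
        .iter()
        .zip(times)
        .map(|(e, t)| (e / est_total - t / time_total).abs())
        .sum();
    sum / estimates.len() as f64
}
```

With a measure like this, two estimation functions can be ranked by running both over the same set of CGUs and comparing the resulting error values; lower is better.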
I think the reason why we didn't see a bigger wall-time improvement is that even with the improved estimate, it only takes one bad estimate to throw the whole thing off, especially if that is a bad underestimate -- you can end up with a single "long pole" CGU that takes much longer than the other CGUs to compile. Anyway, I thought this was worth explaining, and I'd be interested to hear if others think the perf results make this worth merging. cc @rust-lang/wg-compiler-performance @wesleywiser @mw |
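The "long pole" effect can be illustrated with a toy model. The sketch below is not rustc's partitioning algorithm: `Item`, `partition_actual_times`, and `wall_time` are hypothetical, the greedy largest-first balancing is a simplification, and it assumes every CGU compiles on its own thread so that wall time equals the slowest CGU.

```rust
/// Hypothetical unit of work: its estimated size and its actual compile cost.
#[derive(Clone, Copy)]
struct Item {
    estimate: u64,
    actual: u64,
}

/// Greedily place items (largest estimate first) into whichever of the `n`
/// CGUs currently has the smallest estimated load. Returns the actual cost
/// accumulated in each CGU.
fn partition_actual_times(items: &[Item], n: usize) -> Vec<u64> {
    assert!(n > 0);
    let mut sorted = items.to_vec();
    // Stable sort, descending by estimate.
    sorted.sort_by(|a, b| b.estimate.cmp(&a.estimate));
    let mut estimated = vec![0u64; n];
    let mut actual = vec![0u64; n];
    for item in sorted {
        // Balancing only sees the estimates, never the actual costs.
        let i = (0..n).min_by_key(|&i| estimated[i]).unwrap();
        estimated[i] += item.estimate;
        actual[i] += item.actual;
    }
    actual
}

/// With one thread per CGU, wall time is the cost of the slowest CGU.
fn wall_time(actual_per_cgu: &[u64]) -> u64 {
    actual_per_cgu.iter().copied().max().unwrap_or(0)
}
```

In this model, take eight items that all have an estimate of 10, where one actually costs 40, split across two CGUs. The estimates look perfectly balanced, but the CGU holding the underestimated item ends up with an actual cost of 70 against 40 for the other, so the whole build waits on it.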
Thanks for the explanation, that's very interesting!
Is that end-to-end wall time? I could also imagine that inline functions play a non-trivial role here:
It would be interesting to know if the cost estimates for debug builds (which do almost no inlining) are more accurate than for release builds (which do inlining, but probably also other super-linear optimizations). What I'd really like to see at some point is the effect of throwing some machine learning on the problem. That seems a good fit for dealing with such a fuzzy, statistical problem. It would be a nice research project 🙂 |
It is wall time for the "gen" phase (generating the LLVM IR) plus the "opt" phase (LLVM doing its work) plus the "lto" phase (opt builds only).
Perhaps, though I wonder if overfitting to specific machines would be a problem. |
Yes, that's true. On the other hand, I think we already have some overfitting to the existing benchmarks and benchmark machines going on. FWIW, the change in this PR looks like an overall improvement to me. |
I will close this. I have been working on collecting more MIR stats, for others with data science expertise to take a look at. Hopefully that will lead to a better estimation function. |
r? @ghost