Make `Vec` derefing inlinable #52704
r? @shepmaster (rust_highfive has picked a reviewer for you, use r? to override)
Have you tested these changes? Can you provide a before/after benchmark? I was under the impression that any type with a generic was automatically
Hmm... right, good point; in which case perhaps those should actually be I managed to reproduce the non-inlining behavior with upstream Rust in a toy benchmark; I'll retest with
These methods are so trivial that it seems very unlikely to me for What seems more likely to me is that the adage "generic functions are always inlinable" isn't completely true anymore. IIUC #48779 made it so that if e.g.
I can confirm that this change does make a difference. Before:
After:
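The benchmark:

```rust
#![feature(test)] // requires a nightly toolchain
extern crate test;

use std::ops::Deref;
use test::Bencher;

#[bench]
fn bench_deref(b: &mut Bencher) {
    let vec: Vec<u32> = (0..1000).collect();
    b.iter(|| {
        let mut sum: u32 = 0;
        for index in 0..vec.len() {
            // Explicit deref on every iteration, so the cost of `Vec::deref` shows up.
            let slice = vec.deref();
            sum = sum.wrapping_add(slice[index]);
        }
        // Returning the sum keeps the optimizer from discarding the loop.
        sum
    });
}
```

and the profile from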
I've verified with a profiler that the deref wasn't inlined before this change and is inlined after it. What's also interesting is that if I change
So it looks like enabling incremental compilation completely kills the compiler's ability to inline generics that don't have the (Also, in this particular benchmark we can also see that
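The benchmark above needs a nightly toolchain because of `#![feature(test)]`. As a rough sketch of the same measurement on stable Rust (assuming Rust 1.66 or later for `std::hint::black_box`; `sum_via_deref` and `time_per_call` are hypothetical names, not from the PR), the loop can be timed by hand. Building it once with `incremental = true` and once with `incremental = false` in the Cargo release profile is one way to compare, though a manual `Instant` loop is noticeably noisier than the nightly harness.

```rust
use std::hint::black_box;
use std::ops::Deref;
use std::time::{Duration, Instant};

/// Same loop body as the nightly benchmark: explicitly deref the `Vec`
/// to a slice on every iteration. The parameter is `&Vec<u32>` on purpose,
/// so the call really goes through `<Vec<u32> as Deref>::deref`.
pub fn sum_via_deref(vec: &Vec<u32>) -> u32 {
    let mut sum: u32 = 0;
    for index in 0..vec.len() {
        let slice: &[u32] = vec.deref();
        sum = sum.wrapping_add(slice[index]);
    }
    sum
}

/// Crude timer (an assumption, not a replacement for a real benchmark
/// harness): runs the loop `iterations` times and returns the mean time
/// per call. Panics if `iterations` is zero.
pub fn time_per_call(vec: &Vec<u32>, iterations: u32) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        // `black_box` discourages the optimizer from discarding the work
        // or hoisting it out of the timing loop.
        black_box(sum_via_deref(black_box(vec)));
    }
    start.elapsed() / iterations
}
```

A profiler remains the more reliable way to confirm whether `deref` was actually inlined; the timer only shows whether the loop got slower.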
I'll loop in @alexcrichton and @michaelwoerister because of #48779, then. In this case, I'm guessing that the answer is going to be "if you want maximum performance, disable incremental compilation" (i.e., great for your redistributed, published binary that you create once a month).
It's a known bug in the compiler that incremental release builds perform very poorly compared to normal release builds. While
This is one of those rare-ish cases where I need reasonable performance in my dev builds (otherwise the resulting binary is too slow at runtime to be useful for anything), and since those are the dev builds I also need reasonably fast recompilation. I just checked, and currently
I somewhat disagree with this, considering we already have over two thousand
This hasn't been true for quite a while now, at least for a year or maybe two.
Incremental ThinLTO should indeed help with that. But it's a non-trivial feature and I'm currently tasked with working on other things, so don't keep your fingers crossed for it to appear in the compiler in the next few weeks.
Personally, I'm fine with adding
@koute the fact of the matter is that incremental release builds simply aren't tuned for performance right now. The standard library and large chunks of the Rust ecosystem are written in a way that, if one crate is split into multiple CGUs, ThinLTO is required to recover the performance lost to missing inlining. Multiple codegen units are enabled by default in release builds today, and ThinLTO is there to recover the performance. The ThinLTO passes are disabled in incremental mode because they aren't compatible with incremental compilation. As @michaelwoerister mentions, enabling this is a significant chunk of work. The fact that
@alexcrichton So could you please explain how this is any different from any of the other two thousand It was always the case that I mean absolutely no offence, but, for example, why was adding this okay (I picked the commit at random; I didn't deliberately pick one of yours):
but adding the same here (especially when it has a measurable practical impact!) is not acceptable anymore? If the answer to that is "well, we didn't have ThinLTO planned back then", then I need to ask: since we don't need I apologize if I seem somewhat brash, but it is a little disheartening to see such resistance to a seemingly harmless and uncontroversial change such as this when it's not even the first of its kind. ):
I think I agree with @koute here; we've had a near-constant stream of PRs inlining things as necessary for years.
The
Thus, if you have a function like
Translation of generics is slightly different. If you have a function like
Now why haven't we applied this patch already? It's terrible if
Ok, so why is this patch needed? As mentioned before, incremental release builds do not run ThinLTO. This means that the inlining does not happen in all CGUs in a crate, because rustc only translates
Ok, so why have we had a "stream of
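A rough sketch of the compilation model described above, using hypothetical function names. The three functions behave identically at runtime; the comments only describe where their machine code is generated when they are called from another crate (ignoring LTO), as explained in this thread. Cross-crate effects cannot be demonstrated in a single file, so this is an annotated illustration rather than a benchmark.

```rust
use std::ops::Add;

// Imagine these items live in an upstream library crate.

/// Non-generic, no attribute: compiled once in the defining crate. Without
/// LTO, a downstream crate can only emit a call to it, not inline it.
pub fn double_plain(x: u32) -> u32 {
    x.wrapping_mul(2)
}

/// Non-generic with `#[inline]`: the body is made available to downstream
/// crates, which instantiate it locally so LLVM may inline it.
#[inline]
pub fn double_inline(x: u32) -> u32 {
    x.wrapping_mul(2)
}

/// Generic: it can only be compiled once a concrete `T` is known, so it is
/// normally instantiated in the crate that uses it. That is why generics have
/// traditionally been treated as inlinable without the attribute. As discussed
/// in this thread, shared generic instantiations (#48779) and multiple codegen
/// units without ThinLTO weaken that assumption.
pub fn double_generic<T: Add<Output = T> + Copy>(x: T) -> T {
    x + x
}
```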
To me, this PR falls into none of those buckets. While it may naively appear to fall into the first bucket (tested to show performance gains), I do not consider it to be in that bucket. That incremental release builds are so slow is a bug in the compiler. There are likely hundreds of tiny generic methods in the standard library which we have explicitly not tagged with We've trained Rust programmers ever since the beginning that generics imply efficient cross-crate inlining, no
Count me in that bucket, and I like to think I have a reasonable grasp on the language... Since this is now above my paygrade...
@koute do you have thoughts on my comment above?
@alexcrichton Thanks for the detailed explanation! It did make things a lot clearer. I agree with your reasoning, although purely personally I would still merge this in, at least temporarily. There are probably a lot of such functions in the stdlib, but, actually, I don't believe we need to mark too many of them as I think it's worthwhile to add the annotations for the most egregious cases like this, which impact the majority of the ecosystem, especially as @michaelwoerister said it's going to take a while to get ThinLTO working with incremental compilation. But now I understand why you don't want it merged, so feel free to close this PR. There's one other tangentially related thing I'd like to ask: if we have ThinLTO to inline functions across the CGUs, then wouldn't it be reasonable (after ThinLTO works with incremental builds) to make
I suspect that this is something we'll look into. There's still the question of cross-crate inlining. Right now,
Ok! In that case, yeah, I'm going to close this for now.
#53673, which implements incremental ThinLTO, has now been merged. This, for example, fixes the benchmark in #52704 (comment).
`Vec` derefing is consuming a non-negligible amount of time (a few percent of the whole runtime) in one of my apps when building without LTO, so it should be worthwhile to mark these `#[inline]`.
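The proposed change amounts to putting `#[inline]` on `Vec`'s trivial `Deref`/`DerefMut` impls. The sketch below shows the same pattern on a hypothetical wrapper type, `MyVec<T>`. The real standard-library impls build the slice from `Vec`'s internal pointer and length, so this illustrates the annotation, not the std source.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical `Vec`-like wrapper, used only to illustrate the annotation.
pub struct MyVec<T> {
    inner: Vec<T>,
}

impl<T> MyVec<T> {
    pub fn from_vec(inner: Vec<T>) -> Self {
        MyVec { inner }
    }
}

impl<T> Deref for MyVec<T> {
    type Target = [T];

    // `#[inline]` is meant to let rustc emit a local copy of this one-line
    // body in every codegen unit (and downstream crate) that calls it, so
    // LLVM can inline it even when ThinLTO does not run.
    #[inline]
    fn deref(&self) -> &[T] {
        &self.inner
    }
}

impl<T> DerefMut for MyVec<T> {
    #[inline]
    fn deref_mut(&mut self) -> &mut [T] {
        &mut self.inner
    }
}
```

Only the one-line accessors carry the attribute here; whether larger generic methods deserve it is exactly the trade-off debated in the conversation.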