Regression: tests/codegen/issues/issue-101082.rs fails with -Ctarget-cpu=x86-64-v3 #131563
Comments
Part of #121571 added UB checks on indexing, and AIUI these survive to LLVM IR due to core's
Some of that was undone in #126299, but the changes in
`IndexRange::len` is justified as an overall invariant, and `take_prefix` and `take_suffix` are justified by local branch conditions. A few more UB-checked calls remain in cases that are only supported locally by `debug_assert!`, which won't do anything in distributed builds, so those UB checks may still be useful. We generally expect core's `#![rustc_preserve_ub_checks]` to optimize away in user's release builds, but the mere presence of that extra code can sometimes inhibit optimization, as seen in rust-lang#131563.
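The distinction the commit message draws (an overall invariant versus a local branch condition) can be sketched as follows. This is a minimal illustration, not core's actual code: `MiniRange` and its methods are hypothetical stand-ins for the private `IndexRange` type, and the sketch assumes the stable `usize::unchecked_sub` and `usize::unchecked_add` methods (Rust 1.79 or later).

```rust
/// Hypothetical stand-in for core's private `IndexRange` (an assumption,
/// not the real type). Invariant: `start <= end`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MiniRange {
    start: usize,
    end: usize,
}

impl MiniRange {
    /// The only constructor, so the invariant holds for every value.
    pub fn new(start: usize, end: usize) -> Option<Self> {
        if start <= end {
            Some(Self { start, end })
        } else {
            None
        }
    }

    /// Justified by the overall invariant: no check is needed here.
    pub fn len(&self) -> usize {
        // SAFETY: `start <= end` is guaranteed by `new`.
        unsafe { self.end.unchecked_sub(self.start) }
    }

    /// Justified by a local branch condition: the `if` proves the
    /// precondition of the unchecked addition on that path.
    pub fn take_prefix(&mut self, n: usize) -> Self {
        let mid = if n <= self.len() {
            // SAFETY: `n <= end - start`, so `start + n <= end`
            // and the addition cannot overflow.
            unsafe { self.start.unchecked_add(n) }
        } else {
            self.end
        };
        let prefix = Self { start: self.start, end: mid };
        self.start = mid;
        prefix
    }
}
```

For example, `MiniRange::new(2, 10)` has length 8, and `take_prefix(3)` splits it into a prefix of length 3 and a remainder of length 5. In both methods the safety argument is visible in the surrounding code, which is the case where an extra runtime UB check adds nothing.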
They shouldn't survive. That attribute is supposed to get them past MIR optimizations, but when we lower to LLVM IR with debug assertions disabled there should be at most a zombie
I added the A-LLVM label because I suspect that the desirable sequence of LLVM behavior here was relying on MIR inlining. It wouldn't be the first time, and impeding the MIR inliner is the primary thing that these assertions do when disabled.
This produces that bad IR:
And this produces the good IR (the normal
Well, it looks like a lot more here with
But yes, I would still hope that LLVM could chew through it, since it does with other CPUs. AFAICS our IR does not change from the target-cpu, apart from the expected function attributes.
The IR size difference for
The IR for that link looks a lot better in nightly; I wonder if that's #129283. #126299 is also helping even if you just go to 1.81, I think.
Also, the fact that there's a UB check in that IR at all is a bug. I'm looking into it.
Filed #131578
I've locally added enough post-mono MIR optimizations to grind down the

```llvm
; <core::ops::index_range::IndexRange as core::slice::index::SliceIndex<[T]>>::get_unchecked_mut
; Function Attrs: inlinehint nonlazybind uwtable
define internal { ptr, i64 } @"_ZN104_$LT$core..ops..index_range..IndexRange$u20$as$u20$core..slice..index..SliceIndex$LT$$u5b$T$u5d$$GT$$GT$17get_unchecked_mut17hfdf5440029c89a21E"(i64 noundef %0, i64 noundef %1, ptr noundef %slice.0, i64 noundef %slice.1) unnamed_addr #0 {
start:
  %self = alloca [16 x i8], align 8
  store i64 %0, ptr %self, align 8
  %2 = getelementptr inbounds i8, ptr %self, i64 8
  store i64 %1, ptr %2, align 8
  %offset = load i64, ptr %self, align 8, !noundef !3
  %3 = getelementptr inbounds i8, ptr %self, i64 8
  %self1 = load i64, ptr %3, align 8, !noundef !3
  %len = sub nuw i64 %self1, %offset
  %ptr = getelementptr inbounds i64, ptr %slice.0, i64 %offset
  %4 = insertvalue { ptr, i64 } poison, ptr %ptr, 0
  %5 = insertvalue { ptr, i64 } %4, i64 %len, 1
  ret { ptr, i64 } %5
}
```

As far as I can tell, this is at least as good input to LLVM as we provided on 1.79, but we still have a missed optimization with
I've also tried your branch. I cannot even find this function in the LLVM IR, because in your branch it always gets inlined in MIR, which is basically what I was speculating originally about LLVM relying on MIR inlining. So I think there's a deeper problem here with LLVM on this
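For readers less familiar with LLVM IR, the function quoted above corresponds roughly to the Rust below. This is a hand-written sketch, not core's source: `subslice_unchecked_mut` is a hypothetical name, and the element type is fixed to `usize` only because the quoted IR indexes with `getelementptr inbounds i64`.

```rust
use std::ptr;

/// Hypothetical helper (an assumption for illustration) with the same
/// shape as the quoted IR: compute a length and an offset pointer, then
/// return them as a raw slice pointer, with no checks at all.
///
/// # Safety
/// Callers must guarantee `start <= end <= slice.len()` and that `slice`
/// points to a live allocation.
pub unsafe fn subslice_unchecked_mut(
    start: usize,
    end: usize,
    slice: *mut [usize],
) -> *mut [usize] {
    // `%len = sub nuw i64 %self1, %offset`
    // SAFETY: the caller guarantees `start <= end`.
    let len = unsafe { end.unchecked_sub(start) };
    // `%ptr = getelementptr inbounds i64, ptr %slice.0, i64 %offset`
    // SAFETY: the caller guarantees `start` is in bounds of `slice`.
    let data = unsafe { (slice as *mut usize).add(start) };
    // The two `insertvalue` instructions build the (pointer, length) pair.
    ptr::slice_from_raw_parts_mut(data, len)
}
```

The point of the comparison is that nothing resembling a UB check remains in this shape: there is one subtraction, one pointer offset, and the pair construction.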
Yeah, this looks like a phase ordering problem to me. The code is vectorized, then unrolled, and then SROA doesn't fold away the alloca because of the masked loads, which are only converted to plain loads by a later InstCombine run. The easiest way to fix it is probably to support masked loads in SROA.
Ideally we would fully unroll the loop though, but it looks like we can't determine the trip count at that point: https://llvm.godbolt.org/z/q1jzfsYeT
The issue is that
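To make the unrolling discussion concrete, here is the general kind of loop at stake. It is an illustrative sketch only: `sum_by_value` and its values are invented for this example and are not the contents of the test file. As far as I know, iterating an array by value goes through `core::array::IntoIter`, which tracks its live elements with an `IndexRange`, so the loop's trip count is only visible to LLVM after that bookkeeping has been simplified.

```rust
/// Hypothetical example (not the actual codegen test): sums a small
/// constant array by value. The trip count is statically 6, so one would
/// hope the optimizer fully unrolls the loop and folds the result to a
/// constant.
pub fn sum_by_value() -> usize {
    let values = [1usize, 2, 3, 4, 5, 6];
    let mut acc = 0;
    // By-value iteration uses `core::array::IntoIter` under the hood.
    for item in values {
        acc += item;
    }
    acc
}
```

Whether the emitted IR collapses to a single constant return depends on the pass ordering described above, which is why the result can differ between target CPUs even though the source is identical.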
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-medium
@nikic is there enough info for an LLVM issue on this?
Code
I tried this test with `-Ctarget-cpu=x86-64-v3` (which we have on by default in the upcoming RHEL 10): tests/codegen/issues/issue-101082.rs
I expected to see this happen: FileCheck pass
Instead, this happened:
As of `rustc 1.83.0-nightly (52fd99839 2024-10-10)`, that LLVM IR is:
Reducing to `x86-64-v2` does get the expected output:
Version it worked on
It most recently worked on: Rust 1.79.0
Version with regression
`rustc --version --verbose`:
Note that the original issue #101082 was fixed by an LLVM upgrade. That version didn't change between 1.79.0 and 1.80.0, but there were some additional cherry-picks: rust-lang/llvm-project@rustc-1.79.0...rustc-1.80.0
However, cargo-bisect-rustc narrowed down to something else.
Bisection
searched nightlies: from nightly-2024-04-28 to nightly-2024-10-11
regressed nightly: nightly-2024-05-26
searched commit range: 36153f1...1ba35e9
regressed commit: 48f0011 (#121571)
bisected with cargo-bisect-rustc v0.6.9
Host triple: x86_64-unknown-linux-gnu
Reproduce with:
@rustbot modify labels: +A-codegen +regression-from-stable-to-stable -regression-untriaged