Suboptimal eq compilation on structs compared to equivalent C++ code
#106269
Comments
With more aggressive compilation flags I am seeing (latest Nightly):

    example::eq:
        mov eax, dword ptr [rdi]
        cmp eax, dword ptr [rsi]
        sete al
        ret
What flags did you use?

-O -Z mir-opt-level=4

Looks like the first two comparisons get flattened into one block, which prevents detection of the whole pattern.
#106294 might improve this. I noticed in https://rust.godbolt.org/z/q768EYdqq that it's doing

    %0 = load <2 x i8>, ptr %s1, align 1, !dbg !11
    %1 = load <2 x i8>, ptr %s2, align 1, !dbg !12
    %2 = icmp eq <2 x i8> %0, %1, !dbg !11
    %3 = extractelement <2 x i1> %2, i64 0, !dbg !11
    %4 = extractelement <2 x i1> %2, i64 1, !dbg !11

which might be no longer needed once it's

There's also something going on here related to the short-circuiting that I don't understand. If I change it to

    pub fn eq(s1: &S, s2: &S) -> bool {
        (s1.a == s2.a) & (s1.b == s2.b) & (s1.c == s2.c) & (s1.d == s2.d)
    }

then it optimizes as expected:

    define noundef zeroext i1 @_ZN7example2eq17h1d431ecac5604099E(ptr noalias nocapture noundef readonly align 1 dereferenceable(4) %s1, ptr noalias nocapture noundef readonly align 1 dereferenceable(4) %s2) unnamed_addr #0 !dbg !6 {
      %0 = load i32, ptr %s1, align 1, !dbg !11
      %1 = load i32, ptr %s2, align 1, !dbg !12
      %2 = icmp eq i32 %0, %1, !dbg !13
      ret i1 %2, !dbg !14
    }

https://rust.godbolt.org/z/ThfqaW3cr

It's definitely marked
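For readers comparing the two forms discussed above, here is a minimal sketch that puts them side by side. The struct layout (four u8 fields) is an assumption inferred from the dereferenceable(4) attribute in the IR, and the && version is an assumed stand-in for the original code, which is not shown in this thread. The names eq_short_circuit and eq_bitand are hypothetical.

```rust
// Assumed layout: four u8 fields, inferred from `dereferenceable(4)` in the IR.
pub struct S {
    pub a: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8,
}

// Short-circuiting form (assumed shape of the original code): each `&&`
// introduces a conditional branch, so later fields are only read if the
// earlier ones matched.
pub fn eq_short_circuit(s1: &S, s2: &S) -> bool {
    s1.a == s2.a && s1.b == s2.b && s1.c == s2.c && s1.d == s2.d
}

// Non-short-circuiting form from the comment above: `&` on bools evaluates
// every comparison unconditionally, so all fields are always read.
pub fn eq_bitand(s1: &S, s2: &S) -> bool {
    (s1.a == s2.a) & (s1.b == s2.b) & (s1.c == s2.c) & (s1.d == s2.d)
}
```

Both functions return the same result for every input; they differ only in control flow, which is what affects how the optimizer treats them.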
@scottmcm Short-circuiting is relevant in that moving everything into one block is generally non-profitable. There is a special pass that detects this pattern (MergeICmps) and converts it into memcmp/bcmp, which is then later expanded into a target-specific efficient comparison sequence. This is a backend IR pass, so you don't see it in

However, it requires a pretty specific pattern, and does not handle the case where the control flow has been partially flattened and one pair of comparisons uses a select rather than a branch.

Your X86 example adds insult to injury by actually vectorizing those two i8 comparisons: https://llvm.godbolt.org/z/qPe6Kfq6e

That looks pretty clearly non-profitable: https://llvm.godbolt.org/z/dT85WE8xn

Edit: Filed llvm/llvm-project#59867 for the SLPVectorizer issue.
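As a rough source-level illustration of the end result described above (a chain of per-field comparisons collapsing into one wide comparison), the sketch below compares a field-by-field equality with a single u32 comparison. This is not how MergeICmps works internally, since that pass operates on LLVM IR; the struct S and the function names eq_fieldwise and eq_as_u32 are hypothetical and assume a four-byte struct of u8 fields.

```rust
// Assumed four-byte struct; the field names are illustrative only.
pub struct S {
    pub a: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8,
}

// Field-by-field comparison with short-circuiting, i.e. a chain of branches.
pub fn eq_fieldwise(s1: &S, s2: &S) -> bool {
    s1.a == s2.a && s1.b == s2.b && s1.c == s2.c && s1.d == s2.d
}

// The same predicate expressed as one 32-bit comparison: pack the four
// bytes of each struct into a u32 and compare once. Byte order does not
// matter here because both sides are packed the same way.
pub fn eq_as_u32(s1: &S, s2: &S) -> bool {
    let lhs = u32::from_ne_bytes([s1.a, s1.b, s1.c, s1.d]);
    let rhs = u32::from_ne_bytes([s2.a, s2.b, s2.c, s2.d]);
    lhs == rhs
}
```

The two functions agree on every input, which is why a compiler is allowed to turn the first shape into the second when it can prove that reading all four bytes up front is safe.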
I suppose this opt would be specifically enabled by our optimization semantics allowing spurious reads. No change since the
Looks like this is fixed now; we should have a test for this if we don't already have one.
Add test for issue 106269. Closes rust-lang#106269. Made this an assembly test as the LLVM codegen is still quite verbose and doesn't really indicate the behaviour we want.
Rollup merge of rust-lang#124299 - clubby789:106269-test, r=nikic
The eq implementation produces worse code than the equivalent C++ code (Rust, C++).
I tried this code:
I expected to see this happen:
The resulting assembly should load and compare a single u64:
Instead, this happened:
The assembly loads and compares each u8 individually:
Meta
rustc --version --verbose: