Large performance drop in compiled binary in stable rust 1.45.2 vs 1.44.0 on x86_64 linux #76247
Comments
It would be interesting to see if this is fixed on nightly, since that has the LLVM 11 bump. |
No significant change with nightly and LLVM 11
|
Hey Cleanup Crew ICE-breakers! This bug has been identified as a good cc @AminArria @camelid @chrissimpkins @contrun @DutchGhost @elshize @ethanboxx @h-michael @HallerPatrick @hdhoang @hellow554 @imtsuki @kanru @KarlK90 @LeSeulArtichaut @MAdrianMattocks @matheus-consoli @mental32 @nmccarty @Noah-Kennedy @pard68 @PeytonT @pierreN @Redblueflame @RobbieClarken @RobertoSnap @robjtede @SarthakSingh31 @senden9 @shekohex @sinato @spastorino @turboladen @woshilapin @yerke |
This seems to have happened in the LLVM 10 upgrade. @rustbot ping llvm |
Hey LLVM ICE-breakers! This bug has been identified as a good cc @camelid @comex @cuviper @DutchGhost @hdhoang @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique |
I have created a script that returns a nonzero exit code when the binary's run time exceeds 3.0 seconds (a reasonable threshold on my system), which can be used as a script with
My particular result so far is
Waiting for the rest to complete... |
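For readers who want to reproduce this kind of check, here is a minimal sketch of a timing wrapper written in Rust. It is not the script referred to above (that script is not shown in this thread); the names `THRESHOLD`, `is_regressed` and `time_command` are made up for illustration, and the 3-second limit is simply the threshold mentioned in the comment. It runs the command given on its command line and exits with a nonzero code when the run takes longer than the threshold.

```rust
use std::io;
use std::process::{Command, ExitCode};
use std::time::{Duration, Instant};

// Assumption: 3.0 s, the threshold quoted in the comment above.
const THRESHOLD: Duration = Duration::from_secs(3);

/// Returns true when the measured run time counts as a regression.
fn is_regressed(elapsed: Duration, threshold: Duration) -> bool {
    elapsed > threshold
}

/// Runs `program` with `args` to completion and returns the wall-clock time.
/// A failing exit status of the child is reported as an error.
fn time_command(program: &str, args: &[String]) -> io::Result<Duration> {
    let start = Instant::now();
    let status = Command::new(program).args(args).status()?;
    let elapsed = start.elapsed();
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("command exited with {status}"),
        ));
    }
    Ok(elapsed)
}

fn main() -> ExitCode {
    let mut args = std::env::args().skip(1);
    let program = match args.next() {
        Some(p) => p,
        None => {
            eprintln!("usage: <this-tool> <binary> [args...]");
            return ExitCode::from(2);
        }
    };
    let rest: Vec<String> = args.collect();

    match time_command(&program, &rest) {
        Ok(elapsed) if !is_regressed(elapsed, THRESHOLD) => {
            println!("ok: {:.2} s", elapsed.as_secs_f64());
            ExitCode::SUCCESS
        }
        Ok(elapsed) => {
            eprintln!("too slow: {:.2} s", elapsed.as_secs_f64());
            ExitCode::FAILURE
        }
        Err(err) => {
            eprintln!("could not time command: {err}");
            ExitCode::from(2)
        }
    }
}
```

A wall-clock threshold like this is machine dependent, so the limit would need adjusting on other hardware.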
found 7 bors merge commits in the specified range installing 0aa6751 installing 9310e3b installing 7f79e98 installing 82911b3 searched toolchains 0aa6751 through 9310e3b Regression in 82911b3 searched nightlies: from nightly-2020-05-20 to nightly-2020-07-02 |
Changing the code a bit:
- dst[index] &= src1[index] < src2[index];
+ dst[index] = dst[index] as u8 & (src1[index] < src2[index]) as u8 != 0;
makes the compiler generate even better code on both 1.44.0 and 1.45.2:
.LBB0_6:
movq xmm1, qword ptr [r8 + rsi]
movq xmm2, qword ptr [r8 + rsi + 8]
movdqu xmm3, xmmword ptr [r10 + 2*rsi]
movdqu xmm4, xmmword ptr [r10 + 2*rsi + 16]
movdqu xmm5, xmmword ptr [rdx + 2*rsi]
pcmpgtw xmm5, xmm3
packsswb xmm5, xmm5
movdqu xmm3, xmmword ptr [rdx + 2*rsi + 16]
pcmpgtw xmm3, xmm4
packsswb xmm3, xmm3
pand xmm1, xmm0
pand xmm1, xmm5
pand xmm2, xmm0
pand xmm2, xmm3
movq qword ptr [r8 + rsi], xmm1
movq qword ptr [r8 + rsi + 8], xmm2
add rsi, 16
cmp rcx, rsi
jne .LBB0_6
It suggests LLVM is struggling with the conversion between boolean vectors and integer vectors. |
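To make the comparison concrete, here is a sketch of the two loop bodies side by side. The original program is not reproduced in this thread, so the function names, the signature and the element types (`i16` sources, chosen because the assembly uses `pcmpgtw`, and a `bool` destination) are assumptions for illustration only. Both variants should compute the same result; only the way the boolean is combined differs.

```rust
// Hypothetical reconstruction: the signature and element types are assumptions,
// not the code from the original report.

/// Variant using `&=` directly on `bool` (the form reported as slow on 1.45.2).
pub fn cmp_lt_and_bool(src1: &[i16], src2: &[i16], dst: &mut [bool]) {
    let len = dst.len().min(src1.len()).min(src2.len());
    for index in 0..len {
        dst[index] &= src1[index] < src2[index];
    }
}

/// Variant that routes the AND through `u8`, as suggested in the comment above.
pub fn cmp_lt_and_u8(src1: &[i16], src2: &[i16], dst: &mut [bool]) {
    let len = dst.len().min(src1.len()).min(src2.len());
    for index in 0..len {
        // `as` binds tighter than `&`, which binds tighter than `!=`.
        dst[index] = dst[index] as u8 & (src1[index] < src2[index]) as u8 != 0;
    }
}

fn main() {
    let src1 = [1i16, 5, -3, 7];
    let src2 = [2i16, 5, 0, 6];
    let mut a = [true, true, false, true];
    let mut b = a;

    cmp_lt_and_bool(&src1, &src2, &mut a);
    cmp_lt_and_u8(&src1, &src2, &mut b);

    // Both variants must agree element by element.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Whether either form vectorizes well depends on the compiler and LLVM version, so the generated assembly should be checked for the toolchain in use.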
@viktorchvatal @pcpthm do you know if an issue about this exists on LLVM? Could you file one there, maybe? |
This was discussed during today's compiler meeting. Removing nomination. |
Well, I do not feel I have the knowledge to file an LLVM bug.. yet.. What is the recommended way of doing so, anyway? Take the LLVM IR from Rust 1.44 and find a performance regression in LLVM itself using just the IR? Also note that the LLVM IR generated by rustc 1.44 and 1.45.2 varies slightly |
It looks like the output is okay again since 1.52? https://rust.godbolt.org/z/nojnhW393 Probably fixed by one of the LLVM upgrades in the meantime. |
Changing labels to remove priority and adding E-needs-test |
I have experienced more than 300% longer execution time in specific functions that use loops along with indexing into slices. After several hours of work with a profiler, I was able to isolate the problem from a 60K-line codebase into the following short program
Code is also available in the following repository
With Rust 1.44.0, I observe an execution time of around 1.7 sec
Rust versions 1.45.2 and current stable 1.46.0 produce binaries that run for more than 6.0 seconds with the same source code
I use several more functions like cmp_gt_and in the core of image processing software, and they show a similar performance drop. Has anything significantly changed between rustc 1.44 and 1.45 that may have impacted the code so significantly? Maybe LLVM 10 has a different behavior? Any thoughts on how to modify the code to gain the performance back with the current compiler, or other things to try in order to clarify the problem? For some time, I can stick with 1.44 to keep the performance.
Function cmp_gt_and also appears to have much shorter assembly code with rustc 1.44 than with its successors; I am not sure if that is the reason for the performance drop, though:
Rustc 1.44.0
Rustc 1.45.2