Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Inlined loops not fully evaluating with const-eable dependencies #101082

Closed
sno2 opened this issue Aug 27, 2022 · 10 comments · Fixed by #109895
Closed

Regression: Inlined loops not fully evaluating with const-eable dependencies #101082

sno2 opened this issue Aug 27, 2022 · 10 comments · Fixed by #109895
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@sno2
Copy link

sno2 commented Aug 27, 2022

I tried this code:

pub fn arr() -> usize {
  let values = [23, 16, 54, 3, 60, 9];
  let mut acc = 0;
  for item in values { 
    acc += item;
  }
  acc
}

I expected to see this happen: The loop should be inlined because values is const-known. For example, here is the generated assembly on Rust 1.63.0 (Godbolt) and Rust Beta (Godbolt):

example::arr:
        mov     eax, 165
        ret

Instead, this happened: On nightly, the loop is inlined. However, the entire function is not computed despite the stable and beta Rust inlining it. Here is the generated assembly on Nightly (Godbolt):

.LCPI0_0:
        .quad   23
        .quad   16
.LCPI0_1:
        .quad   54
        .quad   3
.LCPI0_2:
        .quad   60
        .quad   9
example::arr:
        sub     rsp, 72
        movaps  xmm0, xmmword ptr [rip + .LCPI0_0]
        movaps  xmmword ptr [rsp], xmm0
        movaps  xmm0, xmmword ptr [rip + .LCPI0_1]
        movaps  xmmword ptr [rsp + 16], xmm0
        movaps  xmm0, xmmword ptr [rip + .LCPI0_2]
        movaps  xmmword ptr [rsp + 32], xmm0
        mov     qword ptr [rsp + 56], 6
        xor     ecx, ecx
        xor     edx, edx
        jmp     .LBB0_1
.LBB0_3:
        lea     rdi, [rcx + 1]
        mov     qword ptr [rsp + 48], rdi
        mov     rdx, qword ptr [rsp + 8*rcx]
        mov     esi, 1
        mov     rcx, rdi
        add     rdx, rax
        test    rsi, rsi
        je      .LBB0_5
.LBB0_1:
        mov     rax, rdx
        cmp     rcx, 5
        jbe     .LBB0_3
        xor     esi, esi
        add     rdx, rax
        test    rsi, rsi
        jne     .LBB0_1
.LBB0_5:
        add     rsp, 72
        ret

Meta

n/a - disclosed at usages

Keywords: const inline loop, const loop, const loop not evaluated

Relevant discord thread: https://discord.com/channels/273534239310479360/592856094527848449/1013050670942793808

@sno2 sno2 added the C-bug Category: This is a bug. label Aug 27, 2022
@sno2 sno2 changed the title Regression: Inlined loops not fully evaluating const-eable dependencies Regression: Inlined loops not fully evaluating with const-eable dependencies Aug 27, 2022
@Rageking8
Copy link
Contributor

@rustbot label regression-from-stable-to-nightly

@rustbot rustbot added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 27, 2022
@steffahn
Copy link
Member

@rustbot label I-slow

@rustbot rustbot added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Aug 27, 2022
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high t-compiler

@rustbot rustbot added P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 31, 2022
@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Aug 31, 2022
@nikic nikic self-assigned this Sep 1, 2022
@nikic
Copy link
Contributor

nikic commented Sep 1, 2022

Partial fix: llvm/llvm-project@43e7d9a This at least allows the loop to be unrolled, though it only gets fully constant folded in the backend.

@nikic
Copy link
Contributor

nikic commented Sep 1, 2022

  %_2.sroa.4.0.iter.sroa_idx = getelementptr inbounds i8, ptr %iter, i64 48
  store i64 0, ptr %_2.sroa.4.0.iter.sroa_idx, align 8
  %_2.sroa.5.0.iter.sroa_idx = getelementptr inbounds i8, ptr %iter, i64 56
  store i64 6, ptr %_2.sroa.5.0.iter.sroa_idx, align 8
  %self1.i = getelementptr inbounds %"core::array::iter::IntoIter<usize, 6>", ptr %iter, i64 0, i32 1
  %_4.i.i = getelementptr inbounds %"core::array::iter::IntoIter<usize, 6>", ptr %iter, i64 0, i32 1, i32 1
  %_4.val.i.i = load i64, ptr %_4.i.i, align 8, !alias.scope !2
  %0 = load i64, ptr %self1.i, align 8, !alias.scope !2

Oh my, if it isn't yet another casualty of LLVM's dumb typed GEP representation. This might get fixed by #98615 as well.

@nikic
Copy link
Contributor

nikic commented Sep 1, 2022

This turns out to be surprisingly tricky due to phase ordering issues. Even with canonical GEPs, the loop bounds only become known after full unrolling, by which point it is too late. I think the best shot at this would be to generalize LICM scalar promotion to perform load promotion in the presence of potentially aliasing loads. https://reviews.llvm.org/D133111 is preparation to move in that direction.

@nikic
Copy link
Contributor

nikic commented Sep 2, 2022

https://reviews.llvm.org/D133192 for improved LICM promotion. Still needs to be combined with canonical GEPs to work.

@pnkfelix
Copy link
Member

pnkfelix commented Dec 16, 2022

@nikic what are your thoughts on this ticket? Your partial fix was merged, but the other LLVM changes have not.

Meanwhile, it seems like #98615 has stalled, but nikic r+'ed it apart from wanting a FIXME removed, does that sound right? (I went ahead and added a commit to remove the FIXME.) And its not clear whether #98615 would actually resolve this, is that correct?

@nikic
Copy link
Contributor

nikic commented Dec 16, 2022

@pnkfelix We need both #98615 and https://reviews.llvm.org/D133192 to fully resolve this, as far as I remember. We should probably land #98615 ASAP to give us a chance to address regressions from it in LLVM 16.

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Apr 3, 2023
@nikic
Copy link
Contributor

nikic commented Apr 3, 2023

Fixed by the LLVM 16 upgrade.

@sno2 sno2 closed this as completed Apr 3, 2023
JohnTitor added a commit to JohnTitor/rust that referenced this issue Apr 11, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants