Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if Result::is_ok() followed immediately by Result::unwrap() fails to optimize out the second err check #85771

Open
TheBlueMatt opened this issue May 28, 2021 · 9 comments
Labels
A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@TheBlueMatt
Copy link

In the following trivial example, LLVM fails to optimize the unwrap() away. While its somewhat contrived in the sense that a match or if let should be preferred, more complicated logic ends up with similar blocks eventually.

pub fn print_if_some(arg: Result<u64, u32>) {
    if arg.is_ok() {
        let v = arg.unwrap();
        if v == 42 {
            println!("{}", v);
        }
        return;
    }
}

Godbolt link for the same is https://godbolt.org/z/xrT1GfG7e you can see the test and jne for panic'ing.

@TheBlueMatt TheBlueMatt added the C-bug Category: This is a bug. label May 28, 2021
@workingjubilee workingjubilee changed the title if(Result::is_ok()) followed immediately by a Result::unwrap() fails to optimize out the err check if Result::is_ok() followed immediately by Result::unwrap() fails to optimize out the second err check May 28, 2021
@jhpratt

This comment has been minimized.

@rustbot rustbot added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels May 28, 2021
@scottmcm
Copy link
Member

Hmm, I'd hoped that MIR would clean some things up here (there was some work done for if matches!, IIRC), but no matter what I do (-Z inline-mir=yes -Z inline-mir-threshold=999999 -Z unsound-mir-opts=yes -Z mir-opt-level=3), the Result::unwrap just isn't getting mir-inlined: https://godbolt.org/z/54EoTYWvT

@AngelicosPhosphoros
Copy link
Contributor

I think, the core issue is our ABI.
In current Rust ABI, small structs (smaller than 128 bits) casted to integers when passed by value to put them into a register.

This causes missing optimizations because LLVM fails to understand that i128 values are same as typed struct values.
Then it sees only that we compare "struct" in is_ok (because we pass it as reference) and then again validate i128 value.

Code before inlining that shows how different values used in LLVM
; Function Attrs: uwtable
define void @print_if_some(i128 %0) unnamed_addr #3 {
  %2 = alloca i128, align 8
  %3 = bitcast i128* %2 to %"core::result::Result<u64, u32>"*
  store i128 %0, i128* %2, align 8
  %4 = getelementptr %"core::result::Result<u64, u32>", %"core::result::Result<u64, u32>"* %3, i64 0, i32 0
  %5 = load i32, i32* %4, align 8
  %6 = call fastcc zeroext i1 @"_ZN4core6result19Result$LT$T$C$E$GT$5is_ok17hb5269faed3cb7727E"(i32 %5)
  br i1 %6, label %8, label %7

7:                                                ; preds = %12, %8, %1
  ret void

8:                                                ; preds = %1
  %9 = load i128, i128* %2, align 8
  %10 = call fastcc i64 @"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17he2644901415f1491E"(i128 %9)
  %11 = icmp eq i64 %10, 42
  br i1 %11, label %12, label %7

12:                                               ; preds = %8
  call void @do_not_elim(i64 42)
  br label %7
}

We can see that it uses i128 %2 for unwrap and Result %3 for is_ok.

Also, to ensure that this is ABI issue, I tried to use bigger Result that doesn't fit into i128:

extern "Rust" {
    fn do_not_elim(v: u128);
}

#[no_mangle]
pub fn print_if_some(arg: Result<u128, u32>) {
    if arg.is_ok() {
        let v = arg.unwrap();
        if v == 42 {
            unsafe{
                do_not_elim(v);
            }
        }
    }
}

And rustc successfully optimizes this version:

.LCPI0_0:
        .long   42
        .long   0
        .long   0
        .long   0
print_if_some:
        cmp     dword ptr [rdi], 0
        jne     .LBB0_2
        vmovdqu xmm0, xmmword ptr [rdi + 8]
        vpxor   xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vptest  xmm0, xmm0
        jne     .LBB0_2
        mov     edi, 42
        xor     esi, esi
        jmp     qword ptr [rip + do_not_elim@GOTPCREL]
.LBB0_2:
        ret

godbolt link

I think, we should go away from "cast to integer" hack and teach LLVM to pass structs in registers without bitcasting.
This would solve other issues about missed optimizations which caused by the current hack:
#91447
#85265

@AngelicosPhosphoros
Copy link
Contributor

@nikic you know LLVM better than many. Do you think if some special kind of ABI for Rust is possible?

Also, maybe we can just send structs as values and let LLVM handle this itself? As described here:
https://lists.llvm.org/pipermail/llvm-dev/2018-April/122714.html

@AngelicosPhosphoros
Copy link
Contributor

@shampoofactory Can you please check if your changes about i128 affect this issue?
#26494 (comment)

Also, it would be nice if you test such code with your changes too:

extern "Rust" {
    fn do_not_elim(v: u16);
}

#[no_mangle]
pub fn print_if_some(arg: Result<u16, u8>) {
    if arg.is_ok() {
        let v = arg.unwrap();
        if v == 42 {
            unsafe{
                do_not_elim(v);
            }
        }
    }
}

@MSxDOS
Copy link

MSxDOS commented Dec 9, 2021

Minimized:

pub fn test(arg: Result<u64, u32>) {
    if arg.is_ok() {
        arg.unwrap();
    }
}

As of now, enabling MIR inliner removes the unwrap: https://godbolt.org/z/9xMsr8MzK

@nikic
Copy link
Contributor

nikic commented Dec 9, 2021

@nikic you know LLVM better than many. Do you think if some special kind of ABI for Rust is possible?

Also, maybe we can just send structs as values and let LLVM handle this itself? As described here: https://lists.llvm.org/pipermail/llvm-dev/2018-April/122714.html

If we're talking about values with larger than register size, then yes, passing them as a struct with two elements should work -- however, it would be even better to pass them as two separate arguments, as that saves on insertvalue/extractvalue boilerplate around the call. The current i128 ABI also results in the argument being passed as two registers in the end, but the representation is very hard to optimize in IR and will only get unpacked into two values in the backend.

For values smaller than the register size, passing it as an integer can make sense as a packed representation, as something like { i8, i8, i8, i8 } would otherwise get passed in four registers rather than packed into one. Though I'm not sure how profitable that packing really is.

@nikic
Copy link
Contributor

nikic commented Dec 9, 2021

Though for this specific case, this can and probably should be fixed on the LLVM side as well. We are currently missing this canonicalization: https://alive2.llvm.org/ce/z/Qx6KWg

@shampoofactory
Copy link
Contributor

@AngelicosPhosphoros Hi, I'm not sure I'm qualified to give you a knowledgeable answer, other than to say that compiling with 1.47.0, 1.48.0, 1.57.0 and dev branch with #26494 reverted gives us identical outputs. As the #26494 regression manifests in 1.48+ we would expect changes in those compilations, which we don't see.

Source:

extern "Rust" {
    fn do_not_elim(v: u16);
}

// Commented out, otherwise cargo asm doesn't pick up the function.
// #[no_mangle]
pub fn print_if_some(arg: Result<u16, u8>) {
    if arg.is_ok() {
        let v = arg.unwrap();
        if v == 42 {
            unsafe{
                do_not_elim(v);
            }
        }
    }
}
$ cargo asm angelicos::print_if_some
angelicos::print_if_some:
    push    rax
    mov     eax, edi
    test    al, al
    jne     .LBB2_3
    test    al, 1
    jne     .LBB2_4
    and     eax, -65536
    cmp     eax, 2752512
    jne     .LBB2_3
    mov     edi, 42
    pop     rax
    jmp     qword, ptr, [rip, +, do_not_elim@GOTPCREL]
.LBB2_3:
    pop     rax
    ret
.LBB2_4:
    mov     byte, ptr, [rsp, +, 7], ah
    lea     rdi, [rip, +, .L__unnamed_1]
    lea     rcx, [rip, +, .L__unnamed_2]
    lea     r8, [rip, +, .L__unnamed_3]
    lea     rdx, [rsp, +, 7]
    mov     esi, 43
    call    qword, ptr, [rip, +, _ZN4core6result13unwrap_failed17h32ef6b3156e8fc57E@GOTPCREL]
    ud2 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants