WASM float to int performance regression since 1.53.0 #87643
Though looking at the LLVM IR, it seems the difference is that 1.52.0 used rustc's own saturation implementation, while since 1.53 it uses LLVM 12's saturation implementation, which seems to be worse?
Thanks for the heads up. This is behaving as "expected", although that expectation wasn't really thoroughly evaluated by me. As you've discovered, the main difference (which you can see in that diff view) is whether rustc's old codegen or LLVM's built-in intrinsic codegen is used. In that sense codegen is behaving as expected, and I believe at the time I diff'd the two and saw they were different, but assumed that the difference was negligible.

Have you measured the LLVM-intrinsic-generated code to have worse performance? I see it has a few extra instructions, but I'd be curious to put concrete numbers on it if possible. If LLVM has worse performance, I think it'd be good to open an issue upstream with them and see if we can improve it there; but if it's critical and too difficult to land upstream, then we can perhaps re-land the wasm-specific bits in rustc.

(It'd also be best if Safari implemented the nontrapping-fptoint extension so we could consider turning that on by default...)
I didn't do any benchmarks, but it seems like it suffers from the same problem as our original codegen. This is roughly the new WASM translated into Rust-like pseudo code:

```rust
// General ifs to do the saturation
if x.is_nan() {
    0
} else if x >= 0x1.fffffep30 {
    2147483647
} else if x >= -0x1p31 {
    // Protection against trapping
    if x.abs() < 0x1p31 {
        (int)x
    } else {
        -2147483648
    }
} else {
    -2147483648
}
```

This protection against trapping shouldn't be there: the saturation code already checked for all the edge cases (though technically a lot of these are selects, which they can't be if you remove the protection code, which may have some performance implications?). This probably happens because this lowering of the saturation casts is backend-independent, and once again the WASM backend doesn't know anything about it, so it still protects itself from trapping. So yeah, this can definitely be improved. I'll look into raising an upstream issue, I guess. It's not critical at all; it's just something we stumbled upon in some Twitter discussion.
Don't you need to opt in on the wasm side with the nontrapping-fptoint target feature?

(Of course that doesn't change that without this feature there's a regression.)
Yeah, this issue is only concerned with the case where the WASM feature is not active. It should still produce reasonably good code then; at the moment it's not as good as before 1.53.
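For completeness, opting in on the wasm side uses the target feature mentioned above. A sketch of the invocation (assuming the `wasm32-unknown-unknown` target is installed; `example.rs` is a placeholder):

```shell
# Enable WASM's non-trapping float-to-int instructions so LLVM can lower
# the saturating casts directly, without the extra guard code.
rustc --target wasm32-unknown-unknown \
      -C target-feature=+nontrapping-fptoint \
      -O example.rs
```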
Both Rust and WASM introduced saturating float-to-int casts. However, WASM originally only had trapping float-to-int casts. LLVM's internal float casts are speculatable, i.e. it can execute them early, assuming no trap ever happens. This means LLVM needs to protect itself from WASM's trapping casts by emitting some extra code around them.

Once Rust introduced saturating float-to-int casts, rustc itself started emitting a bunch of code around the casts to saturate the values. This led to both rustc and LLVM emitting guard code around each cast. However, since rustc already protected against the dangerous values, LLVM didn't need to emit any of these additional instructions, and that was eventually implemented. See this previous issue and related PRs: #73591
However, with the switch to LLVM 12, it was possible to throw out a lot of the manual codegen in rustc:
#84339 which acknowledges a regression in those casts
and then a follow up PR:
#84654 which supposedly fixes the regression
However, it seems there's still a regression: https://rust.godbolt.org/z/W18vGcv9T
cc @alexcrichton