-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustc: Improving safe wasm float->int casts #74695
Conversation
r? @davidtwco (rust_highfive has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but someone else should probably take a look too.
r? @nagisa |
Actually for the 8-bit and 16-bit cases, what I did in my prototype was to emit the cast to u32 / i32 and then downcast the int from there, that should be a reasonably easy change here too. Should we do that in a follow up PR or do you want to squeeze it into this PR? (I could look into it again as well) |
// | ||
// ;; ... previous basic block | ||
// %load_result = alloca i32, align 4 | ||
// %less_or_nan = fcmp ult double %x, 0xC1E0000000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks more complicated than it needs to be. I’m aware that this is not very constructive in terms of feedback as it is… but I’m experimenting with some alternative ways to write this down as LLVM IR and hopefully will come back to update this before monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here’s something I have so far (caveat untested, unchecked and not validated):
; check that the number is within the expected bounds and just convert if the number is in range
%inbound_lower = fcmp oge double %0, 0xC1E0000000000000
%inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000
%inbounds = and i1 %inbound_lower, %inbound_upper
br i1 %inbounds, label %convert, label %specialcase
specialcase:
%isnan = fcmp ord double %0, 0.000000e+00 ; can probably also be `!inbound_lower && !inbound_upper`?
; select a max/min value for the case where float is not nan
%result_notnan = select i1 %inbound_lower, i32 2147483647, i32 -2147483648
; nan or min/max
%result_nan = select i1 %isnan, i32 0, i32 %result_notnan
ret i32 %result_nan
convert:
%result = fptosi double %0 to i32
ret i32 %result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I ended up with in the end
;; block so far... %0 is the argument
%inbound_lower = fcmp oge double %0, 0xC1E0000000000000
%inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000
;; match (inbound_lower, inbound_upper) {
;; (true, true) => %0 can be converted without trapping
;; (false, false) => %0 is a NaN
;; (true, false) => %0 is too large
;; (false, true) => %0 is too small
;; }
;;
;; The (true, true) check, go to %convert if so.
%inbounds = and i1 %inbound_lower, %inbound_upper
br i1 %inbounds, label %convert, label %specialcase
specialcase:
;; Handle the cases where the number is NaN, too large or too small
;; Either (true, false) or (false, true)
%is_not_nan = or i1 %inbound_lower, %inbound_upper
;; Figure out which saturated value we are interested in if not `NaN`
%saturated = select i1 %inbound_lower, i32 2147483647, i32 -2147483648
;; Figure out between saturated and NaN representations
%result_nan = select i1 %is_not_nan, i32 %saturated, i32 0
br label %done
convert:
%result = call i32 @llvm.wasm.trunc.signed.i32.f64(double %0) ; fptosi double %0 to i32
br label %done
done:
%r = phi i32 [ %result, %convert ], [ %result_nan, %specialcase ]
;; continues here
Here’s a godbolt comparison of the two approaches: https://rust.godbolt.org/z/cnEGWM. Curiously the output in godbolt is very different from what was posted in the PR description, and I’m not sure why the difference (maybe i32
vs u32
?).
Anyway, comparing just the godbolt output the code generated from the snippet above looks somewhat better to my wasm-untrained eye. This code is also something LLVM is not really able to optimize much further. So at the very least doing something along the lines of this snippet can be a win in terms of the compile time.
If you don’t want to attempt to integrate something like this, that's also fine with me. In that case let me know and I’ll r+ this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice! I'm happy to update this to use that codegen. Do you know if we can natively generate phi nodes though? I wanted to do that instead of using an alloca
but the builder trait didn't seem to have a way to manually construct phi nodes.
66e5418
to
4a8d24b
Compare
Ok I've updated this to codegen in line with @nagisa's suggestions, although it still uses an |
This commit improves code generation for WebAssembly targets when translating floating to integer casts. This improvement is only relevant when the `nontrapping-fptoint` feature is not enabled, but the feature is not enabled by default right now. Additionally this improvement only affects safe casts since unchecked casts were improved in rust-lang#74659. Some more background for this issue is present on rust-lang#73591, but the general gist of the issue is that in LLVM the `fptosi` and `fptoui` instructions are defined to return an `undef` value if they execute on out-of-bounds values; they notably do not trap. To implement these instructions for WebAssembly the LLVM backend must therefore generate quite a few instructions before executing `i32.trunc_f32_s` (for example) because this WebAssembly instruction traps on out-of-bounds values. This codegen into wasm instructions happens very late in the code generator, so what ends up happening is that rustc inserts its own codegen to implement Rust's saturating semantics, and then LLVM also inserts its own codegen to make sure that the `fptosi` instruction doesn't trap. Overall this means that a function like this: #[no_mangle] pub unsafe extern "C" fn cast(x: f64) -> u32 { x as u32 } will generate this WebAssembly today: (func $cast (type 0) (param f64) (result i32) (local i32 i32) local.get 0 f64.const 0x1.fffffffep+31 (;=4.29497e+09;) f64.gt local.set 1 block ;; label = @1 block ;; label = @2 local.get 0 f64.const 0x0p+0 (;=0;) local.get 0 f64.const 0x0p+0 (;=0;) f64.gt select local.tee 0 f64.const 0x1p+32 (;=4.29497e+09;) f64.lt local.get 0 f64.const 0x0p+0 (;=0;) f64.ge i32.and i32.eqz br_if 0 (;@2;) local.get 0 i32.trunc_f64_u local.set 2 br 1 (;@1;) end i32.const 0 local.set 2 end i32.const -1 local.get 2 local.get 1 select) This PR improves the situation by updating the code generation for float-to-int conversions in rustc, specifically only for WebAssembly targets and only for some situations (float-to-u8 still has not great codegen). The fix here is to use basic blocks and control flow to avoid speculatively executing `fptosi`, and instead LLVM's raw intrinsic for the WebAssembly instruction is used instead. This effectively extends the support added in rust-lang#74659 to checked casts. After this commit the codegen for the above Rust function looks like: (func $cast (type 0) (param f64) (result i32) (local i32) block ;; label = @1 local.get 0 f64.const 0x0p+0 (;=0;) f64.ge local.tee 1 i32.const 1 i32.xor br_if 0 (;@1;) local.get 0 f64.const 0x1.fffffffep+31 (;=4.29497e+09;) f64.le i32.eqz br_if 0 (;@1;) local.get 0 i32.trunc_f64_u return end i32.const -1 i32.const 0 local.get 1 select) For reference, in Rust 1.44, which did not have saturating float-to-integer casts, the codegen LLVM would emit is: (func $cast (type 0) (param f64) (result i32) block ;; label = @1 local.get 0 f64.const 0x1p+32 (;=4.29497e+09;) f64.lt local.get 0 f64.const 0x0p+0 (;=0;) f64.ge i32.and i32.eqz br_if 0 (;@1;) local.get 0 i32.trunc_f64_u return end i32.const 0) So we're relatively close to the original codegen, although it's slightly different because the semantics of the function changed where we're emulating the `i32.trunc_sat_f32_s` instruction rather than always replacing out-of-bounds values with zero. There is still work that could be done to improve casts such as `f32` to `u8`. That form of cast still uses the `fptosi` instruction which generates lots of branch-y code. This seems less important to tackle now though. In the meantime this should take care of most use cases of floating-point conversion and as a result I'm going to speculate that this... Closes rust-lang#73591
4a8d24b
to
2c1b046
Compare
ping for a re-r? from @nagisa |
Yeah LGTM. Its too bad we cannot easily generate phi – having many casts will ultimately end up generating a larger amount of IR for LLVM to optimize, slowing down compilation. But nevertheless the improvement seems worth it. @bors r+ |
📌 Commit 2c1b046 has been approved by |
💡 This pull request was already approved, no need to approve it again.
|
📌 Commit 2c1b046 has been approved by |
☀️ Test successful - checks-actions, checks-azure |
This commit fixes an issue with rust-lang#74695 where the fptosi and fptoui specializations on wasm were accidentally used on vector types by the `simd_cast` intrinsic. This issue showed up as broken CI for the stdsimd crate. Here this commit simply skips the specialization on vector kinds flowing into `fpto{s,u}i`.
Don't try to use wasm intrinsics on vectors This commit fixes an issue with rust-lang#74695 where the fptosi and fptoui specializations on wasm were accidentally used on vector types by the `simd_cast` intrinsic. This issue showed up as broken CI for the stdsimd crate. Here this commit simply skips the specialization on vector kinds flowing into `fpto{s,u}i`.
This commit improves code generation for WebAssembly targets when
translating floating to integer casts. This improvement is only relevant
when the
nontrapping-fptoint
feature is not enabled, but the featureis not enabled by default right now. Additionally this improvement only
affects safe casts since unchecked casts were improved in #74659.
Some more background for this issue is present on #73591, but the
general gist of the issue is that in LLVM the
fptosi
andfptoui
instructions are defined to return an
undef
value if they execute onout-of-bounds values; they notably do not trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing
i32.trunc_f32_s
(forexample) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the
fptosi
instructiondoesn't trap. Overall this means that a function like this:
will generate this WebAssembly today:
This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only for some situations (float-to-u8 still has not great
codegen). The fix here is to use basic blocks and control flow to avoid
speculatively executing
fptosi
, and instead LLVM's raw intrinsic forthe WebAssembly instruction is used instead. This effectively extends
the support added in #74659 to checked casts. After this commit the
codegen for the above Rust function looks like:
For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:
So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed where
we're emulating the
i32.trunc_sat_f32_s
instruction rather than alwaysreplacing out-of-bounds values with zero.
There is still work that could be done to improve casts such as
f32
tou8
. That form of cast still uses thefptosi
instruction whichgenerates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...
Closes #73591