convert(Int16, ::Float64) does not throw inexact exception when it should #14549
Comments
ASM is from gcc disassembly due to #14550 |
Isn't this the same issue as #10124? |
Yeah, seems like it: #10124 (comment). I wasn't really sure exactly what the undefined behaviors are. The PR that closes that issue doesn't seem to cover this one though. |
Yeah, we need to audit all uses of unsafe_trunc, not just the one in |
Ah, that is interesting. It seems that undefined behaviour actually allows values outside of the range of the destination type (which in hindsight makes sense, since I guess this means we really need to do the checks before calling |
So the problem is that adding range checks pre- Ideally this would be handled at the LLVM level: either define a checked version of c.f. related rust issue: rust-lang/rust#10184 |
It's a bit of a hack, but what if we were to define:
function convert(::Type{Int16}, x::Float64)
    u = unsafe_trunc(Int32, x) % Int16
    convert(Float64, u) == x || throw(InexactError())
    u
end
It seems to give the same (valid) instructions on x86, does it fix the issue on ARM? We're still technically playing with undefined behaviour here, so it would be good to get this clarified upstream. |
It does seem to fix the issue on ARM and is still working on all platforms I have tested. There seems to be a ~20% performance regression on both x64 and aarch64 though. |
Any idea why it's slower? The |
You are right. I was hitting johnmyleswhite/Benchmarks.jl#36 and didn't look at the allocation count. There's also a small difference due to inlining (the Julia version is harder to inline, which shouldn't be an issue if we simply fix the intrinsics). There's no measurable performance difference once these two issues are fixed. There's a small difference on x64 with (patched) LLVM 3.7.1 in the branch instructions and in where the error branch is placed; IMHO the new version generates slightly better code, since the error branch is at the end of the function. |
Ah good. I was thinking we could just implement these in Julia, since there's no real reason they need to be intrinsics. What if I just wrap them in an |
It will still make the function that calls them harder to inline. Not sure how big an effect it is though. |
Does |
nvm, I see you are referring to the original issue. |
I meant, the same issue as above, i.e. does |
Yeah, realized that and edited my comment above...
julia> convert(Int8, 200000.0)
64
julia> convert(Int8, 200000f0)
64
julia> convert(UInt8, 200000.0)
0x40
julia> convert(UInt8, 200000f0)
0x40
julia> convert(Int16, 200000.0)
3392
julia> convert(Int16, 200000f0)
3392
julia> convert(UInt16, 200000.0)
0x0d40
julia> convert(UInt16, 200000f0)
0x0d40
|
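The values in the session above are consistent with the out-of-range input being truncated to a wider integer and then wrapped to the destination width. A minimal sketch of that arithmetic, assuming the intermediate value is the 32-bit integer 200000 (a stand-in chosen for illustration; the thread does not show the actual register contents):

```julia
# Assumed stand-in for the wider intermediate result of the truncation.
wide = Int32(200000)   # 0x00030d40

# `%` with an integer type keeps only the low bits (wraps modulo 2^N)
# instead of throwing, which is what the conversions above appear to do.
wrapped = (
    wide % Int8,     # low 8 bits, reinterpreted as signed
    wide % UInt8,    # low 8 bits
    wide % Int16,    # low 16 bits, reinterpreted as signed
    wide % UInt16,   # low 16 bits
)

println(wrapped)
```

The low 8 bits of 0x00030d40 are 0x40 (64) and the low 16 bits are 0x0d40 (3392), matching the results reported above.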
I posted a note on llvm-dev here: |
I don't know much about Swift, but it seems that they manually check every conversion: |
Ah, but it seems that |
I had seen somewhere that different levels of -O flags make a surprisingly large difference to Swift's performance; this kind of thing is probably part of the reason. |
Since we still haven't fixed this, here is a recap: the problem is that if the input is out of range in a Float -> Integer conversion, the behaviour is undefined, and LLVM considers returning a 32-bit integer instead of a 16-bit one acceptable undefined behaviour. This breaks our assumptions in the convert logic (which converts to integer, then back to float, and compares this with the original value). Unless we can convince LLVM to change this, the options here are:
|
If we know that we are going to get a 32-bit result, can we just mask it down to 16 bits before converting back to float? That should be pretty cheap. |
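A rough simulation of why masking would help, assuming the truncation leaves the full 32-bit value 200000 behind (the variable `wide` is a stand-in for that value, not something the real intrinsic exposes):

```julia
x = 200000.0

# Assumed stand-in for the 32-bit value left behind by the out-of-range truncation.
wide = Int32(200000)

# Round-trip check on the unmasked value: 200000.0 == 200000.0,
# so the check passes and no error would be raised.
unmasked_ok = convert(Float64, wide) == x

# Round-trip check after wrapping to 16 bits: the wrapped value no longer
# converts back to x, so the mismatch is caught.
masked = wide % Int16
masked_ok = convert(Float64, masked) == x

println((unmasked_ok, masked, masked_ok))
```

This only models the failure mode in ordinary integer arithmetic; it does not make the underlying out-of-range truncation any less undefined.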
Actually, I must have been doing something wrong: now it seems to be about the same (or possibly even faster) to do the range check. So maybe we should change this. |
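For comparison, a range check done before the truncation might look like the sketch below. The name `checked_to_int16` is hypothetical and this is not the code in Base; it is written for Julia 1.x, where `InexactError` takes a function name, target type, and value, and it relies on both Int16 bounds being exactly representable as Float64.

```julia
# Hypothetical helper for illustration only.
function checked_to_int16(x::Float64)
    lo = Float64(typemin(Int16))   # -32768.0, exactly representable
    hi = Float64(typemax(Int16))   #  32767.0, exactly representable
    # NaN fails both comparisons, so it falls through to the error branch.
    if lo <= x <= hi && isinteger(x)
        # x is known to be in range here, so the truncation is well defined.
        return unsafe_trunc(Int16, x)
    end
    throw(InexactError(:checked_to_int16, Int16, x))
end
```

With this ordering the unchecked truncation is never reached for an out-of-range input, so the result no longer depends on what the hardware or LLVM does in the undefined case.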
Seems that LLVM implements more optimizations on aarch64 and now (LLVM-svn) this is a problem there too. |
Replaces checked_fptosi/checked_fptoui intrinsics with Julia implementations. Fixes #14549.
Explain logic behind float->integer conversion checking
LLVM IR (wrapped in another function)
ASM
(interestingly this doesn't happen on AArch64)