64-bit wide 32 bit floating-point packed SIMD vector arithmetic produces incorrect results on Windows GNU targets #53254
Comments
I am also seeing a similar failure on Apple Darwin: rust-lang/packed_simd#25, but I don't know if it's the same issue. |
The SIMD calculations themselves are const-evaluated down to Only taking a reference of the result and panicking with it seems to begin causing issues (I’d expect the function to still be optimised out, but it is not):
unsafe {
    let b = f32x2(0.0, 0.0);
    match (&sin_v2f32(b), &b) {
        (a, b) => {
            if a != b {
                panic!("not equal {:?} {:?}", a, b); // not optimised out unless this panic is replaced with something that does not use `a` or `b`.
            }
        }
    }
}
Because of that, the sine in the code snippet above is calculated at runtime. For the x64 Windows target the following assembly gets generated:
banana:
subq $200, %rsp
movaps %xmm8, 176(%rsp)
movaps %xmm7, 160(%rsp)
movaps %xmm6, 144(%rsp)
movq $0, 32(%rsp)
movq 32(%rsp), %xmm6 ; xmm6? Only xmm0-3 are used in x64 calling convention!
callq sinf
movdqa %xmm0, %xmm8
movdqa %xmm6, %xmm0
callq sinf
punpckldq %xmm0, %xmm8
movdqa %xmm6, %xmm0
callq sinf
movdqa %xmm0, %xmm7
pshufd $229, %xmm6, %xmm0
callq sinf
punpckldq %xmm0, %xmm7
; <snip>
The I have no other insight for now. I will file an LLVM bug for 4 |
The LLVM bug for 4 |
The fix for that LLVM bug is already available in Rust's LLVM but EDIT: The example from the first post works fine. |
Can this issue get the LLVM label? Example reduced from
#![feature(
    crate_visibility_modifier,
    link_llvm_intrinsics,
    platform_intrinsics,
    repr_simd,
    stdsimd
)]
#![allow(non_camel_case_types)]
extern crate core;
use core::arch;
use core::mem;
extern "platform-intrinsic" {
    crate fn simd_ne<d, e>(f: d, h: d) -> e;
}
extern "C" {
    #[link_name = "llvm.pow.f32"]
    fn pow_f32(f: f32, h: f32) -> f32;
}
#[repr(simd)]
#[derive(Copy, Clone)]
struct f32x2(f32, f32);
#[repr(simd)]
#[derive(Copy, Clone)]
struct i32x2(i32, i32);
impl f32x2 {
    pub fn ne(self, aq: Self) -> bool {
        let z: i32x2 = unsafe { simd_ne(self, aq) };
        use arch::x86_64::_mm_movemask_pi8;
        unsafe { _mm_movemask_pi8(mem::transmute(z)) != 0 }
    }
}
macro_rules! assert_eq_ {
    ($a:expr, $b:expr) => {
        if $a.ne($b) {
            panic!()
        }
    };
}
pub fn main() {
    let o = f32x2(1.0, 1.0);
    assert_eq_!(o, o);
    assert_eq_!(2.0, unsafe {&pow_f32(2.0, 1.0)});
    //println!("{}", unsafe {&pow_f32(2.0, 1.0)});
}
Godbolt link: |
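For readers unfamiliar with the `_mm_movemask_pi8` intrinsic used in the reduced example, here is a minimal scalar sketch of what the underlying `pmovmskb` instruction computes on a 64-bit operand: it gathers the most significant bit of each of the eight bytes into an 8-bit mask. The function name `movemask_pi8_scalar` is made up for this illustration; it is not part of `core::arch` or packed_simd, and it never touches an MMX register.

```rust
/// Scalar model of `pmovmskb` on a 64-bit operand (hypothetical helper,
/// not a real intrinsic): bit `i` of the result is the most significant
/// bit of byte `i` of the input, counting bytes from the low end.
fn movemask_pi8_scalar(x: u64) -> i32 {
    let mut mask = 0i32;
    for (i, &byte) in x.to_le_bytes().iter().enumerate() {
        // Shift the top bit of this byte down to bit 0, then place it at bit `i`.
        mask |= i32::from(byte >> 7) << i;
    }
    mask
}
```

In the example above, `ne` treats a non-zero mask as "at least one lane differs", so a comparison result of all zeros gives mask `0` and a result of all ones gives mask `0xFF`.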
Reduced as much as I could:
#![feature(core_intrinsics, link_llvm_intrinsics, repr_simd, simd_ffi, stdsimd)]
#![allow(non_camel_case_types)]
extern crate core;
use core::mem::transmute;
#[repr(simd)]
struct __m64(i64);
#[allow(improper_ctypes)]
extern "C" {
    #[link_name = "llvm.pow.f32"]
    fn pow_f32(f: f32, h: f32) -> f32;
    #[link_name = "llvm.x86.mmx.pmovmskb"]
    fn pmovmskb(a: __m64) -> i32;
}
#[repr(simd)]
#[derive(Copy, Clone)]
struct f32x2(f32, f32);
pub fn main() {
    unsafe {
        let _ = pmovmskb(transmute(0_i64));
        //println!("{}", pow_f32(2.0, 1.0));
        if 2.0.ne(&pow_f32(2.0, 1.0)) {
            core::intrinsics::abort();
        };
    }
}
Running:
Godbolt: |
My best guess is that the behavior of this last program is rooted in the MMX instruction trampling all over the x87 FPU stack, making it impossible to use x87 instructions correctly until the
If this theory is correct, this is pretty depressing. Honestly, I think the best fix would be to not use MMX intrinsics ever. Inserting I am not sure whether this is the same problem as the original issue or something separate. I guess if the packed_simd tests include some usage of MMX instructions, it might be related? |
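The theory above can be illustrated with a toy model. On x86 the MMX registers alias the x87 register stack: any MMX instruction other than `EMMS` marks all eight x87 registers as in use, and `EMMS` marks them empty again. An x87 load that pushes onto an in-use slot is a stack overflow, which with masked exceptions yields a NaN. The sketch below is only a simulation under those assumptions; `ToyFpu` and its methods are hypothetical names, and nothing here executes real MMX or x87 instructions.

```rust
/// Toy model of the x87 register stack (a simulation, not hardware access).
struct ToyFpu {
    regs: [f64; 8],
    occupied: [bool; 8], // stands in for the x87 tag word
    top: usize,
}

impl ToyFpu {
    fn new() -> Self {
        ToyFpu { regs: [0.0; 8], occupied: [false; 8], top: 0 }
    }

    /// Models any MMX instruction other than EMMS: every register is
    /// tagged as in use and the stack top is reset.
    fn mmx_instruction(&mut self) {
        self.occupied = [true; 8];
        self.top = 0;
    }

    /// Models EMMS: every register is tagged empty again.
    fn emms(&mut self) {
        self.occupied = [false; 8];
    }

    /// Models an x87 load (push). Pushing onto an in-use slot is a stack
    /// overflow; with masked exceptions the slot receives a NaN.
    fn fld(&mut self, value: f64) {
        self.top = (self.top + 7) % 8;
        self.regs[self.top] = if self.occupied[self.top] { f64::NAN } else { value };
        self.occupied[self.top] = true;
    }

    /// Models an x87 store-and-pop.
    fn fstp(&mut self) -> f64 {
        let value = self.regs[self.top];
        self.occupied[self.top] = false;
        self.top = (self.top + 1) % 8;
        value
    }
}
```

In this model, `fld(2.0)` followed by `fstp()` returns `2.0` on a fresh `ToyFpu`, returns NaN if `mmx_instruction()` ran first, and returns `2.0` again once `emms()` has been called. That matches the shape of the symptom in the reduced program, where an x87-based `pow` call after `pmovmskb` goes wrong.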
It does, but also replacing There is an open thread on the wg-llvm Zulip if you want to talk about it. EDIT:
Mingw-w64 |
Has there been any update to this? It's been years since I've been able to use the Windows GNU target. |
Are you hitting this specific issue? That's surprising, as far as we know this bug only occurs in programs that deliberately use MMX instructions (which have been obsolete for many years, and not accessible at all from stable Rust). If you observe this bug (or another bug that looks like it) in other contexts, please tell us more. |
@hanna-kruppe |
Are you sure? In current packed_simd tree, I can only find one mention of |
After some spelunking, I see that there used to be some MMX code in |
Can confirm this is still a crippling issue as of NaNs everywhere. |
Please be more clear: which of the many programs posted in this and related threads are you referring to? |
My apologies. I've never been able to reduce this to a minimal reproduction; it only occurs in my 100k LoC work project near trig operations. I believe your theory at #53254 (comment) is on the right track, because in addition to never being able to minimally reproduce it, I've never been able to find the source of the issues by printing input variables. It appears out of nowhere, randomly, sometimes in different places between runs. |
One relatively simple way to validate my theory would be to disassemble your application and check if there's any mention of |
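As a rough sketch of the disassembly check suggested above, the following helper scans assembly text for MMX register operands. It assumes AT&T syntax, where MMX registers appear as `%mm0` through `%mm7` (Intel syntax has no `%` prefix and would need a different match). The function names are made up for this illustration, and the input is assumed to come from whatever disassembler you already use.

```rust
/// Returns the lines of AT&T-syntax assembly text that mention an MMX
/// register (`%mm0` through `%mm7`). Hypothetical helper for illustration.
fn lines_using_mmx(asm: &str) -> Vec<&str> {
    asm.lines()
        .filter(|line| mentions_mmx_register(line))
        .collect()
}

/// True if the line contains `%mm` immediately followed by a digit 0-7.
/// `%xmm0` and `%ymm0` do not match, because there `%` is followed by
/// `x` or `y` rather than `m`.
fn mentions_mmx_register(line: &str) -> bool {
    let bytes = line.as_bytes();
    line.match_indices("%mm")
        .any(|(pos, _)| matches!(bytes.get(pos + 3).copied(), Some(b'0'..=b'7')))
}
```

Feeding it the full disassembly of a binary and checking whether the result is empty gives a quick yes/no answer; the returned lines show where to look.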
Indeed, there are 4 MMX register uses in
For debug:
vcmpunordps .LCPI187_26(%rip), %xmm3, %xmm4
.Ltmp9279:
.loc 70 14 69
movdq2q %xmm4, %mm0
.Ltmp9280:
.loc 52 2377 5
pmovmskb %mm0, %eax
All occurrences follow No idea how to relate those
For release:
vcmpeqps %xmm0, %xmm7, %xmm0
movdq2q %xmm0, %mm0
vxorps %xmm0, %xmm0, %xmm0
pmovmskb %mm0, %eax
All occurrences follow |
It looks like packed_simd is still using While it is now under an SSE guard, this is still an MMX intrinsic. |
I do use |
@hanna-kruppe example in this thread has been minimized from |
@novacrazy Maybe just try removing this line? https://github.com/rust-lang/packed_simd/blob/28b5e161ceaf0096a5e9f387f46441ed6494290e/src/codegen/reductions/mask/x86.rs#L143 It looks like the fallback implementation is to do a bitcast and compare, which I would expect LLVM to already optimize well. |
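To make "bitcast and compare" concrete, here is a minimal sketch of such a reduction for a 64-bit mask made of two 32-bit lanes. It is not the packed_simd implementation: `Mask32x2` is a hypothetical stand-in type, and the sketch assumes each lane is either all ones (true) or all zeros (false). The point is that `any` and `all` can be answered with a single 64-bit integer comparison, with no MMX instruction involved.

```rust
/// Hypothetical two-lane mask; each lane is assumed to be all ones
/// (-1, true) or all zeros (0, false).
#[derive(Copy, Clone)]
struct Mask32x2([i32; 2]);

impl Mask32x2 {
    /// Reinterpret the two 32-bit lanes as one 64-bit integer.
    fn to_bits(self) -> u64 {
        let [lo, hi] = self.0;
        u64::from(lo as u32) | (u64::from(hi as u32) << 32)
    }

    /// True if at least one lane is set.
    fn any(self) -> bool {
        self.to_bits() != 0
    }

    /// True if every lane is set.
    fn all(self) -> bool {
        self.to_bits() == u64::MAX
    }
}
```

For example, `Mask32x2([-1, 0])` reports `any() == true` and `all() == false`.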
Aha! @nikic, commenting out all of the 64-bit masks (not just No more MMX register uses in the assembly and no more NaNs! |
I think we can close this issue; there is nothing to fix on the Rust side. Also, MMX may get removed from LLVM: https://www.phoronix.com/scan.php?page=news_item&px=LLVM-Goodbye-MMX-Maybe |
So, is this fixed by rust-lang/stdarch#890? I can see neither EDIT: Ah, I had to unfold some diffs. I guess there's no way for |
@RalfJung that's true but those are unstable features and the fix for |
Yeah, this is definitely an okay breaking change, I was just trying to understand the situation. Maybe an issue should be opened in Also I'll close this issue based on @mati865's comment above. |
Opened rust-lang/packed_simd#287 |
crossref: rust-lang/packed_simd#72
The following operations of the f32x2 vector type produce incorrect results on GNU Windows targets ({i686, x86_64}-pc-windows-gnu) in debug builds: sin, cos, fma, scalar (+, -, *, /, unary -):
I can't reproduce locally, but the following could help somebody to try to reproduce this on Windows, and maybe figure out what's going on.
This playground example might be used to reproduce this though (I can't know since I don't have access to the platform - it would be good to know if that reproduces the issue or not to narrow down the root cause):