Compiled executable fails to launch when built with AVX and LTO enabled #44056
Related: rust-embedded/cortex-m#44 |
Good analysis @yvt. I guess that bitcast is dodgy, creating an unaligned pointer. I wonder if that comes from an LLVM pass or rust codegen. @whitequark I'm not sure that is related, is it? |
The bitcast is fine, at least if LTO is not applied. As shown in #40454, those |
I guess because it expects a |
Ah yes, the bitcast is fine, it's the |
Narrowed down the code to reproduce the issue. The issue can be reproduced with

    #![feature(lang_items)]
    #![feature(start)]
    #![feature(libc)]
    #![feature(repr_simd)]
    #![feature(const_fn)]
    #![feature(thread_local)]
    #![no_std]
    #![no_main]
    use core::mem;
    extern crate libc;
    struct Hoge(u64, u64, u64, u64);
    #[thread_local]
    static mut STATIC_VAR: Hoge = Hoge(0, 0, 0, 0);
    #[no_mangle]
    pub extern fn main(_argc: i32, _argv: *const *const u8) -> i32 {
        let mut local_var = Hoge(0, 0, 0, 0);
        unsafe {
            mem::swap(&mut local_var, &mut STATIC_VAR); // CRASH! (sometimes)
            local_var.0 as i32
        }
    }
    #[lang = "eh_personality"] extern fn eh_personality() {}
    #[lang = "panic_fmt"] fn panic_fmt() -> ! { loop {} }

It wasn't
LLVM expects global variables (including thread-local ones) to be aligned as specified:
It does not have to be
And it outputs a Mach-O section for TLVs with a proper alignment requirement:
The problem is that,

    // allocate buffer and fill with template
    void* buffer = malloc(size);

The aforementioned program fails if the returned
By the way, LDC devs seem to have experienced a similar issue. |
@yvt According to the LDC report, it has already been fixed on Xcode 8, but we are still seeing this bug? 😕 Would it be due to the distributed
BTW none of the following will fix the issue: putting |
Is this something we could perhaps work around by allocating larger thread locals on our end and then doing the alignment ourselves? |
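To make the idea concrete, here is a minimal sketch of "allocate a larger thread local and align it ourselves". It is not what rustc or std does; `Padded`, `align_up` and `with_aligned` are hypothetical names made up for illustration, and the 32-byte size and alignment are assumptions chosen to match an AVX register. The buffer carries `ALIGN - 1` bytes of slack, so an aligned 32-byte window always fits no matter where the loader places the thread-local block.

```rust
use std::cell::UnsafeCell;

const ALIGN: usize = 32; // alignment we want (assumed: one AVX register)
const SIZE: usize = 32; // payload size in bytes (assumed)

// Hypothetical helper type: a byte buffer with ALIGN - 1 bytes of slack.
// Its own alignment requirement is 1, so nothing depends on the loader.
struct Padded(UnsafeCell<[u8; SIZE + ALIGN - 1]>);

thread_local! {
    static STORAGE: Padded = Padded(UnsafeCell::new([0; SIZE + ALIGN - 1]));
}

/// Round `addr` up to the next multiple of `align` (must be a power of two).
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// Call `f` with a pointer to an ALIGN-aligned window of SIZE bytes inside
/// the current thread's buffer. The pointer is only valid during the call.
fn with_aligned<R>(f: impl FnOnce(*mut u8) -> R) -> R {
    STORAGE.with(|s| {
        let base = s.0.get() as *mut u8;
        let offset = align_up(base as usize, ALIGN) - base as usize;
        // offset is at most ALIGN - 1, so offset + SIZE fits in the buffer.
        f(unsafe { base.add(offset) })
    })
}

fn main() {
    let addr = with_aligned(|p| {
        // Fill the aligned window; an aligned 32-byte store would be safe here.
        unsafe { p.write_bytes(0xAB, SIZE) };
        p as usize
    });
    assert_eq!(addr % ALIGN, 0);
    println!("aligned thread-local window at {:#x}", addr);
}
```

The cost is the extra `ALIGN - 1` bytes per thread local plus the address computation on each access.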
@alexcrichton Yes, methods like this would work. The overhead of
Compiler support would be required for types with even larger alignment requirements. (related to #33626)
@kennytm This can be verified by running the following program, which sporadically crashes even if compiled with Xcode 8:

    #include <immintrin.h> // __m256, _mm256_set_ps
    #include <stdio.h>
    #include <string.h>

    __thread char tb = 42;
    __thread char zb32[32]; // LLVM opt adds `align 32`

    int main()
    {
        printf("%p %p\n", &tb, zb32);
        *(__m256 *)zb32 = _mm256_set_ps(0, 0, 0, 0, 0, 0, 0, 0);
    }

    $ gcc prog3.c -march=native -O3 -o poisson-rng
    $ while ./poisson-rng; do :; done
    0x7fcb22c02760 0x7fcb22c02780
    0x7f93d7402760 0x7f93d7402780
    0x7ffefc5026c0 0x7ffefc5026e0
    0x7fde87c02760 0x7fde87c02780
    0x7fd1af402760 0x7fd1af402780
    0x7fcc506006b0 0x7fcc506006d0
    Segmentation fault: 11
|
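Note that in the last run before the segfault the second address ends in `6d0`, which is 16 bytes off a 32-byte boundary, while every earlier run has it on a multiple of 32. A rough Rust analogue of the same check is sketched below. It assumes a 64-bit target and that `thread_local!` is backed by native TLS there; `Zb32` and `misalignment` are names invented for this sketch, and the alignment is requested explicitly with `#[repr(align(32))]` rather than raised by an LLVM pass as in the C program.

```rust
use std::cell::Cell;
use std::mem::align_of;

// A 32-byte buffer whose type asks for 32-byte alignment (hypothetical
// stand-in for `zb32` in the C program).
#[allow(dead_code)]
#[repr(align(32))]
struct Zb32([u8; 32]);

thread_local! {
    static TB: Cell<u8> = Cell::new(42);
    static ZB32: Zb32 = Zb32([0; 32]);
}

/// Number of bytes by which `addr` misses an `align`-byte boundary.
fn misalignment(addr: usize, align: usize) -> usize {
    addr % align
}

fn main() {
    // Take the addresses of both thread locals, as the C program does.
    let tb = TB.with(|t| t as *const Cell<u8> as usize);
    let zb = ZB32.with(|z| z as *const Zb32 as usize);
    println!("{:#x} {:#x}", tb, zb);

    // Report instead of crashing if the thread-local storage did not
    // honour the alignment the type asked for.
    let off = misalignment(zb, align_of::<Zb32>());
    if off != 0 {
        eprintln!("ZB32 is {} bytes past a 32-byte boundary", off);
        std::process::exit(1);
    }
}
```

Running this in a loop, the same way as the C program, would show whether a given toolchain and OS combination ever hands out a misaligned block; a non-zero exit status marks the bad run without relying on an AVX store to fault.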
…xcrichton Do not allow LLVM to increase a TLS's alignment on macOS. This addresses the various TLS segfault on macOS 10.10. Fix rust-lang#51794. Fix rust-lang#51758. Fix rust-lang#50867. Fix rust-lang#48866. Fix rust-lang#46355. Fix rust-lang#44056.
A generated executable occasionally fails to launch when built with the rustc options `-Ctarget-feature=+avx -Copt-level=2 -Clto`.
I tried this code:
Compiled with the following shell script:

    #!/bin/sh
    rustc main.rs -Ctarget-feature=+avx -C opt-level=3 -Clto -g
When I ran the generated executable `main` repeatedly, the execution of the program stalled (did not terminate nor output anything; did not even enter the `main` function) 5 out of 100 times.
When I ran the executable from `lldb`, I could see that `EXC_BAD_ACCESS` had occurred because it attempted to load a 32-byte block from unaligned memory using `vmovdqa` (which requires the operand address to be 32-byte aligned).
Meta
`rustc --version --verbose`:
The output of `sample` (a tool that comes with macOS) when the program is stalled:
Analysis
The offending instruction is supposedly a part of `libcore::ptr::swap_nonoverlapping_bytes`, which is called during the execution of `libstd::thread::local::LocalKey::init`, which is called when the runtime is being initialized.
After the optimization, this call to the intrinsic function `copy_nonoverlapping` is translated into the following LLVM instruction:
This is translated into the following x86_64 instruction: