Segmentation fault in LLVM frames on Julia 1.9.3 when precompiling CSV.jl on Intel Sapphire Rapids CPU #51482
Comments
I suppose this is the line being compiled? |
Works: 1.8.0, 1.8.5, 1.10-beta2 Crash: 1.9.0, 1.9.3 |
Could you try an asserts-enabled build with |
With the assert build you linked to (thanks for that), with
For some reason the number of |
Tried
Looks like it's been added to rr master, but no released version recognizes it yet. |
RR 5.7 just released, guess the Julia copy needs to be updated? JuliaLang/BugReporting.jl#140 |
Could you also try building julia 1.9 with assertions? |
I used an assert build here: #51482 (comment) |
I can confirm it crashes for me on 1.9.3, but it works on
This sounds like a bug in LLVM 14. This blog post suggests that LLVM 15 received a lot of work from Intel engineers for this family of CPUs, which may have fixed the issue. |
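For anyone comparing setups, here is a small sketch for checking which LLVM a given Julia build ships with and which CPU it detects. `toolchain_summary` is a hypothetical helper name, not something from this thread; it only collects values that Julia itself reports.

```julia
# Hypothetical helper (name is an assumption): gather the version details
# that matter when comparing a crashing and a working setup.
function toolchain_summary()
    return (
        julia = VERSION,               # Julia version of the running process
        llvm  = Base.libllvm_version,  # LLVM version this Julia build links against
        cpu   = Sys.CPU_NAME,          # CPU name as detected at startup
    )
end

info = toolchain_summary()
println("Julia ", info.julia, ", LLVM ", info.llvm, ", CPU ", info.cpu)
```

`versioninfo()` prints similar information in a form that is convenient to paste into a report.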
Is it very hard to build LLVM for Julia with a patch and try it out? Or is there a way to disable certain LLVM features as a workaround? I tried using JULIA_CPU_TARGET but was getting errors from Julia about incompatible targets. Good chance I'm not using it right though.

Based on the assertion build stack trace, a switch statement in

They were added in this commit: llvm/llvm-project@655ba9c#diff-eb2f176d67cdf1955a90e71e25d6d39910d723d4e0b8a9bf8dfa229d3a6b2c1e

The other place where these appear is
|
I got LLVM to dump debug info, and immediately before the error is this function output, which happens to contain right at the end a CMOV_FR16X instruction. Based on the line numbers, I think it might be
|
Ah yeah, there's a problem with 16-bit floats on Sapphire Rapids and Julia 1.9. No crash in this case, but the result is wrong.
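A rough sketch of how such a wrong result could be looked for is below. It is not the snippet from this comment: the choice of division, the sample values, and the name `float16_division_mismatches` are all assumptions made for illustration. It compares each Float16 division against the same division carried out in Float32 and rounded back, which should agree on a correctly working system.

```julia
# Illustrative check (not the original reproducer): list the input pairs whose
# Float16 division differs from the Float32-computed, Float16-rounded reference.
function float16_division_mismatches(values::AbstractVector{Float16})
    mismatches = Tuple{Float16,Float16}[]
    for a in values, b in values
        native = a / b                                # whatever the compiled code produces
        reference = Float16(Float32(a) / Float32(b))  # widen, divide, round back
        isequal(native, reference) || push!(mismatches, (a, b))
    end
    return mismatches
end

samples = Float16[0.5, 1, 3, 7, 100]  # arbitrary sample values (assumption)
mismatches = float16_division_mismatches(samples)
println(isempty(mismatches) ? "no mismatches" : "mismatches: $mismatches")
```

An empty result does not rule out the problem, since the function is only exercised on a handful of values.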
|
I've narrowed it down to a MWE:
If this package is precompiled, the segfault occurs. But if the function is used on the REPL, everything is fine. |
Parsers.jl does not fail to precompile here, but it looks like it doesn't bother precompiling |
Alright, I'm about as minimal as I can get now:
This reliably triggers the crash. If I remove the branch, or the division by 100, or change the ifelse to a conditional assignment, or remove the recursive call, it precompiles successfully. On the REPL, this function works fine, and outputs

EDIT: removed the PrecompileTools bit; it was dispensable. |
I managed to capture an

Also during

And finally, a trace using my minimal example and |
Oh, but precompilation happens in a separate process, right? So these may not be useful. I'm not seeing relevant stack frames, and it skips over the error. EDIT: I tried hacking the precompilation CLI command to include rr, but I don't think it worked right - I didn't get the segfault in the replay. |
This is the last LLVM pass output on the MWE prior to the segfault:
|
Is there anything more I can do here? |
Thank you for all the detective work you have done here. The patch you identified is a bit too large for comfort. So the question is: can we disable a CPU feature to get LLVM to stop handling FP16 as native on Sapphire Rapids? Maybe try: |
Actually yeah, that does fix it. |
And to confirm 1.10 works fine? |
Can you give #52349 a try? |
Confirmed that 1.10 works fine. That PR doesn't fix it though. FWIW the patch I identified above didn't fix this crash, it only fixed display of Float16 values, in case that helps. |
Is 1.9.x done? Should this issue be closed as not planned? |
I don't even know where to begin here. This is some deep magic.
All I did before this happened was migrate from Amazon Linux 2 to Amazon Linux 2023 and install Julia 1.9.3 with Juliaup. The machine also has a new Intel processor generation, if that matters (Intel(R) Xeon(R) Platinum 8488C).
The only packages in the environment are Pluto v0.19.28 and CSV v0.10.11. Triggered when precompiling CSV.jl.