-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Building master branch crashes on Apple M1 #44107
Comments
Facing the same issue on an M1 MacBook Air, macOS 12.2. XCode 13.2 as well. |
Got the same issue and am bisecting, there were also a bunch of warnings added that weren't present before so let's see. |
That's not a segfault though, is it? |
Indeed it isn't |
@fingolfin nice to see you here too :) I am wondering if this is by design or a rule of thumb about building julia on new hardware (e.g. M1 Macs) or in general --- 1.7.2 works fine (a lot of tests still fail if you attempt run "all"... ) but master branch bombs (I think the error I get is identical to the above) |
Oh noooo, @Keno 😢 I actually tried to build the master branch just to test this PR 😆 |
This platforms is currently in Tier 3 of support, this means there is currently no CI for aarch64-apple-darwin, so the failure couldn't be caught during the review of the pull request. |
No, no I understand that --- to clarify, I meant: usually the "tagged" items are like guaranteed, but master isn't it exactly like that since it is not a release. Or do we consider master stable enough as almost like a pseudo-release? |
Then again M1 shouldn't be considered stable at all, I guess. But in general for other hardware, master vis-a-vis tags? |
Who said otherwise?
Not sure what that means. |
Don't worry about it 😄 I am glad it wasn't just me, hopefully we will be able to test that PR soon since it seemed pretty impressive! |
I see many more warnings with the latest commit on master (6c16f71).
|
Looking into this a bit, I used
I have no idea if this is a long-term solution though, and I have no idea why this would be triggered now. |
I saw this bit about musttail in the Clang 13.0.0 release note, is it relevant for this bug? |
That note is relevant, I'm just not sure why no other archs get this error, specially aarch64 linux. That ccall specifically was changed in the purity modeling PR. I wonder if it has to do with the differences in ABI that might spook LLVM out. |
@staticfloat @gbaraldi @truedichotomy would you expect this change above to matter for the tests? Running Looks pretty significant (in a good way; previously I had quite a few more tests failing) to me... (edit: sorry I missed the two errors in cmdlineargs and Pkg; some bits below)
|
I wouldn't be surprised if that broke ccall in some places. |
I spun up an ampere instance to try it out and it built fine, and it gave none of the anonymous type warnings. I wonder if that is a difference of GCC vs clang. One thing to try out is using a different another clang instead of apple clang to build to see if it makes a difference. |
I think the clang thing is a red herring; LLVM is choking on Julia-generated code, not clang-generated code. I think there's something about the LLVM IR that we're generating that is causing the issue. It could be a real problem with our IR, or it could be LLVM being overly-picky, it's hard to say, as I don't know LLVM well enough to track that down. My guess as to why it doesn't trigger on aarch64 linux is because there are a number of significant differences between the two ever since 955d427 was merged. |
What I meant about that is that i don't get those warnings anywhere else except apple. However I saw that someone got them on an x86 build so I have no idea. |
Has anyone tried to build Julia using LLVM 14 to see if it makes any difference? |
The JIT code generation for Apple M1 in Julia currently uses a different code path than all other supported architectures (see PR #43664 by @dnadlinger). Most likely this contributes to the issue and is why things work fine everywhere else. The PR by @Keno which broke things |
Hmm, at first glance, I'm not sure how this would relate to JITLink; the code generation itself by and large still uses the same code path, only the linker stage is different. One difference would be the code model setting, but surely that can't be related to this error? Unfortunately, I don't have the bandwidth to look into this right now. |
@dnadlinger sorry, what I wrote might be total nonsense: my main goal here was to point out that some things are very different on Apple M1, so it is not that surprising that an issue only occurs there and not elsewhere. |
Oh, no, you certainly have a point, it is a significant difference and possibly related. What I meant is that it isn't exactly obvious why JITLink should cause any issue either, as it should only come in after the codegen stage. Depending on how easy the issue is to trigger, it might be interesting to disable JITLink again and see if it still breaks (don't set |
After grepping llvm for the error I found https://github.com/llvm/llvm-project/blob/d1cd64ffdd832220dbe1829c2f09b880be67be31/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-musttail.ll which is suspicious. I don't know llvm enough to understand if our function would fall into that case. define nonnull {} addrspace(10)* @jlplt_ijl_new_codeinst_1127({} addrspace(10)* %0, {} addrspace(10)* %1, {} addrspace(10)* %2, {} addrspace(10)* %3, i32 %4, i64 %5, i64 %6, i8 zeroext %7, i8 zeroext %8, {} addrspace(10)* %9, i8 zeroext %10) {
top:
%11 = load atomic void ()*, void ()** @ccall_ijl_new_codeinst_1126 unordered, align 8
%12 = icmp ne void ()* %11, null
br i1 %12, label %ccall, label %dlsym
dlsym: ; preds = %top
%13 = call void ()* @ijl_load_and_lookup(i8* null, i8* getelementptr inbounds ([17 x i8], [17 x i8]* inttoptr (i64 105553146517520 to [17 x i8]*), i32 0, i32 0), i8** @jl_RTLD_DEFAULT_handle)
store atomic void ()* %13, void ()** @ccall_ijl_new_codeinst_1126 release, align 8
br label %ccall
ccall: ; preds = %dlsym, %top
%14 = phi void ()* [ %11, %top ], [ %13, %dlsym ]
%15 = bitcast void ()* %14 to {} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, i32, i64, i64, i8, i8, {} addrspace(10)*, i8)*
%16 = bitcast {} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, i32, i64, i64, i8, i8, {} addrspace(10)*, i8)* %15 to void ()*
store atomic void ()* %16, void ()** @jlplt_ijl_new_codeinst_1127_got release, align 8
%17 = musttail call nonnull {} addrspace(10)* %15({} addrspace(10)* %0, {} addrspace(10)* %1, {} addrspace(10)* %2, {} addrspace(10)* %3, i32 %4, i64 %5, i64 %6, i8 zeroext %7, i8 zeroext %8, {} addrspace(10)* %9, i8 zeroext %10)
ret {} addrspace(10)* %17
} and in the pipeline it gets here https://github.com/llvm/llvm-project/blob/c9b36807beaf120f4e06d9da3b7df7625e440825/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L6082 which is what throws the error. |
The error message has evolved somewhat from the original, here is the latest build error:
|
@staticfloat your solution by removing |
I did the same as @truedichotomy: made the change in versioninfo
Test results
|
Just curious as a non cpp person, what's the benefit of cpp tail call optimization for Julia? Is this a critical optimization to have for performance reasons? |
Wondered about that, read this which led to this. Shooting in the dark, but I surmise that it becomes an issue for recursive functions and optimization of stack memory allocation. Seems like a must for Lisp. Not sure if this is simply an optimization for Julia or if it holds potential for out-of-memory errors. |
So after a quite long session of debugging I found what causes musttail to fail, and that is https://github.com/gbaraldi/llvm-project/blob/5be8ea4bd7b0d66214c22033a7fd298544e65027/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L5702-L5707, the ccall fails that check and bugs out. Why does it only fail that in apple I have no idea. |
is building current Julia master crashing for anyone else? (i.e. commit aae68a5, on : macOS 12.2 (21D49) on Apple M1 Max, Xcode 13.2.1).
It builds fine for me on Ubuntu x86.
The text was updated successfully, but these errors were encountered: