-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM segfault in Julia-1.6.x #41916
Comments
I have reduced the crash to the following:
It happens only in Julia 1.6.2; not in 1.6.1, nor in 1.7.0-beta3. But since 1.6 is likely going to be the LTS, getting this fixed would be kinda important to us. I'll see if I can bisect the crash. |
You can also check with #41554
Should be fairly quick since it is just the commits between 1.6.1 and 1.6.2. |
I'll do both. Indeed, it's just 32 commits. Also, @wbhart is looking into turning my reduced example into one that is even simpler, possible pure Julia. We'll see. |
You also might want to try with LLVM_ASSERTIONS=1 FORCE_ASSERTIONS=1 build, since it looks like we might be generating invalid IR, and that will enable many checks for it |
I tried
Is that the right way to do it, @vtjnash ? Next I did
Here's the output of Apple's crash reporter, where the C++ symbols are demangled:
|
Hmm, it seems setting |
I am not sure I can craft a test for this, but does this patch fix it? From just looking at the code, it seems like it is needed (@yuyichao): diff --git a/src/llvm-alloc-opt.cpp b/src/llvm-alloc-opt.cpp
index ec7060bd10..65f36a0737 100644
--- a/src/llvm-alloc-opt.cpp
+++ b/src/llvm-alloc-opt.cpp
@@ -318,10 +318,15 @@ void Optimizer::initialize()
void Optimizer::optimizeAll()
{
+ size_t last_deleted = 0;
while (!worklist.empty()) {
+ while (last_deleted < removed.size())
+ removed[last_deleted++]->removeFromParent();
auto item = worklist.pop_back_val();
auto orig = item.first;
size_t sz = item.second;
+ if (orig->getParent() == nullptr)
+ continue; // already processed (and deleted) this
checkInst(orig);
if (use_info.escaped) {
if (use_info.hastypeof)
@@ -363,6 +368,8 @@ void Optimizer::optimizeAll()
// The object has no fields with mix reference access
moveToStack(orig, sz, has_ref);
}
+ while (last_deleted < removed.size())
+ removed[last_deleted++]->removeFromParent();
}
bool Optimizer::finalize()
@@ -370,7 +377,7 @@ bool Optimizer::finalize()
if (removed.empty())
return false;
for (auto inst: removed)
- inst->eraseFromParent();
+ delete inst;
return true;
}
|
@vtjnash still getting a crash with that patch, albeit a slightly different one:
|
I don't know whether this segfault is related to our original one, but one reduction of our code that does segfault is given at: https://github.com/wbhart/AbstractAlgebra.jl/tree/rat_crash3 If I pull all of those files into a single file, the segfault goes away, so unfortunately it's not as minimal an example as one would like. Of course the code is totally nonsense and invalid, but I guess that it may still expose a real bug. To get it to trigger, one does the following at the REPL (with -O2):
There is a partway minimised example that exhibits our original bug at: https://github.com/wbhart/AbstractAlgebra.jl/tree/rat_crash3 I will try to reduce this further. |
I have reduced the example as far as I can. You can find it at: https://github.com/wbhart/AbstractAlgebra.jl/tree/rat_crash4 Unfortunately it goes away if all the files are merged into a single file. There is one more or less nonsense piece of code which I haven't been able to remove successfully. But this now exhibits our original issue, I think. To trigger it, type the following at the REPL with the branch above dev'd:
|
Sorry, I had the wrong details in the previous post. It's corrected now. |
I reduced @wbhart's code further to this single file which exhibits the crash (perhaps one of you would like to run creduce on that?) struct A end
struct B end
struct D{T}
b
D{T}() where T = new()
end
mutable struct C
d
p::A
C(f) = new(f)
end
h(::T) where T = i::B
(::A)(b::D{T}) where T = j && C(b)
(a::A)(b) = h(a)(b)
(::B)(::T) where T = D{T}()
Base.:*(x, y::C) = y.d::D
Base.:*(x, y) = *(x, x.p(y))
C(1)*0 |
Streamlined it some more |
Interesting: the original crash only occurred in 1.6.2. But this reduced file now also triggers a crash for me in 1.6.1 and 1.6.0; but not in 1.5.3, where I get this sensible error running the code:
In 1.6.0:
|
The output of creduce_julia on @fingolfin 's version of the minimum example is the same, just with some spaces removed and symbols renamed. So that should be considered the minimum example for this bug. I would suggest we still need to do a git bisect to determine the bad commit as per @KristofferC 's request, but as this bug appears in earlier versions of Julia (e.g. 1.6.0), this is no longer going to produce meaningful results I think. If there is something else you would like us to do, please let us know. Otherwise I think we have done what we can here in terms of reporting the issue. |
I am working on bisecting this |
okay, ran this and see the issue now
|
Bisect points at 27fe288 |
We might previously accidentally visit this use after deletion, if the orig_inst ended up back in the workqueue. Fixes #41916
We might previously accidentally visit this use after deletion, if the orig_inst ended up back in the workqueue. Fixes #41916
We might previously accidentally visit this use after deletion, if the orig_inst ended up back in the workqueue. Fixes #41916
We might previously accidentally visit this use after deletion, if the orig_inst ended up back in the workqueue. Fixes #41916
We might previously accidentally visit this use after deletion, if the orig_inst ended up back in the workqueue. Fixes JuliaLang#41916
We might previously accidentally visit this use after deletion, if the orig_inst ended up back in the workqueue. Fixes JuliaLang#41916
We are seeing a crash when running the test suite of our pure Julia package AbstractAlgebra.jl at current master (c00a5bde2f1e972aa36a3eefc7c5a250155f0ce4).
The issue there only occurs on Julia-1.6.2, not 1.6.1, but see further down in the thread, where we reduce it to an example which occurs in 1.6.x.
We are seeing it across a wide variety of platforms, e.g. Ubuntu 20.04 on WSL with a Ryzen 2 laptop, for a specific example. But it occurs on our Gentoo servers, MacOS and so on. Further information about those machines can be found in our issue tracker at Nemocas/AbstractAlgebra.jl#1002
We do CI of our package against Julia nightlies and I don't recall seeing this issue there.
The line of test code in question is simply a testset line, rather than an actual test within the testset, so doesn't yield helpful information at this point.
I have tried looking for any type piracy that might be an issue on our side that could contribute to killing the compiler, but haven't been able to find any.
I am working to reduce this to a minimum example, however, AbstractAlgebra is a huge package, so that may take quite some days.
I have not yet built a debug version of Julia to see if that sheds any more light, but I can do so if that is likely to be more helpful.
The text was updated successfully, but these errors were encountered: