Hang when loading GAP.jl with Julia master #901
That is probably the same issue that we are seeing with Oscar: oscar-system/Oscar.jl#2476
@zickgraf just to clarify: are you saying that that Julia commit started it (i.e. the commit before works)? Or are you just giving it as a bound ("since at least that commit things are not working")?
I bisected the problem to this commit, i.e. the commit before works (with a small amount of uncertainty due to the randomness involved when trying to reproduce the hang). It's also the same commit which @benlorenz suspected in oscar-system/Oscar.jl#2476 (comment).
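Because the hang is random, a commit can only be judged "good" with some probability, which is where the uncertainty in such a bisection comes from. A minimal sketch of a `git bisect run` helper, assuming a Julia source checkout where `make` produces `./julia`, GAP.jl already installed in the active depot, and GNU coreutils `timeout` (exit status 124 when it kills the command). The script name `bisect-hang.sh` and the `TRIES`/`SECS` settings are hypothetical. It follows the `git bisect run` exit-code convention: 0 means good, 125 means skip, and other codes from 1 to 127 mean bad.

```shell
#!/usr/bin/env bash
# bisect-hang.sh (hypothetical name): helper for `git bisect run` when the
# failure is an intermittent hang rather than a deterministic error.
# Exit codes follow git bisect run: 0 = good, 1 = bad, 125 = skip.

TRIES=${TRIES:-100}   # attempts per commit; more attempts = fewer false "good"s
SECS=${SECS:-120}     # per-attempt limit, chosen well above a normal load time

# classify CMD [ARGS...]: return 1 (bad) as soon as one attempt times out,
# otherwise return 0 (good) after TRIES attempts.
classify() {
  local i status
  for ((i = 1; i <= TRIES; i++)); do
    status=0
    timeout "$SECS" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      echo "attempt $i timed out: commit is bad" >&2
      return 1
    fi
  done
  return 0
}

main() {
  # A commit that does not build cannot be judged, so ask git to skip it.
  make -j"$(nproc)" >/dev/null 2>&1 || exit 125
  classify ./julia -e 'println("start"); using GAP; println("end")'
  exit $?
}

# Only build and test when invoked as `bisect-hang.sh run`, so that
# classify can also be sourced and used on its own.
if [ "${1:-}" = "run" ]; then
  main
fi
```

Usage: `git bisect start <bad> <good>` followed by `git bisect run ./bisect-hang.sh run`. Since the hang sometimes needs around 50 attempts to appear, a "good" verdict on a commit next to the reported first bad commit is worth re-checking with a larger `TRIES`.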
@benlorenz I am sceptical about this. Which […] That said, of course I can easily have missed something. It would still be good to know what. That said, the commit @zickgraf pointed out makes some major changes to how some things in the GC work, including how […]
Ah, now I better understand the question above. So just to reaffirm: yes, I have bisected the problem and I am 99.9% sure that JuliaLang/julia@03c4bc1 is the first bad commit :-)
Thanks for clarifying, @zickgraf. In that case I am afraid a mere recompile won't solve the problem, though it certainly won't hurt to try it.
I should also point out that there have been lingering GC issues related to GAP.jl for some time, as analyzed by @benlorenz; see oscar-system/Oscar.jl#2336.
I was referring to the same commit that zickgraf mentioned; it changed the members of […]. In valgrind I see these before the hang: […]
Unfortunately without any backtraces. There are also a few […]
@benlorenz OK, let's give it a try then :-)
Unfortunately the new binaries did not seem to help; I am still seeing hangs during the GAP.jl tests (it takes a few retries until it happens). I will have another look at the debugger and logs.
With the help of […]
We are in a GC event, which triggered the GapRootScanner, which wants to mark GAP objects somewhere on the stack? (I am not really sure about the details.)
I don't really understand where this metadata comes from; it looks a bit weird. What code is supposed to generate it? But I think it didn't cause problems earlier, for the reason described below. The code was changed only very slightly here: […] I would guess that previously the control flow skipped the […]
The handling of this exception then triggers another signal:
And the handler for this SEGV wants to wait for a safepoint:
which seems to trigger the hang (since we are already inside some GC?). To confirm my suspicion, I made a small change:
This adds another check whether the metadata table seems valid, since I think […]. I am not really sure whether this would be a reasonable change for Julia; I don't understand enough about the details to make a PR for this. I think it would make sense to dig into these metadata tables a bit. PS: I can provide the rr trace if needed.
@benlorenz fantastic work, thank you!
The code in question tries to determine if […]
This should be fixed now, thanks Max!
I can confirm that everything is working as expected again. Thanks @benlorenz and @fingolfin for the investigation and the fix!
All credit really should go to @benlorenz! Glad to hear it works now.
Since JuliaLang/julia@03c4bc1, loading GAP.jl sometimes hangs.
Steps to reproduce: run

```shell
while true; do ./julia -e "println(\"start\"); using GAP; println(\"end\");"; done
```

in a shell. Observe that after a while this hangs after printing "start". Sometimes it hangs on the first try, sometimes only after 50 tries or so. When reproducing in a REPL, the hang occurs after "Loading the library" is printed but before "and packages ..." is printed.
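The reproduction loop above runs forever and has to be watched by hand. A minimal sketch that instead stops at the first attempt exceeding a time limit, assuming GNU coreutils `timeout` (exit status 124 when it kills the command). The helper name `detect_hang` and the limits in the usage line are hypothetical choices, not part of the original report.

```shell
#!/usr/bin/env bash
# detect_hang MAX_TRIES SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD up to MAX_TRIES times, each under GNU
# `timeout`. An attempt killed for exceeding SECONDS (exit status 124)
# is treated as a hang: print the attempt number and return 0.
# If every attempt finishes in time, report that and return 1.
detect_hang() {
  local max_tries=$1 secs=$2
  shift 2
  local i status
  for ((i = 1; i <= max_tries; i++)); do
    status=0
    timeout "$secs" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      echo "hang on attempt $i"
      return 0
    fi
  done
  echo "no hang in $max_tries attempts"
  return 1
}
```

Usage: `detect_hang 100 120 ./julia -e 'println("start"); using GAP; println("end")'`. The limit should sit well above the normal load time, otherwise a slow but successful load is misreported as a hang.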