-
Notifications
You must be signed in to change notification settings - Fork 736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock and potential assert in JitDump #12278
Conversation
Suggesting @dsouzai as the committer. |
runtime/compiler/control/JitDump.cpp
Outdated
recompilationThreadInfo->suspendCompilationThread(); | ||
while (recompilationThreadInfo->getCompilationThreadState() != COMPTHREAD_SUSPENDED) | ||
{ | ||
//compInfo->getCompilationMonitor()->notifyAll(); | ||
//compInfo->waitOnCompMonitor(recompilationThreadInfo->getCompilationThread()); | ||
} | ||
|
||
compInfo->getPersistentInfo()->setDisableFurtherCompilation(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting to draft until this is sorted out. @dsouzai this is not really ideal. See commit message from 3a7af3184ffc2715409da693906ef5eb3e0a6ee0 as to why we want this. I would like to avoid spinning here wasting cycles waiting for the diagnostic thread to suspend. Is there some monitor or something I can wait on instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to wait? What happens if we let the current thread go and the diagnostic will thread will suspend itself in the near future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is because 213d769 made it so that a diagnostic threads should be the only ones who can process JitDump compile requests. The diagnostic threads also cannot self-suspend, i.e. this line in the same commit above:
The reason we want to avoid self-suspend for diagnostic threads is because there is a timing hole between when the diagnostic thread is resumed in JitDump.cpp and when the actual JitDump compile request is added to the queue. I've encountered scenarios where within this timing hole, the diagnostic thread self-suspends (prior to 213d769) and we end up with normal compilation threads picking up JitDump method compile requests, which is definitely something we don't want. See #11772 for details of how that occured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose an alternative is to remove this assert:
And allow diagnostic threads to handle normal compiles. Then we could get away with not waiting in the code being reviewed above. However to me this is not an ideal solution as there are "special" things about the diagnostic thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on the above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code again, even with the while
loop above we have a timing hole. The diagnostic thread returns GO_TO_SLEEP_EMPTY_QUEUE
from getNextMethodToBeCompiled
if there are no methods on the queue:
But in processEntries
we change the state of the thread to COMPTHREAD_WAITING
. This means that there is a race condition between the JitDump thread signaling the diagnostic thread to suspend and the diagnostic thread executing this code:
We can encounter this situation with two threads:
- Crashed thread signals diagnostic thread to suspend as per above code being reviewed
- Meanwhile diagnostic thread is in
processEntries
just about to callgetNextMethodToBeCompiled
getNextMethodToBeCompiled
returns next action isGO_TO_SLEEP_EMPTY_QUEUE
because queue is empty- We reach
setCompilationThreadState(COMPTHREAD_WAITING);
which overwrites the signal to suspend
This scenario would exist today even without diagnostic threads right? For example if we only have one active compile thread then it will reach this line:
which means if for example the VM singaled us to terminate the thread by changing the state to COMPTHREAD_SIGNAL_TERMINATE
then this state would get overwritten in the same way as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpirvu thoughts on the above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sequence of actions I see is the following:
- crashed thread calls
compileMethod()
to queue a synchronous compilation for the diagnostic thread - Diagnostic thread compiles the method and notifies waiting threads (i.e. the crashing thread). At this point the diagnostic thread is in
processEntry()
and holds the compilation queue monitor. - Diagnostic thread returns to
processEntries()
and because the stats is still ACTIVE, executesgetNextMethodToBeCompiled()
. Since there is nothing left in the queue, it indicates an action ofGO_TO_SLEEP_EMPTY_QUEYE
and its state changes toCOMPTHREAD_WAITING
. - Diagnostic thread performs a times wait on the compilation queue monitor which releases said monitor.
- Crashed thread can finally change the state of the diagnostic thread (state can only be changed with compilation queue monitor in hand) to
SIGNAL_SUSPEND
. - When new entries are added to the queue the diagnostic thread may be notified, but since its state is no longer
COMPTHREAD_WAITING
, it does nothing, it leavesprocessEntries()
and inrun()
it will executedoSuspend()
which will result in thread suspension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea: since normal compilation threads cannot executed jitdump compilation requests and diagnostic threads cannot execute normal compilation requests, why not use a separate queue for jitdump compilation requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated my branch to the latest state with the latest fixes. I cannot any longer reproduce the assert after calling suspendCompilationThread
in JitDump.cpp without the while
loop. I've also verified that in TR::CompilationInfoPerThread::processEntries()
the compilation monitor is owned by the current thread, meaning that the scenario I described previously is not possible and the scenario outlined by @mpirvu in the second-last comment is correct.
I've updated the PR to simply call suspendCompilationThread
.
Maybe @mpirvu might have a different opinion, but I don't see why we should support the jitdump for events like |
We'll go over this in tomorrow's Vitality Talk presentation. One example of when you may want to do this is for exceptions. For example failures in lambda methods don't have a defined name that you can trace, and oftentimes tracing such methods via
This would cause a JitDump when an Other common cases are NPEs, AIOOBs, etc. |
Holding on to this lock will prevent the VM from being able to shutdown properly. This problem can easily be observed via: ``` java -Xdump:jit:events=vmstart -version ``` Signed-off-by: Filip Jeremic <[email protected]>
If the user requested a JitDump via an event then the JVM continues execution normally, however the diagnostic thread remains active. This is not desirable because the diagnostic thread is looping waiting for work and can start picking up non-JitDump compilation requests which will cause an assert. To prevent this and adhere to the contract outlined in `processEntries` we suspend the diagnostic thread once the JitDump process is complete. Signed-off-by: Filip Jeremic <[email protected]>
…sticThread Signed-off-by: Filip Jeremic <[email protected]>
If we don't do this we may encounter a crash within OMR which will result in us aborting the JVM and not completing the JitDump process. Signed-off-by: Filip Jeremic <[email protected]>
f1d26fc
to
40ae89f
Compare
Jenkins test sanity all jdk11 |
bodyInfo->getStartPCAfterPreviousCompile(), | ||
jitdumpFile | ||
); | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the code that follows below? We end up calling crashedThread->javaVM->walkStackFrames(crashedThread, &walkState);
but to what end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will walk the stack calling the callback function we registered (jitDumpStackFrameIterator
). This function will then recompile all JIT methods encountered on the stack. This can be useful if the problem originates a few frames above the crashed method, and we are able to walk the entire stack frame.
Typical example of this is we call a VM helper from inside JITted code and we crash somewhere in the VM (ex. bad argument passed). We should then be able to walk the entire stack and recompile all JIT methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merging, since all tests passed |
Release OMRVMThreadName lock in JitDump
Holding on to this lock will prevent the VM from being able to shutdown
properly. This problem can easily be observed via:
Suspend diagnostic thread after JitDump
If the user requested a JitDump via an event then the JVM continues
execution normally, however the diagnostic thread remains active. This
is not desirable because the diagnostic thread is looping waiting for
work and can start picking up non-JitDump compilation requests which
will cause an assert.
To prevent this and adhere to the contract outlined in
processEntries
we suspend the diagnostic thread once the JitDump process is complete.
Fixes: #12336
Signed-off-by: Filip Jeremic [email protected]