-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YARN-6061. Add a customized uncaughtexceptionhandler for critical threads #182
Conversation
|
||
@Override | ||
public void uncaughtException(Thread t, Throwable e) { | ||
LOG.fatal("Critical thread " + t + " crashed!", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log t.getName() instead of t itself?
new Exception(e))); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New line at the end of the file?
new Thread() { | ||
@Override | ||
public void run() { | ||
rmContext.getResourceManager().handleTransitionToStandBy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we can call this synchronously? Would it be more appropriate to do the transition to standby in a different thread, like in RMStateStore? If yes, may be we could update RMStateStore to use this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I see this is being done in a new thread.
That said, may be, we could consider abstracting this out so this handler and the RMStateStore just call that abstraction?
import static org.mockito.Mockito.verify; | ||
|
||
/** | ||
* This class is to test {@link RMCriticalThreadUncaughtExceptionHandler}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this javadoc is required. This is a convention we generally follow.
ExitUtil.disableSystemExit(); | ||
|
||
// Create a MockRM and start it | ||
ResourceManager resourceManager = new MockRM(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are anyways creating the MockRM, why not start it and have some thread throw an uncaught exception and verify the RM behavior in two cases (two tests?) - exit and transition to standby?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add one unit test to check if RM transition to standby.
* Transition to standby in a new thread. | ||
*/ | ||
public void handleTransitionToStandByInNewThread() { | ||
new Thread() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using an anonymous class, can we define this as a separate Thread and name it for easier debugging?
* @throws InterruptedException if any | ||
*/ | ||
@Test | ||
public void testUncaughtExceptionHandlerWithHAEnabled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests!
} catch (Exception e) { | ||
LOG.fatal("Failed to transition RM to Standby mode.", e); | ||
ExitUtil.terminate(1, e); | ||
Thread standByTransitionThread = new Thread(new StandByTransitionThread()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not identifying this earlier. We should make this thread-safe in case this is triggered by two critical threads failing at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, would it make sense to create an instance of the Runnable on transition to active, and start a new thread on a need-to basis. If all threads use a single instance of the Runnable, may be it is easier to coordinate?
standByTransitionThread.start(); | ||
} | ||
|
||
private class StandByTransitionThread implements Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming the Runnable a Thread sounds confusing. Can we change it to TransitionToStandbyRunnable or some such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch looks good. Minor nits on documentation before we do the final push.
import org.apache.hadoop.yarn.conf.HAUtil; | ||
|
||
/** | ||
* This class either shutdowns {@link ResourceManager} or makes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- s/shutdowns/shuts down
- s/makes RM transition/ transitions the RM
- s/if any uncaught exception.../if a critical thread throws an uncaught exception.
standByTransitionThread.start(); | ||
} | ||
|
||
private class StandByTransitionRunnable implements Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us add javadoc for this class, and include details on how we use the same runnable.
|
||
@Override | ||
public void run() { | ||
// Prevent from running again if it has run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add more detail here: "Run this only once, even if multiple threads end up triggering this simultaneously."
} | ||
|
||
private class StandByTransitionRunnable implements Runnable { | ||
private AtomicBoolean hasRun = new AtomicBoolean(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, rename this to hasAlreadyRun? And, again add some javadoc here too?
* {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)} | ||
* in the thread entry point or after creation of threads. | ||
*/ | ||
@Public |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch looks good to go, but for the one nit on Audience and Stability annotations.
Committed |
…erge as input Also updated documentation for join and partitionBy. Author: Prateek Maheshwari <[email protected]> Reviewers: Jacob Maes <[email protected]> Closes apache#182 from prateekm/documentation-cleanup
No description provided.