[SPARK-2645] [Core] Allow SparkEnv.stop() to be called multiple times without side effects. #6973
Conversation
Pulling functionality from apache spark
pull latest from apache spark
Pulling functionality from apache spark
Hi @rekhajoshm, could you explain the details more in the description field?
Yeah, this might be a good change (…)
@@ -90,17 +90,25 @@ class SparkEnv (
  private var driverTmpDirToDelete: Option[String] = None

  private[spark] def stop() {

    if(isStopped) return
might be better to rewrite this as
if (!isStopped) {
// stop everything
}
Heh, I would have gone with just
if (isStopped) {
return
}
to avoid the extra running indent but leave it.
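For context, a minimal sketch of the guard shape being discussed, assuming a private isStopped flag; the class, field, and demo names below are illustrative, not the actual SparkEnv internals:

```scala
// Minimal sketch of an idempotent stop(); names are placeholders, not SparkEnv.
class StoppableEnv {
  @volatile private var isStopped = false

  def stop(): Unit = {
    if (!isStopped) {   // same effect as an early `if (isStopped) return`
      isStopped = true
      // stop components exactly once here
      println("stopping components")
    }
  }
}

object StoppableEnvDemo extends App {
  val env = new StoppableEnv
  env.stop()
  env.stop() // second call is a harmless no-op
}
```

Either style (early return or wrapping the body in the negated guard) gives the same idempotent behavior; the two suggestions above differ only in indentation.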
ok to test
Test build #35709 has finished for PR 6973 at commit
Hi @rekhajoshm, the PR looks reasonable (with the suggested changes from Andrew). Could you also (a) change the title / description to be more descriptive, e.g. "Allow sc.stop() to be called multiple times", and also (b) add a test case? Thanks for working on this!
Test build #35745 has finished for PR 6973 at commit
Thanks for your quick inputs and review comments @sarutak @srowen @andrewor14 @squito. @srowen – Based on my quick look yesterday, sc.stop() calls SparkEnv.stop(), which tries to stop components that are already stopped; since it is not idempotent, this cascades into an uncaught exception and exit code 50. https://issues.apache.org/jira/browse/SPARK-2645 @squito Did as suggested. Please review. Thanks.
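To illustrate the failure mode described above, a hedged sketch of the cascade: a non-idempotent stop() throws on the second call, the exception goes uncaught, and the process exits with a failure code. The handler and the exit-code constant below are illustrative stand-ins, not Spark's actual handler.

```scala
object UncaughtStopSketch extends App {
  val UNCAUGHT_EXCEPTION_EXIT_CODE = 50 // illustrative stand-in for the observed exit code

  // Illustrative default handler: any uncaught exception terminates the JVM with code 50.
  Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
    override def uncaughtException(t: Thread, e: Throwable): Unit = {
      System.err.println(s"Uncaught exception in ${t.getName}: ${e.getMessage}")
      sys.exit(UNCAUGHT_EXCEPTION_EXIT_CODE)
    }
  })

  private var stopped = false
  def stopOnce(): Unit =
    if (stopped) throw new IllegalStateException("Server is already stopped")
    else stopped = true

  stopOnce()
  stopOnce() // non-idempotent second stop -> uncaught exception -> exit code 50
}
```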
Test build #35748 has finished for PR 6973 at commit
      outputCommitCoordinator.stop()
      rpcEnv.shutdown()
    } catch {
      case NonFatal(e) =>
What's the value in adding this?
I think Sean's point is that by catching the error, it's no longer getting passed further up the stack. If something unexpected goes wrong here, we most probably don't want to just log it; it's likely to get overlooked and may indicate something seriously wrong.
Yes, but further, I don't know if it helps to log it here either. I'd remove this try block.
I agree. Looks like the only thing this does is log the exception, but if we propagate it upwards it will be observed by the upper layers anyway. @rekhajoshm can you remove this try block?
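To make the reviewers' point concrete, a self-contained sketch contrasting the swallowing catch with letting the error propagate; the component and method names here are illustrative, not the SparkEnv shutdown calls from the diff above.

```scala
import scala.util.control.NonFatal

object ShutdownSketch extends App {
  // Stand-in for a shutdown call that fails, e.g. stopping an already-stopped server.
  def shutdownComponent(): Unit = throw new IllegalStateException("Server is already stopped")

  // Swallowing variant (what the review argues against): the caller never learns
  // that shutdown went wrong; the error is only printed.
  def stopSwallowing(): Unit =
    try shutdownComponent()
    catch { case NonFatal(e) => println(s"ignored: ${e.getMessage}") }

  // Propagating variant (what the reviewers suggest): the exception flies up to
  // the caller, where tests and upper layers can observe it.
  def stopPropagating(): Unit = shutdownComponent()

  stopSwallowing()                 // prints a message and continues silently
  try stopPropagating()            // throws; the caller decides what to do
  catch { case NonFatal(e) => println(s"caller saw: ${e.getMessage}") }
}
```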
Jenkins retest this please
Test build #35772 has finished for PR 6973 at commit
          // assert(!e.getMessage.contains("Server is already stopped"))
          threwNoOrOnlyExceptedException = false
        case NonFatal(e) =>
          threwNoOrOnlyExceptedException = true
what exception are you expecting here? I don't see any exception when running this test locally.
If you aren't expecting any exception, then you could just have the body of the try block in the test (any exception will fail the test).
If there is some "OK" exception, we should probably check more directly for that exception, as this will allow almost any exception to get thrown.
hmm, now I am a little confused -- that test actually passes even without your changes. Sorry, this may have been my fault for suggesting a test. I'm not convinced this is still needed, but I suppose it doesn't hurt. In any case, I'd leave the test you have, and also add a case where you just stop the env multiple times, as sketched below.
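A hedged sketch of the suggested test, stopping the context more than once with no try/catch so any exception fails the test on its own; the suite name and local-master configuration are assumptions, and it presumes Spark's in-tree SparkFunSuite base test class.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}

// Sketch of the suggested test; suite name and configuration are illustrative.
class StopBehaviorSuite extends SparkFunSuite {
  test("stop() can be called multiple times without side effects") {
    val conf = new SparkConf().setAppName("stop-twice-test").setMaster("local")
    val sc = new SparkContext(conf)
    sc.stop()
    // No try/catch: if the second stop throws, the test simply fails.
    sc.stop()
  }
}
```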
None of the changes here are in
Test build #35806 has finished for PR 6973 at commit
Thanks @vanzin for your comments. Updated the title. Other than SparkContext, SparkEnv.stop() can be invoked from the scheduler, or, if the driver commanded a shutdown, the Executor's stop would invoke SparkEnv.stop(); hence making SparkEnv.stop() idempotent as well. On SPARK-2645, the org.apache.spark.ServerStateException: "Server is already stopped" is, from my understanding, an unhandled exception that surfaces as an uncaught exception (exit code: 50). @squito @srowen good point; I added the catch in my enthusiasm to avoid an unhandled exception, especially when I am not expecting any except in erratic maneuvers, but I completely agree with you. Updated.
Test build #35821 has finished for PR 6973 at commit
      // call stop second time
      sc.stop()
    } catch {
      case e: Exception =>
This try should be removed since it doesn't really add anything, except hiding the exception. Let it fly from the method to fail the test.
Thanks @srowen for your inputs. I had missed capturing the exception on fail; updated (12f66b5). try/catch/fail(e) seems to be an accepted practice when failing on an exception. I also used the org.scalatest.Matchers "noException should be thrownBy" idiom if that is preferred (1aff39c). Or please let me know if you have another preference.
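For reference, a minimal sketch of that ScalaTest Matchers idiom; the suite skeleton and context setup around it are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{FunSuite, Matchers}

// "noException should be thrownBy" asserts that the block completes normally.
class StopMatcherSuite extends FunSuite with Matchers {
  test("repeated stop() throws no exception") {
    val sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    noException should be thrownBy {
      sc.stop()
      sc.stop() // second call should be a no-op
    }
  }
}
```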
Test build #35874 has finished for PR 6973 at commit
Test build #35873 has finished for PR 6973 at commit
Test build #35875 has finished for PR 6973 at commit
@rekhajoshm I think this is still in your court. The extra try blocks should be removed IMHO.
Test build #36022 has finished for PR 6973 at commit
@srowen done. Can you please review/approve it? Thanks.
@rekhajoshm LGTM other than a minor comment @srowen had.
Thanks @andrewor14 for your quick input. The last commit 2ce5760 was updated for the comment from @srowen; kindly check. Thanks.
Had missed that inline comment. Updated. @andrewor14 @srowen thanks.
Test build #36077 has finished for PR 6973 at commit
@@ -45,6 +45,8 @@ import org.apache.spark.storage._
 import org.apache.spark.unsafe.memory.{ExecutorMemoryManager, MemoryAllocator}
 import org.apache.spark.util.{RpcUtils, Utils}

 import scala.util.control.NonFatal
Not needed now?
Thanks for your inputs @srowen @andrewor14. Done. Also removed the unused JavaConversions._ import. Please review, thanks.
Test build #36169 has finished for PR 6973 at commit
Merging into master. Thanks for the fix @rekhajoshm
Thank you @andrewor14 @srowen for the reviews and merge.
Fix for SparkContext stop behavior - Allow sc.stop() to be called multiple times without side effects.