-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-28709][DSTREAMS] Fix StreamingContext leak through Streaming #25439
[SPARK-28709][DSTREAMS] Fix StreamingContext leak through Streaming #25439
Conversation
ok to test |
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
Outdated
Show resolved
Hide resolved
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
Outdated
Show resolved
Hide resolved
@@ -575,6 +577,8 @@ class StreamingContext private[streaming] ( | |||
try { | |||
validate() | |||
|
|||
registerProgressListener() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you are suggesting that the unregister is important, but do we need this? This seems to be a behavior change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. I believeStreamingTab
shouldn't be responsible for registering/unregistering the listener as it could be and even already used in other place (metrics). Moreover seems there is also a bug that if ui is disabled, listener isn't registered and metrics aren't reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I tend to agree. The behavior change is registering listeners when when the UI is disabled, but that seems like possibly a fix (we'd have to look deeper to figure out whether something else registers it in this case, like StreamingSource
). This seems like a better place to manage it, certainly unregistering, and if it's unregistered here it should be registered here.
@@ -52,8 +52,6 @@ class InputStreamsSuite extends TestSuiteBase with BeforeAndAfter { | |||
|
|||
// Set up the streaming context and input streams | |||
withStreamingContext(new StreamingContext(conf, batchDuration)) { ssc => | |||
ssc.addStreamingListener(ssc.progressListener) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove the behavior change here( https://github.com/apache/spark/pull/25439/files#r313624949 ), it seems that we don't need to change this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
Test build #109063 has finished for PR 25439 at commit
|
75935cd
to
fbfabb0
Compare
…JobProgressListener on stop
fbfabb0
to
4d5965e
Compare
Test build #109088 has finished for PR 25439 at commit
|
@dongjoon-hyun thanks for the so fast review. |
Retest this please. |
Test build #109232 has finished for PR 25439 at commit
|
Hi @zsxwing, could you please take a look ? |
@@ -575,6 +577,8 @@ class StreamingContext private[streaming] ( | |||
try { | |||
validate() | |||
|
|||
registerProgressListener() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I tend to agree. The behavior change is registering listeners when when the UI is disabled, but that seems like possibly a fix (we'd have to look deeper to figure out whether something else registers it in this case, like StreamingSource
). This seems like a better place to manage it, certainly unregistering, and if it's unregistered here it should be registered here.
streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingTab.scala
Show resolved
Hide resolved
@@ -93,6 +93,7 @@ private[spark] abstract class WebUI( | |||
attachHandler(renderJsonHandler) | |||
val handlers = pageToHandlers.getOrElseUpdate(page, ArrayBuffer[ServletContextHandler]()) | |||
handlers += renderHandler | |||
handlers += renderJsonHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this new change related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, handlers store reference to streaming and batch pages which store reference to StreamingJobProgressListener
which stores reference to StreamingContext
. So all the handlers should be unregistered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see so that's kind of an additional bug fix, that this never recorded the handler in the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, seems that it wasn't an issue in past cause other tabs can't be detached in runtime.
Retest this please. |
Test build #109670 has finished for PR 25439 at commit
|
Merged to master |
Yes I think that's OK. We're in a 'code freeze' for the 2.4.4 release at the moment, so I hesitate to merge anything but critical fixes until it's finalized. But it could go in for 2.4.5. |
OK, thanks. |
Thank you all. Yes. This is a good backport candidate for |
@choojoyq @srowen @dongjoon-hyun 2.4.4 was released, do you plan to merge this pr into branch-2.4? |
In my application spark streaming is restarted programmatically by stopping StreamingContext without stopping of SparkContext and creating/starting a new one. I use it for automatic detection of Kafka topic/partition changes and automatic failover in case of non fatal exceptions. However i notice that after multiple restarts driver fails with OOM. During investigation of heap dump i figured out that StreamingContext object isn't cleared by GC after stopping. <img width="1901" alt="Screen Shot 2019-08-14 at 12 23 33" src="https://user-images.githubusercontent.com/13151161/63010149-83f4c200-be8e-11e9-9f48-12b6e97839f4.png"> There are several places which holds reference to it : 1. StreamingTab registers StreamingJobProgressListener which holds reference to Streaming Context directly to LiveListenerBus shared queue via ssc.sc.addSparkListener(listener) method invocation. However this listener isn't unregistered at stop method. 2. json handlers (/streaming/json and /streaming/batch/json) aren't unregistered in SparkUI, while they hold reference to StreamingJobProgressListener. Basically the same issue affects all the pages, i assume that renderJsonHandler should be added to pageToHandlers cache on attachPage method invocation in order to unregistered it as well on detachPage. 3. SparkUi holds reference to StreamingJobProgressListener in the corresponding local variable which isn't cleared after stopping of StreamingContext. Added tests to existing test suites. After i applied these changes via reflection in my app OOM on driver side gone. Closes #25439 from choojoyq/SPARK-28709-fix-streaming-context-leak-on-stop. Authored-by: Nikita Gorbachevsky <[email protected]> Signed-off-by: Sean Owen <[email protected]>
Also backported to 2.4. I resolved a minor merge conflict carefully and think I got it right. |
Thank @srowen |
… avoid CCE ### What changes were proposed in this pull request? [SPARK-27122](#24088) fixes `ClassCastException` at `yarn` module by introducing `DelegatingServletContextHandler`. Initially, this was discovered with JDK9+, but the class path issues affected JDK8 environment, too. After [SPARK-28709](#25439), I also hit the similar issue at `streaming` module. This PR aims to fix `streaming` module by adding `getContextPath` to `DelegatingServletContextHandler` and using it. ### Why are the changes needed? Currently, when we test `streaming` module independently, it fails like the following. ``` $ build/mvn test -pl streaming ... UISeleniumSuite: - attaching and detaching a Streaming tab *** FAILED *** java.lang.ClassCastException: org.sparkproject.jetty.servlet.ServletContextHandler cannot be cast to org.eclipse.jetty.servlet.ServletContextHandler ... Tests: succeeded 337, failed 1, canceled 0, ignored 1, pending 0 *** 1 TEST FAILED *** [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the modified tests. And do the following manually. Since you can observe this when you run `streaming` module test only (instead of running all), you need to install the changed `core` module and use it. ``` $ java -version openjdk version "1.8.0_222" OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10) OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode) $ build/mvn install -DskipTests $ build/mvn test -pl streaming ``` Closes #25791 from dongjoon-hyun/SPARK-29087. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
… avoid CCE ### What changes were proposed in this pull request? [SPARK-27122](#24088) fixes `ClassCastException` at `yarn` module by introducing `DelegatingServletContextHandler`. Initially, this was discovered with JDK9+, but the class path issues affected JDK8 environment, too. After [SPARK-28709](#25439), I also hit the similar issue at `streaming` module. This PR aims to fix `streaming` module by adding `getContextPath` to `DelegatingServletContextHandler` and using it. ### Why are the changes needed? Currently, when we test `streaming` module independently, it fails like the following. ``` $ build/mvn test -pl streaming ... UISeleniumSuite: - attaching and detaching a Streaming tab *** FAILED *** java.lang.ClassCastException: org.sparkproject.jetty.servlet.ServletContextHandler cannot be cast to org.eclipse.jetty.servlet.ServletContextHandler ... Tests: succeeded 337, failed 1, canceled 0, ignored 1, pending 0 *** 1 TEST FAILED *** [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the modified tests. And do the following manually. Since you can observe this when you run `streaming` module test only (instead of running all), you need to install the changed `core` module and use it. ``` $ java -version openjdk version "1.8.0_222" OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10) OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode) $ build/mvn install -DskipTests $ build/mvn test -pl streaming ``` Closes #25791 from dongjoon-hyun/SPARK-29087. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 729b318) Signed-off-by: Dongjoon Hyun <[email protected]>
In my application spark streaming is restarted programmatically by stopping StreamingContext without stopping of SparkContext and creating/starting a new one. I use it for automatic detection of Kafka topic/partition changes and automatic failover in case of non fatal exceptions. However i notice that after multiple restarts driver fails with OOM. During investigation of heap dump i figured out that StreamingContext object isn't cleared by GC after stopping. <img width="1901" alt="Screen Shot 2019-08-14 at 12 23 33" src="https://user-images.githubusercontent.com/13151161/63010149-83f4c200-be8e-11e9-9f48-12b6e97839f4.png"> There are several places which holds reference to it : 1. StreamingTab registers StreamingJobProgressListener which holds reference to Streaming Context directly to LiveListenerBus shared queue via ssc.sc.addSparkListener(listener) method invocation. However this listener isn't unregistered at stop method. 2. json handlers (/streaming/json and /streaming/batch/json) aren't unregistered in SparkUI, while they hold reference to StreamingJobProgressListener. Basically the same issue affects all the pages, i assume that renderJsonHandler should be added to pageToHandlers cache on attachPage method invocation in order to unregistered it as well on detachPage. 3. SparkUi holds reference to StreamingJobProgressListener in the corresponding local variable which isn't cleared after stopping of StreamingContext. Added tests to existing test suites. After i applied these changes via reflection in my app OOM on driver side gone. Closes apache#25439 from choojoyq/SPARK-28709-fix-streaming-context-leak-on-stop. Authored-by: Nikita Gorbachevsky <[email protected]> Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
In my application spark streaming is restarted programmatically by stopping StreamingContext without stopping of SparkContext and creating/starting a new one. I use it for automatic detection of Kafka topic/partition changes and automatic failover in case of non fatal exceptions.
However i notice that after multiple restarts driver fails with OOM. During investigation of heap dump i figured out that StreamingContext object isn't cleared by GC after stopping.
There are several places which holds reference to it :
How was this patch tested?
Added tests to existing test suites.
After i applied these changes via reflection in my app OOM on driver side gone.