
SPARK-2380: Support displaying accumulator values in the web UI #1309

Closed
wants to merge 19 commits

Conversation

@pwendell
Contributor

pwendell commented Jul 6, 2014

This patch adds support for giving accumulators user-visible names and displaying accumulator values in the web UI. This allows users to create custom counters that are displayed in the UI. The current approach displays both the accumulator deltas caused by each task and a "current" value of the accumulator totals for each stage, which gets updated as tasks finish.

Currently in Spark, developers have been extending the TaskMetrics functionality to provide custom instrumentation for RDDs. This patch provides a potentially nicer alternative: going through the existing accumulator framework (in fact, TaskMetrics and accumulators are on an awkward collision course as we add more features to the former). The current patch demos how we can use the feature to provide instrumentation for RDD input sizes. The nice thing about going through accumulators is that users can read the current value of the data being tracked in their programs. This could be useful, e.g., to decide to short-circuit a Spark stage depending on how things are going.

![counters](https://cloud.githubusercontent.com/assets/320616/3488815/6ee7bc34-0505-11e4-84ce-e36d9886e2cf.png)
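The delta-versus-total behavior the description mentions can be modeled outside Spark. The sketch below is plain Scala with hypothetical names (it is not the patch's actual classes): each task reports a local delta, and the driver folds the deltas into the stage-level total that the UI would display.

```scala
// Minimal model of the patch's semantics (hypothetical names, not Spark's classes):
// a named accumulator collects per-task deltas that the driver merges into a total.
final class NamedAccumulator(val name: Option[String], initial: Long = 0L) {
  private var total = initial
  def add(delta: Long): Unit = total += delta // driver-side merge of one task's delta
  def value: Long = total
}

// A task computes only its local contribution, reported back on task completion.
def runTask(records: Seq[Int]): Long = records.count(_ % 2 == 0).toLong

val evenCount = new NamedAccumulator(Some("even records"))
val taskDeltas = Seq(Seq(1, 2, 3, 4), Seq(5, 6), Seq(7, 9)).map(runTask)
taskDeltas.foreach(evenCount.add) // UI would show each delta and the running total
```

With the patch applied, the real entry point is the new `name: Option[String]` parameter on `Accumulator` (visible in the QA output in this thread); an accumulator without a name would simply not be displayed.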

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16357/

@pwendell
Contributor Author

pwendell commented Jul 6, 2014

/cc @kayousterhout

@rxin
Contributor

rxin commented Jul 6, 2014

This is pretty cool. Why "+="?

@pwendell
Contributor Author

pwendell commented Jul 6, 2014

@rxin at first I just had =, but then I thought it could be confusing, because that is not showing the total value of the accumulator; it's only showing the local addition from that task.

@rxin
Contributor

rxin commented Jul 7, 2014

How about a colon?

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16365/

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16366/

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16367/

@pwendell
Contributor Author

pwendell commented Jul 9, 2014

@kayousterhout - do you mind taking a look at this?

event.accumUpdates.foreach { case (id, partialValue) =>
  val acc = Accumulators.originals(id)
  val name = acc.name
  // To avoid UI cruft, ignore cases where value wasn't updated

Do you think this will make things hard to debug -- e.g., if someone's accumulator doesn't show up in the UI and they don't realize it's because the value wasn't updated as opposed to because they didn't set the show-in-ui variable correctly?
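Since the diff excerpt elides the rest of the handler, here is a hedged reconstruction of the bookkeeping it implies, in plain Scala with hypothetical names (not the actual DAGScheduler or listener code): running totals are keyed by accumulator ID rather than name, and zero deltas are skipped, matching the "avoid UI cruft" comment.

```scala
import scala.collection.mutable

// Hypothetical listener-side state: running totals keyed by accumulator ID
// (IDs rather than names, since two accumulators may legally share a name).
val stageTotals = mutable.Map.empty[Long, Long].withDefaultValue(0L)

def onTaskEnd(accumUpdates: Map[Long, Long]): Unit =
  accumUpdates.foreach { case (id, partialValue) =>
    // To avoid UI cruft, ignore cases where the value wasn't updated.
    if (partialValue != 0L) stageTotals(id) += partialValue
  }

onTaskEnd(Map(1L -> 5L, 2L -> 0L)) // id 2 reported no change and is skipped
onTaskEnd(Map(1L -> 3L))
```

This skip-zero behavior is exactly what the review comment above questions: an accumulator whose tasks never add anything would never appear in the UI at all.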

@@ -42,6 +44,13 @@ class TaskInfo(
var gettingResultTime: Long = 0

/**
* Intermediate updates to accumulables during this task. Note that it is valid for the same
* accumulable to be updated multiple times in a single task or for two accumulables with the
* same name but different ID's to exist in a task.

super nit: no apostrophe in IDs

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1309:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17858/consoleFull

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1309:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17859/consoleFull

Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala
	core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@SparkQA

SparkQA commented Aug 4, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17872/consoleFull

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17872/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17908/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17908/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull

Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@pwendell
Contributor Author

pwendell commented Aug 5, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17933/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17934/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17933/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17934/consoleFull

@pwendell
Contributor Author

pwendell commented Aug 5, 2014

Okay, I'll likely merge this soon unless there are other comments. @kayousterhout @mateiz feel free to chime in!

@asfgit asfgit closed this in 74f82c7 Aug 5, 2014
asfgit pushed a commit that referenced this pull request Aug 5, 2014

Author: Patrick Wendell <[email protected]>

Closes #1309 from pwendell/metrics and squashes the following commits:

8815308 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into HEAD
93fbe0f [Patrick Wendell] Other minor fixes
cc43f68 [Patrick Wendell] Updating unit tests
c991b1b [Patrick Wendell] Moving some code into the Accumulators class
9a9ba3c [Patrick Wendell] More merge fixes
c5ace9e [Patrick Wendell] More merge conflicts
1da15e3 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into metrics
9860c55 [Patrick Wendell] Potential solution to posting listener events
0bb0e33 [Patrick Wendell] Remove "display" variable and assume display = name.isDefined
0ec4ac7 [Patrick Wendell] Java API's
e95bf69 [Patrick Wendell] Stash
be97261 [Patrick Wendell] Style fix
8407308 [Patrick Wendell] Removing examples in Hadoop and RDD class
64d405f [Patrick Wendell] Adding missing file
5d8b156 [Patrick Wendell] Changes based on Kay's review.
9f18bad [Patrick Wendell] Minor style changes and tests
7a63abc [Patrick Wendell] Adding Json serialization and responding to Reynold's feedback
ad85076 [Patrick Wendell] Example of using named accumulators for custom RDD metrics.
0b72660 [Patrick Wendell] Initial WIP example of supporing globally named accumulators.
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
### What changes were proposed in this pull request?

This PR aims to support K8s executor rolling policies via `spark.kubernetes.executor.rollPolicy`.

### Why are the changes needed?

Users can choose one of the following policies in order to decommission one executor at every `spark.kubernetes.executor.rollInterval`.
- ID: The executor with the smallest executor ID. This is the most human-readable policy.
- ADD_TIME: The executor with the smallest add-time, i.e. the longest-lived executor.
- TOTAL_GC_TIME: The executor with the largest total GC time. Such an executor could have a GC issue for some reason.
- TOTAL_DURATION: The executor with the largest total task time. Such an executor may have a busy neighbor or be CPU-throttled by the K8s controller.

### Does this PR introduce _any_ user-facing change?

Yes, but this is a new feature.

### How was this patch tested?

Pass the CIs.
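The four policies amount to different orderings over per-executor statistics. A minimal sketch in plain Scala (hypothetical names, not the actual Spark-on-K8s classes) of how a chooser might pick the one executor to decommission:

```scala
// Hypothetical per-executor summary; the fields mirror the four policies above.
final case class ExecSummary(id: Int, addTime: Long, totalGcTime: Long, totalDuration: Long)

// Pick the executor to decommission under each policy name.
def choose(policy: String, execs: Seq[ExecSummary]): ExecSummary = policy match {
  case "ID"             => execs.minBy(_.id)            // smallest executor ID
  case "ADD_TIME"       => execs.minBy(_.addTime)       // earliest-added, i.e. longest-lived
  case "TOTAL_GC_TIME"  => execs.maxBy(_.totalGcTime)   // heaviest GC
  case "TOTAL_DURATION" => execs.maxBy(_.totalDuration) // busiest total task time
}

val execs = Seq(
  ExecSummary(1, addTime = 100L, totalGcTime = 9L, totalDuration = 50L),
  ExecSummary(2, addTime =  40L, totalGcTime = 2L, totalDuration = 80L)
)
```

Note how the same fleet yields different victims under different policies: ID and TOTAL_GC_TIME select executor 1 here, while ADD_TIME and TOTAL_DURATION select executor 2.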