
SPARK-2380: Support displaying accumulator values in the web UI #1309

Closed
wants to merge 19 commits

Conversation

@pwendell
Contributor

pwendell commented Jul 6, 2014

This patch adds support for giving accumulators user-visible names and displaying accumulator values in the web UI. This allows users to create custom counters that are displayed in the UI. The current approach displays both the accumulator deltas caused by each task and a "current" value of the accumulator totals for each stage, which gets updated as tasks finish.

Currently in Spark, developers have been extending the TaskMetrics functionality to provide custom instrumentation for RDDs. This patch provides a potentially nicer alternative: going through the existing accumulator framework (in fact, TaskMetrics and accumulators are on an awkward collision course as we add more features to the former). The current patch demos how we can use the feature to provide instrumentation for RDD input sizes. The nice thing about going through accumulators is that users can read the current value of the data being tracked in their programs. This could be useful, e.g., to decide to short-circuit a Spark stage depending on how things are going.

![counters](https://cloud.githubusercontent.com/assets/320616/3488815/6ee7bc34-0505-11e4-84ce-e36d9886e2cf.png)
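The delta-versus-total behavior the description mentions can be modeled outside Spark. The sketch below is plain Scala with hypothetical names (it is not the patch's actual classes): each task reports a local delta, and the driver folds the deltas into the stage-level total that the UI would display.

```scala
// Minimal model of the patch's semantics (hypothetical names, not Spark's classes):
// a named accumulator collects per-task deltas that the driver merges into a total.
final class NamedAccumulator(val name: Option[String], initial: Long = 0L) {
  private var total = initial
  def add(delta: Long): Unit = total += delta // driver-side merge of one task's delta
  def value: Long = total
}

// A task computes only its local contribution, reported back on task completion.
def runTask(records: Seq[Int]): Long = records.count(_ % 2 == 0).toLong

val evenCount = new NamedAccumulator(Some("even records"))
val taskDeltas = Seq(Seq(1, 2, 3, 4), Seq(5, 6), Seq(7, 9)).map(runTask)
taskDeltas.foreach(evenCount.add) // UI would show each delta and the running total
```

With the patch applied, the real entry point is the new `name: Option[String]` parameter on `Accumulator` (visible in the QA output in this thread); an accumulator without a name would simply not be displayed.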

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16357/

@pwendell
Contributor Author

pwendell commented Jul 6, 2014

/cc @kayousterhout

@rxin
Contributor

rxin commented Jul 6, 2014

This is pretty cool. Why "+="?

@pwendell
Contributor Author

pwendell commented Jul 6, 2014

@rxin at first I just had =, but then I thought it could be confusing, because that is not showing the total value of the accumulator; it's only showing the local addition from that task.

@rxin
Contributor

rxin commented Jul 7, 2014

How about a colon?

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16365/

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build triggered.

@AmplabJenkins
Merged build started.

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16366/

@AmplabJenkins
Merged build finished.

@AmplabJenkins
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16367/

@pwendell
Contributor Author

pwendell commented Jul 9, 2014

@kayousterhout - do you mind taking a look at this?

event.accumUpdates.foreach { case (id, partialValue) =>
  val acc = Accumulators.originals(id)
  val name = acc.name
  // To avoid UI cruft, ignore cases where value wasn't updated

Do you think this will make things hard to debug -- e.g., if someone's accumulator doesn't show up in the UI and they don't realize it's because the value wasn't updated as opposed to because they didn't set the show-in-ui variable correctly?
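Since the diff excerpt elides the rest of the handler, here is a hedged reconstruction of the bookkeeping it implies, in plain Scala with hypothetical names (not the actual DAGScheduler or listener code): running totals are keyed by accumulator ID rather than name, and zero deltas are skipped, matching the "avoid UI cruft" comment.

```scala
import scala.collection.mutable

// Hypothetical listener-side state: running totals keyed by accumulator ID
// (IDs rather than names, since two accumulators may legally share a name).
val stageTotals = mutable.Map.empty[Long, Long].withDefaultValue(0L)

def onTaskEnd(accumUpdates: Map[Long, Long]): Unit =
  accumUpdates.foreach { case (id, partialValue) =>
    // To avoid UI cruft, ignore cases where the value wasn't updated.
    if (partialValue != 0L) stageTotals(id) += partialValue
  }

onTaskEnd(Map(1L -> 5L, 2L -> 0L)) // id 2 reported no change and is skipped
onTaskEnd(Map(1L -> 3L))
```

This skip-zero behavior is exactly what the review comment above questions: an accumulator whose tasks never add anything would never appear in the UI at all.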

@@ -42,6 +44,13 @@ class TaskInfo(
var gettingResultTime: Long = 0

/**
* Intermediate updates to accumulables during this task. Note that it is valid for the same
* accumulable to be updated multiple times in a single task or for two accumulables with the
* same name but different ID's to exist in a task.

super nit: no apostrophe in IDs

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1309:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17858/consoleFull

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1309:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17859/consoleFull

Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala
	core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@SparkQA

SparkQA commented Aug 4, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17872/consoleFull

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17872/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17908/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17908/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull

Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@pwendell
Contributor Author

pwendell commented Aug 5, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17933/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1309. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17934/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17933/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1309:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17934/consoleFull

@pwendell
Contributor Author

pwendell commented Aug 5, 2014

Okay, I'll likely merge this soon unless there are other comments. @kayousterhout @mateiz feel free to chime in!

@asfgit asfgit closed this in 74f82c7 Aug 5, 2014
asfgit pushed a commit that referenced this pull request Aug 5, 2014

Author: Patrick Wendell <[email protected]>

Closes #1309 from pwendell/metrics and squashes the following commits:

8815308 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into HEAD
93fbe0f [Patrick Wendell] Other minor fixes
cc43f68 [Patrick Wendell] Updating unit tests
c991b1b [Patrick Wendell] Moving some code into the Accumulators class
9a9ba3c [Patrick Wendell] More merge fixes
c5ace9e [Patrick Wendell] More merge conflicts
1da15e3 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into metrics
9860c55 [Patrick Wendell] Potential solution to posting listener events
0bb0e33 [Patrick Wendell] Remove "display" variable and assume display = name.isDefined
0ec4ac7 [Patrick Wendell] Java API's
e95bf69 [Patrick Wendell] Stash
be97261 [Patrick Wendell] Style fix
8407308 [Patrick Wendell] Removing examples in Hadoop and RDD class
64d405f [Patrick Wendell] Adding missing file
5d8b156 [Patrick Wendell] Changes based on Kay's review.
9f18bad [Patrick Wendell] Minor style changes and tests
7a63abc [Patrick Wendell] Adding Json serialization and responding to Reynold's feedback
ad85076 [Patrick Wendell] Example of using named accumulators for custom RDD metrics.
0b72660 [Patrick Wendell] Initial WIP example of supporing globally named accumulators.
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
### What changes were proposed in this pull request?

This PR aims to support K8s executor rolling policies via `spark.kubernetes.executor.rollPolicy`.

### Why are the changes needed?

Users can choose one of the following policies in order to decommission one executor at every `spark.kubernetes.executor.rollInterval`.
- ID: The executor with the smallest executor ID. This is the most human-readable policy.
- ADD_TIME: The executor with the smallest add-time, i.e. the longest-lived executor.
- TOTAL_GC_TIME: The executor with the largest total GC time. Such an executor could have a GC issue for some reason.
- TOTAL_DURATION: The executor with the largest total task time. Such an executor may have a busy neighbor or be CPU-throttled by the K8s controller.

### Does this PR introduce _any_ user-facing change?

Yes, but this is a new feature.

### How was this patch tested?

Pass the CIs.
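The four policies amount to different orderings over per-executor statistics. A minimal sketch in plain Scala (hypothetical names, not the actual Spark-on-K8s classes) of how a chooser might pick the one executor to decommission:

```scala
// Hypothetical per-executor summary; the fields mirror the four policies above.
final case class ExecSummary(id: Int, addTime: Long, totalGcTime: Long, totalDuration: Long)

// Pick the executor to decommission under each policy name.
def choose(policy: String, execs: Seq[ExecSummary]): ExecSummary = policy match {
  case "ID"             => execs.minBy(_.id)            // smallest executor ID
  case "ADD_TIME"       => execs.minBy(_.addTime)       // earliest-added, i.e. longest-lived
  case "TOTAL_GC_TIME"  => execs.maxBy(_.totalGcTime)   // heaviest GC
  case "TOTAL_DURATION" => execs.maxBy(_.totalDuration) // busiest total task time
}

val execs = Seq(
  ExecSummary(1, addTime = 100L, totalGcTime = 9L, totalDuration = 50L),
  ExecSummary(2, addTime =  40L, totalGcTime = 2L, totalDuration = 80L)
)
```

Note how the same fleet yields different victims under different policies: ID and TOTAL_GC_TIME select executor 1 here, while ADD_TIME and TOTAL_DURATION select executor 2.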