Add code for generating dot file visualizations #2449

andygrove · 2021-05-19T13:38:02Z

Signed-off-by: Andy Grove [email protected]

This PR adds the code for generating query plan visualizations.

This is the code that was originally part of the integration test module and integrated into the benchmarks. There is also a copy of this code in our internal benchmarks repo, and the plan now is to move it back into the open-source repo as part of the profiling and qualification tooling.
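The general technique can be sketched as a recursive walk over the plan tree that emits one Graphviz node per operator and one edge per parent/child link. This is a minimal illustration, not the code in GenerateDot.scala. The PlanNode case class and the toDot function are hypothetical stand-ins. The real tool works from plan and metric data read out of Spark event logs.

```scala
object DotSketch {
  // Hypothetical, simplified stand-in for a Spark plan node (not a Spark API)
  final case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

  // Wrap a label in double quotes, escaping any embedded quotes for DOT
  private def quote(s: String): String = "\"" + s.replace("\"", "\\\"") + "\""

  /** Render a plan tree as a Graphviz digraph with one box per operator. */
  def toDot(root: PlanNode): String = {
    val sb = new StringBuilder("digraph G {\n")
    var nextId = 0

    // Declare this node, recurse into its children, and link each child to
    // its parent. Returns the id assigned to `node`.
    def visit(node: PlanNode): Int = {
      val id = nextId
      nextId += 1
      sb.append(s"  node$id [shape=box, label=${quote(node.name)}];\n")
      node.children.foreach { child =>
        val childId = visit(child)
        sb.append(s"  node$childId -> node$id;\n")
      }
      id
    }

    visit(root)
    sb.append("}\n").toString
  }
}
```

Edges point from child to parent so that arrows follow the direction in which rows flow. Reversing them is purely a presentation choice. The resulting text can be rendered with the Graphviz CLI, for example dot -Tpng plan.dot -o plan.png.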

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala

tgravescs

I assume this would be pretty hard to test, as far as comparing to the SQL UI to make sure we are getting the same thing?

tgravescs · 2021-05-19T14:07:19Z

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala

+   * @param plan First query plan and metrics
+   * @param comparisonPlan Optional second query plan and metrics
+   * @param filename Filename to write dot graph to
+   * @param includeCodegen Include WholeStageCodegen and InputAdapter nodes, if true


why would we not want these?

I assume this would be pretty hard to test, as far as comparing to the SQL UI to make sure we are getting the same thing?

For testing, we could generate a dot file from a plan that we read from an event log and then confirm that we see some expected nodes. I think comparing against the Spark UI functionality would be tricky.
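The event-log check described above could work along these lines: extract the label of every node from the generated DOT text and assert that the expected operators are present. This is a hedged sketch. DotAssertions and its methods are hypothetical helpers, and reading the plan from an event log is not shown. Matching by substring is an assumption, made so that labels that also carry metric text still match an operator name.

```scala
object DotAssertions {
  // Matches label="..." attributes, allowing escaped quotes inside the value
  private val LabelPattern = """label="((?:[^"\\]|\\.)*)"""".r

  /** Collect every label value in a DOT document, unescaping \" sequences. */
  def labels(dot: String): Set[String] =
    LabelPattern.findAllMatchIn(dot).map(_.group(1).replace("\\\"", "\"")).toSet

  /** Expected operator names that do not appear in any node label. */
  def missingOperators(dot: String, expected: Set[String]): Set[String] = {
    val found = labels(dot)
    expected.filterNot(name => found.exists(_.contains(name)))
  }
}
```

A test would then assert that missingOperators(dot, Set(...)) is empty for the operators known to be in the logged query.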

why would we not want these?

The original version excluded them; then I made it optional. I don't have a strong opinion on whether this option is worth maintaining. It would probably have been better just to have a subgraph for the WholeStageCodegen part, like Spark does. I could look at this in a follow-on PR.
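A Graphviz cluster is one way to get the Spark-style grouping mentioned here. In the dot layout, a subgraph whose name starts with cluster is drawn inside its own box. The sketch below uses a hypothetical PlanNode type. It opens a cluster for each WholeStageCodegen node instead of drawing that node as a box, and it ends the cluster at InputAdapter, treating that operator's children as part of a different stage. The names and the stage-boundary rule are assumptions, not what this PR or Spark implements.

```scala
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

object ClusterSketch {
  // Hypothetical, simplified stand-in for a Spark plan node (not a Spark API)
  final case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

  private def quote(s: String): String = "\"" + s.replace("\"", "\\\"") + "\""

  def toDot(root: PlanNode): String = {
    val clusterLabels = ArrayBuffer.empty[String]
    val clusterNodes = mutable.Map.empty[Int, ArrayBuffer[String]]
    val topLevelNodes = ArrayBuffer.empty[String]
    val edges = ArrayBuffer.empty[String]
    var nextId = 0

    // Returns the id of the DOT node that the parent's edge should start from.
    def visit(node: PlanNode, cluster: Option[Int]): Int =
      if (node.name.startsWith("WholeStageCodegen")) {
        // Open a new cluster rather than drawing the wrapper as a box.
        // Assumes the wrapper has exactly one child, as in Spark plans.
        clusterLabels += node.name
        visit(node.children.head, Some(clusterLabels.size - 1))
      } else {
        val id = nextId
        nextId += 1
        val decl = s"node$id [shape=box, label=${quote(node.name)}];"
        cluster match {
          case Some(c) => clusterNodes.getOrElseUpdate(c, ArrayBuffer.empty[String]) += decl
          case None    => topLevelNodes += decl
        }
        // Operators below an InputAdapter belong to a different stage.
        val childCluster = if (node.name == "InputAdapter") None else cluster
        node.children.foreach { child =>
          edges += s"node${visit(child, childCluster)} -> node$id;"
        }
        id
      }

    visit(root, None)
    val sb = new StringBuilder("digraph G {\n")
    topLevelNodes.foreach(decl => sb.append(s"  $decl\n"))
    clusterLabels.zipWithIndex.foreach { case (label, c) =>
      sb.append(s"  subgraph cluster_$c {\n    label=${quote(label)};\n")
      clusterNodes.getOrElse(c, ArrayBuffer.empty[String]).foreach(decl => sb.append(s"    $decl\n"))
      sb.append("  }\n")
    }
    edges.foreach(edge => sb.append(s"  $edge\n"))
    sb.append("}\n").toString
  }
}
```

Edges are collected separately and written after all node and cluster declarations. As a result, an edge that crosses a stage boundary does not pull a node into the wrong cluster.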

So, to check my understanding: if this is true, it would include all nodes in the WholeStageCodegen, but if it's false, it just has one box saying WholeStageCodegen without any details?

No, all of the operators are always included. If this option is true, we also show separate nodes for InputAdapter and WholeStageCodegen. If it is false, we remove those nodes.
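The effect of setting the option to false can be sketched as a tree rewrite. It drops WholeStageCodegen and InputAdapter nodes and splices their children into the parent, so every real operator is kept. CodegenFilter, PlanNode, and stripWrappers are hypothetical names. Matching WholeStageCodegen by prefix is an assumption, based on the rendered name possibly carrying a codegen stage id.

```scala
object CodegenFilter {
  // Hypothetical, simplified stand-in for a Spark plan node (not a Spark API)
  final case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

  // Wrapper operators hidden when the option is false. Matched by prefix
  // because the rendered name may carry a stage id, e.g. "WholeStageCodegen (1)".
  private val wrapperPrefixes = Seq("WholeStageCodegen", "InputAdapter")

  private def isWrapper(node: PlanNode): Boolean =
    wrapperPrefixes.exists(prefix => node.name.startsWith(prefix))

  /**
   * Drop wrapper nodes and splice their children into the parent. Returns a
   * Seq because a wrapper at the root is replaced by its children.
   */
  def stripWrappers(node: PlanNode): Seq[PlanNode] = {
    val keptChildren = node.children.flatMap(stripWrappers)
    if (isWrapper(node)) keptChildren
    else Seq(node.copy(children = keptChildren))
  }
}
```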

I looked at this some more, and I do think it would make sense to remove this option now. We want to see the duration metric for WholeStageCodegen.

I pushed a commit to remove the option.

Need to remove the Javadoc for includeCodegen.

nartal1 · 2021-05-19T15:56:00Z

I am fine merging this in and integrating it later once we port other dependent changes.

gerashegalov

minor comments

gerashegalov · 2021-05-20T02:36:27Z

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala

+        val metrics = metricNames.flatMap(name => {
+          val l = nodeNormalized.metrics.find(_.name == name)
+          val r = comparisonNodeNormalized.metrics.find(_.name == name)
+          if (l.isDefined && r.isDefined) {


Consider alternatives that avoid the anti-pattern if (o.isDefined) o.get:

- pattern matching
- option.foreach or option.map

Updated to use pattern matching
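Applied to the metric lookup in the diff above, the pattern-matching version could look like this. It is a sketch. The Metric case class, the compare name, and the "name: left / right" output format are assumptions, because the diff does not show what the original code produces for a matched pair.

```scala
object MetricComparison {
  // Hypothetical metric type; the real code reads metrics from the event log
  final case class Metric(name: String, value: Long)

  /**
   * For each metric name present in both plans, format the two values side
   * by side. Names missing from either plan are skipped.
   */
  def compare(
      metricNames: Seq[String],
      left: Seq[Metric],
      right: Seq[Metric]): Seq[String] =
    metricNames.flatMap { name =>
      (left.find(_.name == name), right.find(_.name == name)) match {
        case (Some(l), Some(r)) => Some(s"$name: ${l.value} / ${r.value}")
        case _                  => None
      }
    }
}
```

The option.map route suggested in the review is equivalent. A for comprehension over the two Options yields Some only when both lookups succeed.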

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala

gerashegalov · 2021-05-20T02:43:33Z

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala

+    }
+
+    /** Recursively graph the operator nodes in the spark plan */
+    def writeGraph(


Consider splitting writeGraph into smaller functions; it looks too long.

I made one small improvement here, but I don't want to go too far, because I can't actually test any of this code in this repo until other PRs are merged.
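One way to act on this suggestion is to move label formatting and edge output into small pure functions. The traversal would then return lines instead of writing to a file, which makes each piece testable on its own. All names here (SplitWriteGraph, label, nodeLine, edgeLine, graphLines) are hypothetical. The metric formatting is an assumption, not the PR's actual output.

```scala
object SplitWriteGraph {
  // Hypothetical, simplified types; not the classes used in the PR
  final case class Metric(name: String, value: Long)
  final case class PlanNode(
      name: String,
      metrics: Seq[Metric] = Nil,
      children: Seq[PlanNode] = Nil)

  private def quote(s: String): String = "\"" + s.replace("\"", "\\\"") + "\""

  /** Operator name followed by one line per metric (DOT's \n line break). */
  def label(node: PlanNode): String =
    (node.name +: node.metrics.map(m => s"${m.name}: ${m.value}")).mkString("\\n")

  def nodeLine(id: Int, node: PlanNode): String =
    s"  node$id [shape=box, label=${quote(label(node))}];"

  def edgeLine(from: Int, to: Int): String = s"  node$from -> node$to;"

  /** DOT lines for a subtree plus the next unused id; no I/O, no mutation. */
  def graphLines(node: PlanNode, id: Int): (Vector[String], Int) =
    node.children.foldLeft((Vector(nodeLine(id, node)), id + 1)) {
      case ((lines, childId), child) =>
        val (childLines, nextFree) = graphLines(child, childId)
        (lines ++ childLines :+ edgeLine(childId, id), nextFree)
    }

  /** Assemble the complete graph; the caller decides where to write it. */
  def writeGraph(root: PlanNode): String =
    (Vector("digraph G {") ++ graphLines(root, 0)._1 :+ "}").mkString("\n")
}
```

Threading the next free id through foldLeft replaces a shared mutable counter, so graphLines can be unit tested on any subtree.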

Signed-off-by: Andy Grove <[email protected]>

tgravescs · 2021-05-25T17:49:47Z

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala

+   * @param plan First query plan and metrics
+   * @param comparisonPlan Optional second query plan and metrics
+   * @param filename Filename to write dot graph to
+   * @param includeCodegen Include WholeStageCodegen and InputAdapter nodes, if true


Need to remove the Javadoc for includeCodegen.

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileMain.scala

...s-4-spark-tools/src/test/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDotSuite.scala

Signed-off-by: Andy Grove <[email protected]>

rapids-4-spark-tools/src/test/resources/log4j.properties

tgravescs

Looks good, other than the copyright update.

tgravescs · 2021-05-26T14:06:07Z

build

andygrove requested review from nartal1 and tgravescs May 19, 2021 13:38

andygrove marked this pull request as ready for review May 19, 2021 13:38

tgravescs reviewed May 19, 2021

rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala

tgravescs reviewed May 19, 2021

gerashegalov reviewed May 20, 2021

andygrove added 5 commits May 25, 2021 10:21

Add code for generating dot file visualizations

a599db8

Signed-off-by: Andy Grove <[email protected]>

Fix formatting

b5a74f7

Signed-off-by: Andy Grove <[email protected]>

Add copyright header

cde4286

Signed-off-by: Andy Grove <[email protected]>

Remove includeCodeGen option and address other PR feedback

5dee53e

Signed-off-by: Andy Grove <[email protected]>

Integrate GenerateDot and add integration test

cd675b5

Signed-off-by: Andy Grove <[email protected]>

andygrove force-pushed the generate-dot branch from 5fe4ea2 to cd675b5 May 25, 2021 17:35

Fix mainClass

4f308f3

Signed-off-by: Andy Grove <[email protected]>

andygrove requested review from GaryShen2008, jlowe, NvTimLiu and revans2 as code owners May 25, 2021 17:46

tgravescs reviewed May 25, 2021

andygrove added 3 commits May 25, 2021 12:09

fix scalastyle issues

9558ff7

mvn verify passes

2b7626a

Signed-off-by: Andy Grove <[email protected]>

Remove docs for includeCodegen option

df74d04

nartal1 reviewed May 26, 2021

rapids-4-spark-tools/src/test/resources/log4j.properties

tgravescs reviewed May 26, 2021

andygrove added 2 commits May 26, 2021 08:03

Update copyright year

3d9ed6c

Update copyright year

fa5acf8

tgravescs approved these changes May 26, 2021

nartal1 approved these changes May 26, 2021

andygrove merged commit c304962 into NVIDIA:branch-21.06 May 26, 2021

andygrove deleted the generate-dot branch May 26, 2021 16:15

sameerz added the task label May 27, 2021

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Add code for generating dot file visualizations (NVIDIA#2449)

ed10c45

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Add code for generating dot file visualizations (NVIDIA#2449)

b197333
