Add code for generating dot file visualizations #2449
rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala
I assume this would be pretty hard to test, as far as comparing against the SQL UI to make sure we are getting the same thing?
* @param plan First query plan and metrics
* @param comparisonPlan Optional second query plan and metrics
* @param filename Filename to write dot graph to
* @param includeCodegen Include WholeStageCodegen and InputAdapter nodes, if true
why would we not want these?
> I assume this would be pretty hard to test as far as comparing to the sql ui to make sure we are getting same thing?

For testing, we could generate a dot file from a plan that we read from an event log, and then confirm that we see some expected nodes. I think comparing to the Spark UI functionality would be tricky.
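
As a rough illustration of that testing idea, here is a minimal sketch that renders a tiny plan to DOT text and checks for expected nodes. `PlanNode`, `toDot`, and the node naming scheme are all hypothetical stand-ins, not the classes or output format used in GenerateDot.scala, and the plan is built by hand rather than read from an event log.

```scala
object DotTestSketch {
  // Hypothetical, simplified plan node; not the plan type used by the PR.
  case class PlanNode(id: Int, name: String, children: Seq[PlanNode] = Seq.empty)

  /** Renders the plan as DOT text: one node statement per operator and
    * one edge per child/parent pair (child -> parent, following data flow).
    * Operator names are assumed to contain no double quotes. */
  def toDot(root: PlanNode): String = {
    val sb = new StringBuilder("digraph G {\n")
    def visit(n: PlanNode): Unit = {
      sb.append(s"""  node${n.id} [shape=box, label="${n.name}"];""").append('\n')
      n.children.foreach { c =>
        visit(c)
        sb.append(s"  node${c.id} -> node${n.id};\n")
      }
    }
    visit(root)
    sb.append("}\n")
    sb.toString
  }
}
```

A test along these lines would assert on substrings of the generated text (for example, that a `label="Project"` node and the expected edge are present) rather than trying to match the Spark UI rendering.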
> why would we not want these?

The original version excluded them, then I made it optional. I don't have a strong opinion on whether it is worth maintaining this option. It would probably have been better just to have a subgraph for the WholeStageCodegen part, like Spark does. I could look at this in a follow-on PR.
So, for my understanding: if this is true it would include all nodes in the WholeStageCodegen, but if it's false it just has one box saying WholeStageCodegen without any details?
No, all of the operators are always included. If this option is true, then we also show separate nodes for InputAdapter and WholeStageCodegen. If false, we remove those nodes.
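
To make the behavior described above concrete, here is a small sketch of how wrapper nodes could be spliced out of a plan tree while every real operator is kept. `PlanNode`, `isWrapper`, and `normalize` are hypothetical names for illustration only, and the name matching is an assumption about how the wrapper nodes are labeled; this is not the code in GenerateDot.scala.

```scala
object CodegenFilterSketch {
  // Hypothetical, simplified plan node; not the plan type used by the PR.
  case class PlanNode(name: String, children: Seq[PlanNode] = Seq.empty)

  // Assumption: wrapper nodes can be recognized by name alone.
  private def isWrapper(name: String): Boolean =
    name.startsWith("WholeStageCodegen") || name == "InputAdapter"

  /** Returns the plan with wrapper nodes removed when includeCodegen is false.
    * A removed wrapper is replaced by its (already normalized) children,
    * so no operator below it is lost. */
  def normalize(node: PlanNode, includeCodegen: Boolean): Seq[PlanNode] = {
    val kids = node.children.flatMap(normalize(_, includeCodegen))
    if (!includeCodegen && isWrapper(node.name)) kids
    else Seq(node.copy(children = kids))
  }
}
```

With `includeCodegen = true` the tree comes back unchanged; with `false`, a chain such as WholeStageCodegen, Project, Filter, InputAdapter, Scan collapses to Project, Filter, Scan.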
I looked at this some more and I do think it would make sense to remove this option now. We want to see the duration metric for WholeStageCodegen.
I pushed a commit to remove the option
Need to remove the Javadoc for includeCodegen.
I am fine with merging this in and integrating it later, once we port the other dependent changes.
minor comments
val metrics = metricNames.flatMap(name => {
  val l = nodeNormalized.metrics.find(_.name == name)
  val r = comparisonNodeNormalized.metrics.find(_.name == name)
  if (l.isDefined && r.isDefined) {
consider alternatives:
- pattern match
- option.foreach or option.map
to avoid the anti-pattern if (o.isDefined) o.get
Updated to use pattern matching
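
For readers following this thread, the pattern-matching alternative might look roughly like the sketch below. `Metric` and `pairMetrics` are hypothetical names introduced for illustration; the actual updated code in the PR is not shown in this conversation.

```scala
object MetricCompareSketch {
  // Hypothetical metric type; the PR's metric class is not shown here.
  case class Metric(name: String, value: Long)

  /** For each metric name, returns (name, leftValue, rightValue) when the
    * metric exists on both sides; names missing from either side are skipped. */
  def pairMetrics(
      names: Seq[String],
      left: Seq[Metric],
      right: Seq[Metric]): Seq[(String, Long, Long)] =
    names.flatMap { name =>
      // Match on both lookups at once instead of isDefined followed by get.
      (left.find(_.name == name), right.find(_.name == name)) match {
        case (Some(l), Some(r)) => Some((name, l.value, r.value))
        case _ => None
      }
    }
}
```

A for-comprehension over the two `Option` values would be an equivalent way to express the same thing.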
rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDot.scala
}

/** Recursively graph the operator nodes in the spark plan */
def writeGraph(
Consider splitting writeGraph into smaller functions. It looks too long.
I made one small improvement here but don't want to go too far because I can't actually test any of this code in this repo until other PRs are merged in.
Signed-off-by: Andy Grove <[email protected]>
rapids-4-spark-tools/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileMain.scala
...s-4-spark-tools/src/test/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDotSuite.scala
Signed-off-by: Andy Grove <[email protected]>
Looks good, other than the copyright update.
build
Signed-off-by: Andy Grove [email protected]
This PR adds the code for generating query plan visualizations.
This is the code that was originally part of the integration test module and integrated into the benchmarks. There is also a copy of this code in our internal benchmarks repo, and the plan now is to move it back into the open-source repo as part of the profiling and qualification tooling.