Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add code for generating dot file visualizations #2449

Merged
merged 11 commits into from
May 26, 2021

Conversation

andygrove
Copy link
Contributor

@andygrove andygrove commented May 19, 2021

Signed-off-by: Andy Grove [email protected]

This PR adds the code for generating query plan visualizations.

This is the code that was originally part of the integration test module and integrated into the benchmarks. There is also a copy of this code in our internal benchmarks repo, and the plan now is to move it back into the open-source repo as part of the profiling and qualification tooling.

@andygrove andygrove requested review from nartal1 and tgravescs May 19, 2021 13:38
@andygrove andygrove marked this pull request as ready for review May 19, 2021 13:38
Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this would be pretty hard to test as far as comparing to the sql ui to make sure we are getting same thing?

* @param plan First query plan and metrics
* @param comparisonPlan Optional second query plan and metrics
* @param filename Filename to write dot graph to
* @param includeCodegen Include WholeStageCodegen and InputAdapter nodes, if true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we not want these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this would be pretty hard to test as far as comparing to the sql ui to make sure we are getting same thing?

For testing, we could generate dot file from a plan that we read from an event log, and then confirm that we see some expected nodes. I think comparing to Spark UI functionality would be tricky.

Copy link
Contributor Author

@andygrove andygrove May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we not want these?

The original version excluded them, then I made it optional. I don't have a strong opinion on whether it is worth maintaining this option. It would probably have been better just to have a subgraph for the WholestageCodegen part, like Spark does. I could look at this as a follow on PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so for my understanding, if this is true it would include all nodes in the wholestagecodegen, but if its false it just has 1 box saying wholestagecodegen without any details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, all of the operators are always included. If this option is true then we also show separate nodes for InputAdapter and WholestageCodegen. If false, we remove those nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this some more and I do think it would make sense to remove this option now. We want to see the duration metric for WholestageCodeGen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit to remove the option

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to remove java doc for includeCodegen

@nartal1
Copy link
Collaborator

nartal1 commented May 19, 2021

I am fine merging this in and integrate it later once we port other dependent changes.

Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

val metrics = metricNames.flatMap(name => {
val l = nodeNormalized.metrics.find(_.name == name)
val r = comparisonNodeNormalized.metrics.find(_.name == name)
if (l.isDefined && r.isDefined) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider alternatives:

  • pattern match
  • option.foreach or option.map
    to avoid the anti-pattern if (o.defined) o.get

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use pattern matching

}

/** Recursively graph the operator nodes in the spark plan */
def writeGraph(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider splitting writeGraph in smaller functions. It looks too long

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made one small improvement here but don't want to go too far because I can't actually test any of this code in this repo until other PRs are merged in.

Signed-off-by: Andy Grove <[email protected]>
* @param plan First query plan and metrics
* @param comparisonPlan Optional second query plan and metrics
* @param filename Filename to write dot graph to
* @param includeCodegen Include WholeStageCodegen and InputAdapter nodes, if true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to remove java doc for includeCodegen

Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, other then the copyright update

@tgravescs
Copy link
Collaborator

build

@andygrove andygrove merged commit c304962 into NVIDIA:branch-21.06 May 26, 2021
@andygrove andygrove deleted the generate-dot branch May 26, 2021 16:15
@sameerz sameerz added the task Work required that improves the product but is not user facing label May 27, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants