Update dot graph to include stages and remove some duplication #2761
Signed-off-by: Robert (Bobby) Evans <[email protected]>
build
Something odd happened with the previous CI job. It skipped everything and I don't know why.
@andygrove can you take a look?
LGTM. I tested locally with some TPC-DS queries and it worked well. It's great having the stage metrics on the diagram.
Overall this looks fine to me. Can we add more to the GenerateDotSuite to make sure the job/stage info is present?
@tgravescs I updated the test and also added some more docs to the README to explain how the stages work.
build
Looks good. Can you upmerge to fix the conflict?
build
This uses a modified version of the code that Apache Spark uses to generate its SQL visualization. I updated it to match the style that we are currently using.
This fixes #2711
This fixes #2712
It does not add job information. It could, but that is not critical.
The way it dedupes stages, etc. is not perfect, but it is as good as Apache Spark's, because it reuses that code.
This rips out the compare code, which was not being used. Next I plan to work on a stage-to-stage comparison so we can more easily see which stages are equivalent between the CPU and the GPU, especially in large queries. That way I can more quickly see which stages are showing speedups and which are not.