Implement optimized AQE support so that exchanges run on GPU where possible #462

Merged on Aug 13, 2020 (28 commits)

@andygrove (Contributor) commented Jul 29, 2020

This PR implements optimized AQE support for Spark 3.0.1 and 3.1.0, where shuffle and broadcast exchanges stay on the GPU where supported.
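For anyone trying this out, the settings involved can be sketched as below. This is a minimal illustration, not code from this PR: `spark.sql.adaptive.enabled` is Spark's standard AQE switch and `spark.plugins` is the usual way to load the RAPIDS Accelerator, but the `AqeGpuSettings` object and its method are hypothetical names, and the sketch assumes the plugin jars are already on the classpath.

```scala
// Hypothetical helper (not part of this PR): gathers the settings an
// AQE-on-GPU run would typically need and renders them as spark-submit flags.
object AqeGpuSettings {
  // spark.plugins loads the RAPIDS Accelerator (assumes its jars are on the classpath).
  // spark.sql.adaptive.enabled turns AQE on; it is off by default in Spark 3.0.x.
  val settings: Seq[(String, String)] = Seq(
    "spark.plugins" -> "com.nvidia.spark.SQLPlugin",
    "spark.sql.adaptive.enabled" -> "true"
  )

  // Render each setting as a "--conf key=value" pair for spark-submit.
  def toSubmitArgs: Seq[String] =
    settings.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```

Passing `AqeGpuSettings.toSubmitArgs` to `spark-submit` would enable both the plugin and AQE for that application.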

@andygrove added the performance label ("A performance related task/issue") on Jul 29, 2020
@andygrove added this to the Jul 20 - Jul 31 milestone on Jul 29, 2020
@andygrove
Contributor Author

build

@andygrove changed the title from "[WIP] Rebasing AQE work on SPARK-32332 and shim layer" to "[WIP] Implement optimized AQE support so that exchanges run on GPU where possible" on Jul 31, 2020
@andygrove force-pushed the adaptive-query-SPARK-32332 branch from 9a3b53f to 85a709b on July 31, 2020 16:41
@andygrove
Contributor Author

build

@andygrove changed the title from "[WIP] Implement optimized AQE support so that exchanges run on GPU where possible" to "Implement optimized AQE support so that exchanges run on GPU where possible" on Jul 31, 2020
Signed-off-by: Andy Grove <[email protected]>
@andygrove
Contributor Author

build

@andygrove
Contributor Author

This PR also closes #492

}

if (!canThisBeReplaced) {
  buildSide.willNotWorkOnGpu("the BroadcastHashJoin this feeds is not on the GPU")
Collaborator

also here.
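
For context, the tagging pattern in the excerpt above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: `PlanTag` and `BroadcastTagging` are hypothetical stand-ins for the plugin's plan metadata classes, and only the `willNotWorkOnGpu` and `canThisBeReplaced` names and the reason string are taken from the diff.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the plugin's plan metadata: it records the
// reasons why a plan node has to stay on the CPU.
class PlanTag(val name: String) {
  private val reasons = mutable.LinkedHashSet.empty[String]

  // Mark this node as CPU-only and remember why.
  def willNotWorkOnGpu(because: String): Unit = reasons += because

  // A node can move to the GPU only if nothing has tagged it.
  def canThisBeReplaced: Boolean = reasons.isEmpty

  def cpuReasons: Seq[String] = reasons.toSeq
}

object BroadcastTagging {
  // If the join falls back to the CPU, tag the broadcast exchange feeding its
  // build side as CPU-only too, so both ends of the broadcast agree.
  def tagBuildSide(join: PlanTag, buildSide: PlanTag): Unit = {
    if (!join.canThisBeReplaced) {
      buildSide.willNotWorkOnGpu("the BroadcastHashJoin this feeds is not on the GPU")
    }
  }
}
```

In this sketch, tagging a join with any reason and then calling `BroadcastTagging.tagBuildSide(join, exchange)` leaves the exchange tagged as well, while an untagged join leaves the exchange free to run on the GPU.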

@abellina (Collaborator) commented Aug 8, 2020

I’m not seeing anything big stand out, just a couple of nits above. Thanks @andygrove

Signed-off-by: Andy Grove <[email protected]>
@andygrove
Contributor Author

build

@andygrove
Contributor Author

build

Signed-off-by: Andy Grove <[email protected]>
@andygrove
Contributor Author

build

@andygrove
Contributor Author

@tgravescs @abellina I believe all issues are addressed. I have also enabled integration tests for TPC-H Q2 with AQE now that the GpuFilter fix has been merged.
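
As a rough illustration of running the same test with AQE on and off, a helper along these lines could be used. This is only a sketch: `AqeTestMatrix` and `withAqeOnAndOff` are hypothetical names, not the test harness in this repository, and the only real setting assumed is Spark's `spark.sql.adaptive.enabled`.

```scala
// Hypothetical test helper (not the repository's harness): runs a test body
// once with AQE enabled and once with it disabled.
object AqeTestMatrix {
  // Spark's AQE switch.
  val AqeKey = "spark.sql.adaptive.enabled"

  // Calls `body` with the base configuration plus AQE on, then AQE off,
  // and returns both results in that order so they can be compared.
  def withAqeOnAndOff[T](baseConf: Map[String, String])(
      body: Map[String, String] => T): Seq[T] =
    Seq("true", "false").map(flag => body(baseConf + (AqeKey -> flag)))
}
```

A query test would pass its Spark settings as `baseConf` and compare the two results, which should match if AQE does not change query output.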

@andygrove
Contributor Author

build

@tgravescs (Collaborator) left a comment

I don't think we actually need all the classes with ShuffleManagerShimBase, as we could just have a function in Spark300Shims, but I think it's fine: they are actively messing with Shuffle, so something else is bound to change there.

@andygrove
Contributor Author

build

@tgravescs (Collaborator) left a comment

looks good pending Jenkins

@andygrove dismissed abellina’s stale review on August 13, 2020 21:42

Changes were addressed.

@andygrove merged commit b86fd32 into NVIDIA:branch-0.2 on Aug 13, 2020
@andygrove deleted the adaptive-query-SPARK-32332 branch on August 13, 2020 21:44
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…ssible (NVIDIA#462)

* Implement optimized support for AQE

Signed-off-by: Andy Grove <[email protected]>

* revert scalastyle config change

Signed-off-by: Andy Grove <[email protected]>

* prep for review

Signed-off-by: Andy Grove <[email protected]>

* remove temp debug println

Signed-off-by: Andy Grove <[email protected]>

* enable AQE tests

Signed-off-by: Andy Grove <[email protected]>

* fix regression with 3.1.0

Signed-off-by: Andy Grove <[email protected]>

* address some formatting issues from PR review

Signed-off-by: Andy Grove <[email protected]>

* address some formatting issues from PR review

Signed-off-by: Andy Grove <[email protected]>

* resolve TODO comment and more formatting fixes

Signed-off-by: Andy Grove <[email protected]>

* remove code duplication

Signed-off-by: Andy Grove <[email protected]>

* remove blank line

Signed-off-by: Andy Grove <[email protected]>

* address some style feedback

Signed-off-by: Andy Grove <[email protected]>

* fix odd indenting in one file

Signed-off-by: Andy Grove <[email protected]>

* updated configs doc

Signed-off-by: Andy Grove <[email protected]>

* revert format changes to optimizeGpuPlanTransitions

Signed-off-by: Andy Grove <[email protected]>

* run python tpch tests with aqe on and off

Signed-off-by: Andy Grove <[email protected]>

* enable tpch query 2 test in scala and python

Signed-off-by: Andy Grove <[email protected]>

* Revert "enable tpch query 2 test in scala and python"

This reverts commit bcd9783.

Signed-off-by: Andy Grove <[email protected]>

* fix indent

Signed-off-by: Andy Grove <[email protected]>

* enable AQE testing for TPC-H query 2

Signed-off-by: Andy Grove <[email protected]>

* fix error in python test

Signed-off-by: Andy Grove <[email protected]>

* rename two source files and remove a blank line
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <[email protected]>