-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refine GpuHashAggregateExec.setupReference #2917
Conversation
build |
Could you add to the description for this PR around what exactly was changed and why? Please also note this PR is in flight: #2859, and also makes changes to |
Signed-off-by: sperlingxx <[email protected]>
46559ff
to
3dcce50
Compare
Signed-off-by: sperlingxx <[email protected]>
build |
I added the description. And I rebased this branch onto latest main branch. |
Signed-off-by: sperlingxx <[email protected]>
build |
@revans2 yeap, I am looking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the added details @sperlingxx. I am mostly curious on the update expressions, as we used to use them before in the PartialMerge + Partial case https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/sql-plugin/src/main/scala/com/nvidia/spark/rapids/aggregate.scala#L679. I think the comments need to be adjusted but it would be nice to make sure that you found that input projections is all that is needed in that case (perhaps the shapes match?)
Tagging @kuhushukla and @nartal1 as they may be more familiar with both distinct and pivot.
Had other minor nits.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/aggregate.scala
Outdated
Show resolved
Hide resolved
// Pick merge non-distinct for PartialMerge | ||
val mergeExpressionsNonDistinct = | ||
nonDistinctAggExpressions | ||
// - PartialMerge with Partial mode: we use the inputProjections or distinct update expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments mention distinct update expressions
, but I am not finding the use of update expressions in your change, only input projections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this part along with comments, in order to verify and clarify "input projections is all that is needed in that case".
// - Final or PartialMerge-only mode: we pick the columns in the order as handed to us. | ||
// - Partial or Complete mode: we use the inputProjections or distinct update expressions. | ||
val boundInputReferences = | ||
if (modeInfo.hasPartialMerge && modeInfo.uniqueModes.contains(Partial)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (modeInfo.hasPartialMerge && modeInfo.uniqueModes.contains(Partial)) { | |
if (modeInfo.hasPartialMerge && modeInfo.hasPartialMode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bonus points => hasPartialMerge
=> hasPartialMergeMode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used modeInfo.uniqueModes.contains(Partial)
instead of modeInfo.hasPartialMode
here, because hasPartialMode is set as hasPartialMerge || uniqueModes.contains(Partial)
. I adjusted AggregateModeInfo
to make its members more consistent with their names.
Signed-off-by: sperlingxx <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a couple of nits, but they are minor. Approving ahead of time. Thanks for making the tweaks.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/aggregate.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/aggregate.scala
Outdated
Show resolved
Hide resolved
Co-authored-by: Alessandro Bellina <[email protected]>
Co-authored-by: Alessandro Bellina <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: sperlingxx [email protected]
Current PR refined
GpuHashAggregateExec.setupReference
to supportTypedImperativeAggregate
, which is required by #2916 . The primary work of current PR is reorganizing the code ofboundInputReferences
.Originally, we workaround
PartialMerge
modes when we constructboundInputReferences
. To be specific, the code piece dealing with aggregations which contain bothPartialMerge
andPartial
mode is not generic enough. It regardsGpuPivotFirst
andGpuAverage
as exceptional cases:With the new implementation, we avoid all kinds of workaround. Instead, we fit all kinds of aggregation modes into three categories by the stage of AggregateExec in aggregation stack:
The first stage of aggregation stack, in terms of aggregation modes, it may contain aggregate expressions with
Partial
orComplete
mode. It may also contain no aggregate expressions. For the first aggregation stage, the input projections are necessary, because it consume the outputs of non-Aggregate plans.The final-like stages, including three conditions:
nonDistinctAggAttributes
(the second stage of aggregation stack with one distinct)For final-like stages, we just pass through all output attributes of child plan.
The third stage of aggregation stack with one distinct, in terms of aggregation modes, it contains both
Partial
mode (for distinctAgg) andPartialMerge
mode (for nonDistinctAgg). For this stage, we need to switch the position of distinctAttributes and nonDistinctAttributes to match the output schema of the previous stage: