[FEA] Optimize unnecessary sorts when replacing SortAggregate #601

jlowe · 2020-08-21T14:55:59Z

Is your feature request related to a problem? Please describe.
I noticed a query that had a GPU plan with this sequence of nodes somewhere in it:
GpuSort --> GpuHashAggregate --> GpuColumnarExchange --> GpuSort --> GpuHashAggregate

GpuHashAggregate makes no assumptions on the order of its input, so the two sorts aren't accomplishing anything useful. It looks like the original CPU plan used a sort-based aggregate and the plugin blindly replaced the sort with GpuSort and SortAggregate with HashAggregate. If the plugin is going to use HashAggregate to replace SortAggregate then it should remove the unnecessary sorts.

Describe the solution you'd like
The plugin removes unnecessary sorts in the plan when replacing a sort aggregate with a hash aggregate. This is similar to the behavior the plugin already performs when replacing a SortMergeJoin with a GpuShuffledHashJoin.

Note that a sort may be required after the aggregate if subsequent nodes in the plan need the output ordering guaranteed by SortAggregate. We should be able to leverage the same "insert sorts when needed" logic that is being performed when replacing SortMergeJoin with GpuShuffledHashJoin.

Describe alternatives you've considered
We could try to leverage libcudf's sort-based aggregate functionality, but we probably only want to leverage it when we know the data happens to already be sorted on the grouping keys before the aggregate. I don't think we want to explicitly sort on the keys, as libcudf will already do this if necessary when performing an aggregate that must be performed using sorted data.

…IDIA#601) Signed-off-by: spark-rapids automation <[email protected]> Signed-off-by: spark-rapids automation <[email protected]>

jlowe added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Aug 21, 2020

sameerz added P1 Nice to have for release and removed ? - Needs Triage Need team to review and classify labels Aug 25, 2020

revans2 self-assigned this Sep 8, 2020

revans2 added this to the Aug 31 - Sep 11 milestone Sep 8, 2020

revans2 mentioned this issue Sep 8, 2020

Remove sorts when replacing sort aggregate if possible #686

Merged

revans2 closed this as completed in #686 Sep 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Optimize unnecessary sorts when replacing SortAggregate #601

[FEA] Optimize unnecessary sorts when replacing SortAggregate #601

jlowe commented Aug 21, 2020

[FEA] Optimize unnecessary sorts when replacing SortAggregate #601

[FEA] Optimize unnecessary sorts when replacing SortAggregate #601

Comments

jlowe commented Aug 21, 2020