[FEA] Optimize unnecessary sorts when replacing SortAggregate #601
Labels
feature request
New feature or request
P1
Nice to have for release
performance
A performance related task/issue
Milestone
Is your feature request related to a problem? Please describe.
I noticed a query that had a GPU plan with this sequence of nodes somewhere in it:
GpuSort --> GpuHashAggregate --> GpuColumnarExchange --> GpuSort --> GpuHashAggregate
GpuHashAggregate makes no assumptions on the order of its input, so the two sorts aren't accomplishing anything useful. It looks like the original CPU plan used a sort-based aggregate and the plugin blindly replaced the sort with GpuSort and SortAggregate with HashAggregate. If the plugin is going to use HashAggregate to replace SortAggregate then it should remove the unnecessary sorts.
Describe the solution you'd like
The plugin removes unnecessary sorts in the plan when replacing a sort aggregate with a hash aggregate. This is similar to the behavior the plugin already performs when replacing a SortMergeJoin with a GpuShuffledHashJoin.
Note that a sort may be required after the aggregate if subsequent nodes in the plan need the output ordering guaranteed by SortAggregate. We should be able to leverage the same "insert sorts when needed" logic that is being performed when replacing SortMergeJoin with GpuShuffledHashJoin.
Describe alternatives you've considered
We could try to leverage libcudf's sort-based aggregate functionality, but we probably only want to leverage it when we know the data happens to already be sorted on the grouping keys before the aggregate. I don't think we want to explicitly sort on the keys, as libcudf will already do this if necessary when performing an aggregate that must be performed using sorted data.
The text was updated successfully, but these errors were encountered: