Support GpuCollectList and GpuCollectSet as TypedImperativeAggregate #2971
Conversation
Signed-off-by: sperlingxx <[email protected]>
build
Signed-off-by: sperlingxx <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
build
#2916 was postponed to the 21.10 release. Given we're in burndown, I think this should be retargeted to branch-21.10 when that is available (hopefully soon).
pre-merge for 21.10 is ready
Signed-off-by: sperlingxx <[email protected]>
build
Re-targeted to the new branch.
Signed-off-by: sperlingxx <[email protected]>
build
@sperlingxx first pass through your changes.
* Base class for metadata around `SortAggregateExec` and `ObjectHashAggregateExec`, which may
* contain TypedImperativeAggregate functions in aggregate expressions.
*/
abstract class GpuNoHashAggregateMeta[INPUT <: SparkPlan](
So this is called `GpuNoHashAggregateMeta`, but `ObjectHashAggregateExec` inherits from it. I think we should come up with a different name for `GpuNoHashAggregateMeta`, but I can understand why you chose this.
Because `GpuSortAggregateMeta` also inherits from it. When `spark.sql.execution.useObjectHashAggregateExec` is set to false, Spark Catalyst will plan a `SortAggregateExec` instead of an `ObjectHashAggregateExec` for an Aggregate (logical plan) with TypedImperativeAggregate functions.
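A minimal illustration of that planner behavior, assuming an active SparkSession `spark` (the column names are arbitrary):

```python
import pyspark.sql.functions as f

# With ObjectHashAggregateExec disabled, Catalyst plans a SortAggregateExec
# for TypedImperativeAggregate functions such as collect_list.
spark.conf.set('spark.sql.execution.useObjectHashAggregateExec', 'false')

df = spark.range(100).selectExpr('id % 10 as a', 'id as b')
df.groupBy('a').agg(f.collect_list('b')).explain()
# The physical plan should now show SortAggregate instead of ObjectHashAggregate.
```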
I also think we can come up with a better name, even a long name like `GpuTypeImperativeSupportedAggregateExecMeta`.
val column = result.getColumn(i)
val rapidsType = GpuColumnVector.getRapidsType(dataTypes(i))
// extra type conversion check for nested types
if ((rapidsType.equals(DType.LIST) || rapidsType.equals(DType.STRUCT)) &&
Why not call `typeConversionAllowed` for all columns (i.e., no need to special-case LIST and STRUCT)?
For non-nested types, type casting may happen here, such as casting INT to LONG, so the `typeConversionAllowed` check may fail even when the conversion is intended. For nested types, no type conversion is necessary (or available), which means the check is safe. What's more, `GpuColumnVector.getRapidsType` doesn't match any child types, so we check whether they match via `typeConversionAllowed`.
I would prefer to see us handle things in terms of DataType instead of DType. `getRapidsType` is something we removed when we started to work on nested types, because it loses a lot of information and can easily be misused.
As a side note, are we seeing issues with this? Are we collecting a list/struct and the types are not correct?
….scala Co-authored-by: Alessandro Bellina <[email protected]>
Co-authored-by: Alessandro Bellina <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
build
Signed-off-by: sperlingxx <[email protected]>
build
Signed-off-by: sperlingxx <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
build
Thanks @sperlingxx. I am sorry for the delay, I'll take a look again today.
This mostly looks good. My main problem with this is:
In addition, Aggregate stacks with TypedImperativeAggregate functions may lead to unexpected crashes if the stack partially falls back to CPU, because the GPU data types are inconsistent with their CPU counterparts. This problem will be fixed in task 4 of #2916. To avoid this kind of unexpected crash for now, this PR introduces the "associated fallback" mechanism, which only affects Aggregate plans containing TypedImperativeAggregate functions.
We cannot have CollectList and CollectSet on by default if there are chances that we can crash.
@@ -486,16 +486,32 @@ private static DType toRapidsOrNull(DataType type) {
} else {
return DecimalUtil.createCudfDecimal(dt.precision(), dt.scale());
}
} else if (supportNestedType) {
Why do we need this? We removed nested types because a DType.LIST is missing a lot of information, and if we are not careful with this type of API it can cause bugs.
I agree. So, I reverted this change.
"As a side note, are we seeing issues with this? Are we collecting a list/struct and the types are not correct?"
For CollectList and CollectSet, I believe the types are always correct. But I am not sure whether some aggregations which we plan to support in the future might produce inconsistent nested types.
# Queries with multiple distinct aggregations will fall back to CPU if they also contain
# collect aggregations, because the Spark optimizer will insert expressions like `If` and
# `First` when rewriting distinct aggregates, while `GpuIf` and `GpuFirst` don't support
# the data type of collect aggregations (ArrayType).
Can you add in references to the issues to support these for GpuIf and GpuFirst? If they do not exist, then could you please file them?
Yes, I filed the issue.
count(distinct b),
count(distinct c)
from tbl group by a"""
assert_gpu_and_cpu_are_equal_sql(
Typically when we have a fallback test, we want an assertion that verifies part of the code actually did fall back, like `assert_gpu_sql_fallback_collect`.
I added the check for fallback capture.
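For reference, a minimal sketch of such a fallback check, modeled on the helpers used elsewhere in this PR; the test name is hypothetical and `data_gen` is assumed to be parametrized as in the surrounding tests:

```python
import pyspark.sql.functions as f
from asserts import assert_gpu_fallback_collect
from data_gen import gen_df

def test_collect_with_multi_distinct_fallback(data_gen):
    # Mixing a collect aggregation with multiple distinct aggregations should
    # fall back, so verify that ObjectHashAggregateExec actually ran on the
    # CPU; the helper also checks that the CPU and GPU results are equal.
    assert_gpu_fallback_collect(
        lambda spark: gen_df(spark, data_gen, length=100)
            .groupby('a')
            .agg(f.sort_array(f.collect_list('b')),
                 f.countDistinct('b'),
                 f.countDistinct('c')),
        cpu_fallback_class_name='ObjectHashAggregateExec')
```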
@pytest.mark.parametrize('conf', [_nans_float_conf_partial, _nans_float_conf_final], ids=idfn)
@pytest.mark.parametrize('aqe_enabled', ['true', 'false'], ids=idfn)
def test_hash_groupby_collect_partial_replace_fallback(data_gen, conf, aqe_enabled):
    conf.update({'spark.sql.adaptive.enabled': aqe_enabled})
Please always copy `conf` before doing an update. We have seen issues with global values being modified by tests doing this, and it is just good practice.
Fixed.
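A minimal sketch of the suggested pattern, using a plain dict copy (`_nans_float_conf_partial`, `idfn`, and `data_gen` come from the test module under review):

```python
import pytest

@pytest.mark.parametrize('conf', [_nans_float_conf_partial, _nans_float_conf_final], ids=idfn)
@pytest.mark.parametrize('aqe_enabled', ['true', 'false'], ids=idfn)
def test_hash_groupby_collect_partial_replace_fallback(data_gen, conf, aqe_enabled):
    # Copy the shared conf dict before updating it, so the module-level
    # configuration is not mutated across test runs.
    local_conf = dict(conf)
    local_conf.update({'spark.sql.adaptive.enabled': aqe_enabled})
```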
Yes, and I believe we can get rid of the potential crashes with "associated fallback".
Signed-off-by: sperlingxx <[email protected]>
build
.groupby('a')
.agg(f.sort_array(f.collect_list('b')), f.sort_array(f.collect_set('b'))),
conf=local_conf)
assert_gpu_fallback_collect(
This also verifies that the CPU and the GPU results are equal, so you don't need both parts.
Refined.
cpu_fallback_class_name='ObjectHashAggregateExec',
conf=local_conf)
# test with single Distinct
assert_gpu_and_cpu_are_equal_collect(
Here too: if the tests are slightly different, then let's have a different test function for each test case.
Refined.
val stageMetas = mutable.ListBuffer[GpuBaseAggregateMeta[_]]()
// Go through all Aggregate stages to check whether all stages are GPU supported. If not,
// we fall back all GPU-supported stages to CPU.
if (recursiveCheckForFallback(meta, logicalPlan, stageMetas)) {
So, just to be sure that I understand this correctly: when AQE is not enabled, we go through and see if we can fall back or not, and if any one of them fell back, then we mark all of them as needing to fall back. Is that correct? What about when AQE is enabled and the first aggregation (the partial one) may have already executed? We can mark it to fall back to the CPU, but that will do nothing because it has already executed. How do we handle that case?
To adapt to AQE, I took advantage of `gpuSupportedTag`, which was introduced by @andygrove. I added the line `wrapped.getTagValue(gpuSupportedTag).foreach(_.foreach(willNotWorkOnGpu))` in `GpuTypedImperativeSupportedAggregateExecMeta.tagPlanForGpu` to retrieve the information about GPU support which was captured and cached during `GpuQueryStagePrepOverrides`.
I understand, but does that run over the entire plan at some point, or just sections of the plan? If it is the entire plan, it would be good to explain that in a comment, because otherwise it looks like we have cases where we can crash.
I added some comments on this section.
meta: GpuTypedImperativeSupportedAggregateExecMeta[_]): Unit = {
  // We only run the check for final stages which contain TypedImperativeAggregate.
  val needToCheck = meta.agg.aggregateExpressions.exists(e =>
    (e.mode == Final || e.mode == Complete) &&
FYI, `Complete` means that the entire aggregation is happening in one pass, so there should be no need to check for a corresponding first part of the aggregation, because there should be none. This only shows up on Databricks right now, so it is not super simple to test.
I removed `Complete`.
We should also test this on Databricks if you have not done so already.
I tested it on Databricks (https://blossom.nvidia.com/sw-gpu-spark-jenkins/view/Testing/job/lc-db/19/execution/node/60/log/). Everything looks fine.
Signed-off-by: sperlingxx <[email protected]>
build
It looks good. I mostly want to be sure that we have run the tests on Databricks. And it would be nice to have some of the comments in the fallback code updated to explain how it works with AQE, so it is simpler to follow.
assert_gpu_and_cpu_are_equal_collect(
lambda spark: gen_df(spark, data_gen, length=100)
.groupby('a')
.agg(f.sort_array(f.collect_list('b')), f.count('b')),
It seems every `collect_*` function is wrapped in a `sort_array`. Is that on purpose? Could a comment be added somewhere on why? Especially because we have `@ignore_order`, so I was curious.
It is because `@ignore_order` only ensures the order between rows, while in these cases we also need to take care of the ordering within each array produced by the collect ops. I added this comment to the test file.
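A small illustration of the distinction, assuming a DataFrame `df` with columns `a` and `b`:

```python
import pyspark.sql.functions as f

# @ignore_order sorts the output *rows* before comparing CPU and GPU results,
# but collect_list builds each array in a nondeterministic order. Sorting the
# array itself makes the per-row comparison deterministic as well.
df.groupby('a').agg(f.sort_array(f.collect_list('b')))
```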
Makes sense. I'd extend `ignore_order` to do the sorting after the collect for the array case; that way you would be able to test the aggregate in another way a user is likely to invoke it.
I'm OK if you want to do that as a follow-up also.
I would be fine with an `@ignore_array_order` or something like that. I'd rather not have `@ignore_order` cover both.
But let's do it as a follow-on issue, if we do it at all.
At this point I only had the comment on the tests; otherwise it is looking good so far.
Signed-off-by: sperlingxx <[email protected]>
build
Signed-off-by: sperlingxx <[email protected]>
This PR supports GpuCollectList and GpuCollectSet as TypedImperativeAggregate, which is task 3 of #2916. In this PR, we also introduce TypedImperativeAggExprMeta and GpuNoHashAggregateMeta to provide general support for TypedImperativeAggregate functions.
In addition, Aggregate stacks with TypedImperativeAggregate functions may lead to unexpected crashes if the stack partially falls back to CPU, because the GPU data types are inconsistent with their CPU counterparts. This problem will be fixed in task 4 of #2916. To avoid this kind of unexpected crash for now, this PR introduces the "associated fallback" mechanism, which only affects Aggregate plans containing TypedImperativeAggregate functions.
The "associated fallback" falls back all stages of an Aggregate (logical plan) to CPU once we need to fall back any stage of the plan. The "associated fallback" will be triggered on each final stage of Aggregate which contains TypedImperativeAggregate functions. It traverses the plan tree to collect all stages of current Aggregate (logical plan), and to determine whether to fallback them entirely or not. In addition, the "associated fallback" also works when AQE is on.