
Support legacy behavior of parameterless count #1958

Merged
merged 6 commits into NVIDIA:branch-0.5 on Mar 23, 2021

Conversation

razajafri
Collaborator

This PR returns a Long column with a single row containing the value 0.

closes #1737

Signed-off-by: Raza Jafri [email protected]

@razajafri
Collaborator Author

@abellina can you take a look since you originally worked on aggregates?

sameerz added the "task" label (Work required that improves the product but is not user facing) on Mar 18, 2021
sameerz added this to the Mar 15 - March 26 milestone on Mar 18, 2021
@razajafri
Collaborator Author

build

Signed-off-by: Raza Jafri <[email protected]>
@razajafri
Collaborator Author

I have added tests. I assumed count was already covered, but obviously the parameterless count wasn't; I should've known better.

@abellina have I answered your questions?
@revans2 PTAL

@razajafri
Collaborator Author

build

@razajafri
Collaborator Author

@abellina are you OK with this PR?

@abellina
Collaborator

@razajafri I just thought of an edge case: two count() aggregations back to back. Does your code handle this?

```scala
scala> spark.sql("select count(),count() from foo").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias('count(), None), unresolvedalias('count(), None)]
+- 'UnresolvedRelation [foo], [], false

== Analyzed Logical Plan ==
count(): bigint, count(): bigint
Aggregate [count() AS count()#255L, count() AS count()#256L]
+- SubqueryAlias foo
   +- SerializeFromObject [input[0, int, false] AS value#2]
      +- ExternalRDD [obj#1]

== Optimized Logical Plan ==
Aggregate [0 AS count()#255L, 0 AS count()#256L]
+- SerializeFromObject
   +- ExternalRDD [obj#1]

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[], output=[count()#255L, count()#256L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#983]
      +- HashAggregate(keys=[], functions=[], output=[])
         +- SerializeFromObject
            +- Scan[obj#1]

scala> spark.sql("select count(),count() from foo").collect
res32: Array[org.apache.spark.sql.Row] = Array([0,0])
```

Signed-off-by: Raza Jafri <[email protected]>
@abellina
Collaborator

build

@razajafri
Collaborator Author

@abellina PTAL

@abellina
Collaborator

@razajafri thanks for adding tests. I am still not clear on how this works.

My theory is that it works because there is a projection at the end that just sets 0s (i.e. the scalar you are generating is not used). A quick test would be to use something other than 0 for your scalar, to see if that makes a difference.

If the above is true, I don't think it changes the impl. It would just be good to fully understand how this is propagating to the result.

@razajafri
Collaborator Author

> My theory is that it works because there is a projection at the end that just sets 0s (i.e. the scalar you are generating is not used). A quick test would be to use something other than 0 for your scalar, to see if that makes a difference.

You are right, that's how it's working. I did a quick test using a scalar value of 2, and it had no effect on the result.
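The point above can be modeled with a small sketch (plain Python, not the actual plugin code; `gpu_parameterless_count` and its parameters are hypothetical names for illustration): whatever scalar the plugin generates for its single-row Long column, the optimized plan (`Aggregate [0 AS count()#255L, 0 AS count()#256L]`) projects the literal 0 for every `count()` output column, so the generated value never reaches the result.

```python
def gpu_parameterless_count(scalar_value, num_count_cols):
    """Toy model of the legacy parameterless count() path.

    The plugin side builds a single-row Long column from `scalar_value`,
    but Spark's optimized plan replaces every count() with the literal 0,
    so the final projection discards the generated column entirely.
    """
    plugin_column = [scalar_value]  # single-row "Long column" from the plugin
    _ = plugin_column               # never consulted by the final projection
    # Final projection: one row, literal 0 for each count() output column.
    return [tuple(0 for _ in range(num_count_cols))]

# The quick test described above: a scalar of 2 still yields all zeros.
print(gpu_parameterless_count(2, 2))  # [(0, 0)]
```

This also covers the back-to-back `count(),count()` edge case: both output columns come from the same literal-0 projection, independent of the plugin's scalar.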

abellina previously approved these changes Mar 23, 2021

@abellina left a comment

@razajafri thanks for the changes and for the testing. I think this makes sense and I can't think of a simpler way at this point.

Signed-off-by: Raza Jafri <[email protected]>
@razajafri
Collaborator Author

build

razajafri merged commit 097dc97 into NVIDIA:branch-0.5 on Mar 23, 2021
razajafri deleted the parameterless_count branch on March 23, 2021 at 22:40
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Return a Long col with 0 if agg is empty

Signed-off-by: Raza Jafri <[email protected]>

* addressed review comments

Signed-off-by: Raza Jafri <[email protected]>

* improved tests

Signed-off-by: Raza Jafri <[email protected]>

* added two counts

Signed-off-by: Raza Jafri <[email protected]>

* added comment

Signed-off-by: Raza Jafri <[email protected]>

Co-authored-by: Raza Jafri <[email protected]>
Successfully merging this pull request may close these issues:
Spark 3.1 now supports the legacy behavior of parameterless count