-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Fix incorrect output from averages with filters in partial only mode #612
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One smallish comment, otherwise lgtm.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
Addressed review comments @abellina and additionally took out a test that was redundant after this fix. Please take a look. |
build |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
build |
Waiting for approvals after which, I will rebase-squash and sign off. |
Signed-off-by: Kuhu Shukla <[email protected]>
a89891c
to
f291564
Compare
Squashed and signed off, re-runing CI |
build |
…DIA#612) Signed-off-by: Kuhu Shukla <[email protected]> Co-authored-by: Kuhu Shukla <[email protected]>
…DIA#612) Signed-off-by: Kuhu Shukla <[email protected]> Co-authored-by: Kuhu Shukla <[email protected]>
…IDIA#612) Signed-off-by: spark-rapids automation <[email protected]> Signed-off-by: spark-rapids automation <[email protected]>
Fixes #155 .
This is WIP for a couple reasons and also as I want to get more comments on the approach shown in the PR.
As the associated issue explains in some detail, how we are ending up with nulls being sent down to the CPU's final aggregation. The fix tries to circumvent this by special casing how averages are handled by the case-when of the filter. One thing to note is, we need this special case since we don't have a clean way to pass the filter down to the
GpuAverage
DeclarativeAggregate itself. When anull
comes into the filter we want it not pass down any further but instead default to(0.0, 0)
. This seems to be specific for averages while other aggregates likecount
require nulls. I had to use another case-when withisnotnull
to make this happen. I have some concerns on the else condition added in this PR and would like to know what others think is a better way to do this. I tried some games with having initialValues be reused in the filter it was not straight forward what do to with their data types. At the end, I am also open to use this PR to morph the solution to put the hammer on filters+avg+partial_only_conf to fall back on the CPU, but that is not my first preference. Tagging @abellina for some discussion to get this PR going.