[BUG] Incorrect output from averages with filters in partial only mode #155
Comments
This is a tough one given the current design. Pushing the filter down to the aggregate function, e.g. avg(), doesn't work. If we want to special-case average in the GpuAggregateExpressions wrapper, we would have to either assume a few things or reach into the original aggregate function to check whether it is an average. I'd appreciate any comments on whether disabling is the way forward or whether special-casing would be OK.
Could you add a little more info on what is hard about it? Also, what is the issue with the special-case solution? I am a little confused about how having a partial CPU hash agg would impact the filter.
Sure. I tried an approach where I forced the knowledge that we have a filter, or basically something that can cause nulls, down to
Describe the bug
When a filter is applied to an average and the conf is set to run only the partial aggregation mode on the GPU, the result can differ from the CPU result.
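For readers unfamiliar with how an average is split into partial and final stages, here is a minimal pure-Python sketch of the semantics a filtered average is expected to keep. It is not the plugin's code and does not claim to identify the root cause. The names `partial_avg`, `final_avg`, and `keep` are hypothetical, and representing the partial state as a (sum, count) pair is an assumption made for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical partial state for an average: (sum of kept values, number of kept values).
Partial = Tuple[float, int]


def partial_avg(rows: Iterable[Optional[float]],
                keep: Callable[[float], bool]) -> Partial:
    """Partial stage: rows that are null or rejected by the filter contribute nothing."""
    total = 0.0
    count = 0
    for value in rows:
        if value is None or not keep(value):
            continue
        total += value
        count += 1
    return total, count


def final_avg(partials: Iterable[Partial]) -> Optional[float]:
    """Final stage: merge the partial states, then divide once."""
    total = 0.0
    count = 0
    for part_sum, part_count in partials:
        total += part_sum
        count += part_count
    # Only when no row at all passed the filter is the average null.
    return total / count if count else None


# Three illustrative partitions; the filter rejects every row of the second one.
partitions: List[List[Optional[float]]] = [[1, 2, None], [10, 20], [3]]
partials = [partial_avg(p, lambda v: v < 10) for p in partitions]
result = final_avg(partials)
```

In this sketch a partition whose rows are all filtered out still produces a valid partial state, so it cannot turn the merged result into None on its own. Whichever side runs the partial stage, the final stage has to receive state that follows the same convention.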
Steps/Code to reproduce bug
Adding the following test to hash_aggregate_test.py shows the failure behaviour.
Expected behavior
cpu result = 266969243.5
gpu result = None