Compute whole column variance using numerically stable approach #16448
We use the pairwise approach of Chan, Golub, and LeVeque (1983).
Closes rapidsai#16444
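For readers unfamiliar with the pairwise approach, here is a minimal pure-Python sketch of the idea: each chunk of the column is summarised as a (count, mean, M2) triple, where M2 is the sum of squared deviations from the chunk mean, and two summaries are merged with the Chan, Golub, and LeVeque update. The function names `combine`, `summarize`, and `pairwise_variance` are hypothetical and chosen for illustration; this is not the CUDA code in this PR.

```python
import math


def combine(a, b):
    """Merge two (count, mean, M2) summaries of disjoint chunks."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The cross term accounts for the shift between the two chunk means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def summarize(values, lo, hi):
    """Recursively summarise values[lo:hi] by halving the range."""
    if hi - lo == 0:
        return (0, 0.0, 0.0)
    if hi - lo == 1:
        return (1, float(values[lo]), 0.0)
    mid = (lo + hi) // 2
    return combine(summarize(values, lo, mid), summarize(values, mid, hi))


def pairwise_variance(values, ddof=1):
    """Variance with `ddof` delta degrees of freedom; nan if undefined."""
    n, _, m2 = summarize(values, 0, len(values))
    if n - ddof <= 0:
        return math.nan
    return m2 / (n - ddof)
```

Because only differences of means are squared, the merge avoids the cancellation that affects the textbook "mean of squares minus square of mean" formula when the data sit far from zero, for example `pairwise_variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16])`.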
We now need to compare to a tolerance, which was probably the case before, except we were getting lucky.
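As an illustration of why a tolerance is needed, the sketch below computes the same sample variance with two different formulas and compares the results with `math.isclose` rather than `==`. The helper names and the `rel_tol` value are assumptions made for this example, not the tolerances used in the project's tests.

```python
import math


def two_pass_variance(values, ddof=1):
    """Subtract the mean first, then sum squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def textbook_variance(values, ddof=1):
    """Sum of squares minus n * mean^2; prone to cancellation."""
    n = len(values)
    mean = sum(values) / n
    return (sum(v * v for v in values) - n * mean * mean) / (n - ddof)


def variances_match(actual, expected, rel_tol=1e-9):
    """Tolerance-based comparison; rel_tol here is illustrative only."""
    return math.isclose(actual, expected, rel_tol=rel_tol)


data = [0.1, 0.2, 0.3, 0.4, 0.5]
same = variances_match(two_pass_variance(data), textbook_variance(data))
```

The two formulas are mathematically identical but round differently, so a test that asserts bitwise equality only passes when the rounding happens to line up.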
Back into draft while I figure out some edge cases around NAs.
The ddof argument is meaningless for mean reductions, so we should not consider it when determining whether the result will be valid.
Now that we don't launch a kernel for some std and variance reductions, we might just get NA rather than nan. Match old behaviour by producing a nan of the right type.
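The two commit messages above can be sketched as a small finishing step. This is a hedged illustration only: the names `denominator` and `finish`, the string labels for the reduction kind, and the exact condition `valid_count - ddof <= 0` are assumptions, and what an empty mean should return is not addressed here (the sketch returns nan for simplicity).

```python
import math


def denominator(kind, valid_count, ddof):
    """ddof only affects var/std; a mean always divides by the count."""
    if kind == "mean":
        return valid_count
    return valid_count - ddof


def finish(kind, valid_count, ddof, accumulated):
    """Turn an accumulated sum (mean) or M2 (var/std) into a result.

    When the denominator is not positive no computation is needed,
    and a floating-point nan is returned instead of a missing value.
    """
    d = denominator(kind, valid_count, ddof)
    if d <= 0:
        return math.nan
    value = accumulated / d
    return math.sqrt(value) if kind == "std" else value
```

For instance, `finish("mean", 1, 5, 3.0)` ignores `ddof` and yields a valid mean, while `finish("var", 1, 1, 0.0)` yields nan.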
One suggestion, otherwise LGTM. Thank you for the attention to details throughout this PR!
The computed result is now more accurate.
Re-approving latest changes.
For the Java side of things, these tests are there just to verify that we plumbed things up correctly and that we are calling into the C++/CUDA code correctly. Spark uses a totally different method to compute the population variance in a distributed way, so we don't actually call into this code.
258aee4 to 99ac7c7
/merge
Couple of very small things.