Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix stddev indeterministically producing NAN #13248

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

eejbyfeldt
Copy link
Contributor

@eejbyfeldt eejbyfeldt commented Nov 4, 2024

Which issue does this PR close?

Closes #13247

Rationale for this change

In the VarianceGroupAccumulator we were missing a count == 0 check
that is present in the normal Accumulator. This mostly does not matter
except for the case where the first state to be merged has count == 0
then the merge function will calculate a new m2 of NAN which will
propagate to the final result.

What changes are included in this PR?

This fixes the bug bu adding the missing count == 0 check.

Are these changes tested?

New and existing tests.

Are there any user-facing changes?

Correctness issue is fixed.

@eejbyfeldt eejbyfeldt force-pushed the i-13247-variance-correctness-bug-nan branch from 82219b5 to 2580bcc Compare November 4, 2024 13:49
In the VarianceGroupAccumulator we were missing a `count == 0` check
that is present in the normal Accumulator. This mostly does not matter
except for the case where the first state to be merged has `count == 0`
then the `merge` function will incorrectly calculate a new m2 of NAN
which will propagate to the final result.

This fixes the bug bu adding the missing `count == 0` check.
@eejbyfeldt eejbyfeldt force-pushed the i-13247-variance-correctness-bug-nan branch from 2580bcc to 3265483 Compare November 4, 2024 13:52
@eejbyfeldt eejbyfeldt marked this pull request as ready for review November 4, 2024 13:52
@@ -316,6 +316,7 @@ fn merge(
mean2: f64,
m22: f64,
) -> (u64, f64, f64) {
debug_assert!(count != 0 || count2 != 0, "Cannot merge two empty states");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't exactly know how it works. Is supporting this not needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a precondition of using this function. So support is not needed here, but expected to be handled by callers. When both counts are 0 then also mean and m2 is 0 and the correct result would also be all 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i should have checked the callers. thanks. 👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @eejbyfeldt and @findepi

@alamb alamb merged commit 48b8c13 into apache:main Nov 5, 2024
24 checks passed
@eejbyfeldt eejbyfeldt deleted the i-13247-variance-correctness-bug-nan branch November 5, 2024 20:47
@alamb alamb mentioned this pull request Nov 5, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

variance/stddev sometimes produces incorrect results
3 participants