-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: fix another floating point precision error in statisticsBuilder #38730
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, if this problem is happening often maybe it would make sense to have a
SetDistinctCount
which includes this kind of safeguard automatically? I guess it's pretty situational, though
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
TFTR! Yea, when @RaduBerinde and I discussed adding this assertion a while ago we decided that each operator would have a better idea about how to set the distinct count than a global function. It is a bit annoying to keep getting these bugs, but I think it's helped fix a bunch of issues. If they don't disappear before the release I can a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):
// 1 to avoid setting the distinct count to 0 (since the row count is // non-zero). p = 0.9999999
Would it make more sense to come up with a reasonable "final value" for DistinctCount
instead of passing this value through the formula below?
pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file): Previously, RaduBerinde wrote…
Not sure what that value would be... do you have any ideas? This approach at least seemed likely to produce something in the correct range. If it's larger than the row count it will get truncated to the row count in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):
Previously, rytaft wrote…
Not sure what that value would be... do you have any ideas? This approach at least seemed likely to produce something in the correct range. If it's larger than the row count it will get truncated to the row count in
finalizeFromRowCount
.
If this case is logically a case where selectivity is effectively 0, then DistinctCount
should be effectively 0, so it would be set to a very small value (I think we used 1e-7
in other places).
By the way, should the check be if p > 0.9999999
? It seems conceivable that we might get a value under 1 but very very close, and it would get treated as 1
below. Also, seems odd that selectivity=0 would get a bigger value than selectivity=1e-8
This commit fixes a floating point precision error in the statisticsBuilder code for estimating the distinct count of an unconstrained column in a SELECT or JOIN expression. Prior to this commit, the code was estimating that the probability of a row being filtered out was 1-selectivity. If the selectivity is very small, however, this results in probability=1 and estimated distinct count=0. This commit changes the logic so now we set the distinct count equal to 1e-10 if it would otherwise be equal to 0. Fixes cockroachdb#38375 Release note: None
pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file): Previously, RaduBerinde wrote…
Yea that makes sense (I decided to directly check distinct count instead, but same idea). I also changed the other spot to check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):
Previously, rytaft wrote…
Yea that makes sense (I decided to directly check distinct count instead, but same idea). I also changed the other spot to check
< epsilon
and for good measure made them both1e-10
.
Nice, I like it.
pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file):
// when d << n. c.DistinctCount = d - d*math.Pow(1-selectivity, n/d) const epsilon = 1e-10
[nit] this could be reused across this file, e.g. distinctCountEpsilon
pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file): Previously, RaduBerinde wrote…
The other case is for selectivity, not distinct count. How about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file):
Previously, rytaft wrote…
The other case is for selectivity, not distinct count. How about
StatsEpsilon
(needs to be exported since these are different packages)?
Ah, right. Never mind then, they can stay separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file):
Previously, RaduBerinde wrote…
Ah, right. Never mind then, they can stay separate.
Sound good.
38730: opt: fix another floating point precision error in statisticsBuilder r=rytaft a=rytaft This commit fixes a floating point precision error in the statisticsBuilder code for estimating the distinct count of an unconstrained column in a `SELECT` or `JOIN` expression. Prior to this commit, the code was estimating that the probability of a row being filtered out was 1-selectivity. If the selectivity is very small, however, this results in probability=1. This commit changes the logic so now we set the probability equal to 0.9999999 if it would otherwise be equal to 1. This ensures that the estimated distinct count is always greater than 0 if the row count is greater than 0. Fixes #38375 Release note: None Co-authored-by: Rebecca Taft <[email protected]>
Build succeeded |
…lCounts A previous PR (cockroachdb#38730) updated the logic in selectivityFromNullCounts to compare the result of floating point arithmetic with a small constant epsilon. For consistency, this commit adds similar logic in joinSelectivityFromNullCounts. Release note: None
38795: opt: update joinSelectivityFromNullCounts to match selectivityFromNullCounts r=rytaft a=rytaft A previous PR (#38730) updated the logic in `selectivityFromNullCounts` to compare the result of floating point arithmetic with a small constant `epsilon`. For consistency, this commit adds similar logic in `joinSelectivityFromNullCounts`. Release note: None Co-authored-by: Rebecca Taft <[email protected]>
This commit fixes a floating point precision error in the
statisticsBuilder code for estimating the distinct count of an
unconstrained column in a
SELECT
orJOIN
expression.Prior to this commit, the code was estimating that the probability
of a row being filtered out was 1-selectivity. If the selectivity is
very small, however, this results in probability=1. This commit changes
the logic so now we set the probability equal to 0.9999999 if it would
otherwise be equal to 1. This ensures that the estimated distinct count
is always greater than 0 if the row count is greater than 0.
Fixes #38375
Release note: None