opt: fix another floating point precision error in statisticsBuilder #38730

rytaft · 2019-07-08T15:36:00Z

This commit fixes a floating point precision error in the
statisticsBuilder code for estimating the distinct count of an
unconstrained column in a SELECT or JOIN expression.

Prior to this commit, the code was estimating that the probability
of a row being filtered out was 1-selectivity. If the selectivity is
very small, however, this results in probability=1. This commit changes
the logic so now we set the probability equal to 0.9999999 if it would
otherwise be equal to 1. This ensures that the estimated distinct count
is always greater than 0 if the row count is greater than 0.

Fixes #38375

Release note: None
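To make the failure mode concrete, here is a minimal Go sketch. The function name `naiveDistinct` and the sample inputs are assumptions chosen for illustration; only the formula `d - d*math.Pow(1-selectivity, n/d)` is taken from the code quoted later in this review.

```go
package main

import (
	"fmt"
	"math"
)

// naiveDistinct applies the distinct-count formula with no safeguard.
// d is the input distinct count, n the input row count.
// (Hypothetical helper, not the actual statisticsBuilder code.)
func naiveDistinct(d, n, selectivity float64) float64 {
	return d - d*math.Pow(1-selectivity, n/d)
}

func main() {
	selectivity := 1e-20

	// 1e-20 is far below the float64 spacing around 1 (about 1.1e-16),
	// so the subtraction rounds back to exactly 1.
	fmt.Println(1-selectivity == 1) // prints true

	// With p == 1 the formula collapses to d - d*1.
	fmt.Println(naiveDistinct(100, 1000, selectivity)) // prints 0
}
```

The estimate comes out as exactly 0 even though the selectivity, and therefore the estimated row count, is non-zero.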

cockroach-teamcity · 2019-07-08T15:36:08Z

This change is Reviewable.

justinj

If this problem is happening often, maybe it would make sense to have a SetDistinctCount which includes this kind of safeguard automatically? I guess it's pretty situational, though.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)

rytaft · 2019-07-08T15:49:58Z

TFTR! Yeah, when @RaduBerinde and I discussed adding this assertion a while ago, we decided that each operator would have a better idea about how to set the distinct count than a global function. It is a bit annoying to keep getting these bugs, but I think it's helped fix a bunch of issues. If they don't disappear before the release, I can add a SetDistinctCount function so customers don't run into this.

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)

pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):

		// 1 to avoid setting the distinct count to 0 (since the row count is
		// non-zero).
		p = 0.9999999

Would it make more sense to come up with a reasonable "final value" for DistinctCount instead of passing this value through the formula below?

rytaft · 2019-07-08T16:10:05Z

pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):

Previously, RaduBerinde wrote…

Would it make more sense to come up with a reasonable "final value" for DistinctCount instead of passing this value through the formula below?

Not sure what that value would be... do you have any ideas? This approach at least seemed likely to produce something in the correct range. If it's larger than the row count it will get truncated to the row count in finalizeFromRowCount.

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):

Previously, rytaft wrote…

Not sure what that value would be... do you have any ideas? This approach at least seemed likely to produce something in the correct range. If it's larger than the row count it will get truncated to the row count in finalizeFromRowCount.

If this case is logically a case where selectivity is effectively 0, then DistinctCount should be effectively 0, so it would be set to a very small value (I think we used 1e-7 in other places).

By the way, should the check be if p > 0.9999999? It seems conceivable that we might get a value under 1 but very, very close, and it would get treated as 1 below. Also, it seems odd that selectivity=0 would get a bigger value than selectivity=1e-8.
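The ordering concern can be seen in a small sketch of the r1 approach. This is a reconstruction from the PR description, not the actual r1 code; the name `r1Distinct` and the sample inputs are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// r1Distinct mimics the r1 idea: replace p with 0.9999999 only when
// 1-selectivity is exactly 1, then apply the usual formula.
// (Assumed shape, for illustration only.)
func r1Distinct(d, n, selectivity float64) float64 {
	p := 1 - selectivity
	if p == 1 {
		p = 0.9999999
	}
	return d - d*math.Pow(p, n/d)
}

func main() {
	// selectivity=0 hits the p == 1 branch and is treated like 1e-7.
	zero := r1Distinct(100, 1000, 0)
	// selectivity=1e-8 misses the branch and keeps p = 0.99999999.
	tiny := r1Distinct(100, 1000, 1e-8)

	fmt.Printf("selectivity=0:    %g\n", zero)
	fmt.Printf("selectivity=1e-8: %g\n", tiny)
	fmt.Println("zero > tiny:", zero > tiny)
}
```

Under these assumptions the estimate for selectivity=0 is larger than the estimate for selectivity=1e-8, which is the non-monotonic behavior pointed out above.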

This commit fixes a floating point precision error in the statisticsBuilder code for estimating the distinct count of an unconstrained column in a SELECT or JOIN expression.

Prior to this commit, the code was estimating that the probability of a row being filtered out was 1-selectivity. If the selectivity is very small, however, this results in probability=1 and estimated distinct count=0. This commit changes the logic so now we set the distinct count equal to 1e-10 if it would otherwise be equal to 0.

Fixes cockroachdb#38375

Release note: None

rytaft · 2019-07-09T17:27:17Z

pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):

Previously, RaduBerinde wrote…

If this case is logically a case where selectivity is effectively 0, then DistinctCount should be effectively 0, so it would be set to a very small value (I think we used 1e-7 in other places).

By the way, should the check be if p > 0.9999999? It seems conceivable that we might get a value under 1 but very, very close, and it would get treated as 1 below. Also, it seems odd that selectivity=0 would get a bigger value than selectivity=1e-8.

Yea that makes sense (I decided to directly check distinct count instead, but same idea). I also changed the other spot to check < epsilon and for good measure made them both 1e-10.
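A minimal sketch of checking the distinct count directly, assuming the comparison is `< epsilon` and the floor is applied after the formula. The function name `distinctWithFloor` and the zero-input guard are assumptions for illustration, not the merged code.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon is the floor applied to the estimate (1e-10 per the discussion).
const epsilon = 1e-10

// distinctWithFloor applies the formula and then clamps the result from
// below, so the estimate cannot reach 0 for non-empty input.
// (Hypothetical helper; the guard for empty input is an assumption.)
func distinctWithFloor(d, n, selectivity float64) float64 {
	if d == 0 || n == 0 {
		return 0
	}
	est := d - d*math.Pow(1-selectivity, n/d)
	if est < epsilon {
		est = epsilon
	}
	return est
}

func main() {
	fmt.Println(distinctWithFloor(100, 1000, 1e-20)) // clamped to epsilon
	fmt.Println(distinctWithFloor(100, 1000, 0.5))   // unaffected by the floor
}
```

Because the clamp acts on the result rather than on p, a selectivity of 0 can no longer produce a larger estimate than a small positive selectivity.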

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

pkg/sql/opt/props/statistics.go, line 197 at r1 (raw file):

Previously, rytaft wrote…

Yea that makes sense (I decided to directly check distinct count instead, but same idea). I also changed the other spot to check < epsilon and for good measure made them both 1e-10.

Nice, I like it.

pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file):

	// when d << n.
	c.DistinctCount = d - d*math.Pow(1-selectivity, n/d)
	const epsilon = 1e-10

[nit] this could be reused across this file, e.g. distinctCountEpsilon

rytaft · 2019-07-09T18:26:56Z

pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] this could be reused across this file, e.g. distinctCountEpsilon

The other case is for selectivity, not distinct count. How about StatsEpsilon (needs to be exported since these are different packages)?

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file):

Previously, rytaft wrote…

The other case is for selectivity, not distinct count. How about StatsEpsilon (needs to be exported since these are different packages)?

Ah, right. Never mind then, they can stay separate.

rytaft

TFTR!

bors r+

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)

pkg/sql/opt/props/statistics.go, line 192 at r2 (raw file):

Previously, RaduBerinde wrote…

Ah, right. Never mind then, they can stay separate.

Sounds good.

38730: opt: fix another floating point precision error in statisticsBuilder r=rytaft a=rytaft

This commit fixes a floating point precision error in the statisticsBuilder code for estimating the distinct count of an unconstrained column in a `SELECT` or `JOIN` expression.

Prior to this commit, the code was estimating that the probability of a row being filtered out was 1-selectivity. If the selectivity is very small, however, this results in probability=1. This commit changes the logic so now we set the probability equal to 0.9999999 if it would otherwise be equal to 1. This ensures that the estimated distinct count is always greater than 0 if the row count is greater than 0.

Fixes #38375

Release note: None

Co-authored-by: Rebecca Taft <[email protected]>

craig · 2019-07-09T19:15:11Z

Build succeeded

GitHub CI (Cockroach)


38795: opt: update joinSelectivityFromNullCounts to match selectivityFromNullCounts r=rytaft a=rytaft

A previous PR (#38730) updated the logic in `selectivityFromNullCounts` to compare the result of floating point arithmetic with a small constant `epsilon`. For consistency, this commit adds similar logic in `joinSelectivityFromNullCounts`.

Release note: None

Co-authored-by: Rebecca Taft <[email protected]>

rytaft requested review from justinj and RaduBerinde July 8, 2019 15:36

rytaft requested a review from a team as a code owner July 8, 2019 15:36

justinj reviewed Jul 8, 2019

RaduBerinde reviewed Jul 8, 2019

rytaft force-pushed the zero-distinct branch from c14c049 to 2c1e193 Compare July 9, 2019 17:24

RaduBerinde approved these changes Jul 9, 2019

RaduBerinde reviewed Jul 9, 2019

rytaft commented Jul 9, 2019

craig bot merged commit 2c1e193 into cockroachdb:master Jul 9, 2019

rytaft mentioned this pull request Jul 10, 2019

opt: update joinSelectivityFromNullCounts to match selectivityFromNullCounts #38795

Merged

knz mentioned this pull request Nov 10, 2019

User-facing changes in 19.2 that were not picked up in release notes cockroachdb/docs#5819

Closed
