-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/stats: don't use linear regression with NaN for stats forecasting #113712
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let Michael review this, but I wonder whether we want to revert 120132e given that we now have a proper fix?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job finding this btw!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I wonder whether we want to revert 120132e given that we now have a proper fix?
Good point. I've reverted it, since fixMalformed
shouldn't return an error now.
I also expanded the invalid
check to include +/-Inf
values.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gah, I was checking for these in toQuantileValue
and fromQuantileValue
but it never occurred to me to check in between! Great catch!
It's possible this fixes #93892 as well, if we have an invalid quantile that makes it through fixMalformed without panicking. I'll think about that more.
@DrewKimball WDYT about adding another commit that adds panic-catching to GetTableStats
?
cockroach/pkg/sql/stats/stats_cache.go
Lines 215 to 223 in 722f902
func (sc *TableStatisticsCache) GetTableStats( | |
ctx context.Context, table catalog.TableDescriptor, | |
) ([]*TableStatistic, error) { | |
if !statsUsageAllowed(table, sc.settings) { | |
return nil, nil | |
} | |
forecast := forecastAllowed(table, sc.settings) | |
return sc.getTableStatsFromCache(ctx, table.GetID(), &forecast) | |
} |
If we panic while getting stats, I think we should try to plan the query without stats, rather than cause the query (all queries touching that table) to fail.
Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/stats/quantile.go
line 658 at r2 (raw file):
// This function fixes the negative slope pieces by "moving" p from the // overlapping places to where it should be. No p is lost in the making of these // calculations.
If you wouldn't mind, could you add a sentence to this comment mentioning that isInvalid
must be checked first?
This patch adds a method to the quantile function, `invalid`, which checks for `NaN` and `+/-Inf` values within the function. These indicate a failure to derive a linear regression model for the observed histograms, and can lead to internal errors and panics if not caught. Stats forecasting now checks this method before attempting to use a quantile function to derive a histogram prediction. This patch also reverts 120132e, since `fixMalformed` is no longer expected to return an error. Fixes cockroachdb#113645 Fixes cockroachdb#113680 Release note (bug fix): Fixed a bug that could cause internal errors or panics while attempting to forecast statistics on a numeric column.
This patch adds panic-catching logic to `GetTableStats`, so that "safe" panics (out-of-bounds, assertion etc.) can be propagated as a returned error. This allows some code-paths to ignore the returned error, so that execution can proceed even there's an unexpected error in the stats code. This prevents situations where it becomes impossible to query a table because of a stats bug. Informs cockroachdb#113645 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we panic while getting stats, I think we should try to plan the query without stats, rather than cause the query (all queries touching that table) to fail.
I think that makes sense, done.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)
pkg/sql/stats/quantile.go
line 658 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
If you wouldn't mind, could you add a sentence to this comment mentioning that
isInvalid
must be checked first?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
TFTR! bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 050b37c to blathers/backport-release-22.2-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from 050b37c to blathers/backport-release-23.1-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
I think we should also backport this to release-22.2.17-rc |
blathers backport 23.1.12-rc |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 050b37c to blathers/backport-release-23.1.12-rc.FROZEN-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.12-rc failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
blathers backport 23.1.12-rc.Frozen |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error getting backport branch release-23.1.12-rc.Frozen: unexpected status code: 404 Not Found Backport to branch 23.1.12-rc.Frozen failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
blathers backport 23.1.12-rc.FROZEN |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 050b37c to blathers/backport-release-23.1.12-rc.FROZEN-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.12-rc.FROZEN failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This patch adds a method to the quantile function,
invalid
, which checks forNaN
values within the function. These indicate a failure to derive a linear regression model for the observed histograms, and can lead to internal errors and panics if not caught. Stats forecasting now checks this method before attempting to use a quantile function to derive a histogram prediction.Fixes #113645
Fixes #113680
Release note (bug fix): Fixed a bug that could cause internal errors or panics while attempting to forecast statistics on a numeric column.