Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: don't use linear regression with NaN for stats forecasting #113712

Merged
merged 2 commits into from
Nov 3, 2023

Conversation

DrewKimball
Copy link
Collaborator

This patch adds a method to the quantile function, invalid, which checks for NaN values within the function. These indicate a failure to derive a linear regression model for the observed histograms, and can lead to internal errors and panics if not caught. Stats forecasting now checks this method before attempting to use a quantile function to derive a histogram prediction.

Fixes #113645
Fixes #113680

Release note (bug fix): Fixed a bug that could cause internal errors or panics while attempting to forecast statistics on a numeric column.

@DrewKimball DrewKimball added backport-22.2.x backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Nov 2, 2023
@DrewKimball DrewKimball requested a review from michae2 November 2, 2023 21:12
@DrewKimball DrewKimball requested a review from a team as a code owner November 2, 2023 21:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let Michael review this, but I wonder whether we want to revert 120132e given that we now have a proper fix?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job finding this btw!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I wonder whether we want to revert 120132e given that we now have a proper fix?

Good point. I've reverted it, since fixMalformed shouldn't return an error now.

I also expanded the invalid check to include +/-Inf values.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, I was checking for these in toQuantileValue and fromQuantileValue but it never occurred to me to check in between! Great catch! :lgtm:

It's possible this fixes #93892 as well, if we have an invalid quantile that makes it through fixMalformed without panicking. I'll think about that more.

@DrewKimball WDYT about adding another commit that adds panic-catching to GetTableStats?

func (sc *TableStatisticsCache) GetTableStats(
ctx context.Context, table catalog.TableDescriptor,
) ([]*TableStatistic, error) {
if !statsUsageAllowed(table, sc.settings) {
return nil, nil
}
forecast := forecastAllowed(table, sc.settings)
return sc.getTableStatsFromCache(ctx, table.GetID(), &forecast)
}

If we panic while getting stats, I think we should try to plan the query without stats, rather than cause the query (all queries touching that table) to fail.

Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/stats/quantile.go line 658 at r2 (raw file):

// This function fixes the negative slope pieces by "moving" p from the
// overlapping places to where it should be. No p is lost in the making of these
// calculations.

If you wouldn't mind, could you add a sentence to this comment mentioning that isInvalid must be checked first?

This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we panic while getting stats, I think we should try to plan the query without stats, rather than cause the query (all queries touching that table) to fail.

I think that makes sense, done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)


pkg/sql/stats/quantile.go line 658 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

If you wouldn't mind, could you add a sentence to this comment mentioning that isInvalid must be checked first?

Done.

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thank you!

Reviewed 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)

@DrewKimball
Copy link
Collaborator Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 3, 2023

Build succeeded:

Copy link

blathers-crl bot commented Nov 3, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 050b37c to blathers/backport-release-22.2-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


error creating merge commit from 050b37c to blathers/backport-release-23.1-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@michae2
Copy link
Collaborator

michae2 commented Nov 3, 2023

I think we should also backport this to release-22.2.17-rc

@DrewKimball
Copy link
Collaborator Author

blathers backport 23.1.12-rc

Copy link

blathers-crl bot commented Nov 4, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 050b37c to blathers/backport-release-23.1.12-rc.FROZEN-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.12-rc failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@DrewKimball
Copy link
Collaborator Author

blathers backport 23.1.12-rc.Frozen

Copy link

blathers-crl bot commented Nov 6, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error getting backport branch release-23.1.12-rc.Frozen: unexpected status code: 404 Not Found

Backport to branch 23.1.12-rc.Frozen failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@DrewKimball
Copy link
Collaborator Author

blathers backport 23.1.12-rc.FROZEN

Copy link

blathers-crl bot commented Nov 6, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 050b37c to blathers/backport-release-23.1.12-rc.FROZEN-113712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.12-rc.FROZEN failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
4 participants