Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: stats forecasting can't handle NaN values #113645

Closed
DrewKimball opened this issue Nov 2, 2023 · 3 comments · Fixed by #113712
Closed

sql/stats: stats forecasting can't handle NaN values #113645

DrewKimball opened this issue Nov 2, 2023 · 3 comments · Fixed by #113712
Labels
branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team

Comments

@DrewKimball
Copy link
Collaborator

DrewKimball commented Nov 2, 2023

It is possible for the calculations involved with stats forecasting to result in NaN values, whether because histogram bounds are NaN or because of extreme values like the max float64 value. Currently, the quantile logic is unable to handle this. It is possible to trigger the error added in #109461 using the following test:

diff --git a/pkg/sql/stats/quantile_test.go b/pkg/sql/stats/quantile_test.go
index a7727384085..6f88999d497 100644
--- a/pkg/sql/stats/quantile_test.go
+++ b/pkg/sql/stats/quantile_test.go
@@ -1290,6 +1290,15 @@ func TestQuantileOps(t *testing.T) {
                        intSq: 25416.666666666664,
                        fixed: quantile{{0, 0}, {0.125, 0}, {0.25, 100}, {0.5, 100}, {0.7916666666666666, 200}, {0.9583333333333334, 300}, {1, 400}},
                },
+               {
+                       q:     quantile{{0, 100}, {0.125, 100}, {0.25, 200}, {0.375, 300}, {0.375, 200}, {0.5, 100}, {0.625, 100}, {0.75, 0}, {0.875, 0}, {0.875, 100}, {0.875, 400}, {1, math.NaN()}},
+                       addR:  quantile{{0, 102}, {0.125, 102.25}, {0.25, 202.5}, {0.375, 302.75}, {0.375, 202.75}, {0.5, 103}, {0.625, 103.25}, {0.75, 3.5}, {0.875, 3.75}, {0.875, 103.75}, {0.875, 403.75}, {1, 104}},
+                       addS:  quantile{{0, 111}, {0.125, 111}, {0.125, 113}, {0.25, 213}, {0.25, 215}, {0.375, 316}, {0.375, 216}, {0.5, 117}, {0.625, 118}, {0.75, 19}, {0.875, 20}, {0.875, 120}, {0.875, 420}, {1, 121}},
+                       subR:  quantile{{0, 98}, {0.125, 97.75}, {0.25, 197.5}, {0.375, 297.25}, {0.375, 197.25}, {0.5, 97}, {0.625, 96.75}, {0.75, -3.5}, {0.875, -3.75}, {0.875, 96.25}, {0.875, 396.25}, {1, 96}},
+                       subS:  quantile{{0, 89}, {0.125, 89}, {0.125, 87}, {0.25, 187}, {0.25, 185}, {0.375, 284}, {0.375, 184}, {0.5, 83}, {0.625, 82}, {0.75, -19}, {0.875, -20}, {0.875, 80}, {0.875, 380}, {1, 79}},
+                       intSq: 25416.666666666664,
+                       fixed: quantile{{0, 0}, {0.125, 0}, {0.25, 100}, {0.5, 100}, {0.7916666666666666, 200}, {0.9583333333333334, 300}, {1, 400}},
+               },
        }

as well as a related out-of-bounds error on a separate line with a slightly different test:

diff --git a/pkg/sql/stats/quantile_test.go b/pkg/sql/stats/quantile_test.go
index a7727384085..8e130590040 100644
--- a/pkg/sql/stats/quantile_test.go
+++ b/pkg/sql/stats/quantile_test.go
@@ -1290,6 +1290,15 @@ func TestQuantileOps(t *testing.T) {
                        intSq: 25416.666666666664,
                        fixed: quantile{{0, 0}, {0.125, 0}, {0.25, 100}, {0.5, 100}, {0.7916666666666666, 200}, {0.9583333333333334, 300}, {1, 400}},
                },
+               {
+                       q:     quantile{{0, 100}, {0.125, 100}, {0.25, 200}, {0.375, 300}, {0.375, 200}, {0.5, 100}, {0.625, 100}, {0.75, 0}, {0.875, 0}, {0.875, 100}, {0.875, math.NaN()}, {1, math.NaN()}},
+                       addR:  quantile{{0, 102}, {0.125, 102.25}, {0.25, 202.5}, {0.375, 302.75}, {0.375, 202.75}, {0.5, 103}, {0.625, 103.25}, {0.75, 3.5}, {0.875, 3.75}, {0.875, 103.75}, {0.875, 403.75}, {1, 104}},
+                       addS:  quantile{{0, 111}, {0.125, 111}, {0.125, 113}, {0.25, 213}, {0.25, 215}, {0.375, 316}, {0.375, 216}, {0.5, 117}, {0.625, 118}, {0.75, 19}, {0.875, 20}, {0.875, 120}, {0.875, 420}, {1, 121}},
+                       subR:  quantile{{0, 98}, {0.125, 97.75}, {0.25, 197.5}, {0.375, 297.25}, {0.375, 197.25}, {0.5, 97}, {0.625, 96.75}, {0.75, -3.5}, {0.875, -3.75}, {0.875, 96.25}, {0.875, 396.25}, {1, 96}},
+                       subS:  quantile{{0, 89}, {0.125, 89}, {0.125, 87}, {0.25, 187}, {0.25, 185}, {0.375, 284}, {0.375, 184}, {0.5, 83}, {0.625, 82}, {0.75, -19}, {0.875, -20}, {0.875, 80}, {0.875, 380}, {1, 79}},
+                       intSq: 25416.666666666664,
+                       fixed: quantile{{0, 0}, {0.125, 0}, {0.25, 100}, {0.5, 100}, {0.7916666666666666, 200}, {0.9583333333333334, 300}, {1, 400}},
+               },
        }

Jira issue: CRDB-33086

@DrewKimball DrewKimball added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Nov 2, 2023
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Nov 2, 2023
@DrewKimball
Copy link
Collaborator Author

There are likely similar problems with +/-inf as well.

@yuzefovich yuzefovich added the T-sql-queries SQL Queries Team label Nov 2, 2023
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 2, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` values within the function. These indicate a failure to derive
a linear regression model for the observed histograms, and can lead to
internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 2, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
@michae2 michae2 moved this from Triage to Active in SQL Queries Nov 2, 2023
@michae2 michae2 added branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 labels Nov 2, 2023
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 3, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
craig bot pushed a commit that referenced this issue Nov 3, 2023
113712: sql/stats: don't use linear regression with NaN for stats forecasting r=DrewKimball a=DrewKimball

This patch adds a method to the quantile function, `invalid`, which checks for `NaN` values within the function. These indicate a failure to derive a linear regression model for the observed histograms, and can lead to internal errors and panics if not caught. Stats forecasting now checks this method before attempting to use a quantile function to derive a histogram prediction.

Fixes #113645
Fixes #113680

Release note (bug fix): Fixed a bug that could cause internal errors or panics while attempting to forecast statistics on a numeric column.

Co-authored-by: Drew Kimball <[email protected]>
@craig craig bot closed this as completed in 050b37c Nov 3, 2023
@github-project-automation github-project-automation bot moved this from Active to Done in SQL Queries Nov 3, 2023
blathers-crl bot pushed a commit that referenced this issue Nov 3, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes #113645
Fixes #113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
blathers-crl bot pushed a commit that referenced this issue Nov 3, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs #113645

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 3, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 3, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 3, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 4, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 4, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
blathers-crl bot pushed a commit that referenced this issue Nov 6, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes #113645
Fixes #113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
blathers-crl bot pushed a commit that referenced this issue Nov 6, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs #113645

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 6, 2023
This patch adds a method to the quantile function, `invalid`, which checks
for `NaN` and `+/-Inf` values within the function. These indicate a failure
to derive a linear regression model for the observed histograms, and can lead
to internal errors and panics if not caught. Stats forecasting now checks this
method before attempting to use a quantile function to derive a histogram
prediction. This patch also reverts 120132e, since `fixMalformed` is no longer
expected to return an error.

Fixes cockroachdb#113645
Fixes cockroachdb#113680

Release note (bug fix): Fixed a bug that could cause internal errors or
panics while attempting to forecast statistics on a numeric column.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 6, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 6, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Nov 6, 2023
This patch adds panic-catching logic to `GetTableStats`, so that "safe"
panics (out-of-bounds, assertion etc.) can be propagated as a returned
error. This allows some code-paths to ignore the returned error, so that
execution can proceed even there's an unexpected error in the stats code.
This prevents situations where it becomes impossible to query a table
because of a stats bug.

Informs cockroachdb#113645

Release note: None
@mgartner
Copy link
Collaborator

mgartner commented Nov 7, 2023

@DrewKimball Is there a work-around for this issue, like disabling forecasting?

@mgartner
Copy link
Collaborator

mgartner commented Nov 7, 2023

We talked IRL. The workaround is to disable stats forecasting with either:

  1. The cluster setting: SET CLUSTER SETTING sql.stats.forecasts.enabled=false;.
  2. The table level setting: ALTER TABLE name_of_table SET (sql_stats_forecasts_enabled = false);.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants