Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: support SHOW HISTOGRAM for forecasted stats #86358

Open
michae2 opened this issue Aug 18, 2022 · 1 comment
Open

sql/stats: support SHOW HISTOGRAM for forecasted stats #86358

michae2 opened this issue Aug 18, 2022 · 1 comment
Assignees
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team X-nostale Marks an issue/pr that should be ignored by the stale bot

Comments

@michae2
Copy link
Collaborator

michae2 commented Aug 18, 2022

Follow up from #79872: we currently support SHOW STATISTICS ... WITH FORECAST but the histogram_id field is always 0 for forecasts, and SHOW HISTOGRAM does not work.

For example:

CREATE TABLE h (a INT PRIMARY KEY) WITH (sql_stats_automatic_collection_enabled = false);
ALTER TABLE h INJECT STATISTICS '[
      {
          "avg_size": 1,
          "columns": [
              "a"
          ],
          "created_at": "2022-08-13 00:00:00.000000",
          "distinct_count": 3,
          "histo_buckets": [
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "4"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "5"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "6"
              }
          ],
          "histo_col_type": "INT8",
          "histo_version": 2,
          "name": "__auto__",
          "null_count": 0,
          "row_count": 3
      },
      {
          "avg_size": 1,
          "columns": [
              "a"
          ],
          "created_at": "2022-08-14 00:00:00.000000",
          "distinct_count": 3,
          "histo_buckets": [
              {
                  "distinct_range": 0,
                  "num_eq": 0,
                  "num_range": 0,
                  "upper_bound": "7"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "8"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "9"
              }
          ],
          "histo_col_type": "INT8",
          "histo_version": 2,
          "name": "__auto__",
          "null_count": 0,
          "row_count": 3
      },
      {
          "avg_size": 1,
          "columns": [
              "a"
          ],
          "created_at": "2022-08-15 00:00:00.000000",
          "distinct_count": 3,
          "histo_buckets": [
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "10"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "11"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "12"
              }
          ],
          "histo_col_type": "INT8",
          "histo_version": 2,
          "name": "__auto__",
          "null_count": 0,
          "row_count": 3
      }
]';
SHOW STATISTICS FOR TABLE h WITH FORECAST;

We can show histograms of the collected statistics, but not the forecasted statistics:

[email protected]:26257/defaultdb> SHOW STATISTICS FOR TABLE h WITH FORECAST;
  statistics_name | column_names |       created       | row_count | distinct_count | null_count | avg_size |    histogram_id
------------------+--------------+---------------------+-----------+----------------+------------+----------+---------------------
  __auto__        | {a}          | 2022-08-13 00:00:00 |         3 |              3 |          0 |        1 | 788759343272853505
  __auto__        | {a}          | 2022-08-14 00:00:00 |         3 |              3 |          0 |        1 | 788759343275540481
  __auto__        | {a}          | 2022-08-15 00:00:00 |         3 |              3 |          0 |        1 | 788759343277932545
  __forecast__    | {a}          | 2022-08-16 00:00:00 |         3 |              3 |          0 |        1 |                  0
(4 rows)


Time: 4ms total (execution 4ms / network 0ms)

[email protected]:26257/defaultdb> SHOW HISTOGRAM 788759343272853505;
  upper_bound | range_rows | distinct_range_rows | equal_rows
--------------+------------+---------------------+-------------
  4           |          0 |                   0 |          1
  5           |          0 |                   0 |          1
  6           |          0 |                   0 |          1
(3 rows)


Time: 2ms total (execution 2ms / network 0ms)

[email protected]:26257/defaultdb> SHOW HISTOGRAM 788759343277932545;
  upper_bound | range_rows | distinct_range_rows | equal_rows
--------------+------------+---------------------+-------------
  10          |          0 |                   0 |          1
  11          |          0 |                   0 |          1
  12          |          0 |                   0 |          1
(3 rows)


Time: 2ms total (execution 2ms / network 0ms)

[email protected]:26257/defaultdb> SHOW HISTOGRAM 0;
ERROR: histogram 0 not found

As a workaround we can use SHOW STATISTICS USING JSON ... WITH FORECAST but it's pretty ugly:

[email protected]:26257/defaultdb> SELECT jsonb_array_elements(stat->'histo_buckets')
FROM (
  SELECT jsonb_array_elements(statistics) AS stat
  FROM [SHOW STATISTICS USING JSON FOR TABLE h WITH FORECAST]
)
WHERE stat->>'name' = '__forecast__';
                           jsonb_array_elements
---------------------------------------------------------------------------
  {"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "13"}
  {"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "14"}
  {"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "15"}
(3 rows)


Time: 4ms total (execution 4ms / network 0ms)

I think the trick to doing this will be to encode the table ID and column ID in the histogram_id.

Jira issue: CRDB-18713

@michae2 michae2 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-table-stats Table statistics (and their automatic refresh). T-sql-queries SQL Queries Team labels Aug 18, 2022
@michae2 michae2 self-assigned this Aug 18, 2022
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023

This comment was marked as resolved.

@michae2 michae2 added X-nostale Marks an issue/pr that should be ignored by the stale bot and removed no-issue-activity labels Feb 12, 2024
@michae2 michae2 moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team X-nostale Marks an issue/pr that should be ignored by the stale bot
Projects
Status: Backlog
Development

No branches or pull requests

1 participant