sql: add telemetry for statistics forecast usage #88539

michae2 · 2022-09-23T05:28:48Z

Add a few fields to the sampled_query telemetry events that will help us
measure how useful table statistics forecasting is in practice.

Fixes: #86356

Release note (ops change): Add five new fields to the sampled_query
telemetry events:

- `ScanCount`: Number of scans in the query plan.
- `ScanWithStatsCount`: Number of scans using statistics (including
  forecasted statistics) in the query plan.
- `ScanWithStatsForecastCount`: Number of scans using forecasted
  statistics in the query plan.
- `TotalScanRowsWithoutForecastsEstimate`: Total number of rows read by
  all scans in the query, as estimated by the optimizer without using
  forecasts.
- `NanosSinceStatsForecasted`: The greatest quantity of nanoseconds that
  have passed since the forecast time (or until the forecast time, if it
  is in the future, in which case it will be negative) for any table
  with forecasted stats scanned by this query.

michae2 · 2022-09-23T14:05:23Z

@rytaft was this the sort of telemetry you had in mind? On the issue I wrote a note about nanos but I forget what we discussed.

rytaft

Yea this looks good. I think we were maybe discussing the timestamp of the forecast (similar to the nanosSinceStatsCollected item, but perhaps nanosUntilStatsForecast?)

Also, don't forget to update telemetry_logging_test.go

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)

pkg/sql/opt/exec/execbuilder/relational.go line 756 at r1 (raw file):

		}
		if first < tab.StatisticCount() {
			b.ScanCounts[exec.ScanWithStatsCount]++

Could this be incremented higher up? Right under where b.TotalScanRows is incremented?

pkg/sql/opt/exec/execbuilder/relational.go line 760 at r1 (raw file):

			if stat.IsForecast() {
				b.ScanCounts[exec.ScanWithStatsForecastCount]++
				b.TotalScanWithStatsForecastRows += stats.RowCount

How do you envision this data point being used?

Seems like it might be more useful to collect the total scan rows without forecast, so you can see which one is more accurate. For example, suppose that we have 3 scans, and 2 of them use forecasts:

Scan a:
- estimate without forecast: 10 rows
- estimate with forecast: 20 rows
Scan b:
- estimate without forecast: 10 rows
- estimate with forecast: 30 rows
Scan c:
- estimate without forecast: 10 rows

Then you'd have

TotalScanRows: 60
TotalScanRowsWithoutForecast: 30
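
The arithmetic in this example can be checked with a small sketch. The `scanEstimate` type and `totals` helper are hypothetical names invented for illustration; they are not part of the execbuilder code under review.

```go
package main

import "fmt"

// scanEstimate is a hypothetical stand-in for the row estimates of one scan.
// It is not a type from the CockroachDB codebase.
type scanEstimate struct {
	name            string
	withoutForecast float64 // estimate from the latest collected stats
	withForecast    float64 // estimate from forecasted stats (used only if usesForecast)
	usesForecast    bool
}

// totals returns the two sums discussed above: the estimate the optimizer
// actually used for every scan, and the estimate it would have used if no
// forecasts existed.
func totals(scans []scanEstimate) (total, withoutForecast float64) {
	for _, s := range scans {
		if s.usesForecast {
			total += s.withForecast
		} else {
			total += s.withoutForecast
		}
		withoutForecast += s.withoutForecast
	}
	return total, withoutForecast
}

// exampleScans reproduces scans a, b, and c from the comment.
func exampleScans() []scanEstimate {
	return []scanEstimate{
		{name: "a", withoutForecast: 10, withForecast: 20, usesForecast: true},
		{name: "b", withoutForecast: 10, withForecast: 30, usesForecast: true},
		{name: "c", withoutForecast: 10},
	}
}

func main() {
	total, without := totals(exampleScans())
	fmt.Printf("TotalScanRows: %v\n", total)
	fmt.Printf("TotalScanRowsWithoutForecast: %v\n", without)
}
```

Comparing both totals against the actual rows read is what would show which estimate was closer.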

pkg/sql/opt/exec/execbuilder/relational.go line 762 at r1 (raw file):

				b.TotalScanWithStatsForecastRows += stats.RowCount
			} else {
				nanosSinceStatsCollected := timeutil.Since(stat.CreatedAt())

Seems like it might still be useful to collect this info -- I guess you'll just need to continue incrementing first until you find a non-forecast?

michae2

Ah, that's right. Ok, now there is a NanosSinceStatsForecasted (it's the max absolute value, so if the forecast time is in the future it will also be positive).

And now there's a test. I think this is RFAL.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/opt/exec/execbuilder/relational.go line 756 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Could this be incremented higher up? Right under where b.TotalScanRows is incremented?

Yes, done.

pkg/sql/opt/exec/execbuilder/relational.go line 760 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

How do you envision this data point being used?

Seems like it might be more useful to collect the total scan rows without forecast, so you can see which one is more accurate. For example, suppose that we have 3 scans, and 2 of them use forecasts:
Scan a:
- estimate without forecast: 10 rows
- estimate with forecast: 20 rows
Scan b:
- estimate without forecast: 10 rows
- estimate with forecast: 30 rows
Scan c:
- estimate without forecast: 10 rows
Then you'd have
TotalScanRows: 60
TotalScanRowsWithoutForecast: 30

Good point! I've changed this to now estimate row count without forecast. I've tried to make the calculation the same as would be produced by statisticsBuilder.buildScan.

pkg/sql/opt/exec/execbuilder/relational.go line 762 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Seems like it might still be useful to collect this info -- I guess you'll just need to continue incrementing first until you find a non-forecast?

Done. (I've changed the structure of the code a bit so that first we calculate all forecast metrics, then we calculate all non-forecast metrics.)

rytaft

Great!

Reviewed 8 of 8 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)

pkg/sql/instrumentation.go line 177 at r2 (raw file):

	// nanosSinceStatsForecasted is the maximum number of nanoseconds that have
	// passed since the forecast time (or until the forecast time, if it is in the
	// future) for any table with forecasted stats scanned by this query.

How will we know whether it's in the past or the future?

pkg/sql/telemetry_logging_test.go line 902 at r2 (raw file):

}

// TestTelemetryScanCounts

this comment needs a few more words :)

michae2

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)

pkg/sql/instrumentation.go line 177 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

How will we know whether it's in the past or the future?

We won't. My thinking was:

- too far in the future is equally bad as too far in the past
- if we compare this metric across different queries, we want to be able to compare with < rather than abs() < abs()
- if we aggregate this metric with sum or avg, we don't want equally-wrong-but-opposite values to cancel out to 0

WDYT?

rytaft

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)

pkg/sql/instrumentation.go line 177 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

We won't. My thinking was:

- too far in the future is equally bad as too far in the past
- if we compare this metric across different queries, we want to be able to compare with < rather than abs() < abs()
- if we aggregate this metric with sum or avg, we don't want equally-wrong-but-opposite values to cancel out to 0

WDYT?

I agree that too far in the future is equally bad as too far in the past, but it seems like using abs for calculations isn't the end of the world. And it might be useful to know if future stats are causing more of a problem for some reason than past stats (or vice versa). As it is now, we are losing this information.

michae2

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/instrumentation.go line 177 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I agree that too far in the future is equally bad as too far in the past, but it seems like using abs for calculations isn't the end of the world. And it might be useful to know if future stats are causing more of a problem for some reason than past stats (or vice versa). As it is now, we are losing this information.

OK, done.

pkg/sql/telemetry_logging_test.go line 902 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this comment needs a few more words :)

Done.

rytaft

Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2)

michae2 · 2022-10-08T03:46:26Z

TFYR!

bors r=rytaft

michae2 · 2022-10-08T03:47:00Z

bors r-

craig · 2022-10-08T03:47:02Z

Canceled.

Add a few fields to the sampled_query telemetry events that will help us
measure how useful table statistics forecasting is in practice.

Fixes: cockroachdb#86356

Release note (ops change): Add five new fields to the sampled_query
telemetry events:

- `ScanCount`: Number of scans in the query plan.
- `ScanWithStatsCount`: Number of scans using statistics (including
  forecasted statistics) in the query plan.
- `ScanWithStatsForecastCount`: Number of scans using forecasted
  statistics in the query plan.
- `TotalScanRowsWithoutForecastsEstimate`: Total number of rows read by
  all scans in the query, as estimated by the optimizer without using
  forecasts.
- `NanosSinceStatsForecasted`: The greatest quantity of nanoseconds that
  have passed since the forecast time (or until the forecast time, if it
  is in the future, in which case it will be negative) for any table
  with forecasted stats scanned by this query.

michae2 · 2022-10-08T04:24:31Z

Let's try that again.

bors r=rytaft

craig · 2022-10-08T05:11:03Z

Build failed:

Bazel Essential CI (Cockroach)

michae2 · 2022-10-10T14:24:31Z

Looks like a flake.

bors r=rytaft

craig · 2022-10-10T15:35:40Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-10-10T17:06:58Z

Build succeeded:

Bazel Essential CI (Cockroach)

blathers-crl · 2022-10-10T17:07:06Z

Encountered an error creating backports. Some common things that can go wrong:

- The backport branch might have already existed.
- There was a merge conflict.
- The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.

error creating merge commit from ab38868 to blathers/backport-release-22.2-88539: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/opt/exec/execbuilder/relational.go line 753 at r4 (raw file):

	// Save some instrumentation info.
	b.ScanCounts[exec.ScanCount]++

I'm reviewing another PR #96358 and stumbled upon the changes in this code. Currently b.ScanCounts only tracks the number of scans (ignoring all other possible "disk read" operators like index joins, etc). I'm assuming this is expected and desirable here but thought I'd ask.

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/opt/exec/execbuilder/relational.go line 753 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'm reviewing another PR #96358 and stumbled upon the changes in this code. Currently b.ScanCounts only tracks the number of scans (ignoring all other possible "disk read" operators like index joins, etc). I'm assuming this is expected and desirable here but thought I'd ask.

(Wrong PR link #98002)

michae2

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/opt/exec/execbuilder/relational.go line 753 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

(Wrong PR link #98002)

Yes, it was expected.
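
To make the scope of these counters concrete, here is a minimal sketch of an array of counters indexed by an enum, in the style of `b.ScanCounts[exec.ScanCount]++`. The lowercase type, constants, and the `operator` struct are hypothetical stand-ins; only the three counter names come from this PR.

```go
package main

import "fmt"

// scanCountType indexes the counter array. Assumption: the real code uses a
// similar iota-based enum with a trailing sentinel for the array length.
type scanCountType int

const (
	scanCount scanCountType = iota
	scanWithStatsCount
	scanWithStatsForecastCount
	numScanCountTypes
)

// operator is a hypothetical plan node used only for this illustration.
type operator struct {
	kind         string // e.g. "scan" or "index-join"
	hasStats     bool
	usesForecast bool
}

// countScans increments counters for scan operators only. Other operators
// that read from disk, such as index joins, are deliberately not counted.
func countScans(ops []operator) [numScanCountTypes]int {
	var counts [numScanCountTypes]int
	for _, op := range ops {
		if op.kind != "scan" {
			continue
		}
		counts[scanCount]++
		if op.hasStats {
			counts[scanWithStatsCount]++
		}
		if op.usesForecast {
			counts[scanWithStatsForecastCount]++
		}
	}
	return counts
}

func main() {
	plan := []operator{
		{kind: "scan", hasStats: true, usesForecast: true},
		{kind: "scan"},
		{kind: "index-join", hasStats: true},
	}
	counts := countScans(plan)
	fmt.Println("ScanCount:", counts[scanCount])
	fmt.Println("ScanWithStatsCount:", counts[scanWithStatsCount])
	fmt.Println("ScanWithStatsForecastCount:", counts[scanWithStatsForecastCount])
}
```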

rytaft reviewed Sep 26, 2022

michae2 force-pushed the forecasttelem branch from b1170e3 to ff9f448 Compare October 3, 2022 23:37

michae2 marked this pull request as ready for review October 3, 2022 23:39

michae2 requested a review from a team October 3, 2022 23:39

michae2 requested a review from a team as a code owner October 3, 2022 23:39

michae2 commented Oct 3, 2022

michae2 added the backport-22.2.x label Oct 4, 2022

rytaft reviewed Oct 4, 2022

michae2 commented Oct 4, 2022

rytaft reviewed Oct 5, 2022

dhartunian removed the request for review from a team October 6, 2022 15:17

michae2 force-pushed the forecasttelem branch 2 times, most recently from d4cae10 to 0ccda97 Compare October 7, 2022 05:20

michae2 commented Oct 7, 2022

rytaft approved these changes Oct 7, 2022

michae2 force-pushed the forecasttelem branch from 0ccda97 to ab38868 Compare October 8, 2022 04:22

craig bot merged commit 94e2a2b into cockroachdb:master Oct 10, 2022

cockroach-teamcity mentioned this pull request Oct 10, 2022

PR #88539 - sql: add telemetry for statistics forecast usage cockroachdb/docs#15328

Open

michae2 deleted the forecasttelem branch October 11, 2022 02:44

michae2 mentioned this pull request Oct 12, 2022

release-22.2: sql: add telemetry for statistics forecast usage #89852

Merged

yuzefovich reviewed Mar 20, 2023

michae2 commented Mar 20, 2023
