Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add telemetry for statistics forecast usage #88539

Merged
merged 1 commit into from
Oct 10, 2022

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Sep 23, 2022

Add a few fields to the sampled_query telemetry events that will help us
measure how useful table statistics forecasting is in practice.

Fixes: #86356

Release note (ops change): Add five new fields to the sampled_query
telemetry events:

  • ScanCount: Number of scans in the query plan.
  • ScanWithStatsCount: Number of scans using statistics (including
    forecasted statistics) in the query plan.
  • ScanWithStatsForecastCount: Number of scans using forecasted
    statistics in the query plan.
  • TotalScanRowsWithoutForecastsEstimate: Total number of rows read by
    all scans in the query, as estimated by the optimizer without using
    forecasts.
  • NanosSinceStatsForecasted: The greatest quantity of nanoseconds that
    have passed since the forecast time (or until the forecast time, if it
    is in the future, in which case it will be negative) for any table
    with forecasted stats scanned by this query.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2
Copy link
Collaborator Author

michae2 commented Sep 23, 2022

@rytaft was this the sort of telemetry you had in mind? On the issue I wrote a note about nanos but I forget what we discussed.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this looks good. I think we were maybe discussing the timestamp of the forecast (similar to the nanosSinceStatsCollected item, but perhaps nanosUntilStatsForecast?)

Also, don't forget to update telemetry_logging_test.go

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/opt/exec/execbuilder/relational.go line 756 at r1 (raw file):

		}
		if first < tab.StatisticCount() {
			b.ScanCounts[exec.ScanWithStatsCount]++

Could this be incremented higher up? Right under where b.TotalScanRows is incremented?


pkg/sql/opt/exec/execbuilder/relational.go line 760 at r1 (raw file):

			if stat.IsForecast() {
				b.ScanCounts[exec.ScanWithStatsForecastCount]++
				b.TotalScanWithStatsForecastRows += stats.RowCount

How do you envision this data point being used?

Seems like it might be more useful to collect the total scan rows without forecast, so you can see which one is more accurate. For example, suppose that we have 3 scans, and 2 of them use forecasts:

Scan a:
- estimate without forecast: 10 rows
- estimate with forecast: 20 rows
Scan b:
- estimate without forecast: 10 rows
- estimate with forecast: 30 rows
Scan c:
- estimate without forecast: 10 rows

Then you'd have

TotalScanRows: 60
TotalScanRowsWithoutForecast: 30

pkg/sql/opt/exec/execbuilder/relational.go line 762 at r1 (raw file):

				b.TotalScanWithStatsForecastRows += stats.RowCount
			} else {
				nanosSinceStatsCollected := timeutil.Since(stat.CreatedAt())

Seems like it might still be useful to collect this info -- I guess you'll just need to continue incrementing first until you find a non-forecast?

@michae2 michae2 marked this pull request as ready for review October 3, 2022 23:39
@michae2 michae2 requested a review from a team October 3, 2022 23:39
@michae2 michae2 requested a review from a team as a code owner October 3, 2022 23:39
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's right. Ok, now there is a NanosSinceStatsForecasted (it's the max absolute value, so if the forecast time is in the future it will also be positive).

And now there's a test. I think this is RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/exec/execbuilder/relational.go line 756 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Could this be incremented higher up? Right under where b.TotalScanRows is incremented?

Yes, done.


pkg/sql/opt/exec/execbuilder/relational.go line 760 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

How do you envision this data point being used?

Seems like it might be more useful to collect the total scan rows without forecast, so you can see which one is more accurate. For example, suppose that we have 3 scans, and 2 of them use forecasts:

Scan a:
- estimate without forecast: 10 rows
- estimate with forecast: 20 rows
Scan b:
- estimate without forecast: 10 rows
- estimate with forecast: 30 rows
Scan c:
- estimate without forecast: 10 rows

Then you'd have

TotalScanRows: 60
TotalScanRowsWithoutForecast: 30

Good point! I've changed this to now estimate row count without forecast. I've tried to make the calculation the same as would be produced by statisticsBuilder.buildScan.


pkg/sql/opt/exec/execbuilder/relational.go line 762 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Seems like it might still be useful to collect this info -- I guess you'll just need to continue incrementing first until you find a non-forecast?

Done. (I've changed the structure of the code a bit so that first we calculate all forecast metrics, then we calculate all non-forecast metrics.)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Reviewed 8 of 8 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/instrumentation.go line 177 at r2 (raw file):

	// nanosSinceStatsForecasted is the maximum number of nanoseconds that have
	// passed since the forecast time (or until the forecast time, if it is in the
	// future) for any table with forecasted stats scanned by this query.

How will we know whether it's in the past or the future?


pkg/sql/telemetry_logging_test.go line 902 at r2 (raw file):

}

// TestTelemetryScanCounts

this comment needs a few more words :)

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)


pkg/sql/instrumentation.go line 177 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

How will we know whether it's in the past or the future?

We won't. My thinking was:

  • too far in the future is equally bad as too far in the past
  • if we compare this metric across different queries, we want to be able to compare with < rather than abs() < abs()
  • if we aggregate this metric with sum or avg, we don't want equally-wrong-but-opposite values to cancel out to 0

WDYT?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/instrumentation.go line 177 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

We won't. My thinking was:

  • too far in the future is equally bad as too far in the past
  • if we compare this metric across different queries, we want to be able to compare with < rather than abs() < abs()
  • if we aggregate this metric with sum or avg, we don't want equally-wrong-but-opposite values to cancel out to 0

WDYT?

I agree that too far in the future is equally bad as too far in the past, but it seems like using abs for calculations isn't the end of the world. And it might be useful to know if future stats are causing more of a problem for some reason than past stats (or vice versa). As it is now, we are losing this information.

@dhartunian dhartunian removed the request for review from a team October 6, 2022 15:17
@michae2 michae2 force-pushed the forecasttelem branch 2 times, most recently from d4cae10 to 0ccda97 Compare October 7, 2022 05:20
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/instrumentation.go line 177 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I agree that too far in the future is equally bad as too far in the past, but it seems like using abs for calculations isn't the end of the world. And it might be useful to know if future stats are causing more of a problem for some reason than past stats (or vice versa). As it is now, we are losing this information.

OK, done.


pkg/sql/telemetry_logging_test.go line 902 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this comment needs a few more words :)

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2)

@michae2
Copy link
Collaborator Author

michae2 commented Oct 8, 2022

TFYR!

bors r=rytaft

@michae2
Copy link
Collaborator Author

michae2 commented Oct 8, 2022

bors r-

@craig
Copy link
Contributor

craig bot commented Oct 8, 2022

Canceled.

Add a few fields to the sampled_query telemetry events that will help us
measure how useful table statistics forecasting is in practice.

Fixes: cockroachdb#86356

Release note (ops change): Add five new fields to the sampled_query
telemetry events:
- `ScanCount`: Number of scans in the query plan.
- `ScanWithStatsCount`: Number of scans using statistics (including
  forecasted statistics) in the query plan.
- `ScanWithStatsForecastCount`: Number of scans using forecasted
  statistics in the query plan.
- `TotalScanRowsWithoutForecastsEstimate`: Total number of rows read by
  all scans in the query, as estimated by the optimizer without using
  forecasts.
- `NanosSinceStatsForecasted`: The greatest quantity of nanoseconds that
  have passed since the forecast time (or until the forecast time, if it
  is in the future, in which case it will be negative) for any table
  with forecasted stats scanned by this query.
@michae2
Copy link
Collaborator Author

michae2 commented Oct 8, 2022

Let's try that again.

bors r=rytaft

@craig
Copy link
Contributor

craig bot commented Oct 8, 2022

Build failed:

@michae2
Copy link
Collaborator Author

michae2 commented Oct 10, 2022

Looks like a flake.

bors r=rytaft

@craig
Copy link
Contributor

craig bot commented Oct 10, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Oct 10, 2022

Build succeeded:

@craig craig bot merged commit 94e2a2b into cockroachdb:master Oct 10, 2022
@blathers-crl
Copy link

blathers-crl bot commented Oct 10, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from ab38868 to blathers/backport-release-22.2-88539: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/exec/execbuilder/relational.go line 753 at r4 (raw file):

	// Save some instrumentation info.
	b.ScanCounts[exec.ScanCount]++

I'm reviewing another PR #96358 and stumbled upon the changes in this code. Currently b.ScanCounts only tracks the number of scans (ignoring all other possible "disk read" operators like index joins, etc). I'm assuming this is expected and desirable here but thought I'd ask.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/exec/execbuilder/relational.go line 753 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'm reviewing another PR #96358 and stumbled upon the changes in this code. Currently b.ScanCounts only tracks the number of scans (ignoring all other possible "disk read" operators like index joins, etc). I'm assuming this is expected and desirable here but thought I'd ask.

(Wrong PR link #98002)

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/exec/execbuilder/relational.go line 753 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

(Wrong PR link #98002)

Yes, it was expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/opt: add telemetry for statistics forecast usage
4 participants