-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add telemetry for statistics forecast usage #88539
Conversation
@rytaft was this the sort of telemetry you had in mind? On the issue I wrote a note about nanos but I forget what we discussed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this looks good. I think we were maybe discussing the timestamp of the forecast (similar to the nanosSinceStatsCollected
item, but perhaps nanosUntilStatsForecast
?)
Also, don't forget to update telemetry_logging_test.go
Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
pkg/sql/opt/exec/execbuilder/relational.go
line 756 at r1 (raw file):
} if first < tab.StatisticCount() { b.ScanCounts[exec.ScanWithStatsCount]++
Could this be incremented higher up? Right under where b.TotalScanRows
is incremented?
pkg/sql/opt/exec/execbuilder/relational.go
line 760 at r1 (raw file):
if stat.IsForecast() { b.ScanCounts[exec.ScanWithStatsForecastCount]++ b.TotalScanWithStatsForecastRows += stats.RowCount
How do you envision this data point being used?
Seems like it might be more useful to collect the total scan rows without forecast, so you can see which one is more accurate. For example, suppose that we have 3 scans, and 2 of them use forecasts:
Scan a:
- estimate without forecast: 10 rows
- estimate with forecast: 20 rows
Scan b:
- estimate without forecast: 10 rows
- estimate with forecast: 30 rows
Scan c:
- estimate without forecast: 10 rows
Then you'd have
TotalScanRows: 60
TotalScanRowsWithoutForecast: 30
pkg/sql/opt/exec/execbuilder/relational.go
line 762 at r1 (raw file):
b.TotalScanWithStatsForecastRows += stats.RowCount } else { nanosSinceStatsCollected := timeutil.Since(stat.CreatedAt())
Seems like it might still be useful to collect this info -- I guess you'll just need to continue incrementing first
until you find a non-forecast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's right. Ok, now there is a NanosSinceStatsForecasted
(it's the max absolute value, so if the forecast time is in the future it will also be positive).
And now there's a test. I think this is RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/exec/execbuilder/relational.go
line 756 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Could this be incremented higher up? Right under where
b.TotalScanRows
is incremented?
Yes, done.
pkg/sql/opt/exec/execbuilder/relational.go
line 760 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
How do you envision this data point being used?
Seems like it might be more useful to collect the total scan rows without forecast, so you can see which one is more accurate. For example, suppose that we have 3 scans, and 2 of them use forecasts:
Scan a: - estimate without forecast: 10 rows - estimate with forecast: 20 rows Scan b: - estimate without forecast: 10 rows - estimate with forecast: 30 rows Scan c: - estimate without forecast: 10 rows
Then you'd have
TotalScanRows: 60 TotalScanRowsWithoutForecast: 30
Good point! I've changed this to now estimate row count without forecast. I've tried to make the calculation the same as would be produced by statisticsBuilder.buildScan
.
pkg/sql/opt/exec/execbuilder/relational.go
line 762 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Seems like it might still be useful to collect this info -- I guess you'll just need to continue incrementing
first
until you find a non-forecast?
Done. (I've changed the structure of the code a bit so that first we calculate all forecast metrics, then we calculate all non-forecast metrics.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Reviewed 8 of 8 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
pkg/sql/instrumentation.go
line 177 at r2 (raw file):
// nanosSinceStatsForecasted is the maximum number of nanoseconds that have // passed since the forecast time (or until the forecast time, if it is in the // future) for any table with forecasted stats scanned by this query.
How will we know whether it's in the past or the future?
pkg/sql/telemetry_logging_test.go
line 902 at r2 (raw file):
} // TestTelemetryScanCounts
this comment needs a few more words :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)
pkg/sql/instrumentation.go
line 177 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
How will we know whether it's in the past or the future?
We won't. My thinking was:
- too far in the future is equally bad as too far in the past
- if we compare this metric across different queries, we want to be able to compare with < rather than abs() < abs()
- if we aggregate this metric with sum or avg, we don't want equally-wrong-but-opposite values to cancel out to 0
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2)
pkg/sql/instrumentation.go
line 177 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
We won't. My thinking was:
- too far in the future is equally bad as too far in the past
- if we compare this metric across different queries, we want to be able to compare with < rather than abs() < abs()
- if we aggregate this metric with sum or avg, we don't want equally-wrong-but-opposite values to cancel out to 0
WDYT?
I agree that too far in the future is equally bad as too far in the past, but it seems like using abs
for calculations isn't the end of the world. And it might be useful to know if future stats are causing more of a problem for some reason than past stats (or vice versa). As it is now, we are losing this information.
d4cae10
to
0ccda97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/instrumentation.go
line 177 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I agree that too far in the future is equally bad as too far in the past, but it seems like using
abs
for calculations isn't the end of the world. And it might be useful to know if future stats are causing more of a problem for some reason than past stats (or vice versa). As it is now, we are losing this information.
OK, done.
pkg/sql/telemetry_logging_test.go
line 902 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
this comment needs a few more words :)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2)
TFYR! bors r=rytaft |
bors r- |
Canceled. |
Add a few fields to the sampled_query telemetry events that will help us measure how useful table statistics forecasting is in practice. Fixes: cockroachdb#86356 Release note (ops change): Add five new fields to the sampled_query telemetry events: - `ScanCount`: Number of scans in the query plan. - `ScanWithStatsCount`: Number of scans using statistics (including forecasted statistics) in the query plan. - `ScanWithStatsForecastCount`: Number of scans using forecasted statistics in the query plan. - `TotalScanRowsWithoutForecastsEstimate`: Total number of rows read by all scans in the query, as estimated by the optimizer without using forecasts. - `NanosSinceStatsForecasted`: The greatest quantity of nanoseconds that have passed since the forecast time (or until the forecast time, if it is in the future, in which case it will be negative) for any table with forecasted stats scanned by this query.
Let's try that again. bors r=rytaft |
Build failed: |
Looks like a flake. bors r=rytaft |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from ab38868 to blathers/backport-release-22.2-88539: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/exec/execbuilder/relational.go
line 753 at r4 (raw file):
// Save some instrumentation info. b.ScanCounts[exec.ScanCount]++
I'm reviewing another PR #96358 and stumbled upon the changes in this code. Currently b.ScanCounts
only tracks the number of scans (ignoring all other possible "disk read" operators like index joins, etc). I'm assuming this is expected and desirable here but thought I'd ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/exec/execbuilder/relational.go
line 753 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'm reviewing another PR #96358 and stumbled upon the changes in this code. Currently
b.ScanCounts
only tracks the number of scans (ignoring all other possible "disk read" operators like index joins, etc). I'm assuming this is expected and desirable here but thought I'd ask.
(Wrong PR link #98002)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/exec/execbuilder/relational.go
line 753 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
(Wrong PR link #98002)
Yes, it was expected.
Add a few fields to the sampled_query telemetry events that will help us
measure how useful table statistics forecasting is in practice.
Fixes: #86356
Release note (ops change): Add five new fields to the sampled_query
telemetry events:
ScanCount
: Number of scans in the query plan.ScanWithStatsCount
: Number of scans using statistics (includingforecasted statistics) in the query plan.
ScanWithStatsForecastCount
: Number of scans using forecastedstatistics in the query plan.
TotalScanRowsWithoutForecastsEstimate
: Total number of rows read byall scans in the query, as estimated by the optimizer without using
forecasts.
NanosSinceStatsForecasted
: The greatest quantity of nanoseconds thathave passed since the forecast time (or until the forecast time, if it
is in the future, in which case it will be negative) for any table
with forecasted stats scanned by this query.