-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: record ScanStats once per BatchRequest #97442
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
fba2dd5
to
3ea67ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @yuzefovich)
pkg/kv/kvserver/replica_evaluate.go
line 250 at r1 (raw file):
// record the scan stats for all batches, even if there are no // Gets nor Scans? This will avoid the need to figure out the // value of foundGetOrScan.
The RecordStructured
method doesn't own it's argument - maybe we could keep a ScanStats
struct somewhere further up the stack (Replica
maybe) and pass it into evaluateBatch
? I think we can be confident that it would be cheap enough to unconditionally record in that case. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @sumeerbhola)
pkg/kv/kvserver/replica_evaluate.go
line 250 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
The
RecordStructured
method doesn't own it's argument - maybe we could keep aScanStats
struct somewhere further up the stack (Replica
maybe) and pass it intoevaluateBatch
? I think we can be confident that it would be cheap enough to unconditionally record in that case. What do you think?
Are you suggesting to effectively reduce allocations further? It doesn't seem worth it to me - a single kvpb.ScanStats
allocation per BatchRequest seems reasonable. My comment here is about whether we should remove the conditional on foundGetOrScan
and just call sp.RecordStructured(scanStats)
in all cases, even for write-only batches (for which the object will contain all zeros). That shouldn't be a performance concern either, I'm thinking more about this being confusing - i.e. a write-only BatchRequest records the ScanStats into the trace (perhaps with all zeros it wouldn't be confusing). Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @yuzefovich)
pkg/kv/kvserver/replica_evaluate.go
line 250 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Are you suggesting to effectively reduce allocations further? It doesn't seem worth it to me - a single
kvpb.ScanStats
allocation per BatchRequest seems reasonable. My comment here is about whether we should remove the conditional onfoundGetOrScan
and just callsp.RecordStructured(scanStats)
in all cases, even for write-only batches (for which the object will contain all zeros). That shouldn't be a performance concern either, I'm thinking more about this being confusing - i.e. a write-only BatchRequest records the ScanStats into the trace (perhaps with all zeros it wouldn't be confusing). Thoughts?
If we aren't concerned with allocating the struct in those cases, I'd be in favor of reducing the code complexity. Since we're now aggregating over multiple requests, what do you think about tracking the number of scan requests and get requests in ScanStats
? This could be useful when viewing traces, and could be used to avoid recording anything when there are no gets or scans.
Previously, we would record a `kvpb.ScanStats` object into the trace for each evaluated Get, Scan, and ReverseScan command. This was suboptimal for two reasons: - this required an allocation of that `kvpb.ScanStats` object - this required propagating all of these separate objects via the tracing infrastructure which might make it so that the tracing limits are reached resulting in some objects being dropped. This commit, instead, changes the ScanStats to be tracked at the BatchRequest level, thus, we only need to record a single object per BatchRequest. This results in reduced granularity, but that is still sufficient for the SQL needs which simply aggregates all `kvpb.ScanStats` from a single SQL processor into one object. As a result, the tpch_concurrency metric averaged over 20 runs increased from 76.75 to 84.75. Additionally, this commit makes it so that we track the number of Gets, Scans, and ReverseScans actually evaluated as part of the BatchResponse. This information is plumbed through a couple of protos but is not exposed in any SQL Observability virtual tables. Still, due to having it in the protos will include this information into the trace. Release note: None
3ea67ec
to
1154be5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @sumeerbhola)
pkg/kv/kvserver/replica_evaluate.go
line 250 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
If we aren't concerned with allocating the struct in those cases, I'd be in favor of reducing the code complexity. Since we're now aggregating over multiple requests, what do you think about tracking the number of scan requests and get requests in
ScanStats
? This could be useful when viewing traces, and could be used to avoid recording anything when there are no gets or scans.
Thanks, I like this idea, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 8 files at r1, 8 of 8 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
@cockroachdb/storage @cockroachdb/kv-prs does some want to give this a look? The changes seem pretty straightforward, so I don't think it's strictly necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok. But do we have any test coverage (I realize this lack of coverage probably predates your change, so sorry to put you on the spot)? If not can you add a test that verifies the trace output?
Reviewed 1 of 8 files at r1, 2 of 8 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
We do have a test for this TFTRs! bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 1154be5 to blathers/backport-release-22.2-97442: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Previously, we would record a
kvpb.ScanStats
object into the trace foreach evaluated Get, Scan, and ReverseScan command. This was suboptimal
for two reasons:
kvpb.ScanStats
objecttracing infrastructure which might make it so that the tracing limits
are reached resulting in some objects being dropped.
This commit, instead, changes the ScanStats to be tracked at the
BatchRequest level, thus, we only need to record a single object per
BatchRequest. This results in reduced granularity, but that is still
sufficient for the SQL needs which simply aggregates all
kvpb.ScanStats
from a single SQL processor into one object. Asa result, the tpch_concurrency metric averaged over 20 runs increased
from 76.75 to 84.75.
Additionally, this commit makes it so that we track the number of Gets,
Scans, and ReverseScans actually evaluated as part of the BatchResponse.
This information is plumbed through a couple of protos but is not
exposed in any SQL Observability virtual tables. Still, due to having it
in the protos will include this information into the trace.
Informs: #64906.
Fixes: #71351.
Release note: None