-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: ship high-resolution tscache summaries during lease transfers and range merges #118300
kv: ship high-resolution tscache summaries during lease transfers and range merges #118300
Conversation
042e198
to
7af3033
Compare
71457cc
to
11b3b85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
-- commits
line 14 at r1:
Glad we're no longer imprecise!
-- commits
line 48 at r2:
They're only read during node failure, right? Or are there other cases I'm missing?
pkg/kv/kvserver/replica_tscache.go
line 39 at r1 (raw file):
"budget will result in loss of precision.", 4<<20, /* 4 MB */ )
Did we not want to make this setting public?
pkg/kv/kvserver/replica_tscache.go
line 606 at r1 (raw file):
lease, _ /* nextLease */ := r.GetLease() // Recognize the case where a lease started recently. Lease transfers can // bump the ts cache low water mark. However, in newer versions of the
nit: s/newer versions/versions >= 24.1/g
pkg/kv/kvserver/replica_tscache.go
line 711 at r1 (raw file):
// Serialize the key range and then compress to the budget. seg = tc.Serialize(ctx, start, end) seg.Compress(budget)
We could return a boolean from Compress
, indicating whether we needed to actually compress or not, and use that to increment some metric that keeps track of the number of times we've run into budget constraints. Feel free to do this in a followup though.
pkg/kv/kvserver/client_lease_test.go
line 685 at r1 (raw file):
err = txn.Commit(ctx) require.Regexp(t, `TransactionAbortedError\(ABORT_REASON_NEW_LEASE_PREVENTS_TXN\)`, err)
Is it worth keeping a version of the test around that does run into an ABORT_REASON_NEW_LEASE_PREVENTS_TXN
? Maybe for cases where we've overshot the budget?
11b3b85
to
783e0b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)
Previously, arulajmani (Arul Ajmani) wrote…
Glad we're no longer imprecise!
Agreed. Imagine now you are a customer and you read the word "imprecise"...
Previously, arulajmani (Arul Ajmani) wrote…
They're only read during node failure, right? Or are there other cases I'm missing?
The main place where they're used is actually when a leaseholder determined that it holds the lease through a Raft snapshot.
pkg/kv/kvserver/replica_tscache.go
line 39 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Did we not want to make this setting public?
Good idea. I made both public.
pkg/kv/kvserver/replica_tscache.go
line 606 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: s/newer versions/versions >= 24.1/g
Done.
pkg/kv/kvserver/replica_tscache.go
line 711 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
We could return a boolean from
Compress
, indicating whether we needed to actually compress or not, and use that to increment some metric that keeps track of the number of times we've run into budget constraints. Feel free to do this in a followup though.
Good idea. I added a TODO for now.
pkg/kv/kvserver/client_lease_test.go
line 685 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Is it worth keeping a version of the test around that does run into an
ABORT_REASON_NEW_LEASE_PREVENTS_TXN
? Maybe for cases where we've overshot the budget?
Nice idea! Done.
… range merges Fixes cockroachdb#61986. Fixes cockroachdb#117486. Unblocks cockroachdb#118000. This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
These are rarely read, so it is not worth the extra space to store them with high precision. Instead, maximally compress them before persisting them. Epic: None Release note: None
783e0b3
to
0322959
Compare
bors r+ |
Build succeeded: |
Fixes #61986.
Fixes #117486.
Unblocks #118000.
This commit uses the new
tscache.Cache.Serialize
method introduced in #118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge.This loss of precision was a source of transaction retries for three reasons:
ABORT_REASON_NEW_LEASE_PREVENTS_TXN
reason.RETRY_SERIALIZABLE
reason.RETRY_SERIALIZABLE
reason.The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels.
This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries:
kv.lease_transfer_read_summary.local_budget
kv.lease_transfer_read_summary.global_budget
It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path.
Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.