Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: ship high-resolution tscache summaries during lease transfers and range merges #118300

Merged
merged 2 commits into from
Feb 2, 2024

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Jan 25, 2024

Fixes #61986.
Fixes #117486.
Unblocks #118000.

This commit uses the new tscache.Cache.Serialize method introduced in #118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge.

This loss of precision was a source of transaction retries for three reasons:

  1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the ABORT_REASON_NEW_LEASE_PREVENTS_TXN reason.
  2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the RETRY_SERIALIZABLE reason.
  3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the RETRY_SERIALIZABLE reason.

The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels.

This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries:

  • kv.lease_transfer_read_summary.local_budget
  • kv.lease_transfer_read_summary.global_budget

It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path.

Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tsCacheShip branch 3 times, most recently from 042e198 to 7af3033 Compare January 26, 2024 06:26
@nvanbenschoten nvanbenschoten marked this pull request as ready for review January 26, 2024 06:26
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner January 26, 2024 06:26
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tsCacheShip branch 2 times, most recently from 71457cc to 11b3b85 Compare January 31, 2024 22:47
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting to see this land! :lgtm:

Reviewed 4 of 4 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


-- commits line 14 at r1:
Glad we're no longer imprecise!


-- commits line 48 at r2:
They're only read during node failure, right? Or are there other cases I'm missing?


pkg/kv/kvserver/replica_tscache.go line 39 at r1 (raw file):

		"budget will result in loss of precision.",
	4<<20, /* 4 MB */
)

Did we not want to make this setting public?


pkg/kv/kvserver/replica_tscache.go line 606 at r1 (raw file):

			lease, _ /* nextLease */ := r.GetLease()
			// Recognize the case where a lease started recently. Lease transfers can
			// bump the ts cache low water mark. However, in newer versions of the

nit: s/newer versions/versions >= 24.1/g


pkg/kv/kvserver/replica_tscache.go line 711 at r1 (raw file):

			// Serialize the key range and then compress to the budget.
			seg = tc.Serialize(ctx, start, end)
			seg.Compress(budget)

We could return a boolean from Compress, indicating whether we needed to actually compress or not, and use that to increment some metric that keeps track of the number of times we've run into budget constraints. Feel free to do this in a followup though.


pkg/kv/kvserver/client_lease_test.go line 685 at r1 (raw file):

	err = txn.Commit(ctx)
	require.Regexp(t, `TransactionAbortedError\(ABORT_REASON_NEW_LEASE_PREVENTS_TXN\)`, err)

Is it worth keeping a version of the test around that does run into an ABORT_REASON_NEW_LEASE_PREVENTS_TXN? Maybe for cases where we've overshot the budget?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tsCacheShip branch from 11b3b85 to 783e0b3 Compare February 1, 2024 21:01
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)


-- commits line 14 at r1:

Previously, arulajmani (Arul Ajmani) wrote…

Glad we're no longer imprecise!

Agreed. Imagine now you are a customer and you read the word "imprecise"...


-- commits line 48 at r2:

Previously, arulajmani (Arul Ajmani) wrote…

They're only read during node failure, right? Or are there other cases I'm missing?

The main place where they're used is actually when a leaseholder determined that it holds the lease through a Raft snapshot.


pkg/kv/kvserver/replica_tscache.go line 39 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Did we not want to make this setting public?

Good idea. I made both public.


pkg/kv/kvserver/replica_tscache.go line 606 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: s/newer versions/versions >= 24.1/g

Done.


pkg/kv/kvserver/replica_tscache.go line 711 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

We could return a boolean from Compress, indicating whether we needed to actually compress or not, and use that to increment some metric that keeps track of the number of times we've run into budget constraints. Feel free to do this in a followup though.

Good idea. I added a TODO for now.


pkg/kv/kvserver/client_lease_test.go line 685 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Is it worth keeping a version of the test around that does run into an ABORT_REASON_NEW_LEASE_PREVENTS_TXN? Maybe for cases where we've overshot the budget?

Nice idea! Done.

… range merges

Fixes cockroachdb#61986.
Fixes cockroachdb#117486.
Unblocks cockroachdb#118000.

This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to
ship high-resolution summaries of the timestamp cache during lease transfers and
ranges merges. In doing so, it eliminates the loss of precision that occurs in an
incoming leaseholder's timestamp cache when it receives a lease transfer or range
merge.

This loss of precision was a source of transaction retries for three reasons:
1. txn tombstone marker keys would have their timestamp advanced, leading to
   TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN`
   reason.
2. txn push marker keys would have their timestamp advanced, leading to
   transactions having their commit timestamp pushed, which could lead to
   TransactionRetryError with the `RETRY_SERIALIZABLE` reason.
3. global keys would have their timestamp advanced as if they had been read,
   also leading to transactions having their commit timestamp pushed if they
   wrote to those keys, which could also lead to TransactionRetryError with the
   `RETRY_SERIALIZABLE` reason.

The first issue here is the most severe, both because it can not be refreshed
away and because it affects transactions of all isolation levels.

This commit introduces two new cluster settings to control the maximum size of
these timestamp cache read summaries:
- `kv.lease_transfer_read_summary.local_budget`
- `kv.lease_transfer_read_summary.global_budget`

It configures the local keyspace budget to 4MB and the global keyspace budget to
0B. This default configuration should be sufficient to eliminate the first two
sources of retries described above. The third has not been observed as a serious
issue in practice, so we configure the global budget to 0 so that we can hit a
serialization fast-path.

Release note (ops change): Transaction replay protection state is now
passed between the outgoing and incoming leaseholder for a range during
a lease transfer. This avoids cases where lease transfers can cause
transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
These are rarely read, so it is not worth the extra space to store them
with high precision. Instead, maximally compress them before persisting
them.

Epic: None
Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tsCacheShip branch from 783e0b3 to 0322959 Compare February 1, 2024 21:22
@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 2, 2024

Build succeeded:

@craig craig bot merged commit 3d7a44a into cockroachdb:master Feb 2, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants