
roachtest: run Read Committed variants of TPC-C without txn retry loops #118000

Merged


nvanbenschoten
Member

Closes #115191.
Depends on #61986.

This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in #117096. With #117630 and #61986 resolved (the fix for the latter is still in review and under development), both tests pass.

Release note: None

@nvanbenschoten nvanbenschoten requested a review from a team as a code owner January 19, 2024 22:27
@nvanbenschoten nvanbenschoten requested review from srosenberg and DarrylWong and removed request for a team January 19, 2024 22:27
@cockroach-teamcity
Member

This change is Reviewable

Collaborator

@arulajmani arulajmani left a comment


As exciting as a 2 line change gets! 🔥

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @michae2, and @srosenberg)

Collaborator

@michae2 michae2 left a comment


:lgtm: Here we go! rubs hands together

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong and @srosenberg)

This commit increases the vmodule level around transaction
pushes so that if we do see a transaction retry error, we
can debug it.

Epic: None
Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/rcNoRetryLoops branch from cf60393 to c57ce62 Compare January 24, 2024 18:46
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jan 29, 2024
… range merges

Fixes cockroachdb#61986.
Fixes cockroachdb#117486.
Unblocks cockroachdb#118000.

This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to
ship high-resolution summaries of the timestamp cache during lease transfers and
range merges. In doing so, it eliminates the loss of precision that occurs in an
incoming leaseholder's timestamp cache when it receives a lease transfer or range
merge.

This loss of precision was a source of transaction retries for three reasons:
1. txn tombstone marker keys would have their timestamp advanced, leading to
   TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN`
   reason.
2. txn push marker keys would have their timestamp advanced, leading to
   transactions having their commit timestamp pushed, which could lead to
   TransactionRetryError with the `RETRY_SERIALIZABLE` reason.
3. global keys would have their timestamp advanced as if they had been read,
   also leading to transactions having their commit timestamp pushed if they
   wrote to those keys, which could also lead to TransactionRetryError with the
   `RETRY_SERIALIZABLE` reason.
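The mechanism behind reasons 2 and 3 can be seen in a toy model. A timestamp cache records, per key, the latest timestamp at which the key may have been read, and a write must land above that timestamp. If the hand-off to the incoming leaseholder only preserves the maximum timestamp, every key inherits it, so writes to keys nobody recently read get pushed anyway. The sketch below is a simplified, hypothetical model (integer timestamps, a map instead of an interval skiplist); it is not the `tscache` package API. The same over-approximation applied to a transaction's tombstone marker key is what produces the abort in reason 1, which the sketch does not model.

```go
package main

import "fmt"

// ts is a simplified timestamp; CockroachDB uses hlc.Timestamp.
type ts int64

// tsCache is a toy timestamp cache: per-key read timestamps plus a
// low-water mark that applies to every key. Hypothetical type.
type tsCache struct {
	lowWater ts
	perKey   map[string]ts
}

// lookup returns the highest timestamp at which key may have been read.
func (c *tsCache) lookup(key string) ts {
	if t, ok := c.perKey[key]; ok && t > c.lowWater {
		return t
	}
	return c.lowWater
}

// lowResSummary models a lossy hand-off: only the maximum timestamp
// survives, so the incoming leaseholder applies it to every key.
func lowResSummary(c *tsCache) *tsCache {
	hi := c.lowWater
	for _, t := range c.perKey {
		if t > hi {
			hi = t
		}
	}
	return &tsCache{lowWater: hi, perKey: map[string]ts{}}
}

// highResSummary models shipping the per-key entries unchanged.
func highResSummary(c *tsCache) *tsCache {
	cp := make(map[string]ts, len(c.perKey))
	for k, t := range c.perKey {
		cp[k] = t
	}
	return &tsCache{lowWater: c.lowWater, perKey: cp}
}

// writeTS returns the timestamp a write to key at t ends up with: a write
// may not land at or below a prior read, so it is pushed past lookup(key).
func writeTS(c *tsCache, key string, t ts) ts {
	if r := c.lookup(key); r >= t {
		return r + 1
	}
	return t
}

func main() {
	old := &tsCache{lowWater: 10, perKey: map[string]ts{"hot": 100, "cold": 20}}
	// No real read conflicts with a write to "cold" at 50.
	fmt.Println(writeTS(highResSummary(old), "cold", 50)) // 50
	// The lossy summary pushes it above the unrelated read of "hot".
	fmt.Println(writeTS(lowResSummary(old), "cold", 50)) // 101
}
```

Under the lossy summary the write to `"cold"` is pushed past the unrelated read of `"hot"`; for a Serializable transaction that push is what can later surface as a `RETRY_SERIALIZABLE` error if its reads cannot be refreshed.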

The first issue here is the most severe, both because it cannot be refreshed
away and because it affects transactions of all isolation levels.

This commit introduces two new cluster settings to control the maximum size of
these timestamp cache read summaries:
- `kv.lease_transfer_read_summary.local_budget`
- `kv.lease_transfer_read_summary.global_budget`

It configures the local keyspace budget to 4MB and the global keyspace budget to
0B. This default configuration should be sufficient to eliminate the first two
sources of retries described above. The third has not been observed as a serious
issue in practice, so we configure the global budget to 0 so that we can hit a
serialization fast-path.
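One way to picture a byte budget on a read summary: keep as many per-key entries as fit, and fold every dropped entry into a low-water mark so the summary never under-reports a read. A budget of 0 then degenerates to "only the maximum timestamp", which needs no per-key work at all. The Go sketch below is a hypothetical illustration of that trade-off under stated assumptions (a made-up per-entry size, and a greedy policy that keeps the most recent reads); it is not the implementation of `tscache.Cache.Serialize` or of the `kv.lease_transfer_read_summary.*_budget` settings.

```go
package main

import (
	"fmt"
	"sort"
)

type ts int64

// entry is one per-key read timestamp in a toy timestamp cache.
type entry struct {
	key string
	ts  ts
}

// summary is a hypothetical read summary: explicit entries plus a
// low-water mark that conservatively covers every key not listed.
type summary struct {
	lowWater ts
	entries  []entry
}

// entrySize is an assumed per-entry cost in bytes, for illustration only.
func entrySize(e entry) int { return len(e.key) + 8 }

// serialize keeps the highest-timestamp entries that fit within budget bytes
// and folds the rest into the low-water mark. Keeping the most recent reads
// keeps the raised low-water mark as low as possible.
func serialize(lowWater ts, entries []entry, budget int) summary {
	s := summary{lowWater: lowWater}
	if budget <= 0 {
		// Fast path: no per-key detail, just the maximum timestamp.
		for _, e := range entries {
			if e.ts > s.lowWater {
				s.lowWater = e.ts
			}
		}
		return s
	}
	// Sort a copy by descending timestamp.
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ts > sorted[j].ts })
	used := 0
	for _, e := range sorted {
		if used+entrySize(e) > budget {
			// Out of budget. e has the largest remaining timestamp, so raising
			// the low-water mark to e.ts covers e and every older entry.
			if e.ts > s.lowWater {
				s.lowWater = e.ts
			}
			break
		}
		used += entrySize(e)
		s.entries = append(s.entries, e)
	}
	return s
}

func main() {
	entries := []entry{{"a", 30}, {"b", 90}, {"c", 60}}
	fmt.Println(serialize(10, entries, 0))    // {90 []}
	fmt.Println(serialize(10, entries, 10))   // {60 [{b 90}]}
	fmt.Println(serialize(10, entries, 1000)) // all three entries kept
}
```

In this model a 0B budget is exactly the lossy behavior that caused the retries above, which matches the choice to spend budget on the local keyspace (where the txn tombstone and push marker keys live) while leaving the global budget at 0.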

Release note (ops change): Transaction replay protection state is now
passed between the outgoing and incoming leaseholder for a range during
a lease transfer. This avoids cases where lease transfers can cause
transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jan 31, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 1, 2024
craig bot pushed a commit that referenced this pull request Feb 1, 2024
115746: kv: log on excessive latch hold duration r=lyang24 a=lyang24

This commit aims to improve observability by logging requests that hold latches
longer than a threshold. `long_latch_hold_duration` is a new cluster setting that
sets this latch hold time threshold; latches held beyond it are logged at most
once per second. To support this logging, `spanlatch.manager` now holds a pointer
to the cluster settings.

Fixes: #114609

Release note: None

118300: kv: ship high-resolution tscache summaries during lease transfers and range merges r=nvanbenschoten a=nvanbenschoten

Fixes #61986.
Fixes #117486.
Unblocks #118000.

This commit uses the new `tscache.Cache.Serialize` method introduced in #118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge.

This loss of precision was a source of transaction retries for three reasons:
1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason.
2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason.
3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason.

The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels.

This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries:
- `kv.lease_transfer_read_summary.local_budget`
- `kv.lease_transfer_read_summary.global_budget`

It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path.

Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.

Co-authored-by: lyang24 <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@nvanbenschoten
Member Author

bors r+

@craig
Contributor

craig bot commented Feb 2, 2024

Build succeeded:

@craig craig bot merged commit 71cab3e into cockroachdb:master Feb 2, 2024
9 checks passed
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/rcNoRetryLoops branch February 7, 2024 02:09
wenyihu6 pushed a commit to wenyihu6/cockroach that referenced this pull request Feb 21, 2024
Development

Successfully merging this pull request may close these issues.

roachtest: test tpc-c + read committed without txn retry loops
4 participants