Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: test tpc-c + read committed without txn retry loops #115191

Closed
nvanbenschoten opened this issue Nov 28, 2023 · 2 comments · Fixed by #118000
Closed

roachtest: test tpc-c + read committed without txn retry loops #115191

nvanbenschoten opened this issue Nov 28, 2023 · 2 comments · Fixed by #118000
Assignees
Labels
A-read-committed Related to the introduction of Read Committed A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Nov 28, 2023

In #100176, we added support for Read Committed to TPC-C.

Part of the motivation for Read Committed is that it does not require transaction retry loops. We should test this by removing the use of the cockroach-go retry loop from the workload and running it without --tolerate-errors. The workload should survive.
Epic: CRDB-34183

Jira issue: CRDB-34185

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-testing Testing tools and infrastructure A-read-committed Related to the introduction of Read Committed labels Nov 28, 2023
@nvanbenschoten
Copy link
Member Author

The code changes for this might look something like nvanbenschoten@8839cc2. We should probably expose a flag to disable retry loops in the workload.

The interesting part will be running in this mode and seeing what breaks.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Dec 26, 2023
Informs cockroachdb#115191.

This commit adds a `--txn-retries` flag to `tpcc`, allowing users of the
workload to disable transaction retry loops for 40001 errors.

For Serializable transactions (controllable with `--isolation-level`),
this quickly leads to errors being thrown by the workload. For Read
Committed transaction, the workload eventually hits an error due to
a lease transfer. Combined with a prototype fix for cockroachdb#61986, Read
Committed transactions survive for at least a few minutes without
error.

Release note: None
@nvanbenschoten nvanbenschoten self-assigned this Jan 2, 2024
@nvanbenschoten
Copy link
Member Author

As part of this issue, we should add a roachtest that runs TPC-C without retry loops and measures the error rate.

craig bot pushed a commit that referenced this issue Jan 2, 2024
117016: kv/closedts: remove use and propagation of synthetic timestamp bit  r=nvanbenschoten a=nvanbenschoten

Informs #101938.

This PR removes the handling of synthetic timestamps from the closed timestamp tracker data structure.

It then stops setting the synthetic timestamp bit on the closed timestamps selected for ranges with a `LEAD_FOR_GLOBAL_READS` closed timestamp policy.

This flag has been deprecated since v22.2 and is no longer consulted in uncertainty interval checks or by transaction commit-wait. It does not need to be propagated from evaluating requests to the closed timestamp.

Release note: None

117096: workload/tpcc: optionally disable txn retry loops r=michae2 a=nvanbenschoten

Informs #115191.

This commit adds a `--txn-retries` flag to `tpcc`, allowing users of the workload to disable transaction retry loops for 40001 errors.

For Serializable transactions (controllable with `--isolation-level`), this quickly leads to errors being thrown by the workload. For Read Committed transaction, the workload eventually hits an error due to a lease transfer. Combined with a prototype fix for #61986, Read Committed transactions survive for at least a few minutes without error.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 16, 2024
…hange

Fixes cockroachdb#117630.
Fixes cockroachdb#90656.
Fixes cockroachdb#98553.

Informs cockroachdb#61986.
Informs cockroachdb#115191.

This commit updates the post-lease transfer promotion of expiration-based leases
to epoch-based leases to not change the sequence number of the lease. This
avoids invalidating all requests proposed under the original expiration-based
lease, which can lead to `RETRY_ASYNC_WRITE_FAILURE` errors.

The change accomplishes this by updating the `Lease.Equivalent` method to
consider an expiration-based lease to be equivalent to an epoch-based lease that
is held by the same replica and has the same start time. Doing so requires some
care, because lease equivalency is checked below Raft and needs to remain
deterministic across binary versions.

This change requires a cluster version check, so it cannot be backported.

Release note (bug fix): Improved an interaction during range lease transfers
which could previously cause `RETRY_ASYNC_WRITE_FAILURE` errors to be returned
to clients.
craig bot pushed a commit that referenced this issue Jan 19, 2024
117840: kv: promote expiration-based lease to epoch without sequence number change r=erikgrinaker a=nvanbenschoten

Fixes #117630.
Fixes #90656.
Fixes #98553.

Informs #61986.
Informs #115191.

This commit updates the post-lease transfer promotion of expiration-based leases to epoch-based leases to not change the sequence number of the lease. This avoids invalidating all requests proposed under the original expiration-based lease, which can lead to `RETRY_ASYNC_WRITE_FAILURE` errors.

The change accomplishes this by updating the `Lease.Equivalent` method to consider an expiration-based lease to be equivalent to an epoch-based lease that is held by the same replica and has the same start time. Doing so requires some care, because lease equivalency is checked below Raft and needs to remain deterministic across binary versions.

This change requires a cluster version check, so it cannot be backported.

Release note (bug fix): Improved an interaction during range lease transfers which could previously cause `RETRY_ASYNC_WRITE_FAILURE` errors to be returned to clients.

117899: backupccl: skip `TestBackupRestoreAppend` under `deadlock` r=rail a=rickystewart

These tests are likely to time out.

Epic: CRDB-8308
Release note: None

117940: backupccl,sql: skip a couple more tests under duress r=rail a=rickystewart

These tests are all timing out. For the failures that seem suspect in some way, I have filed GitHub issues.

Epic: CRDB-8308
Release note: None

117950: copy: skip TestCopyFromRetries for now r=yuzefovich a=yuzefovich

We recently expanded this test and it became flaky. Skip it until we stabilize it.

Informs: #117912.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 19, 2024
Closes cockroachdb#115191.
Depends on cockroachdb#61986.

This commit switches the two nightly Read Committed variants of the TPC-C
roachtest to run without transaction retry loops, using the `--txn-retries` flag
introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which
is still in review and under development), these tests both pass.

Release note: None
craig bot pushed a commit that referenced this issue Feb 2, 2024
118000: roachtest: run Read Committed variants of TPC-C without txn retry loops r=nvanbenschoten a=nvanbenschoten

Closes #115191.
Depends on #61986.

This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in #117096. With #117630 and #61986 resolved (the latter of which is still in review and under development), these tests both pass.

Release note: None

118600: gceworker bootstrap: fix checksum for go download r=rickystewart a=jlinder

On the last go version upgrade to cockroach, the wrong checksum was entered in the gceworker bootstrap script for the downloaded tar file. This fixes it to be the correct checksum.

Epic: none
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: James H. Linder <[email protected]>
@craig craig bot closed this as completed in 9002c19 Feb 2, 2024
cockroach-dev-inf pushed a commit that referenced this issue Feb 15, 2024
Informs #115191.

This commit adds a `--txn-retries` flag to `tpcc`, allowing users of the
workload to disable transaction retry loops for 40001 errors.

For Serializable transactions (controllable with `--isolation-level`),
this quickly leads to errors being thrown by the workload. For Read
Committed transaction, the workload eventually hits an error due to
a lease transfer. Combined with a prototype fix for #61986, Read
Committed transactions survive for at least a few minutes without
error.

Release note: None
wenyihu6 pushed a commit to wenyihu6/cockroach that referenced this issue Feb 21, 2024
Closes cockroachdb#115191.
Depends on cockroachdb#61986.

This commit switches the two nightly Read Committed variants of the TPC-C
roachtest to run without transaction retry loops, using the `--txn-retries` flag
introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which
is still in review and under development), these tests both pass.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-read-committed Related to the introduction of Read Committed A-testing Testing tools and infrastructure C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant