backport-2.1: one week of merge PRs #28994

benesch · 2018-08-23T03:46:36Z

Backport:

1/1 commits from "storage: discard a unworthwhile merge TODO" (#28885)
3/3 commits from "storage: gate merges behind a cluster setting" (#28865)
1/1 commits from "storage: update the tscache appropriately after a merge" (#28793)
1/1 commits from "storage: avoid stopping twice in merge test" (#28902)
1/1 commits from "roachpb,storage: rename GetSnapshotForMerge to Subsume" (#28910)
1/1 commits from "build: fix generation rules for settings.html" (#28884)
1/1 commits from "storage: deflake TestStoreRangeMergeTimestampCacheCausality" (#28928)
1/1 commits from "storage: check for in-progress merge before installing new lease" (#28894)
7/8 commits from "storage: enable merge queue by default" (#28961)

Please see individual PRs for details.

/cc @cockroachdb/release

This TODO suggested removing maybeWatchForMerge by teaching the LHS replica to reach into the RHS replica and unblock requests after a merge committed. It failed to consider that we'd need to do the same if the merge aborted. We don't presently have an abort trigger, so this would be excessively difficult. Simply discard the TODO; the status quo is fine. Release note: None

Store.SplitRange was still using the old term "range" to refer to replicas. Also use "left" and "right" instead of "orig" and "new" for consistency with Store.MergeRange. Release note: None

It is important that the in-memory copy of a replica's range descriptor exactly match the on-disk copy. Previously, during splits/merges, we would reconstruct the updates to the in-memory descriptor piecemeal. This is dangerous, especially in mixed version clusters, where nodes can disagree about what updates to the descriptor to perform. Notably, splits only update the generation counter in v2.1. Instead of trying to reconstruct the updates piecemeal during a split or merge, simply adopt the updated descriptor from the split/merge trigger wholesale. We'd ideally adjust replica changes to operate the same way. Unfortunately the updated descriptor is not currently included in the ChangeReplicasTrigger, so the migration is rather involved. Release note: None
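A minimal sketch of the difference between piecemeal and wholesale descriptor updates. The `rangeDescriptor`, `splitTrigger`, and `replica` types below are hypothetical simplifications, not CockroachDB's types, and the generation values are illustrative. The piecemeal path only updates the fields this node thinks changed, so it can drift from what the proposer wrote to disk; the wholesale path cannot.

```go
package main

import "fmt"

// rangeDescriptor is a hypothetical, simplified range descriptor.
type rangeDescriptor struct {
	StartKey, EndKey string
	Generation       int64
}

// splitTrigger is hypothetical; it carries the updated descriptors exactly as
// they were committed to disk by the proposer.
type splitTrigger struct {
	LeftDesc, RightDesc rangeDescriptor
}

type replica struct {
	desc rangeDescriptor // in-memory copy; must match the on-disk copy exactly
}

// applySplitPiecemeal reconstructs the update field by field. If this node's
// code disagrees with the proposer's about which fields change (for example,
// whether to bump Generation), the in-memory copy silently diverges.
func (r *replica) applySplitPiecemeal(t splitTrigger) {
	r.desc.EndKey = t.RightDesc.StartKey
}

// applySplitWholesale adopts the descriptor from the trigger as-is.
func (r *replica) applySplitWholesale(t splitTrigger) {
	r.desc = t.LeftDesc
}

func main() {
	orig := rangeDescriptor{StartKey: "a", EndKey: "z", Generation: 1}
	t := splitTrigger{
		LeftDesc:  rangeDescriptor{StartKey: "a", EndKey: "m", Generation: 2},
		RightDesc: rangeDescriptor{StartKey: "m", EndKey: "z", Generation: 2},
	}
	a := &replica{desc: orig}
	a.applySplitPiecemeal(t)
	b := &replica{desc: orig}
	b.applySplitWholesale(t)
	fmt.Println(a.desc == t.LeftDesc, b.desc == t.LeftDesc) // prints "false true"
}
```

The piecemeal replica ends up with the right bounds but a stale generation, which is exactly the kind of silent mismatch the commit avoids.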

Merges will explode if used in a mixed-version cluster. Gate them behind a cluster setting. For extra safety, also gate increments of the new generation counter in the range descriptor behind the same cluster setting. It's not clear exactly what would go wrong, if anything, if mixed-version clusters incremented the generation counter, but better to be extra cautious. Release note: None
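A rough sketch of gating both merges and generation-counter increments behind one flag. The `settings` type, its `mergesEnabled` field, and `errMergesDisabled` are hypothetical stand-ins, not the real cluster-settings API; the point is only that both code paths consult the same gate.

```go
package main

import (
	"errors"
	"fmt"
)

// settings is a hypothetical stand-in for cluster settings.
type settings struct {
	mergesEnabled bool // hypothetical gate, opened once the cluster is no longer mixed-version
}

// rangeDescriptor models only the field relevant to this sketch.
type rangeDescriptor struct {
	Generation int64
}

var errMergesDisabled = errors.New("range merges are disabled by cluster setting")

// adminMerge refuses to start a merge unless the gate is open.
func adminMerge(s *settings) error {
	if !s.mergesEnabled {
		return errMergesDisabled
	}
	// The real merge would run here.
	return nil
}

// maybeBumpGeneration increments the generation counter only behind the same gate.
func maybeBumpGeneration(s *settings, desc *rangeDescriptor) {
	if s.mergesEnabled {
		desc.Generation++
	}
}

func main() {
	s := &settings{}
	desc := &rangeDescriptor{}
	fmt.Println(adminMerge(s)) // gate closed: merge refused
	maybeBumpGeneration(s, desc)
	fmt.Println(desc.Generation) // prints 0

	s.mergesEnabled = true
	fmt.Println(adminMerge(s)) // prints <nil>
	maybeBumpGeneration(s, desc)
	fmt.Println(desc.Generation) // prints 1
}
```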

When applying a merge, teach the leaseholder of the LHS range to update its timestamp cache for the keyspace previously owned by the RHS range appropriately. Release note: None

TestStoreRangeMergeDuringShutdown shuts down the multiTestContext when it applies a lease for the RHS. In rare cases the lease application can be replayed; previously this caused the multiTestContext to be shut down twice, which panics. Add additional state to prevent this case. Fix cockroachdb#28894. Release note: None
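One way to make a replayed lease application harmless is to guard the shutdown with state that records whether it already ran. This sketch uses `sync.Once` for that state; the commit only says "additional state", so `sync.Once`, `testStopper`, and `leaseHandler` are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// testStopper is hypothetical; like the test context described above,
// its Stop panics if called twice.
type testStopper struct {
	stopped bool
}

func (s *testStopper) Stop() {
	if s.stopped {
		panic("stopped twice")
	}
	s.stopped = true
}

// leaseHandler stops the context the first time an RHS lease is applied.
// The sync.Once is the extra state: a replayed application becomes a no-op.
type leaseHandler struct {
	once    sync.Once
	stopper *testStopper
}

func (h *leaseHandler) onLeaseApplied() {
	h.once.Do(h.stopper.Stop)
}

func main() {
	h := &leaseHandler{stopper: &testStopper{}}
	h.onLeaseApplied()
	h.onLeaseApplied() // replayed application: Stop is not called again
	fmt.Println("stopped:", h.stopper.stopped) // prints "stopped: true"
}
```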

GetSnapshotForMerge no longer fetches a snapshot of the RHS of a merge. Rename the method Subsume to better reflect its purpose: to freeze a range before it is subsumed by its left-hand neighbor. Release note: None

The rules for generating the settings HTML table got broken at some point. Fix them by using target-specific variables properly: they only apply to prerequisites of the declaring target, and are only resolved within recipes. Release note: None

TestStoreRangeMergeTimestampCacheCausality could time out if the merge transaction retried, which occurred about once every two hundred runs, because it would try to send multiple values over a channel whose buffer had room for only one value. Deflake the test by remembering in a variable the value from the last merge transaction to execute rather than using channels. Release note: None
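A small sketch contrasting the two approaches. A channel created with `make(chan string, 1)` accepts one send without a receiver; a second send blocks, which is what a retried transaction ran into. A mutex-protected variable simply keeps the latest value. The `lastMergeTxn` type and the attempt strings are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// lastMergeTxn remembers only the most recent value, so a retried
// transaction overwrites it instead of blocking.
type lastMergeTxn struct {
	mu  sync.Mutex
	val string
}

func (l *lastMergeTxn) record(v string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.val = v
}

func (l *lastMergeTxn) get() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.val
}

func main() {
	// The flaky pattern: a buffer of one has no room for a retry's value.
	ch := make(chan string, 1)
	ch <- "txn-attempt-1"
	select {
	case ch <- "txn-attempt-2":
		fmt.Println("sent")
	default:
		fmt.Println("buffer full: a plain send would block here")
	}

	// The deflaked pattern: retries just overwrite the variable.
	var last lastMergeTxn
	for _, attempt := range []string{"txn-attempt-1", "txn-attempt-2"} {
		last.record(attempt)
	}
	fmt.Println(last.get()) // prints "txn-attempt-2"
}
```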

During a merge, it is possible for the RHS lease to change hands, e.g., when the original leaseholder dies and another member of the range acquires the lease. In this case, the new leaseholder is responsible for checking for a deletion intent on its local range descriptor; if it discovers such an intent, a merge is in progress and the leaseholder is required to block all traffic unless it can prove that the merge aborted. The previous implementation of this scheme had a small window in which the new leaseholder had installed a valid lease but had not yet installed a mergeComplete channel to block all traffic. This race was never seen in practice, but it could, in theory, lead to a serializability violation. Reorder the flow post-lease acquisition so that checking for an in-progress merge occurs before the new lease is installed. Release note: None
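A deliberately simplified sketch of the reordering: the merge check runs, and the blocking channel is set up, before the lease becomes visible. Concurrency control is omitted; the `replica` type, its fields, and `canServe` are hypothetical and only illustrate why the order of the two steps matters. Closing the channel stands in for the merge being resolved.

```go
package main

import "fmt"

// replica is hypothetical and heavily simplified.
type replica struct {
	lease                 *string       // nil until a lease is installed
	mergeComplete         chan struct{} // non-nil while a merge may be in progress
	descHasDeletionIntent bool          // stands in for reading the local range descriptor
}

// installLease checks for an in-progress merge before installing the lease,
// so there is no window with a valid lease but no mergeComplete channel.
func (r *replica) installLease(holder string) {
	if r.descHasDeletionIntent {
		r.mergeComplete = make(chan struct{})
	}
	r.lease = &holder
}

// canServe reports whether a request may be evaluated right now.
func (r *replica) canServe() bool {
	if r.lease == nil {
		return false
	}
	if r.mergeComplete != nil {
		select {
		case <-r.mergeComplete: // merge resolved
		default:
			return false // block all traffic until the merge resolves
		}
	}
	return true
}

func main() {
	r := &replica{descHasDeletionIntent: true}
	r.installLease("n2")
	fmt.Println(r.canServe()) // prints false
	close(r.mergeComplete)
	fmt.Println(r.canServe()) // prints true
}
```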

Now that merges do not include a snapshot of the RHS data in the merge trigger, we no longer need a setting limiting the size of the RHS of a merge. Release note: None

Merges are relatively expensive. Set the merge queue interval to one second so we avoid processing too many merges at once. Introduce a cluster setting to allow users/tests to adjust the merge queue interval if they so choose. Fix cockroachdb#27769. Release note: None

The retry loop in AdminSplit can span many seconds. In that time, the replica may lose its lease, or the range might be merged away entirely. In either of those cases, the split can never succeed, and so the retry loop needs to give up. The loop was properly exiting if it noticed it lost its lease, but a range can get merged away without losing its lease. The final lease on that range remains valid until the liveness epoch it is tied to expires. Teach the loop to notice that condition too by checking Replica.IsDestroyed on every turn of the loop. Release note: None

Teach TestSystemZoneConfigs to install zone configs via SQL, rather than the hacky testing override system, which interacts poorly with the forthcoming on-by-default merge queue. Release note: None

Guarantee session consistency for SET CLUSTER SETTING. That is, a session that executes a SET CLUSTER SETTING in a transaction is guaranteed to use that new value of the cluster setting after the transaction commits. (Unless, of course, there are concurrent updates to the setting.) Release note: None

Turn off the merge queue in all tests that need it disabled. The actual default will be changed in a separate PR so that this commit can be safely backported to release-2.1. Release note: None

Splitting while the merge queue is enabled is almost certainly a user mistake. Add a best-effort check to prevent users from splitting while the merge queue is enabled. Users can override the check and request a split anyway by twiddling a new session variable, experimental_force_split_at. We have plans to eventually make the splits created by SPLIT AT "sticky", so that the merge queue does not immediately merge them away, but not in time for 2.1. Release note: None
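A minimal sketch of the best-effort check and its override. `sessionVars`, its `forceSplitAt` field (modeling `experimental_force_split_at`), `checkSplitAllowed`, and the error text are hypothetical; the real check's wording and plumbing may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// sessionVars is hypothetical; forceSplitAt models experimental_force_split_at.
type sessionVars struct {
	forceSplitAt bool
}

// Illustrative error text only, not the real message.
var errSplitWithMergeQueue = errors.New(
	"split requested while the merge queue is enabled; " +
		"set experimental_force_split_at to override")

// checkSplitAllowed refuses SPLIT AT while the merge queue is on,
// unless the session explicitly forces it.
func checkSplitAllowed(mergeQueueEnabled bool, sv sessionVars) error {
	if mergeQueueEnabled && !sv.forceSplitAt {
		return errSplitWithMergeQueue
	}
	return nil
}

func main() {
	fmt.Println(checkSplitAllowed(true, sessionVars{}))                   // refused
	fmt.Println(checkSplitAllowed(true, sessionVars{forceSplitAt: true})) // prints <nil>
	fmt.Println(checkSplitAllowed(false, sessionVars{}))                  // prints <nil>
}
```

The check is best-effort: the merge queue may still merge the new ranges away later, which is why making SPLIT AT splits "sticky" is left for future work.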

cockroach-teamcity · 2018-08-23T03:46:55Z

This change is

benesch · 2018-08-27T03:26:12Z

Superseded by #29082.

benesch added 17 commits August 22, 2018 23:42

storage: update variable names in Store.SplitRange

88b806a

storage: update the tscache appropriately after a merge

16057d2

roachpb,storage: rename GetSnapshotForMerge to Subsume

9338765

storage: remove MergeMaxRHSSize setting

b7e9b29

storage: update zone config installation in TestSystemZoneConfigs

d164cb6

storage: prepare to enable merge queue by default

911812b

benesch requested review from tbg, nvanbenschoten and a team August 23, 2018 03:46

benesch requested a review from a team as a code owner August 23, 2018 03:46

benesch requested review from a team August 23, 2018 03:46

benesch closed this Aug 27, 2018

benesch deleted the backport2.1-28885-28865-28793-28902-28910-28884-28928-28894-28961 branch August 27, 2018 03:26