Update MR docs to recommend 250ms max-offset #10988

rmloveland · 2021-07-29T19:42:56Z

cockroach-teamcity · 2021-07-29T19:43:04Z

This change is

github-actions · 2021-07-29T19:43:19Z

Files changed:

_includes/v21.1/faq/clock-synchronization-effects.md
_includes/v21.1/misc/multiregion-max-offset.md
_includes/v21.2/faq/clock-synchronization-effects.md
_includes/v21.2/misc/multiregion-max-offset.md
v21.1/choosing-a-multi-region-configuration.md
v21.1/global-tables.md
v21.1/multiregion-overview.md
v21.1/regional-tables.md
v21.1/when-to-use-regional-vs-global-tables.md
v21.2/choosing-a-multi-region-configuration.md
v21.2/global-tables.md
v21.2/multiregion-overview.md
v21.2/regional-tables.md
v21.2/when-to-use-regional-vs-global-tables.md

netlify · 2021-07-29T19:46:24Z

✔️ Netlify Preview

🔨 Explore the source changes: 98bae79

🔍 Inspect the deploy log: https://app.netlify.com/sites/cockroachdb-docs/deploys/611574bdd8f4ac00085d91bf

😎 Browse the preview: https://deploy-preview-10988--cockroachdb-docs.netlify.app

awoods187

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rmloveland)

_includes/v21.1/misc/multiregion-max-offset.md, line 1 at r1 (raw file):

For new clusters using the [multi-region SQL abstractions](multiregion-overview.html), we recommend lowering the [`--max-offset`](cockroach-start.html#flags-max-offset) setting to `250ms`.  This is especially helpful for lowering the write latency of [global tables](multiregion-overview.html#global-tables).

Maybe add a note here that says "adjusting this setting requires taking the cluster offline and should be done with care" or something similiar

v21.1/when-to-use-regional-vs-global-tables.md, line 22 at r1 (raw file):

- Your application has a "read-mostly" table of reference data that is rarely updated, and that needs to be available to all regions.
- You can accept that writes to the table will incur higher latencies from any given region, since writes use a novel [non-blocking transaction protocol](architecture/transaction-layer.html#non-blocking-transactions) that uses a timestamp "in the future".

Maybe mention here that the observed write latency is dependent on the max offset value which in turn needs to be carefully adjusted based on the clocks on your machines

rmloveland

TFTR Andy!

Nathan, are you OK with these changes?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187 and @nvanbenschoten)

_includes/v21.1/misc/multiregion-max-offset.md, line 1 at r1 (raw file):

Previously, awoods187 (Andy Woods) wrote…

Maybe add a note here that says "adjusting this setting requires taking the cluster offline and should be done with care" or something similiar

Updated to note that restart is required!

v21.1/when-to-use-regional-vs-global-tables.md, line 22 at r1 (raw file):

Previously, awoods187 (Andy Woods) wrote…

Maybe mention here that the observed write latency is dependent on the max offset value which in turn needs to be carefully adjusted based on the clocks on your machines

Added a note that this depends on max-offset!

nvanbenschoten

thanks Rich! I just have one comment on our wording around restarting the cluster to change the max-offset.

Reviewed 5 of 7 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @awoods187)

_includes/v21.1/misc/multiregion-max-offset.md, line 1 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Updated to note that restart is required!

I think we'll need to be even more explicit here. This doesn't just require restarting the nodes in the cluster, it requires restarting them all at the same time. In other words, this can't be done in a rolling fashion.

Fixes #10920

rmloveland

Thanks Nathan! Updated!

PS Sorry for the notifications noise from the huge rebase on this one. We just created the new "v21.2" folder and I had to rebase on that to get these changes into the new docs as well

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @awoods187)

_includes/v21.1/misc/multiregion-max-offset.md, line 1 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think we'll need to be even more explicit here. This doesn't just require restarting the nodes in the cluster, it requires restarting them all at the same time. In other words, this can't be done in a rolling fashion.

OK - updated to say "Note that this will require restarting all of the nodes in your cluster at the same time; it cannot be done with a rolling restart."

taroface

LGTM

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @awoods187 and @nvanbenschoten)

85983: kv: permit different max clock offsets on different nodes in same cluster r=nvanbenschoten a=nvanbenschoten Fixes #66868. Before this commit, it was not possible to change the maximum clock offset of a cluster without shutting down all nodes in the cluster and bringing them all back up with a new `--max-offset` flag. This was enforced by protection added in #9612, which would crash a node if it detected different settings on different nodes. A stop-the-world cluster restart is quite disruptive and prevents us from changing the default value of the `--max-offset` flag between releases of CockroachDB or on existing CC clusters. This commit removes this protection. The change itself is trivial. The real work is convincing ourselves that it is safe to run in a mixed max-offset cluster. We do that below. In the following table, we detail each use of `--max-offset`. We then explore the hazards of running in a cluster with mixed `--max-offset` settings. ### Uses of `--max-offset` | Use | Description | | ----------- | ----------- | | `kv.NewTxn` | Used to configure the transaction's global uncertainty limit | | `computeIntervalForNonTxn` | Used to configure the non-transacitonal request's global uncertainty limit | | `TxnCoordSender.maybeCommitWait` | **When in "linearizable" mode**, used to sleep until all clocks beyond local HLC | | `Txn.DeadlineLikelySufficient` | Used to estimate whether the current txn deadline is sufficiently far in the future | | `closedts.TargetForPolicy` | Used to estimate lead time for global tables | | `Replica.leaseStatus` | Used to determine the start of the stasis period for leases | | `startupmigrations.LeaseManager.timeRemaining` | Used to determine the time remaining on a migration lease | | `RemoteClockMonitor.VerifyClockOffset` | Used to detect clock synchronization errors | ### Hazards of mixed `--max-offset` | Use | skew < min(max-offset) | min(max-offset) < skew < max(max-offset) | max(max-offset) < skew | | ----------- | ----------- | ----------- | ----------- | | `kv.NewTxn` | N/A | Stale reads | Stale reads | | `computeIntervalForNonTxn` | N/A | Stale reads | Stale reads | | `TxnCoordSender.maybeCommitWait` | N/A | Causal reverse | Causal reverse | | `Txn.DeadlineLikelySufficient` | N/A | Deadline exceeded retry errors | Deadline exceeded retry errors | | `closedts.TargetForPolicy` | N/A | Global table reads redirect to leaseholder | Global table reads redirect to leaseholder | | `Replica.leaseStatus` | N/A | Stale reads for non-txn requests | Stale reads for non-txn requests | | `startupmigrations.LeaseManager.timeRemaining` | N/A | Lease replaced unexpectedly, [error](https://github.com/cockroachdb/cockroach/blob/8e3ee57f3c8ea734c7221ff4c758c91cd928b6d3/pkg/startupmigrations/leasemanager/lease.go#L167) | Lease replaced unexpectedly, [error](https://github.com/cockroachdb/cockroach/blob/8e3ee57f3c8ea734c7221ff4c758c91cd928b6d3/pkg/startupmigrations/leasemanager/lease.go#L167) | | `RemoteClockMonitor.VerifyClockOffset` | N/A | Nodes in minority self-terminate **if their own max-offset < skew** | Nodes in minority self-terminate | Notice that in all cases except the last, the outcome of `min(max-offset) < skew < max(max-offset)` and `max(max-offset) < skew` are identical. This means that, ignoring the last use, the hazards of `max-offset < skew` in a cluster with a homogeneous setting for `--max-offset` is the same as the hazard of `min(max-offset) < skew` in a cluster with a heterogeneous setting for `--max-offset`. As a result, we can consider `min(max-offset)` to be the "effective" max-offset of a cluster. #### Use in `Replica.leaseStatus` The use in `Replica.leaseStatus` of the max clock offset to determine whether a request falls into a lease's stasis period is a subtle case here, so it deserves a bit more explanation. As a reminder, the lease stasis period is described [here](https://github.com/cockroachdb/cockroach/blob/8e3ee57f3c8ea734c7221ff4c758c91cd928b6d3/pkg/kv/kvserver/kvserverpb/lease_status.proto#L27). Consider the case where `min(max-offset) < skew < max(max-offset)` and a request is sent to an outgoing leaseholder near the end of its lease. The interesting case is where the outgoing leaseholder has a smaller max-offset than the skew. In such cases, it may not consider a non-transaction request to be in its stasis period even though a different replica saw the lease as expired, requested a new lease, and served a write. This could allow the non-transactional request to serve a stale read. Transactional requests are [not subject](https://github.com/cockroachdb/cockroach/blob/8e3ee57f3c8ea734c7221ff4c758c91cd928b6d3/pkg/kv/kvserver/replica_range_lease.go#L611) to the stasis period, so they are not discussed in this example. However, the effect of `min(max-offset) < skew < max(max-offset)` is the same — stale reads. #### Use in `RemoteClockMonitor.VerifyClockOffset` The use of `RemoteClockMonitor.VerifyClockOffset` also deserves discussion because it's the one case where mixed `--max-offset` values make a difference. If the nodes with skew (i.e. those in the minority) have lower `--max-offset` values, they will self-terminate. However, if the nodes with skew have higher `--max-offset` values, they will not self-terminate. This could leave the cluster open to the other hazards indefinitely, without the skewed nodes ever deciding to self-terminate. For instance, these skewed nodes could continue to perform writes at high timestamps, causing other nodes with lower `--max-offset` values to serve stale reads. One option I explored to address this was to have `RemoteClockMonitor.VerifyClockOffset` use the minimum max offset across the cluster instead of its local max offset when verifying clocks. This would be possible by using the `OriginMaxOffsetNanos` value communicated in each `PingRequest`. I decided against this for now in favor of keeping things simple, as it had some rough edges around ignoring stale max-offset values. It's also not clear how this protection will work in a world with dynamic clock uncertainty error bounds. ---- After this lands, we'll need to remove the following text in the docs, which was added in cockroachdb/docs#10988 and in other PRs: ``` Note that this value must be the same on all nodes in the cluster and cannot be changed with a rolling upgrade. In order to change it, first stop every node in the cluster. Then once the entire cluster is offline, restart each node with the new value. ``` and ``` For existing clusters, changing the setting will require restarting all of the nodes in your cluster at the same time; it cannot be done with a rolling restart. ``` ---- Release note (ops change): clusters can now run nodes with different --max-offset settings at the same time. This enables operators to perform a rolling restart to change the value of each node's --max-offset flag. Co-authored-by: Nathan VanBenschoten <[email protected]>

rmloveland requested a review from nvanbenschoten July 29, 2021 21:02

awoods187 approved these changes Aug 3, 2021

View reviewed changes

rmloveland force-pushed the 20210729-250ms-to-freedom branch from f5115a2 to eac84aa Compare August 11, 2021 15:23

rmloveland commented Aug 11, 2021

View reviewed changes

nvanbenschoten approved these changes Aug 11, 2021

View reviewed changes

Update MR docs to recommend 250ms max-offset

98bae79

Fixes #10920

rmloveland force-pushed the 20210729-250ms-to-freedom branch from eac84aa to 98bae79 Compare August 12, 2021 19:21

rmloveland commented Aug 12, 2021

View reviewed changes

rmloveland requested a review from taroface August 12, 2021 19:40

nvanbenschoten approved these changes Aug 12, 2021

View reviewed changes

taroface approved these changes Aug 12, 2021

View reviewed changes

rmloveland merged commit 9db8048 into master Aug 13, 2021

rmloveland deleted the 20210729-250ms-to-freedom branch August 13, 2021 17:06

nvanbenschoten mentioned this pull request Aug 11, 2022

kv: permit different max clock offsets on different nodes in same cluster cockroachdb/cockroach#85983

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update MR docs to recommend 250ms max-offset #10988

Update MR docs to recommend 250ms max-offset #10988

rmloveland commented Jul 29, 2021

cockroach-teamcity commented Jul 29, 2021

github-actions bot commented Jul 29, 2021 •

edited

Loading

netlify bot commented Jul 29, 2021 •

edited

Loading

awoods187 left a comment

rmloveland left a comment

nvanbenschoten left a comment

rmloveland left a comment

taroface left a comment

Update MR docs to recommend 250ms max-offset #10988

Update MR docs to recommend 250ms max-offset #10988

Conversation

rmloveland commented Jul 29, 2021

cockroach-teamcity commented Jul 29, 2021

github-actions bot commented Jul 29, 2021 • edited Loading

netlify bot commented Jul 29, 2021 • edited Loading

awoods187 left a comment

Choose a reason for hiding this comment

rmloveland left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

rmloveland left a comment

Choose a reason for hiding this comment

taroface left a comment

Choose a reason for hiding this comment

github-actions bot commented Jul 29, 2021 •

edited

Loading

netlify bot commented Jul 29, 2021 •

edited

Loading