-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: replica inconsistency false positive in mixed-version cluster with global table #117302
Labels
A-kv-transactions
Relating to MVCC and the transactional model.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Comments
nvanbenschoten
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-kv-transactions
Relating to MVCC and the transactional model.
GA-blocker
branch-release-23.2
Used to mark GA and release blockers, technical advisories, and bugs for 23.2
labels
Jan 3, 2024
This was referenced Jan 3, 2024
nvanbenschoten
changed the title
kv: replica inconsistency in mixed-version cluster with global table
kv: replica inconsistency false positive in mixed-version cluster with global table
Jan 4, 2024
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 4, 2024
…y checks Possible fix for cockroachdb#117302. This commit bumps the ReplicaChecksumVersion to disable replica consistency checks between v23.1 and v23.2 nodes when in a mixed-version cluster. This avoids the backwards incompatibility discussed in cockroachdb#117302. While here and permitted to change the replica consistency check logic, we unset the Synthetic flag from RangeAppliedState.RaftClosedTimestamp during stats-only consistency checks. This form of consistency check is rarely used, but this prevents it from causing trouble with cockroachdb#101938. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 4, 2024
…y checks Informs cockroachdb#101938. This commit bumps the ReplicaChecksumVersion to disable replica consistency checks between v23.2 and v24.1 nodes when in a mixed-version cluster. This avoids the backwards incompatibility discussed in cockroachdb#117302. While here and permitted to change the replica consistency check logic, we unset the Synthetic flag from RangeAppliedState.RaftClosedTimestamp during stats-only consistency checks. This form of consistency check is rarely used, but this prevents it from causing trouble with cockroachdb#101938. While here and allowed to change the consistency check hash computation, we also switch from using the LegacyTimestamp encoding to the Timestamp encoding for the hash contribution of MVCCKeys. Release note: None
nvanbenschoten
removed
GA-blocker
branch-release-23.2
Used to mark GA and release blockers, technical advisories, and bugs for 23.2
branch-release-23.2.0-rc
(deleted)
labels
Jan 5, 2024
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 10, 2024
…ty=global/reads=strong Informs cockroachdb#117302. Now that cockroachdb#117302 is fixed on master, we can unskip this test. Release note: None
craig bot
pushed a commit
that referenced
this issue
Jan 10, 2024
117578: packer: install `protoc` on TC agents r=rail,pawalt a=rickystewart Epic: None Part of: DEVINF-1016 Release note: None 117620: roachtest: unskip follower-reads/mixed-version/survival=region/locality=global/reads=strong r=nvanbenschoten a=nvanbenschoten Informs #117302. Now that #117302 is fixed on master, we can unskip this test. Release note: None 117622: kvclient: remove deprecated NLHE field r=arulajmani a=andrewbaptist The speculative leases field on the NLHE was introduced for compabitility between 22.1 and 22.2. It is no longer needed and removing the field cleans up some exisiting code. Epic: none Release note: None 117637: roachtest: update cdc/filtering tests to work with duplicate events r=miretskiy a=andyyang890 This patch updates the `cdc/filtering` tests to not fail when duplicate events are emitted by the changefeed. Fixes #117590 Release note: None Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Andrew Baptist <[email protected]> Co-authored-by: Andy Yang <[email protected]>
mgartner
pushed a commit
to mgartner/cockroach
that referenced
this issue
Jan 12, 2024
…y checks Informs cockroachdb#101938. This commit bumps the ReplicaChecksumVersion to disable replica consistency checks between v23.2 and v24.1 nodes when in a mixed-version cluster. This avoids the backwards incompatibility discussed in cockroachdb#117302. While here and permitted to change the replica consistency check logic, we unset the Synthetic flag from RangeAppliedState.RaftClosedTimestamp during stats-only consistency checks. This form of consistency check is rarely used, but this prevents it from causing trouble with cockroachdb#101938. While here and allowed to change the consistency check hash computation, we also switch from using the LegacyTimestamp encoding to the Timestamp encoding for the hash contribution of MVCCKeys. Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-transactions
Relating to MVCC and the transactional model.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
In #105523 (v23.2), we removed the logic to decode the synthetic timestamp bit from mvcc keys. Keys with synthetic timestamps set would be decoded as if their synthetic timestamp bit was not set. Under normal operation, this was safe, as this field is no longer consulted and is being phased out (#101938).
The idea was that this would not cause issues with mixed-version clusters (v23.1 + v23.2) because replica consistency checks operated at the encoded
EngineKey
level, so previously present synthetic timestamps would be included in replica consistency checksum even from v23.2 nodes.However, in the process of completing the migration and adding more mixed-version testing (#117294), I am finding that this assumption is false. Replica consistency checks operate at the
MVCCKey
level, meaning that they decode each encoded pebble key into an MVCC key and then hash the timestamp component separately:cockroach/pkg/kv/kvserver/replica_consistency.go
Lines 559 to 565 in 797e658
This is a problem. If a v23.2 node does not decode the synthetic timestamp bit and a v23.1 node does, then it will hash the timestamp component differently than a v23.1 node will, and the replica consistency check will fail. I can reproduce this with the following script:
Mitigation and Next Steps
Option 1
One mitigation is to revert most or all of #105523. We need to keep decoding synthetic timestamps in MVCC keys until the consistency checks stop consulting them.
To complete the migration, we'll then need to phase out synthetic timestamps in replica consistency checks more slowly. We'll add a version gate flag that gets plumbed down to
CalcReplicaDigest
and is used to selectively ignore synthetic timestamps.Option 2
Another mitigation is to bump the
ReplicaChecksumVersion
in v23.2, so that v23.1 and v23.2 nodes don't mix when performing consistency checks. This would be the safest fix and would also make the migration easy, but it would disable replica consistency checks between v23.1 and v23.2 nodes when in a mixed-version, which might be too much of a loss of coverage to justify.Jira issue: CRDB-35111
The text was updated successfully, but these errors were encountered: