Skip to content

v20.2.0-beta.2

We dropped this field recently, but unfortunately that wasn't safe for
mixed-version clusters. The rub is that 20.1 nodes need to roundtrip the
proto through 20.2 nodes in a fairly subtle way. When it comes back to
the 20.1 node, the descriptor needs to compare Equal() to the original.
We configure our protos to not preserve unrecognized fields, so removing
the field breaks this round-tripping.

Specifically, the scenario which broke is the following:
1. A 20.1 node processes an AdminSplit, and performs the tranction
   writing the new descriptors. The descriptors have the
   GenerationCompable field set.
2. The lease changes while the respective txn is running. The lease
   moves to a 20.2 node.
3. The 20.2 node evaluates the EndTxn, and computes the split trigger
   info that's going to be replicated. The EndTxn has split info in it
   containing the field set, but the field is dropped when converting
   that into the proposed SplitTrigger (since the 20.2 unmarshalls and
   re-marshalls the descriptors).
4. All the 20.1 replicas of the ranges involved now apply the respective
   trigger via Raft, and their in-memory state doesn't have the field
   set. This doesn't match the bytes written in the database, which have
   the field.
5. The discrepancy between the in-memory state and the db state is a
   problem, as it causes the 20.1 node to spin if it tries to perform
   subsequent merge/split operations. The reason is that the code
   performing these operations short-circuits itself if it detects that
   the descriptor has changed while the operation was running. This
   detection is performed via the generated Equals() method, and it
   mis-fires because of the phantom field. That detection happens here:
   https://github.com/cockroachdb/cockroach/blob/79c01d28da9c379f67bb41beef3d85ad3bee1da1/pkg/kv/kvserver/replica_command.go#L1957

This patch takes precautions so that we can remove the field again in
21.1 - I'm merging this in 21.1, I'll backport it to 20.2, and then I'll
come back to 20.2 and remove the field. Namely, the patch changes
RangeDesc.Equal() to ignore that field (and the method is no longer
generated).

Fixes #53535

Release note: None
Assets 2
Loading