migration: introduce fence versions #57186
Conversation
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 2 of 2 files at r4, 8 of 8 files at r5, 3 of 3 files at r6, 7 of 7 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)
pkg/clusterversion/clusterversion.go, line 247 at r7 (raw file):
```go
v20_2 := roachpb.Version{Major: 20, Minor: 2}
if !(cv.Version.Less(v20_2) || cv.Version == v20_2) && (cv.Internal%2) != 0 {
	p.Printf("%d.%d(fence=%d)", cv.Major, cv.Minor, cv.Internal)
```
That format strikes me as unintuitive; how about just `p.Print(cv.Version); p.SafeString("-fence")`?
Also, will this leak out into `show cluster setting version`? That gets parsed in a few places. Maybe we don't need any special printing here?
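For context, the check under discussion hinges on the convention this PR introduces: fence versions have odd `Internal` components. A minimal, self-contained sketch of that check, using a local stand-in for `roachpb.Version` (the `isFence` helper name is hypothetical, for illustration only):

```go
package main

import "fmt"

// Version is a local stand-in for roachpb.Version, for illustration only.
type Version struct {
	Major, Minor, Patch, Internal int32
}

// Less mirrors the lexicographic ordering used by roachpb.Version.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	if v.Patch != o.Patch {
		return v.Patch < o.Patch
	}
	return v.Internal < o.Internal
}

// isFence reports whether v is a fence version: versions introduced
// during and after v21.1 use even Internal components by convention,
// so an odd Internal marks an internal fence version.
func isFence(v Version) bool {
	return v.Internal%2 != 0
}

func main() {
	v20_2 := Version{Major: 20, Minor: 2}
	cv := Version{Major: 20, Minor: 2, Internal: 3}
	// Only versions strictly after v20.2 can be fences.
	if v20_2.Less(cv) && isFence(cv) {
		fmt.Printf("%d.%d(fence=%d)\n", cv.Major, cv.Minor, cv.Internal)
	}
}
```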
pkg/clusterversion/cockroach_versions.go, line 364 at r7 (raw file):
```go
var cvs []ClusterVersion
for _, keyedV := range vs {
	if keyedV.Version.Less(from.Version) || keyedV.Version == from.Version {
```
Just offering this:

```go
// Read: "from < keyedV && keyedV <= to".
if from.Less(keyedV) && !to.Less(keyedV) {
	cvs = append(cvs, ClusterVersion{Version: keyedV.Version})
}
```
I wouldn't be opposed to introducing `LessEq` on `Version`; I think we'd find quite a few uses for it.
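A sketch of what that `LessEq` could look like, against a local stand-in for `roachpb.Version` (the real type lives in `pkg/roachpb`; this is an assumption about its shape, not the actual implementation):

```go
package main

import "fmt"

// Version is a local stand-in for roachpb.Version.
type Version struct {
	Major, Minor, Patch, Internal int32
}

// Less mirrors roachpb.Version's lexicographic ordering.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	if v.Patch != o.Patch {
		return v.Patch < o.Patch
	}
	return v.Internal < o.Internal
}

// LessEq reports whether v orders at or before o. Defining it in terms
// of Less and equality keeps the two orderings trivially consistent.
func (v Version) LessEq(o Version) bool {
	return v.Less(o) || v == o
}

func main() {
	from := Version{Major: 20, Minor: 2}
	to := Version{Major: 21, Minor: 1}
	keyedV := Version{Major: 21, Minor: 1}
	// Reads directly as "from < keyedV && keyedV <= to".
	fmt.Println(from.Less(keyedV) && keyedV.LessEq(to))
}
```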
pkg/clusterversion/cockroach_versions_test.go, line 99 at r7 (raw file):
```go
	return ClusterVersion{Version: roachpb.Version{Major: major}}
}
list := func(from, to int32) []ClusterVersion { // inclusive bounds
```
If you named the params "first" and "last", it would be obvious that they're inclusive of the endpoints.
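A sketch of that renaming, with local stand-ins for the `roachpb`/`clusterversion` types (the real helper is a closure inside the test):

```go
package main

import "fmt"

// Local stand-ins for roachpb.Version and clusterversion.ClusterVersion,
// for illustration only.
type Version struct {
	Major, Minor, Patch, Internal int32
}

type ClusterVersion struct {
	Version Version
}

// list returns the ClusterVersions with majors first..last. Naming the
// parameters "first" and "last" makes the inclusive bounds self-evident,
// where "from" and "to" would leave it ambiguous.
func list(first, last int32) []ClusterVersion {
	var cvs []ClusterVersion
	for major := first; major <= last; major++ {
		cvs = append(cvs, ClusterVersion{Version: Version{Major: major}})
	}
	return cvs
}

func main() {
	fmt.Println(len(list(2, 5))) // majors 2, 3, 4, 5
}
```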
pkg/migration/manager.go, line 249 at r7 (raw file):
```go
// If the v21.1 node is only added after v21.2(fence=1) is active,
// it won't be able to actually join the cluster (it'll be prevented
// by the join RPC).
```
This means that once we've seen the node list stabilize (as EveryNode enforces), any new node that can join the cluster will run a release that supports the fence version, and by design also supports the actual version (which is the direct successor of the fence).
pkg/migration/manager.go, line 308 at r7 (raw file):
```go
// to slot in fence versions for each cluster version. See top-level
// documentation in pkg/clusterversion for more details.
func FenceVersionFor(
```
This doesn't need to be exported. Are you doing this for docs? I'm noticing that there are lots of good comments in this package, but most of them wouldn't be visible to godoc. Have you considered the pattern Nathan uses in #56392?
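For reference, the fence computation itself is small. A sketch with local stand-in types (the real function operates on `clusterversion.ClusterVersion`; the decrement relies on the invariant that user-defined versions always carry even `Internal` components, so the odd slot just below each one is reserved for its fence):

```go
package main

import "fmt"

// Local stand-ins for roachpb.Version and clusterversion.ClusterVersion.
type Version struct {
	Major, Minor, Patch, Internal int32
}

type ClusterVersion struct {
	Version Version
}

// fenceVersionFor returns the fence version immediately preceding cv.
// Because user-defined versions have even Internal components, the odd
// Internal slot just below cv is reserved for its fence. Unexported here,
// per the review suggestion; sketch only.
func fenceVersionFor(cv ClusterVersion) ClusterVersion {
	fence := cv
	fence.Version.Internal--
	return fence
}

func main() {
	cv := ClusterVersion{Version: Version{Major: 21, Minor: 2, Internal: 4}}
	fence := fenceVersionFor(cv)
	fmt.Printf("%d.%d(fence=%d)\n", fence.Version.Major, fence.Version.Minor, fence.Version.Internal)
}
```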
Force-pushed from e43197b to 6f5162c.
For versions introduced during and after the v21.1 release, we introduce the constraint that internal versions must always be even. We intend to use the odd versions as internal fence versions for the long-running migrations infrastructure. This will come in future commits.

Release note: None
Force-pushed from ca4c564 to 33891b5.
bors r+

Build failed.
The migrations infrastructure makes use of internal fence versions when stepping through consecutive versions. We'll need to first bump the fence version for each intermediate cluster version before bumping the "real" one. Doing so allows us to provide the invariant that whenever a cluster version is active, all nodes in the cluster (including ones added concurrently during version upgrades) are running binaries that know about the version.

It's instructive to walk through how we expect a version migration from v21.1 to v21.2 to take place, and how we behave in the presence of new v21.1 or v21.2 nodes being added to the cluster.

- All nodes are running v21.1.
- All nodes are rolled into v21.2 binaries, but the active cluster version is still v21.1.
- The first version bump will be into v21.2(fence=1); see the migration manager above for where that happens.

Then, concurrently:

- A new node is added to the cluster, but running binary v21.1.
- We try bumping the cluster gates to v21.2(fence=1).

If the v21.1 node manages to sneak in before the version bump, that's fine, as the version bump is a no-op one (all fence versions are). Any subsequent bumps (including the "actual" one bumping to v21.2) will fail during the validation step, where we'll first check to see that all nodes are running v21.2 binaries. If the v21.1 node is only added after v21.2(fence=1) is active, it won't be able to actually join the cluster (it'll be prevented by the join RPC).

We rely on the invariant we introduced earlier that requires new user-defined versions (users being crdb engineers) to always have even-numbered Internal versions. This reserves the odd numbers to slot in fence versions for each user-defined one.

Release note: None
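The bump sequence described above can be sketched as a loop. `bumpClusterVersion` and `validateBinaryVersions` are hypothetical stand-ins for the fan-out RPCs the real manager issues to every node; the types are local stand-ins as well:

```go
package main

import "fmt"

// ClusterVersion is a local stand-in for clusterversion.ClusterVersion.
type ClusterVersion struct {
	Major, Minor, Internal int32
}

// fenceVersionFor returns the odd-Internal fence version preceding cv
// (user-defined versions always carry even Internal components).
func fenceVersionFor(cv ClusterVersion) ClusterVersion {
	cv.Internal--
	return cv
}

// bumpClusterVersion and validateBinaryVersions are hypothetical: in the
// real manager these fan out to every node in the cluster.
func bumpClusterVersion(cv ClusterVersion) error {
	fmt.Printf("bumping to %d.%d(internal=%d)\n", cv.Major, cv.Minor, cv.Internal)
	return nil
}

func validateBinaryVersions(cv ClusterVersion) error {
	// Every node must run a binary that knows about cv.
	return nil
}

// migrateTo steps through each intermediate cluster version, bumping the
// fence before the version itself. A node that joins after the fence bump
// is guaranteed (via the join RPC) to run a binary that knows about cv.
func migrateTo(cvs []ClusterVersion) error {
	for _, cv := range cvs {
		// The fence bump itself is a no-op, but it pins the set of nodes
		// that may join before cv activates.
		if err := bumpClusterVersion(fenceVersionFor(cv)); err != nil {
			return err
		}
		if err := validateBinaryVersions(cv); err != nil {
			return err
		}
		if err := bumpClusterVersion(cv); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	_ = migrateTo([]ClusterVersion{{Major: 21, Minor: 2, Internal: 2}})
}
```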
Force-pushed from 33891b5 to 6787eec.
bors r+

Build succeeded.
Only the last two commits here are of interest. Earlier commits are from
#57154 and #57118.