
migration: introduce fence versions #57186

Merged
merged 2 commits into cockroachdb:master
Nov 28, 2020

Conversation

irfansharif
Contributor

The migrations infrastructure makes use of internal fence versions when
stepping through consecutive versions. We'll need to first bump the
fence version for each intermediate cluster version, before bumping the
"real" one. Doing so allows us to provide the invariant that whenever a
cluster version is active, all nodes in the cluster (including ones
added concurrently during version upgrades) are running binaries that
know about the version.

It's instructive to walk through how we expect a version migration from
v21.1 to v21.2 to take place, and how we behave in the presence of new
v21.1 or v21.2 nodes being added to the cluster.

  • All nodes are running v21.1
  • All nodes are rolled onto v21.2 binaries, but the active cluster
    version is still v21.1
  • The first version bump will be into v21.2(fence=1); see the
    migration manager for where that happens

Then, concurrently:
  • A new node is added to the cluster, but running the v21.1 binary
  • We try bumping the cluster gates to v21.2(fence=1)

If the v21.1 node manages to sneak in before the version bump, that's
fine, as the fence version bump is a no-op (all fence version bumps
are). Any subsequent bumps (including the "actual" one to v21.2) will
fail during the validation step, where we first check that all nodes
are running v21.2 binaries.

If the v21.1 node is only added after v21.2(fence=1) is active, it won't
be able to actually join the cluster (it'll be prevented by the join
RPC).
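The two-step bump described above can be sketched as a loop over the intermediate versions: bump the fence, validate that every node runs a new-enough binary, then bump the real version. This is a simplified illustration, not the actual migration manager code; the type and function names (`Version`, `fenceVersionFor`, `migrate`) are hypothetical stand-ins for their crdb counterparts.

```go
package main

import "fmt"

// Version is a simplified stand-in for roachpb.Version.
type Version struct {
	Major, Minor, Internal int32
}

func (v Version) String() string {
	return fmt.Sprintf("%d.%d-%d", v.Major, v.Minor, v.Internal)
}

// fenceVersionFor returns the fence preceding a user-defined cluster
// version: the same version with its (even) Internal decremented into
// the reserved odd slot.
func fenceVersionFor(v Version) Version {
	v.Internal--
	return v
}

// migrate sketches the two-step bump for each intermediate version:
// first the fence, then the "real" version, validating binaries in
// between. A node sneaking in before the fence bump is harmless (the
// fence is a no-op); the validation step catches it before the real bump.
func migrate(versions []Version, bump func(Version), validateBinaries func(Version) bool) error {
	for _, v := range versions {
		bump(fenceVersionFor(v)) // no-op gate; blocks older binaries from joining
		if !validateBinaries(v) {
			return fmt.Errorf("not all nodes are running binaries for %s", v)
		}
		bump(v) // the "real" version bump
	}
	return nil
}

func main() {
	vs := []Version{{21, 2, 2}, {21, 2, 4}}
	_ = migrate(vs,
		func(v Version) { fmt.Println("bumping to", v) },
		func(Version) bool { return true })
}
```

Note that each fence/real pair is bumped in order, so the active version never jumps past a fence that older binaries wouldn't recognize.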

We rely on the invariant we introduced earlier that requires new
user-defined versions (the users being crdb engineers) to always have
even-numbered Internal versions. This reserves the odd numbers for
slotting in a fence version before each user-defined one.
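A minimal sketch of that even/odd convention, assuming a simplified `Version` struct in place of `roachpb.Version` (the helper names here are illustrative, not the crdb API):

```go
package main

import "fmt"

// Version is a simplified stand-in for roachpb.Version.
type Version struct {
	Major, Minor, Internal int32
}

// checkUserDefined enforces the invariant: user-defined versions must
// carry even Internal values, leaving the odd values free for fences.
func checkUserDefined(v Version) error {
	if v.Internal%2 != 0 {
		return fmt.Errorf("%d.%d-%d: internal versions must be even", v.Major, v.Minor, v.Internal)
	}
	return nil
}

// fenceFor slots a fence directly before a user-defined version by
// occupying the reserved odd Internal value.
func fenceFor(v Version) Version {
	v.Internal--
	return v
}

func main() {
	v := Version{Major: 21, Minor: 2, Internal: 4}
	fmt.Println(checkUserDefined(v)) // even internal: valid
	fmt.Println(fenceFor(v))         // the reserved odd slot, 21.2-3
}
```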

Release note: None


Only the last two commits here are of interest. Earlier commits are from
#57154 and #57118.

@irfansharif irfansharif requested review from knz and tbg November 27, 2020 00:10
@irfansharif irfansharif requested a review from a team as a code owner November 27, 2020 00:10
@cockroach-teamcity
Member

This change is Reviewable


@tbg tbg left a comment


Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 2 of 2 files at r4, 8 of 8 files at r5, 3 of 3 files at r6, 7 of 7 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)


pkg/clusterversion/clusterversion.go, line 247 at r7 (raw file):

	v20_2 := roachpb.Version{Major: 20, Minor: 2}
	if !(cv.Version.Less(v20_2) || cv.Version == v20_2) && (cv.Internal%2) != 0 {
		p.Printf("%d.%d(fence=%d)", cv.Major, cv.Minor, cv.Internal)

That format strikes me as unintuitive, how about just p.Print(cv.Version); p.SafeString("-fence")?

Also, will this leak out into show cluster setting version? That gets parsed in a few places. Maybe we don't need any special printing here?


pkg/clusterversion/cockroach_versions.go, line 364 at r7 (raw file):

	var cvs []ClusterVersion
	for _, keyedV := range vs {
		if keyedV.Version.Less(from.Version) || keyedV.Version == from.Version {

Just offering this:

// Read: "from < keyedV && keyedV <= to".
if from.Less(keyedV) && !to.Less(keyedV) {
  cvs = append(cvs, ClusterVersion{Version: keyedV.Version})
}

I wouldn't be opposed to introducing LessEq on Version, I think we'd find quite a few uses for it.
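The suggested `LessEq` helper and the "from < v && v <= to" filter could look like the following sketch (a simplified `Version` stands in for `roachpb.Version`, and `versionsBetween` is a hypothetical name for the filtering loop under review):

```go
package main

import "fmt"

// Version is a simplified stand-in for roachpb.Version.
type Version struct {
	Major, Minor, Internal int32
}

// Less orders versions lexicographically by (Major, Minor, Internal),
// mirroring the existing comparison on roachpb.Version.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Internal < o.Internal
}

// LessEq is the helper suggested in the review.
func (v Version) LessEq(o Version) bool {
	return v.Less(o) || v == o
}

// versionsBetween returns the versions in the half-open interval
// (from, to], read as "from < v && v <= to".
func versionsBetween(vs []Version, from, to Version) []Version {
	var out []Version
	for _, v := range vs {
		if from.Less(v) && v.LessEq(to) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	vs := []Version{{20, 2, 0}, {21, 1, 2}, {21, 1, 4}}
	fmt.Println(versionsBetween(vs, Version{20, 2, 0}, Version{21, 1, 4}))
}
```

Excluding `from` while including `to` matches the migration loop's needs: the cluster is already at `from`, and each later version up to and including the target still needs its bump.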


pkg/clusterversion/cockroach_versions_test.go, line 99 at r7 (raw file):

		return ClusterVersion{Version: roachpb.Version{Major: major}}
	}
	list := func(from, to int32) []ClusterVersion { // inclusive bounds

if you named the params "first" and "last" it would be obvious that they're inclusive of end points.


pkg/migration/manager.go, line 249 at r7 (raw file):

			// If the v21.1 node is only added after v21.2(fence=1) is active,
			// it won't be able to actually join the cluster (it'll be prevented
			// by the join RPC).

This means that once we've seen the node list stabilize (as EveryNode enforces),
any new nodes that can join the cluster will run a release that supports the fence version,
and by design also supports the actual version (which is the direct successor of the fence).


pkg/migration/manager.go, line 308 at r7 (raw file):

// to slot in fence versions for each cluster version. See top-level
// documentation in pkg/clusterversion for more details.
func FenceVersionFor(

this doesn't need to be exported. Are you doing this for docs? I'm noticing that there are lots of good comments in this package but most of them wouldn't be visible to godoc. Have you considered the pattern Nathan uses in #56392?

@irfansharif irfansharif force-pushed the 201125.fence-versions branch from e43197b to 6f5162c Compare November 28, 2020 00:28
For versions introduced during and after the v21.1 release, we introduce
the constraint that internal versions must always be even. We intend to
use the odd versions as internal fence versions for the long running
migrations infrastructure. This will come in future commits.

Release note: None
@irfansharif irfansharif force-pushed the 201125.fence-versions branch 2 times, most recently from ca4c564 to 33891b5 Compare November 28, 2020 01:05
@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Nov 28, 2020

Build failed:

@irfansharif irfansharif force-pushed the 201125.fence-versions branch from 33891b5 to 6787eec Compare November 28, 2020 02:27
@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Nov 28, 2020

Build succeeded:

@craig craig bot merged commit 54cdc5f into cockroachdb:master Nov 28, 2020
@irfansharif irfansharif deleted the 201125.fence-versions branch November 28, 2020 03:03