
Adds in-place upgrade and manual recovery support. #139

Merged
slackpad merged 52 commits into issue-84-integration from f-upgrade-and-recovery on Aug 1, 2016

Conversation

@slackpad (Contributor) commented Jul 18, 2016

This adds several important capabilities to help in upgrading to the new Raft protocol version:

  1. We can migrate an existing peers.json file, which was the source of truth for peer membership in the old version of the library, before that state moved fully into the snapshots + Raft log as the official source.
  2. If we are running protocol version 0, which doesn't support server IDs, operators can continue to use peers.json as an interface to manually recover from a loss of quorum.
  3. We left room for a more full-featured recovery manager by giving the new RecoverCluster interface a complete Configuration object to consume. This lets us manually pick which server is a voter for manual elections (set one server to a voter and the rest to nonvoters; the single voter will elect itself), as well as set basically any other configuration we want (see the sketch after this list).
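
For illustration, here's a minimal sketch of the kind of Configuration an operator could hand to RecoverCluster for a manual election. The Server/Configuration field names and the Voter/Nonvoter constants follow the new server-ID scheme described here, but treat the exact names and the RecoverCluster signature as assumptions until this lands.

package recoverysketch

import "github.com/hashicorp/raft"

// buildRecoveryConfiguration sketches a manual-recovery configuration:
// one server is made a voter (it will elect itself) and the rest are
// nonvoters. IDs and addresses are placeholders.
func buildRecoveryConfiguration() raft.Configuration {
	return raft.Configuration{
		Servers: []raft.Server{
			{Suffrage: raft.Voter, ID: "server-a", Address: "10.0.0.1:8300"},
			{Suffrage: raft.Nonvoter, ID: "server-b", Address: "10.0.0.2:8300"},
			{Suffrage: raft.Nonvoter, ID: "server-c", Address: "10.0.0.3:8300"},
		},
	}
}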

This also gives a path for introducing Raft servers running the new version of the library into a cluster running the old code. Things would work like this:

// These are the versions of the protocol (which includes RPC messages as
// well as Raft-specific log entries) that this server can _understand_. Use
// the ProtocolVersion member of the Config object to control the version of
// the protocol to use when _speaking_ to other servers. This is not currently
// written into snapshots so they are unversioned. Note that depending on the
// protocol version being spoken, some otherwise understood RPC messages may be
// refused. See isVersionCompatible for details of this logic.
//
// There are notes about the upgrade path in the description of the versions
// below. If you are starting a fresh cluster then there's no reason not to
// jump right to the latest protocol version. If you need to interoperate with
// older, version 0 Raft servers you'll need to drive the cluster through the
// different versions in order.
//
// The version details are complicated, but here's a summary of what's required
// to get from a version 0 cluster to version 3:
//
// 1. In version N of your app that starts using the new Raft library with
//    versioning, set ProtocolVersion to 1.
// 2. Make version N+1 of your app require version N as a prerequisite (all
//    servers must be upgraded). For version N+1 of your app set ProtocolVersion
//    to 2.
// 3. Similarly, make version N+2 of your app require version N+1 as a
//    prerequisite. For version N+2 of your app, set ProtocolVersion to 3.
//
// During this upgrade, older cluster members will still have Server IDs equal
// to their network addresses. To upgrade an older member and give it an ID, it
// needs to leave the cluster and re-enter:
//
// 1. Remove the server from the cluster with RemoveServer, using its network
//    address as its ServerID.
// 2. Update the server's config to a better ID (restarting the server).
// 3. Add the server back to the cluster with AddVoter, using its new ID.
//
// You can do this during the rolling upgrade from N+1 to N+2 of your app, or
// as a rolling change at any time after the upgrade.
//
// Version History
//
// 0: Original Raft library before versioning was added. Servers running this
//    version of the Raft library use AddPeerDeprecated/RemovePeerDeprecated
//    for all configuration changes, and have no support for LogConfiguration.
// 1: First versioned protocol, used to interoperate with old servers, and begin
//    the migration path to newer versions of the protocol. Under this version
//    all configuration changes are propagated using the now-deprecated
//    RemovePeerDeprecated Raft log entry. This means that server IDs are always
//    set to be the same as the server addresses (since the old log entry type
//    cannot transmit an ID), and only AddPeer/RemovePeer APIs are supported.
//    Servers running this version of the protocol can understand the new
//    LogConfiguration Raft log entry but will never generate one so they can
//    remain compatible with version 0 Raft servers in the cluster.
// 2: Transitional protocol used when migrating an existing cluster to the new
//    server ID system. Server IDs are still set to be the same as server
//    addresses, but all configuration changes are propagated using the new
//    LogConfiguration Raft log entry type, which can carry full ID information.
//    This version supports the old AddPeer/RemovePeer APIs as well as the new
//    ID-based AddVoter/RemoveServer APIs which should be used when adding
//    version 3 servers to the cluster later. This version sheds all
//    interoperability with version 0 servers, but can interoperate with newer
//    Raft servers running with protocol version 1 since they can understand the
//    new LogConfiguration Raft log entry, and this version can still understand
//    their RemovePeerDeprecated Raft log entries. We need this protocol version
//    as an intermediate step between 1 and 3 so that servers will propagate the
//    ID information that will come from newly-added (or -rolled) servers using
//    protocol version 3, but since they are still using their address-based IDs
//    from the previous step they will still be able to track commitments and
//    their own voting status properly. If we skipped this step, servers would
//    be started with their new IDs, but they wouldn't see themselves in the old
//    address-based configuration, so none of the servers would think they had a
//    vote.
// 3: Protocol adding full support for server IDs and new ID-based server APIs
//    (AddVoter, AddNonvoter, etc.), old AddPeer/RemovePeer APIs are no longer
//    supported. Version 2 servers should be swapped out by removing them from
//    the cluster one-by-one and re-adding them with updated configuration for
//    this protocol version, along with their server ID. The remove/add cycle
//    is required to populate their server ID. Note that removing must be done
//    by ID, which will be the old server's address.

// These are versions of snapshots that this server can _understand_. Currently,
// it is always assumed that this server generates the latest version, though
// this may be changed in the future to include a configurable version.
// Version History
//
// 0: Original Raft library before versioning was added. The peers portion of
//    these snapshots is encoded in the legacy format which requires decodePeers
//    to parse. This version of snapshots should only be produced by the
//    unversioned Raft library.
// 1: New format which adds support for a full configuration structure and its
//    associated log index, with support for server IDs and non-voting server
//    modes. To ease upgrades, this also includes the legacy peers structure but
//    that will never be used by servers that understand version 1 snapshots.
//    Since the original Raft library didn't enforce any versioning, we must
//    include the legacy peers structure for this version, but we can deprecate
//    it in the next snapshot version.
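
As an illustration of the remove/re-add cycle described in the comment above (not part of the quoted code), a rolling re-ID of one legacy member might look roughly like this. The AddVoter/RemoveServer signatures are assumptions based on the ID-based APIs this PR introduces.

package upgradesketch

import "github.com/hashicorp/raft"

// reIDServer removes a legacy member (whose ID equals its address) and
// adds it back under its new server ID. Step 2, restarting the server
// with the new ID configured, happens out of band between the calls.
func reIDServer(r *raft.Raft, oldAddr raft.ServerAddress, newID raft.ServerID) error {
	if err := r.RemoveServer(raft.ServerID(oldAddr), 0, 0).Error(); err != nil {
		return err
	}
	return r.AddVoter(newID, oldAddr, 0, 0).Error()
}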

This isn't super great, but will give us a path to keep things compatible with existing clusters as we roll out the changes. We can make some higher-level tooling in Consul to help orchestrate this.

@slackpad force-pushed the f-upgrade-and-recovery branch from d7d93e5 to 19587a7 on July 18, 2016 23:18
@slackpad (Contributor Author) commented Jul 18, 2016

@ongardie this should address TODO item 2 from #84. If you have a few minutes PTAL - thanks!

@slackpad slackpad mentioned this pull request Jul 18, 2016
if protocolVersion < 1 || localID == "" {
	// During the transition to the new ID system, keep this as an
	// INFO level message. Once the new scheme has been out for a
	logger.Printf("[WARN] raft: No server ID given, using network address: %v. This default will be removed in the future. Set server ID explicitly in config.",
Review comment (Contributor):
Can you comment this as COMPAT and a TODO so it's easily greppable that there is a pending action to enable a deprecation WARN message?

@ongardie-sfdc commented:

sorry @slackpad, was too busy generating code, will review

@slackpad (Contributor Author) commented:

@ongardie-sfdc thanks! If you have a chance please take a look at #140 as well - it's based on this branch so relative to this PR.

if config.ProtocolVersion < ProtocolVersionMin ||
	config.ProtocolVersion > ProtocolVersionMax {
	return fmt.Errorf("Protocol version %d must be >= %d and <= %d",
		config.ProtocolVersion, ProtocolVersionMin, ProtocolVersionMax)
}
if config.HeartbeatTimeout < 5*time.Millisecond {

Review comment:

Add a new validation check: when ProtocolVersion > 0, LocalID should be nonempty.
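
A sketch of what that check could look like, assuming LocalID ends up as a field on Config (names illustrative, in the style of the validation code quoted above):

// Hypothetical additional clause for the validation function:
if config.ProtocolVersion > 0 && config.LocalID == "" {
	return fmt.Errorf("LocalID cannot be empty when ProtocolVersion is %d",
		config.ProtocolVersion)
}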

@ongardie-sfdc commented:

This sets some good groundwork.

I think having the config dictate which version the server speaks may be too simplistic. Here's an example: Bring up server A with version n + 1 while servers B and C still have version n. Server A can send B a LogConfiguration entry, then crash. B can become leader and have to replicate that LogConfiguration. But B isn't allowed to speak n + 1 yet. That's awkward.

@slackpad (Contributor Author) commented:

@ongardie-sfdc agree the manual part isn't ideal but we don't have a good mechanism to negotiate. I was thinking I'd ship the next release of Consul (N) w/protocol version set to 0 so the upgrade works with the old servers, but all new servers will be able to understand 1. The release after that (N+1) would switch to protocol version 1, and we'd make N a prerequisite for N+1.

I suppose we could try to add a version to some existing messages and then upshift if we see that, but given that we can add incompatible things to the log I don't think that will work if an older server suddenly shows up later.

@ongardie-sfdc commented:

... I don't think that will work if an older server suddenly shows up later.

If an older server suddenly shows up later with old code, the best thing for it to do is crash. If we're clever, there might be some way to make that happen, though there aren't many panics around.

@slackpad (Contributor Author) commented Jul 20, 2016

I just meant that if we are in protocol 0 mode we should work with old servers, even if we bootstrap a new cluster. In the > 0 world, old servers will print errors for all peer changes and will never be able to accept them; I'll have to do a little thinking/testing to see if that's enough to keep them out of a cluster. With our current message serialization library we could add the protocol version to all our RPCs and reject anything claiming version 0 (which is what new servers will see by default when receiving an old message) if we are running > 0 - this might be a clean way to stop shenanigans after the upgrade is complete.
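
A rough sketch of that idea; the RPCHeader/checkRPCHeader names are illustrative, not an existing API:

// RPCHeader would be embedded in every RPC request so the receiver can
// see which protocol version the sender speaks. Old senders produce a
// zero value here.
type RPCHeader struct {
	ProtocolVersion ProtocolVersion
}

// checkRPCHeader rejects unversioned (version 0) traffic once this
// server has moved past protocol version 0.
func (r *Raft) checkRPCHeader(header RPCHeader) error {
	if r.conf.ProtocolVersion > 0 && header.ProtocolVersion < 1 {
		return ErrUnsupportedProtocol
	}
	return nil
}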

@sean- (Contributor) commented Jul 21, 2016 via email

	}

} else {
	r.logger.Printf("[ERR] raft: Ignoring un-versioned command: %#v", rpc.Command)

Review comment:

How does this work? Don't we need to listen to unversioned commands until we've reached some version ourselves?

Follow-up review comment:

And isn't that the same as ErrUnsupportedProtocol?

@slackpad (Contributor Author) replied:
Unversioned commands come through as version 0 because of the way MsgPack decoding works - the new receivers get zero-valued version info. I updated the comment to reflect this.
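
A small illustration of that zero-value behavior, using the msgpack codec package the library already uses for its transport (exact import path assumed):

package main

import (
	"bytes"
	"fmt"

	"github.com/hashicorp/go-msgpack/codec"
)

// oldRPC mimics a message from a pre-versioning server: no version field.
type oldRPC struct {
	Term uint64
}

// newRPC is what current servers decode into; Version is absent on the
// wire for old senders, so it comes out as the zero value 0.
type newRPC struct {
	Version int
	Term    uint64
}

func main() {
	var buf bytes.Buffer
	h := &codec.MsgpackHandle{}
	if err := codec.NewEncoder(&buf, h).Encode(oldRPC{Term: 7}); err != nil {
		panic(err)
	}
	var out newRPC
	if err := codec.NewDecoder(&buf, h).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Version, out.Term) // prints: 0 7
}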

@slackpad (Contributor Author) replied:
If we get a message without a version at all it's kind of a bug in the code, so I wanted that to be distinguishable from a normal "talking to an old, unsupported thing" error.

This can later be the place where we put a cluster ID.
// file at that location, returning a recovery manager to apply this
// configuration at startup. If there's no recovery defined then this will return
// nil, which is a valid thing to pass to NewRaft().
func NewPeersJSONRecovery(base string) (*PeersJSONRecovery, error) {
@ongardie-sfdc commented Jul 30, 2016:
A "recovery manager" doesn't seem to really be a thing, and I don't think it needs to be. How about simply:

func ReadPeersJSON(path string) (Configuration, error)

The caller would have the burden of path := filepath.Join(base, "peers.json") and os.Remove(path), which I think would be entirely tolerable.

And then I'd rename this file to peersjson.go or something like that.
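
The caller side of that suggestion would then look roughly like this (a sketch only, written against the proposed ReadPeersJSON signature; the apply callback stands in for whatever ends up consuming the Configuration at startup):

package peersjsonsketch

import (
	"os"
	"path/filepath"

	"github.com/hashicorp/raft"
)

// recoverFromPeersJSON reads peers.json, hands the recovered Configuration
// to the caller-supplied apply function, and removes the file so recovery
// only happens once.
func recoverFromPeersJSON(base string, apply func(raft.Configuration) error) error {
	path := filepath.Join(base, "peers.json")
	configuration, err := raft.ReadPeersJSON(path)
	if err != nil {
		return err
	}
	if err := apply(configuration); err != nil {
		return err
	}
	return os.Remove(path)
}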

@slackpad (Contributor Author) replied:
Good catch - I originally had this horrible thing that was passed in to NewRaft() that waited for things to replicate, etc. This is a leftover from that so I'll rename this and remove the Disarm() shiz.

Reply:

phew

// well as Raft-specific log entries) that this server can _understand_. Use
// the ProtocolVersion member of the Config object to control the version of
// the protocol to use when _speaking_ to other servers. This is not currently
// written into snapshots so they are unversioned. Note that depending on the

Review comment:

Any reason not to write some version number into snapshots?

@slackpad (Contributor Author) replied:
Since the snapshots are somewhat divorced from Raft log entry types I figured it would be overkill. You can think of them as being at "snapshot format version 0" :-)

Reply:

One thing we need to guarantee is that if you don't understand something in your log or snapshot, you crash. It's not cool to skip it like we're doing now.

This could be a server was running new code, wrote out a log entry/snapshot with important new information, then somehow got downgraded.

  • For log entries, we can probably agree to do this using new log entry types. But we're missing the check in NewRaft and Recover right now.
  • For snapshots, we don't have a way to know.

Otherwise, this could be that we've deleted the code that reads in old log entries/snapshot formats. Someday we need to remove that cruft. How can we guarantee that no log entry or snapshot is so old? I think the solution has to be of the form: write out a new snapshot in a newer format before you upgrade to code that stops knowing how to read older log entry/snapshot formats. How do we arrange that?

BTW, I think all this might lead us to having multiple versions covering different things, at least internally. You can't read in snapshots with newer versions than what you support, but the snapshot format won't change on every upgrade. So I think it needs its own class of version number.
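
For the log-entry half of this, a hedged sketch of the kind of check that could run in NewRaft and Recover (the log type names come from the library; the helper itself is hypothetical):

// checkLogTypes refuses to proceed if any stored entry has a type this
// build doesn't understand, instead of silently skipping it.
func checkLogTypes(logs LogStore, first, last uint64) error {
	for idx := first; idx <= last; idx++ {
		var entry Log
		if err := logs.GetLog(idx, &entry); err != nil {
			return err
		}
		switch entry.Type {
		case LogCommand, LogNoop, LogBarrier, LogConfiguration,
			LogAddPeerDeprecated, LogRemovePeerDeprecated:
			// Understood; keep going.
		default:
			return fmt.Errorf("unknown log type %d at index %d, refusing to start",
				entry.Type, idx)
		}
	}
	return nil
}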

@slackpad (Contributor Author) replied:
This should probably be done as a separate PR, but we could add a snapshot version as well as a function similar to RecoverCluster() that doesn't take a new configuration but rolls up the logs and takes a snapshot, writing that out in the latest version. That tool would let us deprecate things by saying "you must run this on all your servers prior to running X" (it could maybe be in NewRaft() as something we do automatically for folks).

Reply (Contributor):
a function similar to RecoverCluster() that doesn't take a new configuration but rolls up the logs and takes a snapshot, writing that out in the latest version.

That implicitly commits everything, so we can't do that safely.

it could maybe be in NewRaft() as something we do automatically for folks

Maybe after an upgrade once your commit index has reached ???, you force a snapshot. Filling in the ??? is tricky, hmm.

@slackpad (Contributor Author) commented Jul 30, 2016:
Re-reviewed your Raft paper and I see why this is dangerous, even at startup time. After an outage we have to do the best we can because we don't have the right peers to determine if the last few logs were committed or not, so the best we can do is assume they were. We wouldn't want to do this process any other time. The non-outage snapshot policy is to snapshot committed stuff only, so I see why we can't roll up willy-nilly without implicitly committing.

It's not going to be safe to ingest the old peers.json initially like I'm doing now over in the Consul integration branch, so I'll get rid of that behavior.

@slackpad (Contributor Author) commented Jul 30, 2016:
Fixed this over in Consul by adding some code to blow away the peers.json file on the first boot - hashicorp/consul@771ba18.

@slackpad (Contributor Author) replied:
Sorry for my late night thrashing here - I edited the last few comments based on my current understanding.

@slackpad (Contributor Author) commented Jul 30, 2016:
Tomorrow I'll take a shot at versioning snapshots and making things die for log entries that aren't understood - that'll make things more robust.

Still not sure what a good deprecation story looks like for deciding when it's safe to delete the code that handles old versions, given the uncertainty of snapshot timing. We could do something simple like run a cleanup goroutine that fires off a snapshot request 15 minutes after a to-be-deprecated log entry gets committed (or if we see one in the log at startup).

This would only cause up to 4 snapshots per hour (or whatever) even if there were tons of these deprecated entries flying by, and would eliminate the deprecated entries within 15 minutes of finishing an upgrade once there were no servers producing the old entries any longer. The ongoing cost would be very low as long as the cleanup routine doesn't need to scan the logs; it can just select on a one-item buffered channel that gets pinged in a non-blocking fashion whenever we encounter something that needs cleanup.
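
The non-blocking ping is a standard Go pattern; a minimal sketch with illustrative names:

package cleanupsketch

import "time"

// cleanupCh is the one-item buffered channel; noteDeprecatedEntry pings it
// without blocking, and the cleanup goroutine coalesces pings into at most
// one delayed snapshot request at a time.
var cleanupCh = make(chan struct{}, 1)

func noteDeprecatedEntry() {
	select {
	case cleanupCh <- struct{}{}:
	default: // a cleanup is already pending; drop the ping
	}
}

func cleanupLoop(requestSnapshot func()) {
	for range cleanupCh {
		time.Sleep(15 * time.Minute) // the grace period suggested above
		requestSnapshot()
	}
}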

@slackpad (Contributor Author) replied:
Adding the snapshot versioning uncovered an issue where we have dropped support for old peers in the snapshot (we can read it but not produce it) - we will need to write both in order to upgrade so that old servers can read new snapshots during the migration. I'll unpack this at the same time as snapshot versioning.

@ongardie-sfdc commented:

Ok, sorry for delaying a few days. I think I'm done with this pass of reviewing. It's a hairy issue, but thanks for taking it on, and you're making great progress. Have a good weekend.

@slackpad (Contributor Author) commented:

@ongardie-sfdc thanks for taking a look. I'm finishing up the last bit of changes now and will push up shortly. There's a follow on change in #143 that I'll rebase off of this.

Appreciate your detailed scrub of this one - have a good weekend as well!

@slackpad (Contributor Author) commented Jul 31, 2016

Ok I think I've got all the issues addressed. Unless there are any huge bugs I'd like to merge this and fix things in separate PRs at this point. I was able to form a cluster with a legacy Consul 0.6.4 binary and a build using this and #143!

// it in the next snapshot version.
const (
SnapshotVersionMin = 0
SnapshotVersionMax = 1

Review comment:

What controls which SnapshotVersion the server generates?

Right now it's set to SnapshotVersionMax, which would be problematic if it were non-backwards-compatible. For example, a new server could generate a new snapshot, send it to an old server, and that server would panic.

It seems like this should either be a function of the ProtocolVersion, or it should be specified separately in the config.

@slackpad (Contributor Author) replied:
I'll add a function for now that takes the protocol version and returns the snapshot version to use so we don't hard code the max constant all over the place.
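
Something along these lines; the actual protocol-to-snapshot mapping is an assumption to be settled in the follow-up:

// getSnapshotVersion is a sketch of a single place that decides which
// snapshot version to write for a given protocol version, instead of
// hard coding SnapshotVersionMax everywhere.
func getSnapshotVersion(protocolVersion ProtocolVersion) SnapshotVersion {
	if protocolVersion < 1 {
		// Stay on the legacy snapshot format for unversioned clusters.
		return 0
	}
	return 1
}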

@ongardie-sfdc commented:

@slackpad left you two more comments (1. SnapshotVersion, 2. dispositionRPC), then lgtm to merge. The issue of when is it safe to remove support for old log entry types and snapshot versions can be a separate PR.

@slackpad slackpad merged commit 6e3917d into issue-84-integration Aug 1, 2016
@slackpad slackpad deleted the f-upgrade-and-recovery branch October 7, 2016 18:15