PlannedReparent fixes #6050
Looks good. We'll wait for @enisoc to have a look also.
Haven't finished a full pass yet, but sending what I have so far.
}
return mysql.EncodePosition(pos), nil
return mysql.EncodePosition(rs.RelayLogPosition), nil
Currently, if there are errant transactions on this replica, they'll show up in the returned position. Does that end up protecting us against any scenario? Do we lose that if we return the relay log pos instead?
Maybe we should check for this case and fail if our expected invariant does not hold? If one of relayPos or masterPos is a superset of the other (AtLeast()), return the one that's farther ahead. If neither is a superset of the other, return an error saying we detected errant transactions and that this replica should not be used as a master.
Actually, if masterPos is a superset of relayPos, that is exactly the errant transaction situation. The latest commit addresses this.
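A minimal sketch of the invariant discussed in this thread, for illustration only. It uses a hypothetical, simplified GTIDSet type (one gap-free sequence range per source UUID) instead of Vitess's real position types, and it assumes the relay log position reliably covers everything the replica has received; the later decision to defer the RelayLogPosition logic suggests that assumption needs more care in practice. With those assumptions, a relay position that contains the executed position is the replica's true progress, and anything executed but never received (including the "masterPos is a superset of relayPos" case) is treated as errant.

```go
package main

import (
	"errors"
	"fmt"
)

// GTIDSet is a hypothetical simplification of a MySQL GTID set: it maps a
// source server UUID to the highest sequence number seen from that source,
// assuming no gaps (real GTID sets can contain gaps).
type GTIDSet map[string]int64

// AtLeast reports whether s contains every transaction in other.
func (s GTIDSet) AtLeast(other GTIDSet) bool {
	for uuid, seq := range other {
		if s[uuid] < seq {
			return false
		}
	}
	return true
}

// errErrant is a hypothetical error for a replica that has executed
// transactions that never arrived through its relay log.
var errErrant = errors.New("errant transactions detected: replica should not be used as a master")

// choosePosition returns the relay log position when it contains the executed
// position. If the executed position has anything the relay log lacks (a
// strict superset, or neither set contains the other), it reports errant
// transactions instead.
func choosePosition(relayPos, executedPos GTIDSet) (GTIDSet, error) {
	if relayPos.AtLeast(executedPos) {
		return relayPos, nil
	}
	return nil, errErrant
}

func main() {
	relay := GTIDSet{"master-uuid": 120}

	// Healthy replica: it has received more than it has applied so far.
	pos, err := choosePosition(relay, GTIDSet{"master-uuid": 110})
	fmt.Println(pos, err) // map[master-uuid:120] <nil>

	// A transaction that originated on the replica itself is errant.
	_, err = choosePosition(relay, GTIDSet{"master-uuid": 110, "replica-uuid": 3})
	fmt.Println(err)
}
```

Returning the relay log position rather than the executed one means a replica that has received, but not yet applied, the latest transactions is still recognized as the most advanced.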
Based on experiments with relayLogPosition, we have decided to exclude that logic from this PR, and do it as a separate PR.
This is going to be another one of those changes that makes it critical to use the recommended upgrade order since vtctld expects a new tabletmanager RPC, right? What's the process for making a sufficient amount of noise about that?
go/vt/wrangler/reparent.go
@@ -763,9 +797,11 @@ func (maxPosSearch *maxReplPosSearch) processTablet(tablet *topodatapb.Tablet) {
maxPosSearch.wrangler.logger.Warningf("failed to get replication status from %v, ignoring tablet: %v", topoproto.TabletAliasString(tablet.Alias), err)
return
}
replPos, err := mysql.DecodePosition(status.Position)
// The replica that is most progressed may actually be the one with the furthest
I think we should check both positions and use the one that's a superset of the other. If neither is a true superset, then we have detected errant transactions and it isn't safe to promote this replica.
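In the maxReplPosSearch context of this hunk, the same idea extends to choosing the most advanced replica. Because GTID sets are only partially ordered, "most advanced" should mean a position that contains every other candidate's position; if no such position exists, the candidates have diverged. The sketch below is hypothetical: it reuses the simplified GTIDSet type from above (not Vitess's API), and the candidate type and mostAdvanced function are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// GTIDSet is a hypothetical simplification: source UUID -> highest sequence
// number seen, assuming no gaps.
type GTIDSet map[string]int64

// AtLeast reports whether s contains every transaction in other.
func (s GTIDSet) AtLeast(other GTIDSet) bool {
	for uuid, seq := range other {
		if s[uuid] < seq {
			return false
		}
	}
	return true
}

// candidate pairs a tablet alias with its replication position.
type candidate struct {
	alias string
	pos   GTIDSet
}

// mostAdvanced returns a candidate whose position contains every other
// candidate's position. The first loop finds the greatest element if one
// exists; since the order is partial, the second loop verifies it.
func mostAdvanced(cands []candidate) (candidate, error) {
	if len(cands) == 0 {
		return candidate{}, errors.New("no candidates")
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.pos.AtLeast(best.pos) {
			best = c
		}
	}
	for _, c := range cands {
		if !best.pos.AtLeast(c.pos) {
			return candidate{}, fmt.Errorf("positions of %s and %s diverge: possible errant transactions", best.alias, c.alias)
		}
	}
	return best, nil
}

func main() {
	best, err := mostAdvanced([]candidate{
		{"zone1-101", GTIDSet{"m": 100}},
		{"zone1-102", GTIDSet{"m": 120}},
		{"zone1-103", GTIDSet{"m": 110}},
	})
	fmt.Println(best.alias, err) // zone1-102 <nil>

	// Each replica has a transaction the other lacks, so neither is safe.
	_, err = mostAdvanced([]candidate{
		{"zone1-101", GTIDSet{"m": 120, "r1": 1}},
		{"zone1-102", GTIDSet{"m": 120, "r2": 1}},
	})
	fmt.Println(err)
}
```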
Using RelayLogPosition has been deferred to a future PR.
go/vt/wrangler/reparent.go
@@ -934,27 +970,28 @@ func (wr *Wrangler) emergencyReparentShardLocked(ctx context.Context, ev *events
if !ok {
return fmt.Errorf("couldn't get master elect %v replication position", topoproto.TabletAliasString(masterElectTabletAlias))
}
masterElectPos, err := mysql.DecodePosition(masterElectStatus.Position)
// Use RelayLogPosition to determine most advanced tablet
masterElectPos, err := mysql.DecodePosition(masterElectStatus.RelayLogPosition)
I have the same suggestion here about looking at both positions whenever both are known.
Using RelayLogPosition has been deferred to a future PR.
Fair point. We will announce this on the Vitess Slack and in the release notes. I will also edit the PR description to add a notice.
… tablet is written to topo Signed-off-by: deepthi <[email protected]>
PromoteSlaveWhenCaughtUp RPC Signed-off-by: deepthi <[email protected]>
Signed-off-by: deepthi <[email protected]>
…th current master Signed-off-by: deepthi <[email protected]>
if err := wr.tmc.SetReadWrite(rwCtx, masterElectTabletInfo.Tablet); err != nil {
I can't put my finger on it, but this makes me feel vaguely uneasy. Is it clear in your mind how we're confident at this point that no other tablet's mysqld is read-write? I'm having trouble putting together that chain of reasoning.
Let's replay the steps:
- Mark old master read-only
- Designate new master in topo (mastership+timestamp)
- Mark new master read-write
- Do the rest of the steps
If something fails after step 1, the topo will return the old one as master, and we just have to PRS on it.
If something fails after step 2, the topo will return the new one as master, and we perform steps 3 onwards.
This is probably the source of your discomfort (which I share a bit): is there a situation where the topo will return the old one as master after we complete step 2? By our reasoning about timestamps it should not, because we compare all timestamps before identifying the true master.
I would feel slightly better if we marked the old master as replica before step 2, at least as best effort. But you and deepthi convinced me that it's unnecessary.
This reasoning seems sound. I had included a new test that re-runs PRS on the candidate master after first simulating a failure in step 2. I could add more tests like that (simulating failures at other points in the sequence).
I'm convinced as well. Thanks.
Signed-off-by: deepthi <[email protected]>
Signed-off-by: deepthi <[email protected]>
LGTM
This PR implements parts 1-3 listed in #5991. Part 4 will be done as a separate PR.
RELEASE NOTE: ACTION REQUIRED
When updating from a version before this PR to a version after it, it is critical that you follow the recommended upgrade order. In particular, you must upgrade all the vttablets in the cluster before upgrading any of the vtctlds.
Similarly, if you need to downgrade from a version after this PR to a version before it, you must downgrade in the reverse order: downgrade all vtctlds before downgrading any vttablets.
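The ordering rule can be expressed as a check on a rollout plan. This is a minimal sketch for illustration; the component type and allBefore helper are hypothetical and not part of any Vitess tooling. A plan is valid for upgrade if every vttablet step comes before every vtctld step, and valid for downgrade if the reverse holds.

```go
package main

import "fmt"

// component labels a binary in a rollout plan (hypothetical helper type).
type component string

const (
	vttablet component = "vttablet"
	vtctld   component = "vtctld"
)

// allBefore reports whether every occurrence of first in plan comes before
// every occurrence of second.
func allBefore(plan []component, first, second component) bool {
	seenSecond := false
	for _, c := range plan {
		switch c {
		case second:
			seenSecond = true
		case first:
			if seenSecond {
				return false
			}
		}
	}
	return true
}

func main() {
	upgrade := []component{vttablet, vttablet, vttablet, vtctld}
	fmt.Println(allBefore(upgrade, vttablet, vtctld)) // true: safe upgrade order

	badUpgrade := []component{vttablet, vtctld, vttablet}
	fmt.Println(allBefore(badUpgrade, vttablet, vtctld)) // false: a vtctld is upgraded too early

	downgrade := []component{vtctld, vttablet, vttablet}
	fmt.Println(allBefore(downgrade, vtctld, vttablet)) // true: safe downgrade order
}
```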