Have a common ERS for both VtOrc and Vtctl #8492
Conversation
One more pass on my end. One thing I'm still struggling with a bit is convincing myself that we didn't miss anything important that was happening in the VOrchestrator logic and that is not ported to the new ERS.
I haven't looked closely at the tests yet, so I might get more clarity on this as I take a closer look at those.
@@ -360,7 +360,9 @@ func StopReplicas(replicas [](*Instance), stopReplicationMethod StopReplicationM

 // StopReplicasNicely will attemt to stop all given replicas nicely, up to timeout
 func StopReplicasNicely(replicas [](*Instance), timeout time.Duration) [](*Instance) {
-	return StopReplicas(replicas, StopReplicationNice, timeout)
+	stoppedReplicas := StopReplicas(replicas, StopReplicationNice, timeout)
+	stoppedReplicas = RemoveNilInstances(stoppedReplicas)
I see. This is great context. I think we should document this in the comments for RemoveNilInstances so future readers know why we need to remove the nil instances.
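For illustration, a minimal sketch of what a documented RemoveNilInstances could look like. The function name and the [](*Instance) type come from the diff above; the body and the stated reason for the nil entries are assumptions of this sketch, not the actual implementation:

```go
// RemoveNilInstances returns the given instances with all nil entries
// dropped. Nil entries can appear when an instance fails to be stopped or
// re-read within the timeout (assumption for this sketch); dropping them
// keeps callers such as StopReplicasNicely from dereferencing nil replicas.
func RemoveNilInstances(instances [](*Instance)) [](*Instance) {
	filtered := make([](*Instance), 0, len(instances))
	for _, instance := range instances {
		if instance != nil {
			filtered = append(filtered, instance)
		}
	}
	return filtered
}
```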
@@ -815,113 +613,120 @@ func checkAndRecoverDeadPrimary(analysisEntry inst.ReplicationAnalysis, candidat
	if !(forceInstanceRecovery || analysisEntry.ClusterDetails.HasAutomatedPrimaryRecovery) {
		return false, nil, nil
	}
	tablet, err := TabletRefresh(analysisEntry.AnalyzedInstanceKey)
Correct. I don't think they can be entirely removed (that was not my suggestion), but we should strive to keep them to a minimum. Here we are adding more calls than we used to have.
In my opinion this should be revisited before GA.
	return false, topologyRecovery, err

	// check if we have received an ERS in progress, if we do, we should not continue with the recovery
	if checkAndSetIfERSInProgress() {
I see. Let's also talk about this in our sync. Curious if this is needed for GA.
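For readers skimming this thread, here is a rough sketch of the kind of guard that checkAndSetIfERSInProgress implies. The function name comes from the diff; the atomic flag used below is illustrative and not necessarily how vtorc implements it:

```go
import "sync/atomic"

// ersInProgress is non-zero while an EmergencyReparentShard triggered by this
// process is running.
var ersInProgress int32

// checkAndSetIfERSInProgress reports whether an ERS is already running; if
// not, it marks one as running so that concurrent recoveries back off.
func checkAndSetIfERSInProgress() bool {
	// If the swap from 0 to 1 fails, another recovery already holds the flag.
	return !atomic.CompareAndSwapInt32(&ersInProgress, 0, 1)
}

// resetERSInProgress clears the flag once the recovery attempt has finished.
func resetERSInProgress() {
	atomic.StoreInt32(&ersInProgress, 0)
}
```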
	// find the valid candidates for becoming the primary
	// this is where we check for errant GTIDs and remove the tablets that have them from consideration
	validCandidates, err = FindValidEmergencyReparentCandidates(statusMap, primaryStatusMap)
In orchestrator, as part of finding replica candidates (chooseCandidateReplica), it looks like some replicas get removed by various checks. The ones in particular that I was trying to find here are the checks in CanReplicateFrom. Were those removed? Or am I missing them?
Those checks were MySQL version and binlog format checks. They have been removed for now.
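For context, a simplified sketch of the two checks being referred to: replica/source compatibility on MySQL major version and on binlog format. The names and exact comparisons here are illustrative, not the original orchestrator code:

```go
// binlogRank orders binlog formats from "smallest" to "largest"; a replica
// whose format is smaller than its source's cannot safely replicate from it.
var binlogRank = map[string]int{"STATEMENT": 1, "MIXED": 2, "ROW": 3}

// canReplicateFrom sketches the compatibility checks that candidate selection
// relied on: reject candidates running an older major version than the
// source, and reject candidates with a "smaller" binlog format than the source.
func canReplicateFrom(replicaMajorVersion, sourceMajorVersion int, replicaBinlogFormat, sourceBinlogFormat string) bool {
	if replicaMajorVersion < sourceMajorVersion {
		return false
	}
	if binlogRank[replicaBinlogFormat] < binlogRank[sourceBinlogFormat] {
		return false
	}
	return true
}
```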
Bringing back a conversation from Slack with Deepthi: some logic in the original VOrchestrator hasn't been ported over. This seems to be because:
Some examples of these changes are (or I couldn't find the equivalent in the new code):
We should have a detailed accounting of how this new implementation deviates from the original one. @GuptaManan100, I think you did a great job documenting the new flow, but calling out explicitly what was left out will be super helpful. That way, people can have context on what to expect from VOrchestrator when comparing it to Orchestrator. I think that it would be great to document:
I reviewed about half the changes, before my brain shut down 😛
I still need to review durability.go, emergency_reparenter.go, ers_sorter.go etc. These actually seem to be the more critical changes 🤕
-		// Copy and throw out primary SID from consideration, so we don't mutate input.
-		otherSetNoPrimarySID := make(Mysql56GTIDSet, len(otherSet))
-		for sid, intervals := range otherSet {
-			if sid == status.SourceUUID {
-				continue
-			}
-			otherSetNoPrimarySID[sid] = intervals
-		}
-
-		otherSets = append(otherSets, otherSetNoPrimarySID)
+		otherSets = append(otherSets, otherSet)
"...where even if two servers had the exact same GTID set but had different sources set, we were flagging them both as having errant GTIDs."
This confuses me a bit. You only look for errant GTIDs in a descendant of a server; comparing siblings or cousins doesn't have the same strong check guarantees, the way I understand it.
But I'm unfamiliar with this code, so I'm unsure how it's being used.
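As background for this exchange, a self-contained sketch of the basic errant-GTID idea: a transaction is errant if it exists on the inspected replica but on neither the primary nor any of its siblings. The map-based representation below is purely illustrative and far simpler than the Mysql56GTIDSet logic in the diff:

```go
// findErrantGTIDs returns the transactions present in the replica's set that
// appear in none of the other sets (primary and sibling replicas). Sets are
// simplified to map[serverUUID][]txnID for this sketch.
func findErrantGTIDs(replica map[string][]int64, others []map[string][]int64) map[string][]int64 {
	errant := map[string][]int64{}
	for uuid, txns := range replica {
		for _, txn := range txns {
			seenElsewhere := false
			for _, other := range others {
				if containsTxn(other[uuid], txn) {
					seenElsewhere = true
					break
				}
			}
			if !seenElsewhere {
				errant[uuid] = append(errant[uuid], txn)
			}
		}
	}
	return errant
}

// containsTxn reports whether txn is present in the given transaction list.
func containsTxn(txns []int64, txn int64) bool {
	for _, t := range txns {
		if t == txn {
			return true
		}
	}
	return false
}
```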
	// That's it! We must do recovery!
	// TODO(sougou): This function gets called by GracefulPrimaryTakeover which may
	// need to obtain shard lock before getting here.
	unlock, err := LockShard(analysisEntry.AnalyzedInstanceKey)
Why is LockShard removed?
LockShard is called inside the ERS code, so we should not call it outside.
Cool, follow-up question, and I'm asking because I'm not sure how the flow works: basically, we shouldn't even begin the operation if LockShard fails. My question is: since LockShard is called inside the ERS code, are there any steps taken before that point that we should avoid?
Nope, there are no operations that require the shard to be locked before ERS is called.
All those operations are for counters and for deciding whether we should run an operation at all in the first place. But yes, we should be doing a tablet refresh after the shard is locked. That problem is out of the scope of this PR, though, and will be addressed before the GA release of vtorc.
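To make the ordering in this exchange concrete, here is a rough sketch of the flow being described, with the shard lock taken at the top of the ERS entry point. The LockShard call mirrors Vitess' topo API; the EmergencyReparenter fields and the reparentShardLocked helper are assumptions of this sketch, not necessarily the real reparentutil code:

```go
// ReparentShard sketches the order of operations: the shard lock is acquired
// inside ERS, before any reparent steps run, which is why vtorc no longer
// needs to call LockShard itself before invoking it.
func (erp *EmergencyReparenter) ReparentShard(ctx context.Context, keyspace, shard string) (err error) {
	ctx, unlock, err := erp.ts.LockShard(ctx, keyspace, shard, "EmergencyReparentShard")
	if err != nil {
		// Nothing has been done yet, so failing to lock simply aborts the ERS.
		return err
	}
	defer unlock(&err)

	// Refresh tablet records and perform the promotion steps under the lock.
	return erp.reparentShardLocked(ctx, keyspace, shard)
}
```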
		return fmt.Errorf("durability policy %v not found", name)
	}
	log.Infof("Durability setting: %v", name)
	curDurabilityPolicy = newDurabilityCreationFunc(durabilityParams)
Let's add a mutex now, so that if/when we eventually support dynamic durability changes, we are protected.
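A minimal sketch of the suggested guard, reusing the names visible in the diff above (curDurabilityPolicy, durabilityPolicies, newDurabilityCreationFunc); treat those declarations, the durabler type, and the imports as assumptions of this sketch:

```go
var (
	// curDurabilityPolicyMu protects curDurabilityPolicy so that a future
	// dynamic change of durability policy cannot race with readers.
	curDurabilityPolicyMu sync.Mutex
	curDurabilityPolicy   durabler
)

// SetDurabilityPolicy installs the named policy while holding the mutex.
func SetDurabilityPolicy(name string, durabilityParams map[string]string) error {
	newDurabilityCreationFunc, found := durabilityPolicies[name]
	if !found {
		return fmt.Errorf("durability policy %v not found", name)
	}
	log.Infof("Durability setting: %v", name)

	curDurabilityPolicyMu.Lock()
	defer curDurabilityPolicyMu.Unlock()
	curDurabilityPolicy = newDurabilityCreationFunc(durabilityParams)
	return nil
}
```

For the protection to be complete, readers of curDurabilityPolicy would need to take the same mutex (or an RWMutex) as well.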
This makes sense, and I assumed it was something along those lines. We never used that either, so I have no worries about this being removed :D I'm highlighting that there seem to be decisions to remove certain things (with good reasons), but it's not obvious to readers of the PR. Being very explicit about those gives good context for folks trying to get involved in this work.
I took one more pass and I didn't find anything new that we haven't discussed already. It seems that there are still a few outstanding comments from Shlomi. From my perspective, once those are resolved this is good to merge and keep iterating. As discussed, if we can be super explicit on this, it will be good context for other folks following this work. From my notes on the last session, for the next iteration we will need to expand further on:
@ajm188 could you take a look now that I have addressed all of your review comments?
all my blocking comments have been addressed
one more rename we should do before merging, but i've gotten myself out of the blocking path
I did not cover all of the changes. Even though I'm naturally very familiar with the orchestrator code, I found this PR to be a bit overwhelming. Most of my review comments were based on the logic in some functions, but I lack the understanding of how everything is combined. I know there are recorded meetings I can watch later.
Notably, I didn't do a good review job on go/vt/vtctl/reparentutil/...
I do appreciate that there seems to be good testing added. The tests look legit, and they're very well commented.
		return fmt.Errorf("durability policy %v not found", name)
	}
	log.Infof("Durability setting: %v", name)
	curDurabilityPolicy = newDurabilityCreationFunc(durabilityParams)
Throughout this file I only see use of Neutral and MustNot. I'd rather see Prefer and MustNot.
Logically, the two options are the same. Both Neutral and Prefer are good as candidates, and obviously "better" than MustNot.
However, Prefer is more indicative that "yes, this server is really a good one". In orchestrator code, when orchestrator sees a prefer server, it is able to cut short further investigation; whereas when orchestrator promotes a Neutral server, it proceeds to check "is there any server better than this?".
Non-blocking comment for your consideration.
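To illustrate the distinction, a small sketch of a promotion-rule decision that hands out Prefer rather than Neutral for healthy replica types while keeping MustNot for rdonly tablets. The rule names follow the ones mentioned in this PR; the string tablet types and the mapping itself are illustrative, not the actual policy:

```go
// promotionRule sketches the suggestion: replica-type tablets are explicitly
// preferred candidates, rdonly and similar tablets must never be promoted,
// and anything unknown stays neutral.
func promotionRule(tabletType string) CandidatePromotionRule {
	switch tabletType {
	case "PRIMARY", "REPLICA":
		return PreferPromoteRule
	case "RDONLY", "BACKUP", "RESTORE", "DRAINED":
		return MustNotPromoteRule
	default:
		return NeutralPromoteRule
	}
}
```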
Catching up with the latest updates. From my perspective, good to merge and keep iterating.
I think we should be super loud with the community when this makes it into the next release, so that they fully test ERS in their environments.
Yes, you are right @shlomi-noach, we are going to make a lot more changes to the durability policies and will also use prefer promote rules.
Description
Overview
VtOrc did not use the Emergency Reparent Shard functionality in case of a primary failure. This PR changes that behaviour so that all the code paths use a common Emergency Reparent Shard.

VtOrc had a durability policy, along with extra configuration, that it used to make decisions while running reparent operations. Since we want the new ERS to also have access to this information, the durability policies have been moved to the reparentutil package, and the durability policy can be specified in the vtctl and vtctld binaries when they are brought up. An additional flag has also been added to EmergencyReparentShard which allows users to prevent any cross-cell promotions. As part of reconciling the differences between the two code paths, we have chosen to keep the best of both worlds: VtOrc gains the capability to detect errant GTIDs, and vtctld gains the two-step process that VtOrc had of first promoting the most advanced primary candidate and then checking whether that promotion can be improved upon.

Additional metrics have been added to keep track of the number of ERS runs and the number of successful and failed ERS runs.
New Emergency Reparent Shard Steps
Tablets of type DRAINED, RESTORE or BACKUP are not considered for promotion.
Tablets with the MustNotPromoteRule are also not considered for promotion. This additional step also fixes the issue "EmergencyReparentShard promotes rdonly tablet to primary" #7441, since all rdonly tablets are assigned this promotion rule.
Changes to VtOrc -
Changes not in this PR but will be addressed later -
a. Fixing semi-sync on the primary still uses the vttablet flag. This flag should be deprecated and the durability policies should be used instead.
b. The ERS code exits if more than one tablet is unreachable. We should instead use the durability policies to decide whether we can still run a successful promotion even in the case of multiple failures.
c. The errant-GTID detection code does not flag a GTID as errant as long as it is present on two servers. We should instead use the durability policies to decide whether it is errant or not.
Other changes in this PR -
Related Issue(s)
Checklist
Deployment Notes
Needs to be specified in the release notes