Fix problematic ERS cases #8319
Signed-off-by: Jacques Grove <[email protected]>
just a GTID UUID/SID without an offset/interval. Signed-off-by: Jacques Grove <[email protected]>
automatically in some cases (e.g. ERS); however, we were not resetting this sentinel flag in at least two cases: 1) When we have re-parented successfully (setMasterLocked) and 2) When we have successfully promoted a replica (PromoteReplica) Signed-off-by: Jacques Grove <[email protected]>
will make it easier to debug from log output. Signed-off-by: Jacques Grove <[email protected]>
Signed-off-by: Jacques Grove <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…future we may extend to _any uk_ Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
I must be missing something since my logic is inverted to the changes in this PR, please see inline comments.
@@ -619,13 +632,16 @@ func (tm *TabletManager) setMasterLocked(ctx context.Context, parentAlias *topod
return err
}
}
// Clear replication sentinel flag for this replica
tm.replManager.setReplicationStopped(false)
Can you please explain why this is set to `false`? It seems to me that, if anything, we should set it to `true`, because we're replicating up to a certain point, then stopping, correct?
The logic here is that "false" means that "replication is no longer stopped on purpose; please restart if necessary"; while "true" means "replication is stopped on purpose, please do not restart".
Got it! Thanks for clarifying, the function name is confusing.
@@ -752,6 +770,10 @@ func (tm *TabletManager) PromoteReplica(ctx context.Context) (string, error) {
return "", err
}

// Clear replication sentinel flag for this master,
// or we might block replication the next time we demote it
tm.replManager.setReplicationStopped(false)
Why is a `MASTER` tablet marked with `false`? I'd again think we need `true` here?
Same logic as above; we want to remove the on-disk flag that will prevent vttablet from "fixing" replication at some later point in time, if this master ever becomes a replica again.
delete(differenceSet, sid)
} else {
differenceSet[sid] = diffIntervals
}
👍
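For context on the hunk above, a per-SID GTID set difference can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the `interval` and `gtidSet` types and the `subtract` and `difference` helpers are hypothetical, and intervals are assumed to be sorted, non-overlapping and closed. The point it shows is that a SID whose difference is empty, including a bare SID with no intervals at all, is dropped from the result rather than kept as an empty entry.

```go
package main

import "fmt"

// interval is a closed range [start, end] of transaction sequence numbers.
type interval struct{ start, end int64 }

// gtidSet maps a server ID (SID) to its sorted, non-overlapping intervals.
type gtidSet map[string][]interval

// subtract returns the parts of a that are not covered by any interval in b.
// Both inputs must be sorted and non-overlapping.
func subtract(a, b []interval) []interval {
	var out []interval
	for _, cur := range a {
		covered := false
		for _, other := range b {
			if other.end < cur.start {
				continue // other lies entirely before cur
			}
			if other.start > cur.end {
				break // other, and everything after it, lies after cur
			}
			if other.start > cur.start {
				// Keep the uncovered piece in front of other.
				out = append(out, interval{cur.start, other.start - 1})
			}
			if other.end >= cur.end {
				covered = true // nothing of cur is left
				break
			}
			cur.start = other.end + 1
		}
		if !covered {
			out = append(out, cur)
		}
	}
	return out
}

// difference returns the GTIDs in set that are not in other. SIDs with
// nothing left over are omitted from the result.
func difference(set, other gtidSet) gtidSet {
	diff := make(gtidSet, len(set))
	for sid, intervals := range set {
		if len(intervals) == 0 {
			continue // a bare SID without intervals contributes nothing
		}
		if remaining := subtract(intervals, other[sid]); len(remaining) > 0 {
			diff[sid] = remaining
		}
	}
	return diff
}

func main() {
	a := gtidSet{
		"sid-a": {{1, 10}},
		"sid-b": {{1, 5}},
		"sid-c": {}, // SID with no offset/interval
	}
	b := gtidSet{
		"sid-a": {{3, 5}},
		"sid-b": {{1, 5}},
	}
	fmt.Println(difference(a, b))
}
```

Here only `sid-a` survives, with the two pieces on either side of the subtracted range; `sid-b` is fully covered and `sid-c` never had anything to contribute.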
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
CI: Skip TestConsolidatorMemoryLimits, refactor e2e test cluster setup to reduce vreplication e2e test flakiness
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Vicent Marti <[email protected]>
Add release notes for all the versions
Since Go has been upgraded to 1.16, setting GO111MODULE is no longer needed as it is now equivalent to the current default. Signed-off-by: Dirkjan Bussink <[email protected]>
…hecksum-rbr Ignore SBR statements from pt-table-checksum
This adds the `-trimpath` option so that the release binaries don't include builder specific paths and are independent from the path where they have been built. Signed-off-by: Dirkjan Bussink <[email protected]>
With recent Go versions, GOPATH has a built in fallback to use `~/go` when the environment variable is not set. This also works for these pre commit hooks. With this change it's possible to commit without setting GOPATH at all and having it fall back to the defaults that Go itself provides. Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
Small build improvements
gomod: do not replace GRPC
…r cross-shard queries for Gen4 Signed-off-by: Harshit Gangal <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Incrementing to release 12.0.0-SNAPSHOT
Gen4: order by and enhanced scoping information for table exprs
…t_conns [grpctmclient] Add support for (bounded) per-host connection reuse
LGTM
…tidset Signed-off-by: Jacques Grove <[email protected]>
Going to close this and do a new (clean) PR; this one is now polluted with merge commits.
`do_not_replicate` flag on disk in some cases #8333 where we do not clear the `do_not_replicate` sentinel flag that we create during ERS (and other steps) to ensure that the replication reporter does not automatically restart replication after we have (intentionally) stopped it.