Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix problematic ERS cases #8319

Closed
wants to merge 123 commits into from
Closed

Conversation

aquarapid
Copy link
Contributor

@aquarapid aquarapid commented Jun 12, 2021

just a GTID UUID/SID without an offset/interval.

Signed-off-by: Jacques Grove <[email protected]>
automatically in some cases (e.g. ERS);  however, we were not resetting
this sentinel flag in at least two cases:  1) When we have re-parented
successfully (setMasterLocked) and 2) When we have successfully
promoted a replica (PromoteReplica)

Signed-off-by: Jacques Grove <[email protected]>
@aquarapid aquarapid marked this pull request as draft June 12, 2021 00:39
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be missing something since my logic is inverted to the changes in this PR, please see inline comments.

@@ -619,13 +632,16 @@ func (tm *TabletManager) setMasterLocked(ctx context.Context, parentAlias *topod
return err
}
}
// Clear replication sentinel flag for this replica
tm.replManager.setReplicationStopped(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why setting to false? It seems to me like if anything we should set to true, because we're replicating up to a certain point, then stopping, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is that "false" means that "replication is no longer stopped on purpose; please restart if necessary"; while "true" means "replication is stopped on purpose, please do not restart".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks for clarifying, the function name is confusing.

@@ -752,6 +770,10 @@ func (tm *TabletManager) PromoteReplica(ctx context.Context) (string, error) {
return "", err
}

// Clear replication sentinel flag for this master,
// or we might block replication the next time we demote it
tm.replManager.setReplicationStopped(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a MASTER tablet marked with false? I'd again think we need true here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logic as above; we want to remove the on-disk flag that will prevent vttablet from "fixing" replication at some later point in time, if this master ever becomes a replica again.

delete(differenceSet, sid)
} else {
differenceSet[sid] = diffIntervals
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@deepthi deepthi requested a review from ajm188 June 21, 2021 21:10
systay and others added 17 commits July 6, 2021 10:36
CI: Skip TestConsolidatorMemoryLimits, refactor e2e test cluster setup to reduce vreplication e2e test flakiness
Signed-off-by: Vicent Marti <[email protected]>
Add release notes for all the versions
Since Go has been upgraded to 1.16, setting GO111MODULE is no longer
needed as it is now equivalent to the current default.

Signed-off-by: Dirkjan Bussink <[email protected]>
…hecksum-rbr

Ignore SBR statements from pt-table-checksum
This adds the `-trimpath` option so that the release binaries don't
include builder specific paths and are independent from the path where
they have been built.

Signed-off-by: Dirkjan Bussink <[email protected]>
With recent Go versions, GOPATH has a built in fallback to use `~/go`
when the environment variable is not set. This also works for these pre
commit hooks.

With this change it's possible to commit without setting GOPATH at all
and having it fall back to the defaults that Go itself provides.

Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Florent Poinsard <[email protected]>
…r cross-shard queries for Gen4

Signed-off-by: Harshit Gangal <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Incrementing to release 12.0.0-SNAPSHOT
Gen4: order by and enhanced scoping information for table exprs
…t_conns

[grpctmclient] Add support for (bounded) per-host connection reuse
@aquarapid aquarapid marked this pull request as ready for review July 6, 2021 18:53
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aquarapid
Copy link
Contributor Author

Going to close this and do a new (clean) PR, this is now polluted with merge commits.

@aquarapid aquarapid closed this Jul 6, 2021
@aquarapid aquarapid mentioned this pull request Jul 6, 2021
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERS does not clear do_not_replicate flag on disk in some cases ERS seems to break in errant GTID check