release-2.1: storage: take an engine checkpoint during failing consistency checks #42042

Merged (1 commit) on Oct 31, 2019


irfansharif
Contributor

Backport 1/2 commits from #36867.

Useful as part of #42011.

+cc @cockroachdb/release


This takes a checkpoint on the nodes with replicas of a failing range,
before the failure leads to nodes shutting down. The checkpoint will, for
the replicas of the affected range, be taken at the same Raft log position.

Release note: None
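
For illustration only, here is a minimal Go sketch of the idea described above: when a consistency check finds diverging replicas, every replica of the range records a checkpoint tagged with its applied Raft log position before anything shuts down. This is not the code from #36867. `Replica`, `Checkpoint`, `checksum`, `takeCheckpoint`, and `checkConsistency` are hypothetical names, the "engine" is just an in-memory map, and the sketch assumes all replicas are already at the same applied index when the check runs.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Checkpoint is a frozen copy of a replica's state (hypothetical type).
type Checkpoint struct {
	AppliedIndex uint64
	Data         map[string]string
}

// Replica is a toy stand-in for one replica of a range (hypothetical type).
type Replica struct {
	NodeID       int
	AppliedIndex uint64            // Raft log position applied so far
	Data         map[string]string // replicated key/value state
	Checkpoints  []Checkpoint
}

// checksum hashes the replica's data in sorted key order so that equal
// states always produce equal checksums.
func (r *Replica) checksum() string {
	keys := make([]string, 0, len(r.Data))
	for k := range r.Data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(r.Data[k]))
		h.Write([]byte{0})
	}
	return string(h.Sum(nil))
}

// takeCheckpoint stores a copy of the current state, tagged with the
// applied index at which it was taken.
func (r *Replica) takeCheckpoint() {
	snap := make(map[string]string, len(r.Data))
	for k, v := range r.Data {
		snap[k] = v
	}
	r.Checkpoints = append(r.Checkpoints, Checkpoint{AppliedIndex: r.AppliedIndex, Data: snap})
}

// checkConsistency compares every replica against the first one. If any
// replica diverges, all replicas take a checkpoint before the caller
// reacts to the failure. It returns the node IDs that diverged.
func checkConsistency(replicas []*Replica) []int {
	if len(replicas) == 0 {
		return nil
	}
	want := replicas[0].checksum()
	var diverged []int
	for _, r := range replicas[1:] {
		if r.checksum() != want {
			diverged = append(diverged, r.NodeID)
		}
	}
	if len(diverged) > 0 {
		for _, r := range replicas {
			r.takeCheckpoint()
		}
	}
	return diverged
}

func main() {
	replicas := []*Replica{
		{NodeID: 1, AppliedIndex: 42, Data: map[string]string{"a": "1", "b": "2"}},
		{NodeID: 2, AppliedIndex: 42, Data: map[string]string{"a": "1", "b": "2"}},
		{NodeID: 3, AppliedIndex: 42, Data: map[string]string{"a": "1", "b": "corrupt"}},
	}
	diverged := checkConsistency(replicas)
	fmt.Println("diverged nodes:", diverged)
	for _, r := range replicas {
		for _, c := range r.Checkpoints {
			fmt.Printf("node %d: checkpoint at applied index %d\n", r.NodeID, c.AppliedIndex)
		}
	}
}
```

The point of checkpointing every replica, not only the diverging one, is that the healthy copies are preserved at the same log position for later comparison.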

@irfansharif irfansharif requested review from tbg and a team October 30, 2019 19:47

@irfansharif irfansharif removed the request for review from a team October 30, 2019 19:47
@tbg
Member

tbg commented Oct 30, 2019

Clean backport? Wonderful!

We've repeatedly wanted this to preserve state when finding replica
inconsistencies.

See, for example:

cockroachdb#36861

Release note: None
@irfansharif
Contributor Author

Failed with #31778; the fix in #32899 was not backported. Retrying.

@irfansharif
Contributor Author

Now failing on the Example-ORM tests, with what look like network flakes:

[TestDjango/FirstRun] main_test.go:163: Get http://localhost:6543/ping/: dial tcp 127.0.0.1:6543: getsockopt: connection refused

If safe to ignore, how do I get past the failing TC check here?

@irfansharif
Contributor Author

irfansharif commented Oct 31, 2019

OK, I see the same example-orms failures on release-v2.1 (it was a major PITA to get this running; so much is broken). Very surprisingly, the example-orms test suite isn't pinned to a version: changes made on example-orms master are picked up by the test suites run against all branches of CRDB from then on. This is almost certainly not what we want.

@tbg: It's safe to disable branch protection for this PR; I don't have access to do so.

+cc @rohany, @rafiss: all future backports to release-v2.1 (and likely other branches) are going to fail. I don't know if this is something we care about, but TC will fail and block those backports.

@tbg
Member

tbg commented Oct 31, 2019

Merging at your request :shipit:

@tbg tbg merged commit c114859 into cockroachdb:release-2.1 Oct 31, 2019
@irfansharif irfansharif deleted the backport2.1-36867 branch November 18, 2019 18:21