backport-19.1: storage: prevent crash migrating from 19.1-beta into 19.1-rcX #36719

Merged
merged 1 commit into from
Apr 10, 2019

@tbg tbg commented Apr 10, 2019

Backports #36714.
cc @cockroachdb/release.


When I landed the change to stop sending the Raft log in snapshots, I
gated this on whether the truncated state had already been unreplicated
for the range. However, this wasn't enough: older 19.1 betas knew about
the unreplicated truncated state and yet couldn't handle a regressing
truncated state, which sending these snapshots could introduce. As a
result, 19.1-beta nodes could crash while running mixed with 19.1-rcX.
(Simply restarting those nodes with the upgraded binary should fix the
problem.)
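
To make the failure mode concrete, here is a minimal sketch of the kind of
check that could reject a regressing truncated state. It is not the actual
CockroachDB code: the `truncatedState` type and the `checkNoRegression`
helper are hypothetical, and the sketch assumes that "regressed" means the
truncated index moved backwards. It returns an error where the real node
would hit a fatal error.

```go
package main

import "fmt"

// truncatedState is a hypothetical, pared-down stand-in for a range's
// truncated state: the index and term of the last truncated log entry.
type truncatedState struct {
	Index uint64
	Term  uint64
}

// checkNoRegression returns an error if next would move the truncated
// state backwards relative to prev (assumption: regression is judged by
// the index alone).
func checkNoRegression(prev, next truncatedState) error {
	if next.Index < prev.Index {
		return fmt.Errorf("TruncatedState regressed: %d -> %d", prev.Index, next.Index)
	}
	return nil
}

func main() {
	prev := truncatedState{Index: 100, Term: 5}
	// Moving forward is fine.
	fmt.Println(checkNoRegression(prev, truncatedState{Index: 120, Term: 5}))
	// Moving backwards is what an older beta could not tolerate.
	fmt.Println(checkNoRegression(prev, truncatedState{Index: 80, Term: 5}))
}
```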

This PR breaks one of our rules around not introducing historical
cluster versions, but in this case it's necessary and also shouldn't
have any adverse effects.
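
A rough sketch of the gating logic described above, assuming the fix adds
a cluster-version check on top of the existing per-range condition. The
names `rangeState`, `canOmitRaftLog`, and `versionGateActive` are
hypothetical and are not taken from the CockroachDB codebase.

```go
package main

import "fmt"

// rangeState is a hypothetical, simplified view of what the sender knows
// about a range when it builds a snapshot.
type rangeState struct {
	// truncStateUnreplicated reports whether the range has already
	// switched to the unreplicated truncated state.
	truncStateUnreplicated bool
}

// canOmitRaftLog reports whether a snapshot may be sent without the Raft
// log. versionGateActive is a hypothetical stand-in for "the new cluster
// version is active", i.e. every node runs a binary that tolerates a
// regressing truncated state.
func canOmitRaftLog(rs rangeState, versionGateActive bool) bool {
	// The original gate checked only the per-range condition; the
	// assumed fix requires the cluster-wide condition as well.
	return rs.truncStateUnreplicated && versionGateActive
}

func main() {
	rs := rangeState{truncStateUnreplicated: true}
	// Mixed cluster with 19.1-beta nodes: keep sending the Raft log.
	fmt.Println(canOmitRaftLog(rs, false))
	// All nodes upgraded: the log can be omitted.
	fmt.Println(canOmitRaftLog(rs, true))
}
```

With a gate of this shape, a mixed 19.1-beta / 19.1-rcX cluster keeps the
old snapshot behavior until every node has been upgraded.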

See #36680.

Release note (bug fix): prevent a crash that could occur when running
a cluster mixed between 19.1-beta and 19.1-rcX nodes. The crash would
manifest with a fatal error stating "TruncatedState regressed". Moving
all nodes to the new binary (19.1-rcX or newer) rectifies this
situation. This doesn't affect anyone migrating directly from 2.1.x
to 19.1.x, as the majority of our users are expected to do.
@tbg tbg requested a review from a team April 10, 2019 15:21
@tbg tbg merged commit 0d20eff into cockroachdb:release-19.1 Apr 10, 2019
@tbg tbg deleted the backport19.1/trunc-panic branch April 10, 2019 17:03