Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: kvserver: stop transferring leases to replicas that may need snapshots #69964

Merged

Conversation

aayushshah15
Copy link
Contributor

Backport 1/1 commits from #69696.

/cc @cockroachdb/release


This commit disallows the replicateQueue from initiating lease
transfers to replicas that may be in need of a raft snapshot. Note that
the StoreRebalancer already has a stronger form of this check since it
disallows lease transfers to replicas that are lagging behind the raft
leader (which includes the set of replicas that need a snapshot).

In cases where the raft leader is not the leaseholder, we disallow the
replicateQueue from any sort of lease transfer until leaseholdership and
leadership are collocated. We rely on calls to
maybeTransferRaftLeadershipToLeaseholderLocked() (called on every raft
tick) to make sure that such periods of leadership / leaseholdership
misalignment are ephemeral and rare.

Alternative to #63507

Release justification: bug fix

Resolves #61604

Release note (bug fix): Fixes a bug that can cause prolonged
unavailability due to lease transfer to a replica that may be in need of
a raft snapshot.

This commit disallows the `replicateQueue` from initiating lease
transfers to replicas that may be in need of a raft snapshot. Note that
the `StoreRebalancer` already has a stronger form of this check since it
disallows lease transfers to replicas that are lagging behind the raft
leader (which includes the set of replicas that need a snapshot).

In cases where the raft leader is not the leaseholder, we disallow the
replicateQueue from any sort of lease transfer until leaseholdership and
leadership are collocated. We rely on calls to
`maybeTransferRaftLeadershipToLeaseholderLocked()` (called on every raft
tick) to make sure that such periods of leadership / leaseholdership
misalignment are ephemeral and rare.

Release justification: bug fix

Release note (bug fix): Fixes a bug that can cause prolonged
unavailability due to lease transfer to a replica that may be in need of
a raft snapshot.
@blathers-crl
Copy link

blathers-crl bot commented Sep 9, 2021

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but maybe we should respect the baking period here since we haven't made any hard promises for 21.1?

@aayushshah15
Copy link
Contributor Author

LGTM, but maybe we should respect the baking period here since we haven't made any hard promises for 21.1?

I agree, I'll wait until late next week to merge.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

@aayushshah15 aayushshah15 merged commit c13e21e into cockroachdb:release-21.1 Sep 16, 2021
@aayushshah15 aayushshah15 deleted the backport21.1-69696 branch September 16, 2021 19:19
@daniel-crlabs
Copy link
Contributor

Do we know what version of 21.1 this will be available in?

@nvanbenschoten
Copy link
Member

It will be available in 21.1.11, which has not been picked yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants