Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up leadership establishment #8036

Merged
merged 8 commits into from
Jun 1, 2020

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented May 21, 2020

Establishing leadership should be very fast and never make potentially slow external API calls.

This PR makes the following changes:

  • Move Vault and Consul token revocation to background to avoid block leadership establishment.
  • Reorder leadership establishment steps: mark RPC ready as soon as the leader components are enabled, and move cleanup steps that aren't necessary immediate to the end.

Context and Motivation

Upon upgrading to 0.11.2, a cluster suffered many errors related to Nomad being slow to establish leadership and failing to respond to RPCs with "Not ready to serve consistent reads". The cause was that revokeVaultAccessorsOnRestore was running for along time and blocking the leader from performing its duties.

There are few moving parts in 0.11.2 that contributed to the issue:

The cluster had a very large number of vault tokens to be revoked. In one instance in the cluster history, over a million Vault tokens were revoked all at once! Presumably when the server gained leadership, many of the tokens were already revoked, either because Token TTL expired or because the previous leader revoke them prior before it lost leadership.

In processing this, two problems happen:

First, Nomad attempts to revoke the tokens in leadership establishment goroutine and block further leadership establishment. Changes in #5911 made the situation more visible. Prior to that PR, long vault revocation causes the leader to be half-wedged state, it may respond to RPCs but will not establish node heartbeat timers or schedule periodic jobs. That PR made the situation more obvious.

However, a second bug lessened the severity of the first bug: Vault revocation call is flaky and may terminate quickly. Prior to #7959 , if some of the tokens were already revoked or expired already, RevokeToken call fails fast and but it retries indefinitely in the background as none of the retries will succeed; this probably revokeDaemon may never actually revoke any tokens as the retries will keep failing with these tokens. PR #7959 made vault revocation thread keep going even with already revoked tokens. However, this also means that now that the revocation thread will process that long list of tokens in leader establishment thread and slow everything to a halt!

This PR addresses both of these problems by ensuring that the leadership establishment is fast, that token revocation always happen in background. Also, given that cleanup processes aren't critical or time sensitive to leader functionality, we move their order so we prioritize other more critical functionality (periodic jobs, node heartbeat tracking, etc).

Furthermore, when processing a super long list of revocation, we only check for the token TTL at the start to avoid revoking already expired tokens. But if the token TTLs are short (~60 seconds) relative to revocation of the long list, we may overwhelm Vault by making a lot of unnecessary API calls. Also, if any of their revocation fails, nomad reattempts to revoke all of them again!

This PR breaks the background revocation to be in 1000-token batches. This reduces the burden on Vault a bit and makes the retry handle for failed tokens a tad better (retry at most 1000 instead of 1 million tokens).

Mahmood Ali added 4 commits May 21, 2020 07:38
Establishing leadership should be very fast and never make external API
calls.

This fixes a situation where there is a long backlog of Vault tokens to
be revoked on when leadership is gained.  In such case, revoking the
tokens will significantly slow down leadership establishment and slow
down processing.  Worse, the revocation call does not honor leadership
`stopCh` signals, so it will not stop when the leader loses leadership.
Start serving RPC immediately after leader components are enabled, and
move clean up to the bottom as they don't block leadership
responsibilities.
@notnoop notnoop requested a review from schmichael May 21, 2020 13:21
@notnoop notnoop self-assigned this May 21, 2020
nomad/vault.go Outdated Show resolved Hide resolved
Mahmood Ali added 2 commits May 21, 2020 19:54
If a token is scheduled for revocation expires before we revoke it,
ensure that it is marked as purged in raft and is only removed from
local vault state if the purge operation succeeds.

Prior to this change, we may remove the accessor from local state but
not purge it from Raft.  This causes unnecessary and churn in the next
leadership elections (and until 0.11.2 result in indefinite retries).
purgeFunc cannot be nil, so ensure it's set to a no-op function in
tests.
@robloxrob
Copy link

Thank you!

@notnoop notnoop requested a review from schmichael May 27, 2020 18:13
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Comments are just wording tweaks and not blockers.

nomad/leader.go Outdated
Comment on lines 258 to 259
// Activate RPCs after all leadership components are enabled
// and the server can handle write RPCs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems a bit off now that not all "leadership components" are enabled at this point (although I guess all of them that set the SetEnabled pattern are).

Is there a specific operation that makes the leader ready to serve consistent reads? If so let's specifically call that one out so that as we add/remove/move calls in this func we have a way to determine whether they need to come before or after this point.

nomad/leader.go Outdated Show resolved Hide resolved
nomad/leader.go Outdated Show resolved Hide resolved
Co-authored-by: Michael Schurter <[email protected]>
@notnoop notnoop force-pushed the f-background-vault-revoke-on-restore branch from e9d14a3 to 5bf4868 Compare June 1, 2020 01:24
@notnoop notnoop force-pushed the f-background-vault-revoke-on-restore branch from 5bf4868 to 127a960 Compare June 1, 2020 01:26
@notnoop notnoop merged commit ca0b032 into master Jun 1, 2020
@notnoop notnoop deleted the f-background-vault-revoke-on-restore branch June 1, 2020 01:27
notnoop pushed a commit that referenced this pull request Jun 1, 2020
…restore

Speed up leadership establishment
@notnoop
Copy link
Contributor Author

notnoop commented Jun 1, 2020

cherry-picked into 0.11.3

@github-actions
Copy link

github-actions bot commented Jan 4, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants