Speed up leadership establishment #8036

notnoop · 2020-05-21T13:21:15Z

Establishing leadership should be very fast and never make potentially slow external API calls.

This PR makes the following changes:

Move Vault and Consul token revocation to background to avoid block leadership establishment.
Reorder leadership establishment steps: mark RPC ready as soon as the leader components are enabled, and move cleanup steps that aren't necessary immediate to the end.

Context and Motivation

Upon upgrading to 0.11.2, a cluster suffered many errors related to Nomad being slow to establish leadership and failing to respond to RPCs with "Not ready to serve consistent reads". The cause was that revokeVaultAccessorsOnRestore was running for along time and blocking the leader from performing its duties.

There are few moving parts in 0.11.2 that contributed to the issue:

The cluster had a very large number of vault tokens to be revoked. In one instance in the cluster history, over a million Vault tokens were revoked all at once! Presumably when the server gained leadership, many of the tokens were already revoked, either because Token TTL expired or because the previous leader revoke them prior before it lost leadership.

In processing this, two problems happen:

First, Nomad attempts to revoke the tokens in leadership establishment goroutine and block further leadership establishment. Changes in #5911 made the situation more visible. Prior to that PR, long vault revocation causes the leader to be half-wedged state, it may respond to RPCs but will not establish node heartbeat timers or schedule periodic jobs. That PR made the situation more obvious.

However, a second bug lessened the severity of the first bug: Vault revocation call is flaky and may terminate quickly. Prior to #7959 , if some of the tokens were already revoked or expired already, RevokeToken call fails fast and but it retries indefinitely in the background as none of the retries will succeed; this probably revokeDaemon may never actually revoke any tokens as the retries will keep failing with these tokens. PR #7959 made vault revocation thread keep going even with already revoked tokens. However, this also means that now that the revocation thread will process that long list of tokens in leader establishment thread and slow everything to a halt!

This PR addresses both of these problems by ensuring that the leadership establishment is fast, that token revocation always happen in background. Also, given that cleanup processes aren't critical or time sensitive to leader functionality, we move their order so we prioritize other more critical functionality (periodic jobs, node heartbeat tracking, etc).

Furthermore, when processing a super long list of revocation, we only check for the token TTL at the start to avoid revoking already expired tokens. But if the token TTLs are short (~60 seconds) relative to revocation of the long list, we may overwhelm Vault by making a lot of unnecessary API calls. Also, if any of their revocation fails, nomad reattempts to revoke all of them again!

This PR breaks the background revocation to be in 1000-token batches. This reduces the burden on Vault a bit and makes the retry handle for failed tokens a tad better (retry at most 1000 instead of 1 million tokens).

Establishing leadership should be very fast and never make external API calls. This fixes a situation where there is a long backlog of Vault tokens to be revoked on when leadership is gained. In such case, revoking the tokens will significantly slow down leadership establishment and slow down processing. Worse, the revocation call does not honor leadership `stopCh` signals, so it will not stop when the leader loses leadership.

Start serving RPC immediately after leader components are enabled, and move clean up to the bottom as they don't block leadership responsibilities.

nomad/vault.go

If a token is scheduled for revocation expires before we revoke it, ensure that it is marked as purged in raft and is only removed from local vault state if the purge operation succeeds. Prior to this change, we may remove the accessor from local state but not purge it from Raft. This causes unnecessary and churn in the next leadership elections (and until 0.11.2 result in indefinite retries).

purgeFunc cannot be nil, so ensure it's set to a no-op function in tests.

robloxrob · 2020-05-27T17:31:39Z

Thank you!

schmichael

Looks great! Comments are just wording tweaks and not blockers.

schmichael · 2020-05-29T21:20:00Z

nomad/leader.go

+	// Activate RPCs after all leadership components are enabled
+	// and the server can handle write RPCs


This comment seems a bit off now that not all "leadership components" are enabled at this point (although I guess all of them that set the SetEnabled pattern are).

Is there a specific operation that makes the leader ready to serve consistent reads? If so let's specifically call that one out so that as we add/remove/move calls in this func we have a way to determine whether they need to come before or after this point.

nomad/leader.go

Co-authored-by: Michael Schurter <[email protected]>

…restore Speed up leadership establishment

notnoop · 2020-06-01T14:29:07Z

cherry-picked into 0.11.3

github-actions · 2023-01-04T02:15:26Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Mahmood Ali added 4 commits May 21, 2020 07:38

rate limit revokeDaemon

506e441

apply the same change to consul revocation

39fd3b9

Reorder leadership handling

adaeb10

Start serving RPC immediately after leader components are enabled, and move clean up to the bottom as they don't block leadership responsibilities.

notnoop requested a review from schmichael May 21, 2020 13:21

notnoop self-assigned this May 21, 2020

schmichael reviewed May 21, 2020

View reviewed changes

nomad/vault.go Outdated Show resolved Hide resolved

Mahmood Ali added 2 commits May 21, 2020 19:54

always set purgeFunc

abda1a1

purgeFunc cannot be nil, so ensure it's set to a no-op function in tests.

notnoop requested a review from schmichael May 27, 2020 18:13

schmichael approved these changes May 29, 2020

View reviewed changes

Apply suggestions from code review

e9d14a3

Co-authored-by: Michael Schurter <[email protected]>

notnoop force-pushed the f-background-vault-revoke-on-restore branch from e9d14a3 to 5bf4868 Compare June 1, 2020 01:24

clarify rpc consistency readiness comment

127a960

notnoop force-pushed the f-background-vault-revoke-on-restore branch from 5bf4868 to 127a960 Compare June 1, 2020 01:26

notnoop merged commit ca0b032 into master Jun 1, 2020

notnoop deleted the f-background-vault-revoke-on-restore branch June 1, 2020 01:27

notnoop pushed a commit that referenced this pull request Jun 1, 2020

Merge pull request #8036 from hashicorp/f-background-vault-revoke-on-…

902996e

…restore Speed up leadership establishment

schmichael mentioned this pull request Jun 12, 2020

Implement snapshot restore #8131

Merged

github-actions bot locked as resolved and limited conversation to collaborators Jan 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up leadership establishment #8036

Speed up leadership establishment #8036

notnoop commented May 21, 2020

robloxrob commented May 27, 2020

schmichael left a comment

schmichael May 29, 2020

notnoop commented Jun 1, 2020

github-actions bot commented Jan 4, 2023

		// Activate RPCs after all leadership components are enabled
		// and the server can handle write RPCs

Speed up leadership establishment #8036

Speed up leadership establishment #8036

Conversation

notnoop commented May 21, 2020

Context and Motivation

robloxrob commented May 27, 2020

schmichael left a comment

Choose a reason for hiding this comment

schmichael May 29, 2020

Choose a reason for hiding this comment

notnoop commented Jun 1, 2020

github-actions bot commented Jan 4, 2023