Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement raft multipler flag #8082

Merged
merged 3 commits into from
Jun 19, 2020
Merged

Implement raft multipler flag #8082

merged 3 commits into from
Jun 19, 2020

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented May 31, 2020

This ports raft_multiplier flag from Consul to Nomad: https://www.consul.io/docs/agent/options.html#raft_multiplier .

It allows users to set tweak raft related timeouts and leases, which is important in low/moderate performance network or disks. Consul has some guides for raft_multipliers in https://www.consul.io/docs/install/performance.html - not sure how it fully maps to nomad yet, but at least we can expose the knob and give recommendations later.

This doesn't change the defaults, and operators can increase the defaults. I infer that the current defaults are adequate, as we haven't gotten many requests or support issues related to raft defaults.

Closes #5872 .

@notnoop notnoop requested review from schmichael and angrycub May 31, 2020 18:14
@notnoop notnoop self-assigned this May 31, 2020
@notnoop notnoop requested a review from langmartin June 15, 2020 14:21
Copy link
Contributor

@langmartin langmartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! the tests seem to cover everything.

return nil, fmt.Errorf("raft_multiplier cannot be %d. Must be between 1 and %d", *agentConfig.Server.RaftMultiplier, MaxRaftMultiplier)
}
}
conf.RaftConfig.ElectionTimeout *= time.Duration(raftMultiplier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking my understanding: we've already parsed ElectionTimeout from the config, so that an operator can combine a specific ElectionTimeout and the multiplier, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost but not quite. ElectionTimeout value is set earlier from the raft library hardcoded value. Without RaftMultipler, operators don't have direct way to manipulate the value in any way.

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left an idea for documentation improvement, but it looks good otherwise!

@@ -141,6 +141,14 @@ server {
features and is typically not required as the agent internally knows the
latest version, but may be useful in some upgrade scenarios.

- `raft_multiplier` `(int: 1)` - An integer multiplier used by Nomad servers to
scale key Raft timing parameters. Omitting this value or setting it to 0 uses
default timing described below. Lower values are used to tighten timing and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"described below" - we don't describe it below.

I don't think this documentation from Consul works well for us because they set their raft_multiplier to 5 while we set it to the minimum value of 1. There's no way to adjust Nomad to be more sensitive to faults.

If it's ok that Nomad cannot be tuned to be more sensitive (we're really locking ourselves in for the long term with this design), and I think that's ok, then maybe the docs should read:

raft_multiplier (int: 1) - An integer multiplier used by Nomad servers to scale key Raft timing parameters similar to Consul's raft_multiplier. Unlike Consul, Nomad defaults to a multiplier of 1, so timings can only be increased. Increasing the timings makes leader election less likely during periods of networking issues or resource starvation. Since leader elections pause Nomad's normal work, it may be beneficial for slow or unreliable networks to wait longer before electing a new leader. The tradeoff when raising this value is that during network partitions or other events (server crash) where a leader is lost, Nomad will not elect a new leader for a longer period of time than the default. The nomad.nomad.leader.barrier' metric is a good indicator of how often leader elections occur.

Copy link
Contributor Author

@notnoop notnoop Jun 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I'll follow up. To clarify, Nomad's raft_multiplier=1 matches Consul's raft_multiplier=1 - i.e. nomad's minimum is consul's minimum. When Consul introduced raft_multipler flag in 0.7, they additionally changed the default timeouts to make it looser to accommodate dev instances - here, I didn't change the default, hence it stays at 1 - the equivalent of pre-0.12 values.

@notnoop notnoop merged commit 0821c0a into master Jun 19, 2020
@notnoop notnoop deleted the f-raft-multipler branch June 19, 2020 14:05
@analytically
Copy link

Does this mean it can be removed from Consul?

@angrycub
Copy link
Contributor

Consul's default raft_multiplier is still 5. So if you would like Consul's raft to use the highest speed possible, you would still need to set it to 1 in your Consul configuration.

Adding the setting to Nomad allows you to use a lower-performance timing that's suitable for minimal servers. This can facilitate the use of smaller class server instances for development clusters. The trade-off being that lower values of raft-multiplier are used to tighten timing and increase sensitivity while higher values relax timings and reduce sensitivity.

Tuning this affects the time it takes Nomad to detect leader failures and to perform leader elections, at the expense of requiring more network and CPU resources for better performance. Because Nomad has always used a multiplier of 1, it was retained as the default. This setting can only be used to relax Nomad's default sensitivity.

@github-actions
Copy link

github-actions bot commented Jan 2, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] add raft_multiplier like in consul
5 participants