Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Voting config exclusions should work with absent nodes #50836

Merged
merged 33 commits into from
Apr 16, 2020

Conversation

zacharymorn
Copy link
Contributor

Voting config exclusions should work with absent nodes. For details, please see #47990.

@cbuescher cbuescher added the :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. label Jan 10, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@DaveCTurner DaveCTurner self-requested a review January 10, 2020 10:29
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this.

We would like to entirely drop support for the more complex node resolution process from DiscoveryNodes#resolveNodes and only support node names and node IDs.

I suggest doing this by deprecating support for describing the nodes using POST /_cluster/voting_config_exclusions/{node_name} and instead adding query parameters ?node_names={...} and ?node_ids={...}. We could use these to add VotingConfigExclusions for the corresponding nodes, using a placeholder for nodeId or nodeName if it is not known at resolution time.

@zacharymorn
Copy link
Contributor Author

I see.

I tried to add the suggested query params as follow and this gave me an error. It seems as query param might be optional, they don't get taken into account when checking for url uniqueness

public RestAddVotingConfigExclusionAction(RestController controller) {
    // This API is existing and being deprecated.
    controller.registerHandler(RestRequest.Method.POST, “/_cluster/voting_config_exclusions/{node_name}”, this);

    // These are new
    controller.registerHandler(RestRequest.Method.POST, "/_cluster/voting_config_exclusions?node_names={node_names}", this);
    controller.registerHandler(RestRequest.Method.POST, "/_cluster/voting_config_exclusions?node_ids={node_ids}", this);
}
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Trying to use conflicting wildcard names for same path: node_names and node_ids

In addition, when I just keep one of those two, it still conflicts with RestClearVotingConfigExclusionsAction as it's registered with the url /_cluster/voting_config_exclusions

What do you think we register with the following urls instead?

/_cluster/voting_config_exclusions/node_names/{node_names}
/_cluster/voting_config_exclusions/node_ids/{node_ids}

@zacharymorn
Copy link
Contributor Author

@DaveCTurner I just pushed a new commit to use a new API that excludes node just based on id or name. Could you please let me know if this is in the right direction?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @zacharymorn and apologies for taking longer than normal to get back to you. I have left some more comments on the approach.

@zacharymorn
Copy link
Contributor Author

@DaveCTurner I’ve taken your suggestions and made a new commit. The changes could use more unit tests but I would like to get your feedback early. Could you please take a look and let me know if it is in the right direction?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zacharymorn, this looks neater. I left a few more comments.

@@ -47,7 +53,7 @@
* @param nodeDescriptions Descriptions of the nodes to add - see {@link DiscoveryNodes#resolveNodes(String...)}
*/
public AddVotingConfigExclusionsRequest(String[] nodeDescriptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is only used in tests, and it looks like we could migrate all of those tests over to using node names instead of node descriptions. Some of them would also be neater if we used a varargs:

Suggested change
public AddVotingConfigExclusionsRequest(String[] nodeDescriptions) {
public AddVotingConfigExclusionsRequest(String... nodeNames) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we migrate these tests now to use node names instead of descriptions, I'm a bit concerned that we may not have tests to prove that the changes are still backward compatible and don't have bugs that may break logic based on nodeDescriptions, before it is fully migrated to nodeIds / nodeNames. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you're right, I didn't quite mean "all" these tests. We should comprehensively test the different kinds of node resolution by strengthening AddVotingConfigExclusionsRequestTests. The other tests can move over to node names without loss of coverage IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't resolved yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed this earlier. Done in commit 02a3533

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed this useful constructor in 02a3533 and added a lot of noise to the tests as a result. Could you follow my suggestion above instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry when I re-read this thread I somehow got the wrong idea that this constructor need to be removed. I reverted that commit and tried again in commit 5c7a226. Could you let me know if this looks good?

}
else {
Map<String, String> existingNodesNameId = new HashMap<>();
for (DiscoveryNode node : this) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just the master-eligible nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry a question just came up when I looked at this again. When we resolve by nodeId, we use ALL existing nodes to check if it exists, not just the master-eligible ones. Shall we keep this behavior the same for resolving by node name as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open a new comment thread on the newly-moved code.

@zacharymorn
Copy link
Contributor Author

@DaveCTurner I've added more tests to cover most of my changes I believe, could you please take a look and let me know if more is needed?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zacharymorn, I have done another pass and left more comments.

@@ -47,7 +53,7 @@
* @param nodeDescriptions Descriptions of the nodes to add - see {@link DiscoveryNodes#resolveNodes(String...)}
*/
public AddVotingConfigExclusionsRequest(String[] nodeDescriptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't resolved yet?

@DaveCTurner
Copy link
Contributor

@elasticmachine ok to test

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 9, 2020
In elastic#50836 we deprecated the existing voting config exclusions API and added a
new one. This commit adjust the docs to match.
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit 6b299d4 into elastic:master Apr 16, 2020
DaveCTurner pushed a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 16, 2020
Today the voting config exclusions API accepts node filters and resolves them
to a collection of node IDs against the current cluster membership.

This is problematic since we may want to exclude nodes that are not currently
members of the cluster. For instance:

- if attempting to remove a flaky node from the cluster you cannot reliably
  exclude it from the voting configuration since it may not reliably be a
  member of the cluster

- if `cluster.auto_shrink_voting_configuration: false` then naively shrinking
  the cluster will remove some nodes but will leaving their node IDs in the
  voting configuration. The only way to clean up the voting configuration is to
  grow the cluster back to its original size (potentially replacing some of the
  voting configuration) and then use the exclusions API.

This commit adds an alternative API that accepts node names and node IDs but
not node filters in general, and deprecates the current node-filters-based API.

Relates elastic#47990.
DaveCTurner added a commit that referenced this pull request Apr 16, 2020
Today the voting config exclusions API accepts node filters and resolves them
to a collection of node IDs against the current cluster membership.

This is problematic since we may want to exclude nodes that are not currently
members of the cluster. For instance:

- if attempting to remove a flaky node from the cluster you cannot reliably
  exclude it from the voting configuration since it may not reliably be a
  member of the cluster

- if `cluster.auto_shrink_voting_configuration: false` then naively shrinking
  the cluster will remove some nodes but will leaving their node IDs in the
  voting configuration. The only way to clean up the voting configuration is to
  grow the cluster back to its original size (potentially replacing some of the
  voting configuration) and then use the exclusions API.

This commit adds an alternative API that accepts node names and node IDs but
not node filters in general, and deprecates the current node-filters-based API.

Relates #47990.
Backport of #50836 to 7.x.

Co-authored-by: zacharymorn <[email protected]>
DaveCTurner added a commit that referenced this pull request Apr 20, 2020
In #50836 we deprecated the existing voting config exclusions API and added a
new one. This commit adjust the docs to match.
DaveCTurner added a commit that referenced this pull request Apr 20, 2020
In #50836 we deprecated the existing voting config exclusions API and added a
new one. This commit adjust the docs to match.
@DaveCTurner
Copy link
Contributor

I will follow up with a PR to remove the deprecated API in the next few days.

DaveCTurner added a commit that referenced this pull request Apr 24, 2020
The use of node filters for excluding nodes from the voting configuration was
deprecated in #50836; this commit removes support for node filters in this API.

Closes #47990
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>deprecation :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.8.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants