Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect to new nodes concurrently #22984

Merged
merged 3 commits into from
Feb 6, 2017

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Feb 5, 2017

When a node receives a new cluster state from the master, it opens up connections to any new node in the cluster state. That has always been done serially on the cluster state thread but it has been a long standing TODO to do this concurrently, which is done by this PR.

This is spin off of #22828, where an extra handshake is done whenever connecting to a node, which may slow down connecting. Also, the handshake is done in a blocking fashion which triggers assertions w.r.t blocking requests on the cluster state thread. Instead of adding an exception, I opted to implement concurrent connections which both side steps the assertion and compensates for the extra handshake.

The change is well tested under current tests. Sadly, I could find an easy way to simulate rejections as we have locked down the thread pool settings (good!) and the management still has an unbound queue.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one question... lgtm otherwise

// skip
}

@Override
public void disconnectFromNodesExcept(Iterable<DiscoveryNode> nodesToKeep) {
public void disconnectFromNodesExcept(DiscoveryNodes nodesToKeep) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we change these? Isn't Iterable more generic and would still accept DiscoveryNodes? also disconnectFromNodesExcept sounds like it would only be executed on a subset of and actual DiscoveryNodes instance. I think we should stick with what we had?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we change these? Isn't Iterable more generic and would still accept DiscoveryNodes?

I can change it back, but I did it to be consistent with connectToNodes which needs the total size of the collection. Since this is only used with DiscoNodes, I felt consistency is clearer. As said - I'll roll it back if you prefer.

disconnectFromNodesExcept sounds like it would only be executed on a subset of and actual DiscoveryNodes instance

Naming are hard. What it does is disconnecting from all nodes that are not part of the supplied disconodes parameter. We pass the disco nodes of the current cluster state directly, without any modification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough lets do it.

@bleskes bleskes merged commit 5e7d223 into elastic:master Feb 6, 2017
@bleskes bleskes deleted the node_parallel_connect branch February 6, 2017 15:32
@bleskes
Copy link
Contributor Author

bleskes commented Feb 6, 2017

thx @s1monw

bleskes added a commit that referenced this pull request Feb 9, 2017
When a node receives a new cluster state from the master, it opens up connections to any new node in the cluster state. That has always been done serially on the cluster state thread but it has been a long standing TODO to do this concurrently, which is done by this PR.

This is spin off of #22828, where an extra handshake is done whenever connecting to a node, which may slow down connecting. Also, the handshake is done in a blocking fashion which triggers assertions w.r.t blocking requests on the cluster state thread. Instead of adding an exception, I opted to implement concurrent connections which both side steps the assertion and compensates for the extra handshake.
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Cluster labels Feb 13, 2018
@DaveCTurner DaveCTurner added v5.4.0 and removed v5.3.0 labels Mar 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >enhancement v5.4.0 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants