Connect to new nodes concurrently #22984
Conversation
I left one question... lgtm otherwise
```diff
     // skip
 }

 @Override
-public void disconnectFromNodesExcept(Iterable<DiscoveryNode> nodesToKeep) {
+public void disconnectFromNodesExcept(DiscoveryNodes nodesToKeep) {
```
why do we change these? Isn't Iterable more generic and would still accept DiscoveryNodes? Also, disconnectFromNodesExcept sounds like it would only be executed on a subset of an actual DiscoveryNodes instance. I think we should stick with what we had?
> why do we change these? Isn't Iterable more generic and would still accept DiscoveryNodes?

I can change it back, but I did it to be consistent with connectToNodes, which needs the total size of the collection. Since this is only used with DiscoveryNodes, I felt consistency is clearer. As said, I'll roll it back if you prefer.

> disconnectFromNodesExcept sounds like it would only be executed on a subset of an actual DiscoveryNodes instance

Naming is hard. What it does is disconnect from all nodes that are not part of the supplied DiscoveryNodes parameter. We pass the DiscoveryNodes of the current cluster state directly, without any modification.
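To make those semantics concrete, here is a minimal, hypothetical sketch. The ConnectionPruner class name and the connectedNodes bookkeeping set are assumptions made for illustration, not the actual NodeConnectionsService code; the point is simply that we drop the connection to every node that is absent from the supplied DiscoveryNodes of the current cluster state.

```java
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.cluster.node.DiscoveryNodes;
import org.elasticsearch.transport.TransportService;

import java.util.Set;

class ConnectionPruner {
    private final TransportService transportService;
    private final Set<DiscoveryNode> connectedNodes; // assumed bookkeeping of currently connected nodes

    ConnectionPruner(TransportService transportService, Set<DiscoveryNode> connectedNodes) {
        this.transportService = transportService;
        this.connectedNodes = connectedNodes;
    }

    void disconnectFromNodesExcept(DiscoveryNodes nodesToKeep) {
        for (DiscoveryNode node : connectedNodes) {
            if (nodesToKeep.nodeExists(node) == false) { // node is no longer in the cluster state
                transportService.disconnectFromNode(node);
            }
        }
    }
}
```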
Fair enough, let's do it.
thx @s1monw
When a node receives a new cluster state from the master, it opens connections to any new node in the cluster state. That has always been done serially on the cluster state thread, but it has been a long-standing TODO to do this concurrently, which is what this PR does.
This is a spin-off of #22828, where an extra handshake is done whenever connecting to a node, which may slow down connecting. Also, the handshake is done in a blocking fashion, which triggers assertions about blocking requests on the cluster state thread. Instead of adding an exception to that assertion, I opted to implement concurrent connections, which both sidesteps the assertion and compensates for the extra handshake.
The change is well covered by the current tests. Sadly, I could not find an easy way to simulate rejections, as we have locked down the thread pool settings (good!) and the management pool still has an unbounded queue.
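For illustration, here is a rough sketch of the general pattern described above, not the PR's actual implementation: connection attempts are fanned out to an executor and the calling thread waits on a latch sized by DiscoveryNodes.getSize() until every attempt has completed. The ConcurrentConnector class name and the choice of executor are assumptions made for this example.

```java
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.cluster.node.DiscoveryNodes;
import org.elasticsearch.transport.TransportService;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;

class ConcurrentConnector {
    private final TransportService transportService;
    private final Executor executor; // some shared thread pool; which pool the PR actually uses is not shown here

    ConcurrentConnector(TransportService transportService, Executor executor) {
        this.transportService = transportService;
        this.executor = executor;
    }

    void connectToNodes(DiscoveryNodes discoveryNodes) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(discoveryNodes.getSize());
        for (final DiscoveryNode node : discoveryNodes) {
            executor.execute(() -> {
                try {
                    transportService.connectToNode(node); // includes the extra handshake introduced in #22828
                } catch (Exception e) {
                    // swallow here for brevity; a failed connection can be retried on the next cluster state
                } finally {
                    latch.countDown();
                }
            });
        }
        latch.await(); // return only once every connection attempt has finished
    }
}
```

Sizing the latch requires knowing the total number of nodes up front, which is the consistency argument made above for passing DiscoveryNodes rather than a plain Iterable.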