-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce dedicated threadpool for establishing connections #29023
Labels
:Distributed Coordination/Network
Http and internode communication implementations
>enhancement
v6.4.1
v7.0.0-beta1
Comments
DaveCTurner
added
help wanted
adoptme
:Distributed Indexing/Recovery
Anything around constructing a new shard, either from a local or a remote source.
labels
Mar 13, 2018
Pinging @elastic/es-distributed |
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
Jun 24, 2018
Today we attempt to (re-)connect to our peers using the management threadpool. However, during a network partition there may sometimes be a large number of concurrent connection attempts. Connection attempts to partitioned nodes or to nodes in containers that are no longer running can hang until they timeout, possibly blocking other reconnection attempts and other management activity for an extended period of time. Moreover, connecting to a peer is a relatively lightweight operation so it is reasonable to attempt a lot of them in parallel. This change introduces a separate threadpool solely for connecting to peers. Fixes elastic#29023.
Tim-Brooks
added a commit
that referenced
this issue
Nov 7, 2018
This is related to #29023. Additionally at other points we have discussed a preference for removing the need to unnecessarily block threads for opening new node connections. This commit lays the groudwork for this by opening connections asynchronously at the transport level. We still block, however, this work will make it possible to eventually remove all blocking on new connections out of the TransportService and Transport.
Tim-Brooks
added a commit
to Tim-Brooks/elasticsearch
that referenced
this issue
Nov 7, 2018
This is related to elastic#29023. Additionally at other points we have discussed a preference for removing the need to unnecessarily block threads for opening new node connections. This commit lays the groudwork for this by opening connections asynchronously at the transport level. We still block, however, this work will make it possible to eventually remove all blocking on new connections out of the TransportService and Transport.
Tim-Brooks
added a commit
that referenced
this issue
Nov 7, 2018
This is related to #29023. Additionally at other points we have discussed a preference for removing the need to unnecessarily block threads for opening new node connections. This commit lays the groudwork for this by opening connections asynchronously at the transport level. We still block, however, this work will make it possible to eventually remove all blocking on new connections out of the TransportService and Transport.
pgomulka
pushed a commit
to pgomulka/elasticsearch
that referenced
this issue
Nov 13, 2018
This is related to elastic#29023. Additionally at other points we have discussed a preference for removing the need to unnecessarily block threads for opening new node connections. This commit lays the groudwork for this by opening connections asynchronously at the transport level. We still block, however, this work will make it possible to eventually remove all blocking on new connections out of the TransportService and Transport.
The work in #35144 is an enabler to open connections asynchronously as an alternative to a dedicated threadpool. Therefore there is no need for a dedicated threadpool anymore and we might only introduce one should we run into unexpected issues with the async approach but for the time being I am closing this issue. |
DaveCTurner
added
the
:Distributed Coordination/Network
Http and internode communication implementations
label
Mar 12, 2019
DaveCTurner
removed
the
:Distributed Indexing/Recovery
Anything around constructing a new shard, either from a local or a remote source.
label
Mar 12, 2019
6 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Distributed Coordination/Network
Http and internode communication implementations
>enhancement
v6.4.1
v7.0.0-beta1
Today, we attempt to connect to nodes concurrently using the management threadpool:
elasticsearch/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
Lines 94 to 115 in 46e16b6
Connection establishment can be time-consuming if the remote node is unresponsive, and the management threadpool is small and important, so saturating it with attempts to connect to unresponsive nodes is undesirable.
The suggested fix is to create a separate threadpool purely for establishing node-to-node connections instead. As such connections are mostly long-lived the new-connection threadpool will mostly be idle, but after a network partition it would be good for each node to try and re-establish connections to its peers using a lot more concurrency than the management threadpool can support.
Relates #28920 in which cluster state application is blocked for multiple minutes because, in part, of insufficient concurrency when attempting to connect to unresponsive peers.
The text was updated successfully, but these errors were encountered: