Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid redundant connections in ClusterConnectionManager #77196

Merged

Conversation

DaveCTurner
Copy link
Contributor

Today it's possible to open two connections to a node, and then we
notice when registering the second connection and close it instead.
Fixing #67873 will require us to keep tighter control over the identity
and lifecycle of each connection, and opening redundant connections gets
in the way of this. This commit adds a check for an existing connection
after marking the connection as pending, which guarantees that we
don't open those redundant connections.

Today it's possible to open two connections to a node, and then we
notice when registering the second connection and close it instead.
Fixing elastic#67873 will require us to keep tighter control over the identity
and lifecycle of each connection, and opening redundant connections gets
in the way of this. This commit adds a check for an existing connection
_after_ marking the connection as pending, which guarantees that we
don't open those redundant connections.
@DaveCTurner DaveCTurner added >non-issue :Distributed Coordination/Network Http and internode communication implementations v8.0.0 v7.16.0 labels Sep 2, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 2, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

final RunOnce releaseOnce = new RunOnce(connectingRefCounter::decRef);
internalOpenConnection(node, resolvedProfile, ActionListener.wrap(conn -> {
connectionValidator.validate(conn, resolvedProfile, ActionListener.wrap(
ignored -> {
assert Transports.assertNotTransportThread("connection validator success");
try {
if (connectedNodes.putIfAbsent(node, conn) != null) {
assert false : "redundant conection to " + node;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClusterConnectionManagerTests#testConcurrentConnects exercises this - I can make it fail reasonably often by adding a few Thread.sleep calls:

diff --git a/server/src/main/java/org/elasticsearch/transport/ClusterConnectionManager.java b/server/src/main/java/org/elasticsearch/transport/ClusterConnectionManager.java
index 415769596a4..00bb8423d7b 100644
--- a/server/src/main/java/org/elasticsearch/transport/ClusterConnectionManager.java
+++ b/server/src/main/java/org/elasticsearch/transport/ClusterConnectionManager.java
@@ -11,6 +11,7 @@ import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;
 import org.elasticsearch.action.ActionListener;
 import org.elasticsearch.cluster.node.DiscoveryNode;
+import org.elasticsearch.common.Randomness;
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.core.AbstractRefCounted;
 import org.elasticsearch.common.util.concurrent.ConcurrentCollections;
@@ -109,6 +110,12 @@ public class ClusterConnectionManager implements ConnectionManager {
             return;
         }

+        try {
+            Thread.sleep(Randomness.get().nextInt(50));
+        } catch (InterruptedException e) {
+            throw new AssertionError("unexpected", e);
+        }
+
         final ListenableFuture<Void> currentListener = new ListenableFuture<>();
         final ListenableFuture<Void> existingListener = pendingConnections.putIfAbsent(node, currentListener);
         if (existingListener != null) {
@@ -126,7 +133,7 @@ public class ClusterConnectionManager implements ConnectionManager {
         // It's possible that a connection completed, and the pendingConnections entry was removed, between the calls to
         // connectedNodes.containsKey and pendingConnections.putIfAbsent above, so we check again to make sure we don't open a redundant
         // extra connection to the node. We could _just_ check here, but checking up front skips the work to mark the connection as pending.
-        if (connectedNodes.containsKey(node)) {
+        if (connectedNodes.containsKey(node) && false) {
             ListenableFuture<Void> future = pendingConnections.remove(node);
             assert future == currentListener : "Listener in pending map is different than the expected listener";
             connectingRefCounter.decRef();
diff --git a/server/src/test/java/org/elasticsearch/transport/ClusterConnectionManagerTests.java b/server/src/test/java/org/elasticsearch/transport/ClusterConnectionManagerTests.java
index f44217e276a..2b11867d21f 100644
--- a/server/src/test/java/org/elasticsearch/transport/ClusterConnectionManagerTests.java
+++ b/server/src/test/java/org/elasticsearch/transport/ClusterConnectionManagerTests.java
@@ -163,6 +163,7 @@ public class ClusterConnectionManagerTests extends ESTestCase {
             Thread thread = new Thread(() -> {
                 try {
                     barrier.await();
+                    Thread.sleep(between(0, 50));
                 } catch (InterruptedException | BrokenBarrierException e) {
                     throw new RuntimeException(e);
                 }

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice find :)

Copy link
Contributor

@Tim-Brooks Tim-Brooks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit 1f15764 into elastic:master Sep 3, 2021
@DaveCTurner DaveCTurner deleted the 2021-09-02-avoid-redundant-connections branch September 3, 2021 07:09
@DaveCTurner
Copy link
Contributor Author

Thanks both :)

DaveCTurner added a commit that referenced this pull request Sep 3, 2021
Today it's possible to open two connections to a node, and then we
notice when registering the second connection and close it instead.
Fixing #67873 will require us to keep tighter control over the identity
and lifecycle of each connection, and opening redundant connections gets
in the way of this. This commit adds a check for an existing connection
_after_ marking the connection as pending, which guarantees that we
don't open those redundant connections.
@kkewwei
Copy link
Contributor

kkewwei commented Sep 4, 2021

It' necessary to add the second connectedNodes.containsKey, but if we add the second check after pendingConnections.putIfAbsent, the first check connectedNodes.containsKey is useless. we should remove it.

@DaveCTurner
Copy link
Contributor Author

DaveCTurner commented Sep 4, 2021

The comment added in this PR notes that it is indeed not needed for correctness, but explains why I left it in anyway:

// extra connection to the node. We could _just_ check here, but checking up front skips the work to mark the connection as pending.

We're almost always going to be re-using an existing connection, no need to faff around with pendingConnections etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Network Http and internode communication implementations >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.16.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants