Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only connect to formed remote clusters #31424

Closed
wants to merge 2 commits into from

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Jun 19, 2018

This change prevent remote cluster connections to be established
to nodes that have not yet joined a cluster and don't have a cluster
UUID. This allows to effectivly detect nodes that are part of the local
cluster. To compare the local cluster UUID to the remote nodes cluster UUID
we need to wait until we recovered a state and a master is elected before
we can connect to remote clusters.

Relates to #31331

This change prevent remote cluster connections to be established
to nodes that have not yet joined a cluster and don't have a cluster
UUID. This allows to effectivly detect nodes that are part of the local
cluster. To compare the local cluster UUID to the remote nodes cluster UUID
we need to wait until we recovered a state and a master is elected before
we can connect to remote clusters.

Relates to elastic#31331
@s1monw s1monw added :Distributed Coordination/Network Http and internode communication implementations v7.0.0 v6.4.0 labels Jun 19, 2018
@s1monw s1monw requested a review from javanna June 19, 2018 10:25
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@s1monw
Copy link
Contributor Author

s1monw commented Jun 19, 2018

@tbrooks @javanna I am not convinced this it the right thing to do. Given that we need to expand the remote connection profile down the road for CCR and then likely use BULK and RECOVERY I wonder if we should simply connect to all nodes with the default profile. I don't think it will have a big impact. cc @bleskes

@Tim-Brooks
Copy link
Contributor

Tim-Brooks commented Jun 19, 2018

I am onboard with using the default connection profile. We could also allow the TransportService to open a new connection that wants more connections (essentially a more prioritized connection profile) and close the less prioritized connection profile. This would require the logic to be expanded beyond just checking if a connection already exists.

@Tim-Brooks
Copy link
Contributor

    public void connectToNode(DiscoveryNode node, ConnectionProfile connectionProfile,
                              CheckedBiConsumer<Connection, ConnectionProfile, IOException> connectionValidator)
        throws ConnectTransportException {
        connectionProfile = resolveConnectionProfile(connectionProfile);
        if (node == null) {
            throw new ConnectTransportException(null, "can't connect to a null node");
        }
        closeLock.readLock().lock(); // ensure we don't open connections while we are closing
        try {
            ensureOpen();
            try (Releasable ignored = connectionLock.acquire(node.getId())) {
                NodeChannels nodeChannels = connectedNodes.get(node);
                

                if (nodeChannels != null && **connection profile is less prioritized than the existing node channels**) {
                    return;
                }
                boolean success = false;
                try {
                    nodeChannels = openConnection(node, connectionProfile);
                    connectionValidator.accept(nodeChannels, connectionProfile);
                    // we acquire a connection lock, so no way there is an existing connection
                    existingConnect = connectedNodes.put(node, nodeChannels);
                    ****
                    If Existing connection {
                             close existing connection;
                    }
                    ****
                    if (logger.isDebugEnabled()) {
                        logger.debug("connected to node [{}]", node);
                    }
                    try {
                        transportService.onNodeConnected(node);
                    } finally {
                        if (nodeChannels.isClosed()) {
                            // we got closed concurrently due to a disconnect or some other event on the channel.
                            // the close callback will close the NodeChannel instance first and then try to remove
                            // the connection from the connected nodes. It will NOT acquire the connectionLock for
                            // the node to prevent any blocking calls on network threads. Yet, we still establish a happens
                            // before relationship to the connectedNodes.put since we check if we can remove the
                            // (DiscoveryNode, NodeChannels) tuple from the map after we closed. Here we check if it's closed an if so we
                            // try to remove it first either way one of the two wins even if the callback has run before we even added the
                            // tuple to the map since in that case we remove it here again
                            if (connectedNodes.remove(node, nodeChannels)) {
                                transportService.onNodeDisconnected(node);
                            }
                            throw new NodeNotConnectedException(node, "connection concurrently closed");
                        }
                    }
                    success = true;
                } catch (ConnectTransportException e) {
                    throw e;
                } catch (Exception e) {
                    throw new ConnectTransportException(node, "general node connection failure", e);
                } finally {
                    if (success == false) { // close the connection if there is a failure
                        logger.trace(() -> new ParameterizedMessage("failed to connect to [{}], cleaning dangling connections", node));
                        IOUtils.closeWhileHandlingException(nodeChannels);
                    }
                }
            }
        } finally {
            closeLock.readLock().unlock();
        }
    }

Hypothetically - it would look something like above. We have the per-node connection lock so this would be safe from a threading standpoint. The trickiest part would probably be what behavior we want from all of the TransportService listeners.

onNodeConnected
onConnectionOpened
onNodeDisconnected
onConnectionClosed

Anyway, I'm just throwing this out as an idea. I am fine with the default connection profile approach.

@s1monw
Copy link
Contributor Author

s1monw commented Jun 20, 2018

@tbrooks8 when I first read it I was a fan but it seems like we are just making things more complicated that way. For instance I there are 2 parts of the code that connect to a node and one of them disconnects from it, should it still be reachable to the other? I think down the road we should maintain a dedicated connection. Like we should maybe modify the discovery node's ephemera ID such that it's a private connection. I am not sure what the best way is but I think we should work on more isolation on that layer such a that I can only get a connection to a remote cluster if I go through the RemoteClusterConnection

@s1monw
Copy link
Contributor Author

s1monw commented Jul 5, 2018

closing in favor of #31835

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Network Http and internode communication implementations >non-issue v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants