[CI] RemoteClusterClientTests testEnsureWeReconnect failing with NoSuchRemoteClusterException #52029
Comments
Pinging @elastic/es-distributed (:Distributed/Network)
Two more such failures on master
Only muted in master d4c609b
Failed on 7.6 too - https://gradle-enterprise.elastic.co/s/p6yxu3p22ni52
Currently the remote connection manager delegates the size() call to the underlying cluster connection manager. As a result, the call can return 1 before the nodeConnection method has been triggered to add the connection to the remote connection list. This can cause issues, because the ensureConnected method checks the connection manager's size and executes synchronously if the size is > 0. This can lead to a cluster-not-connected exception while we are still waiting for the connection-opened callback to be triggered. This commit fixes the issue by using the remote connection manager's own size to report the connection manager's size. Fixes elastic#52029.
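The window described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the Elasticsearch code: the class names `DelegateManager` and `RemoteManager`, the methods `delegatedSize()`, `ownSize()` and `anyConnection()`, and the exception type are all hypothetical stand-ins, and the asynchronous callback is modelled as a `Runnable` the caller fires by hand so the gap is deterministic.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class RemoteSizeSketch {

    // Hypothetical stand-in for the underlying cluster connection manager.
    static class DelegateManager {
        private final List<String> connectedNodes = new CopyOnWriteArrayList<>();
        private Consumer<String> onConnected = node -> {};

        void setListener(Consumer<String> listener) {
            this.onConnected = listener;
        }

        // Registers the node immediately and returns the pending "connection
        // opened" callback, so the caller controls when the listener runs.
        Runnable connect(String node) {
            connectedNodes.add(node);
            return () -> onConnected.accept(node);
        }

        int size() {
            return connectedNodes.size();
        }
    }

    // Hypothetical stand-in for the remote connection manager.
    static class RemoteManager {
        private final DelegateManager delegate;
        private final List<String> remoteConnections = new CopyOnWriteArrayList<>();

        RemoteManager(DelegateManager delegate) {
            this.delegate = delegate;
            // The remote list is only filled once the callback fires.
            delegate.setListener(remoteConnections::add);
        }

        // Size as reported before the fix: delegated to the underlying manager.
        int delegatedSize() {
            return delegate.size();
        }

        // Size as reported after the fix: the remote manager's own list.
        int ownSize() {
            return remoteConnections.size();
        }

        String anyConnection() {
            if (remoteConnections.isEmpty()) {
                throw new IllegalStateException("no such remote cluster");
            }
            return remoteConnections.get(0);
        }
    }

    public static void main(String[] args) {
        DelegateManager delegate = new DelegateManager();
        RemoteManager remote = new RemoteManager(delegate);

        Runnable pendingCallback = delegate.connect("node-1");
        // Inside the window: the delegated size is already 1, the own size is still 0.
        System.out.println("delegated=" + remote.delegatedSize() + " own=" + remote.ownSize());

        pendingCallback.run();
        // After the callback both sizes agree and a connection can be handed out.
        System.out.println("delegated=" + remote.delegatedSize() + " own=" + remote.ownSize());
    }
}
```

In this sketch, a caller that gates on `delegatedSize() > 0` would call `anyConnection()` inside the window and hit the exception, whereas gating on `ownSize()` would keep it waiting until the callback has run.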
Another failure on master today. Reopening the issue. Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+part1/4097/console Stack trace:
I have merged #54934. This PR may not completely fix this issue, but it will help expose the underlying cause if the test fails again. At the moment we are in wait-and-see mode to find out whether this test continues to fail.
@tbrooks8 The test failed today on CI but with the exception you added in #54934:
I can't reproduce the issue locally.
Two more failures in master: https://gradle-enterprise.elastic.co/s/iycqzcwkouene
A case on 7.7 this morning
I believe this has been fixed by #56654. Closing.
This test has failed 4 times in the past three days after passing basically 100% of the time for the past month. Looks suspicious. Happening on both master and 7.x. This same time period has also seen a rather significant jump in average test execution times, so perhaps there is something going on here.
https://gradle-enterprise.elastic.co/scans/tests?failures.failureClassification=non_verification&list.offset=0&list.size=50&list.sortColumn=startTime&list.sortOrder=desc&search.buildToolType=gradle&search.buildToolType=maven&search.startTimeMax=1581055741639&search.startTimeMin=1580450941631&search.tags=CI&search.tags=not:nested&search.tags=not:pull-request&tests.container=org.elasticsearch.transport.RemoteClusterClientTests&tests.sortField=FAILED&tests.test=testEnsureWeReconnect&tests.unstableOnly&trends.section=overview&trends.timeResolution=day&viewer.tzOffset=-480