[CI] SharedClusterSnapshotRestoreIT#testGetSnapshotsRequest failing #26480
The same test failed today for both 6.0 and 6.x because it stalled. In https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+periodic/68/consoleFull we have:
Then the suite gets killed after it's been running for 20 minutes. In https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+oracle-periodic/668/consoleFull we have:
The REPRO commands are:
and:
These don't reproduce the problem locally for me. (It's a different failure from the one in the original issue description, although in the same test. If you think it's not related in any way, I'm happy to move the stalls into a separate issue.)
@droberts195 The root cause for the failures that you have observed is that the following assertion tripped (it's always good to first grep for "AssertionError" in the logs):
This in turn killed the cluster state applier thread on that node, leading to the suite timeout because the node could no longer make meaningful progress.
The reason for this failure is that For example, we have code that does
i.e. these two are separate actions, not atomic. I think the I'm reassigning this to @imotov as I don't think that @talevy identified the root cause (the logs are not available anymore). The logs from the failure reported by @andy-elastic are available, and show the same root cause as what I wrote above. |
This one from today on the 5.6 branch also looks related: I see the same AssertionError that @ywelsch mentioned:
Same assertion tripped again:
This test failed again today:
This commit changes IndexShardSnapshotStatus so that the Stage is updated coherently with any required information. It also provides an asCopy() method that returns the status of an IndexShardSnapshotStatus at a given point in time, ensuring that all of the returned information is coherent. Closes #26480
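To illustrate the idea described in this commit message, here is a minimal sketch of a status holder that changes its stage and the related fields under one lock and hands out immutable point-in-time copies. This is not the Elasticsearch implementation: the class name `ShardStatusSketch`, the nested `Copy` type, the field set, and the `moveTo...` method names are all assumptions made for illustration; only the `asCopy()` name and the notion of a stage come from the text above.

```java
// Hypothetical sketch, NOT the actual IndexShardSnapshotStatus code.
// All names except asCopy() are assumptions made for this example.
final class ShardStatusSketch {

    enum Stage { INIT, STARTED, DONE, FAILURE }

    /** Immutable view of the status at one point in time. */
    static final class Copy {
        final Stage stage;
        final long startTime;
        final long totalTime;
        final String failure;

        Copy(Stage stage, long startTime, long totalTime, String failure) {
            this.stage = stage;
            this.startTime = startTime;
            this.totalTime = totalTime;
            this.failure = failure;
        }
    }

    // Guarded by "this": the stage never changes without its companion fields.
    private Stage stage = Stage.INIT;
    private long startTime;
    private long totalTime;
    private String failure;

    /** Moves INIT -> STARTED and records the start time in the same critical section. */
    synchronized Copy moveToStarted(long startTime) {
        if (stage != Stage.INIT) {
            throw new IllegalStateException("expected INIT but was " + stage);
        }
        this.stage = Stage.STARTED;
        this.startTime = startTime;
        return asCopy();
    }

    /** Moves STARTED -> DONE and computes the total time in the same critical section. */
    synchronized Copy moveToDone(long endTime) {
        if (stage != Stage.STARTED) {
            throw new IllegalStateException("expected STARTED but was " + stage);
        }
        this.stage = Stage.DONE;
        this.totalTime = endTime - startTime;
        return asCopy();
    }

    /** Moves any unfinished stage to FAILURE together with the failure message. */
    synchronized Copy moveToFailed(long endTime, String failure) {
        if (stage == Stage.DONE || stage == Stage.FAILURE) {
            throw new IllegalStateException("already finished as " + stage);
        }
        this.stage = Stage.FAILURE;
        this.totalTime = endTime - startTime;
        this.failure = failure;
        return asCopy();
    }

    /** Returns a snapshot whose fields were all read under the same lock. */
    synchronized Copy asCopy() {
        return new Copy(stage, startTime, totalTime, failure);
    }
}
```

A reader that works from a `Copy` cannot observe a stage from one moment combined with timing or failure data from another, which is the kind of mismatch that a check followed by a separate update can produce.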
@tlrx I just saw a similar-looking failure on 6.1 today. Can you take a look and see whether it is related, then either reopen this issue or create a new one if you think it is something else?
The PR (#28130) was not backported to 6.1 (that was a conscious decision), so we can ignore that test failure. |
I feel that I am to blame for this because of #26463, but I cannot reproduce it locally.
link:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+oracle-periodic/605/console
trace
reproduce with: