CorruptedFileIT.testCorruptFileThenSnapshotAndRestore fails occasionally #19591

Closed
danielmitterdorfer opened this issue Jul 26, 2016 · 5 comments
Labels: :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), >test-failure (Triaged test failures from CI)

Comments

@danielmitterdorfer
Member

CorruptedFileIT.testCorruptFileThenSnapshotAndRestore fails occasionally with the following assertion error:

   > Throwable #1: java.lang.AssertionError: [test][0], node[kXBrV8JKTCG8ZrjOt_W0vg], [P], s[STARTED], a[id=Zz-bwJLZTkG3yl9D8jMQSw]
   > Expected: <1>
   >      but: was <0>
   >    at __randomizedtesting.SeedInfo.seed([A0A4AF57F24AD88E:6D7015D6605692ED]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.index.store.CorruptedFileIT.listShardFiles(CorruptedFileIT.java:692)
   >    at org.elasticsearch.index.store.CorruptedFileIT.testCorruptFileThenSnapshotAndRestore(CorruptedFileIT.java:510)
   >    at java.lang.Thread.run(Thread.java:745)

This means that the node stats for the cluster node could not be retrieved. More precisely, the node stats response contained no nodes, where the test expected exactly one.

Latest build where this occurred: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+seq_no+periodic/1606/consoleFull

@danielmitterdorfer danielmitterdorfer added the >test-failure Triaged test failures from CI label Jul 26, 2016
imotov added a commit that referenced this issue Jul 26, 2016
This test fails because of an unknown exception in the FsService.stats() method, which causes no stats to be returned. With this change, the exception that is causing this issue will be logged.

Related to #19591 and #17964
@imotov
Contributor

imotov commented Jul 26, 2016

This test failure seems to be caused by an exception in FsService that is getting swallowed. I changed the log levels for this test so that we can see which exception is actually being thrown.
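To illustrate the failure mode described above, here is a minimal, self-contained sketch of how a swallowed exception can turn into an empty stats result, and how logging the exception makes the cause visible. It is not the actual FsService or node stats code: `SwallowedStatsSketch`, `StatsSource`, `collectSilently`, and `collectWithLogging` are hypothetical names invented for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not Elasticsearch code: shows how a swallowed
// exception yields an empty result that only surfaces later as "expected 1, was 0".
final class SwallowedStatsSketch {
    private static final Logger LOGGER = Logger.getLogger("SwallowedStatsSketch");

    /** Hypothetical stand-in for a per-node stats provider that may fail. */
    interface StatsSource {
        String stats() throws IOException;
    }

    /** Drops failing nodes without a trace, so the caller only sees a shorter list. */
    static List<String> collectSilently(List<StatsSource> nodes) {
        List<String> results = new ArrayList<>();
        for (StatsSource node : nodes) {
            try {
                results.add(node.stats());
            } catch (IOException e) {
                // Swallowed: nothing is logged and nothing is rethrown.
            }
        }
        return results;
    }

    /** Same behavior for the caller, but every failure is logged and recorded. */
    static List<String> collectWithLogging(List<StatsSource> nodes, List<Exception> failures) {
        List<String> results = new ArrayList<>();
        for (StatsSource node : nodes) {
            try {
                results.add(node.stats());
            } catch (IOException e) {
                LOGGER.log(Level.WARNING, "failed to collect stats", e);
                failures.add(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<StatsSource> nodes = new ArrayList<>();
        // A single node whose stats call always fails.
        nodes.add(() -> { throw new IOException("disk unavailable"); });

        // Both variants return an empty list; only the second records why.
        System.out.println("silent result size: " + collectSilently(nodes).size());
        List<Exception> failures = new ArrayList<>();
        System.out.println("logged result size: " + collectWithLogging(nodes, failures).size());
        System.out.println("recorded failures: " + failures.size());
    }
}
```

In the silent variant, the only symptom is a size mismatch in a later assertion, which is what the stack trace in this issue shows. The logging variant leaves the result unchanged but keeps the root cause available for the next failure.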

@danielmitterdorfer
Member Author

Thanks for digging @imotov.

@imotov
Contributor

imotov commented Aug 3, 2016

It failed again, and @pickypg and I dug a bit more into it. At the moment, the only reasonable explanation we could come up with is some problem in the exception handling of the stats operation. @pickypg is going to continue digging.

@colings86 colings86 added the :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Mar 21, 2017
@javanna
Member

javanna commented Jun 16, 2017

Given that this issue has had no activity in ~10 months, shall we close it, @pickypg @imotov?

@pickypg
Member

pickypg commented Jun 16, 2017

With #25017 we should catch this if it happens again, so yeah.
