
IndexShardTests.testDocStats fails sporadically #32766

Closed
alpar-t opened this issue Aug 10, 2018 · 4 comments
Assignees
Labels
:Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.)
>test-failure (Triaged test failures from CI)
v6.5.0
v7.0.0-beta1

Comments

@alpar-t
Contributor

alpar-t commented Aug 10, 2018

Example build failure:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+periodic/2518/console

Reproduction line (does not reproduce locally):

./gradlew :server:test \
  -Dtests.seed=A8914B40F85100F0 \
  -Dtests.class=org.elasticsearch.index.shard.IndexShardTests \
  -Dtests.method="testDocStats" \
  -Dtests.security.manager=true \
  -Dtests.locale=th-TH \
  -Dtests.timezone=Asia/Thimphu

Example relevant log:

09:35:38   1> [2561-08-09T15:35:33,402][INFO ][o.e.i.s.IndexShardTests  ] [testDocStats] before test
09:35:38   1> [2561-08-09T15:35:33,495][INFO ][o.e.i.s.IndexShardTests  ] [testDocStats] after test
09:35:38 FAILURE 0.11s J0 | IndexShardTests.testDocStats <<< FAILURES!
09:35:38    > Throwable #1: java.lang.AssertionError: 
09:35:38    > Expected: <12L>
09:35:38    >      but: was <0L>
09:35:38    > 	at __randomizedtesting.SeedInfo.seed([A8914B40F85100F0:F184FAEDD87CEDDD]:0)
09:35:38    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
09:35:38    > 	at org.elasticsearch.index.shard.IndexShardTests.testDocStats(IndexShardTests.java:2442)
09:35:38    > 	at java.lang.Thread.run(Thread.java:748)

Frequency: The first failure dates back to February, with another isolated failure on June 26th, then a spike on July 26-27 with 7 failures across branches and many more on PR builds. It has failed at least once per day over the last couple of days.

@alpar-t alpar-t added :Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI v7.0.0 v6.5.0 labels Aug 10, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@ywelsch ywelsch added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Search/Search Search-related issues that do not fall into other categories labels Aug 14, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@ywelsch
Contributor

ywelsch commented Aug 14, 2018

This looks related to #32241
@andrershov can you have a look?

@andyb-elastic
Contributor

andrershov pushed a commit to andrershov/elasticsearch that referenced this issue Aug 16, 2018
testDocStats test is flaky: it sometimes fails on Jenkins, and the
failure is not reproducible locally. The reason for this failure is
timing. If the number of deleted documents is greater than 33% of inserted
documents, Lucene will schedule a segment merge if TieredMergePolicy is
used (this is not the case for LogMergePolicy, but ES only uses
TieredMergePolicy). If this merge is performed before stats are
retrieved, we will get 0 for the deleted counter.
So this counter could be either 0 or numOfDeletedDocs at this
point, but that assertion is too loose, so we decided to remove it
entirely.
Closes elastic#32766
andrershov added a commit that referenced this issue Aug 17, 2018
testDocStats test is flaky: it sometimes fails on Jenkins, and the
failure is not reproducible locally. The reason for this failure is
timing. If the number of deleted documents is greater than 33% of inserted
documents, Lucene will schedule a segment merge if TieredMergePolicy is
used (this is not the case for LogMergePolicy, but ES only uses
TieredMergePolicy). If this merge is performed before stats are
retrieved, we will get 0 for the "deleted" counter.
So this counter could be either 0 or numOfDeletedDocs at this point,
but that assertion is too loose, so we decided to remove it entirely.
Closes #32766
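The timing dependency described in the commit message can be modeled in a few lines. This is a hypothetical, stdlib-only Java sketch, not Elasticsearch or Lucene code: the class and method names are illustrative, and the 33% threshold is the simplified rule stated in the commit message rather than TieredMergePolicy's full merge-selection logic. The inserted-document count of 30 is also an assumption; only the expected deleted count (12) appears in the failure log.

```java
// Hypothetical model of the race described in the commit message.
// Not Elasticsearch or Lucene code; names and numbers are illustrative.
public final class DeletedDocsRace {

    // Simplified rule from the commit message (assumption): a merge that
    // reclaims deletes may be scheduled once deleted docs exceed 33% of
    // inserted docs.
    static boolean mergeMayReclaimDeletes(long deleted, long inserted) {
        return deleted * 100 > 33 * inserted;
    }

    // Value the "deleted" stat would report, depending on whether the
    // background merge finished before the stats were read.
    static long observedDeleted(long deleted, long inserted, boolean mergeFinishedFirst) {
        if (mergeFinishedFirst && mergeMayReclaimDeletes(deleted, inserted)) {
            return 0L; // merged segments no longer carry the deleted docs
        }
        return deleted; // stats were read before any merge ran
    }

    // The "either 0 or numOfDeletedDocs" check the commit judged too loose.
    static boolean looseCheck(long observed, long deleted) {
        return observed == 0L || observed == deleted;
    }

    public static void main(String[] args) {
        long inserted = 30; // assumed; not taken from the failing run
        long deleted = 12;  // matches "Expected: <12L>" in the log
        System.out.println(observedDeleted(deleted, inserted, false));
        System.out.println(observedDeleted(deleted, inserted, true));
    }
}
```

Running it prints 12 and then 0, the two values the stat can take under this model; the second corresponds to the `Expected: <12L> but: was <0L>` failure above.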
andrershov added a commit that referenced this issue Aug 17, 2018
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019