
Failure in CloseWhileRelocatingShardsIT #44855

Closed

DaveCTurner opened this issue Jul 25, 2019 · 3 comments
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >test-failure Triaged test failures from CI

Comments

@DaveCTurner
Contributor

I am seeing occasional failures of CloseWhileRelocatingShardsIT of the following form:

Suite: Test class org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT
  2> liep. 25, 2019 9:37:59 PRIEŠPIET com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[elasticsearch[node_sd4][generic][T#5],5,TGRP-CloseWhileRelocatingShardsIT]
  2> java.lang.AssertionError: max seq. no. [-1] does not match [207]
  2>    at __randomizedtesting.SeedInfo.seed([AE1B2D049A25A3A3]:0)
  2>    at org.elasticsearch.index.engine.ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:153)
  2>    at org.elasticsearch.index.engine.ReadOnlyEngine.ensureMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:144)
  2>    at org.elasticsearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:113)
  2>    at org.elasticsearch.index.engine.NoOpEngine.<init>(NoOpEngine.java:54)
  2>    at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1595)
  2>    at org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1411)
  2>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:176)
  2>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:552)
  2>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769)
  2>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
  2>    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  2>    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  2>    at java.base/java.lang.Thread.run(Thread.java:835)

  2> REPRODUCE WITH: ./gradlew :server:integTest --tests "org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT.testCloseWhileRelocatingShards" -Dtests.seed=AE1B2D049A25A3A3 -Dtests.security.manager=true -Dtests.jvms=4 -Dtests.locale=lt-LT -Dtests.timezone=America/Buenos_Aires -Dcompiler.java=12 -Druntime.java=12

This doesn't reproduce every time at 6275cd7, but it normally fails after only a few iterations. Backing up to 69c94f4, before the merge of #43463, the test passes reliably (tens of successful iterations and counting).

Relates #41536, as this failure occurs only on the peer-recovery-retention-leases feature branch(es).
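
For context, the assertion that trips is ReadOnlyEngine#assertMaxSeqNoEqualsToGlobalCheckpoint (see the stack trace above): when the engine is opened, the maximum sequence number in the shard's last commit must equal the global checkpoint. In this failure the relocating copy has no operations in its commit (max seq. no. -1) while the recovered global checkpoint is 207. A minimal, self-contained sketch of that invariant (hypothetical and simplified, not the actual Elasticsearch source; the class and method names below are illustrative):

final class ReadOnlyEngineInvariant {

    // Enforce the invariant from the failure above: a read-only engine may only be
    // opened when the commit's max sequence number has caught up to the global checkpoint.
    static void ensureMaxSeqNoEqualsGlobalCheckpoint(long maxSeqNo, long globalCheckpoint) {
        if (maxSeqNo != globalCheckpoint) {
            throw new AssertionError("max seq. no. [" + maxSeqNo + "] does not match [" + globalCheckpoint + "]");
        }
    }

    public static void main(String[] args) {
        // Mirrors the reported failure: an empty shard copy (maxSeqNo = -1) vs. a global checkpoint of 207.
        ensureMaxSeqNoEqualsGlobalCheckpoint(-1L, 207L);
    }
}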

@DaveCTurner DaveCTurner added >test-failure Triaged test failures from CI :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jul 25, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

DaveCTurner added a commit that referenced this issue Jul 25, 2019
@DaveCTurner
Contributor Author

Muted in 96dd543.

dnhatn added a commit that referenced this issue Jul 30, 2019
For closed and frozen indices, we should not recover the shard locally up to
the global checkpoint before performing peer recovery, since that copy might
have been offline when the index was closed/frozen.

Relates #43463
Closes #44855
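
A hypothetical, simplified sketch of the idea described in that commit message (identifiers are illustrative and not taken from the actual change in #44887): for a closed or frozen index, skip the "recover locally up to the global checkpoint" step and let peer recovery start from scratch.

final class LocalRecoveryDecision {

    static final long UNASSIGNED_SEQ_NO = -2L; // sentinel meaning "no local starting point"

    // Decide where peer recovery may resume from after the optional local recovery step.
    static long startingSeqNo(boolean indexClosedOrFrozen, long localCheckpointOfSafeCommit) {
        if (indexClosedOrFrozen) {
            // The copy might have been offline when the index was closed/frozen,
            // so do not trust its local history; fall back to a full peer recovery.
            return UNASSIGNED_SEQ_NO;
        }
        return localCheckpointOfSafeCommit + 1; // resume after the last safe commit
    }

    public static void main(String[] args) {
        System.out.println(startingSeqNo(true, 207L));  // -2: closed/frozen index, no local recovery
        System.out.println(startingSeqNo(false, 207L)); // 208: open index resumes locally
    }
}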
@dnhatn
Member

dnhatn commented Jul 30, 2019

Fixed in #44887

@dnhatn dnhatn closed this as completed Jul 30, 2019