Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] ShuffleForcedMergePolicyTests testDiagnostics failing #88032

Closed
masseyke opened this issue Jun 24, 2022 · 5 comments · Fixed by #88062
Closed

[CI] ShuffleForcedMergePolicyTests testDiagnostics failing #88032

masseyke opened this issue Jun 24, 2022 · 5 comments · Fixed by #88062
Assignees
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@masseyke
Copy link
Member

Build scan:
https://gradle-enterprise.elastic.co/s/uk4cuajaxninu/tests/:server:test/org.elasticsearch.index.engine.ShuffleForcedMergePolicyTests/testDiagnostics

Reproduction line:
null

Applicable branches:
master

Reproduces locally?:
Yes

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.index.engine.ShuffleForcedMergePolicyTests&tests.test=testDiagnostics

Failure excerpt:

java.lang.AssertionError: 
Expected: a value greater than <2>
     but: <2> was equal to <2>

  at __randomizedtesting.SeedInfo.seed([23981A5E229E7C9D:16E6C4A420CE93B4]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.junit.Assert.assertThat(Assert.java:923)
  at org.elasticsearch.index.engine.ShuffleForcedMergePolicyTests.testDiagnostics(ShuffleForcedMergePolicyTests.java:56)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:833)

@masseyke masseyke added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI labels Jun 24, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jun 24, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@pxsalehi
Copy link
Member

Since this does reproduce locally, I'd like to give it a try. @elastic/es-distributed In case this is blocking others, and you know a quick fix, please let me know.

@pxsalehi pxsalehi self-assigned this Jun 27, 2022
@pxsalehi
Copy link
Member

The failure seems to be a result of bumping Lucene to 9.3.0 (#87932). One of the changes in 9.3.0 (Merge-on-refesh) also makes changes to the base test class used in this test, which means essentially merge-on-refresh is not always disabled in the default config we get from newIndexWriterConfig (apache/lucene#921). I believe, this might lead to more frequent merges in some case, and the test sees only 2 segments rather than more than 2. I have no knowledge of Lucene/Engine yet, but @fcofdez helped me understand the test.

I think, one way to address this is to explicitly disable merge-on-refresh in the test setup. This would essentially give the same test setup as before using 9.3.0.

However, I do have one question about the test itself. The main point of the test seems to be ensuring that some metadata regarding how merging was performed persists across merges (i.e., ShuffleForcedMergePolicy was used). If we just need to make sure we have more than one segment so the force merge call in the code actually merges them to one segment, why do we expect more than 2 segments here (from what I understood, leaves are the segments!)? Wouldn't it be enough to expect more than 1 segment? (which would also fix the test).

@fcofdez
Copy link
Contributor

fcofdez commented Jun 27, 2022

However, I do have one question about the test itself. The main point of the test seems to be ensuring that some metadata regarding how merging was performed persists across merges (i.e., ShuffleForcedMergePolicy was used). If we just need to make sure we have more than one segment so the force merge call in the code actually merges them to one segment, why do we expect more than 2 segments here (from what I understood, leaves are the segments!)? Wouldn't it be enough to expect more than 1 segment? (which would also fix the test).

I think we still need more than 2 segments to test ShuffleForcedMergePolicy as it interleaves old and new segments during force merge. Even though it is not essential for the test as you mention I would prefer to keep the original intention (maybe we should add a comment mentioning why we expect > 2 segments). @pxsalehi

@pxsalehi
Copy link
Member

Thank Francisco, I'll do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants