
DiskThresholdDeciderIT.testHighWatermarkNotExceeded failure #62326

Closed
romseygeek opened this issue Sep 14, 2020 · 6 comments · Fixed by #62358, #63112 or #63614
Assignees
Labels
:Distributed Coordination/Allocation - All issues relating to the decision making around placing a shard (both master logic & on the nodes)
Team:Distributed (Obsolete) - Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
>test-failure - Triaged test failures from CI

Comments

@romseygeek
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/okz2lziucuxkq/tests/:server:internalClusterTest/org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT/testHighWatermarkNotExceeded

Repro line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testHighWatermarkNotExceeded" -Dtests.seed=60A6AF6A936EF834 -Dtests.security.manager=true -Dtests.locale=ro-RO -Dtests.timezone=America/Pangnirtung -Druntime.java=11

Reproduces locally?: no

Applicable branches: master

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?search.buildToolTypes=gradle&search.buildToolTypes=maven&search.relativeStartTime=P7D&search.timeZoneId=Europe/London&tests.container=org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT&tests.sortField=FAILED&tests.test=testHighWatermarkNotExceeded&tests.unstableOnly=true

Failure excerpt:

java.lang.AssertionError:
Expected: a collection with size <1>
     but: collection size was <0>

at __randomizedtesting.SeedInfo.seed([60A6AF6A936EF834:89874ED813A831DA]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.junit.Assert.assertThat(Assert.java:956)
at org.junit.Assert.assertThat(Assert.java:923)
at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testHighWatermarkNotExceeded(DiskThresholdDeciderIT.java:160)
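For context, the failure message comes from a Hamcrest collection-size assertion. Below is a minimal, self-contained sketch of an assertion with that shape; the class and variable names are hypothetical and are not taken from DiskThresholdDeciderIT, they only illustrate the kind of check at line 160 (the test expects exactly one shard on the target node but finds none):

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.hasSize;

import java.util.Collections;
import java.util.List;

// Hypothetical sketch: asserting hasSize(1) on an empty collection produces
// "Expected: a collection with size <1> ... but: collection size was <0>".
public class HighWatermarkAssertionSketch {
    public static void main(String[] args) {
        List<String> shardsOnTargetNode = Collections.emptyList(); // no shard was relocated
        assertThat(shardsOnTargetNode, hasSize(1)); // throws the AssertionError seen above
    }
}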

@romseygeek added the >test-failure and :Distributed Coordination/Allocation labels on Sep 14, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Allocation)

@elasticmachine added the Team:Distributed (Obsolete) label on Sep 14, 2020
@original-brownbear self-assigned this on Sep 14, 2020
@original-brownbear
Member

This reproduces pretty easily for me locally over a few tens of runs. I'll look into a fix tomorrow.

@droberts195
Contributor

The test failed in the same way on 7.x (https://gradle-enterprise.elastic.co/s/x4m3oyeudq7ck), on a commit that is more recent than the fix.

   java.lang.AssertionError:
   Expected: a collection with size <1>
   but: collection size was <0>

@danielmitterdorfer
Member

tlrx added a commit that referenced this issue Oct 7, 2020
The first refreshDiskUsage() triggers a ClusterInfo update which in turn
notifies listeners such as DiskThresholdMonitor. That monitor triggers a
reroute as expected and turns on an internal checkInProgress flag before
submitting a cluster state update to relocate shards (the internal flag is
toggled back once the cluster state update has been processed).

In the test I suspect that the second refreshDiskUsage() may complete
before DiskThresholdMonitor's internal flag is set back to its initial state,
resulting in the second ClusterInfo update being ignored and a message
like "[node_t0] skipping monitor as a check is already in progress" being
logged. Adding another wait for languid events to be processed
before executing the second refreshDiskUsage() should help here.

Closes #62326
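To make the suspected race easier to follow, here is a minimal sketch of the guard described above. It illustrates the checkInProgress pattern only and is not the actual DiskThresholdMonitor source:

import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the guard described in the commit message: a new
// ClusterInfo is ignored while a previous check is still in flight, and the
// flag is only cleared once the submitted cluster state update has been
// processed. If the second refreshDiskUsage() delivers its ClusterInfo before
// checkFinished() runs, that update is skipped and the test never observes
// the expected relocation.
class DiskCheckGuardSketch {
    private final AtomicBoolean checkInProgress = new AtomicBoolean();

    void onNewInfo(Object clusterInfo) {
        if (checkInProgress.compareAndSet(false, true) == false) {
            return; // corresponds to "skipping monitor as a check is already in progress"
        }
        submitRerouteClusterStateUpdate(clusterInfo, this::checkFinished);
    }

    private void checkFinished() {
        checkInProgress.set(false); // toggled back once the cluster state update is processed
    }

    private void submitRerouteClusterStateUpdate(Object clusterInfo, Runnable onProcessed) {
        onProcessed.run(); // placeholder: submit the reroute and call back once it has been applied
    }
}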
tlrx added a commit to tlrx/elasticsearch that referenced this issue Oct 7, 2020
The first refreshDiskUsage() triggers a ClusterInfo update which in turn
notifies listeners such as DiskThresholdMonitor. That monitor triggers a
reroute as expected and turns on an internal checkInProgress flag before
submitting a cluster state update to relocate shards (the internal flag is
toggled back once the cluster state update has been processed).

In the test I suspect that the second refreshDiskUsage() may complete
before DiskThresholdMonitor's internal flag is set back to its initial state,
resulting in the second ClusterInfo update being ignored and a message
like "[node_t0] skipping monitor as a check is already in progress" being
logged. Adding another wait for languid events to be processed
before executing the second refreshDiskUsage() should help here.

Closes elastic#62326
tlrx added a commit that referenced this issue Oct 7, 2020

The first refreshDiskUsage() triggers a ClusterInfo update which in turn
notifies listeners such as DiskThresholdMonitor. That monitor triggers a
reroute as expected and turns on an internal checkInProgress flag before
submitting a cluster state update to relocate shards (the internal flag is
toggled back once the cluster state update has been processed).

In the test I suspect that the second refreshDiskUsage() may complete
before DiskThresholdMonitor's internal flag is set back to its initial state,
resulting in the second ClusterInfo update being ignored and a message
like "[node_t0] skipping monitor as a check is already in progress" being
logged. Adding another wait for languid events to be processed
before executing the second refreshDiskUsage() should help here.

Closes #62326
@cbuescher
Member

Reopening since we face similar issues again on master and 7.x:
https://gradle-enterprise.elastic.co/s/dvlwxfzwu4yzc
https://gradle-enterprise.elastic.co/s/hcrfpa4y2h2io

Will mute on master, 7.x and 7.10

@cbuescher reopened this on Oct 8, 2020
@cbuescher
Member

Muted with a615845, 0db9dd1 and 517d3e4
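For reference, muting a test in the Elasticsearch codebase is usually done by annotating it with @AwaitsFix pointing back at the issue. A sketch of what such mute commits typically add (the exact commits above may differ in detail):

import org.apache.lucene.util.LuceneTestCase.AwaitsFix;

// Typical shape of a test mute: the annotation makes the test runner skip the
// test until the linked issue is resolved and the annotation is removed.
@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/62326")
public void testHighWatermarkNotExceeded() throws Exception {
    // unchanged test body
}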

tlrx added a commit that referenced this issue Oct 14, 2020
This is another attempt to fix #62326 as my previous 
attempts failed (#63112, #63385).
tlrx added a commit to tlrx/elasticsearch that referenced this issue Oct 14, 2020
This is another attempt to fix elastic#62326 as my previous 
attempts failed (elastic#63112, elastic#63385).
tlrx added a commit to tlrx/elasticsearch that referenced this issue Oct 14, 2020
This is another attempt to fix elastic#62326 as my previous 
attempts failed (elastic#63112, elastic#63385).
tlrx added a commit that referenced this issue Oct 16, 2020
This is another attempt to fix #62326 as my previous 
attempts failed (#63112, #63385).
tlrx added a commit that referenced this issue Oct 16, 2020
This is another attempt to fix #62326 as my previous 
attempts failed (#63112, #63385).