Fix flaky test `**/testSearchTaskCancellationWithHighCpu` & shard variant #7978

stephen-crawford · 2023-06-08T20:52:34Z

Description

This change fixes the flakiness of `SearchBackpressureIT.testSearchTaskCancellationWithHighCpu` reported in #7972.

The issue occurred because the time-limit threshold selected for the test was too high, so the test only passed intermittently. To check this, I added logging at line 51 of `CpuUsageTracker.java`, which is where the cancellation reason is determined. Running the test again confirmed the issue: `Optional.empty()` was returned in cases where the completion time of the request was around 0.9 seconds, just under the 1-second threshold.
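The threshold check described above can be sketched roughly as follows. This is a simplified, hypothetical illustration, not the actual `CpuUsageTracker` source: the class name, the method name, and the message format are chosen for illustration only. It shows why a run using about 0.9 s against a 1 s threshold yields `Optional.empty()` and therefore no cancellation.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified sketch of a CPU-usage threshold check. This is NOT the
 * OpenSearch CpuUsageTracker source; it only mirrors the behaviour described above.
 */
public class CpuThresholdSketch {

    /** Returns a reason only when cpuNanos has reached thresholdNanos; otherwise empty. */
    static Optional<String> checkAndMaybeGetCancellationReason(long cpuNanos, long thresholdNanos) {
        if (cpuNanos < thresholdNanos) {
            // Under the threshold: no reason, so no cancellation and the test's expectation fails.
            return Optional.empty();
        }
        // Illustrative message format; the real tracker's wording may differ.
        return Optional.of("cpu usage exceeded [" + cpuNanos + "ns >= " + thresholdNanos + "ns]");
    }

    public static void main(String[] args) {
        long threshold = TimeUnit.SECONDS.toNanos(1);       // the 1-second threshold
        long observed = TimeUnit.MILLISECONDS.toNanos(900); // ~0.9 s, as seen in the logging
        System.out.println(checkAndMaybeGetCancellationReason(observed, threshold)); // Optional.empty
    }
}
```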

I dropped the threshold to 0.05 seconds so that it should always trigger. Since this value is only used to test that the correct exception messages are logged, its exact value should not matter as long as the test workload reliably exceeds it. The default value remains 3 seconds and is unaffected by this change.
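A rough way to see how much margin the lower threshold buys is to divide the observed usage by the threshold: the result is approximately how many times faster a machine would need to be before the same workload drops under the threshold again. This is a back-of-the-envelope sketch that assumes CPU usage scales inversely with machine speed, which is only an approximation. `ThresholdHeadroom` and `speedupBeforeFlake` are hypothetical names, and 0.9 s is the figure from the description above.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not from the PR). Assumes the test's CPU usage scales
 * inversely with machine speed, which is only a rough approximation.
 */
public class ThresholdHeadroom {

    /** Ratio of observed usage to threshold; below 1.0 means the run is already under the threshold. */
    static double speedupBeforeFlake(long observedNanos, long thresholdNanos) {
        return (double) observedNanos / thresholdNanos;
    }

    public static void main(String[] args) {
        long observed = TimeUnit.MILLISECONDS.toNanos(900);     // ~0.9 s observed locally
        long oldThreshold = TimeUnit.SECONDS.toNanos(1);        // 1 s
        long newThreshold = TimeUnit.MILLISECONDS.toNanos(50);  // 0.05 s

        System.out.printf("old threshold headroom: %.2fx%n", speedupBeforeFlake(observed, oldThreshold));
        System.out.printf("new threshold headroom: %.2fx%n", speedupBeforeFlake(observed, newThreshold));
    }
}
```

Under these assumptions the old threshold has no headroom at all (the ratio is below 1), while the 0.05 s threshold tolerates roughly an 18x faster machine.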

Related Issues

#7972

Check List

- ~~New functionality includes testing.~~
  - ~~All tests pass~~
- ~~New functionality has been documented.~~
  - ~~New functionality has javadoc added~~
- Commits are signed per the DCO using --signoff
- ~~Commit changes are listed out in CHANGELOG.md file (See: Changelog)~~

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Stephen Crawford <[email protected]>

github-actions · 2023-06-08T21:24:00Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu

URL: https://build.ci.opensearch.org/job/gradle-check/17171/
CommitID: a778c69
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov · 2023-06-08T21:29:17Z

Codecov Report

Merging #7978 (1ed681f) into main (52326d7) will increase coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##               main    #7978      +/-   ##
============================================
+ Coverage     70.84%   70.86%   +0.01%     
+ Complexity    56530    56502      -28     
============================================
  Files          4714     4714              
  Lines        267213   267213              
  Branches      39182    39182              
============================================
+ Hits         189310   189348      +38     
- Misses        61916    61920       +4     
+ Partials      15987    15945      -42

see 450 files with indirect coverage changes

stephen-crawford · 2023-06-09T13:03:21Z

Gradle Check (Jenkins) Run Completed with:

* **RESULT:** UNSTABLE ❕

* **TEST FAILURES:**

      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu

* **URL:** https://build.ci.opensearch.org/job/gradle-check/17171/

* **CommitID:** [a778c69](https://github.com/opensearch-project/OpenSearch/commit/a778c69acbacc08444939f250323d4866c4b15bc)
  Please review all [flaky tests](https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#flaky-tests) that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Well, I guess that did not work the way it appeared to locally...

I suspect the CI runner is more powerful than my local machine, so it performs the operations faster and still comes in under the threshold. I'm going to look for a more appropriate value, or for a way to keep different systems from producing different results.
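One possible way to keep different machines from producing different results (a hedged option, not what this PR ended up doing) is to make the test workload stop on consumed CPU time rather than after a fixed amount of work. The sketch below uses the JDK's `ThreadMXBean.getCurrentThreadCpuTime()`. `CpuBurner` and `burnCpuFor` are hypothetical names. Whether the backpressure tracker measures the same per-thread CPU clock is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: spin until a CPU-time budget is spent, independent of machine speed. */
public class CpuBurner {

    // Written once at the end so the busy loop's result is observably used.
    private static volatile long blackhole;

    /** Spins until the current thread has used at least targetNanos of CPU time; returns the CPU nanos used. */
    static long burnCpuFor(long targetNanos) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("current-thread CPU time is not supported on this JVM");
        }
        if (!bean.isThreadCpuTimeEnabled()) {
            bean.setThreadCpuTimeEnabled(true);
        }
        long start = bean.getCurrentThreadCpuTime();
        long sink = 0;
        long used;
        // Stop on consumed CPU time rather than on a fixed amount of work,
        // so a faster machine simply finishes the same CPU budget sooner.
        while ((used = bean.getCurrentThreadCpuTime() - start) < targetNanos) {
            for (int i = 0; i < 10_000; i++) {
                sink += (sink ^ i) * 31;
            }
        }
        blackhole = sink;
        return used;
    }

    public static void main(String[] args) {
        long used = burnCpuFor(TimeUnit.MILLISECONDS.toNanos(50));
        System.out.println("burned " + TimeUnit.NANOSECONDS.toMillis(used) + " ms of CPU time");
    }
}
```

The trade-off is that wall-clock duration now varies by machine while the CPU consumed stays at or above the target. That target is the quantity a CPU-time threshold compares against.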

Signed-off-by: Stephen Crawford <[email protected]>

github-actions · 2023-06-09T13:53:02Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      2 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/numeric profiler}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/170_cardinality_metric/profiler double}

URL: https://build.ci.opensearch.org/job/gradle-check/17213/
CommitID: f42190f
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions · 2023-06-09T13:53:35Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/17215/
CommitID: 23e48a6
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-06-09T13:54:05Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu
      1 org.opensearch.index.ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting
      1 org.opensearch.index.ShardIndexingPressureSettingsIT.classMethod
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/170_cardinality_metric/profiler int}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/10_histogram/histogram profiler}

URL: https://build.ci.opensearch.org/job/gradle-check/17212/
CommitID: c867646
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions · 2023-06-09T13:58:59Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/17214/
CommitID: 23e48a6
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-06-09T14:04:56Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

URL: https://build.ci.opensearch.org/job/gradle-check/17216/
CommitID: 1ed681f
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

stephen-crawford · 2023-06-09T14:08:41Z

Should be set, @reta. I dropped the threshold a lot. I don't know at what point it becomes unreasonable, but since the tests only verify that the CpuUsageTracker produces a cancellation (surfaced as an exception) when the time is above the threshold, it should be fine. The purpose does not seem to be to check performance so much as to make sure the appropriate exception is raised.
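Because the tests care about the exception rather than performance, the key assertion is whether the failure carries the expected reason. A minimal, hypothetical helper for that check might walk the exception's cause chain looking for a message fragment. `CancellationAssertions` and `causeChainContains` are illustrative names, and the fragment `"cpu usage exceeded"` is an assumption, not a string confirmed by this PR.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical test helper; the message fragment used below is an assumption. */
public class CancellationAssertions {

    /** True if t or any of its causes has a message containing fragment. */
    static boolean causeChainContains(Throwable t, String fragment) {
        // Identity set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical failure shape: a wrapper exception whose cause carries the reason.
        Exception failure = new RuntimeException("search failed",
            new IllegalStateException("cpu usage exceeded [900ms >= 50ms]"));
        System.out.println(causeChainContains(failure, "cpu usage exceeded")); // true
    }
}
```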

There are two more flaky tests that are from other classes. I can look at those next.

reta

Thanks a lot @scrawfor99!

…iant (#7978) * Fix flake Signed-off-by: Stephen Crawford <[email protected]> * Drop threshold further Signed-off-by: Stephen Crawford <[email protected]> * Remove empty line Signed-off-by: Stephen Crawford <[email protected]> * Change threshold for ShardCPU test Signed-off-by: Stephen Crawford <[email protected]> --------- Signed-off-by: Stephen Crawford <[email protected]> (cherry picked from commit 812b3e3) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…iant (#7978) (#7987) * Fix flake * Drop threshold further * Remove empty line * Change threshold for ShardCPU test --------- (cherry picked from commit 812b3e3) Signed-off-by: Stephen Crawford <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…iant (opensearch-project#7978) (opensearch-project#7987) * Fix flake * Drop threshold further * Remove empty line * Change threshold for ShardCPU test --------- (cherry picked from commit 812b3e3) Signed-off-by: Stephen Crawford <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…iant (opensearch-project#7978) * Fix flake Signed-off-by: Stephen Crawford <[email protected]> * Drop threshold further Signed-off-by: Stephen Crawford <[email protected]> * Remove empty line Signed-off-by: Stephen Crawford <[email protected]> * Change threshold for ShardCPU test Signed-off-by: Stephen Crawford <[email protected]> --------- Signed-off-by: Stephen Crawford <[email protected]> Signed-off-by: Rishab Nahata <[email protected]>

…iant (opensearch-project#7978) * Fix flake Signed-off-by: Stephen Crawford <[email protected]> * Drop threshold further Signed-off-by: Stephen Crawford <[email protected]> * Remove empty line Signed-off-by: Stephen Crawford <[email protected]> * Change threshold for ShardCPU test Signed-off-by: Stephen Crawford <[email protected]> --------- Signed-off-by: Stephen Crawford <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>

Fix flake

a778c69

Signed-off-by: Stephen Crawford <[email protected]>

stephen-crawford requested review from reta, anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, gbbafna, setiah, kartg, kotwanikunal, mch2, nknize, owaiskazi19, Rishikesh1159, ryanbogan, saratvemulapalli, shwetathareja, dreamer-89, tlfeng, VachaShah and dbwiddis as code owners June 8, 2023 20:52

stephen-crawford added the skip-changelog label Jun 8, 2023

stephen-crawford changed the title ~~Fix flaky test **/SearchBackpressureIT.testSearchTaskCancellationWithHighCpu~~ Fix flaky test **/testSearchTaskCancellationWithHighCpu Jun 8, 2023

stephen-crawford mentioned this pull request Jun 8, 2023

[BUG] org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu is flaky #7972

Closed

stephen-crawford added 2 commits June 9, 2023 09:17

Drop threshold further

c867646

Signed-off-by: Stephen Crawford <[email protected]>

Remove empty line

f42190f

Signed-off-by: Stephen Crawford <[email protected]>

stephen-crawford and others added 2 commits June 9, 2023 09:19

Merge branch 'opensearch-project:main' into backpressureFlaky

23e48a6

Change threshold for ShardCPU test

1ed681f

Signed-off-by: Stephen Crawford <[email protected]>

stephen-crawford changed the title ~~Fix flaky test **/testSearchTaskCancellationWithHighCpu~~ Fix flaky test **/testSearchTaskCancellationWithHighCpu & shard variant Jun 9, 2023

reta approved these changes Jun 9, 2023

View reviewed changes

reta merged commit 812b3e3 into opensearch-project:main Jun 9, 2023

reta added the backport 2.x Backport to 2.x branch label Jun 9, 2023

opensearch-trigger-bot bot mentioned this pull request Jun 9, 2023

[Backport 2.x] Fix flaky test **/testSearchTaskCancellationWithHighCpu & shard variant #7987

Merged

stephen-crawford mentioned this pull request Jun 9, 2023

[BUG] Flaky test SearchBackpressureIT.testSearchTaskCancellationWithHighCpu #7750

Closed

stephen-crawford deleted the backpressureFlaky branch June 9, 2023 15:39
