Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky test **/testSearchTaskCancellationWithHighCpu & shard variant #7978

Merged
merged 5 commits into from
Jun 9, 2023

Conversation

stephen-crawford
Copy link
Contributor

@stephen-crawford stephen-crawford commented Jun 8, 2023

Description

This change resolves the issue of the SearchBackpressureIT.testSearchTaskCancellationWithHighCPU encountered here #7972.

The issue was happening because the selected timelimit threshold for the test was too high leading the test to pass on occasion. To check this, I ran the test with additional logging at line 51 of the CpuUsageTracker.java class which is what processes the reason for failure. I then ran the test to verify the issue and found that the Optional.empty() was being returned in cases where the completion time of the request was around .9 seconds (just under the 1 second threshold).

I dropped the threshold to be .05 seconds so that it should always trigger. Since this value is just for testing that the correct exception messages are logged, it should not matter what the value is. The default value remains 3 seconds and is unaffected by this change.

Related Issues

#7972

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Stephen Crawford <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Jun 8, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu

@codecov
Copy link

codecov bot commented Jun 8, 2023

Codecov Report

Merging #7978 (1ed681f) into main (52326d7) will increase coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##               main    #7978      +/-   ##
============================================
+ Coverage     70.84%   70.86%   +0.01%     
+ Complexity    56530    56502      -28     
============================================
  Files          4714     4714              
  Lines        267213   267213              
  Branches      39182    39182              
============================================
+ Hits         189310   189348      +38     
- Misses        61916    61920       +4     
+ Partials      15987    15945      -42     

see 450 files with indirect coverage changes

@stephen-crawford
Copy link
Contributor Author

stephen-crawford commented Jun 9, 2023

Gradle Check (Jenkins) Run Completed with:

* **RESULT:** UNSTABLE ❕

* **TEST FAILURES:**
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu
* **URL:** https://build.ci.opensearch.org/job/gradle-check/17171/

* **CommitID:** [a778c69](https://github.com/opensearch-project/OpenSearch/commit/a778c69acbacc08444939f250323d4866c4b15bc)
  Please review all [flaky tests](https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#flaky-tests) that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Well, I guess that did not work as it appeared locally...

I suspect the runner is more powerful then my local computer so performs the operations faster and still comes in under the threshold. Going to look into finding a more appropriate value or way to prevent the different systems from getting different results.

Signed-off-by: Stephen Crawford <[email protected]>
Signed-off-by: Stephen Crawford <[email protected]>
@stephen-crawford stephen-crawford changed the title Fix flaky test **/testSearchTaskCancellationWithHighCpu Fix flaky test **/testSearchTaskCancellationWithHighCpu & shard variant Jun 9, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/numeric profiler}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/170_cardinality_metric/profiler double}

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu
      1 org.opensearch.index.ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting
      1 org.opensearch.index.ShardIndexingPressureSettingsIT.classMethod
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/170_cardinality_metric/profiler int}
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/10_histogram/histogram profiler}

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

@stephen-crawford
Copy link
Contributor Author

Should be set @reta. Dropped the threshold a lot, I don't know at what point it becomes unreasonable, but since the tests are just verifying that the CpuUsageTracker throws when the time is above the threshold, it should be fine. The purpose does not seem to be to check the performance so much as make sure there is an appropriate exception.

There are two more flaky tests that are from other classes. I can look at those next.

Copy link
Collaborator

@reta reta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @scrawfor99 !

@reta reta merged commit 812b3e3 into opensearch-project:main Jun 9, 2023
@reta reta added the backport 2.x Backport to 2.x branch label Jun 9, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 9, 2023
…iant (#7978)

* Fix flake

Signed-off-by: Stephen Crawford <[email protected]>

* Drop threshold further

Signed-off-by: Stephen Crawford <[email protected]>

* Remove empty line

Signed-off-by: Stephen Crawford <[email protected]>

* Change threshold for ShardCPU test

Signed-off-by: Stephen Crawford <[email protected]>

---------

Signed-off-by: Stephen Crawford <[email protected]>
(cherry picked from commit 812b3e3)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@stephen-crawford stephen-crawford deleted the backpressureFlaky branch June 9, 2023 15:39
dbwiddis pushed a commit that referenced this pull request Jun 10, 2023
…iant (#7978) (#7987)

* Fix flake



* Drop threshold further



* Remove empty line



* Change threshold for ShardCPU test



---------


(cherry picked from commit 812b3e3)

Signed-off-by: Stephen Crawford <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this pull request Jun 26, 2023
…iant (opensearch-project#7978) (opensearch-project#7987)

* Fix flake



* Drop threshold further



* Remove empty line



* Change threshold for ShardCPU test



---------


(cherry picked from commit 812b3e3)

Signed-off-by: Stephen Crawford <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jun 27, 2023
…iant (opensearch-project#7978)

* Fix flake

Signed-off-by: Stephen Crawford <[email protected]>

* Drop threshold further

Signed-off-by: Stephen Crawford <[email protected]>

* Remove empty line

Signed-off-by: Stephen Crawford <[email protected]>

* Change threshold for ShardCPU test

Signed-off-by: Stephen Crawford <[email protected]>

---------

Signed-off-by: Stephen Crawford <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…iant (opensearch-project#7978)

* Fix flake

Signed-off-by: Stephen Crawford <[email protected]>

* Drop threshold further

Signed-off-by: Stephen Crawford <[email protected]>

* Remove empty line

Signed-off-by: Stephen Crawford <[email protected]>

* Change threshold for ShardCPU test

Signed-off-by: Stephen Crawford <[email protected]>

---------

Signed-off-by: Stephen Crawford <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants