
Fixing Flaky Integration Tests #369

Merged
7 commits merged on Feb 9, 2022

Conversation

amitgalitz
Member

Signed-off-by: Amit Galitzky [email protected]

Description

Fixes several flaky tests that can be solved by adding some sleep time between creating a detector and the next action.

Additionally, org.opensearch.ad.e2e.DetectionResultEvalutationIT.testDataset has been failing because precision occasionally drops to around 0.4-0.5, so minPrecision was lowered to 0.4.

This is currently a draft PR while I continue to investigate other flaky tests that aren't simply solved by adding more sleep time or are difficult to replicate.
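The fix described above can be sketched as follows. This is a minimal, hypothetical illustration of the sleep-after-create pattern; the class and method names are placeholders, not the plugin's actual test API, and the wait length is an assumption.

```java
// Hypothetical sketch of the fix: sleep briefly after creating a detector
// so the next action in the test sees a fully settled cluster state.
// Names here are illustrative, not the plugin's real test helpers.
public class CreateDetectorSleepSketch {

    // Stand-in for the real REST call that creates a detector.
    static String createDetector(String name) {
        return name + "-id";
    }

    // The flaky pattern was acting on the detector immediately after
    // creation; the fix inserts a short, fixed wait first.
    static String createThenWait(String name, long waitMillis) throws InterruptedException {
        String detectorId = createDetector(name);
        Thread.sleep(waitMillis); // give the cluster time to settle before the next action
        return detectorId;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        String id = createThenWait("flaky-test-detector", 100);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(id);
        System.out.println(elapsed >= 100);
    }
}
```

A fixed sleep is the simplest remedy, at the cost of slower CI; polling until the detector is visible would be more robust but is a larger change.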

Issues Resolved

#278

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ohltyler
Member

ohltyler commented Feb 4, 2022

@amitgalitz the WarningFailureExceptions listed in #278 will need some changes to the tests, specifically by not querying hidden indices directly. I can take care of these separately if you'd like.

@amitgalitz
Member Author

> @amitgalitz the WarningFailureExceptions listed in #278 will need some changes to the tests, specifically by not querying hidden indices directly. I can take care of these separately if you'd like.

If you have time, and since you have more knowledge of these specific tests, it would be nice to cut a separate PR for those.

@amitgalitz
Member Author

> > @amitgalitz the WarningFailureExceptions listed in #278 will need some changes to the tests, specifically by not querying hidden indices directly. I can take care of these separately if you'd like.
>
> If you have time, and since you have more knowledge of these specific tests, it would be nice to cut a separate PR for those.

Do you also know why this showed up in the integ testing through Jenkins but not previously on our GitHub repo? Is the assumption that it's just flaky and we simply haven't seen it fail on the AD repo yet because of randomness, or are there other factors that make it flaky on the Jenkins side?

@ohltyler
Member

ohltyler commented Feb 4, 2022

> > > @amitgalitz the WarningFailureExceptions listed in #278 will need some changes to the tests, specifically by not querying hidden indices directly. I can take care of these separately if you'd like.
> >
> > If you have time, and since you have more knowledge of these specific tests, it would be nice to cut a separate PR for those.
>
> Do you also know why this showed up in the integ testing through Jenkins but not previously on our GitHub repo? Is the assumption that it's just flaky and we simply haven't seen it fail on the AD repo yet because of randomness, or are there other factors that make it flaky on the Jenkins side?

It's actually not flaky on the Jenkins side; it consistently occurs on a security-enabled, bundled binary cluster. The root cause is related to the client config when the specific plugin's test client interacts with a remote cluster. Long story short, I have a fix that overrides the client for the specific requests made in the failing tests. See #372 and the related ISM issue for more details.

@codecov-commenter

codecov-commenter commented Feb 4, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.07%. Comparing base (e143bf1) to head (b4eb9bb).
Report is 225 commits behind head on main.

Additional details and impacted files


@@             Coverage Diff              @@
##               main     #369      +/-   ##
============================================
- Coverage     79.26%   79.07%   -0.20%     
+ Complexity     4095     4092       -3     
============================================
  Files           295      295              
  Lines         17207    17207              
  Branches       1826     1826              
============================================
- Hits          13639    13606      -33     
- Misses         2671     2706      +35     
+ Partials        897      895       -2     
Flag   | Coverage Δ
plugin | 79.07% <ø> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown.

see 10 files with indirect coverage changes

@ohltyler
Member

ohltyler commented Feb 4, 2022

Can we add the sleep time inside the create-anomaly-detector utility functions themselves? I feel it's hard to maintain this strategy / to remember to add a Thread.sleep() call after creating each detector.

@amitgalitz
Member Author

> Can we add a sleep time inside the create anomaly detector utility fns themselves? I feel it's hard to maintain this strategy / remember to add a thread.sleep() cmd after creating some detector.

That's a good call; I've found myself adding it to more and more methods. I'll send a commit adding the sleep to the create-detector base method that gets called. Hopefully it doesn't make our CI process run too much longer :)
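The centralization agreed on above could look roughly like this. It is a sketch only: the class name, helper name, and wait constant are assumptions, not the actual code committed to the AD test utilities.

```java
// Illustrative sketch of moving the wait into the shared create-detector
// base method, so individual tests no longer need their own Thread.sleep()
// calls. All names and the wait length here are hypothetical.
public class DetectorTestBaseSketch {

    // One place to tune the settle time for every test.
    static final long POST_CREATE_WAIT_MILLIS = 100;

    // Base helper every test calls; the wait now happens automatically.
    static String createDetector(String name) throws InterruptedException {
        String detectorId = name + "-id"; // stand-in for the real create request
        Thread.sleep(POST_CREATE_WAIT_MILLIS);
        return detectorId;
    }

    public static void main(String[] args) throws InterruptedException {
        // Callers no longer sprinkle Thread.sleep() after each create.
        System.out.println(createDetector("test-detector"));
    }
}
```

Putting the wait in the base method trades a small, uniform increase in CI time for not having to remember the sleep at every call site.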

@amitgalitz amitgalitz added infra Changes to infrastructure, testing, CI/CD, pipelines, etc. v1.3.0 labels Feb 4, 2022
ohltyler
ohltyler previously approved these changes Feb 4, 2022
Member

@ohltyler ohltyler left a comment


LGTM! Thanks for taking the time on fixing these, should help with future tests too.

Signed-off-by: Amit Galitzky <[email protected]>
@amitgalitz amitgalitz marked this pull request as ready for review February 7, 2022 17:28
@amitgalitz amitgalitz requested a review from a team February 7, 2022 17:28
@amitgalitz amitgalitz requested a review from ylwu-amzn February 8, 2022 00:34
@ohltyler ohltyler mentioned this pull request Feb 8, 2022
@amitgalitz amitgalitz merged commit e0bc4a2 into opensearch-project:main Feb 9, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 1, 2022
* fixing flaky tests by lowering precision and adding sleep time

Signed-off-by: Amit Galitzky <[email protected]>
(cherry picked from commit e0bc4a2)
ohltyler pushed a commit that referenced this pull request Mar 2, 2022
* fixing flaky tests by lowering precision and adding sleep time

Signed-off-by: Amit Galitzky <[email protected]>
(cherry picked from commit e0bc4a2)
Labels
backport 1.x infra Changes to infrastructure, testing, CI/CD, pipelines, etc. v1.3.0
4 participants