-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky BWC test #296
Fix flaky BWC test #296
Conversation
Signed-off-by: Mohammad Qureshi <[email protected]>
// TODO: Change the next execution time of the Monitor manually instead since this inflates | ||
// the test execution by a lot (might have to wait for Job Scheduler plugin integration first) | ||
// Waiting a minute to ensure the Monitor ran again at least once before checking if the job is running | ||
// on time | ||
Thread.sleep(60000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually a consistent way to avoid sleeps is by using assertBusy
, given if we expect some change of state within a stipulated time threshold. It allows to runs the code block for the provided interval, waiting for no assertions to trip, and return as soon as all the assertions are met. So wondering of there exists some state related to monitor execution, such as count, that we can rely upon as a trigger here, and then verify the stats response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for the sleep was that the minimum Monitor execution interval is 1 minute
and we were going to ensure the Monitor executed again before checking the stats to make sure that any late running Monitor manifested itself. The test itself is not perfect and we'd normally want to just fast forward the job execution like we do for the Index Management test but Alerting uses an older job scheduling implementation where the next job execution time is not stored on the document but rather in some ConcurrentHashMap
object that is not test friendly (is not easily mutable in written tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying @qreshi. So my suggestion was, is there is a way (API/Object) which can specify if monitor was run again (such as a running-counter). If yes, then we can do assertBusy, which provides an exponential backoff mechanism to wait until the condition is met.
If not, I understand having sleep is the only valid option.
…#296) Signed-off-by: Mohammad Qureshi <[email protected]>
…#296) Signed-off-by: Mohammad Qureshi <[email protected]>
Signed-off-by: Mohammad Qureshi <[email protected]>
…#296) Signed-off-by: Mohammad Qureshi <[email protected]> Signed-off-by: AWSHurneyt <[email protected]>
Signed-off-by: Mohammad Qureshi [email protected]
Issue #, if available: #241
Description of changes:
The BWC tests will occasionally fail on the
twoThirdsUpgradedClusterTask
. It seems that there is some race condition that can lead to some test cluster changes to not go through due to the sleep being done in the mixed cluster scenario. For the time being, the sleep will be removed fromMIXED_CLUSTER
scenario as this was shown to remove flakiness.The sleeps were initially put in to give time for a Monitor execution in between assertions but the current checks in mixed state will suffice. Once we have an easy way to artificially forward Monitor executions (which will likely be after integration with Job Scheduler since the next execution time isn't stored in the Monitor job currently) then we can add additional checks without relying on sleeps.
Some local verification:
Multiple test runs before change
Multiple test runs after removal of sleep
CheckList:
[x] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.