Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Guarantee Cypress Reliability: Stabilizing Using Clean Retries #174247

Open
Tracked by #173508 ...
MadameSheema opened this issue Jan 4, 2024 · 5 comments
Labels
Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.

Comments

@MadameSheema
Copy link
Member

We all know that flakiness may happen from time to time, ideally, the only flakiness that we should face, is the one regarding external factors as slow machines or network issues.

With Cypress we have the test retries functionality enabled. Test retries has been configured with 1 retry attempt, Cypress will retry a failed test an additional time (for a total of 2 attempts) before potentially being marked as a failed test. When a test is re-executed, the each hooks will be re-run as well, however, failures in before and after hooks will not trigger a retry and the test will be marked as failure.

So in order to have 'retriable' tests, we should get rid off the before and after hooks in favor of the beforeEach and afterEach hook. Or at least make sure that the code executed in the before and after hook is not prone to fail (i.e. es_archiver).

Another thing we need to take into consideration to guarantee that a test can be retried is to make sure that the data that the test might generate is properly cleaned.

Each spec file is executed on a clean environment, but, retries are not. Retries are executed on the same environment the execution was initiated, this is why is pretty important to make sure that the data the test may generate is cleaned at the beginning.

@MadameSheema MadameSheema added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Jan 4, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror
Copy link
Contributor

@MadameSheema I'd like to understand the necessity and urgency of the changes proposed in this ticket. In other words:

  • Do we absolutely need to do this and why?
  • Do we need to do it for all tests or we can do it only for those that can be considered potentially flaky?
  • Is it a blocker for enabling the 2nd quality gate or a nice to have thing?
  • If we agree that we need it, can we do it later? What would be a deadline, if any?

I'm sceptical about the proposed change, because it doesn't match common testing best practices. In any kind of automated tests, running some common cleanup or setup logic in before and after hooks is a normal thing to do. That way you don't run it many times as opposed to running it in beforeEach and afterEach. Also, I'm not sure that by doing this we'd necessarily make Cypress tests more stable in MKI or in general.

Maybe, it would be enough (and better) to add retry logic to a select subset of functions that we consider prone to flakiness, as it was done in #173998 and #176316.

@MadameSheema
Copy link
Member Author

Do we absolutely need to do this and why?

It would be great. Once our tests are integrated with the kibana second quality gate, any failure on any test will block a deployment.

Do we need to do it for all tests or we can do it only for those that can be considered potentially flaky?

Ideally, any test, starting with those considered potentially flaky would be great! We can of course discuss different strategies to achieve the same point.

Is it a blocker for enabling the 2nd quality gate or a nice to have thing?

The main blocker for enabling the tests on the 2nd quality gate is the number of flaky tests we have been facing. This is a strategy to minimize the risk of failure which I don't like because I agree with you that sometimes does not match with common best practices as the execution time.

If we agree that we need it, can we do it later? What would be a deadline, if any?

We need to make sure our tests are stable on MKI environment, this is just a task that might help with it.

I'm sceptical about the proposed change, because it doesn't match common testing best practices. In any kind of automated tests, running some common cleanup or setup logic in before and after hooks is a normal thing to do. That way you don't run it many times as opposed to running it in beforeEach and afterEach. Also, I'm not sure that by doing this we'd necessarily make Cypress tests more stable in MKI or in general.

Maybe, it would be enough (and better) to add retry logic to a select subset of functions that we consider prone to flakiness, as it was done in #173998 and #176316.

I understand your point. Cypress has a built-in retry, if the test fails one time it executes the test again. The main issue of that built-in mode is that if the failure happens in a before or after hook, the retry is not triggered and the test is marked as failed inmediately.

@hop-dev
Copy link
Contributor

hop-dev commented Feb 14, 2024

Or at least make sure that the code executed in the before and after hook is not prone to fail (i.e. es_archiver).

@MadameSheema just to confirm, are you saying that es_archiver calls are not prone to fail? (and therefore OK in before/after calls?)

@MadameSheema
Copy link
Member Author

@MadameSheema just to confirm, are you saying that es_archiver calls are not prone to fail? (and therefore OK in before/after calls?)

Yup :)

hop-dev added a commit that referenced this issue Feb 20, 2024
…before` and `after` calls in cypress tests (#177175)

## Summary

A couple of small tweaks to our cypress tests to move any non-archiver
code out of `before` and `after` hooks and into `beforeEach`.

Flaky test run serverless and ess
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5219
✅ ✅

See #174247 for context

Co-authored-by: Kibana Machine <[email protected]>
fkanout pushed a commit to fkanout/kibana that referenced this issue Mar 4, 2024
…before` and `after` calls in cypress tests (elastic#177175)

## Summary

A couple of small tweaks to our cypress tests to move any non-archiver
code out of `before` and `after` hooks and into `beforeEach`.

Flaky test run serverless and ess
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5219
✅ ✅

See elastic#174247 for context

Co-authored-by: Kibana Machine <[email protected]>
@MadameSheema MadameSheema reopened this Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.
Projects
None yet
Development

No branches or pull requests

4 participants