[Security Solution] Guarantee Cypress Reliability: Stabilizing Using Clean Retries #174247

MadameSheema · 2024-01-04T11:17:14Z

We all know that flakiness may happen from time to time. Ideally, the only flakiness we should face is the kind caused by external factors such as slow machines or network issues.

With Cypress, we have the test retries functionality enabled. Test retries are configured with 1 retry attempt, so Cypress will retry a failed test one additional time (for a total of 2 attempts) before marking it as failed. When a test is re-executed, the beforeEach and afterEach hooks are re-run as well. However, failures in before and after hooks do not trigger a retry, and the test is marked as failed.
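The retry behaviour described above can be illustrated with a small, self-contained model. This is only a sketch of the semantics as stated in this issue, not Cypress internals; `runWithRetries`, `Spec`, `Outcome`, and `failOnce` are hypothetical names introduced for illustration.

```typescript
// Simplified model of the retry semantics described above; NOT Cypress internals.
// All names are hypothetical.
type Hook = () => void;

interface Spec {
  before?: Hook;
  beforeEach?: Hook;
  test: Hook;
}

interface Outcome {
  passed: boolean;
  attempts: number; // how many times the test (with its beforeEach) was attempted
}

function runWithRetries(spec: Spec, retries: number): Outcome {
  // A failure in `before` fails the test immediately: no retry is triggered.
  try {
    spec.before?.();
  } catch {
    return { passed: false, attempts: 0 };
  }

  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      spec.beforeEach?.(); // re-run on every attempt
      spec.test();
      return { passed: true, attempts: attempt };
    } catch {
      // fall through to the next attempt, if any
    }
  }
  return { passed: false, attempts: retries + 1 };
}

// A setup step that fails on its first call and succeeds afterwards.
function failOnce(): Hook {
  let calls = 0;
  return () => {
    calls += 1;
    if (calls === 1) throw new Error('transient failure');
  };
}

const inBefore = runWithRetries({ before: failOnce(), test: () => {} }, 1);
const inBeforeEach = runWithRetries({ beforeEach: failOnce(), test: () => {} }, 1);
console.log(inBefore, inBeforeEach);
```

In this model, the same transient setup failure fails the test outright when it lives in `before`, but is absorbed by the single retry when it lives in `beforeEach`.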

So, in order to have 'retriable' tests, we should get rid of the before and after hooks in favor of the beforeEach and afterEach hooks. Or at least make sure that the code executed in the before and after hook is not prone to fail (i.e. es_archiver).

Another thing we need to take into consideration to guarantee that a test can be retried is making sure that any data the test might generate is properly cleaned up.

Each spec file is executed on a clean environment, but retries are not. Retries are executed on the same environment in which the execution was initiated, which is why it is important to make sure that the data the test may generate is cleaned up at the beginning.
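To make the point about retries sharing an environment concrete, here is a minimal sketch under stated assumptions: the environment is modelled as an in-memory list of rules, and `Environment`, `runTest`, `makeTest`, `cleanup`, and `noCleanup` are hypothetical names, not Kibana or Cypress APIs.

```typescript
// Simplified model: one shared environment per spec file; retries reuse it.
// All names here are hypothetical.
class Environment {
  rules: string[] = [];
}

type Step = (env: Environment) => void;

function runTest(env: Environment, beforeEach: Step, test: Step, retries: number): boolean {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      beforeEach(env);
      test(env);
      return true;
    } catch {
      // the next attempt (if any) runs against the same `env`
    }
  }
  return false;
}

// Builds a test that creates a rule, expects exactly one rule to exist, and
// then fails once for an unrelated transient reason.
function makeTest(): Step {
  let transientFailurePending = true;
  return (env) => {
    env.rules.push('my rule');
    if (env.rules.length !== 1) {
      throw new Error(`expected 1 rule, found ${env.rules.length}`);
    }
    if (transientFailurePending) {
      transientFailurePending = false;
      throw new Error('transient failure');
    }
  };
}

const noCleanup: Step = () => {};
const cleanup: Step = (env) => {
  env.rules = []; // remove leftovers from a previous attempt
};

const withoutCleanup = runTest(new Environment(), noCleanup, makeTest(), 1);
const withCleanup = runTest(new Environment(), cleanup, makeTest(), 1);
console.log({ withoutCleanup, withCleanup });
```

Without cleanup, the retry finds the rule left behind by the first attempt and fails for a new reason. With cleanup at the start of each attempt, the retry starts from a known state and passes.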


elasticmachine · 2024-01-04T11:17:16Z

Pinging @elastic/security-solution (Team: SecuritySolution)

banderror · 2024-02-12T08:46:57Z

@MadameSheema I'd like to understand the necessity and urgency of the changes proposed in this ticket. In other words:

Do we absolutely need to do this and why?
Do we need to do it for all tests or we can do it only for those that can be considered potentially flaky?
Is it a blocker for enabling the 2nd quality gate or a nice to have thing?
If we agree that we need it, can we do it later? What would be a deadline, if any?

I'm sceptical about the proposed change, because it doesn't match common testing best practices. In any kind of automated tests, running some common cleanup or setup logic in before and after hooks is a normal thing to do. That way you don't run it many times as opposed to running it in beforeEach and afterEach. Also, I'm not sure that by doing this we'd necessarily make Cypress tests more stable in MKI or in general.

Maybe, it would be enough (and better) to add retry logic to a select subset of functions that we consider prone to flakiness, as it was done in #173998 and #176316.
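As a rough illustration of that alternative, a retry wrapper around a single flaky step might look like the sketch below. It is not the code from the referenced PRs: `withRetry` and `flakySetup` are hypothetical, and a real helper would presumably need to be asynchronous.

```typescript
// Hypothetical helper: runs `step` up to `maxAttempts` times and returns the
// first successful result, or rethrows the last error.
function withRetry<T>(step: () => T, maxAttempts: number): T {
  if (maxAttempts < 1) throw new RangeError('maxAttempts must be at least 1');
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return step();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}

// Stand-in for a flaky setup call: fails twice, then succeeds.
let calls = 0;
const flakySetup = (): string => {
  calls += 1;
  if (calls < 3) throw new Error(`attempt ${calls} failed`);
  return 'ready';
};

const result = withRetry(flakySetup, 3);
console.log(result, calls);
```

The trade-off compared with moving everything into beforeEach is that only the wrapped step is repeated, so shared setup still runs once per spec file.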

MadameSheema · 2024-02-12T11:01:54Z

> Do we absolutely need to do this and why?

It would be great. Once our tests are integrated with the Kibana second quality gate, any failure in any test will block a deployment.

> Do we need to do it for all tests or we can do it only for those that can be considered potentially flaky?

Ideally, all tests. Starting with those considered potentially flaky would be great! We can, of course, discuss different strategies to achieve the same goal.

> Is it a blocker for enabling the 2nd quality gate or a nice to have thing?

The main blocker for enabling the tests on the 2nd quality gate is the number of flaky tests we have been facing. This is a strategy to minimize the risk of failure. I don't particularly like it, because I agree with you that it sometimes goes against common best practices, for example regarding execution time.

> If we agree that we need it, can we do it later? What would be a deadline, if any?

We need to make sure our tests are stable on the MKI environment; this is just a task that might help with that.

> I'm sceptical about the proposed change, because it doesn't match common testing best practices. In any kind of automated tests, running some common cleanup or setup logic in before and after hooks is a normal thing to do. That way you don't run it many times as opposed to running it in beforeEach and afterEach. Also, I'm not sure that by doing this we'd necessarily make Cypress tests more stable in MKI or in general.
>
> Maybe, it would be enough (and better) to add retry logic to a select subset of functions that we consider prone to flakiness, as it was done in #173998 and #176316.

I understand your point. Cypress has a built-in retry: if a test fails once, it executes the test again. The main issue with that built-in mode is that if the failure happens in a before or after hook, the retry is not triggered and the test is marked as failed immediately.

hop-dev · 2024-02-14T16:18:15Z

> Or at least make sure that the code executed in the before and after hook is not prone to fail (i.e. es_archiver).

@MadameSheema just to confirm, are you saying that es_archiver calls are not prone to fail? (and therefore OK in before/after calls?)

MadameSheema · 2024-02-14T16:25:27Z

> @MadameSheema just to confirm, are you saying that es_archiver calls are not prone to fail? (and therefore OK in before/after calls?)

Yup :)

…before` and `after` calls in cypress tests (#177175) ## Summary A couple of small tweaks to our cypress tests to move any non-archiver code out of `before` and `after` hooks and into `beforeEach`. Flaky test run serverless and ess https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5219 ✅ ✅ See #174247 for context Co-authored-by: Kibana Machine <[email protected]>

MadameSheema added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Jan 4, 2024

MindyRS changed the title ~~[Security Solution] Guarantee Cypress retrievability~~ [Security Solution] Guarantee Cypress Reliability Jan 18, 2024

MindyRS changed the title ~~[Security Solution] Guarantee Cypress Reliability~~ [Security Solution] Guarantee Cypress Reliability: Stabilizing Using Clean Retries Jan 18, 2024

banderror mentioned this issue Feb 12, 2024

[Security Solution] Detection Engine Test Automation and Coverage #153633

Open

hop-dev mentioned this issue Feb 19, 2024

[Entity Analytics] Remove everything except esarchiver calls from before and after calls in cypress tests #177175

Merged

MadameSheema closed this as completed Apr 8, 2024

MadameSheema reopened this Apr 8, 2024
