Flaky alert assignment tests #176930

e40pud · 2024-02-14T15:58:01Z

Summary

Addresses:

Failing test: Security Solution Cypress.x-pack/test/security_solution_cypress/cypress/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments_serverless_essentials·cy·ts - Alert user assignment - Serverless Essentials Authorization / RBAC "before each" hook for "users with editing privileges should be able to update assignees" "before each" hook for "users with editing privileges should be able to update assignees" #176529
Failing test: Security Solution Cypress.x-pack/test/security_solution_cypress/cypress/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments_serverless_complete·cy·ts - Alert user assignment - Serverless Complete Authorization / RBAC users with editing privileges should be able to update assignees users with editing privileges should be able to update assignees #172557
Failing test: Security Solution Cypress.x-pack/test/security_solution_cypress/cypress/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments_serverless_complete·cy·ts - Alert user assignment - Serverless Complete Authorization / RBAC "before each" hook for "users with editing privileges should be able to update assignees" "before each" hook for "users with editing privileges should be able to update assignees" #177573
Failing test: Security Solution Cypress.x-pack/test/security_solution_cypress/cypress/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments·cy·ts - Alert user assignment - ESS & Serverless Basic rendering "before each" hook for "alert with no assignees in alerts table" "before each" hook for "alert with no assignees in alerts table" #173429

Fix flaky alert assignments tests. I split assignments tests into two groups: tests with one assignee available and tests with multiple assignees.

Right now the tests with multiple assignees are flaky. Most likely this happens because we make multiple login calls in a row to activate different users and make them available for assignment:

// Log in to each account so that it gets activated and becomes visible in the user profiles list
login(ROLES.t1_analyst);
login(ROLES.t2_analyst);
login(ROLES.t3_analyst);
login(ROLES.soc_manager);
login(ROLES.detections_admin);
login(ROLES.platform_engineer);

These tests tend to be flaky, and it is possible that the Kibana operations team will skip them. To make sure that we still run basic Cypress verification of the alert assignments feature, we decided to add tests with only one assignee available (the current user), which allows us to avoid multiple consecutive login calls.
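As a rough illustration of the difference between the two setups, here is a minimal, self-contained sketch. The `login` stub, the `Role` type, and the `'current_user'` name are hypothetical stand-ins (the real helpers live in the Cypress test utilities); the sketch only counts how many consecutive logins each setup issues.

```typescript
// Hypothetical stand-ins for the real Cypress helpers; nothing here talks to Kibana.
type Role =
  | 't1_analyst'
  | 't2_analyst'
  | 't3_analyst'
  | 'soc_manager'
  | 'detections_admin'
  | 'platform_engineer';

// Records every login so the cost of each setup can be compared.
const loginCalls: Role[] = [];
const login = (role: Role): void => {
  loginCalls.push(role);
};

// Multi-assignee setup: every account has to be logged in once to be activated.
const multiAssigneeSetup = (): Role[] => {
  const roles: Role[] = [
    't1_analyst',
    't2_analyst',
    't3_analyst',
    'soc_manager',
    'detections_admin',
    'platform_engineer',
  ];
  roles.forEach((role) => login(role));
  return roles;
};

// Single-assignee setup: only the already logged-in user is assignable,
// so no additional login calls are needed.
const singleAssigneeSetup = (currentUser: string): string[] => [currentUser];

const multi = multiAssigneeSetup();
const loginsForMulti = loginCalls.length;
const single = singleAssigneeSetup('current_user');
const loginsForSingle = loginCalls.length - loginsForMulti;
```

In this model the multi-assignee setup issues six back-to-back logins, while the single-assignee setup issues none beyond the login the test already performs.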

Also, as part of these changes I removed unnecessary logins and un-skipped #176529

NOTE

After discussing these failures with the team, we decided to remove the tests that are already covered by integration and unit tests. While fixing the flakiness, we realised that we were doing unnecessary work fighting internal Elasticsearch errors on serverless that occur when we do multiple user logins in a row. Instead, we will rely on:

- integration test coverage of API-related functionality, including RBAC
- unit test coverage of all assignments UI components
- Cypress test coverage of basic UI interaction with alert assignments, with only one user available for assignment

cc @yctercero

Checklist

Delete any items that are not applicable to this PR.

Flaky Test Runner was used on any tests changed
- ESS 50 times
- Serverless 97 times

e40pud · 2024-02-14T15:58:08Z

/ci

elasticmachine · 2024-02-15T08:41:38Z

Pinging @elastic/security-solution (Team: SecuritySolution)

elasticmachine · 2024-02-15T08:41:40Z

Pinging @elastic/security-detection-engine (Team:Detection Engine)

e40pud · 2024-02-16T13:06:42Z

@elasticmachine merge upstream

e40pud · 2024-02-19T09:36:58Z

@elasticmachine merge upstream

rylnd

I had a lot of thoughts here, sorry 😅 .

I think the only needed change here is removal of the redundant single-user tests; everything else is a step in the right direction.

I was looking for examples of where these tests failed, to validate the multi-user hypothesis and see if there wasn't some information in the error/failure. However, the original skip PR linked here only shows that ES threw a 503 during the test, and the flaky test runner seemingly only had timeouts and not any legitimate failures.

If there are particular error messages that we're basing this PR on, it would be great to call those out both in this PR and the "skipped test" issue, for posterity.

rylnd · 2024-02-20T23:28:24Z

...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts

@@ -77,42 +67,23 @@ describe('Alert user assignment - ESS & Serverless', { tags: ['@ess', '@serverle
    });

    it('alert with some assignees in alerts table', () => {
-      const users = [ROLES.detections_admin, ROLES.t1_analyst];
+      const users = [getDefaultUserName()];
      updateAssigneesForFirstAlert(users);
      alertsTableShowsAssigneesForAlert(users);
    });

    it(`alert with some assignees in alert's details flyout`, () => {


Broader question outside the scope of this particular PR: why is this test not part of the one above it? The script appears to be:

- login
- create rule, wait for alerts
- assign alert to user
- make assertions about assignment

Other than violating the "one assertion per test" rule (which I don't believe is relevant to cypress), is there a reason for not consolidating these? I can imagine that having more tests seems like it would make the suite more robust, but given the amount of work that happens before the assertions (that then gets repeated in every independent test), I believe the opposite is true.

Scrolling further, it seems as though both of these tests are now redundant with "Updating assignees (single alert) adding new assignees via 'More actions' in alerts table"; that one tests everything these two do.

Agreed! I will walk through the tests and consolidate them where possible.
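The cost argument in this thread can be sketched with a toy model. The `runSuite` function below is a hypothetical stand-in for a test runner with a `beforeEach`-style hook, and the empty assertion functions are placeholders; the only point is to count how often the expensive setup (login, create rule, wait for alerts) runs.

```typescript
// Toy test runner: runs `setup` before every test, the way a beforeEach hook would.
type TestCase = { name: string; body: () => void };

function runSuite(setup: () => void, tests: TestCase[]): void {
  for (const test of tests) {
    setup();
    test.body();
  }
}

// Counts how often the expensive setup is executed.
let setupRuns = 0;
const expensiveSetup = (): void => {
  setupRuns += 1;
};

// Placeholder assertions standing in for the alerts table and flyout checks.
const checkAlertsTable = (): void => {};
const checkDetailsFlyout = (): void => {};

// Two independent tests: the setup runs once per test.
runSuite(expensiveSetup, [
  { name: 'alert with some assignees in alerts table', body: checkAlertsTable },
  { name: "alert with some assignees in alert's details flyout", body: checkDetailsFlyout },
]);
const separateRuns = setupRuns;

// One consolidated test: the setup runs once for both checks.
setupRuns = 0;
runSuite(expensiveSetup, [
  {
    name: 'alert with some assignees in alerts table & details flyout',
    body: () => {
      checkAlertsTable();
      checkDetailsFlyout();
    },
  },
]);
const consolidatedRuns = setupRuns;
```

Under these assumptions the consolidated suite performs the same checks with half the setup work, which also halves the number of chances for the setup itself to fail.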

rylnd · 2024-02-20T23:32:09Z

...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts

      updateAssigneesForFirstAlert(users);
      alertsTableShowsAssigneesForAlert(users);
    });

    it(`alert with some assignees in alert's details flyout`, () => {
-      const users = [ROLES.detections_admin, ROLES.t1_analyst];
+      const users = [getDefaultUserName()];
      updateAssigneesForFirstAlert(users);
      expandFirstAlert();
      alertDetailsFlyoutShowsAssignees(users);


Nit: it would be nice to have some convention (maybe either function naming (assert as a prefix), or folder location (/assertions), or both) to identify these tasks as performing assertions.

I know that for a while @MadameSheema was requesting that we not abstract assertions into helpers at all, but I think if we do so we should make those assertions a bit more discoverable.

@MadameSheema any thoughts/preferences on this one?

Hey!! the overall preference is to NOT abstract assertions.
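Purely to illustrate the naming proposal in this thread (the reply above states that the overall preference is not to abstract assertions at all), a helper following an `assert` prefix convention might look like the sketch below. The function names and the plain string arrays are hypothetical; a real Cypress helper would query the DOM instead.

```typescript
// Hypothetical helper: the `assert` prefix signals that the function fails the test on mismatch.
// The comparison ignores the order of assignees.
function assertAssigneesMatch(actual: readonly string[], expected: readonly string[]): void {
  const normalize = (names: readonly string[]): string => JSON.stringify([...names].sort());
  if (normalize(actual) !== normalize(expected)) {
    throw new Error(
      `Expected assignees [${expected.join(', ')}] but found [${actual.join(', ')}]`
    );
  }
}

// Returns true when the assertion helper throws, so the failure path can be exercised too.
function throwsOnMismatch(actual: readonly string[], expected: readonly string[]): boolean {
  try {
    assertAssigneesMatch(actual, expected);
    return false;
  } catch {
    return true;
  }
}

const sameUsersDifferentOrder = throwsOnMismatch(['t1_analyst', 'soc_manager'], ['soc_manager', 't1_analyst']);
const differentUsers = throwsOnMismatch(['t1_analyst'], ['soc_manager']);
```

With a prefix like this, a reader scanning a test body can tell at a glance which helper calls only perform actions and which ones can fail the test.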

...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts

e40pud · 2024-02-23T13:24:47Z

@elasticmachine merge upstream

e40pud · 2024-02-23T13:48:34Z

Thank you for the review @rylnd!!

> I think the only needed change here is removal of the redundant single-user tests; everything else is a step in the right direction.

I agree that the update-assignee tests are redundant in the single-user case, and I will update or remove the unnecessary test cases.

> I was looking for examples of where these tests failed, to validate the multi-user hypothesis and see if there wasn't some information in the error/failure. However, the original skip PR linked here only shows that ES threw a 503 during the test, and the flaky test runner seemingly only had timeouts and not any legitimate failures.
>
> If there are particular error messages that we're basing this PR on, it would be great to call those out both in this PR and the "skipped test" issue, for posterity.

Yes, the 503 error is what is causing the issue. It happens within the beforeEach block, on deletion of indices and lists. We do exactly the same steps as in all other tests, except that in our case we make multiple login calls to activate multiple accounts. That is the only reason I can think of that could cause that internal error. While I investigate the issue further, I would like to have at least some stable tests covering the assignments functionality.

e40pud · 2024-02-23T16:35:08Z

@elasticmachine merge upstream

rylnd

Thank you for the changes and helpful responses. LGTM, let's get these back online and hopefully avoid those 503s in the future 👍 .

...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts

e40pud · 2024-03-05T10:12:54Z

@elasticmachine merge upstream

MadameSheema · 2024-03-05T17:26:50Z

...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts

-    it('alert with some assignees in alerts table', () => {
-      const users = [ROLES.detections_admin, ROLES.t1_analyst];
+    it('alert with some assignees in alerts table & details flyout', () => {
+      const users = [getDefaultUserName()];


NIT: Change all the users constants to user since we only have one.

MadameSheema

Security Engineering Productivity changes LGTM!! Lots of thanks for addressing the flakiness! :)

NIT: Change all the users constants to user since we only have one.

Doubt: Is there any impact from the functional point of view on doing the testing with just one user instead of more?

Thanks!

rylnd · 2024-03-05T20:00:24Z

@MadameSheema we discussed the idea of testing one user vs multiple here, and the potential loss of coverage, and it was argued that having:

- a cypress test to verify that a single user assignment propagates correctly to the UI
- a cypress test to verify that multiple user assignments propagate correctly to the UI
- multiple jest tests to test component behavior for the myriad users/roles assigned

would provide the same coverage as all the previous cypress tests, with much less cost/downside. Do you agree with that?

kibana-ci · 2024-03-06T11:13:46Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 757591d

Failed CI Steps

Defend Workflows Cypress Tests on Serverless #10

Test Failures

[job] [logs] Defend Workflows Cypress Tests on Serverless #10 / User Roles for Security Complete PLI with Endpoint Complete addon for role: endpoint_operations_analyst should have access to response action: processes should have access to response action: processes

Metrics [docs]

✅ unchanged

History

💛 Build #195940 was flaky 161649f
💛 Build #195846 was flaky 915ea6f
💛 Build #195319 was flaky c098902
💔 Build #195269 failed 093e8bb
💚 Build #194100 succeeded 0f6d112
💚 Build #194013 succeeded 432f4de

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @e40pud

Flaky alert assignment tests

d1d2f4c

e40pud added the release_note:skip, Team: SecuritySolution, and Team:Detection Engine labels Feb 14, 2024

e40pud self-assigned this Feb 14, 2024

e40pud requested review from yctercero and MadameSheema February 15, 2024 08:40

e40pud marked this pull request as ready for review February 15, 2024 08:41

e40pud requested review from a team as code owners February 15, 2024 08:41

Merge branch 'main' into security/tests/flaky-alert-assignments

432f4de

yctercero requested review from rylnd and removed request for yctercero February 17, 2024 06:09

Merge branch 'main' into security/tests/flaky-alert-assignments

0f6d112

rylnd reviewed Feb 20, 2024

Merge branch 'main' into security/tests/flaky-alert-assignments

a9f8b2f

Review feedback

093e8bb

e40pud requested a review from rylnd February 23, 2024 14:33

Merge branch 'main' into security/tests/flaky-alert-assignments

c098902

rylnd approved these changes Feb 23, 2024

Merge branch 'main' into security/tests/flaky-alert-assignments

915ea6f

Remove redundant cypress tests

161649f

MadameSheema reviewed Mar 5, 2024

MadameSheema approved these changes Mar 5, 2024

Merge branch 'main' into security/tests/flaky-alert-assignments

757591d

# Conflicts: # x-pack/test/security_solution_cypress/cypress/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments_serverless_complete.cy.ts

e40pud merged commit 31cd917 into elastic:main Mar 6, 2024
36 checks passed

kibanamachine added the v8.14.0 and backport:skip labels Mar 6, 2024
