-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky alert assignment tests #176930
Flaky alert assignment tests #176930
Conversation
/ci |
Pinging @elastic/security-solution (Team: SecuritySolution) |
Pinging @elastic/security-detection-engine (Team:Detection Engine) |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a lot of thoughts here, sorry 😅 .
I think the only needed change here is removal of the redundant single-user tests; everything else is a step in the right direction.
I was looking for examples of where these tests failed, to validate the multi-user hypothesis and see if there wasn't some information in the error/failure. However, the original skip PR linked here only shows that ES threw a 503 during the test, and the flaky test runner seemingly only had timeouts and not any legitimate failures.
If there are particular error messages that we're basing this PR on, it would be great to call those out both in this PR and the "skipped test" issue, for posterity.
...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts
Show resolved
Hide resolved
@@ -77,42 +67,23 @@ describe('Alert user assignment - ESS & Serverless', { tags: ['@ess', '@serverle | |||
}); | |||
|
|||
it('alert with some assignees in alerts table', () => { | |||
const users = [ROLES.detections_admin, ROLES.t1_analyst]; | |||
const users = [getDefaultUserName()]; | |||
updateAssigneesForFirstAlert(users); | |||
alertsTableShowsAssigneesForAlert(users); | |||
}); | |||
|
|||
it(`alert with some assignees in alert's details flyout`, () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broader question outside the scope of this particular PR: why is this test not part of the one above it? The script appears to be:
- login
- create rule, wait for alerts
- assign alert to user
- make assertions about assignment
Other than violating the "one assertion per test" rule (which I don't believe is relevant to cypress), is there a reason for not consolidating these? I can imagine that having more tests seems like it would make the suite more robust, but given the amount of work that happens before the assertions (that then gets repeated in every independent test), I believe the opposite is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scrolling further it seems as though both of these tests are now redundant with Updating assignees (single alert) adding new assignees via 'More actions' in alerts table
; that one tests everything the two of these do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree! I will walk through tests and will consolidate where it is possible.
updateAssigneesForFirstAlert(users); | ||
alertsTableShowsAssigneesForAlert(users); | ||
}); | ||
|
||
it(`alert with some assignees in alert's details flyout`, () => { | ||
const users = [ROLES.detections_admin, ROLES.t1_analyst]; | ||
const users = [getDefaultUserName()]; | ||
updateAssigneesForFirstAlert(users); | ||
expandFirstAlert(); | ||
alertDetailsFlyoutShowsAssignees(users); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it would be nice to have some convention (maybe either function naming (assert
as a prefix), or folder location (/assertions
), or both) to identify these tasks as performing assertions.
I know that for a while @MadameSheema was requesting that we not abstract assertions into helpers at all, but I think if we do so we should make those assertions a bit more discoverable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MadameSheema any thought/preferences on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey!! the overall preference is to NOT abstract assertions.
...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts
Outdated
Show resolved
Hide resolved
...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts
Show resolved
Hide resolved
@elasticmachine merge upstream |
Thank you for the review @rylnd!!
I agree with redundancy of
Yes, the 503 error is what casing the issue. It happens within beforeEach block on deletion of indices and lists. We do exactly the same steps as in all other tests except in our case we do multiple login calls to activate multiple accounts. That's why this is the only reason I can think of that could cause that internal error. While I will be investigating that issue further, I would like to have at least some stable tests covering assignments functionality. |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes and helpful responses. LGTM, let's get these back online and hopefully avoid those 503s in the future 👍 .
...press/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments.cy.ts
Show resolved
Hide resolved
@elasticmachine merge upstream |
it('alert with some assignees in alerts table', () => { | ||
const users = [ROLES.detections_admin, ROLES.t1_analyst]; | ||
it('alert with some assignees in alerts table & details flyout', () => { | ||
const users = [getDefaultUserName()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Change all the users
constants to user
since we only have one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security Engineering Productivity changes LGTM!! Lots of thanks for addressing the flakiness! :)
NIT: Change all the users constants to user since we only have one.
Doubt: Is there any impact from the functional point of view on doing the testing with just one user instead of more?
Thanks!
@MadameSheema we discussed the idea of testing one user vs multiple here, and the potential loss of coverage, and it was argued that having:
would provide the same coverage as all the previous cypress tests, which much less cost/downside. Do you agree with that? |
# Conflicts: # x-pack/test/security_solution_cypress/cypress/e2e/detection_response/detection_engine/detection_alerts/assignments/assignments_serverless_complete.cy.ts
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]
History
To update your PR or re-run it, just comment with: cc @e40pud |
Summary
Addresses:
Fix flaky alert assignments tests. I split assignments tests into two groups: tests with one assignee available and tests with multiple assignees.
Right now there is a flakiness in tests with multiple assignees. Most probably it is happening because we do multiple login calls in a row to make sure we activate different users to make them available for assignments:
These tests are tend to be flaky and it is possible that kibana operations team will skip those. To make sure that we run basic cypress verification of alert assignments feature we decided to add tests with only one assignee available (current user) which allows us to avoid multiple consecutive login calls.
Also, as part of these changes I removed unnecessary logins and un-skipped #176529
NOTE
After discussing these failure with the team, we decided to remove tests which are covered by the integration and unit tests. While fixing the flakiness we realised that we do unnecessary work trying to fight the internal errors within elastic search on serverless when we do multiple user logins in a row. Instead we will rely on:
cc @yctercero
Checklist
Delete any items that are not applicable to this PR.