[RAM][Maintenance Window][8.8] Fix window maintenance workflow #156427
Conversation
}": has active maintenance windows ${alert.getMaintenanceWindowIds()}.` | ||
); | ||
continue; | ||
} |
Can you move this check into the generateExecutables method, please?
We create the executables array (action + alert, or summarizedAlerts) in there, then iterate through them and schedule here. We skip them here only for rate limiting, so we can log how many actions were created and how many of them have been skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this to just after for (const [alertId, alert] of alertsArray) { should be enough.
Note: AFAIU you want to send notifications for recovered alerts if the alert was created before the maintenance window. To keep that behavior, you may need to add an if (!this.isRecoveredAlert(action.group)) condition as well.
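For illustration, the suggestion above could look roughly like the sketch below. This is a hypothetical sketch only: the types, the actionsFor callback, and the shape of the returned executables are simplified stand-ins for the real ExecutionHandler internals, not the actual Kibana code.

```ts
// Hypothetical sketch: where the maintenance-window skip could live if it were
// moved into generateExecutables(), as suggested above. Names and types are
// simplified stand-ins for the real ExecutionHandler internals.
interface SketchAlert {
  getMaintenanceWindowIds(): string[];
}

interface Executable<TAlert, TAction> {
  alertId: string;
  alert: TAlert;
  action: TAction;
}

function generateExecutables<TAlert extends SketchAlert, TAction>(
  alertsArray: Array<[string, TAlert]>,
  actionsFor: (alert: TAlert) => TAction[]
): Array<Executable<TAlert, TAction>> {
  const executables: Array<Executable<TAlert, TAction>> = [];
  for (const [alertId, alert] of alertsArray) {
    // Skip alerts flagged with an active maintenance window.
    // A !this.isRecoveredAlert(action.group) guard could also be added around the
    // inner loop, per the note above, if recovered alerts should still notify.
    if (alert.getMaintenanceWindowIds().length > 0) {
      continue;
    }
    for (const action of actionsFor(alert)) {
      executables.push({ alertId, alert, action });
    }
  }
  return executables;
}
```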
Can you move this check into generateExecutables method please?
We decided to have the code here because we still want to count these actions as generated actions via ruleRunMetricsStore.incrementNumberOfGeneratedActionsByConnectorType(actionTypeId). We can move it if we don't care about this count. @XavierM, what do you think?
AFAIU you want to send notifications for recovered alerts if the alert was created before the maintenance window. In order to have that check, you may need to add if(!this.isRecoveredAlert(action.group)) condition as well.
We make this determination using the maintenance_window_ids field, so we don't need to check isRecoveredAlert. Any alert marked with a maintenance window id will never send notifications, even if it recovers after the window is no longer active. A new active alert that originated before a window but recovered during the window will still send a notification for the recovery.
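Putting the two replies above together, a minimal sketch of the behavior being described might look like the following. This is not the actual Kibana implementation: the metrics store and alert interfaces, the maybeScheduleAction wrapper, and the log/schedule callbacks are assumed shapes, and the log message is approximated from the diff fragment earlier in the thread. It only illustrates that the action is counted as generated before the maintenance-window skip, and that the skip is keyed purely off the alert's maintenance window ids.

```ts
// Simplified sketch, not the actual Kibana code.
interface SketchAlert {
  getId(): string;
  getMaintenanceWindowIds(): string[];
}

interface SketchMetricsStore {
  incrementNumberOfGeneratedActionsByConnectorType(actionTypeId: string): void;
}

function maybeScheduleAction(
  alert: SketchAlert,
  actionTypeId: string,
  ruleRunMetricsStore: SketchMetricsStore,
  log: (message: string) => void,
  schedule: (alertId: string) => void
): void {
  // Count the action as generated even if it ends up being skipped below,
  // so the rule run metrics still reflect it.
  ruleRunMetricsStore.incrementNumberOfGeneratedActionsByConnectorType(actionTypeId);

  // Suppress the notification: the alert is tagged with active maintenance windows.
  // (Message text approximated from the diff fragment earlier in the thread.)
  if (alert.getMaintenanceWindowIds().length > 0) {
    log(
      `"${alert.getId()}": has active maintenance windows ${alert.getMaintenanceWindowIds()}.`
    );
    return;
  }

  schedule(alert.getId());
}
```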
I like the way we did it here because it gives a nice story to our users in the logs. We talked about it internally and decided that, for now, we will follow @ersin-erdal's advice; we will add new metrics for snooze and maintenance window in the future, so we can show this indicator to our users.
Pinging @elastic/response-ops (Team:ResponseOps)
@elasticmachine merge upstream
```
@@ -49,6 +50,9 @@ const augmentAlerts = <T>({
  [ALERT_START]: currentTimeOverride ?? new Date(),
  [ALERT_LAST_DETECTED]: currentTimeOverride ?? new Date(),
  [VERSION]: kibanaVersion,
  ...(options?.maintenanceWindowIds?.length
```
Since these are persisted alerts, we do not need to update previous alert instances like we do for AAD. We just need to make sure they reflect the currently active maintenance windows.
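The hunk above truncates the spread expression, so as a hedged illustration only, the pattern it points at is a conditional spread of the maintenance window ids into the newly persisted alert document. The constant name, option shape, and helper below are assumptions for the sake of a self-contained example, not the real augmentAlerts code.

```ts
// Hedged illustration of the pattern in the hunk above: conditionally adding a
// maintenance-window-ids field to a newly persisted alert document.
// The field name, option shape, and helper are assumptions.
const ALERT_MAINTENANCE_WINDOW_IDS = 'kibana.alert.maintenance_window_ids';

interface AugmentOptions {
  maintenanceWindowIds?: string[];
}

function maintenanceWindowFields(options?: AugmentOptions): Record<string, string[]> {
  return options?.maintenanceWindowIds?.length
    ? { [ALERT_MAINTENANCE_WINDOW_IDS]: options.maintenanceWindowIds }
    : {};
}

// Usage: spread into the alert document alongside ALERT_START, VERSION, etc.
// const doc = { ...baseFields, ...maintenanceWindowFields(options) };
```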
Thanks for addressing the comments.
LGTM. Tested locally, works as expected.
```
@@ -520,6 +525,16 @@ export class ExecutionHandler<
        if (alert.isFilteredOut(summarizedAlerts)) {
          continue;
        }

        if (alert && alert.getMaintenanceWindowIds().length > 0) {
```
nit: alert && is not needed; alert is always there.
@elasticmachine merge upstream
Summary
The way we canceled every notification in the alert life cycle during an active maintenance window was not close enough to what our customers were expecting. For our persisted security solution alerts, we did not have to change the logic because each one is always a new alert. Therefore, @shanisagiv1, @mdefazio, @JiaweiWu, and @XavierM had a discussion about this problem and we decided the following:
To summarize, during a maintenance window we will only keep notifications for an alert that was created/became active outside of the maintenance window. We created three different scenarios to explain the new logic, assuming that the alert has an action per status change. To make the different scenarios easier to follow, I created the legend below:
Scenario I
If an alert is active/created before a maintenance window and recovers inside the maintenance window, then we will send notifications.
Scenario II
If an alert is active/created and recovers inside a maintenance window, then we will NOT send notifications.
Scenario III
If an alert is active/created inside a maintenance window and recovers outside of the maintenance window, then we will NOT send notifications.
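As a hedged illustration of the three scenarios above: the real implementation does not compare timestamps; it tags alerts with maintenance_window_ids when they become active inside a window and suppresses notifications for tagged alerts. The sketch below expresses the same outcome purely in terms of when the alert became active, with assumed types.

```ts
// Illustration only of the rule behind Scenarios I-III; not the actual implementation.
interface MaintenanceWindow {
  start: Date;
  end: Date;
}

function isInside(time: Date, window: MaintenanceWindow): boolean {
  const t = time.getTime();
  return t >= window.start.getTime() && t <= window.end.getTime();
}

// Scenario I:   active before the window, recovered inside it  -> notify.
// Scenario II:  active and recovered inside the window         -> do not notify.
// Scenario III: active inside the window, recovered after it   -> do not notify.
function shouldNotify(activeAt: Date, window: MaintenanceWindow): boolean {
  return !isInside(activeAt, window);
}
```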
Checklist
- Unit or functional tests were updated or added to match the most common scenarios