Alert statuses #51099
Comments
I think those 4 are sufficient and having a small number is preferable.
Say for a CPU usage alert: if none of my metricbeat instances have sent data for 1 hour, and my alert is "when avg CPU is above 90% over the last 5 minutes", there'd be no documents in Elasticsearch and I would expect this to show a "No Data" state.
No Data sounds like:
Both are actually interesting, but we don't have a mechanism that I'm aware of to allow an alert type to return a "No Data" condition as in Peter's definition. I'd say get rid of No Data for now, or change it to something like "has not run yet" (Mike's definition). No Data sounds a bit confusing and vague to me. For the remaining statuses, how do we determine these values - the last state when the alert function ran? It either threw an error (Error), ran but scheduled no actions (OK), or ran and scheduled actions (Active). Just the last state seen? If so, perhaps storing that in the alert itself would be appropriate. Presumably things like muted and throttled show up in a separate column/icon/property indicating those states, so that kind of state isn't appropriate for this "status".
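As a rough sketch of that derivation (the helper and its input type are hypothetical, not part of the alerting framework), the last execution outcome could map to a status like this:

```ts
// Hypothetical helper: derive a display status from the last executor run.
// 'error'  -> the executor threw
// 'ok'     -> the executor ran but scheduled no actions
// 'active' -> the executor ran and scheduled actions
interface LastExecution {
  threwError: boolean;
  scheduledActionCount: number;
}

type AlertStatus = 'error' | 'ok' | 'active';

function deriveStatus(last: LastExecution): AlertStatus {
  if (last.threwError) return 'error';
  return last.scheduledActionCount > 0 ? 'active' : 'ok';
}
```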
If someone is authoring an alert, what do we expect them to do in the case where they don't have enough data to evaluate the condition? Throw an error? Return and treat it as normal? No data/missing data is a pretty common scenario and I think it's an important cue. Data often arrives late, and it's not really an error. If we don't treat it as a state here, we need to account for it somewhere. I understand if we don't have a mechanism for it, but we could create one. It could be an expected type of error, for example, thrown by an alert execution?
One option I can see for adding a mechanism to handle the "no data" scenario is to change the return structure of the alert type executor. Currently it returns something like this:
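A minimal sketch, assuming the executor returns the updated alert type state directly (the state field shown is illustrative):

```ts
// Minimal sketch: today the executor returns the updated alert type state directly.
// `lastCheckedAt` is an illustrative state field, not part of the framework.
type AlertTypeState = Record<string, unknown>;

async function executor(/* executor options */): Promise<AlertTypeState> {
  // ... query data, evaluate the condition, schedule actions ...
  return {
    lastCheckedAt: new Date().toISOString(),
  };
}
```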
and we could change it to something like this:
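A sketch of one possible wrapped shape (the `state` and `status` field names are assumptions, not the final API):

```ts
// Sketch: wrap the state so the executor can also report something like "no data".
type AlertTypeState = Record<string, unknown>;

interface ExecutorResult {
  state: AlertTypeState;          // the alert type state, as before
  status?: 'ok' | 'no-data';      // hypothetical extra attribute
}

async function executor(/* executor options */): Promise<ExecutorResult> {
  // ... query data ...
  const foundDocuments = false; // illustrative: no documents matched the query
  return {
    state: { lastCheckedAt: new Date().toISOString() },
    status: foundDocuments ? 'ok' : 'no-data',
  };
}
```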
Should be fairly straightforward to do, and more future-proof if we ever want to return more attributes than just the state. Other options instead of "no data" ...
The way I see it, yes, it would be based on the last execution / interval.
I think since we'll have a filter in the UI for statuses, it would make sense to store the status with the alert for searchability. After each execution, we would do an update on the alert document to update its status.
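A sketch of what that post-execution update could look like, assuming a saved-objects client and an `alert` saved-object type (the `status` attribute and helper function are hypothetical):

```ts
// Sketch: after each run, persist the computed status on the alert saved object
// so the UI can filter/search on it. The client is typed structurally here;
// the `status` attribute itself is hypothetical.
async function persistAlertStatus(
  savedObjectsClient: { update: (type: string, id: string, attrs: object) => Promise<unknown> },
  alertId: string,
  status: 'ok' | 'active' | 'error'
) {
  await savedObjectsClient.update('alert', alertId, { status });
}
```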
re: the "no data" status It sounds like this could just be treated as an action group, for alert types that are sensitive to this. Eg, if they didn't have enough data, they'd schedule the action group "no-data", and could have whatever actions they wanted associated with that. That would at least make that state "actionable", but wouldn't give us the ability to have it show up as a "status" value, without any kind of existing API changes, such as what Mike suggested. If we end up making this part of the API signature, and alert status, feels like maybe "not enough data" is probably a better phrasing for this vs no data. Maybe something in the vein "inconclusive" or such ... |
Yeah, that's what I was thinking. Hopefully we can piggy-back this on top of an existing update, like the scheduling of the next run. That also means we won't need the event log to determine that status ...
Posting this question here instead of Slack:
Perhaps just ...
Also, is the warning level a status?
I think the status would be ...
A disabled alert has no status - could it be blank? If we need a value to filter on, then I think ...
Repeating a comment from #58366 (comment): we should be able to filter alerts by their status if possible.
One thing not mentioned yet is "alert instance status". It seems like an alert instance can have most of the status values of the alert itself, except perhaps "error", since "error" indicates the alert executor ran into some problem. Note this specifically includes "no data", as some alert types may know the possible domain of their instances and be able to determine if an instance has not produced data. But not all alert types will be able to do this - index threshold, for instance, doesn't know the domain of the possible groupings it uses for its instance IDs.
Just happened to think, in a chat with Mike: we'll have the opportunity to "migrate" old alerts to contain data in this new field. I think we're only talking about the ... And it's not really important what's in the SO itself, but what we return from alertClient methods and http requests. So, do we want these to be optional? What a PITA that would be, when the only possible time they could be null is right after a migration, up until the alert function is executed for the first time after a migration. Thinking we can have another ... I don't think we will, looking at the current web UI. But that made me realize we probably want this new status field in the alerts table view:
…d object

resolves elastic#51099

This formalizes the concept of "alert status", in terms of its execution, with some new fields in the alert saved object and types used with the alert client and http APIs. These fields are read-only from the client point-of-view; they are provided in the alert structures, but are only updated by the alerting framework itself. The values will be updated after each run of the alert type executor.

The data is added to the alert as the `executionStatus` field, with the following shape:

```ts
interface AlertExecutionStatus {
  status: 'ok' | 'active' | 'error' | 'unknown';
  date: Date;
  error?: {
    reason: 'read' | 'decrypt' | 'execute' | 'unknown';
    message: string;
  };
}
```

interim commits:
- calculate the execution status, some refactoring
- write the execution status to the alert after execution
- use real date in execution status on create
- add an await to an async fn
- comment out status update to see if SIEM FT succeeds
- fix SIEM FT alert deletion issue
- use partial updates and retries in alerts clients to avoid conflicts
- fix jest tests
- clean up conflict-fixin code
- moar conflict-prevention fixing
- fix type error with find result
- add reasons to alert execution errors
- add some jest tests
- add some function tests
- fix status update to use alert namespace
- fix function test
- finish function tests
- more fixes after rebase
- fix type checks and jest tests after rebase
- add migration and find functional tests
- fix relative import
…d object (#75553)

resolves #51099

This formalizes the concept of "alert status", in terms of its execution, with some new fields in the alert saved object and types used with the alert client and http APIs. These fields are read-only from the client point-of-view; they are provided in the alert structures, but are only updated by the alerting framework itself. The values will be updated after each run of the alert type executor.

The data is added to the alert as the `executionStatus` field, with the following shape:

```ts
interface AlertExecutionStatus {
  status: 'ok' | 'active' | 'error' | 'pending' | 'unknown';
  lastExecutionDate: Date;
  error?: {
    reason: 'read' | 'decrypt' | 'execute' | 'unknown';
    message: string;
  };
}
```
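As a rough illustration of how a consumer of the HTTP API might read this field to drive the status column, assuming the alert response includes `executionStatus` as described above (the rendering helper is hypothetical):

```ts
// Sketch: map an alert's executionStatus to the text shown in the status column.
interface AlertExecutionStatus {
  status: 'ok' | 'active' | 'error' | 'pending' | 'unknown';
  lastExecutionDate: Date;
  error?: { reason: 'read' | 'decrypt' | 'execute' | 'unknown'; message: string };
}

function statusColumnText(executionStatus: AlertExecutionStatus): string {
  if (executionStatus.status === 'error' && executionStatus.error) {
    return `Error (${executionStatus.error.reason}): ${executionStatus.error.message}`;
  }
  return executionStatus.status;
}
```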
To enrich the user experience within the alerts table (under the Kibana Management section), we should display the status of each alert.
To make sure we're on the same page about which alert statuses we should have, I've opened this issue for discussion. The UI would display them as a column within the alerts table, and there would be a filter for the status. The statuses would be calculated on read, based on the results of a few queries (activity log, alert instances, etc.).
As a starting point, the mockups contain four potential statuses:
Is there any proposal for different statuses?
cc @elastic/kibana-stack-services @alexfrancoeur @peterschretlen