UX: Resolving PD/SN issues on alert recovery - Customer feedback #89166
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
I seem to remember discussing this at some point, as a generic toggle for every alert: "also run the action when the alert instance recovers?" Kinda funny, but to some extent, the lack of context variables available for the recovered action group means there's less of a need for customers to customize the action variables. So does that mean we could construct a valid set of parameters for the recovered action group for every action, based on the parameters of a non-recovered action group action? For server log and Slack, we can use a message string we construct ourselves. For email, we could copy the to/cc/bcc and maybe the subject (to help with message threading in an email client?) and use a message string we construct ourselves. It's hard to imagine how the index and webhook actions could work. The issue tracking ones seem possible, but would need their own "rules" for how to compute the values. Such a thing seems possible, but it would require changing the shape of the actions we store in the alerts to indicate that they also handle recovery. We can't just add normal recovered action group actions, because we would need to be able to tell an actual recovered action group action apart from one of these "automatic" ones.
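To make the idea above concrete, here is a minimal sketch of how an "automatic" recovered action might be derived from a non-recovered one. Everything in it is an assumption for illustration: the `AlertAction` shape, the `automatic` flag, the `"recovered"` group name, the action type id strings, and the helper names are hypothetical and are not the actual Kibana alerting types.

```typescript
// Hypothetical shapes for illustration only; not the real Kibana types.
type ActionTypeId = ".server-log" | ".slack" | ".email" | ".index" | ".webhook";

interface AlertAction {
  actionTypeId: ActionTypeId;
  group: string; // action group this action runs for
  params: Record<string, unknown>;
  automatic?: boolean; // assumed marker: derived by the framework, not user-defined
}

const RECOVERED_GROUP = "recovered"; // assumed group name

// Message the framework constructs itself, since the recovered group
// has few context variables to offer.
function recoveryMessage(alertName: string, instanceId: string): string {
  return `Alert "${alertName}" instance "${instanceId}" has recovered`;
}

// Returns a derived recovered action, or undefined when no sensible
// parameters can be computed for this action type (index, webhook).
function toRecoveredAction(
  action: AlertAction,
  alertName: string,
  instanceId: string
): AlertAction | undefined {
  const message = recoveryMessage(alertName, instanceId);
  let params: Record<string, unknown>;

  switch (action.actionTypeId) {
    case ".server-log":
    case ".slack":
      params = { message };
      break;
    case ".email": {
      // Copy recipients and subject so mail clients can thread the messages.
      const { to, cc, bcc, subject } = action.params;
      params = { to, cc, bcc, subject, message };
      break;
    }
    default:
      return undefined;
  }

  return {
    actionTypeId: action.actionTypeId,
    group: RECOVERED_GROUP,
    params,
    automatic: true,
  };
}
```

The `automatic` flag is what would let the framework distinguish a derived action from one the user explicitly attached to the recovered action group. Issue tracking connectors are left out here because they would need their own rules.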
We have a similar issue opened from a Maps team request, #84174, where the UX is poor because it requires the user to create the action first and then decide when it should run.
I think the Maps-related issue #84174, referenced above, is slightly different: it seems to indicate that, if the alert type has more than one non-recovered action group, we should prompt for which action group to add actions to, instead of adding an action and then picking the action group. I think the goal of this one is that once a user gets into the actual action (either picking an action for an alert with one non-recovered action group, or picking the action group first and then picking the action), somewhere in that action form we'd allow some kind of toggle to "also use this action for the recovered action group" (if the action type supports that). There may be some other interaction there that I'm missing ...
Yeah, something like that. From the feedback I recall, the solution the customer expected was a toggle/checkbox to auto-resolve the PagerDuty incident. Otherwise, they said, it's too easy to forget the extra configuration, and they would have to explain to each user how to set it up.
Update: We have received consistent feedback from two customers around the alert recovery notification UX. The feedback was received in the context of PagerDuty but we may want to consider this more broadly. Both customers requested that they should not have to add a separate action group "run when the alert recovers". According to their feedback, the current experience is prone to errors and it should be simpler. In addition they offered the following suggestions:
There is now a related issue: #91583. The same issue addresses the "Do we need the summary field to be required when resolving an incident on recovery?" question. Apparently, we do not need it at all. cc @YulNaumenko
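As a rough sketch of what "summary not required on resolve" could look like in validation: the parameter shape and function names below are hypothetical, loosely modeled on a PagerDuty-style event with an event action, a dedup key, and a summary. The assumption, in line with the comment above, is that a summary is only needed to trigger an incident, while resolving needs the dedup key that identifies it.

```typescript
// Hypothetical parameter shape; not the actual connector schema.
type EventAction = "trigger" | "acknowledge" | "resolve";

interface PagerDutyParams {
  eventAction: EventAction;
  dedupKey?: string;
  summary?: string;
}

// Returns a list of validation errors; an empty list means the params are usable.
function validatePagerDutyParams(params: PagerDutyParams): string[] {
  const errors: string[] = [];
  if (params.eventAction === "trigger") {
    // Assumption: only trigger events need a human-readable summary.
    if (!params.summary) {
      errors.push("summary is required to trigger an incident");
    }
  } else if (!params.dedupKey) {
    // Acknowledge and resolve must identify the existing incident.
    errors.push(`dedupKey is required to ${params.eventAction} an incident`);
  }
  return errors;
}

// Derives the resolve params an "auto-resolve" checkbox could generate
// from the trigger params the user already filled in.
function toResolveParams(trigger: PagerDutyParams): PagerDutyParams {
  const resolved: PagerDutyParams = { eventAction: "resolve" };
  if (trigger.dedupKey !== undefined) {
    resolved.dedupKey = trigger.dedupKey;
  }
  return resolved;
}
```

With this shape, a checkbox on the trigger action would be enough to produce a valid resolve action, provided the trigger action sets a dedup key.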
I think the checkbox (toggle/checkbox to say auto-resolve PagerDuty incident) makes sense. I will try and mock this up today/tomorrow.
Here's a mockup (2nd version after realizing I posted to the wrong issue). I'm also reformatting the form a bit to hide the optional fields. It's possible that we need a question mark tooltip next to the button with some more explanation.
(just for the record and visibility: some of the questions and discussion that led to this path)
@arisonl Is this still an issue for PagerDuty? If this is just an issue for the SN connector, we will transfer to the Cases team.
@ymao1 I think that this is customer feedback that should inform future iterations and UX decisions. Is the action UX on the flyout a Cases team responsibility?
@arisonl, No, the general actions UX is still us :) Thanks for looking. We can keep this open.
Context: the new feature of resolving issues on PagerDuty:
Customers would like to be able to set this behaviour as the default one, without having to create a separate action group "run when".
cc @mdefazio @mikecote