Alerts don't group their incidents by a dedupKey / Object ID #77772
Comments
The default seems very likely to be context dependent on the customer. It does seem like it would make sense that if a PD action was added to a "resolved" action group, and a dedupKey was provided (I believe it's required when using the resolve action), then a PD action used in an "active" action group should use the same context variable as the "resolved" one. You'd almost want a validation of that. But of course, "it depends". You could certainly come up with some use case where a customer would want something different. It feels to me like we'll need some special doc on this in the PD action, where we can explain these flows, and if the other incident management actions (ServiceNow, Resilient, Jira) have similar sorts of "capabilities", we'd need it there as well. Also, PD provides customization of some of this stuff on their end, regarding what happens to incidents posted with a dedupKey that have already been resolved: open it again, just append to the incident but leave it resolved, etc. It feels like setting the dedupKey to the narrowest possible value would be the "safest" thing, i.e., the alert instance. Otherwise, incidents posted using the alert ID, and then resolved by that same alert ID, are going to end up resolving things that perhaps shouldn't be resolved. Again, "it depends".
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
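The validation suggested in this comment could look roughly like the sketch below. It only checks that every resolve action carries a dedupKey that some trigger action on the same alert also uses. The `PagerDutyAction` shape and the `{{alertInstanceId}}` template string are assumptions made for illustration, not Kibana's actual types.

```typescript
// Hypothetical shape of a PagerDuty action attached to an alert.
// This is an illustrative assumption, not Kibana's real action type.
interface PagerDutyAction {
  group: "active" | "resolved"; // action group the action is attached to
  eventAction: "trigger" | "resolve";
  dedupKey?: string; // may be a template, e.g. "{{alertInstanceId}}"
}

// Returns a list of human-readable problems; an empty list means that
// every resolve action can be matched to a trigger action by dedupKey.
function findDedupKeyMismatches(actions: PagerDutyAction[]): string[] {
  const problems: string[] = [];
  const triggerKeys = new Set<string>();
  for (const action of actions) {
    if (action.eventAction === "trigger" && action.dedupKey) {
      triggerKeys.add(action.dedupKey);
    }
  }
  for (const action of actions) {
    if (action.eventAction !== "resolve") continue;
    if (!action.dedupKey) {
      problems.push("resolve action has no dedupKey");
    } else if (!triggerKeys.has(action.dedupKey)) {
      problems.push(`resolve dedupKey "${action.dedupKey}" matches no trigger action`);
    }
  }
  return problems;
}
```

Since "it depends", a check like this would probably be better surfaced as a warning than as a hard error.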
I'd phrase this differently and say that it's dependent on the context of the alert itself, and not so much on the customer (intended as the infrastructure being monitored). The relevance of this is that the context is built from the data being monitored. Unless this is what you meant, of course. :D
I started writing a bunch of text explaining a bit more about Riemann internals and how we do things but ended up writing something way more verbose and complex than what you are probably after here. So, here are a few examples of deduplication keys that we create, based on the event data. All the deduplication keys are computed at runtime. This is to say, we don't have any hardcoded deduplication keys as they all depend on the event's data and the monitoring logic.
All the above data exists in the event already, which is available in the context of the alert logic. This data would be indexed in the cluster if we were using Kibana Alert. The more data we can make available in the context of the alert, the better. Having access to the data in the records being queried would be key.
In our case, a new incident is always opened if there is no incident opened that has the same deduplication key.
If a default will be provided, then I think using the narrowest option is correct. However, I would go as far as saying that perhaps a default is not really needed and a deduplication key should be required. I think this would remove surprises for users. Regardless, it is important to allow for the deduplication key to be customized.
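The grouping rule described in this comment (a new incident is opened only when no open incident shares the deduplication key) can be sketched with a small in-memory model. The `IncidentBook` class and its return values are hypothetical and only illustrate the rule; they do not describe how PagerDuty or Riemann implement it.

```typescript
type EventAction = "trigger" | "resolve";
type Outcome = "opened" | "grouped" | "resolved" | "ignored";

// Hypothetical in-memory model of incidents keyed by deduplication key.
class IncidentBook {
  // Maps each open incident's dedup key to the number of events grouped into it.
  private open = new Map<string, number>();

  handle(action: EventAction, dedupKey: string): Outcome {
    const count = this.open.get(dedupKey);
    if (action === "trigger") {
      if (count === undefined) {
        // No open incident with this key: open a new one.
        this.open.set(dedupKey, 1);
        return "opened";
      }
      // An open incident already has this key: group the event into it.
      this.open.set(dedupKey, count + 1);
      return "grouped";
    }
    // Resolve only affects an incident that is open under the same key.
    if (count === undefined) return "ignored";
    this.open.delete(dedupKey);
    return "resolved";
  }

  openCount(): number {
    return this.open.size;
  }
}
```

With a key that is too broad (for example, one key per alert rather than per alert instance), a single resolve would close an incident that other instances had been grouped into.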
Narrowest option sounds right, if we have to choose a default. Is an |
++ on using the instance ID. We have a roadmap item to allow summarizing the instances in a single action call (#68828), which would allow creating a single incident encapsulating all the instances and let the end user choose what they want. Also, they could change the default dedupKey / Object ID for now and get similar behaviour.
There is no ObjectID field in Jira, ServiceNow or IBM Resilient. We have a service param for internal action execution needs called savedObjectId, but this field is not saved in the incident itself. We can choose some existing Jira, ServiceNow and IBM Resilient fields to store this info, but they will definitely be different fields in each service. These fields only make sense if we are going to support deduplication for incidents in these external services.
This is a follow up from #76908.
Each execution of the PagerDuty action is seen by the PagerDuty service as a unique incident.
To group incidents together a user can set the optional dedupKey parameter, but this means that the default behaviour is that multiple executions on the same alert will not be grouped.
This will become especially painful when the Action On Resolve issue is addressed, as a user could drag two PagerDuty actions on the same alert (one to open an incident and one to resolve) and by default they will not be grouped (meaning resolution won't actually resolve the incident in PagerDuty).
We should group these by default (but allow users to override the default value), probably based on either the Alert ID or the AlertInstance ID (I can see arguments for both).
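A minimal sketch of what such a default could look like, using the field names of the PagerDuty Events API v2 request body (`routing_key`, `event_action`, `dedup_key`, `payload`). The `AlertContext` shape, the helper names, and the choice of combining the alert ID with the alert instance ID are assumptions for illustration; the issue leaves the actual default open.

```typescript
// Hypothetical context available when an action executes.
interface AlertContext {
  alertId: string;
  alertInstanceId: string;
}

// Subset of the PagerDuty Events API v2 request body.
interface PagerDutyEvent {
  routing_key: string;
  event_action: "trigger" | "resolve";
  dedup_key: string;
  // Only trigger events need a payload.
  payload?: {
    summary: string;
    source: string;
    severity: "critical" | "error" | "warning" | "info";
  };
}

// Assumed default: the narrowest key, scoped to one instance of one alert.
function defaultDedupKey(ctx: AlertContext): string {
  return `${ctx.alertId}:${ctx.alertInstanceId}`;
}

function buildTriggerEvent(
  routingKey: string,
  ctx: AlertContext,
  summary: string,
  dedupKey?: string // user override; falls back to the default
): PagerDutyEvent {
  return {
    routing_key: routingKey,
    event_action: "trigger",
    dedup_key: dedupKey ?? defaultDedupKey(ctx),
    payload: { summary, source: ctx.alertInstanceId, severity: "error" },
  };
}

function buildResolveEvent(
  routingKey: string,
  ctx: AlertContext,
  dedupKey?: string
): PagerDutyEvent {
  return {
    routing_key: routingKey,
    event_action: "resolve",
    dedup_key: dedupKey ?? defaultDedupKey(ctx),
  };
}
```

Because both helpers fall back to the same default, a trigger action and a resolve action dragged onto the same alert would address the same incident without the user setting anything, while an explicit dedupKey still overrides it.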
EDIT: The same applies to Jira, ServiceNow and IBM Resilient with the Object ID field. This issue should fix all 4 integrations.