[Alerting] Allow rule types to specify custom timeout values #111804

Closed
ymao1 opened this issue Sep 9, 2021 · 4 comments · Fixed by #113487
Labels
estimate:small Small Estimated Level of Effort Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments


ymao1 commented Sep 9, 2021

The task manager has a default timeout value of 5m for all tasks. This can be overridden in the task definition, but alerting currently does not take advantage of this ability. This issue is to allow rule type producers to specify a custom timeout value when registering a rule type and for the alerting framework to use these custom timeouts when registering the task.
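A minimal sketch of the proposal, for illustration only. The `RuleTypeSketch` and `TaskDefinitionSketch` shapes and the `timeout` field name are hypothetical simplifications, not the alerting framework's or task manager's actual interfaces; the only value taken from this issue is the 5m task manager default.

```typescript
// Hypothetical, simplified shapes used only for this sketch.
interface RuleTypeSketch {
  id: string;
  name: string;
  // Optional custom timeout as a duration string such as '10m' (assumed field name).
  timeout?: string;
}

interface TaskDefinitionSketch {
  title: string;
  timeout: string;
}

// Task manager default mentioned in this issue.
const TASK_MANAGER_DEFAULT_TIMEOUT = '5m';

// Build the task definition for a rule type, passing the custom timeout
// through when the rule type producer supplied one.
function toTaskDefinition(ruleType: RuleTypeSketch): TaskDefinitionSketch {
  return {
    title: ruleType.name,
    timeout: ruleType.timeout ?? TASK_MANAGER_DEFAULT_TIMEOUT,
  };
}
```

With this shape, a rule type that sets no timeout keeps today's behavior, and only producers that opt in get a longer (or shorter) limit.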

We can also think about whether we want to set a framework level default timeout value for all rule types that is longer than the task manager default of 5m. Perhaps this should be configurable as well.
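If a configurable framework-level default were added, the precedence could look roughly like the sketch below. The `TimeoutSources` shape and both field names are assumptions made for illustration; no such configuration setting is defined in this issue.

```typescript
// Hypothetical inputs: a per-rule-type value and a framework-wide
// default that an administrator could configure.
interface TimeoutSources {
  ruleTypeTimeout?: string;
  frameworkDefaultTimeout?: string;
}

// Task manager default mentioned in this issue.
const TASK_MANAGER_FALLBACK = '5m';

// Most specific value wins: rule type, then framework default,
// then the task manager default.
function resolveRuleTimeout(sources: TimeoutSources): string {
  return (
    sources.ruleTypeTimeout ??
    sources.frameworkDefaultTimeout ??
    TASK_MANAGER_FALLBACK
  );
}
```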

@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris
Contributor

@jasonrhodes / @oatkiller / @chrisronline / @devin / @madirey: you are prolific rule type implementers, so I'm curious to hear your thoughts on this idea.

If you had the option of setting a custom timeout for your rule types (or even individual rules) would you use this ability?
What would you base this custom timeout on?

@MikePaquette

Thanks @ymao1 for pointing out this ticket!
@gmmorris

> If you had the option of setting a custom timeout for your rule types (or even individual rules) would you use this ability?

Yes. As users adopt frozen tier data, it is reasonable to expect that they'll have a few rules that query that tier, for example to match new threat indicators against archived events. They might be OK with such a rule taking longer to run, say 15 minutes, so we would not want it to be cancelled or to fail after 5 minutes.

Today, we have one security rule_type (Indicator Match) that is more resource-intensive: it issues multiple queries as part of its execution and thus routinely takes much longer to execute than other rule_types, so a custom timeout for the rule_type would be useful. In addition, per-rule timeout values would be useful, so that "regular" custom query rules or EQL rules could be pointed at searchable snapshot data and run successfully even if their query takes longer than 5 minutes.

> What would you base this custom timeout on?

From the user perspective, we don't want to burden the rule author with details about the underlying alerting framework or the task manager. Could the timeout take the configured rule interval into consideration? For example, it could be 5 minutes or the configured rule interval, whichever is greater.
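The "whichever is greater" idea could be sketched as follows. This is an illustration under stated assumptions, not the framework's implementation: `durationToMs` and `effectiveTimeoutMs` are hypothetical helpers, and the parser accepts only a whole number followed by `s`, `m`, `h`, or `d`.

```typescript
// Milliseconds per supported unit (assumed set of units for this sketch).
const UNIT_MS: Record<string, number> = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Convert a duration string such as '5m' or '1h' to milliseconds.
function durationToMs(duration: string): number {
  const match = /^(\d+)([smhd])$/.exec(duration);
  if (!match) {
    throw new Error(`Invalid duration: ${duration}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Use the rule interval as the timeout when it exceeds the minimum,
// otherwise fall back to the minimum (5m by default).
function effectiveTimeoutMs(ruleInterval: string, minimumTimeout = '5m'): number {
  return Math.max(durationToMs(ruleInterval), durationToMs(minimumTimeout));
}
```

Under this sketch a rule scheduled every 1m would still get a 5-minute timeout, while a rule scheduled every 15m would get 15 minutes, without the rule author having to set anything.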

cc: @oatkiller @jethr0null


ymao1 commented Oct 4, 2021

@MikePaquette I opened an issue to investigate allowing individual rule instances to specify their own timeout values: #113823


6 participants