-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Alerting] Initial implementation of alerting task cancel()
#114289
Conversation
…ution is cancelled
…ing/cancel-alerting-tasks
…ing/cancel-alerting-tasks
cancel()
@elasticmachine merge upstream |
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
@@ -0,0 +1,703 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a new file for the cancel
tests bc task_runner.test.ts
is enormous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kibana-docker LGTM
I'm not seeing the execution status updating properly:
but I also see the
-->
Maybe I missed a step in the setup? I'm going to dive in a little more but wanted to give you an early heads up |
I added a
|
@chrisronline Thanks! I'll take a look. First thought is maybe a race condition when the rule timeout is the same as the schedule interval 🤔 |
…ing/cancel-alerting-tasks
…ing/cancel-alerting-tasks
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; left a bunch of comments, mainly questions/nits - I think docs is the only thing that we need to have, out of all the comments.
@@ -199,6 +199,7 @@ kibana_vars=( | |||
xpack.alerting.invalidateApiKeysTask.interval | |||
xpack.alerting.invalidateApiKeysTask.removalDelay | |||
xpack.alerting.defaultRuleTaskTimeout | |||
xpack.alerting.cancelAlertsOnRuleTimeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be adding docs for this config as well, right? Presumably we'll create a PR to cloud-enable this as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a cloud PR for this config when this PR gets merged but I didn't think user docs make sense for this config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think user docs make sense for this config?
Why not?
I think the one-liner that you added makes sense and is good enough :-)
x-pack/plugins/task_manager/server/task_running/ephemeral_task_runner.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/task_manager/server/task_running/ephemeral_task_runner.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far! I had a couple of questions to start
…g/cancel-alerting-tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes LGTM; thanks!
@@ -199,6 +199,7 @@ kibana_vars=( | |||
xpack.alerting.invalidateApiKeysTask.interval | |||
xpack.alerting.invalidateApiKeysTask.removalDelay | |||
xpack.alerting.defaultRuleTaskTimeout | |||
xpack.alerting.cancelAlertsOnRuleTimeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think user docs make sense for this config?
Why not?
I think the one-liner that you added makes sense and is good enough :-)
…g/cancel-alerting-tasks
💚 Build Succeeded
Metrics [docs]Public APIs missing comments
Async chunks
Page load bundle
History
To update your PR or re-run it, just comment with: cc @ymao1 |
…igrate-away-from-injected-css-js * 'main' of github.com:elastic/kibana: (221 commits) [Reporting] Add log level to config (elastic#118319) [Metrics UI] Skip failing waffle chart color palette test (elastic#118527) [APM] chore: Unify naming of 'apm/scripts/**/*' with snake_case (elastic#118328) skip flaky suites (elastic#99581, elastic#118356, elastic#118474) [Alerting] Initial implementation of alerting task `cancel()` (elastic#114289) chore(NA): creates pkg_npm_types bazel rule (elastic#116465) skip flaky suite (elastic#116892) Bump chromedriver to 95.0.0 (elastic#116724) [Data visualizer] Improve design of expanded rows (elastic#118125) [APM] prefer ECS field names for HTTP and URL (elastic#118485) Update query_debugging_in_development_and_production.md (elastic#118491) [Uptime] adjust Elastic Synthetics integration functional tests (elastic#118163) [kbn/rule-data-utils] add submodules and require public use them (elastic#117963) [DOCS] Refresh APM correlation screenshots (elastic#116723) (elastic#118577) Handles ns to ms conversion for event loop delay metrics (elastic#118447) [Cases] Rename functional tests folder (elastic#118490) dummy commit skip flaky suite (elastic#118593) Improve workpad schema validation (elastic#97838) skip flaky suite (elastic#118584) ... # Conflicts: # src/plugins/dashboard/public/application/embeddable/viewport/dashboard_viewport.tsx
…c#114289) * Added cancel() to alerting task runner and writing event log document * Updating rule saved object with timeout execution status * Skip scheduling actions and logging event log for alerts if rule execution is cancelled * Adding config for disabling skipping actions * Fixing types * Adding flag for rule types to opt out of skipping acitons * Using task runner uuid to differentiate between task instances * Adding functional test * Default to timestamp when startedAt is not available * Reverting previous change and updating task pool filter instead * Fixing functional test * Adding debug logging * Fixing unit tests * Fixing unit tests * Adding rule name to event log doc and rule type timeout to log messages * Simplifying register logic and adding check to see if already cancelled * Updating task uuid based on PR comments * Removing observable * Fixing functional test * Adding to docs Co-authored-by: Kibana Machine <[email protected]>
* Added cancel() to alerting task runner and writing event log document * Updating rule saved object with timeout execution status * Skip scheduling actions and logging event log for alerts if rule execution is cancelled * Adding config for disabling skipping actions * Fixing types * Adding flag for rule types to opt out of skipping acitons * Using task runner uuid to differentiate between task instances * Adding functional test * Default to timestamp when startedAt is not available * Reverting previous change and updating task pool filter instead * Fixing functional test * Adding debug logging * Fixing unit tests * Fixing unit tests * Adding rule name to event log doc and rule type timeout to log messages * Simplifying register logic and adding check to see if already cancelled * Updating task uuid based on PR comments * Removing observable * Fixing functional test * Adding to docs Co-authored-by: Kibana Machine <[email protected]>
Resolves #113459
Summary
With this PR, the alerting task runner implements a
cancel()
function (currently only called by the task manager when an alerting task exceeds its timeout value).execute-timeout
status:error
,reason:timeout
.cancelAlertsOnRuleTimeout
to control action skipping behavior at framework level, defaults totrue
cancelAlertsOnRuleTimeout
to control action skipping behavior at rule type level. if not set, defaults to whatever the value of the alerting configcancelAlertsOnRuleTimeout
iscancelAlertsOnRuleTimeout
inkibana.yml
is true AND the rule typecancelAlertsOnRuleTimeout
is trueTo Verify
x-pack/plugins/stack_alerts/server/alert_types/es_query/alert_type.ts
:This sets the
ruleTaskTimeout
to1m
and adds a 2 minute delay to the executorerror
status withreason: timeout
:Rule SO execution status
execute-timeout
:Event log doc
Verify you receive no action
Now add
xpack.alerting.cancelAlertsOnRuleTimeout: false
to your kibana config and repeat steps 1-7. Verify that all the timeout docs/statuses get written as expected, but you also receive your action at the end of rule execution (after timeout).Now remove
xpack.alerting.cancelAlertsOnRuleTimeout: false
from your kibana config and addcancelAlertsOnRuleTimeout: false
to the Elasticsearch rule type definition inx-pack/plugins/stack_alerts/server/alert_types/es_query/alert_type.ts
. Repeat steps 1-7 and verify that all the timeout docs/statuses get written as expected, but you also receive your action at the end of rule execution (after timeout).Checklist
Delete any items that are not applicable to this PR.