[Alerting] Log warning when rules are not rescheduled due to Saved Object not found error #101591
…ect not found and doesn't reschedule rule
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
LGTM
@elasticmachine merge upstream
@@ -587,6 +587,9 @@ export class TaskRunner<
),
schedule: resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => {
if (isAlertSavedObjectNotFoundError(error, alertId)) {
this.logger.warn(
WDYT about adding telemetry for this (in a separate ticket)? Do we have any insight into how often this happens?
Yeah, we can create a separate issue for this. I think we would have to query the event log index for rule executions that end in an error status. We might also need a different field to aggregate on: the event log captures the `Saved object not found` message, but since that message contains the rule id, it is different for each rule. Currently it looks like the telemetry runs on the `.kibana` index.
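The point about the rule id making each message unique can be made concrete. The following is a hypothetical, in-memory TypeScript sketch: the `RuleExecutionEvent` shape and the `normalizeMessage` and `countErrorsByMessage` helpers are invented for illustration and do not reflect the event log schema, and a real implementation would more likely be a query against the event log index. It strips each rule's own id out of its error message so that otherwise identical messages share one aggregation key.

```typescript
// Hypothetical, simplified event shape; the real event log documents differ.
interface RuleExecutionEvent {
  ruleId: string;
  status: 'ok' | 'error';
  message: string;
}

// Replace every occurrence of the rule's own id with a placeholder so that
// messages differing only by rule id share one aggregation key.
function normalizeMessage(event: RuleExecutionEvent): string {
  return event.message.split(event.ruleId).join('<ruleId>');
}

// Count failed executions per normalized message; successful runs are skipped.
function countErrorsByMessage(events: RuleExecutionEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    if (event.status !== 'error') {
      continue;
    }
    const key = normalizeMessage(event);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Usage with made-up sample events.
const sampleEvents: RuleExecutionEvent[] = [
  { ruleId: 'a1', status: 'error', message: 'Saved object [alert/a1] not found' },
  { ruleId: 'b2', status: 'error', message: 'Saved object [alert/b2] not found' },
  { ruleId: 'c3', status: 'ok', message: 'rule executed' },
];
console.log(countErrorsByMessage(sampleEvents));
```

A dedicated field, as suggested in the comment, would avoid rewriting strings altogether; the normalization here only shows why the raw message cannot be aggregated as-is.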
@chrisronline I opened a generic issue for adding information from the event log to telemetry: #101809
Added a comment about adding some more info to the message we log.
@@ -587,6 +587,9 @@ export class TaskRunner<
),
schedule: resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => {
if (isAlertSavedObjectNotFoundError(error, alertId)) {
this.logger.warn(
`Unable to execute rule "${alertId}" because ${error.message} - this rule will not be rescheduled. To restart rule execution, try disabling and re-enabling this rule.`
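The hunk above sends schedule errors through `resolveErr` and, for a missing rule saved object, logs the warning rather than rescheduling the rule. A minimal, self-contained sketch of that control flow follows, under stated assumptions: the `Result` type and this `resolveErr` are simplified re-implementations, `isRuleSavedObjectNotFoundError` is a hypothetical stand-in for `isAlertSavedObjectNotFoundError`, and both the convention that an `undefined` schedule means "do not reschedule" and the fallback to a caller-supplied retry interval for other errors are assumptions for illustration, not the Kibana implementation.

```typescript
// Simplified stand-ins for illustration only; not the actual Kibana types.
interface IntervalSchedule {
  interval: string;
}

type Result<T, E> = { tag: 'ok'; value: T } | { tag: 'err'; error: E };

interface Logger {
  warn(message: string): void;
}

// Returns the ok value, or maps the error to a value via the onErr callback.
function resolveErr<T, E>(result: Result<T, E>, onErr: (error: E) => T): T {
  return result.tag === 'ok' ? result.value : onErr(result.error);
}

// Hypothetical check: the error message says "not found" and names this rule's id.
function isRuleSavedObjectNotFoundError(error: Error, ruleId: string): boolean {
  return /not found/i.test(error.message) && error.message.includes(ruleId);
}

// Assumption for this sketch: an undefined schedule means "do not reschedule",
// while any other error falls back to a caller-supplied retry interval.
function resolveSchedule(
  schedule: Result<IntervalSchedule | undefined, Error>,
  ruleId: string,
  logger: Logger,
  retrySchedule: IntervalSchedule
): IntervalSchedule | undefined {
  return resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => {
    if (isRuleSavedObjectNotFoundError(error, ruleId)) {
      logger.warn(
        `Unable to execute rule "${ruleId}" because ${error.message} - this rule will not be rescheduled. ` +
          'To restart rule execution, try disabling and re-enabling this rule.'
      );
      return undefined;
    }
    return retrySchedule;
  });
}

// Usage: capture warnings in memory instead of writing to a real logger.
const warnings: string[] = [];
const memoryLogger: Logger = {
  warn: (message) => {
    warnings.push(message);
  },
};
const notFound = new Error('Saved object [alert/abc-123] not found');
const next = resolveSchedule({ tag: 'err', error: notFound }, 'abc-123', memoryLogger, { interval: '5m' });
console.log(next, warnings[0]);
```

Keeping the not-found branch inside the `onErr` callback means a successful schedule passes straight through, and only the specific "rule no longer exists" case stops rescheduling.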
Since this message is so actionable, I feel like we should include more info to make it as easy as possible for the user to find the alert and fix it. That probably means adding the Kibana space and rule name. Rule type would be interesting for diagnostic purposes / telemetry, but probably doesn't help a user much. I would assume the rule name provides most of the context needed to find the rule in Kibana...
@pmuellr So, this might be a little tricky, since in this context we only have access to the `ruleId`, and presumably, since we're seeing the `Saved object not found` error, we have been unable to retrieve the saved object that would give us the `spaceId` and the `ruleName`.
ETA: My mistake, it looks like we do have access to the `namespace`.
heh, right! bummer!
I think we have seen this message when we get transient network problems, in which case we might have gotten the alert SO at some point, then failed later when the alert ran. In that case, some piece of code had that info. How we'd keep track of that seems ... difficult. Ah well.
Added the `spaceId`, if defined, to the message in commit d314a54.
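The "space only if defined" behavior can be illustrated with a small helper. This is a hypothetical sketch: `buildRuleNotRescheduledWarning` is an invented name, and the space phrasing is illustrative rather than the exact text introduced in d314a54. It simply omits the space clause when no space id is available.

```typescript
// Hypothetical helper: adds the space to the warning only when a space id is
// available; the wording is illustrative, not the exact text from d314a54.
function buildRuleNotRescheduledWarning(
  ruleId: string,
  errorMessage: string,
  spaceId?: string
): string {
  // An undefined or empty space id leaves the space clause out entirely.
  const spacePart = spaceId ? ` in the "${spaceId}" space` : '';
  return (
    `Unable to execute rule "${ruleId}"${spacePart} because ${errorMessage} - ` +
    'this rule will not be rescheduled. To restart rule execution, try disabling and re-enabling this rule.'
  );
}

// Usage: one call with a space id, one without.
console.log(buildRuleNotRescheduledWarning('abc-123', 'Saved object [alert/abc-123] not found', 'marketing'));
console.log(buildRuleNotRescheduledWarning('abc-123', 'Saved object [alert/abc-123] not found'));
```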
LGTM!
Cool - adding the space will help a bit anyway! Thx!
💚 Build Succeeded
To update your PR or re-run it, just comment with:
cc @ymao1
…ject not found error (elastic#101591)

* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule
* Adding space id to warning message

Co-authored-by: Kibana Machine <[email protected]>
💚 Backport successful
This backport PR will be merged automatically after passing CI.
…ject not found error (#101591) (#101827)

* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule
* Adding space id to warning message

Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: ymao1 <[email protected]>
…add-agent-flyout

* 'master' of github.com:elastic/kibana: (35 commits)
  [Cases] Improve connectors mapping (elastic#101145)
  [ML] Fixes display of job group badges in recognizer wizard (elastic#101775)
  Fix es_archives path (elastic#101737)
  [kbnArchiver] convert archive names to root-relative paths (elastic#101839)
  [Reporting] Make "ScreenCapturePanel" shareable for Canvas (elastic#100623)
  [Alerting UI] Converted Rules and Connectors management pages to new layout. (elastic#101697)
  [Fleet] Support granular integrations in policy editor (elastic#101531)
  [Security Solution][Detections] Update detection alert mappings to ECS v1.10.0 (elastic#101680)
  [Fleet] Integrations UI: Adjust policies list UI (elastic#101600)
  chore(NA): moving @kbn/server-route-repository into bazel (elastic#101484)
  Support owner and description attributes inside the Manifest file, use in API docs (elastic#101786)
  [Security Solution] fix security empty overview links (elastic#101536)
  Unskips migration tests now that elastic search is fixed (elastic#101682)
  Fix endpoint -> integrations onboarding link (elastic#101804)
  [Alerting] Log warning when rules are not rescheduled due to Saved Object not found error (elastic#101591)
  Update datafeed_high_count_network_denies.json (elastic#101681)
  [Index patterns] Field editor example app (elastic#100524)
  [DOCS] Adding file upload to add data page (elastic#101674)
  [Security Solution][Endpoint] Adds Endpoint Host Isolation Status common component (elastic#101782)
  Upgrade ws v7.3.1->v7.4.2 and v6.2.1->v6.2.2 (elastic#101402)
  ...

# Conflicts:
# x-pack/plugins/fleet/public/components/agent_enrollment_flyout/agent_policy_selection.tsx
# x-pack/plugins/fleet/public/components/agent_enrollment_flyout/index.tsx
# x-pack/plugins/fleet/public/components/agent_enrollment_flyout/managed_instructions.tsx
# x-pack/plugins/fleet/public/components/agent_enrollment_flyout/standalone_instructions.tsx
…add-integrations-redirect

* 'master' of github.com:elastic/kibana: (44 commits)
  Allow navigating discover flyout via arrow keys (elastic#101772)
  [Cases] Improve connectors mapping (elastic#101145)
  [ML] Fixes display of job group badges in recognizer wizard (elastic#101775)
  Fix es_archives path (elastic#101737)
  [kbnArchiver] convert archive names to root-relative paths (elastic#101839)
  [Reporting] Make "ScreenCapturePanel" shareable for Canvas (elastic#100623)
  [Alerting UI] Converted Rules and Connectors management pages to new layout. (elastic#101697)
  [Fleet] Support granular integrations in policy editor (elastic#101531)
  [Security Solution][Detections] Update detection alert mappings to ECS v1.10.0 (elastic#101680)
  [Fleet] Integrations UI: Adjust policies list UI (elastic#101600)
  chore(NA): moving @kbn/server-route-repository into bazel (elastic#101484)
  Support owner and description attributes inside the Manifest file, use in API docs (elastic#101786)
  [Security Solution] fix security empty overview links (elastic#101536)
  Unskips migration tests now that elastic search is fixed (elastic#101682)
  Fix endpoint -> integrations onboarding link (elastic#101804)
  [Alerting] Log warning when rules are not rescheduled due to Saved Object not found error (elastic#101591)
  Update datafeed_high_count_network_denies.json (elastic#101681)
  [Index patterns] Field editor example app (elastic#100524)
  [DOCS] Adding file upload to add data page (elastic#101674)
  [Security Solution][Endpoint] Adds Endpoint Host Isolation Status common component (elastic#101782)
  ...

# Conflicts:
# x-pack/plugins/fleet/public/applications/fleet/sections/agent_policy/create_package_policy_page/index.tsx
# x-pack/plugins/fleet/public/applications/fleet/sections/agent_policy/details_page/components/package_policies/package_policies_table.tsx
…ject not found error (#101591)

* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule
* Adding space id to warning message

Co-authored-by: Kibana Machine <[email protected]>
Resolves #101227
Summary
Logs a warning, with a suggestion to disable and re-enable the rule to restart rule execution.