[Alerting] Log warning when rules are not rescheduled due to Saved Object not found error #101591

Merged

Conversation

ymao1
Contributor

@ymao1 ymao1 commented Jun 8, 2021

Resolves #101227

Summary

Logs a warning when a rule is not rescheduled because of a Saved Object not found error, with a suggestion to disable and re-enable the rule to restart execution.

@ymao1 ymao1 self-assigned this Jun 8, 2021
@ymao1 ymao1 added Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.14.0 v8.0.0 labels Jun 8, 2021
@ymao1 ymao1 marked this pull request as ready for review June 8, 2021 16:20
@ymao1 ymao1 requested a review from a team as a code owner June 8, 2021 16:20
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Contributor

@YulNaumenko YulNaumenko left a comment

LGTM

@ymao1
Contributor Author

ymao1 commented Jun 9, 2021

@elasticmachine merge upstream

@chrisronline chrisronline self-requested a review June 9, 2021 16:40
@@ -587,6 +587,9 @@ export class TaskRunner<
  ),
  schedule: resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => {
    if (isAlertSavedObjectNotFoundError(error, alertId)) {
      this.logger.warn(
Contributor

WDYT about adding telemetry regarding this (in a separate ticket)? Do we have any insight into how often this happens?

Contributor Author

Yeah, we can create a separate issue for this. I think we would have to query the event log index for rule executions that end in an error status. We would probably need a different field to aggregate on, though: the event log captures the Saved object not found message, but because that message contains the rule id, it is different for each rule. Currently it looks like the telemetry runs on the .kibana index.
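
To illustrate the aggregation problem described above, here is a minimal sketch of one possible workaround: replacing the rule id in each message with a placeholder so that messages from different rules fall into the same bucket. The `RuleErrorEvent` shape, both function names, and the sample message text are hypothetical; none of them come from the Kibana event log or telemetry code, and this is not the approach proposed in the follow-up issue.

```typescript
// Hypothetical shape of an event-log entry; the real event log schema is not
// shown in this thread.
interface RuleErrorEvent {
  ruleId: string;
  message: string;
}

// Replace every occurrence of the rule id with a fixed placeholder so that
// messages from different rules collapse to the same string.
function normalizeErrorMessage(message: string, ruleId: string): string {
  if (ruleId === "") {
    // Nothing to replace; splitting on an empty string would split every character.
    return message;
  }
  return message.split(ruleId).join("<rule-id>");
}

// Count events per normalized message.
function countByNormalizedMessage(events: RuleErrorEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    const key = normalizeErrorMessage(event.message, event.ruleId);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

With this kind of normalization, two rules that fail with the same error would be counted together rather than as two distinct messages.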

Contributor Author

@chrisronline I opened a generic issue for adding information from the event log to telemetry: #101809

Member

@pmuellr pmuellr left a comment

Added a comment about adding some more info to the message we log.

@@ -587,6 +587,9 @@ export class TaskRunner<
  ),
  schedule: resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => {
    if (isAlertSavedObjectNotFoundError(error, alertId)) {
      this.logger.warn(
        `Unable to execute rule "${alertId}" because ${error.message} - this rule will not be rescheduled. To restart rule execution, try disabling and re-enabling this rule.`
Member

Since this message is so actionable, I feel like we should include more info so we can make it as easy as possible for the user to find the alert to fix it. Probably means adding the Kibana space and rule name. Rule type would be interesting for diagnostic purposes / telemetry, but probably doesn't help a user that much - I would assume the rule name would provide most of the context to find the rule in Kibana ...

Contributor Author

@ymao1 ymao1 Jun 9, 2021

@pmuellr So, this might be a little tricky: in this context we only have access to the ruleId, and presumably, since we're seeing the Saved object not found error, we were unable to retrieve the saved object that would give us the spaceId and the ruleName.

ETA: My mistake, it looks like we do have access to the namespace.

Member

heh, right! bummer!

I think we have seen this message when we get transient network problems, in which case we might have gotten the alert SO at some point, then failed later when the alert ran. In which case, some piece of code had that info. How we'd keep track of that seems ... difficult. Ah well.

Contributor Author

Added the spaceId if defined to the message in this commit d314a54
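
For readers following along, a minimal sketch of what "adding the spaceId if defined" could look like. The helper name `buildNotRescheduledWarning` and the wording of the space fragment are assumptions for illustration; the exact text used in commit d314a54 is not shown in this thread. Only the base message is taken from the diff above.

```typescript
// Hypothetical helper, not taken from the Kibana source. It builds the warning
// text from the diff above and adds the space only when a space id is available.
function buildNotRescheduledWarning(
  ruleId: string,
  errorMessage: string,
  spaceId?: string
): string {
  // Assumed wording for the space fragment; empty when spaceId is undefined or "".
  const spaceInfo = spaceId ? ` in the "${spaceId}" space` : "";
  return (
    `Unable to execute rule "${ruleId}"${spaceInfo} because ${errorMessage}` +
    ` - this rule will not be rescheduled. To restart rule execution, try disabling and re-enabling this rule.`
  );
}
```

Keeping the space fragment optional means the message still reads correctly when no space id is available.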

@ymao1 ymao1 requested a review from pmuellr June 9, 2021 17:13
Contributor

@chrisronline chrisronline left a comment

LGTM!

Member

@pmuellr pmuellr left a comment

Cool - adding the space will help a bit anyway! Thx!

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 added the auto-backport Deprecated - use backport:version if exact versions are needed label Jun 9, 2021
@ymao1 ymao1 merged commit 2fd3c37 into elastic:master Jun 9, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 9, 2021
…ject not found error (elastic#101591)

* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule

* Adding space id to warning message

Co-authored-by: Kibana Machine
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Jun 9, 2021
…ject not found error (#101591) (#101827)

* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule

* Adding space id to warning message

Co-authored-by: Kibana Machine

Co-authored-by: ymao1
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jun 10, 2021
…add-agent-flyout

* 'master' of github.com:elastic/kibana: (35 commits)
  [Cases] Improve connectors mapping (elastic#101145)
  [ML] Fixes display of job group badges in recognizer wizard (elastic#101775)
  Fix es_archives path (elastic#101737)
  [kbnArchiver] convert archive names to root-relative paths (elastic#101839)
  [Reporting] Make "ScreenCapturePanel" shareable for Canvas (elastic#100623)
  [Alerting UI] Converted Rules and Connectors management pages to new layout. (elastic#101697)
  [Fleet] Support granular integrations in policy editor (elastic#101531)
  [Security Solution][Detections] Update detection alert mappings to ECS v1.10.0 (elastic#101680)
  [Fleet] Integrations UI: Adjust policies list UI (elastic#101600)
  chore(NA): moving @kbn/server-route-repository into bazel (elastic#101484)
  Support owner and description attributes inside the Manifest file, use in API docs (elastic#101786)
  [Security Solution] fix security empty overview links (elastic#101536)
  Unskips migration tests now that elastic search is fixed (elastic#101682)
  Fix endpoint -> integrations onboarding link (elastic#101804)
  [Alerting] Log warning when rules are not rescheduled due to Saved Object not found error (elastic#101591)
  Update datafeed_high_count_network_denies.json (elastic#101681)
  [Index patterns] Field editor example app (elastic#100524)
  [DOCS] Adding file upload to add data page (elastic#101674)
  [Security Solution][Endpoint] Adds Endpoint Host Isolation Status common component (elastic#101782)
  Upgrade ws v7.3.1->v7.4.2 and v6.2.1->v6.2.2 (elastic#101402)
  ...

# Conflicts:
#	x-pack/plugins/fleet/public/components/agent_enrollment_flyout/agent_policy_selection.tsx
#	x-pack/plugins/fleet/public/components/agent_enrollment_flyout/index.tsx
#	x-pack/plugins/fleet/public/components/agent_enrollment_flyout/managed_instructions.tsx
#	x-pack/plugins/fleet/public/components/agent_enrollment_flyout/standalone_instructions.tsx
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jun 10, 2021
…add-integrations-redirect

* 'master' of github.com:elastic/kibana: (44 commits)
  Allow navigating discover flyout via arrow keys (elastic#101772)
  [Cases] Improve connectors mapping (elastic#101145)
  [ML] Fixes display of job group badges in recognizer wizard (elastic#101775)
  Fix es_archives path (elastic#101737)
  [kbnArchiver] convert archive names to root-relative paths (elastic#101839)
  [Reporting] Make "ScreenCapturePanel" shareable for Canvas (elastic#100623)
  [Alerting UI] Converted Rules and Connectors management pages to new layout. (elastic#101697)
  [Fleet] Support granular integrations in policy editor (elastic#101531)
  [Security Solution][Detections] Update detection alert mappings to ECS v1.10.0 (elastic#101680)
  [Fleet] Integrations UI: Adjust policies list UI (elastic#101600)
  chore(NA): moving @kbn/server-route-repository into bazel (elastic#101484)
  Support owner and description attributes inside the Manifest file, use in API docs (elastic#101786)
  [Security Solution] fix security empty overview links (elastic#101536)
  Unskips migration tests now that elastic search is fixed (elastic#101682)
  Fix endpoint -> integrations onboarding link (elastic#101804)
  [Alerting] Log warning when rules are not rescheduled due to Saved Object not found error (elastic#101591)
  Update datafeed_high_count_network_denies.json (elastic#101681)
  [Index patterns] Field editor example app (elastic#100524)
  [DOCS] Adding file upload to add data page (elastic#101674)
  [Security Solution][Endpoint] Adds Endpoint Host Isolation Status common component (elastic#101782)
  ...

# Conflicts:
#	x-pack/plugins/fleet/public/applications/fleet/sections/agent_policy/create_package_policy_page/index.tsx
#	x-pack/plugins/fleet/public/applications/fleet/sections/agent_policy/details_page/components/package_policies/package_policies_table.tsx
semd pushed a commit that referenced this pull request Jun 10, 2021
…ject not found error (#101591)

* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule

* Adding space id to warning message

Co-authored-by: Kibana Machine
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.14.0 v8.0.0
Development

Successfully merging this pull request may close these issues.

[alerting] log warning when alert tasks are disabled due to saved object not found
6 participants