[alerting] gracefully handle error in initialization of Alert TaskRunner #54335

Merged
merged 5 commits into from
Jan 13, 2020


gmmorris
Contributor

@gmmorris gmmorris commented Jan 9, 2020

Summary

Prevents edge cases where Alerts can end up in a zombie state. This happens when any of the following steps fails while the Alert TaskRunner is initializing:

  1. Decrypting attributes throws an error
  2. Fetching an Api Key throws an error
  3. Getting Services with user permissions throws an error

closes #54334
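
The three failure points listed above share one remedy: catch the error during initialization and still hand back a next run time, so the task is rescheduled rather than abandoned. Below is a minimal TypeScript sketch of that idea. All names (`InitSteps`, `AlertContext`, `TaskResult`, `runAlertTask`) are hypothetical and chosen for illustration; they are not taken from this PR's diff or from the Kibana source.

```typescript
// Hypothetical shapes for illustration only; not the Kibana types.
interface AlertContext {
  params: Record<string, unknown>;
  apiKey: string | null;
  services: { name: string };
}

interface TaskResult {
  state: Record<string, unknown>;
  runAt: Date;
}

// The three initialization steps named in the summary, injected so that
// each one can fail independently.
interface InitSteps {
  decryptAttributes(): Promise<{ params: Record<string, unknown> }>;
  fetchApiKey(): Promise<string | null>;
  getServices(apiKey: string | null): Promise<{ name: string }>;
}

// Runs the steps in order; any thrown error rejects the returned promise.
async function initialize(steps: InitSteps): Promise<AlertContext> {
  const { params } = await steps.decryptAttributes();
  const apiKey = await steps.fetchApiKey();
  const services = await steps.getServices(apiKey);
  return { params, apiKey, services };
}

async function runAlertTask(
  steps: InitSteps,
  previousState: Record<string, unknown>,
  intervalMs: number,
  execute: (ctx: AlertContext) => Promise<Record<string, unknown>>
): Promise<TaskResult> {
  const runAt = new Date(Date.now() + intervalMs);
  // A failed initialization resolves to null instead of propagating.
  const ctx = await initialize(steps).catch(() => null);
  if (ctx === null) {
    // Skip this run, keep the previous state, and still schedule the next run.
    return { state: previousState, runAt };
  }
  return { state: await execute(ctx), runAt };
}
```

In this sketch the error is swallowed silently, which keeps the task alive but leaves the failure invisible to the user.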


@gmmorris gmmorris added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.6.0 v8.0.0 labels Jan 9, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris gmmorris added the release_note:skip Skip the PR/issue when compiling release notes label Jan 9, 2020
Member

@pmuellr pmuellr left a comment

LGTM, except for the descriptions of the test cases. They seem either wrong, or it's not clear why the mocked errors are testing what they say they are testing.

I have a feeling this change is going to cause some further issues when merged. Before this, I guess tasks became zombies in some cases; now they won't, but they will likely still fail: if it couldn't decrypt encrypted attributes before, it's not going to be able to now either. Relatedly, we don't seem to report these errors anywhere. So it seems like there are still cases where they are going to be zombie-ish (the task won't run successfully), only instead of not running it anymore, we'll keep running it.

If that's the case, I'm still fine with this, but let's open an issue to discuss. We may want to just rely on the event log (in the future) to be able to have folks notice this, which would give us a place to "log" the error.
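
To make the concern concrete, here is a small TypeScript sketch of what "a place to log the error" could look like. Everything in it is an assumption for discussion: `AlertErrorEvent`, `InMemoryEventLog`, and `tryStep` are hypothetical names, and the in-memory array merely stands in for whatever sink (logger or a future event log) is eventually chosen.

```typescript
// Hypothetical event shape; neither this PR nor Kibana defines it.
interface AlertErrorEvent {
  alertId: string;
  step: string;
  message: string;
  timestamp: Date;
}

// In-memory stand-in for a real logger or event log.
class InMemoryEventLog {
  readonly events: AlertErrorEvent[] = [];

  record(event: AlertErrorEvent): void {
    this.events.push(event);
  }
}

// Runs one named initialization step. On failure, records the error and
// returns undefined so the caller can skip this run and reschedule.
function tryStep<T>(
  log: InMemoryEventLog,
  alertId: string,
  step: string,
  fn: () => T
): T | undefined {
  try {
    return fn();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    log.record({ alertId, step, message, timestamp: new Date() });
    return undefined;
  }
}
```

With something along these lines, a repeatedly failing alert would leave one recorded event per failed run, which gives users a way to notice it even though the task keeps being rescheduled.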

* master:
  Remove eslint overwrite for src/legacy/core_plugins/kibana (elastic#54222)
* master: (69 commits)
  [Graph] Fix various a11y issues (elastic#54097)
  Add ApplicationService app status management (elastic#50223)
  logs in one time (elastic#54447)
  Deprecate using `elasticsearch.ssl.certificate` without `elasticsearch.ssl.key` and vice versa (elastic#54392)
  [Optimizer] Fix a stack overflow with watch_cache when it attempts to delete very large folders. (elastic#54457)
  Security - Role Mappings UI (elastic#53620)
  [SIEM] [Detection engine] Permission II (elastic#54292)
  Allow User to Cleanup Repository from UI  (elastic#53047)
  [Detection engine] Some UX for rule creation (elastic#54471)
  share specific instances of some ui packages (elastic#54079)
  [ML] APM modules configs for RUM Javascript and NodeJS (elastic#53792)
  [APM] Delay rendering invalid license notification (elastic#53924)
  [Graph] Improve error message on graph requests (elastic#54230)
  [ILM] Kibana should allow a min_age setting of 0ms in ILM policy phases (elastic#53719)
  Unit Tests for common/lib (elastic#53736)
  [Graph] Only show explorable fields (elastic#54101)
  remove linting rule exception for markdown (elastic#54232)
  [Monitoring] Fetch shard data more efficiently (elastic#54028)
  [Maps] Add hiddenLayers option to embeddable map input (elastic#54355)
  Pass termOrder and hasTermsAgg properties to serializeThresholdWatch function (elastic#54391)
  ...
@kibanamachine
Contributor

💚 Build Succeeded


To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@gmmorris gmmorris merged commit e8b2b28 into elastic:master Jan 13, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jan 13, 2020
…ner (elastic#54335)
gmmorris added a commit that referenced this pull request Jan 13, 2020
…askRunner (#54335) (#54603)
jkelastic pushed a commit to jkelastic/kibana that referenced this pull request Jan 17, 2020
…ner (elastic#54335)
Labels
backported Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.6.0 v8.0.0
Development

Successfully merging this pull request may close these issues.

Prevent Zombie Alerts when encountering issues decrypting attributes
4 participants