[Alerting] Smarter retry interval for ES Connectivity errors #123642
Pinging @elastic/response-ops (Team:ResponseOps)
    });

    const runnerResult = await taskRunner.run();
    expect(runnerResult.schedule!.interval).toEqual('10s');
Just a nitpick: maybe we can use mockedTaskInstance.schedule?.interval rather than a hardcoded string. It took me some time to figure out where this number comes from :)
Updated in this commit: 1e5e9f9
LGTM
…g/smarter-retry-interval
@elasticmachine merge upstream
💚 Build Succeeded
cc @ymao1
I can't imagine there's any way to build a functional test for this, is there? I wonder if anyone else in Kibana has functional tests that require ES connectivity errors? Even if we could, I'm thinking we could only reasonably test that intervals < 5m would run before the default 5m timeout; we don't want to wait 5m in a functional test to see if intervals > 5m would run again in 5m :-)
I double-checked the original core PR for adding the connectivity error type and there are no functional tests for it. I imagine it would be difficult to mimic connectivity errors in the tests.
LGTM
Resolves #122390
Summary
When the alerting task runner throws an error, we check for instances of the ES Unavailable error introduced in this PR and adjust the retry interval accordingly. We set the default connectivity retry to 5m. If the alerting rule schedule is less than 5 minutes, we use the alerting rule schedule; otherwise we set the retry to 5m.
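The retry rule described in the summary can be sketched roughly as follows. This is only an illustration, not the code from this PR: the names `CONNECTIVITY_RETRY_INTERVAL`, `intervalToMs`, and `connectivityRetryInterval` are hypothetical, and the sketch assumes schedules are simple strings such as `10s`, `5m`, or `1h`.

```typescript
// Hypothetical constant: the default retry used after an ES connectivity error.
const CONNECTIVITY_RETRY_INTERVAL = '5m';

// Hypothetical helper: converts an interval such as '10s' or '5m' to milliseconds.
// Assumes the format is <positive integer><unit>, with unit one of s, m, h, d.
function intervalToMs(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (match === null) {
    throw new Error(`Invalid interval: ${interval}`);
  }
  const value = Number(match[1]);
  switch (match[2]) {
    case 's':
      return value * 1000;
    case 'm':
      return value * 60 * 1000;
    case 'h':
      return value * 60 * 60 * 1000;
    default:
      // 'd' is the only unit left after the regex match
      return value * 24 * 60 * 60 * 1000;
  }
}

// Picks the retry interval after an ES connectivity error:
// the rule's own schedule if it is shorter than 5m, otherwise 5m.
function connectivityRetryInterval(ruleInterval: string): string {
  return intervalToMs(ruleInterval) < intervalToMs(CONNECTIVITY_RETRY_INTERVAL)
    ? ruleInterval
    : CONNECTIVITY_RETRY_INTERVAL;
}

console.log(connectivityRetryInterval('10s')); // prints "10s"
console.log(connectivityRetryInterval('1h')); // prints "5m"
```

Under these assumptions, a rule scheduled every `10s` retries on its own schedule (matching the `'10s'` expectation in the unit test discussed above), while a rule scheduled every `1h` retries after `5m` instead of waiting a full hour.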