[ResponseOps][Actions] Improve Task Manager’s retry logic for ad-hoc tasks #143860
Pinging @elastic/response-ops (Team:ResponseOps)
LGTM. Verified that I see actions retry after 30 seconds and then in 5-minute increments.
@@ -338,9 +338,9 @@ export default function ({ getService }: FtrProviderContext) {

       await retry.try(async () => {
         const scheduledTask = await currentTask(task.id);
-        expect(scheduledTask.attempts).to.be.greaterThan(0);
+        expect(scheduledTask.attempts).to.be.greaterThan(1);
         expect(Date.parse(scheduledTask.runAt)).to.be.greaterThan(
Can we run this through the flaky test runner to make sure it's not flaky? I think we've had issues with these types of date-based tests before.
Yes, I can do that!
Here are the flaky test runner results: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/1489
💚 Build Succeeded
LGTM
Resolves #143048
Summary
Updated the retry logic in Task Manager. If the first attempt fails, we retry 30 seconds later. If the second attempt fails, each later retry is delayed by a multiple of 5 minutes from the previous run, and that multiple doubles with every further failure. The schedule looks like this:
Attempt 1: now
Attempt 2: 30s after the first attempt
Attempt 3: 5m after the second attempt
Attempt 4: 10m after the third attempt
Attempt 5: 20m after the fourth attempt
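The schedule above can be sketched as a small delay function. This is only an illustration under stated assumptions, not the code from this PR: `retryDelayMs` and the two constants are hypothetical names, and the doubling rule is inferred from the 5m, 10m, 20m sequence listed above. Whether the real logic caps the delay or stops after a maximum number of attempts is not stated here, so the sketch does neither.

```typescript
// Hypothetical helper for illustration; not Kibana's actual Task Manager code.
const THIRTY_SECONDS_MS = 30 * 1000;
const FIVE_MINUTES_MS = 5 * 60 * 1000;

/**
 * Returns how long to wait before the next attempt.
 * `failedAttempts` is the number of attempts that have already run and failed.
 */
function retryDelayMs(failedAttempts: number): number {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 1) {
    throw new RangeError('failedAttempts must be a positive integer');
  }
  // The first failure gets a short 30-second retry.
  if (failedAttempts === 1) {
    return THIRTY_SECONDS_MS;
  }
  // Later failures: 5m, 10m, 20m, ... (assumed to keep doubling).
  return FIVE_MINUTES_MS * Math.pow(2, failedAttempts - 2);
}

// Print the delay that precedes attempts 2 through 5.
for (let failed = 1; failed <= 4; failed++) {
  console.log(`Attempt ${failed + 1}: ${retryDelayMs(failed) / 1000}s after attempt ${failed}`);
}
```

Keying the delay on the number of failed attempts keeps the function pure, so the next run time can be derived as the previous run time plus `retryDelayMs(attempts)`.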
Checklist
To verify