
Make mget task claim strategy the default #194625

Closed
3 tasks
mikecote opened this issue Oct 1, 2024 · 1 comment · Fixed by #197070
Labels
Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@mikecote
Contributor

mikecote commented Oct 1, 2024

Whenever `xpack.task_manager.claim_strategy` isn't set, we should now default to `mget`. Be aware that changing this default may cause some functional tests to become flaky because tasks run sooner (500ms poll interval, etc.).
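To see why a shorter poll interval can break timing assumptions in tests, here is a back-of-the-envelope sketch. It assumes a simplified model in which a due task is claimed on the first poll tick at or after its `runAt`; the function name and the 3000ms comparison interval are illustrative assumptions, not Task Manager internals.

```typescript
// Simplified, hypothetical model: a task whose runAt has passed is picked up
// on the first poll tick at or after runAt. Real claim cycles also depend on
// capacity, claim duration and clock skew, all ignored here.
function firstPickupMs(runAtMs: number, pollIntervalMs: number): number {
  if (pollIntervalMs <= 0) {
    throw new RangeError("pollIntervalMs must be positive");
  }
  // Poll ticks happen at 0, pollIntervalMs, 2 * pollIntervalMs, ...
  return Math.ceil(runAtMs / pollIntervalMs) * pollIntervalMs;
}

const runAt = 1200; // task becomes due 1.2s after polling starts
const fastPickup = firstPickupMs(runAt, 500); // 500ms interval mentioned in the issue
const slowPickup = firstPickupMs(runAt, 3000); // assumed slower interval, for comparison only

console.log(`500ms poll: picked up at ${fastPickup}ms`);
console.log(`3000ms poll: picked up at ${slowPickup}ms`);
```

Under this model the task runs at 1500ms instead of 3000ms, so a test that inspects the task in between would now observe a different state.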

Definition of Done

  • `xpack.task_manager.claim_strategy` defaults to `mget`
  • Flaky test runner passing 100x on the alerting and task manager functional tests
  • Flaky tests fixed before or with the PR that changes the default value
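The first item of the Definition of Done amounts to a simple defaulting rule. A minimal sketch, assuming only the two strategy names that appear in this thread; the type, constant and function names are hypothetical, and rejecting unknown values is a choice of this sketch rather than documented Kibana behavior.

```typescript
// Hypothetical sketch of "default to mget when the setting is unset".
type ClaimStrategy = "mget" | "update_by_query";

const DEFAULT_CLAIM_STRATEGY: ClaimStrategy = "mget";

function resolveClaimStrategy(configured?: string): ClaimStrategy {
  // Setting not present: fall back to the new default.
  if (configured === undefined) {
    return DEFAULT_CLAIM_STRATEGY;
  }
  // Explicit values are honored, so a deployment can still opt back in to update_by_query.
  if (configured === "mget" || configured === "update_by_query") {
    return configured;
  }
  // Assumption of this sketch: unknown values are rejected instead of silently defaulted.
  throw new Error(`Unknown xpack.task_manager.claim_strategy: ${configured}`);
}

console.log(resolveClaimStrategy()); // unset
console.log(resolveClaimStrategy("update_by_query")); // explicit opt-out
```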
@mikecote mikecote added Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Oct 1, 2024
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@mikecote mikecote self-assigned this Oct 18, 2024
mikecote added a commit that referenced this issue Oct 21, 2024
…aimer being the default (#196399)

Resolves #184942
Resolves #192023
Resolves #195573

In this PR, I'm reducing the flakiness found in our functional tests in
preparation for mget becoming the default task claimer that all these tests
run with (#194625). Because the mget task claimer works differently and
also polls more frequently, we end up in situations where tasks run faster
than they did with update_by_query, creating more race conditions, which
are now fixed in this PR.
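The commit does not list the individual fixes. One common way to make an assertion robust when tasks may finish earlier than expected is to re-check until the expected state appears, rather than asserting once after a fixed sleep. The `retryUntil` helper below is a hypothetical stand-in written for illustration, not an API from the Kibana test framework.

```typescript
// Hypothetical retry helper: re-checks a value until a predicate holds or
// the attempts run out, instead of asserting once after a fixed sleep.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryUntil<T>(
  check: () => Promise<T> | T,
  isDone: (value: T) => boolean,
  attempts = 20,
  delayMs = 100,
): Promise<T> {
  let last: T | undefined;
  for (let i = 0; i < attempts; i++) {
    last = await check();
    if (isDone(last)) {
      return last;
    }
    // Only wait if another attempt follows.
    if (i < attempts - 1) {
      await sleep(delayMs);
    }
  }
  throw new Error(`Condition not met after ${attempts} attempts; last value: ${String(last)}`);
}

// Usage: a fake status source that reports "idle" twice, then "done".
let calls = 0;
const fakeStatus = (): string => (++calls >= 3 ? "done" : "idle");

retryUntil(fakeStatus, (s) => s === "done", 10, 10).then((status) => {
  console.log(`status=${status} after ${calls} checks`);
});
```

The test passes whether the task completes on the first check or the tenth, which is the property that matters when the poll interval shrinks.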

Issues were surfaced via #190148
where I set `mget` as the default task claiming strategy.

Flaky test runs (some of these failed on other tests that are flaky):
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185 (for 0fcf1ae)
mikecote added a commit to mikecote/kibana that referenced this issue Oct 21, 2024
…aimer being the default (elastic#196399)

(cherry picked from commit 3b8cf12)

# Conflicts:
#	x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/alerts_as_data/alerts_as_data_flapping.ts
mikecote added a commit that referenced this issue Oct 21, 2024
…ask claimer being the default (#196399) (#197062)

# Backport

This will backport the following commits from `main` to `8.x`:
- [Improve task manager functional tests in preparation for mget task claimer being the default (#196399)](#196399)

<!--- Backport version: 8.9.8 -->

### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Oct 25, 2024
Resolves elastic#194625

In this PR, I'm setting `mget` as the default task claiming strategy
along with the following changes:
- Given we no longer need the 8.16-specific PRs (elastic#196317 and
elastic#196757), I've also reverted them.
- Given we now use `mget` as the default, I've renamed
`task_manager_claimer_mget` to `task_manager_claimer_update_by_query`
and made the tests in that folder use the `update_by_query` claim
strategy.
- Stabilized flaky tests caused by mget and more frequent task polling.
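With `mget` as the default, a suite that still needs to exercise `update_by_query` has to opt in explicitly. A minimal sketch, assuming the suite passes Kibana settings as `--key=value` server arguments; the helper name and the `baseArgs` placeholder are hypothetical, and this is not the actual config change made in the PR.

```typescript
// Hypothetical helper: builds a Kibana CLI override that pins the claim
// strategy for one test suite, independent of the global default.
type ClaimStrategy = "mget" | "update_by_query";

function claimStrategyServerArgs(strategy: ClaimStrategy): string[] {
  return [`--xpack.task_manager.claim_strategy=${strategy}`];
}

// Placeholder for whatever arguments the suite already passes.
const baseArgs: string[] = [];

// e.g. a suite like task_manager_claimer_update_by_query pins the old strategy:
const serverArgs = [...baseArgs, ...claimStrategyServerArgs("update_by_query")];

console.log(serverArgs);
```

Pinning the strategy per suite keeps the `update_by_query` coverage stable even though the default changed underneath it.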

Flaky test runners:
- [[59b71bc](elastic@59b71bc)] https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7197
- [[aea910e](elastic@aea910e)] https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7199
- [[4723ced](elastic@4723ced)] https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7206
- [[d28c8c5](elastic@d28c8c5)] https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7209
- [[dd7773a](elastic@dd7773a)] https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7224

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit c31f11e)

2 participants