Implement client-side updates w/ mget to prune stale docs #155770

Open
kobelb opened this issue Apr 25, 2023 · 6 comments · May be fixed by #159459
Labels
Feature:Task Manager, Meta, Project:Serverless MVP, Team:ResponseOps (label for the ResponseOps team, formerly the Cases and Alerting teams)

Comments

@kobelb
Contributor

kobelb commented Apr 25, 2023

Feature Description

Change the task manager task claiming algorithm to use a _search to retrieve candidate tasks, a _mget to prune the docs whose version number doesn't match, and then a _bulk to claim the tasks. This will increase the background task capacity in Serverless.
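The three steps above can be illustrated with a minimal, in-memory sketch. Everything here is an assumption made for illustration: `FakeTaskStore`, `claimTasks`, the `TaskDoc` shape, and the single `version` field are hypothetical stand-ins, not Kibana's task manager code or the Elasticsearch client API. The sketch assumes search results can lag behind the live documents, which is why the candidates are re-read before the conditional update.

```typescript
// Hypothetical in-memory stand-in for the task index; not a real client API.
type TaskStatus = 'idle' | 'claiming';

interface TaskDoc {
  id: string;
  version: number;
  status: TaskStatus;
  ownerId?: string;
}

class FakeTaskStore {
  // `live` holds the current docs; `searchable` is a snapshot that can be stale.
  private live: Record<string, TaskDoc> = {};
  private searchable: Record<string, TaskDoc> = {};

  index(doc: TaskDoc): void {
    this.live[doc.id] = { ...doc };
  }

  // Copies the live docs into the search snapshot.
  refresh(): void {
    const snapshot: Record<string, TaskDoc> = {};
    for (const id of Object.keys(this.live)) snapshot[id] = { ...this.live[id] };
    this.searchable = snapshot;
  }

  // Step 1 stand-in: returns idle candidates from the (possibly stale) snapshot.
  search(size: number): TaskDoc[] {
    return Object.keys(this.searchable)
      .map((id) => ({ ...this.searchable[id] }))
      .filter((doc) => doc.status === 'idle')
      .slice(0, size);
  }

  // Step 2 stand-in: reads the live docs by id.
  mget(ids: string[]): Array<TaskDoc | undefined> {
    return ids.map((id) => (this.live[id] ? { ...this.live[id] } : undefined));
  }

  // Step 3 stand-in: each update applies only if the version still matches.
  bulkClaim(ops: Array<{ id: string; ifVersion: number; ownerId: string }>): string[] {
    const claimed: string[] = [];
    for (const op of ops) {
      const doc = this.live[op.id];
      if (!doc || doc.version !== op.ifVersion) continue;
      this.live[op.id] = { ...doc, status: 'claiming', ownerId: op.ownerId, version: doc.version + 1 };
      claimed.push(op.id);
    }
    return claimed;
  }
}

function claimTasks(store: FakeTaskStore, ownerId: string, capacity: number): string[] {
  const candidates = store.search(capacity);
  const current = store.mget(candidates.map((c) => c.id));
  // Prune candidates whose live version differs from the version the search returned.
  const fresh = candidates.filter((c, i) => {
    const doc = current[i];
    return doc !== undefined && doc.version === c.version;
  });
  return store.bulkClaim(fresh.map((c) => ({ id: c.id, ifVersion: c.version, ownerId })));
}

const store = new FakeTaskStore();
store.index({ id: 'task-1', version: 1, status: 'idle' });
store.index({ id: 'task-2', version: 1, status: 'idle' });
store.refresh();
// Another node claims task-2 after the refresh, so the search snapshot is now stale.
store.bulkClaim([{ id: 'task-2', ifVersion: 1, ownerId: 'kibana-B' }]);

const claimedByA = claimTasks(store, 'kibana-A', 10);
console.log(claimedByA); // [ 'task-1' ]
```

In this sketch the stale `task-2` candidate is dropped at the mget step, so the bulk request only carries updates that are likely to succeed.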

Business Value

Increased background task capacity, reducing the COGS for running alerting rules and actions, and providing a lower MTTD/MTTR.

Definition of Done

  • Changes to the algorithm
  • Performance tests against Stateful Elasticsearch to ensure performance did not degrade significantly
  • Performance tests against Serverless Elasticsearch to measure performance
  • Unit tests

Phases

  1. [Task Manager] allow multiple task claiming strategies #171677
  2. Create alternative task claimer that performs client-side updates w/ mget to prune stale docs #181325
  3. Performance test the alternative task claimer that uses client-side updates w/ mget to prune stale docs #181326
  4. Rollout client-side updates w/ mget as the default task claimer #181327

Implementation: multiple PRs:

@kobelb kobelb added the Feature:Task Manager, Team:ResponseOps, and Project:Serverless MVP labels Apr 25, 2023
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@kobelb
Contributor Author

kobelb commented Apr 25, 2023

/cc @pmuellr

@pmuellr pmuellr self-assigned this Apr 26, 2023
@pmuellr
Member

pmuellr commented Apr 27, 2023

@kobelb looks like a POC of this is here: main...kobelb:kibana:task_clientside_update

Is that right?

@kobelb
Contributor Author

kobelb commented Apr 27, 2023

@pmuellr main...kobelb:kibana:task_clientside_update was a super early attempt at this that I wouldn't recommend treating as a proof-of-concept for this implementation.

#150769 minus the parts that do "task partitioning" are closer to what we want here. Happy to discuss further.

@mikecote mikecote moved this from Awaiting Triage to Todo in AppEx: ResponseOps - Execution & Connectors May 1, 2023
@mikecote mikecote moved this from Todo to Blocked / On hold in AppEx: ResponseOps - Execution & Connectors May 1, 2023
@mikecote
Contributor

mikecote commented May 30, 2023

@pmuellr here's a PR (#157156) with what I did for ON week. It contains some client-side update code that you can pick from as well (minus the cost parts). I also removed a little bit of RxJS.

If we ever want to explore running the search -> update logic in a worker thread, I've POC'ed it here: mikecote#5.

I've also POC'ed how to skip the claiming phase, in case we ever want to save an update: mikecote#6. It may be better not to do this if we decide to use worker threads, though.

@mikecote mikecote moved this from Blocked / On hold to Todo in AppEx: ResponseOps - Execution & Connectors Jun 5, 2023
@pmuellr pmuellr linked a pull request Jun 12, 2023 that will close this issue
9 tasks
@pmuellr pmuellr moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Jun 12, 2023
@mikecote mikecote moved this from In Progress to Blocked / On hold in AppEx: ResponseOps - Execution & Connectors Aug 17, 2023
@mikecote mikecote moved this from Blocked / On hold to Todo in AppEx: ResponseOps - Execution & Connectors Oct 30, 2023
@pmuellr pmuellr moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Nov 14, 2023
pmuellr added a commit to pmuellr/kibana that referenced this issue Nov 27, 2023
see elastic#155770

Make the task manager task claiming algorithm selectable, to allow
alternative implementations in the future.  No other implementations
are provided here, this is setup for adding the next one.
pmuellr added a commit that referenced this issue Nov 28, 2023
see #155770

Make the task manager task claiming algorithm selectable, to allow
alternative implementations in the future. No other implementations are
provided here; this is setup for adding the next algorithm. Task Manager
behavior should not be changed by this PR - code has just been re-org'd.

This adds a new config key, which is exposed to Docker -
`xpack.task_manager.claim_strategy`. The only allowed value is
`default`. No plans at present to document this, or allow-list for the
cloud. We may end up changing the config key to just test for serverless
instead, when we implement the next task claiming algorithm (see
referenced issue ^^^, which is aimed for serverless).

The jest tests were coarsely re-org'd. Once we have > 1 algorithm, we'll
likely want to re-org a bit more, so we can test all the implementations
"in a loop".
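
As a rough illustration of what "selectable" could look like, here is a minimal sketch of a strategy lookup keyed by the config value. Only the config key `xpack.task_manager.claim_strategy` and the value `default` come from the commit message; `TaskClaimerFn`, `claimerRegistry`, and `getTaskClaimer` are hypothetical names, and the claimer body is a placeholder rather than the real claiming logic.

```typescript
// Hypothetical names throughout; this is not the Kibana implementation.
type TaskClaimerFn = (availableCapacity: number) => string[];

// The only value allowed for xpack.task_manager.claim_strategy at this point.
const CLAIM_STRATEGY_DEFAULT = 'default';

const claimerRegistry: Record<string, TaskClaimerFn> = {
  // Placeholder claimer: returns made-up task ids up to the available capacity.
  [CLAIM_STRATEGY_DEFAULT]: (availableCapacity: number): string[] => {
    const ids: string[] = [];
    for (let i = 0; i < availableCapacity; i++) ids.push(`default-task-${i}`);
    return ids;
  },
};

// Looks up the claimer for a configured strategy and rejects unknown values.
function getTaskClaimer(strategy: string): TaskClaimerFn {
  if (!Object.prototype.hasOwnProperty.call(claimerRegistry, strategy)) {
    throw new Error(`Unknown task claiming strategy: ${strategy}`);
  }
  return claimerRegistry[strategy];
}

const claimer = getTaskClaimer(CLAIM_STRATEGY_DEFAULT);
console.log(claimer(2)); // [ 'default-task-0', 'default-task-1' ]
```

With a lookup like this, a later algorithm would only need a new registry entry plus an additional allowed config value.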
@mikecote mikecote moved this from In Progress to On Hold in AppEx: ResponseOps - Execution & Connectors Dec 18, 2023
@mikecote
Contributor

Here's a rollout plan after discussing with @kobelb:

  • We'll need the task delay monitoring (https://github.com/elastic/response-ops-team/issues/138) before we start enabling the new polling mechanism on projects
  • A configuration change will be required to opt-in to the new polling mechanism (feature flag)
  • We'll slowly enable it on select projects and expand as we see the poller working fine
  • We'll test as much as we can in unit tests and integration tests
  • Code reviews

Following this plan should allow us to implement the new polling mechanism while mitigating risk.

cc @pmuellr

@mikecote mikecote moved this from On Hold to In Progress in AppEx: ResponseOps - Execution & Connectors Apr 15, 2024
@mikecote mikecote added the Meta label Apr 22, 2024
pmuellr added a commit to pmuellr/kibana that referenced this issue May 7, 2024
resolves: elastic#155770

Adds a new task claiming strategy `mget`, which can be used instead of
the default one `default`.  Add the following to your `kibana.yml` to
enable it:

    xpack.task_manager.claim_strategy: 'mget'
pmuellrgitoff pushed a commit to pmuellrgitoff/kibana that referenced this issue Oct 5, 2024
resolves elastic/kibana#155770

Change the task manager task claiming algorithm to use a _search
to retrieve candidate tasks, a _mget to prune the docs whose
version number doesn't match, and then a _bulk to claim the tasks.
This will increase the background task capacity in Serverless.
4 participants