Put the auto calculation of capacity behind a feature flag, for now #195390
Conversation
Pinging @elastic/response-ops (Team:ResponseOps)
LGTM. Left one comment about adding the config to the logger
```diff
@@ -286,6 +286,7 @@ export class TaskManagerPlugin
   const isServerless = this.initContext.env.packageInfo.buildFlavor === 'serverless';

   const defaultCapacity = getDefaultCapacity({
+    autoCalculateDefaultEchCapacity: this.config.auto_calculate_default_ech_capacity,
```
Should we update the `logger.info` message below with this config?
Yeah, that would be useful. I added it to the log message in this commit: 1deaff2
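For reference, a minimal sketch of what the updated log line could look like, written as a standalone helper around Kibana's `Logger`; the helper name and exact wording are assumptions, and the actual message added in 1deaff2 may differ:

```typescript
import type { Logger } from '@kbn/logging';

// Hypothetical helper; in the real plugin this log call lives inside
// TaskManagerPlugin's setup/start path.
function logDefaultCapacity(
  logger: Logger,
  defaultCapacity: number,
  autoCalculateDefaultEchCapacity: boolean
) {
  // Include the new flag so operators can tell whether the capacity was
  // auto calculated or fell back to the static default.
  logger.info(
    `Task Manager default capacity: ${defaultCapacity} ` +
      `(auto_calculate_default_ech_capacity: ${autoCalculateDefaultEchCapacity})`
  );
}
```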
```diff
@@ -204,6 +204,7 @@ export const configSchema = schema.object(
     }),
     claim_strategy: schema.string({ defaultValue: CLAIM_STRATEGY_UPDATE_BY_QUERY }),
     request_timeouts: requestTimeoutsConfig,
+    auto_calculate_default_ech_capacity: schema.boolean({ defaultValue: false }),
```
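A quick sketch of how this schema entry behaves with `@kbn/config-schema`, isolated to just the new field for brevity:

```typescript
import { schema } from '@kbn/config-schema';

// Only the new field, extracted from configSchema for illustration.
const flagSchema = schema.object({
  auto_calculate_default_ech_capacity: schema.boolean({ defaultValue: false }),
});

// Omitted from kibana.yml: the flag defaults to false, so the
// auto calculation stays off.
flagSchema.validate({});
// => { auto_calculate_default_ech_capacity: false }

// Explicit opt-in:
flagSchema.validate({ auto_calculate_default_ech_capacity: true });
// => { auto_calculate_default_ech_capacity: true }
```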
Do we need to add it to the Docker allowlist and the cloud allowlist?
Hmm, good point. The original thinking was to revert this PR by 8.18, ensure we're happy with the `HEAP_TO_CAPACITY_MAP` config based on production experiments, and use `xpack.task_manager.capacity` as the opt-out strategy. But I can see where we could use this as an opt-out mechanism as well. I'll take note to think it through. I'll add it to the Dockerfile in this PR anyway (1deaff2), leaving out the cloud allowlist, which we'll need to add if we ever continue with this.
Without it in the cloud allowlist, it will only be possible to set it with the operator overrides capability. That should be fine if we only need to deal with a few cases.
Also keep in mind: I believe the cloud allowlist is only updated on releases (not sure), meaning we may need to wait for a point release for it to take effect.
++ I can expand a bit on the two options in 8.18 to roll back the auto calculated capacity:

1. Customer sets `xpack.task_manager.capacity` to an explicit value, which will take precedence over the calculated default value. In this case, we can make them set `10` or something else.
2. If we keep the feature flag, customers can set `xpack.task_manager.auto_calculate_default_ech_capacity` to `false`, which means we'll default to `10` normal tasks until they specify otherwise via `xpack.task_manager.capacity`. It's pretty much the same as asking them to set a capacity of `10`, but with the added benefit that we can re-opt them into auto calculation when we remove the `auto_calculate_default_ech_capacity` setting (a breaking change).

It feels like option 1 is OK, and we can remove this new `auto_calculate_default_ech_capacity` setting in 8.18 when we no longer need this functionality off by default. I was thinking of this approach as an alternate way of removing the code and adding it back in for 8.18. A rough sketch of the precedence follows below.
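To make the two options concrete, here's a sketch of the precedence being described; everything except the config keys and the default of 10 is invented for illustration and may not match the real `getDefaultCapacity`:

```typescript
// Static default of 10 normal tasks, per the PR description.
const DEFAULT_CAPACITY = 10;

function resolveCapacity(
  explicitCapacity: number | undefined, // xpack.task_manager.capacity
  autoCalculateDefaultEchCapacity: boolean, // the new feature flag
  calculatedEchDefault: number // e.g. derived from HEAP_TO_CAPACITY_MAP
): number {
  // Option 1: an explicit capacity always takes precedence over any
  // calculated default, so setting it to 10 rolls the change back.
  if (explicitCapacity !== undefined) {
    return explicitCapacity;
  }
  // Option 2: with the flag off, ignore the calculated value and fall
  // back to 10 until the customer sets xpack.task_manager.capacity.
  return autoCalculateDefaultEchCapacity ? calculatedEchDefault : DEFAULT_CAPACITY;
}
```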
💚 Build Succeeded
cc @mikecote
Starting backport for target branches: 8.x https://github.com/elastic/kibana/actions/runs/11240892056
Put the auto calculation of capacity behind a feature flag, for now (elastic#195390) (cherry picked from commit 9c8f689)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation
Put the auto calculation of capacity behind a feature flag, for now (#195390) (#195486)

# Backport

This will backport the following commits from `main` to `8.x`:
- [Put the auto calculation of capacity behind a feature flag, for now (#195390)](#195390)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Mike Côté <[email protected]>
In this PR, I'm preparing for the 8.16 release where we'd like to start rolling out the `mget` task claiming strategy separately from the added concurrency. To accomplish this, we need to put the capacity calculation behind a feature flag that defaults to false for now, until we do a second rollout with an increased concurrency. The increased concurrency can be calculated and adjusted based on experiments where clusters set `xpack.task_manager.capacity` to a higher value and we observe the resource usage.

PR to deploy to Cloud and verify that we always default to 10 normal tasks: #195392