Put the auto calculation of capacity behind a feature flag, for now #195390

Merged
3 changes: 3 additions & 0 deletions x-pack/plugins/task_manager/server/config.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ describe('config validation', () => {
expect(configSchema.validate(config)).toMatchInlineSnapshot(`
Object {
"allow_reading_invalid_state": true,
"auto_calculate_default_ech_capacity": false,
"claim_strategy": "update_by_query",
"discovery": Object {
"active_nodes_lookback": "30s",
Expand Down Expand Up @@ -75,6 +76,7 @@ describe('config validation', () => {
expect(configSchema.validate(config)).toMatchInlineSnapshot(`
Object {
"allow_reading_invalid_state": true,
"auto_calculate_default_ech_capacity": false,
"claim_strategy": "update_by_query",
"discovery": Object {
"active_nodes_lookback": "30s",
Expand Down Expand Up @@ -135,6 +137,7 @@ describe('config validation', () => {
expect(configSchema.validate(config)).toMatchInlineSnapshot(`
Object {
"allow_reading_invalid_state": true,
"auto_calculate_default_ech_capacity": false,
"claim_strategy": "update_by_query",
"discovery": Object {
"active_nodes_lookback": "30s",
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/task_manager/server/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,7 @@ export const configSchema = schema.object(
}),
claim_strategy: schema.string({ defaultValue: CLAIM_STRATEGY_UPDATE_BY_QUERY }),
request_timeouts: requestTimeoutsConfig,
auto_calculate_default_ech_capacity: schema.boolean({ defaultValue: false }),
Contributor

Do we need to add this to the Docker allowlist and the cloud allowlist?

Contributor Author

Hmm, good point. The original plan was to revert this PR by 8.18, once production experiments confirm we're happy with the HEAP_TO_CAPACITY_MAP config, and to use xpack.task_manager.capacity as the opt-out strategy. But I can see how this flag could serve as an opt-out mechanism as well. I'll take a note to think it through. In the meantime, I've added it to the Dockerfile in this PR (1deaff2), leaving only the cloud allowlist, which we'll need to update if we ever continue with this approach.

Member

Without it in the cloud allowlist, it can only be set through the operator overrides capability. That should be fine if we only need to deal with a few cases.

Also keep in mind that I believe the cloud allowlist is only updated on releases, though I'm not sure. If so, we may need to wait for a point release before a change to it takes effect.

Contributor Author
@mikecote, Oct 8, 2024

++ To expand a bit, there are two options in 8.18 for rolling back the auto-calculated capacity:

  1. The customer sets xpack.task_manager.capacity to an explicit value, which takes precedence over the calculated default. In this case, we can have them set 10 or some other value.
  2. If we keep the feature flag, customers can set xpack.task_manager.auto_calculate_default_ech_capacity to false, which means we default to 10 normal tasks until they specify otherwise via xpack.task_manager.capacity. It's much the same as asking them to set a capacity of 10, with the added benefit that removing the auto_calculate_default_ech_capacity setting (a breaking change) opts them back into auto-calculation.

It feels like option 1 is OK, and we can remove the new auto_calculate_default_ech_capacity setting in 8.18 once we no longer need this functionality off by default. I was thinking of this approach as an alternative to removing the code and adding it back for 8.18.
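
For readers following the thread, the precedence between the two settings could be sketched roughly as below. This is an illustration only, not Task Manager's code: `resolveCapacity`, `ResolveCapacityOpts`, and the `calculatedDefault` input are hypothetical names, and the default of 10 is taken from the comment above.

```typescript
// Hypothetical sketch of the precedence discussed in this thread; not Kibana's actual code.
const DEFAULT_CAPACITY = 10; // "10 normal tasks", per the comment above

interface ResolveCapacityOpts {
  // Value of xpack.task_manager.capacity, if the customer set one.
  explicitCapacity?: number;
  // Value of xpack.task_manager.auto_calculate_default_ech_capacity.
  autoCalculate: boolean;
  // Heap-based default computed elsewhere (an assumed input for this sketch).
  calculatedDefault: number;
}

function resolveCapacity({
  explicitCapacity,
  autoCalculate,
  calculatedDefault,
}: ResolveCapacityOpts): number {
  // Option 1: an explicit capacity always takes precedence.
  if (explicitCapacity !== undefined) {
    return explicitCapacity;
  }
  // Option 2: with the flag off, fall back to the fixed default.
  return autoCalculate ? calculatedDefault : DEFAULT_CAPACITY;
}
```

Under this reading, both options lead to the same effective capacity of 10; they differ only in whether the customer is later opted back into auto-calculation when the flag is removed.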

},
{
validate: (config) => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ describe('EphemeralTaskLifecycle', () => {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
...config,
},
elasticsearchAndSOAvailability$,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@ describe('managed configuration', () => {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
});
logger = context.logger.get('taskManager');

Expand Down Expand Up @@ -209,6 +210,7 @@ describe('managed configuration', () => {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
});
logger = context.logger.get('taskManager');

Expand Down Expand Up @@ -334,6 +336,7 @@ describe('managed configuration', () => {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
});
logger = context.logger.get('taskManager');

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ const config = {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
};

const getStatsWithTimestamp = ({
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,56 @@ import { CLAIM_STRATEGY_UPDATE_BY_QUERY, CLAIM_STRATEGY_MGET, DEFAULT_CAPACITY }
import { getDefaultCapacity } from './get_default_capacity';

describe('getDefaultCapacity', () => {
it('returns default capacity when autoCalculateDefaultEchCapacity=false', () => {
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: false,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: false,
isBackgroundTaskNodeOnly: false,
claimStrategy: CLAIM_STRATEGY_MGET,
})
).toBe(DEFAULT_CAPACITY);

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: false,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: true,
isBackgroundTaskNodeOnly: false,
claimStrategy: CLAIM_STRATEGY_MGET,
})
).toBe(DEFAULT_CAPACITY);

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: false,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: false,
isBackgroundTaskNodeOnly: true,
claimStrategy: CLAIM_STRATEGY_MGET,
})
).toBe(DEFAULT_CAPACITY);

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: false,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: true,
isBackgroundTaskNodeOnly: true,
claimStrategy: CLAIM_STRATEGY_MGET,
})
).toBe(DEFAULT_CAPACITY);
});

it('returns default capacity when not in cloud', () => {
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: false,
Expand All @@ -22,6 +69,7 @@ describe('getDefaultCapacity', () => {

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: true,
Expand All @@ -32,6 +80,7 @@ describe('getDefaultCapacity', () => {

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: false,
Expand All @@ -42,6 +91,7 @@ describe('getDefaultCapacity', () => {

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: true,
Expand All @@ -54,6 +104,7 @@ describe('getDefaultCapacity', () => {
it('returns default capacity when default claim strategy', () => {
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: true,
isServerless: false,
Expand All @@ -64,6 +115,7 @@ describe('getDefaultCapacity', () => {

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: true,
isServerless: false,
Expand All @@ -76,6 +128,7 @@ describe('getDefaultCapacity', () => {
it('returns default capacity when serverless', () => {
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: true,
Expand All @@ -86,6 +139,7 @@ describe('getDefaultCapacity', () => {

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: false,
isServerless: true,
Expand All @@ -96,6 +150,7 @@ describe('getDefaultCapacity', () => {

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: true,
isServerless: true,
Expand All @@ -106,6 +161,7 @@ describe('getDefaultCapacity', () => {

expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: true,
isServerless: true,
Expand All @@ -119,6 +175,7 @@ describe('getDefaultCapacity', () => {
// 1GB
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: true,
isServerless: false,
Expand All @@ -130,6 +187,7 @@ describe('getDefaultCapacity', () => {
// 1GB but somehow background task node only is true
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 851443712,
isCloud: true,
isServerless: false,
Expand All @@ -141,6 +199,7 @@ describe('getDefaultCapacity', () => {
// 2GB
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 1702887424,
isCloud: true,
isServerless: false,
Expand All @@ -152,6 +211,7 @@ describe('getDefaultCapacity', () => {
// 2GB but somehow background task node only is true
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 1702887424,
isCloud: true,
isServerless: false,
Expand All @@ -163,6 +223,7 @@ describe('getDefaultCapacity', () => {
// 4GB
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 3405774848,
isCloud: true,
isServerless: false,
Expand All @@ -174,6 +235,7 @@ describe('getDefaultCapacity', () => {
// 4GB background task only
expect(
getDefaultCapacity({
autoCalculateDefaultEchCapacity: true,
heapSizeLimit: 3405774848,
isCloud: true,
isServerless: false,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import { CLAIM_STRATEGY_MGET, DEFAULT_CAPACITY } from '../config';

interface GetDefaultCapacityOpts {
autoCalculateDefaultEchCapacity: boolean;
claimStrategy?: string;
heapSizeLimit: number;
isCloud: boolean;
Expand All @@ -24,14 +25,20 @@ const HEAP_TO_CAPACITY_MAP = [
];

export function getDefaultCapacity({
autoCalculateDefaultEchCapacity,
claimStrategy,
heapSizeLimit: heapSizeLimitInBytes,
isCloud,
isServerless,
isBackgroundTaskNodeOnly,
}: GetDefaultCapacityOpts) {
// perform heap size based calculations only in cloud
- if (isCloud && !isServerless && claimStrategy === CLAIM_STRATEGY_MGET) {
+ if (
+ autoCalculateDefaultEchCapacity &&
+ isCloud &&
+ !isServerless &&
+ claimStrategy === CLAIM_STRATEGY_MGET
+ ) {
// convert bytes to GB
const heapSizeLimitInGB = heapSizeLimitInBytes / 1e9;

Expand Down
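
The gating introduced by this hunk can be sketched in isolation as follows. It is a minimal sketch under stated assumptions: `shouldAutoCalculate`, `defaultCapacityFor`, and the `heapBasedCapacity` callback are hypothetical names, the callback stands in for the HEAP_TO_CAPACITY_MAP lookup (whose contents are not shown in this diff), and the constant values are assumed rather than copied from `../config`.

```typescript
// Illustrative sketch only; option names mirror the diff, but this is not the plugin's code.
const CLAIM_STRATEGY_MGET = 'mget'; // assumed value of the constant imported from '../config'
const DEFAULT_CAPACITY = 10; // assumed from the review discussion ("default to 10 normal tasks")

interface GateOpts {
  autoCalculateDefaultEchCapacity: boolean;
  claimStrategy?: string;
  isCloud: boolean;
  isServerless: boolean;
}

// True only when every precondition for the heap-based calculation holds.
function shouldAutoCalculate(opts: GateOpts): boolean {
  return (
    opts.autoCalculateDefaultEchCapacity &&
    opts.isCloud &&
    !opts.isServerless &&
    opts.claimStrategy === CLAIM_STRATEGY_MGET
  );
}

// heapBasedCapacity is a hypothetical stand-in for the HEAP_TO_CAPACITY_MAP lookup.
function defaultCapacityFor(opts: GateOpts, heapBasedCapacity: () => number): number {
  return shouldAutoCalculate(opts) ? heapBasedCapacity() : DEFAULT_CAPACITY;
}
```

With the flag defaulting to false, the heap-based branch is never taken unless an operator opts in, which matches the new test case asserting DEFAULT_CAPACITY for every combination when autoCalculateDefaultEchCapacity is false.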
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ const config: TaskManagerConfig = {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
};

describe('createAggregator', () => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ describe('Configuration Statistics Aggregator', () => {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
};

const managedConfig = {
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/task_manager/server/plugin.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@ const pluginInitializerContextParams = {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
};

describe('TaskManagerPlugin', () => {
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/task_manager/server/plugin.ts
Original file line number Diff line number Diff line change
Expand Up @@ -286,6 +286,7 @@ export class TaskManagerPlugin
const isServerless = this.initContext.env.packageInfo.buildFlavor === 'serverless';

const defaultCapacity = getDefaultCapacity({
autoCalculateDefaultEchCapacity: this.config.auto_calculate_default_ech_capacity,
Contributor

Should we update the logger.info message below to include this config?

Contributor Author

Yeah, that would be useful. I added it to the log message in this commit: 1deaff2

claimStrategy: this.config?.claim_strategy,
heapSizeLimit: this.heapSizeLimit,
isCloud: cloud?.isCloudEnabled ?? false,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ describe('TaskPollingLifecycle', () => {
request_timeouts: {
update_by_query: 1000,
},
auto_calculate_default_ech_capacity: false,
},
taskStore: mockTaskStore,
logger: taskManagerLogger,
Expand Down