Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] Initial implementation of alerting task cancel() #114289

Merged
merged 40 commits into from
Nov 16, 2021
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
793b2ac
Added cancel() to alerting task runner and writing event log document
ymao1 Oct 6, 2021
5bccd7c
Updating rule saved object with timeout execution status
ymao1 Oct 6, 2021
b7d4590
Skip scheduling actions and logging event log for alerts if rule exec…
ymao1 Oct 6, 2021
56e960b
Adding config for disabling skipping actions
ymao1 Oct 6, 2021
226be69
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 7, 2021
c2a8602
Fixing types
ymao1 Oct 7, 2021
e376c56
Adding flag for rule types to opt out of skipping acitons
ymao1 Oct 8, 2021
1ed70c5
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 8, 2021
658ec46
Merge branch 'master' into alerting/cancel-alerting-tasks
kibanamachine Oct 11, 2021
36a9558
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 12, 2021
44db452
Using task runner uuid to differentiate between task instances
ymao1 Oct 12, 2021
11a9cde
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 12, 2021
7c6db75
Adding functional test
ymao1 Oct 12, 2021
cff64f1
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 12, 2021
fa84e8d
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 13, 2021
0ec005a
Default to timestamp when startedAt is not available
ymao1 Oct 13, 2021
8ea6ff0
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 13, 2021
4401dbd
Reverting previous change and updating task pool filter instead
ymao1 Oct 13, 2021
096762a
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 13, 2021
5b08470
Fixing functional test
ymao1 Oct 13, 2021
7119630
Merge branch 'master' of https://github.com/elastic/kibana into alert…
ymao1 Oct 13, 2021
109ce9b
Merge branch 'master' into alerting/cancel-alerting-tasks
kibanamachine Oct 14, 2021
f327b36
Merge branch 'main' of https://github.com/elastic/kibana into alertin…
ymao1 Nov 1, 2021
9b7570e
Adding debug logging
ymao1 Nov 2, 2021
8b2ebd9
Merge branch 'main' of https://github.com/elastic/kibana into alertin…
ymao1 Nov 2, 2021
44eec3b
Fixing unit tests
ymao1 Nov 2, 2021
5e45e9d
Merge branch 'main' of https://github.com/elastic/kibana into alertin…
ymao1 Nov 2, 2021
3e9b3a9
Fixing unit tests
ymao1 Nov 2, 2021
e5906d4
Merge branch 'main' into alerting/cancel-alerting-tasks
kibanamachine Nov 5, 2021
4672e4a
Merge branch 'main' into alerting/cancel-alerting-tasks
kibanamachine Nov 8, 2021
3ebfbfe
Merge branch 'main' into alerting/cancel-alerting-tasks
kibanamachine Nov 10, 2021
939e665
Merging in main
ymao1 Nov 15, 2021
a7a39a2
Adding rule name to event log doc and rule type timeout to log messages
ymao1 Nov 15, 2021
1787d61
Simplifying register logic and adding check to see if already cancelled
ymao1 Nov 15, 2021
6851db2
Updating task uuid based on PR comments
ymao1 Nov 15, 2021
d769723
Removing observable
ymao1 Nov 15, 2021
0102c2e
Fixing functional test
ymao1 Nov 15, 2021
cc4781b
Merge branch 'main' of https://github.com/elastic/kibana into alertin…
ymao1 Nov 15, 2021
285911b
Adding to docs
ymao1 Nov 16, 2021
030a1fb
Merge branch 'main' of https://github.com/elastic/kibana into alertin…
ymao1 Nov 16, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,7 @@ kibana_vars=(
xpack.alerting.invalidateApiKeysTask.interval
xpack.alerting.invalidateApiKeysTask.removalDelay
xpack.alerting.defaultRuleTaskTimeout
xpack.alerting.cancelAlertsOnRuleTimeout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be adding docs for this config as well, right? Presumably we'll create a PR to cloud-enable this as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a cloud PR for this config when this PR gets merged but I didn't think user docs make sense for this config?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think user docs make sense for this config?

Why not?

I think the one-liner that you added makes sense and is good enough :-)

xpack.alerts.healthCheck.interval
xpack.alerts.invalidateApiKeysTask.interval
xpack.alerts.invalidateApiKeysTask.removalDelay
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ The following table describes the properties of the `options` object.
|producer|The id of the application producing this rule type.|string|
|minimumLicenseRequired|The value of a minimum license. Most of the rules are licensed as "basic".|string|
|ruleTaskTimeout|The length of time a rule can run before being cancelled due to timeout. By default, this value is "5m".|string|
|cancelAlertsOnRuleTimeout|Whether to skip writing alerts and scheduling actions if a rule execution is cancelled due to timeout. By default, this value is set to "true".|boolean|
|useSavedObjectReferences.extractReferences|(Optional) When developing a rule type, you can choose to implement hooks for extracting saved object references from rule parameters. This hook will be invoked when a rule is created or updated. Implementing this hook is optional, but if an extract hook is implemented, an inject hook must also be implemented.|Function
|useSavedObjectReferences.injectReferences|(Optional) When developing a rule type, you can choose to implement hooks for injecting saved object references into rule parameters. This hook will be invoked when a rule is retrieved (get or find). Implementing this hook is optional, but if an inject hook is implemented, an extract hook must also be implemented.|Function
|isExportable|Whether the rule type is exportable from the Saved Objects Management UI.|boolean|
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/common/alert.ts
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ export enum AlertExecutionStatusErrorReasons {
Execute = 'execute',
Unknown = 'unknown',
License = 'license',
Timeout = 'timeout',
}

export interface AlertExecutionStatus {
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/server/config.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ describe('config validation', () => {
const config: Record<string, unknown> = {};
expect(configSchema.validate(config)).toMatchInlineSnapshot(`
Object {
"cancelAlertsOnRuleTimeout": true,
"defaultRuleTaskTimeout": "5m",
"healthCheck": Object {
"interval": "60m",
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugins/alerting/server/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ export const configSchema = schema.object({
defaultValue: DEFAULT_MAX_EPHEMERAL_ACTIONS_PER_ALERT,
}),
defaultRuleTaskTimeout: schema.string({ validate: validateDurationSchema, defaultValue: '5m' }),
cancelAlertsOnRuleTimeout: schema.boolean({ defaultValue: true }),
});

export type AlertsConfig = TypeOf<typeof configSchema>;
8 changes: 8 additions & 0 deletions x-pack/plugins/alerting/server/health/get_state.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
}),
pollInterval
).subscribe();
Expand Down Expand Up @@ -108,6 +109,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
}),
pollInterval,
retryDelay
Expand Down Expand Up @@ -154,6 +156,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
})
).toPromise();

Expand Down Expand Up @@ -186,6 +189,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
})
).toPromise();

Expand Down Expand Up @@ -218,6 +222,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
})
).toPromise();

Expand Down Expand Up @@ -247,6 +252,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
}),
retryDelay
).subscribe((status) => {
Expand Down Expand Up @@ -279,6 +285,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
}),
retryDelay
).subscribe((status) => {
Expand Down Expand Up @@ -317,6 +324,7 @@ describe('getHealthServiceStatusWithRetryAndErrorHandling', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '20m',
cancelAlertsOnRuleTimeout: true,
})
).toPromise();

Expand Down
14 changes: 14 additions & 0 deletions x-pack/plugins/alerting/server/plugin.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -73,6 +74,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -153,6 +155,15 @@ describe('Alerting Plugin', () => {
await setup.registerType(ruleType);
expect(ruleType.ruleTaskTimeout).toBe('5m');
});

it('should apply default config value for cancelAlertsOnRuleTimeout if undefined on rule type', async () => {
const ruleType = {
...sampleAlertType,
minimumLicenseRequired: 'basic',
} as AlertType<never, never, never, never, never, 'default', never>;
await setup.registerType(ruleType);
expect(ruleType.cancelAlertsOnRuleTimeout).toBe(true);
});
});
});

Expand All @@ -169,6 +180,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
const plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -210,6 +222,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 10,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
const plugin = new AlertingPlugin(context);

Expand Down Expand Up @@ -265,6 +278,7 @@ describe('Alerting Plugin', () => {
},
maxEphemeralActionsPerAlert: 100,
defaultRuleTaskTimeout: '5m',
cancelAlertsOnRuleTimeout: true,
});
const plugin = new AlertingPlugin(context);

Expand Down
40 changes: 23 additions & 17 deletions x-pack/plugins/alerting/server/plugin.ts
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ export const EVENT_LOG_ACTIONS = {
newInstance: 'new-instance',
recoveredInstance: 'recovered-instance',
activeInstance: 'active-instance',
executeTimeout: 'execute-timeout',
};
export const LEGACY_EVENT_LOG_ACTIONS = {
resolvedInstance: 'resolved-instance',
Expand Down Expand Up @@ -306,9 +307,11 @@ export class AlertingPlugin {
if (!(alertType.minimumLicenseRequired in LICENSE_TYPE)) {
throw new Error(`"${alertType.minimumLicenseRequired}" is not a valid license type`);
}
if (!alertType.ruleTaskTimeout) {
if (!alertType.ruleTaskTimeout || alertType.cancelAlertsOnRuleTimeout === undefined) {
ymao1 marked this conversation as resolved.
Show resolved Hide resolved
ymao1 marked this conversation as resolved.
Show resolved Hide resolved
alertingConfig.then((config) => {
alertType.ruleTaskTimeout = config.defaultRuleTaskTimeout;
alertType.ruleTaskTimeout = alertType.ruleTaskTimeout ?? config.defaultRuleTaskTimeout;
alertType.cancelAlertsOnRuleTimeout =
alertType.cancelAlertsOnRuleTimeout ?? config.cancelAlertsOnRuleTimeout;
ruleTypeRegistry.register(alertType);
});
} else {
Expand Down Expand Up @@ -386,21 +389,24 @@ export class AlertingPlugin {
return alertingAuthorizationClientFactory!.create(request);
};

taskRunnerFactory.initialize({
logger,
getServices: this.getServicesFactory(core.savedObjects, core.elasticsearch),
getRulesClientWithRequest,
spaceIdToNamespace,
actionsPlugin: plugins.actions,
encryptedSavedObjectsClient,
basePathService: core.http.basePath,
eventLogger: this.eventLogger!,
internalSavedObjectsRepository: core.savedObjects.createInternalRepository(['alert']),
executionContext: core.executionContext,
ruleTypeRegistry: this.ruleTypeRegistry!,
kibanaBaseUrl: this.kibanaBaseUrl,
supportsEphemeralTasks: plugins.taskManager.supportsEphemeralTasks(),
maxEphemeralActionsPerAlert: this.config.then((config) => config.maxEphemeralActionsPerAlert),
this.config.then((config) => {
taskRunnerFactory.initialize({
logger,
getServices: this.getServicesFactory(core.savedObjects, core.elasticsearch),
getRulesClientWithRequest,
spaceIdToNamespace,
actionsPlugin: plugins.actions,
encryptedSavedObjectsClient,
basePathService: core.http.basePath,
eventLogger: this.eventLogger!,
internalSavedObjectsRepository: core.savedObjects.createInternalRepository(['alert']),
executionContext: core.executionContext,
ruleTypeRegistry: this.ruleTypeRegistry!,
kibanaBaseUrl: this.kibanaBaseUrl,
supportsEphemeralTasks: plugins.taskManager.supportsEphemeralTasks(),
maxEphemeralActionsPerAlert: config.maxEphemeralActionsPerAlert,
cancelAlertsOnRuleTimeout: config.cancelAlertsOnRuleTimeout,
});
});

this.eventLogService!.registerSavedObjectProvider('alert', (request) => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ const createExecutionHandlerParams: jest.Mocked<
stateVal: 'My other {{state.value}} goes here',
},
supportsEphemeralTasks: false,
maxEphemeralActionsPerAlert: Promise.resolve(10),
maxEphemeralActionsPerAlert: 10,
};

beforeEach(() => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ export interface CreateExecutionHandlerOptions<
request: KibanaRequest;
alertParams: AlertTypeParams;
supportsEphemeralTasks: boolean;
maxEphemeralActionsPerAlert: Promise<number>;
maxEphemeralActionsPerAlert: number;
}

interface ExecutionHandlerOptions<ActionGroupIds extends string> {
Expand Down Expand Up @@ -157,7 +157,7 @@ export function createExecutionHandler<
const alertLabel = `${alertType.id}:${alertId}: '${alertName}'`;

const actionsClient = await actionsPlugin.getActionsClientWithRequest(request);
let ephemeralActionsToSchedule = await maxEphemeralActionsPerAlert;
let ephemeralActionsToSchedule = maxEphemeralActionsPerAlert;
for (const action of actions) {
if (
!actionsPlugin.isActionExecutable(action.id, action.actionTypeId, { notifyUsage: true })
Expand Down
15 changes: 8 additions & 7 deletions x-pack/plugins/alerting/server/task_runner/task_runner.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,8 @@ describe('Task Runner', () => {
ruleTypeRegistry,
kibanaBaseUrl: 'https://localhost:5601',
supportsEphemeralTasks: false,
maxEphemeralActionsPerAlert: new Promise((resolve) => resolve(10)),
maxEphemeralActionsPerAlert: 10,
cancelAlertsOnRuleTimeout: true,
};

function testAgainstEphemeralSupport(
Expand Down Expand Up @@ -285,7 +286,7 @@ describe('Task Runner', () => {
expect(call.services).toBeTruthy();

const logger = taskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(2);
expect(logger.debug).toHaveBeenCalledTimes(3);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -432,7 +433,7 @@ describe('Task Runner', () => {
`);

const logger = customTaskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(3);
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -648,7 +649,7 @@ describe('Task Runner', () => {
expect(actionsClient.ephemeralEnqueuedExecution).toHaveBeenCalledTimes(0);

const logger = taskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).toHaveBeenCalledTimes(5);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -848,7 +849,7 @@ describe('Task Runner', () => {
expect(enqueueFunction).toHaveBeenCalledTimes(1);

const logger = customTaskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).toHaveBeenCalledTimes(5);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -1537,7 +1538,7 @@ describe('Task Runner', () => {
`);

const logger = customTaskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(4);
expect(logger.debug).toHaveBeenCalledTimes(5);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down Expand Up @@ -4339,7 +4340,7 @@ describe('Task Runner', () => {
expect(call.services).toBeTruthy();

const logger = taskRunnerFactoryInitializerParams.logger;
expect(logger.debug).toHaveBeenCalledTimes(2);
expect(logger.debug).toHaveBeenCalledTimes(3);
expect(logger.debug).nthCalledWith(1, 'executing alert test:1 at 1970-01-01T00:00:00.000Z');
expect(logger.debug).nthCalledWith(
2,
Expand Down
Loading