[Security Solution] Instrument rule executors with Elastic APM #117672

xcrzx · 2021-11-05T14:49:38Z

Summary

This PR shows how Elastic APM can help us find performance bottlenecks in Security API routes and rule executors.

Walkthrough

To instrument your local Kibana with APM, create config/apm.dev.js with the following content:

module.exports = {
  environment: '<use unique name here>', // You will use this name to filter data from your local environment
  active: true,
};

Start Kibana as you usually do (yarn start); it will start sending APM data (transactions and spans) to the shared APM Server.
Navigate to https://ela.st/kibana-ops and select Default space.
Go to Observability > APM > Services, select your environment and click on Kibana service.
Click on the Transactions tab and select the task-run transaction type.
Then find transactions corresponding to the rule type you want to inspect, e.g., siem.queryRule rule execution.
Well done 🙌 Now you can investigate the rule execution timeline

Merge after [APM] Add more info to the "Number of items in this trace exceed what is displayed" (xpack.apm.waterfall.exceedsMax) EuiCallOut #118282 is fixed. Without it, traces are not visible. Should be fixed with this PR: [Alerting] Add more rule execution context #117504

x-pack/plugins/security_solution/server/lib/detection_engine/signals/executors/eql.ts

x-pack/plugins/security_solution/server/lib/detection_engine/signals/utils.ts

elasticmachine · 2021-12-15T14:20:06Z

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine · 2021-12-15T14:20:08Z

Pinging @elastic/security-solution (Team: SecuritySolution)

x-pack/plugins/security_solution/server/utils/with_security_span.ts

spong · 2021-12-15T21:52:59Z

...ecurity_solution/server/lib/detection_engine/rule_execution_log/rule_execution_log_client.ts

-  public deleteCurrentStatus(ruleId: string): Promise<void> {
-    return this.client.deleteCurrentStatus(ruleId);
+  public async deleteCurrentStatus(ruleId: string): Promise<void> {
+    await withSecuritySpan('RuleExecutionLogClient.deleteCurrentStatus', () =>


Why are we await'ing here now and weren't before?

I removed return and added async + await. I don't remember why I did that, tbh, but for callers these two functions are equivalent when the callee returns Promise<void>:

const foo = async () => {
  await asyncFnReturningVoid();
};

const bar = () => {
  return asyncFnReturningVoid();
};
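For reference, a minimal sketch of where the two forms can still differ: if the callee throws synchronously before it creates a promise, the async + await form turns that into a rejected promise, while the plain return form lets the exception escape at the call site. All names below (throwsSynchronously, viaAwait, viaReturn, observe) are hypothetical helpers for illustration and are not part of this PR.

```typescript
// Hypothetical stand-in for a client method that resolves with no value.
const asyncFnReturningVoid = async (): Promise<void> => {};

// Variant 1: async + await.
const foo = async (): Promise<void> => {
  await asyncFnReturningVoid();
};

// Variant 2: plain function that returns the callee's promise directly.
const bar = (): Promise<void> => {
  return asyncFnReturningVoid();
};

// A callee that throws before it ever creates a promise.
const throwsSynchronously = (): Promise<void> => {
  throw new Error('boom');
};

const viaAwait = async (): Promise<void> => {
  await throwsSynchronously();
};
const viaReturn = (): Promise<void> => throwsSynchronously();

// Reports how the outcome of fn reached the caller.
const observe = async (fn: () => Promise<void>): Promise<string> => {
  try {
    const pending = fn(); // a synchronous throw escapes here
    try {
      await pending;
      return 'resolved';
    } catch {
      return 'rejected promise';
    }
  } catch {
    return 'synchronous throw';
  }
};
```

With a well-behaved async callee, as in deleteCurrentStatus, both variants resolve the same way, so the change in this diff should not alter behavior for callers.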

x-pack/plugins/security_solution/server/lib/detection_engine/signals/signal_rule_alert_type.ts

spong · 2021-12-15T22:08:07Z

x-pack/plugins/security_solution/server/lib/detection_engine/signals/executors/threshold.ts

-    for (const [hash, entry] of Object.entries(signalHistory)) {
-      if (entry.lastSignalTimestamp < tuple.from.valueOf()) {
-        toDelete.push(hash);
+  return withSecuritySpan('detectionEngine thresholdExecutor', async () => {


Just thresholdExecutor? Doesn't look like the other executors prefix with detectionEngine?

Suggested change

return withSecuritySpan('detectionEngine thresholdExecutor', async () => {

return withSecuritySpan('thresholdExecutor', async () => {

I previously used detectionEngine as a prefix but changed to withSecuritySpan later to avoid duplication. Missed that piece during refactoring, thanks!

spong · 2021-12-15T22:17:34Z

...ecurity_solution/server/lib/detection_engine/rule_types/create_security_rule_type_wrapper.ts

-              eventLogService,
-              logger,
-            });
+        agent.setTransactionName(`${options.rule.ruleTypeId} rule execution`);


nit: Probably don't need rule here since all the ruleTypeId's end with Rule anyway?

Suggested change

agent.setTransactionName(`${options.rule.ruleTypeId} rule execution`);

agent.setTransactionName(`${options.rule.ruleTypeId} execution`);

Agree, changed!

spong · 2021-12-15T22:24:42Z

x-pack/plugins/security_solution/server/utils/with_security_span.ts

+
+type Span = Exclude<typeof agent.currentSpan, undefined | null>;
+
+export const withSecuritySpan = <T>(


nit: Add JSDoc for when folks should use this function and any necessary pre-req's (does this need to happen within the scope of agent.setTransactionName()?)

Added doc. As for prerequisites, there aren't any. We can use this method anywhere throughout our codebase. All main code paths are already wrapped in transactions on the framework level.
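As a rough illustration of what a span wrapper of this kind does, here is a self-contained sketch. It is not the implementation from this PR: the SpanLike and AgentLike interfaces, createSpanWrapper, recordingAgent, withDemoSpan, and the 'securitySolution' span type are all assumptions made for the example, and the real helper may delegate to an existing Kibana utility instead.

```typescript
// Minimal span/agent shapes, loosely modelled on an APM agent. These
// interfaces are assumptions for this sketch, not the elastic-apm-node types.
interface SpanLike {
  setOutcome(outcome: 'success' | 'failure'): void;
  end(): void;
}
interface AgentLike {
  startSpan(name: string, type: string): SpanLike | null;
}

// Builds a wrapper bound to a given agent and span type. The wrapper starts a
// span, runs the callback, records the outcome, and always ends the span.
const createSpanWrapper =
  (agent: AgentLike, type: string) =>
  async <T>(name: string, cb: (span?: SpanLike) => Promise<T>): Promise<T> => {
    // startSpan may return null (for example, when no transaction is active).
    const span = agent.startSpan(name, type) ?? undefined;
    try {
      const result = await cb(span);
      span?.setOutcome('success');
      return result;
    } catch (error) {
      span?.setOutcome('failure');
      throw error;
    } finally {
      span?.end();
    }
  };

// In-memory agent that records what happened, for demonstration only.
const events: string[] = [];
const recordingAgent: AgentLike = {
  startSpan: (name, type) => {
    events.push(`start ${type}:${name}`);
    return {
      setOutcome: (outcome) => events.push(`outcome ${outcome}`),
      end: () => events.push(`end ${name}`),
    };
  },
};

// Hypothetical wrapper instance; the span type string is an assumption.
const withDemoSpan = createSpanWrapper(recordingAgent, 'securitySolution');
```

A call such as withDemoSpan('thresholdExecutor', async () => { ... }) then reads much like the call sites in this diff, and the callback's return value passes through unchanged.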

spong · 2021-12-15T22:27:26Z

...ecurity_solution/server/lib/detection_engine/rule_types/create_security_rule_type_wrapper.ts

+            const errorMessage = buildRuleMessage(`Check privileges failed to execute ${exc}`);
+            logger.error(errorMessage);
+            await ruleStatusClient.logStatusChange({
+              ...basicLogArguments,
+              message: errorMessage,
+              newStatus: RuleExecutionStatus['partial failure'],


Should we be setting agent.setTransactionOutcome('failure'); here (and in other failure/success cases) just as you did over in signal_rule_alert_type?

Ahhh, is this covered globally by task_runner?

kibana/x-pack/plugins/alerting/server/task_runner/task_runner.ts

Lines 640 to 646 in 1b02742

if (apm.currentTransaction) {
  if (executionStatus.status === 'ok' || executionStatus.status === 'active') {
    apm.currentTransaction.setOutcome('success');
  } else if (executionStatus.status === 'error' || executionStatus.status === 'unknown') {
    apm.currentTransaction.setOutcome('failure');
  }
}

Yes, they added it recently, so there's no need to set the outcome on our side anymore. Thanks for pointing it out. I also removed setOutcome from signal_rule_alert_type.
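The mapping in the task_runner snippet quoted above can be restated as a small pure function, which makes it easier to see that statuses outside the four listed leave the outcome untouched. This is only a sketch: outcomeForStatus and the ExecutionStatus union (including the extra 'pending' member) are illustrative assumptions, not code from the alerting plugin.

```typescript
// Illustrative status union; only the first four values appear in the snippet.
type ExecutionStatus = 'ok' | 'active' | 'error' | 'unknown' | 'pending';
type Outcome = 'success' | 'failure';

// Returns the outcome the task runner snippet would set, or undefined when
// the snippet would leave the transaction outcome as it is.
const outcomeForStatus = (status: ExecutionStatus): Outcome | undefined => {
  if (status === 'ok' || status === 'active') return 'success';
  if (status === 'error' || status === 'unknown') return 'failure';
  return undefined;
};
```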

spong

Checked out, was able to verify and test locally against a cloud APM instance (presumably a major version mismatch when trying against ops.kibana.dev), and performed code review.

Few nits and questions around async/await usage and setting transaction outcome in create_security_rule_type_wrapper.ts, but other than that LGTM! 👍 Thanks for instrumenting all our rule types @xcrzx! This is going to be extreeeemely helpful in debugging going forward! 😀

kibana-ci · 2021-12-16T13:44:43Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💔 Build #13610 failed 3d5e03a94085a0aaaf56c688197b45d29bafcca8
💚 Build #13420 succeeded e2c4e933a7375f7e5dc93a593c7f32c3671e2684
💚 Build #12851 succeeded b21fcfd2e684d0c4e4e4bf1de4fa45a26e5bada8
💚 Build #12463 succeeded 1c0a240b900516bd1f25a132c430eb0cd65472ba
💔 Build #12453 failed 9967baaed3e6d9d1dcaf8495e86834f77fb6cec2
💛 Build #7310 was flaky 93bc2eccafdc910b7bd84dd967164ceffa2e7235

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @xcrzx

xcrzx changed the base branch from main to 7.16 November 5, 2021 14:49

xcrzx force-pushed the apm-test branch from 07e7a06 to 889054d Compare November 8, 2021 10:17

xcrzx changed the base branch from 7.16 to main November 8, 2021 10:17

xcrzx force-pushed the apm-test branch 7 times, most recently from 4a29ee2 to d21d723 Compare November 10, 2021 11:41

xcrzx commented Nov 10, 2021

x-pack/plugins/security_solution/server/lib/detection_engine/signals/executors/eql.ts (outdated)

xcrzx commented Nov 10, 2021

x-pack/plugins/security_solution/server/lib/detection_engine/signals/utils.ts (outdated)

xcrzx force-pushed the apm-test branch 2 times, most recently from ea464f7 to 6c8aeac Compare November 17, 2021 09:08

xcrzx changed the title ~~Instrument rule executors with Elastic APM~~ [Security Solution] Instrument rule executors with Elastic APM Nov 17, 2021

xcrzx self-assigned this Nov 17, 2021

xcrzx force-pushed the apm-test branch 2 times, most recently from dba12d0 to 93bc2ec Compare November 17, 2021 09:28

elastic deleted a comment from kibanamachine Nov 17, 2021

xcrzx added the v8.0.0 label Nov 17, 2021

xcrzx force-pushed the apm-test branch 4 times, most recently from b21fcfd to 413d0c8 Compare December 15, 2021 14:13

xcrzx marked this pull request as ready for review December 15, 2021 14:20

xcrzx requested a review from a team as a code owner December 15, 2021 14:20

xcrzx force-pushed the apm-test branch from 413d0c8 to e2c4e93 Compare December 15, 2021 14:58

banderror added the performance label Dec 15, 2021

spong reviewed Dec 15, 2021

x-pack/plugins/security_solution/server/utils/with_security_span.ts (outdated)

spong reviewed Dec 15, 2021

x-pack/plugins/security_solution/server/lib/detection_engine/signals/signal_rule_alert_type.ts

spong reviewed Dec 15, 2021

spong approved these changes Dec 15, 2021

xcrzx force-pushed the apm-test branch from e2c4e93 to 3d5e03a Compare December 16, 2021 11:07

banderror removed the v8.0.0 label Dec 16, 2021

Test APM instrumentation of rule executors

7983202

xcrzx force-pushed the apm-test branch from 3d5e03a to 7983202 Compare December 16, 2021 11:59

xcrzx added backport:skip This commit does not require backporting and removed auto-backport Deprecated - use backport:version if exact versions are needed labels Dec 16, 2021

xcrzx enabled auto-merge (squash) December 16, 2021 12:48

xcrzx merged commit 7847bc8 into elastic:main Dec 16, 2021

xcrzx deleted the apm-test branch December 16, 2021 13:46

TinLe pushed a commit to TinLe/kibana that referenced this pull request Dec 22, 2021

Test APM instrumentation of rule executors (elastic#117672)

1823776

xcrzx mentioned this pull request Apr 4, 2022

[Security Solution] Instrument Security frontend code to collect performance data #129324

Closed
