-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution] Instrument rule executors with Elastic APM #117672
Conversation
4a29ee2
to
d21d723
Compare
x-pack/plugins/security_solution/server/lib/detection_engine/signals/executors/eql.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/security_solution/server/lib/detection_engine/signals/utils.ts
Outdated
Show resolved
Hide resolved
ea464f7
to
6c8aeac
Compare
dba12d0
to
93bc2ec
Compare
b21fcfd
to
413d0c8
Compare
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
Pinging @elastic/security-solution (Team: SecuritySolution) |
x-pack/plugins/security_solution/server/utils/with_security_span.ts
Outdated
Show resolved
Hide resolved
public deleteCurrentStatus(ruleId: string): Promise<void> { | ||
return this.client.deleteCurrentStatus(ruleId); | ||
public async deleteCurrentStatus(ruleId: string): Promise<void> { | ||
await withSecuritySpan('RuleExecutionLogClient.deleteCurrentStatus', () => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we await
'ing here now and weren't before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed return
, and added async
+ await
. Don't remember why I did that, tbh. But these functions are identical:
const foo = async () => {
await asyncFnReturningVoid();
};
const bar = () => {
return asyncFnReturningVoid();
};
x-pack/plugins/security_solution/server/lib/detection_engine/signals/signal_rule_alert_type.ts
Show resolved
Hide resolved
for (const [hash, entry] of Object.entries(signalHistory)) { | ||
if (entry.lastSignalTimestamp < tuple.from.valueOf()) { | ||
toDelete.push(hash); | ||
return withSecuritySpan('detectionEngine thresholdExecutor', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thresholdExecutor
? Doesn't look like the other executors prefix with detectionEngine
?
return withSecuritySpan('detectionEngine thresholdExecutor', async () => { | |
return withSecuritySpan('thresholdExecutor', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I previously used detectionEngine
as a prefix but changed to withSecuritySpan
later to avoid duplication. Missed that piece during refactoring, thanks!
eventLogService, | ||
logger, | ||
}); | ||
agent.setTransactionName(`${options.rule.ruleTypeId} rule execution`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Probably don't need rule
here since all the ruleTypeId
's end with Rule
anyway?
agent.setTransactionName(`${options.rule.ruleTypeId} rule execution`); | |
agent.setTransactionName(`${options.rule.ruleTypeId} execution`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, changed!
|
||
type Span = Exclude<typeof agent.currentSpan, undefined | null>; | ||
|
||
export const withSecuritySpan = <T>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add JSDoc for when folks should use this function and any necessary pre-req's (does this need to happen within the scope of agent.setTransactionName()
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added doc. As for prerequisites, there aren't any. We can use this method anywhere throughout our codebase. All main code paths are already wrapped in transactions on the framework level.
const errorMessage = buildRuleMessage(`Check privileges failed to execute ${exc}`); | ||
logger.error(errorMessage); | ||
await ruleStatusClient.logStatusChange({ | ||
...basicLogArguments, | ||
message: errorMessage, | ||
newStatus: RuleExecutionStatus['partial failure'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be setting agent.setTransactionOutcome('failure');
here (and in other failure/success cases) just as you did over in signal_rule_alert_type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, is this covered globally by task_runner
?
kibana/x-pack/plugins/alerting/server/task_runner/task_runner.ts
Lines 640 to 646 in 1b02742
if (apm.currentTransaction) { | |
if (executionStatus.status === 'ok' || executionStatus.status === 'active') { | |
apm.currentTransaction.setOutcome('success'); | |
} else if (executionStatus.status === 'error' || executionStatus.status === 'unknown') { | |
apm.currentTransaction.setOutcome('failure'); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they added it recently, so there's no need to set the outcome on our side anymore. Thanks for pointing it out. I also removed setOutcome
from signal_rule_alert_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked out, was able to verify and test locally against a cloud APM instance (presumable major version mis-match when trying against ops.kibana.dev
), and performed code review.
Few nits and questions around async
/await
usage and setting transaction outcome in create_security_rule_type_wrapper.ts
, but other than that LGTM! 👍 Thanks for instrumenting all our rule types @xcrzx! This is going to be extreeeemely helpful in debugging going forward! 😀
💚 Build Succeeded
Metrics [docs]
History
To update your PR or re-run it, just comment with: cc @xcrzx |
Summary
This PR shows how Elastic APM can help us find performance bottlenecks in Security API routes and rule executors.
Walkthrough
config/apm.dev.js
with the following content:yarn start
); it'll start sending logs to the shared APM Server.task-run
transaction type.siem.queryRule rule execution
.