[Alerting] Add more rule execution context #117504
The added label and the new transaction name will be very helpful!
@dgieselaar Can you please remind us how we identify as spans the different rule actions: email, index, server-log, ServiceNow-itsm, webhook, pagerduty... For example,
@cyrille-leclerc unfortunately the trace waterfall is broken, @cauemarcondes is working on a fix, I'll update the PR with a screenshot for actions if that's fixed before this PR lands. I'm not sure if I get your second point though, can you elaborate?
Thanks @dgieselaar
For the rule actions (email, index, server-log, servicenow-itsm, webhook, pagerduty...), it would be great to capture span labels that characterize the execution, such as the URL invoked or the authentication username, in order to slice and dice traces in any dimension and also to visualize the destination on the service map as an uninstrumented backend. Here is an example that has a lot in common with the labels we collect on CI/CD pipeline steps, like the pipeline checkout step:
Hmm, I don't want to inadvertently leak sensitive data, I'm not sure how to prevent that if we for instance stringify params or config. Any thoughts here @elastic/kibana-alerting-services?
Good catch @dgieselaar, we sanitized a bunch of attributes in the Jenkins Otel integration, typically parsing URLs and reconstructing them to ensure they don't leak credentials. Here is an example:
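Independently of the Jenkins example referenced above, a minimal sketch of that "parse and reconstruct" approach might look like the following. The helper name `sanitizeUrlForLabel` and the `[invalid url]` placeholder are hypothetical and do not come from this PR or from the Jenkins integration; the sketch only assumes the standard WHATWG `URL` class available in Node.js.

```typescript
// Hypothetical helper, not part of this PR: rebuilds a URL from known-safe
// parts so that credentials never end up in a span label.
export function sanitizeUrlForLabel(rawUrl: string): string {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    // Unparseable input is never echoed back into a label.
    return '[invalid url]';
  }
  // Keep scheme, host (including a non-default port), and path only.
  // Userinfo (user:password@), query string, and fragment are dropped.
  return `${url.protocol}//${url.host}${url.pathname}`;
}
```

Note that some webhook providers embed tokens in the path itself, so a stricter variant could keep only the scheme and host.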
@elasticmachine merge upstream
@dgieselaar Is it possible to view latency distribution for all rules of a specific rule type with these changes? Let's say I'm investigating performance issues with
But with this PR, I cannot select all query rules in a single view, as transactions are now split by rule name. So if I have 100+ activated rules, it becomes tedious to examine them one by one. Or am I missing something?
Not in the APM app. You can use e.g. Lens to gather that data, but it won't allow you to inspect a trace without manually copying trace ids. Usually we separate transaction groups if they have different performance characteristics, which I would expect to be the case here from rule instance to rule instance.
@elasticmachine merge upstream
That's a very good point: for guided rules, the execution path and the performance characteristics are likely to be very homogeneous and thus it could make sense to have the same transaction group.
I'm not sure how to make that distinction (between guided and generic rules) from the alerting framework's perspective. It is something that the security rule types can set themselves in the rule executor. Maybe that's a good compromise?
That could be an interesting starting point
Yeah, it looks like we can use
@elasticmachine merge upstream
@elasticmachine merge upstream
LGTM! Left some nits on naming but looks great otherwise. Thanks!
@@ -105,7 +105,7 @@ export class ActionExecutor {
         name: `execute_action`,
         type: 'actions',
         labels: {
-          actionId,
+          actions_action_id: actionId,
Following our new terminology, I think this should be actions_connector_id and actions_connector_type_id
Thanks, updated the labels w/ new terminology!
@@ -855,6 +881,12 @@ function generateNewAndRecoveredInstanceEvents<
   const recoveredAlertInstanceIds = Object.keys(recoveredAlertInstances);
   const newIds = without(currentAlertInstanceIds, ...originalAlertInstanceIds);

   if (apm.currentTransaction) {
     apm.currentTransaction.addLabels({
       alerting_new_instances: newIds.length,
Following our updated terminology, I believe this should be alerting_new_alerts, alerting_active_alerts, alerting_recovered_alerts
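As an illustration of the suggested naming, the three counts could be assembled roughly as follows. The helper `buildAlertCountLabels` and its parameters are hypothetical; only the label names come from the suggestion above, and treating the current ID list as the set of active alerts is an assumption.

```typescript
// Hypothetical helper for illustration; not code from this PR.
type AlertCountLabels = {
  alerting_new_alerts: number;
  alerting_active_alerts: number;
  alerting_recovered_alerts: number;
};

export function buildAlertCountLabels(
  originalIds: string[],
  currentIds: string[],
  recoveredIds: string[]
): AlertCountLabels {
  const original = new Set(originalIds);
  // New alerts are present in the current run but absent from the previous
  // one, mirroring without(currentAlertInstanceIds, ...originalAlertInstanceIds).
  const newIds = currentIds.filter((id) => !original.has(id));
  return {
    alerting_new_alerts: newIds.length,
    // Assumption: every ID in the current list counts as an active alert.
    alerting_active_alerts: currentIds.length,
    alerting_recovered_alerts: recoveredIds.length,
  };
}
```

The resulting object could then be passed to `apm.currentTransaction.addLabels(...)` behind the same `if (apm.currentTransaction)` guard shown in the diff.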
💚 Build Succeeded
Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Dario Gieselaar <[email protected]>
Closes #113506.
The following changes were made:
Executing alerting rule