-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Onboard Inventory Metric Threshold rule type with FAAD #178506
Conversation
/ci |
/ci |
Pinging @elastic/response-ops (Team:ResponseOps) |
evaluationValues?: Array<number | null>, | ||
thresholds?: Array<number | null> | ||
) => InventoryMetricThresholdAlert; | ||
|
||
export const createInventoryMetricThresholdExecutor = (libs: InfraBackendLibs) => | ||
libs.metricsRules.createLifecycleRuleExecutor< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to remove this usage of createLifecycleRuleExecutor
so that this no longer goes through the rule registry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally missed that! Thanks for pointing out.
Fixed with 5c5315f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; noted some potential source changes to reduce casts. Also left a question about the threshold values when using a "between" comparator, but the behaviour here (not writing threshold values to the alerts docs) seems the same as es query, so perhaps it's as good as we can do today.
alerts: { | ||
...MetricsRulesTypeAlertDefinition, | ||
shouldWrite: true, | ||
} as IRuleTypeAlerts<InventoryMetricThresholdAlert>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to get rid of this cast by changing MetricsRulesTypeAlertDefinition
from IRuleTypeAlerts
to IRuleTypeAlerts<InventoryMetricThresholdAlert>
in x-pack/plugins/observability_solution/infra/server/lib/alerting/register_rule_types.ts
, and it seemed to like it - I was able to remove the cast here completely anyway. Had to import InventoryMetricThresholdAlert
in the register_rule_types.ts
module ...
Though looking at that MetricsRulesTypeAlertDefinition
, it has shouldWrite: false
. Can we change the definition to shouldWrite: true
and then remove the override here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple rule types using the MetricsRulesTypeAlertDefinition
. If all of them are onboarded to the framework AAD and no longer using the lifecycle executor, then we can set shouldWrite: true
on the MetricsRulesTypeAlertDefinition
and use it as is without the override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already talked about it with Alexi.
Whoever merges their PR last, can remove this line :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This done now.
> & { | ||
// Defining a custom type for this because the schema generation script doesn't allow explicit null values | ||
[ALERT_EVALUATION_VALUES]?: Array<number | null>; | ||
[ALERT_EVALUATION_THRESHOLD]?: Array<number | null>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like THRESHOLD isn't omitted above ... I was able to add it to the Omit<>
with no errors ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it, thanks.
[ALERT_REASON]: reason, | ||
[ALERT_ACTION_GROUP]: actionGroup, | ||
[ALERT_EVALUATION_VALUES]: evaluationValues, | ||
[ALERT_EVALUATION_THRESHOLD]: thresholds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a "between" comparator, which should be passing an array of values to thresholds. The rule works fine, but this field is not set in the alert document. For a single-value threshold (anything but between or not between), it printed the threshold fine.
This may be a pre-existing issue with this rule type - I noticed that es query seems to not populate this field if it's multi-value as well, perhaps that's the pattern for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is intentional:
https://github.com/elastic/kibana/blob/main/x-pack/plugins/observability_solution/infra/server/lib/alerting/common/get_values.ts#L34
We don't save threshold if at least one of the comparators is a multiple threshold one such as 'in between'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmuellr @ersin-erdal I've created a ticket for this topic to save the range threshold as an array. I also saw the the same behaviour in the es query rule, what do you think about adding the range threshold as an array in all rules (including es query)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be good, but I suspect we will have pre-existing data with just a scalar value - so any typing on such fields in Typescript will have to be number | number[]
.
fields: { | ||
alertsClient.setAlertData({ | ||
id: UNGROUPED_FACTORY_KEY, | ||
payload: { | ||
[ALERT_REASON]: reason, | ||
[ALERT_ACTION_GROUP]: actionGroup, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't need to report this back since it's set by the framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
const esClient = services.scopedClusterClient.asCurrentUser; | ||
|
||
const { savedObjectsClient, alertFactory: baseAlertFactory, alertsClient } = services; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the baseAlertFactory
anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested creating alerts and recover them + notification, everything seemed as expected! 💪🏻
Regarding code, I've checked it but I rely mostly on ResponseOps team review :)
💚 Build Succeeded
Metrics [docs]History
To update your PR or re-run it, just comment with: cc @ersin-erdal |
Towards: #169867
This PR onboards Inventory Metric Threshold rule type with FAAD.
To verify.
I used data-generator to generate metric data.
Then created an Inventory Threshold rule with actions (alert and recovered),
conitions:
For Hosts, When CPU usage is above 10
.Inventory Threshold uses the following formula to calculate the result:
(
system.cpu.user.pct
+system.cpu.system.pct
) /system.cpu.cores
Set
system.cpu.user.pct
= 1system.cpu.system.pct
= 1system.cpu.cores
= 4in the cpu-001. This makes the CPU usage 0.5 (50%) for the
host-1
and run the generator with
./generate metrics
Your rule should create an alert and should saved it in
.internal.alerts-observability.metrics.alerts-default-000001
Then set
system.cpu.user.pct
=0 andsystem.cpu.system.pct
=0.The alert should be recovered and the AAD in the above index should be updated
kibana.alert.status: recovered
.