[ML][DOCS] Add documentation for detector rules and filters (#32013)
1 parent 41b12e2, commit 4cdef4e
Showing 16 changed files with 648 additions and 12 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,230 @@ | ||
[role="xpack"] | ||
[[ml-configuring-detector-custom-rules]] | ||
=== Customizing detectors with rules and filters | ||
|
||
<<ml-rules,Rules and filters>> enable you to change the behavior of anomaly | ||
detectors based on domain-specific knowledge. | ||
|
||
Rules describe _when_ a detector should take a certain _action_ instead | ||
of following its default behavior. To specify the _when_, a rule uses | ||
a `scope` and `conditions`. You can think of `scope` as the categorical | ||
specification of a rule, while `conditions` are the numerical part. | ||
A rule can have a scope, one or more conditions, or a combination of | ||
scope and conditions. | ||
|
||
The following examples show how to configure them. | ||
|
||
==== Specifying rule scope | ||
|
||
Let us assume we are configuring a job in order to detect DNS data exfiltration. | ||
Our data contains the fields `subdomain` and `highest_registered_domain`. | ||
We can use a detector that looks like `high_info_content(subdomain) over highest_registered_domain`. | ||
If we run such a job, it is possible that we discover a lot of anomalies on | ||
frequently used domains that we have reasons to trust. As security analysts, we | ||
are not interested in such anomalies. Ideally, we could instruct the detector to | ||
skip results for domains that we consider safe. Using a rule with a scope allows | ||
us to achieve this. | ||
|
||
First, we need to create a list with our safe domains. Those lists are called | ||
`filters` in {ml}. Filters can be shared across jobs. | ||
|
||
We create our filter using the {ref}/ml-put-filter.html[put filter API]: | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/filters/safe_domains | ||
{ | ||
  "description": "Our list of safe domains", | ||
  "items": ["safe.com", "trusted.com"] | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
|
||
Now, we can create our job specifying a scope that uses the filter for the | ||
`highest_registered_domain` field: | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/anomaly_detectors/dns_exfiltration_with_rule | ||
{ | ||
  "analysis_config" : { | ||
    "bucket_span":"5m", | ||
    "detectors" :[{ | ||
      "function":"high_info_content", | ||
      "field_name": "subdomain", | ||
      "over_field_name": "highest_registered_domain", | ||
      "custom_rules": [{ | ||
        "actions": ["skip_result"], | ||
        "scope": { | ||
          "highest_registered_domain": { | ||
            "filter_id": "safe_domains", | ||
            "filter_type": "include" | ||
          } | ||
        } | ||
      }] | ||
    }] | ||
  }, | ||
  "data_description" : { | ||
    "time_field":"timestamp" | ||
  } | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
|
||
As time advances and we see more data and more results, we might encounter new | ||
domains that we want to add to the filter. We can do that by using the | ||
{ref}/ml-update-filter.html[update filter API]: | ||
|
||
[source,js] | ||
---------------------------------- | ||
POST _xpack/ml/filters/safe_domains/_update | ||
{ | ||
  "add_items": ["another-safe.com"] | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
// TEST[setup:ml_filter_safe_domains] | ||
|
||
Note that we can provide scope for any of the partition/over/by fields. | ||
In the following example we scope multiple fields: | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/anomaly_detectors/scoping_multiple_fields | ||
{ | ||
  "analysis_config" : { | ||
    "bucket_span":"5m", | ||
    "detectors" :[{ | ||
      "function":"count", | ||
      "partition_field_name": "my_partition", | ||
      "over_field_name": "my_over", | ||
      "by_field_name": "my_by", | ||
      "custom_rules": [{ | ||
        "actions": ["skip_result"], | ||
        "scope": { | ||
          "my_partition": { | ||
            "filter_id": "filter_1" | ||
          }, | ||
          "my_over": { | ||
            "filter_id": "filter_2" | ||
          }, | ||
          "my_by": { | ||
            "filter_id": "filter_3" | ||
          } | ||
        } | ||
      }] | ||
    }] | ||
  }, | ||
  "data_description" : { | ||
    "time_field":"timestamp" | ||
  } | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
|
||
Such a detector will skip results when the values of all 3 scoped fields | ||
are included in the referenced filters. | ||
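|
||
The `filter_type` shown in the first example can also be set to `exclude`, in | ||
which case the rule applies to values that are _not_ in the filter. As a | ||
minimal sketch, assuming a hypothetical filter named `watched_domains` and a | ||
hypothetical job name, the following job would skip results for every domain | ||
outside that filter: | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/anomaly_detectors/dns_exfiltration_watched_only | ||
{ | ||
  "analysis_config" : { | ||
    "bucket_span":"5m", | ||
    "detectors" :[{ | ||
      "function":"high_info_content", | ||
      "field_name": "subdomain", | ||
      "over_field_name": "highest_registered_domain", | ||
      "custom_rules": [{ | ||
        "actions": ["skip_result"], | ||
        "scope": { | ||
          "highest_registered_domain": { | ||
            "filter_id": "watched_domains", | ||
            "filter_type": "exclude" | ||
          } | ||
        } | ||
      }] | ||
    }] | ||
  }, | ||
  "data_description" : { | ||
    "time_field":"timestamp" | ||
  } | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
// TEST[skip:uses hypothetical filter and job names] | ||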
|
||
==== Specifying rule conditions | ||
|
||
Imagine a detector that looks for anomalies in CPU utilization. | ||
Given a machine that is idle for long enough, small changes in CPU utilization | ||
could produce anomalous results where the `actual` value is quite small, for | ||
example, 0.02. Given our knowledge about how CPU utilization behaves, we might | ||
determine that anomalies with such small actual values are not interesting for | ||
investigation. | ||
|
||
Let us now configure a job with a rule that will skip results where CPU | ||
utilization is less than 0.20. | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/anomaly_detectors/cpu_with_rule | ||
{ | ||
  "analysis_config" : { | ||
    "bucket_span":"5m", | ||
    "detectors" :[{ | ||
      "function":"high_mean", | ||
      "field_name": "cpu_utilization", | ||
      "custom_rules": [{ | ||
        "actions": ["skip_result"], | ||
        "conditions": [ | ||
          { | ||
            "applies_to": "actual", | ||
            "operator": "lt", | ||
            "value": 0.20 | ||
          } | ||
        ] | ||
      }] | ||
    }] | ||
  }, | ||
  "data_description" : { | ||
    "time_field":"timestamp" | ||
  } | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
|
||
When there are multiple conditions, they are combined with a logical `and`. | ||
This is useful when we want the rule to apply to a range. We simply create | ||
a rule with two conditions, one for each end of the desired range. | ||
|
||
Here is an example where a count detector will skip results when the count | ||
is greater than 30 and less than 50: | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/anomaly_detectors/rule_with_range | ||
{ | ||
  "analysis_config" : { | ||
    "bucket_span":"5m", | ||
    "detectors" :[{ | ||
      "function":"count", | ||
      "custom_rules": [{ | ||
        "actions": ["skip_result"], | ||
        "conditions": [ | ||
          { | ||
            "applies_to": "actual", | ||
            "operator": "gt", | ||
            "value": 30 | ||
          }, | ||
          { | ||
            "applies_to": "actual", | ||
            "operator": "lt", | ||
            "value": 50 | ||
          } | ||
        ] | ||
      }] | ||
    }] | ||
  }, | ||
  "data_description" : { | ||
    "time_field":"timestamp" | ||
  } | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
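|
||
A rule can also combine a scope with conditions. The following is a minimal | ||
sketch rather than a tested configuration, and it assumes that such a rule | ||
applies only when the scope matches and all conditions hold. The job name, the | ||
partition field `host`, and the filter `idle_hosts` are hypothetical names | ||
used for illustration: | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/anomaly_detectors/cpu_with_scope_and_condition | ||
{ | ||
  "analysis_config" : { | ||
    "bucket_span":"5m", | ||
    "detectors" :[{ | ||
      "function":"high_mean", | ||
      "field_name": "cpu_utilization", | ||
      "partition_field_name": "host", | ||
      "custom_rules": [{ | ||
        "actions": ["skip_result"], | ||
        "scope": { | ||
          "host": { | ||
            "filter_id": "idle_hosts" | ||
          } | ||
        }, | ||
        "conditions": [ | ||
          { | ||
            "applies_to": "actual", | ||
            "operator": "lt", | ||
            "value": 0.20 | ||
          } | ||
        ] | ||
      }] | ||
    }] | ||
  }, | ||
  "data_description" : { | ||
    "time_field":"timestamp" | ||
  } | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
// TEST[skip:uses hypothetical filter and job names] | ||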
|
||
==== Rules in the life-cycle of a job | ||
|
||
Rules only affect results created after the rules were applied. | ||
Let us imagine that we have configured a job and it has been running | ||
for some time. After observing its results we decide that we can employ | ||
rules in order to get rid of some uninteresting results. We can use | ||
the {ref}/ml-update-job.html[update job API] to do so. However, the rule we added will only be in effect | ||
for any results created from the moment we added the rule onwards. Past results | ||
will remain unaffected. | ||
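|
||
As a sketch of such an update, assume a job named `cpu_with_rule` that was | ||
originally created without any rules. The request below adds a rule to its | ||
first detector. The `detector_index` of `0` is an assumption that the job has | ||
a single detector; adjust it to match the detector you want to change: | ||
|
||
[source,js] | ||
---------------------------------- | ||
POST _xpack/ml/anomaly_detectors/cpu_with_rule/_update | ||
{ | ||
  "detectors": [{ | ||
    "detector_index": 0, | ||
    "custom_rules": [{ | ||
      "actions": ["skip_result"], | ||
      "conditions": [ | ||
        { | ||
          "applies_to": "actual", | ||
          "operator": "lt", | ||
          "value": 0.20 | ||
        } | ||
      ] | ||
    }] | ||
  }] | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
// TEST[skip:assumes an existing job created without rules] | ||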
|
||
==== Using rules vs. filtering data | ||
|
||
It might appear that using rules is just another way of filtering the data | ||
that feeds into a job. For example, a rule that skips results when the | ||
partition field value is in a filter sounds equivalent to having a query | ||
that filters out such documents. But it is not. There is a fundamental | ||
difference. When the data is filtered before reaching a job, it is as if it | ||
never existed for the job. With rules, the data still reaches the job and | ||
affects its behavior (depending on the rule actions). | ||
|
||
For example, a rule with the `skip_result` action means all data will still | ||
be modeled. On the other hand, a rule with the `skip_model_update` action means | ||
results will still be created even though the model will not be updated by | ||
data matched by a rule. |
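|
||
As an illustration of the second case, the following sketch uses the | ||
`skip_model_update` action so that very large counts still produce results | ||
but are not used to update the model. The job name and the threshold of 1000 | ||
are hypothetical values chosen for the example: | ||
|
||
[source,js] | ||
---------------------------------- | ||
PUT _xpack/ml/anomaly_detectors/count_skip_model_update | ||
{ | ||
  "analysis_config" : { | ||
    "bucket_span":"5m", | ||
    "detectors" :[{ | ||
      "function":"count", | ||
      "custom_rules": [{ | ||
        "actions": ["skip_model_update"], | ||
        "conditions": [ | ||
          { | ||
            "applies_to": "actual", | ||
            "operator": "gt", | ||
            "value": 1000 | ||
          } | ||
        ] | ||
      }] | ||
    }] | ||
  }, | ||
  "data_description" : { | ||
    "time_field":"timestamp" | ||
  } | ||
} | ||
---------------------------------- | ||
// CONSOLE | ||
// TEST[skip:hypothetical job name and threshold] | ||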