[Security Solution][Alerts] Explore using multi_terms or composite aggs for threshold rules #125703
Pinging @elastic/security-detections-response (Team:Detections and Resp)
Below I'll summarize some problems with the current threshold rule implementation and how we can address them.

Problem: As outlined above, results depend on the order in which the terms are provided, because each shard is limited in the number of buckets it can return.

Caveats:
The current threshold rule implementation uses multiple levels of nested `terms` aggregations, one level per field, so that threshold rules can bucket by multiple fields. While this works most of the time, in some cases the results depend on the order in which the nested levels are defined. If the cardinality of one `terms` level is too high, some of its buckets are excluded from the results, and the sub-buckets are computed only within the buckets that were kept. In that case, swapping the order of the aggregations can return different results.

In 7.12, Elasticsearch gained the `multi_terms` aggregation, which buckets by multiple fields in a single aggregation. This could replace the N nested levels of `terms` aggs with a single `multi_terms` aggregation that can be sorted directly by the final bucket sizes, removing the dependency on the order in which the field names are defined. The `multi_terms` aggregation docs do come with a warning about performance, so we should test the performance of any new implementation. It may end up being faster than the nested implementation anyway.

Alternatively, we could investigate composite aggregations to replace the N nested levels of aggs with a single aggregation. Composite aggregations don't allow sorting by bucket size, but they are supposed to be faster than `multi_terms`, and there appear to be workarounds using bucket selectors that at least allow filtering by bucket size.
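To make the comparison concrete, here is a minimal TypeScript sketch of the three request shapes for a rule that groups on several fields. It only builds plain aggregation bodies (no client calls). The helper function names, the aggregation names (`level_N`, `threshold_buckets`, `source_N`, `min_count`), and the default sizes are hypothetical, and the `bucket_selector` under `composite` is the workaround mentioned above, which we would still need to validate.

```typescript
// Hypothetical helpers that build Elasticsearch aggregation bodies for a
// threshold rule grouping on several fields. Names and sizes are
// illustrative assumptions, not the actual Kibana implementation.
type Agg = Record<string, any>;

// Current approach: one nested `terms` level per field. Each level is
// truncated to `size` buckets on its own, so field order can change results.
// Applying min_doc_count at every level is a safe prune, because a parent
// bucket's count is always >= any of its children's counts.
function nestedTermsAgg(fields: string[], threshold: number, size = 10000): Agg {
  let inner: Agg | undefined;
  // Build from the innermost field outward.
  for (let i = fields.length - 1; i >= 0; i--) {
    const level: Agg = {
      terms: { field: fields[i], size, min_doc_count: threshold },
    };
    if (inner) level.aggs = inner;
    inner = { [`level_${i}`]: level };
  }
  return inner ?? {};
}

// Candidate 1: a single `multi_terms` aggregation keyed on all fields at
// once and sorted by the combined bucket's doc count.
function multiTermsAgg(fields: string[], threshold: number, size = 10000): Agg {
  return {
    threshold_buckets: {
      multi_terms: {
        terms: fields.map((field) => ({ field })),
        size,
        min_doc_count: threshold,
        order: { _count: "desc" },
      },
    },
  };
}

// Candidate 2: a `composite` aggregation (no sorting by count) paged via
// `after`, with a `bucket_selector` pipeline agg that drops buckets whose
// doc count is below the threshold.
function compositeAgg(
  fields: string[],
  threshold: number,
  pageSize = 1000,
  afterKey?: Agg,
): Agg {
  const composite: Agg = {
    size: pageSize,
    sources: fields.map((field, i) => ({ [`source_${i}`]: { terms: { field } } })),
  };
  if (afterKey) composite.after = afterKey;
  return {
    threshold_buckets: {
      composite,
      aggs: {
        min_count: {
          bucket_selector: {
            buckets_path: { count: "_count" },
            script: `params.count >= ${threshold}`,
          },
        },
      },
    },
  };
}
```

For example, `multiTermsAgg(["host.name", "user.name"], 100)` yields one aggregation whose buckets are keyed by both fields together, so the result no longer depends on field order; the per-shard `size`/`shard_size` limits still apply, but to the combined keys. The composite variant has to be paged by passing each response's `after_key` back in as `afterKey`; since `bucket_selector` filters each page after it is built, we'd need to check how paging behaves when every bucket on a page is filtered out.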