Hiya SIEM team. Over at Elasticsearch we've been looking into a few performance-related items, and some of the aggs that the SIEM dashboard uses caught our eye.
Benchmarks?
Do we benchmark any of the dashboards? The Elasticsearch team uses Rally extensively; perhaps we could find a way to translate the dashboard requests into some kind of Rally track? It'd help both of us keep an eye on performance, make changes easier to reason about, and make collaboration easier since we'd have a shared dataset to look at.
Usage of filter aggs
There seems to be widespread use of filter aggs, which is non-ideal. Filter aggs are relatively expensive, especially when compared to filtering in the query component of a search request. Each individual filter agg needs to load the bitset of docs that match its filter and check every collected doc against it one-by-one (as opposed to query filters, which can use a leap-frog mechanism to minimize checks).
So the first thing would be trying to move filter aggs up into the query where possible, if they are being used to exclude documents.
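As a rough sketch of that move (Python, building plain request bodies as dicts; the agg names and the host.name cardinality sub-agg are made up for illustration and are not taken from the SIEM queries), the same restriction can be expressed either way:

```python
# Illustrative only: agg names and the "host.name" metric are assumptions.

def with_filter_agg(field, value):
    """Restrict docs inside a filter agg (the more expensive form)."""
    return {
        "size": 0,
        "aggs": {
            "matching": {
                "filter": {"term": {field: value}},
                "aggs": {"hosts": {"cardinality": {"field": "host.name"}}},
            }
        },
    }


def with_query_filter(field, value):
    """Restrict docs in the query instead, so aggs only see matching docs."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {field: value}}]}},
        "aggs": {"hosts": {"cardinality": {"field": "host.name"}}},
    }
```

The second form only works when every agg in the request should see the same subset of docs, which is the "used to exclude documents" case.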
If they are being used for counts (like here), there are some options:
1. Try to rewrite some of those to operate as terms aggs. E.g. if multiple filters share the same field (event.module or something), a terms agg will give you doc counts for all the different event modules. Terms is pretty aggressively optimized because it is so widely used. It's hard to say for sure if it would help, but from some informal testing (see rally test at end) it tends to be noticeably faster.
2. For fields that are non-overlapping and sparse, a value_count agg can be useful. E.g. if only a subset of docs have a certain field and you want to know how many there are, a value_count on that field will return the count without having to bucket them. A relatively niche usage here, but handy if applicable.
3. Rewrite into an msearch and skip aggregating altogether. Each msearch clause will be a single search request filtering for specifically the criteria needed. With size: 0 you don't incur any fetch overhead, and with track_total_hits: true you can still get the total count.
3b. If you don't need exact counts, setting track_total_hits: false will enable the new block-max WAND optimization and return results very fast. You can also configure a threshold at which it stops counting, so you can say "> 100,000 results", etc.
I ran a simple test showing msearch ("count"), filter, filters, term and value_count. As you can see, msearch is fastest by a large margin, followed by term and value_count. Filter/filters are generally slower
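To make the msearch option concrete, here is a minimal sketch in Python that builds the NDJSON body for a count-only _msearch. The function name and the index name in the usage note are hypothetical; only size, track_total_hits and the header/body line layout come from the search API itself.

```python
import json


def build_count_msearch(index, queries, track_total_hits=True):
    """Build an NDJSON _msearch body with one count-only search per query.

    track_total_hits may be True (exact count), False (no hit counting),
    or an integer threshold at which counting stops.
    """
    lines = []
    for query in queries:
        # Header line: which index this sub-search targets.
        lines.append(json.dumps({"index": index}))
        # Body line: size 0 skips the fetch phase entirely.
        lines.append(json.dumps({
            "size": 0,
            "track_total_hits": track_total_hits,
            "query": query,
        }))
    # The _msearch body must end with a newline.
    return "\n".join(lines) + "\n"
```

For example, `build_count_msearch("my-siem-index", [{"term": {"event.module": "a"}}, {"term": {"event.module": "b"}}], track_total_hits=100000)` produces four lines. Each response then carries its count in hits.total, which reports a lower bound rather than an exact value once the threshold is reached.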
terms instead of filter for partitioning
Related to 1) above, if there is a scenario where you wish to partition the same field into multiple buckets, a terms agg will be faster (and a simpler query) than a series of filter aggs. For example, this request uses two filter aggs to create "success" and "failure" buckets.
Instead, a single terms agg on the field will produce both buckets and do it more cheaply. In addition, the child filter: event.outcome: success agg is unnecessary because, by the nature of the parent bucket, all docs in that bucket already have the matching outcome (success or failure). You can just grab the count from the bucket's doc_count.
If there are unrelated values in the field and you only want "success"/"failure", you can use the include/exclude functionality of a terms agg to only include terms you care about.
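A minimal sketch of that rewrite, in Python with plain dicts (the agg name "outcomes" and both helper names are made up for illustration):

```python
def outcome_terms_agg(field="event.outcome", include=None):
    """One terms agg that buckets every value of `field`.

    `include` optionally limits the buckets to an exact list of terms.
    """
    terms = {"field": field}
    if include:
        terms["include"] = list(include)
    return {"size": 0, "aggs": {"outcomes": {"terms": terms}}}


def counts_from_response(response, agg_name="outcomes"):
    """Read per-term counts straight from each bucket's doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Calling `outcome_terms_agg(include=["success", "failure"])` yields a single request, and no child filter agg is needed because the counts come from the buckets themselves.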
AutoDateHistogram min_interval
There's some optimization work done in ES (coming 7.8/7.9) which will improve auto-date-histo speed noticeably. But in the meantime, specifying a min_interval (the agg's minimum_interval parameter) will help prevent extra work. E.g. auto-date-histo will start with second-level intervals and round up from there. If you're querying a 12h time range, it almost never makes sense to look at second-level intervals, so that part of the rounding is wasted effort.
This does remove some of the "fire and forget" convenience of auto-date-histo, but it can translate into notable performance improvements. I'm not sure what the best option is here, but if there's a way to intelligently set min_interval it'd probably help.
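One possible heuristic, sketched in Python: since the agg returns at most `buckets` buckets, the final interval should not be much finer than the time range divided by the bucket count, so the largest calendar unit that still fits under that value looks like a safe minimum. This is an assumption on my part, not something we've benchmarked, and the helper names are hypothetical. It also assumes the data roughly spans the queried range.

```python
from datetime import timedelta

# Units accepted by minimum_interval, coarsest first, with deliberately
# generous lengths in seconds so that we never pick too coarse a unit.
_UNITS = [
    ("year", 366 * 86400),
    ("month", 31 * 86400),
    ("day", 86400),
    ("hour", 3600),
    ("minute", 60),
    ("second", 1),
]


def pick_minimum_interval(time_range, buckets=10):
    """Pick the coarsest unit that is no longer than time_range / buckets."""
    per_bucket = time_range.total_seconds() / buckets
    for unit, seconds in _UNITS:
        if seconds <= per_bucket:
            return unit
    return "second"


def auto_date_histogram(field, time_range, buckets=10):
    """Build an auto_date_histogram agg with a range-aware minimum_interval."""
    return {
        "auto_date_histogram": {
            "field": field,
            "buckets": buckets,
            "minimum_interval": pick_minimum_interval(time_range, buckets),
        }
    }
```

For the 12h example with the default 10 buckets, this picks "hour", so the second- and minute-level rounding passes are skipped.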
Closing
Sorry for the long ticket! I decided to file this as a ticket instead of email/slack/google doc/etc because it seemed easier to work through on github. Feel free to ping me if you have questions, happy to help out! It's hard to say for sure if any of these suggestions will actually help (although the msearch case is very compelling due to how it works), which is why I led with the question about benchmarks. Setting those up might be a good first step so we can quantitatively tweak the queries/aggs.