Composite aggs seems to sort too slowly with filter queries #70035
Comments
Pinging @elastic/es-analytics-geo (Team:Analytics)
OK, for curiosity's sake, I ran the datafeed without that simple filter query:
⚡ ⚡ ⚡ ⚡ Oh man, if we could get these speeds with filter queries!!!
Note: for machine learning datafeeds, the first composite agg source will always be a
I dug into the code some and discussed the various execution paths with @nik9000.
Lines 402 to 411 in 94b9c4b
Is the path we are hitting as:
But, it does seem weird to hit every document. Assume that the top source is a
Making that "slow path" faster will greatly improve throughput for transforms (which constantly use composite aggs with range queries and term filter queries) and datafeeds (which allow arbitrary user-provided filter queries). The nice thing about datafeeds is that the top source will ALWAYS be a
For transforms AND datafeeds, the
I leave this in more capable hands than mine :).
I've raised #92197 which might be able to help with this issue.
Closing as not planned.
Piggy-backing off of previous work: #28745
During the work in #69970 some troubling performance data has reared its ugly head.
Given the following query:
The following composite agg moves at an almost glacial pace:
Here are some doc stats:
In datafeeds we "chunk" through when scrolling through data. Consequently, we hit every document and make multiple queries. This is because sorting by `timestamp` can be costly when hitting many docs.
So, our scrolling datafeed had the following performance:
Job finished in ~6 minutes
Doing composite agg without chunking:
🐌 🐌 🐌
🐌 🐌 🐌
Job finished in 40+ minutes
It seems to me that the composite agg is doing WAY too much work. I think it may be sorting WAY too many documents given the sources.
As an experiment, I added some time-based query chunking in 25264688ms intervals (calculated based on term cardinality, count, and total time range).
🔥 🔥 🔥
🔥 🔥 🔥
Job finished in ~4 minutes
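For anyone wanting to try the same experiment, here is a rough sketch of what time-based query chunking could look like. It is not the datafeed code: the helper names `time_chunks` and `chunked_query` are made up for illustration, the interval is simply passed in (how it gets derived from cardinality, count, and time range is not shown), and the `timestamp` field name is assumed from the discussion above. Each chunk wraps the user's filter in a `bool` filter with an extra `range` clause, so every composite agg request only sees one window of the total time range.

```python
from typing import Any, Dict, Iterator, Tuple


def time_chunks(start_ms: int, end_ms: int, interval_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [chunk_start, chunk_end) windows covering [start_ms, end_ms)."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    chunk_start = start_ms
    while chunk_start < end_ms:
        # The last window is clipped so it never runs past end_ms.
        chunk_end = min(chunk_start + interval_ms, end_ms)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


def chunked_query(
    user_filter: Dict[str, Any],
    chunk_start: int,
    chunk_end: int,
    time_field: str = "timestamp",  # assumed field name
) -> Dict[str, Any]:
    """Combine the user's filter with a range clause for one time window.

    Both clauses sit in a bool "filter" context, so no scores are computed.
    """
    return {
        "bool": {
            "filter": [
                user_filter,
                {
                    "range": {
                        time_field: {
                            "gte": chunk_start,
                            "lt": chunk_end,
                            "format": "epoch_millis",
                        }
                    }
                },
            ]
        }
    }
```

The resulting query body would be sent along with the unchanged composite agg, once per window, paging through each window before moving to the next one.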
Datafeeds (and transforms) will ALWAYS use a `filter`-based query (ignoring scores). These queries are user provided, so they could definitely be anything. But it seems to me that there is still room for improvement in the composite agg.