Optimising sorted scroll requests #23022
Comments
I think one way that we could make it work in the general case would be to start collecting with an open-ended range filter, and every X collected docs (X being a multiple of
In the worst case, where the index is sorted in the reverse order, this would make things slower than they are today, but in the average case (or in the best case, where the index is sorted in the same order) I think this could make things much faster?
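A conceptual sketch of how that could work, assuming (since the comment is cut off above) that "every X collected docs" the open-ended range filter is tightened to the worst value currently held in the top-N heap so that non-competitive docs can be skipped. This is illustrative Python, not the actual Lucene collector code, and all names are made up:

```python
import heapq

def collect_top_n(doc_values, n, update_every=1024):
    """doc_values: iterable of (doc_id, sort_value) pairs; returns the top n
    docs by ascending sort_value, periodically tightening a range bound."""
    heap = []            # max-heap via negated values: worst kept value on top
    upper_bound = None   # the range filter starts open-ended
    collected = 0
    for doc_id, value in doc_values:
        # Once a bound exists, docs that cannot compete are skipped entirely.
        if upper_bound is not None and value >= upper_bound:
            continue
        if len(heap) < n:
            heapq.heappush(heap, (-value, doc_id))
        elif -heap[0][0] > value:
            heapq.heapreplace(heap, (-value, doc_id))
        collected += 1
        # Every X collected docs, tighten the bound to the current worst
        # competitive value: with an index sorted in the same order this
        # quickly excludes almost everything that follows, while with a
        # reverse-sorted index it never helps and only adds overhead.
        if len(heap) == n and collected % update_every == 0:
            upper_bound = -heap[0][0]
    return sorted((-neg, doc_id) for neg, doc_id in heap)
```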
@elastic/es-search-aggs
Hi. Is this issue still open?
Yes. We have made progress for queries sorted by score when we introduced block-max WAND and we are about to make progress for queries sorted by field as well via #39770.
This issue is effectively implemented when sorting on
The performance of sorted scroll requests can be dominated by the time it takes to sort all documents on each tranche of hits. This can partially be amortised by increasing the `size` of the scroll request, but that strategy soon starts to fail for other reasons. Ultimately, the more documents you have, the longer it takes to sort them.

When sorting by e.g. date, it can be much more efficient to break a single scroll request up into chunks, so that each scroll request deals with a subset of docs within a certain date range. Anecdotal evidence on an index of 50M docs reports an improvement from 7h to 10 mins!
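For reference, a minimal sketch of that manual workaround against the plain REST API: one big sorted scroll is split into several smaller scrolls that each cover one date range, so each request only has to sort the docs inside its own range. The index name, field name, and month boundaries are placeholders, not anything from the issue:

```python
import requests

ES = "http://localhost:9200"
INDEX = "my-index"      # placeholder index name
FIELD = "@timestamp"    # placeholder date field used for sorting

def scroll_chunk(gte, lt, size=1000):
    """Scroll through the docs whose FIELD falls in [gte, lt), sorted ascending."""
    body = {
        "size": size,
        "sort": [{FIELD: "asc"}],
        "query": {"range": {FIELD: {"gte": gte, "lt": lt}}},
    }
    resp = requests.post(f"{ES}/{INDEX}/_search",
                         params={"scroll": "1m"}, json=body).json()
    scroll_id, hits = resp["_scroll_id"], resp["hits"]["hits"]
    while hits:
        yield from hits
        resp = requests.post(f"{ES}/_search/scroll",
                             json={"scroll": "1m", "scroll_id": scroll_id}).json()
        scroll_id, hits = resp["_scroll_id"], resp["hits"]["hits"]
    requests.delete(f"{ES}/_search/scroll", json={"scroll_id": [scroll_id]})

# Walk the data month by month instead of in one huge sorted scroll.
for gte, lt in [("2017-01-01", "2017-02-01"), ("2017-02-01", "2017-03-01")]:
    for hit in scroll_chunk(gte, lt):
        pass  # process hit
```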
It would be nice to be able to automate this internally within a single scroll request. The trickiest part is figuring out how big a chunk should be, given that data can be non-uniform. Simply asking the user wouldn't be sufficient: they might set a chunk of 1 hour, but an hour of missing data would simply return no results, falsely indicating the end of the scroll request.
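One way to avoid mistaking an empty chunk for the end of the data, sketched here purely as a client-side workaround rather than anything Elasticsearch does internally: fetch the min and max of the sort field first with an aggregation, then walk fixed-width windows between those two bounds, so a window that happens to contain no documents is simply skipped. This reuses the placeholder names and the `scroll_chunk` helper from the previous sketch:

```python
def field_bounds():
    """Min/max of FIELD as epoch milliseconds, via a min/max aggregation."""
    body = {
        "size": 0,
        "aggs": {
            "lo": {"min": {"field": FIELD}},
            "hi": {"max": {"field": FIELD}},
        },
    }
    aggs = requests.post(f"{ES}/{INDEX}/_search", json=body).json()["aggregations"]
    return aggs["lo"]["value"], aggs["hi"]["value"]

lo, hi = field_bounds()
window = 3_600_000          # one hour in epoch millis; picking this is the hard part
start = lo
while start <= hi:          # stop at the real end of the data,
    end = start + window    # not at the first window that returns nothing
    for hit in scroll_chunk(start, end):
        pass  # process hit
    start = end
```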
Here are a few possibilities:
- Add a range filter with `gt` set to the last sort value already returned, but not `lt` - the deeper you get, the fewer documents you would match.
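For that possibility, a rough sketch of what each successive tranche's request body could look like, reusing the placeholder field name from the earlier sketches; `last_seen` would be the sort value of the final hit returned by the previous tranche:

```python
def next_tranche_query(last_seen, field="@timestamp", size=1000):
    """Hypothetical helper: lower-bound-only filter, so each tranche deeper
    into the scroll matches fewer documents than the one before it."""
    return {
        "size": size,
        "sort": [{field: "asc"}],
        "query": {"range": {field: {"gt": last_seen}}},  # "gt" only, no "lt"
    }
```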