Optimising sorted scroll requests #23022

Closed
clintongormley opened this issue Feb 7, 2017 · 6 comments
Labels
>enhancement, help wanted, adoptme, high hanging fruit, :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)

Comments

@clintongormley
Contributor

The performance of sorted scroll requests can be dominated by the time it takes to sort all documents on each tranche of hits. This can partially be amortised by increasing the size of the scroll request, but that strategy soon starts to fail for other reasons. Ultimately the more documents you have, the longer it takes to sort them.

When sorting by e.g. date, it can be much more efficient to break a single scroll request up into chunks, so that each scroll request deals only with the subset of docs within a certain date range. Anecdotal evidence on an index of 50M docs reports an improvement from 7 hours to 10 minutes!
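To illustrate the chunking idea, here is a minimal sketch in Python. It is not Elasticsearch code: the in-memory `docs` list stands in for an index, the list comprehension stands in for a scroll request with a date range filter, and the names `chunked_sorted_scan`, `ts` and `step` are made up for this example. The point is only that each sort touches the docs of one window rather than the whole result set.

```python
from datetime import datetime, timedelta


def chunked_sorted_scan(docs, start, end, step):
    """Yield docs in ascending 'ts' order, one [lo, hi) window at a time."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        # Stand-in for one scroll request restricted to lo <= ts < hi.
        window = [d for d in docs if lo <= d["ts"] < hi]
        # Only the docs inside this window are sorted together.
        yield from sorted(window, key=lambda d: d["ts"])
        # An empty window does not end the scan: the caller knows the
        # overall bounds, so it simply moves on to the next window.
        lo = hi
```

This assumes the caller already knows the overall `start` and `end` of the data, which is exactly what a manual, client-side version of the approach relies on.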

It would be nice to be able to automate this internally within a single scroll request. The trickiest part is to figure out how big a chunk should be, given that data can be non-uniform. Simply asking the user for a chunk size wouldn't be sufficient: with a chunk of 1 hour, an hour of missing data would return no results, which would wrongly indicate the end of the scroll request.

Here are a few possibilities:

  • Use an open-ended range, setting gt but not lt - the deeper you get the fewer documents you would match
  • If the date field is the only field in the query (i.e. no intersections) then the BKD tree could be used to return "1000 docs with a value greater than X"
  • Use a shard-level auto-adjusting chunk size which could start small and increase the chunk size if too few documents are returned, or decrease the chunk size if too many are returned.
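The auto-adjusting chunk size from the last bullet could look roughly like the sketch below. This is only an illustration under stated assumptions: sort values are plain integers (e.g. epoch millis), the in-memory `values` list stands in for a shard, and the function name, the `target` parameter and the "more than twice the target" / "less than half the target" thresholds are all invented for this example.

```python
def adaptive_chunks(values, start, end, width, target):
    """Yield (lo, hi, hits) windows covering [start, end), adapting the width.

    `target` is the rough number of hits wanted per window (hypothetical knob).
    """
    lo = start
    while lo < end:
        hi = min(lo + width, end)
        # Stand-in for a range-filtered, sorted request on one shard.
        hits = sorted(v for v in values if lo <= v < hi)
        if len(hits) > 2 * target and width > 1:
            # Too many hits: halve the width and retry the same window.
            width = max(1, width // 2)
            continue
        yield lo, hi, hits
        if len(hits) < target // 2:
            # Too few hits (possibly none): widen the next window.
            width *= 2
        lo = hi
```

Because the loop runs until `end` rather than until the first empty window, a gap in the data only makes the windows grow; it does not look like the end of the scroll.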
@dimitris-athanasiou
Contributor

As @jimczi suggested, leveraging #9572 to determine a chunk size would be a great way to solve this.

@jpountz
Contributor

jpountz commented Feb 13, 2017

I think one way that we could make it work in the general case would be to start collecting with an open-ended range filter, and every X collected docs (X being a multiple of size):

  • stop collecting
  • replace the upper bound of the range filter with the value of the least competitive doc in the priority queue.
  • resume collecting (using an initial advance() call in order to skip the already visited docs)

In the worst case, where the index is sorted in the reverse order, this would make things slower than they are today, but in the average case (or in the best case, where the index is sorted in the same order) I think this could make things much faster?
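A rough sketch of the steps above, in Python rather than Lucene code, and only under simplifying assumptions: `values` is the per-doc sort value in index order, `after` is the lower bound carried over from the previous page, and `check_every` plays the role of X. The function name and the `skipped` counter are made up for illustration. A real implementation would use the tightened range filter together with advance() to jump over non-matching docs; this sketch scans linearly and merely counts the docs the filter would have excluded.

```python
import heapq


def top_k_with_tightening_bound(values, k, check_every, after=None):
    """Return (k smallest values greater than `after`, number of docs filtered)."""
    heap = []      # max-heap via negation: -heap[0] is the least competitive value
    upper = None   # the range filter starts open-ended (no upper bound)
    collected = 0
    skipped = 0
    for v in values:
        if after is not None and v <= after:
            continue  # below the lower bound of the range filter
        if upper is not None and v > upper:
            skipped += 1  # excluded by the tightened upper bound
            continue
        collected += 1
        if len(heap) < k:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            heapq.heapreplace(heap, -v)
        # Every `check_every` collected docs, replace the upper bound with
        # the value of the least competitive doc in the priority queue.
        if collected % check_every == 0 and len(heap) == k:
            upper = -heap[0]
    return sorted(-x for x in heap), skipped
```

With values already in ascending index order, the bound tightens after the first `check_every` docs and everything else is filtered out. With values in descending order every doc is more competitive than the current bound, so nothing is filtered and the periodic bound updates are pure overhead, which matches the worst case described above.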

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Scroll labels Feb 14, 2018
@andyb-elastic
Contributor

@elastic/es-search-aggs

@rajalakshmi-v15

Hi. Is this issue still open?

@jpountz
Contributor

jpountz commented Jul 19, 2019

Yes. We have made progress for queries sorted by score when we introduced block-max WAND and we are about to make progress for queries sorted by field as well via #39770.

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@jpountz
Contributor

jpountz commented Nov 18, 2022

This issue is effectively implemented when sorting on _doc, numeric and date fields. We would need #91680 to also support this on keyword fields.

@jpountz jpountz closed this as completed Nov 18, 2022