use search_after in scroll #4280
Conversation
Force-pushed from 5666fa7 to 1a68823
quickwit/quickwit-search/src/root.rs
```diff
@@ -491,6 +491,7 @@ async fn search_partial_hits_phase_with_scroll(
         )
         .next_page(leaf_search_resp.partial_hits.len() as u64);

+        scroll_ctx.truncate_start();
```
My understanding is that you are calling truncate_start to handle the case where the request's max_hits is > SCROLL_BATCH_LEN. If so, I don't understand why it isn't truncate (as opposed to truncate_start) that we want to call here.
If this is the use case you are targeting, I suggest we (see the sketch after this list):
- do not cache anything if max_hits is too large
- return an error and log a warning if scroll is used with a max_hits that is too large.
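A minimal sketch of that guard, assuming illustrative names (check_scroll_max_hits, the constant's value, and the String error type are all stand-ins, not the actual quickwit API):

```rust
// assumed batch size; the real constant lives in quickwit-search
const SCROLL_BATCH_LEN: usize = 1_000;

/// Hypothetical guard: refuse to create a scroll context when the requested
/// page size exceeds what we are willing to cache, rather than caching a
/// batch that can only partially serve the first page.
fn check_scroll_max_hits(max_hits: u64) -> Result<(), String> {
    if max_hits as usize > SCROLL_BATCH_LEN {
        // quickwit would presumably log through tracing::warn!; eprintln! stands in here
        eprintln!("warn: scroll requested with max_hits={max_hits} > {SCROLL_BATCH_LEN}");
        return Err(format!("scroll does not support max_hits > {SCROLL_BATCH_LEN}"));
    }
    Ok(())
}
```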
The idea is that the first elements in the cache have already been consumed. To avoid storing too many elements (i.e., to reduce memory pressure), we throw elements away so as to keep at most SCROLL_BATCH_LEN elems.
This is arguably not that useful: we could keep only one element (to know the values to provide for search_after) and throw everything else away. A toy model of the idea follows.
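A rough sketch of that behavior, with a plain Vec of u64 standing in for quickwit's real hit cache (field names are assumptions):

```rust
const SCROLL_BATCH_LEN: usize = 1_000; // assumed value

/// Toy model of the scroll cache; a u64 stands in for a cached hit.
struct ScrollContext {
    cached_hits: Vec<u64>,
    /// Number of hits from the front already served to the client.
    start_offset: usize,
}

impl ScrollContext {
    /// Drop already-consumed hits from the front so the cache holds at most
    /// SCROLL_BATCH_LEN elements. Keeping just the last consumed hit would
    /// be enough to recover the search_after cursor.
    fn truncate_start(&mut self) {
        let excess = self.cached_hits.len().saturating_sub(SCROLL_BATCH_LEN);
        let to_drop = excess.min(self.start_offset);
        self.cached_hits.drain(..to_drop);
        self.start_offset -= to_drop;
    }
}
```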
Got it.
I'd rather remove the truncate_start. If I understand correctly, it is there to optimize memory.
The complexity added is non-trivial.
The truncation relies on a very bad part of the Elasticsearch spec: scrolling is not idempotent. Calling Elasticsearch twice with the same scroll id advances the scroll. (This is bad engineering because it prevents retries on all kinds of errors.)
In addition, the function has a strange behavior that its name does not suggest: it truncates, but attempts to leave at least one element.

> we throw away elements so as to keep at most SCROLL_BATCH_LEN elems.

Even without the truncation, I think we already keep at most SCROLL_BATCH_LEN elements today (maybe that is a change you made recently).
So do we want to keep scrolling idempotent? Before this PR, it was (though at the cost of each page being slower and more memory-intensive to create than the previous one).
After this PR, but with the original truncate_start (or without it being called at all, relying on search_after-based scrolling), it is idempotent in some cases, but not always (and never if a page holds more than SCROLL_BATCH_LEN hits).
With the last commit, it is rarely idempotent.
Not really a change request: I did not understand the truncate_start stuff. Please have a look at my comment.
As discussed offline, let's put the search_after key in the scroll key, and avoid caching when max_hits is too large.
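A sketch of that direction; only the type name ScrollKeyAndStartOffset and the Ulid usage appear in the actual diff below, the fields and constructor are assumptions:

```rust
use ulid::Ulid; // the diff below shows Ulid::new() being used

/// Hypothetical scroll key that carries the search_after cursor with it,
/// so serving the next page never requires replaying the consumed range.
struct ScrollKeyAndStartOffset {
    scroll_ulid: Ulid,
    start_offset: u64,
    /// Sort value of the last hit served; passed as the search_after cursor
    /// of the next underlying query. A u64 stands in for the real sort key.
    search_after: u64,
}

impl ScrollKeyAndStartOffset {
    fn new_scroll(start_offset: u64, last_sort_value: u64) -> Self {
        ScrollKeyAndStartOffset {
            scroll_ulid: Ulid::new(),
            start_offset,
            search_after: last_sort_value,
        }
    }
}
```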
```rust
) -> ScrollKeyAndStartOffset {
    let scroll_ulid: Ulid = Ulid::new();
    // technically we could only initialize search_after on first call to next_page, and use
    // default() before, but that feels like partial initilization.
```
Suggested change:

```suggestion
// default() before, but that feels like partial initialization.
```
Description
fix #3748
- Use search_after as an implementation detail in scroll. We lose the ability to replay a request with the same scroll_id if pages span more than 1k documents (which isn't something that's technically allowed anyway), but we gain that the underlying request is always for the same number of documents, instead of growing linearly like before.
- Fix a bug where asking for pages of more than 1k docs would yield a first page of 1k, and only return the right number of docs on subsequent pages.
- Make it so that we can always answer a page with at most a single underlying query (before, it could take up to 10 queries: if the page was for 10k docs, we would issue 10 requests for 1k docs each). A toy model of the two strategies follows this list.
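The following is not quickwit code: sorted u64 doc ids stand in for hits and their sort values, and both function names are illustrative.

```rust
/// Old strategy: page N requires the leaves to produce the whole
/// top-(offset + page_len) list, so work grows linearly with the position.
fn fetch_page_by_offset(docs: &[u64], offset: usize, page_len: usize) -> Vec<u64> {
    // materialize everything up to the end of the requested page...
    let top: Vec<u64> = docs.iter().take(offset + page_len).copied().collect();
    // ...then discard the first `offset` hits, which were already served
    top.into_iter().skip(offset).collect()
}

/// New strategy: pass the last doc of the previous page as a search_after
/// cursor, so each request produces exactly `page_len` hits.
/// Assumes `docs` is sorted ascending.
fn fetch_page_by_search_after(docs: &[u64], search_after: Option<u64>, page_len: usize) -> Vec<u64> {
    docs.iter()
        .copied()
        .filter(|&doc| search_after.map_or(true, |cursor| doc > cursor))
        .take(page_len)
        .collect()
}
```

With the cursor approach, the tenth page of a 1k-docs-per-page scroll still only asks the leaves for 1k hits, which is why a page can now always be answered with a single underlying query.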
How was this PR tested?
Tested manually and updated unit tests.