Reduce the overhead of timeouts and low-level search cancellation. #25776
Conversation
Setting a timeout or enforcing low-level search cancellation used to make us wrap the collector and check either the current time or whether the search task was cancelled for every collected document. This can be significant overhead on cheap queries that match many documents. This commit changes the approach to wrap the bulk scorer rather than the collector and exponentially increase the interval between two consecutive checks in order to reduce the overhead of those checks.
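For illustration, here is a minimal sketch of the wrapping idea (a hypothetical class, not the exact code in this PR): a BulkScorer wrapper that scores documents in chunks and only runs the timeout/cancellation check between chunks, doubling the chunk size after every check. The `check` callback is an assumption standing in for the clock comparison or task-cancellation test.

import java.io.IOException;

import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.util.Bits;

// Hypothetical sketch of the approach, not the PR's exact code.
final class CheckingBulkScorer extends BulkScorer {

    private final BulkScorer in;
    private final Runnable check; // throws an unchecked exception on timeout/cancellation

    CheckingBulkScorer(BulkScorer in, Runnable check) {
        this.in = in;
        this.check = check;
    }

    @Override
    public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException {
        int interval = 1 << 11; // see INITIAL_INTERVAL below
        while (min < max) {
            check.run(); // one check per chunk instead of one per collected document
            final int upTo = (int) Math.min((long) min + interval, max);
            min = in.score(collector, acceptDocs, min, upTo);
            if (interval < (1 << 30)) {
                interval <<= 1; // exponentially grow the gap between two consecutive checks
            }
        }
        return min;
    }

    @Override
    public long cost() {
        return in.cost();
    }
}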
I left some comments but I like the approach. I think we need to make sure that the new behavior does not add extra overhead with multiple segments (throwing a CollectionTerminatedException per segment rather than a single TimeExceededException).
@@ -134,6 +150,43 @@ public Weight createWeight(Query query, boolean needsScores, float boost) throws
    }

    @Override
    protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector) throws IOException {
        final Weight cancellablWeight;
nit: missing e
good catch
final long time = counter.get();
if (time > maxTime) {
    queryResult.searchTimedOut(true);
    throw new CollectionTerminatedException();
You could use a TimeExceededException to stop the collection on all segments? Otherwise you also need to check the timeout when the leafCollector is created, like the CancellableCollector does. If you prefer the second option you can remove the try/catch around the searcher.search below, since this code no longer throws TimeExceededException.
Do you think it is necessary? The number of segments should be bounded, so checking all of them should not be much more costly than stopping all of them at once, and it allows keeping things a bit simpler?
I only did things this way for cancellation so that we still check on a per-segment basis if low-level cancellation is disabled.
I don't think we should use the CollectionTerminatedException for this purpose. We have special handling for this exception in the collectors, but that is for the leaf level only.
When the timeout is detected we should be able to stop the search immediately, but if we have to build every scorer first it might be costly. Using a different exception that we catch at the higher level, when we call searcher.search, feels simpler to me, and you don't need two levels of cancellation?
Oops, it looks like our comments crossed. Yes, I agree with you; I had not fully understood what you meant in your previous comment and thought it would be more complicated.
For the record, I did not reuse the existing TimeExceededException because its constructor is private.
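As a rough sketch of what that alternative looks like (hypothetical names: TimeExceededException here is a new private exception rather than Lucene's, and TimeoutAwareSearch is an invented wrapper): the per-chunk check throws an unchecked exception that is caught where the search was started, stopping collection across all remaining segments at once.

import java.io.IOException;

import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Hypothetical sketch, assuming the wrapped bulk scorer throws this exception.
final class TimeoutAwareSearch {

    // private stand-in for Lucene's TimeExceededException, whose constructor is private
    static final class TimeExceededException extends RuntimeException {
    }

    private boolean timedOut;

    void search(IndexSearcher searcher, Query query, Collector collector) throws IOException {
        try {
            searcher.search(query, collector);
        } catch (TimeExceededException e) {
            // the exception propagates past every remaining segment at once
            timedOut = true;
        }
    }
}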
// we use the BooleanScorer window size as a base interval in order to make sure that we do not
// slow down boolean queries
private static final int INITIAL_INTERVAL = 1 << 11;
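To see why starting at 1 << 11 (2048, BooleanScorer's window size) and doubling keeps the overhead small: the number of checks grows logarithmically with the number of collected documents rather than linearly. A quick standalone illustration (plain Java, not code from this PR):

public class IntervalDemo {
    public static void main(String[] args) {
        long docs = 10_000_000;  // documents collected on one segment
        long interval = 1 << 11; // base interval, BooleanScorer's window size
        long scored = 0;
        int checks = 0;
        while (scored < docs) {
            checks++;            // one timeout/cancellation check per chunk
            scored += interval;
            interval *= 2;       // double the gap after every check
        }
        // prints "13 checks for 10000000 docs" instead of one check per document
        System.out.println(checks + " checks for " + docs + " docs");
    }
}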
++
Nice!
@jpountz what is the bug here? Should this be marked as "enhancement" instead? Is there an issue with the existing behavior?
I guess I saw it as a performance bug. I'm fine with making it an enhancement instead.
@jimczi I pushed a new commit that should address your concern.
LGTM
@dakrone I made it an enhancement.