
Reduce the overhead of timeouts and low-level search cancellation. #25776

Merged
merged 3 commits into elastic:master from fix/cancellation_overhead
Jul 19, 2017

Conversation

jpountz
Contributor

@jpountz jpountz commented Jul 18, 2017

Setting a timeout or enforcing low-level search cancellation used to make us
wrap the collector and check either the current time or whether the search
task was cancelled for every collected document. This can be significant
overhead on cheap queries that match many documents.

This commit changes the approach to wrap the bulk scorer rather than the
collector and exponentially increase the interval between two consecutive
checks in order to reduce the overhead of those checks.
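The approach described above can be sketched as follows. This is an illustrative, self-contained model of the idea, not the actual Elasticsearch/Lucene code: `CancellableScoring`, `scoreAll`, and `CheckCancelled` are made-up names. The wrapped bulk scorer runs the cancellation/timeout check once per window of documents rather than once per document, and doubles the window after each check so the amortized overhead shrinks as collection proceeds.

```java
// Illustrative sketch (not the real implementation): score documents in
// windows whose size doubles after each cancellation/timeout check, so
// cheap queries matching many documents pay only O(log n) checks.
public class CancellableScoring {

    // Base window matches Lucene's BooleanScorer window (2^11 = 2048) so
    // that boolean queries, which already score in such windows, are not
    // slowed down.
    static final int INITIAL_INTERVAL = 1 << 11;

    // Stands in for "check the current time or whether the task was cancelled".
    interface CheckCancelled {
        void check();
    }

    // "Scores" maxDoc documents in exponentially growing windows and returns
    // how many cancellation checks were performed along the way.
    static int scoreAll(int maxDoc, CheckCancelled cancelled) {
        int checks = 0;
        int interval = INITIAL_INTERVAL;
        int doc = 0;
        while (doc < maxDoc) {
            cancelled.check();
            checks++;
            int upTo = (int) Math.min((long) doc + interval, maxDoc);
            // ... collect docs in [doc, upTo) without any checks ...
            doc = upTo;
            // Double the window, guarding against int overflow.
            interval = (int) Math.min((long) interval << 1, Integer.MAX_VALUE);
        }
        return checks;
    }

    public static void main(String[] args) {
        // 10M matching docs: a handful of checks instead of one per document.
        System.out.println(scoreAll(10_000_000, () -> {})); // prints 13
    }
}
```

With 10,000,000 documents the windows sum to 2048 × (2^n − 1), so only 13 checks fire, versus 10,000,000 with the old per-document wrapping.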

Contributor

@jimczi jimczi left a comment


I left some comments, but I like the approach. I think we need to make sure that the new behavior does not add extra overhead with multiple segments (throwing a CollectionTerminatedException rather than a TimeExceededException).

@@ -134,6 +150,43 @@ public Weight createWeight(Query query, boolean needsScores, float boost) throws
}

@Override
protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector) throws IOException {
final Weight cancellablWeight;
Contributor


nit: missing e

Contributor Author


good catch

final long time = counter.get();
if (time > maxTime) {
queryResult.searchTimedOut(true);
throw new CollectionTerminatedException();
Contributor


You could use a TimeExceededException to stop the collection on all segments. Otherwise you also need to check the timeout when the leafCollector is created, like the CancellableCollector does. If you prefer the second option, you can remove the try/catch around the searcher.search below, since this code does not throw TimeExceededException anymore.

Contributor Author


Do you think it is necessary? The number of segments should be bounded, so checking all of them should not be much more costly than stopping all of them at once, and it allows us to keep things a bit simpler.

I only did things this way for cancellation so that we still check on a per-segment basis if low-level cancellation is disabled.

Contributor


I don't think we should use CollectionTerminatedException for this purpose. We have special handling for this exception in the collectors, but that is for the leaf level only.
When the timeout is detected we should be able to stop the search immediately, but if we have to build every scorer first it might be costly. Using a different exception that we catch at the higher level, when we call searcher.search, feels simpler to me, and you don't need two levels of cancellation.

Contributor Author


Oops, it looks like our comments crossed. Yes, I agree with you; I had not fully understood what you meant in your previous comment and thought it would be more complicated.

Contributor Author


For the record, I did not reuse the existing TimeExceededException because its constructor is private.
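The pattern settled on above can be sketched like this. It is an illustrative, self-contained model, not the actual Elasticsearch code: `TimeoutDemo` and `checkTimeout` are made-up names, and the custom `TimeExceededException` reflects the comment above that Lucene's own `TimeLimitingCollector.TimeExceededException` could not be reused because its constructor is private.

```java
// Illustrative sketch: a dedicated unchecked exception stops collection
// across all segments and is caught at the level where searcher.search
// is invoked, where the partial results are then marked as timed out.
public class TimeoutDemo {

    // Custom exception, since Lucene's TimeLimitingCollector.TimeExceededException
    // has a private constructor and cannot be thrown from our own code.
    static class TimeExceededException extends RuntimeException {}

    static boolean timedOut = false;

    // Stands in for the per-window check inside the wrapped bulk scorer.
    static void checkTimeout(long now, long maxTime) {
        if (now > maxTime) {
            throw new TimeExceededException();
        }
    }

    public static void main(String[] args) {
        try {
            // Stands in for searcher.search(...): the check fires mid-collection.
            checkTimeout(/* now = */ 101, /* maxTime = */ 100);
        } catch (TimeExceededException e) {
            // Record the timeout instead of failing the whole request,
            // analogous to queryResult.searchTimedOut(true) above.
            timedOut = true;
        }
        System.out.println(timedOut); // prints true
    }
}
```

Catching the exception in one place avoids the two levels of cancellation discussed above: no per-leaf CollectionTerminatedException handling is involved, and no per-segment timeout check is needed when the leaf collector is created.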


// we use the BooleanScorer window size as a base interval in order to make sure that we do not
// slow down boolean queries
private static final int INITIAL_INTERVAL = 1 << 11;
Contributor


++

Contributor

@imotov imotov left a comment


Nice!

@dakrone
Member

dakrone commented Jul 18, 2017

@jpountz what is the bug here? should this be marked as "enhancement" instead? Is there an issue with the existing behavior?

@jpountz
Contributor Author

jpountz commented Jul 19, 2017

I guess I saw it as a performance bug. I'm fine with making it an enhancement instead.

@jpountz
Contributor Author

jpountz commented Jul 19, 2017

@jimczi I pushed a new commit that should address your concern.

Contributor

@jimczi jimczi left a comment


LGTM

@jpountz jpountz merged commit 55ad318 into elastic:master Jul 19, 2017
@jpountz jpountz deleted the fix/cancellation_overhead branch July 19, 2017 12:15
@jpountz
Contributor Author

jpountz commented Jul 19, 2017

@dakrone I made it an enhancement.

jpountz added a commit that referenced this pull request Jul 21, 2017
@clintongormley clintongormley added v6.0.0-beta1 :Search/Search Search-related issues that do not fall into other categories and removed v6.0.0 labels Jul 25, 2017
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v5.6.0 v6.0.0-beta1