Reduce the overhead of timeouts and low-level search cancellation. #25776
Conversation
Setting a timeout or enforcing low-level search cancellation used to make us wrap the collector and check either the current time or whether the search task was cancelled for every collected document. This can be significant overhead on cheap queries that match many documents. This commit changes the approach to wrap the bulk scorer rather than the collector and exponentially increase the interval between two consecutive checks in order to reduce the overhead of those checks.
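For illustration, here is a minimal sketch of the wrapping idea (a hypothetical class, not the exact code in this PR): a BulkScorer wrapper that scores documents in chunks and only runs the timeout/cancellation check between chunks, doubling the chunk size after every check. The `check` callback is an assumption standing in for the clock comparison or task-cancellation test.

import java.io.IOException;

import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.util.Bits;

// Hypothetical sketch of the approach, not the PR's exact code.
final class CheckingBulkScorer extends BulkScorer {

    private final BulkScorer in;
    private final Runnable check; // throws an unchecked exception on timeout/cancellation

    CheckingBulkScorer(BulkScorer in, Runnable check) {
        this.in = in;
        this.check = check;
    }

    @Override
    public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException {
        int interval = 1 << 11; // see INITIAL_INTERVAL below
        while (min < max) {
            check.run(); // one check per chunk instead of one per collected document
            final int upTo = (int) Math.min((long) min + interval, max);
            min = in.score(collector, acceptDocs, min, upTo);
            if (interval < (1 << 30)) {
                interval <<= 1; // exponentially grow the gap between two consecutive checks
            }
        }
        return min;
    }

    @Override
    public long cost() {
        return in.cost();
    }
}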
I left some comments but I like the approach. I think we need to make sure that the new behavior does not add extra overhead with multiple segments (throwing a CollectionTerminatedException per segment rather than a single TimeExceededException).
@@ -134,6 +150,43 @@ public Weight createWeight(Query query, boolean needsScores, float boost) throws
    }

    @Override
    protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector) throws IOException {
        final Weight cancellablWeight;
nit: missing e
good catch
final long time = counter.get();
if (time > maxTime) {
    queryResult.searchTimedOut(true);
    throw new CollectionTerminatedException();
You could use a TimeExceededException to stop the collection on all segments? Otherwise you also need to check the timeout when the leafCollector is created, like the CancellableCollector does. If you prefer the second option you can remove the try/catch around the searcher.search below, since this code no longer throws TimeExceededException.
Do you think it is necessary? The number of segments should be bounded, so checking all of them should not be much more costly than stopping all of them at once, and it allows keeping things a bit simpler?
I only did things this way for cancellation so that we still check on a per-segment basis if low-level cancellation is disabled.
I don't think we should use the CollectionTerminatedException for this purpose. We have special handling for this exception in the collectors, but that is for the leaf level only.
When the timeout is detected we should be able to stop the search immediately, but if we have to build every scorer first it might be costly. Using a different exception that we catch at the higher level, when we call searcher.search, feels simpler to me, and you don't need two levels of cancellation?
Oops, it looks like our comments crossed. Yes, I agree with you; I had not fully understood what you meant in your previous comment and thought it would be more complicated.
For the record, I did not reuse the existing TimeExceededException because its constructor is private.
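As a rough sketch of what that alternative looks like (hypothetical names: TimeExceededException here is a new private exception rather than Lucene's, and TimeoutAwareSearch is an invented wrapper): the per-chunk check throws an unchecked exception that is caught where the search was started, stopping collection across all remaining segments at once.

import java.io.IOException;

import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Hypothetical sketch, assuming the wrapped bulk scorer throws this exception.
final class TimeoutAwareSearch {

    // private stand-in for Lucene's TimeExceededException, whose constructor is private
    static final class TimeExceededException extends RuntimeException {
    }

    private boolean timedOut;

    void search(IndexSearcher searcher, Query query, Collector collector) throws IOException {
        try {
            searcher.search(query, collector);
        } catch (TimeExceededException e) {
            // the exception propagates past every remaining segment at once
            timedOut = true;
        }
    }
}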
// we use the BooleanScorer window size as a base interval in order to make sure that we do not
// slow down boolean queries
private static final int INITIAL_INTERVAL = 1 << 11;
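To see why starting at 1 << 11 (2048, BooleanScorer's window size) and doubling keeps the overhead small: the number of checks grows logarithmically with the number of collected documents rather than linearly. A quick standalone illustration (plain Java, not code from this PR):

public class IntervalDemo {
    public static void main(String[] args) {
        long docs = 10_000_000;  // documents collected on one segment
        long interval = 1 << 11; // base interval, BooleanScorer's window size
        long scored = 0;
        int checks = 0;
        while (scored < docs) {
            checks++;            // one timeout/cancellation check per chunk
            scored += interval;
            interval *= 2;       // double the gap after every check
        }
        // prints "13 checks for 10000000 docs" instead of one check per document
        System.out.println(checks + " checks for " + docs + " docs");
    }
}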
++
Nice!
@jpountz what is the bug here? Should this be marked as "enhancement" instead? Is there an issue with the existing behavior?
I guess I saw it as a performance bug. I'm fine with making it an enhancement instead.
@jimczi I pushed a new commit that should address your concern.
LGTM
@dakrone I made it an enhancement.