-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search optimisation - add canMatch early aborts for queries on "_index" field #48681
Conversation
Pinging @elastic/es-search (:Search/Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic looks good to me. The original issue also mentions terms
query and it should also be possible to add the logic to the prefix
query ?
server/src/main/java/org/elasticsearch/index/query/TermQueryBuilder.java
Outdated
Show resolved
Hide resolved
d7487dc
to
b8338e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me in general, can you add tests with aliases too?
// Special-case optimisation for canMatch phase: | ||
// We can skip querying this shard if the index name doesn't match the value of this query on the "_index" field. | ||
QueryShardContext shardContext = queryRewriteContext.convertToShardContext(); | ||
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to use the indexNameMatcher so that it works with aliases too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did wonder about supporting aliases but wouldn't that give false negatives? I assume the indexed term in the _index
field is only the physical index name so it would be a mistake to fail the canMatch phase if a query term happens to match an alias? My assumption was this issue is about optimising existing index name matching behaviour, not changing it to add alias support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be:
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) { | |
if (shardContext != null && shardContext.indexMatches(value + "*") == false) { |
?
b8338e2
to
64814e9
Compare
I've added tests but a couple of anomalies exist:
|
test this please |
f8f37d5
to
dde93fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left more comments.
Prefix and wildcard queries on "_index" field work with physical index names but not alias names (term and terms queries work with both physical and alias names). A known limitation?
It should work the same. I added a comment on the prefix query rewrite, maybe that's the reason you see it not working ?
"Skipped" counting logic in results is a bit of a mystery - always looks to be one less than what I would expect to see. The 70_skip_shards yaml test index needed 2 shards before it would count just one as skipped. What is the expected behaviour?
When all shards are skipped we use one to produce an empty result. The logic is explained here:
elasticsearch/server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java
Line 85 in 4e6756f
// this is a special case where we have no hit but we need to get at least one search response in order |
I guess we could avoid doing this if there are no aggregations but that's out of scope for this pr.
// Special-case optimisation for canMatch phase: | ||
// We can skip querying this shard if the index name doesn't match the value of this query on the "_index" field. | ||
QueryShardContext shardContext = queryRewriteContext.convertToShardContext(); | ||
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be:
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) { | |
if (shardContext != null && shardContext.indexMatches(value + "*") == false) { |
?
// Special-case optimisation for canMatch phase: | ||
// We can skip querying this shard if the index name doesn't match the value of this query on the "_index" field. | ||
QueryShardContext shardContext = queryRewriteContext.convertToShardContext(); | ||
if (shardContext != null && shardContext.indexMatches(BytesRefs.toString(value)) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BytesRefs.toString
is not needed ?
if (shardContext != null && shardContext.indexMatches(BytesRefs.toString(value)) == false) { | |
if (shardContext != null && shardContext.indexMatches(value) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike wildcardquery value
is an object not a string. I remember having a test failure where I assumed String and it was a BytesRef so had to add this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix change suggestion looks to have fixed things, thanks
1b3e0b3
to
9fe82da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left two minor comments but it looks good to me otherwise.
- do: | ||
search: | ||
ccs_minimize_roundtrips: false | ||
rest_total_hits_as_int: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this parameter is supposed to be for bw compat of the response format only, can you use track_total_hits=true instead?
return index.getName().startsWith(pattern.substring(0, pattern.length()-1)); | ||
} | ||
return pattern.equals(index.getName()); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason not to reuse Regex.simpleMatch
?
961344f
to
e21a561
Compare
test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too
…n index that doesn’t match the query expression. Part of the “canMatch” phase optimisations. Closes elastic#48473
* rest_total_hits_as_int -> track_total_hits * Use Regex.simpleMatch
5a79628
to
6ac08b1
Compare
test this please |
…x" field (elastic#48681) Make queries on the “_index” field fast-fail if the target shard is an index that doesn’t match the query expression. Part of the “canMatch” phase optimisations. Closes elastic#48473
Make queries on the “_index” field fast-fail if the target shard is an index name that doesn’t match the query expression. Part of the “canMatch” phase optimisations.
Closes #48473