Term query on _index field does not take aliases into account #23306

uschindler · 2017-02-22T12:38:52Z

While trying to remove indicesQuery (deprecated by #17710) in my own code base, I noticed the following problem: The indicesQuery supports to pass any name of an index to the query, so also aliases work (of course it does not work with filtered aliases, doesn't it? - but that does not matter for the problem).

In my case I needed a query that returns results of a query for a specific index only (only alias name known, not its real name) and a matchAll query for the other indexes. Setup was:

alias named "allMyStuff" with 3 indexes. Each of the three indexes have an internal, timestamped name, but there also exists an alias pointing to the most recent index. The client code only knows the alias, not the real index name.
query executes on "allMyStaff" alias and filters the indices to query using indicesQuery.

This works with indicesQuery, specifiying the alias to only run the specific query on and return matchAll for all the others. With a term query on "_index" field, you cannot use the alias anymore. It only return documents if the index matches completely, so termQuery("_index", "alias") does not match anything.

So I don't agree with deprecating and removing indicesQuery unless this can be fixed.

clintongormley · 2017-02-22T14:59:39Z

Good point @uschindler - thanks

uschindler · 2017-02-22T15:24:51Z

I think the fix would be quite easy by using the same matching algorithm like indicesQuery does on the doToQuery() method:

@Override
protected Query doToQuery(QueryShardContext context) throws IOException {
    if (context.matchesIndices(indices)) {
        return innerQuery.toQuery(context);
    }
    return noMatchQuery.toQuery(context);
}

I think, IndexFieldType should do something similar, here the example for termQuery method (similar in the other methods like termsQuery):

@Override
public Query termQuery(Object value, @Nullable QueryShardContext context) {
    if (context.matchesIndices(new String[] { value.toString() /*or some other conversion of the term*/))) {
        return Queries.newMatchAllQuery();
    } else {
        return Queries.newMatchNoDocsQuery("Index didn't match. Index queried: " + context.index().getName() + " vs. " + value);
    }
}

I can try to provide a PR about this!

uschindler · 2017-02-22T15:27:20Z

There is still the question what to do with "filtered" aliases. Those may already broken with indicesQuery (not verified - if the filter of alias is applied afterwards, it would work). The "correct" way would be to apply the filter instead of matchAllDocs.

uschindler · 2017-02-22T18:40:56Z

...not so easy to fix because context.matchesIndices disappeared in master branch.

javanna · 2017-03-20T19:00:36Z

hey @uschindler you raise a good point which we have overlooked. The indices query bugged us for quite a while and we weren't sure how many people were using it. We need to fix the alias problem, but that doesn't require restoring the indices query. The fact that you'd need the matchesIndices method back for your fix means that part of the code removed should be restored, see 103984a . The method was removed as it was left unused after the indices query removal.

Let us discuss first though whether we come up with different ideas to address this.

uschindler · 2017-03-20T19:12:24Z

What was the problem with the indices query? Quite simple and straight-forward!

sjmc7 · 2017-05-18T15:45:47Z

We were also using the indices query with aliases so would be good to see this fixed.

carrino · 2017-10-20T18:44:15Z

+1 Was using indices query with aliases, also. Ran into this when migrating to 6.0.0-rc1

carrino · 2017-10-20T23:53:56Z

In addition to the alias problem even if you pass all the indexes the alias could resolve to, I found another issue.

Before, the IndicesQuery would fully encapsulate the inner query and it would not get parsed on an index of the wrong type. This means you could do a query one index with some fields and another index with another set of fields wrapping each in an IndicesQuery.

If you convert this to a boolean with must clauses, all the boolean query is parsed even if the must index clause would fail. This parsing can lead to number_format_exceptions.

uschindler · 2017-10-21T12:40:04Z

Hi @carrino: I agree with you. I had the same problem. I don't fully understand why the Elasticsearch team deprecated the indices query. It is very useful and a "term query on the index field" is by far no replacement!!!

I'd suggest to unddo the deprecation and keep the query as is. Otherwise I have to write a plugin that restores the query, which is way stupid. The whole class is only a few lines of code and brings so much phantastic usage opportunities. And there is no reason to remove it from the perspective of the backend.

carrino · 2017-10-25T16:55:56Z

I was able to work around this limitation by changing bool query to short circuit if one of it's filter or must clauses finds a MatchNoDocsQuery. I don't see any other query types that can be used to encapsulate a query on an index besides boolean being modified in this way. Any other ideas?

diff --git a/core/src/main/java/org/elasticsearch/index/query/BoolQueryBuilder.java b/core/src/main/java/org/elasticsearch/index/qu
index ac57c2a..11c714e 100644
--- a/core/src/main/java/org/elasticsearch/index/query/BoolQueryBuilder.java
+++ b/core/src/main/java/org/elasticsearch/index/query/BoolQueryBuilder.java
@@ -23,6 +23,7 @@ import org.apache.lucene.search.BooleanClause;
 import org.apache.lucene.search.BooleanClause.Occur;
 import org.apache.lucene.search.BooleanQuery;
 import org.apache.lucene.search.MatchAllDocsQuery;
+import org.apache.lucene.search.MatchNoDocsQuery;
 import org.apache.lucene.search.Query;
 import org.elasticsearch.Version;
 import org.elasticsearch.common.ParseField;
@@ -376,10 +377,16 @@ public class BoolQueryBuilder extends AbstractQueryBuilder<BoolQueryBuilder> {
     @Override
     protected Query doToQuery(QueryShardContext context) throws IOException {
         BooleanQuery.Builder booleanQueryBuilder = new BooleanQuery.Builder();
-        addBooleanClauses(context, booleanQueryBuilder, mustClauses, BooleanClause.Occur.MUST);
+        if (!addBooleanClauses(context, booleanQueryBuilder, filterClauses, BooleanClause.Occur.FILTER)) {
+            return new MatchNoDocsQuery();
+        }
+
+        if (!addBooleanClauses(context, booleanQueryBuilder, mustClauses, BooleanClause.Occur.MUST)) {
+            return new MatchNoDocsQuery();
+        }
+
         addBooleanClauses(context, booleanQueryBuilder, mustNotClauses, BooleanClause.Occur.MUST_NOT);
         addBooleanClauses(context, booleanQueryBuilder, shouldClauses, BooleanClause.Occur.SHOULD);
-        addBooleanClauses(context, booleanQueryBuilder, filterClauses, BooleanClause.Occur.FILTER);
         BooleanQuery booleanQuery = booleanQueryBuilder.build();
         if (booleanQuery.clauses().isEmpty()) {
             return new MatchAllDocsQuery();
@@ -395,7 +402,7 @@ public class BoolQueryBuilder extends AbstractQueryBuilder<BoolQueryBuilder> {
         return adjustPureNegative ? fixNegativeQueryIfNeeded(query) : query;
     }
 
-    private static void addBooleanClauses(QueryShardContext context, BooleanQuery.Builder booleanQueryBuilder,
+    private static boolean addBooleanClauses(QueryShardContext context, BooleanQuery.Builder booleanQueryBuilder,
                                           List<QueryBuilder> clauses, Occur occurs) throws IOException {
         for (QueryBuilder query : clauses) {
             Query luceneQuery = null;
@@ -409,8 +416,12 @@ public class BoolQueryBuilder extends AbstractQueryBuilder<BoolQueryBuilder> {
                     luceneQuery = query.toFilter(context);
                     break;
             }
+            if ((occurs == Occur.FILTER || occurs == Occur.MUST) && luceneQuery.equals(new MatchNoDocsQuery())) {
+                return false;
+            }
             booleanQueryBuilder.add(new BooleanClause(luceneQuery, occurs));
         }
+        return true;
     }
 
     @Override

javanna · 2017-10-27T13:15:42Z

We finally discussed this, sorry @uschindler it took us a while to get to this! We want to fix this, but instead of restoring the removed indices query, we would prefer to try and make term query against _index work with aliases.

carrino · 2017-10-27T14:43:49Z

Did you talk about the query parse encapsulation aspect at all? Moving from indicesQuery to bool query with termQuery can result in parse errors. Do we want to make the booleanQueryBuilder change above to prevent this regression?

javanna · 2017-10-27T14:50:55Z

We have yet to look at implementation details, we need to try it out in practice and see what comes out code-wise. Stay tuned.

uschindler · 2017-10-27T15:55:16Z

I agree with @carrino - the good thing with the indices query was the fact that the query part that was not used for the current shard/index was not parsed at all. It was therefor possible to query 2 completely different indexes with incompatible schemas and have a separate inner query on both sides of the indices query. With the current "bool" approach, the full query is parsed for all shards and those shards that are from an incompatible index wil fail if you hit a query with field name checking (which is the default for most queries now).

In fact the indices query was something like an if/then/else statement in the query where you were able to pass a different query to an index based on the name. With a bool query that's not possible, as the parsing stage is done before execution.

acarstoiu · 2017-12-05T14:41:58Z

@uschindler, @carrino: As a workaround, perform the index filtering in the URL path and apply the index-specific queries ANDed with a term filter on _type. Be aware that has_parent/has_child predicates need to be run with ignore_unmapped set.
I know that this won't work in all cases, but you have a good chance. This solves also the matter with filtered aliases.

@ developers: Thank you for keeping our minds sharp, but people use Elasticsearch precisely because they want to get rid of the workarounds and limitations they had when searching in traditional databases.

carrino · 2018-02-26T23:14:18Z

Can we add the "regression" label to this also as it is a regression with es6 and is blocking upgrading?

elasticmachine · 2018-03-26T03:17:14Z

Pinging @elastic/es-search-aggs

tomcallahan · 2018-07-05T21:03:34Z

@talevy can you elaborate why this was labeled discuss?

talevy · 2018-07-09T14:33:04Z

@tomcallahan for sure. this latest comment #23306 (comment) didn't make it clear that we have a known plan on how to implement this, and so during our issue cleanup I thought it was worth bringing back up to make sure there is a plan. No other reason. If it is clear that we want to do this, and there is no discussion necessary, and only implementation details, then I'm happy to remove the label

tomcallahan · 2018-07-09T14:41:33Z

OK, let's take off the label until we know we have something specific to examine

barrucadu · 2019-06-04T11:20:12Z

Is there any plan to fix this?

I've just hit this issue migrating to ES6 and have come up with the solution of, when constructing a query, first making a request to ES to get the real name of an alias. An additional round-trip per query isn't great.

Our index names don't change often, so in principle we could cache them. But when they do we don't have a great way of signalling to the process that it needs to clear its cache, short of restarting it.

The format migrator prevents users from getting duplicate results when formats are migrated into the "govuk" index from the "government" or "detailed" ones: it ensures that only non-migrated formats are returned from the "government"/"detailed" indices and only migrated formats are returned from the "govuk" index. Previously it did this with an `indices` query. They look like this: { indices: { indices: ["list", "of", "indices"] , query: { query to perform against docs from the indices } , no_match_query: { query to perform against other docs } } } However, the `indices` query has been deprecated in ES5 and removed in ES6, in favour of just searching over the `_index` field in docs directly. This PR changes the `indices` query into a `should` with two clauses where one clause matches documents in the "govuk" index and the other clause matches documents in the other indices. It's a bit more cumbersome because the `terms` query I'm using doesn't handle index aliases[1], and involves another round-trip to elasticsearch when constructing the query to find out the real name of the "govuk" alias. Index real names only change very infrequently (whenever we reindex the cluster), so in principle they could be cached. But we don't currently have a good way for the reindexing rake task to signal to the web processes that they need to purge their cache. [1] elastic/elasticsearch#23306

jtibshirani · 2019-08-17T01:17:59Z

I have been looking into a fix for this. It is slightly more tricky to support now that fully-qualified index names such as cluster:index can be provided in cross cluster search. As a simple example, a wildcard query like a*b*c becomes difficult to interpret -- we could try to resolve a*b as the cluster name and c as an index alias, or a as the cluster name, b*c as the alias.

We support wildcards in qualified index names in the _search URL -- to me it seems we should follow the same strategy for handling wildcards, first proposed in #23893. When specifying a remote index name, we would require that the term contain a : to allow for distinguishing the cluster and index components. Another option would be to disallow issuing wildcard queries against the _index field. To me this doesn't seem like a good choice because wildcard support can be quite useful and was added in response to a customer request (#25722). Tagging @javanna and @jimczi for their thoughts on the above.

Finally, I haven’t yet dug into the query parsing issue @carrino raised, but will make sure to consider it while developing a fix.

javanna · 2019-08-22T11:01:05Z

@jtibshirani thanks for looking into this. I don't think CCS comes into play here, or maybe I don't follow why it may. We have prohibited using : in index and aliases names, hence if an index expression contains : it must be because it's referring to remote clusters. On the expansion of a*b*c I think that should only be expanded to local indices., while if you get something like *:a*b*c it refers to remote clusters matching the cluster expression at the left of :. Does this make sense to you?

jtibshirani · 2019-09-11T23:54:42Z

@javanna we are in agreement, my explanation above was just confusing :) I opened #46640 to add support for aliases, and included a couple open questions in the PR description.

I am not sure of the best way forward in terms of the other discrepancy (where the indices query avoided parsing its subqueries unless they matched the current index). I'll mark this for discussion internally and follow-up with another comment or PR.

jtibshirani · 2019-09-24T18:17:51Z

I looked into the parsing issue and do not see a good way to address it with the current API. It has been quite some time since we deprecated and removed the indices query, so we are not in a good position to reverse that decision.

It would be very helpful to collect up-to-date information about the remaining problems that were cited (query parsing and filtered aliases). If anyone is still running into problems related to the indices query removal, please open a new issue -- this will let us dig into those problems directly and get a sense of their priority.

I am going to close the issue, since we merged #46640 to address the original problem ("Term query on _index field does not take aliases into account"), but will keep a lookout for new related issues!

uschindler · 2019-09-25T07:22:09Z

Hi @jtibshirani,

many thanks for fixing this! I agree for the discussion of other problems (like not evaluating the query trees that are not part of the currently seen index) to have separate issues.

I think with the current approach, filtered aliases should work (I have not tested them yet), as their filter is always applied as a filter on the whole query, so you would not see documents filtered away. With indices query it was more problematic regarding parsing of query parts.

The format migrator prevents users from getting duplicate results when formats are migrated into the "govuk" index from the "government" or "detailed" ones: it ensures that only non-migrated formats are returned from the "government"/"detailed" indices and only migrated formats are returned from the "govuk" index. Previously it did this with an `indices` query. They look like this: { indices: { indices: ["list", "of", "indices"] , query: { query to perform against docs from the indices } , no_match_query: { query to perform against other docs } } } However, the `indices` query has been deprecated in ES5 and removed in ES6, in favour of just searching over the `_index` field in docs directly. This PR changes the `indices` query into a `should` with two clauses where one clause matches documents in the "govuk" index and the other clause matches documents in the other indices. It's a bit more cumbersome because the `terms` query I'm using doesn't handle index aliases[1], and involves another round-trip to elasticsearch when constructing the query to find out the real name of the "govuk" alias. Index real names only change very infrequently (whenever we reindex the cluster), so in principle they could be cached. But we don't currently have a good way for the reindexing rake task to signal to the web processes that they need to purge their cache. [1] elastic/elasticsearch#23306

clintongormley added :Query DSL discuss labels Feb 22, 2017

javanna added >enhancement help wanted adoptme and removed discuss labels Oct 27, 2017

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018

talevy removed the :Search/Search Search-related issues that do not fall into other categories label Mar 26, 2018

talevy added discuss :Search/Search Search-related issues that do not fall into other categories labels Mar 26, 2018

talevy removed the discuss label Jul 9, 2018

ankane mentioned this issue May 7, 2019

Searching with multiple STI models broken since 4.0 ankane/searchkick#1259

Closed

barrucadu mentioned this issue Jun 4, 2019

Replace indices query with should query alphagov/search-api#1568

Merged

jtibshirani self-assigned this Aug 17, 2019

jtibshirani mentioned this issue Sep 11, 2019

Add support for aliases in queries on _index. #46640

Merged

jtibshirani added team-discuss and removed help wanted adoptme labels Sep 11, 2019

jtibshirani removed the team-discuss label Sep 23, 2019

jtibshirani closed this as completed Sep 24, 2019

jpountz mentioned this issue Nov 12, 2019

Search optimisation - add canMatch early aborts for queries on "_index" field #48681

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Term query on _index field does not take aliases into account #23306

Term query on _index field does not take aliases into account #23306

uschindler commented Feb 22, 2017

clintongormley commented Feb 22, 2017

uschindler commented Feb 22, 2017

uschindler commented Feb 22, 2017 •

edited

Loading

uschindler commented Feb 22, 2017

javanna commented Mar 20, 2017

uschindler commented Mar 20, 2017

sjmc7 commented May 18, 2017

carrino commented Oct 20, 2017 •

edited

Loading

carrino commented Oct 20, 2017

uschindler commented Oct 21, 2017

carrino commented Oct 25, 2017

javanna commented Oct 27, 2017

carrino commented Oct 27, 2017

javanna commented Oct 27, 2017

uschindler commented Oct 27, 2017

acarstoiu commented Dec 5, 2017

carrino commented Feb 26, 2018

elasticmachine commented Mar 26, 2018

tomcallahan commented Jul 5, 2018

talevy commented Jul 9, 2018

tomcallahan commented Jul 9, 2018

barrucadu commented Jun 4, 2019 •

edited

Loading

jtibshirani commented Aug 17, 2019

javanna commented Aug 22, 2019

jtibshirani commented Sep 11, 2019 •

edited

Loading

jtibshirani commented Sep 24, 2019

uschindler commented Sep 25, 2019

Term query on _index field does not take aliases into account #23306

Term query on _index field does not take aliases into account #23306

Comments

uschindler commented Feb 22, 2017

clintongormley commented Feb 22, 2017

uschindler commented Feb 22, 2017

uschindler commented Feb 22, 2017 • edited Loading

uschindler commented Feb 22, 2017

javanna commented Mar 20, 2017

uschindler commented Mar 20, 2017

sjmc7 commented May 18, 2017

carrino commented Oct 20, 2017 • edited Loading

carrino commented Oct 20, 2017

uschindler commented Oct 21, 2017

carrino commented Oct 25, 2017

javanna commented Oct 27, 2017

carrino commented Oct 27, 2017

javanna commented Oct 27, 2017

uschindler commented Oct 27, 2017

acarstoiu commented Dec 5, 2017

carrino commented Feb 26, 2018

elasticmachine commented Mar 26, 2018

tomcallahan commented Jul 5, 2018

talevy commented Jul 9, 2018

tomcallahan commented Jul 9, 2018

barrucadu commented Jun 4, 2019 • edited Loading

jtibshirani commented Aug 17, 2019

javanna commented Aug 22, 2019

jtibshirani commented Sep 11, 2019 • edited Loading

jtibshirani commented Sep 24, 2019

uschindler commented Sep 25, 2019

uschindler commented Feb 22, 2017 •

edited

Loading

carrino commented Oct 20, 2017 •

edited

Loading

barrucadu commented Jun 4, 2019 •

edited

Loading

jtibshirani commented Sep 11, 2019 •

edited

Loading