[6.8] Backport: Search optimisation - add canMatch early aborts for queries on "_index" field #80005
indexMatcher doesn't exist in the 6.8 branch.
Hello @mdaudali, we only backport bugfixes to the previous major version 6.8, since we only perform patch releases (which aren't supposed to include enhancements and new features). So unfortunately this change isn't eligible for backporting. Would you be able to refine the index pattern you're searching against to only include indices where the fields have compatible types? This is the usual approach we encourage.
Hey @jtibshirani! Thanks for the quick response. Regarding the bugfix backport policy: totally makes sense! In this case, however, I believe this to be a bugfix despite the original PR being proposed as an optimisation. The original behaviour causes an unexpected exception for a certain class of queries, namely queries that search on shared field names but are scoped to a single index with a bool-must filter. Regarding splitting the query up a priori: understood. However, a paging query that should span across N documents would need to be internally split and sent to ES as up to N different paging queries, and the set of results would need to be held in memory and sorted, before manually creating pages to return back to the user. Overall, that can end up being quite a fiddly workaround! Hope this makes sense, happy to clarify anything :) Thanks!
@mdaudali got it, I understand better why you consider this to be a 'bugfix' as well. I will check with the rest of the team to see what they think. Could you explain more about your use case? I didn't quite follow the description "However, a paging query that should span across N documents would need to be internally split and sent to ES as up to N different paging queries..." Some other important context: as our work ramps up for the 8.0 release, we can't be certain there will be more 6.8 patch releases. So even if we do decide to backport this, you might not be able to pick it up.
Pinging @elastic/es-search (Team:Search)
Thanks @jtibshirani! Apologies, I meant N different indices. In the ideal state (which this PR fixes), I can perform a paging query across indices A, B, C, D in a single query, and request and return, say, the top 100 results in each page. The suggested solution of running separate queries on indices (A, B) and (C, D) would require loading a number of pages from both sets, storing the results and sorting them appropriately, then providing a page back to the client representing the top 100 matches from A, B, C, D as originally requested. An intermediary would then also need to cache any additional entries that have already been loaded from ES, to be returned when a client requests the next page - essentially re-implementing ES paging functionality. This can be quite a complicated workaround even when merging just two paging calls together, and gets increasingly so when running more concurrent paging queries (when searching over more indices). Thanks!
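To make the cost of that workaround concrete, here is a minimal sketch of the client-side merge it implies. It assumes each sub-query returns hits already sorted by descending score; the `score` key, the sample hits, and the `merged_page` helper are hypothetical and not part of any Elasticsearch client.

```python
import heapq
from itertools import islice
from typing import Any, Dict, Iterable, List

Hit = Dict[str, Any]


def merged_page(sources: List[Iterable[Hit]], page: int, size: int) -> List[Hit]:
    """Return 0-based page `page` of the globally sorted hits.

    Every source must already be sorted by descending score, because
    heapq.merge only interleaves pre-sorted inputs.
    """
    merged = heapq.merge(*sources, key=lambda hit: hit["score"], reverse=True)
    start = page * size
    return list(islice(merged, start, start + size))


# Hypothetical results of two separate queries, one per index group.
hits_ab = [
    {"_index": "a", "score": 9.0},
    {"_index": "b", "score": 4.0},
    {"_index": "a", "score": 1.0},
]
hits_cd = [
    {"_index": "c", "score": 7.0},
    {"_index": "d", "score": 3.0},
]

first_page = merged_page([hits_ab, hits_cd], page=0, size=2)
second_page = merged_page([hits_ab, hits_cd], page=1, size=2)
```

This sketch re-merges from the start for every page. A real intermediary would also have to fetch further pages from each sub-query on demand and cache the hits it has already pulled, which is the part that amounts to re-implementing paging.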
💚 CLA has been signed
As Julie explained, it is very unlikely that we release a 6.8 patch version anytime soon. Although I wonder if your problem can be solved differently. The […] Would that be enough to solve your problem?
The backport only fixes queries that reference the actual index name, since the indexMatcher for aliases was not backported.
Hey @jimczi, thanks for the response and great suggestion! Sadly, I don't believe the match filter will solve the problem at hand. To expand - a match query, as I understand it, allows us to search for a single value on a field (+ fuzzy matching). A query across many different terms would require a disjunction of match queries across all terms. It's not uncommon to receive >10,000 individual terms in a single query. This, in turn, means we hit the 1024 bool max clause count limit very quickly. On the other hand, a terms query has a default term count limit of 65536, which is more than sufficient. An assumption here is that the bool max clause count limit is significantly lower since additional clauses are more resource-expensive/intensive than the equivalent number of terms in a terms query. If this isn't the case (and therefore a bool OR'd match query is as expensive as terms given the same number of terms), then bumping the limit would be sufficient. As an aside, thanks for the context on the 6.8.X release. It's an accepted risk if we're not able to find a better solution through existing ES 6.8 primitives, but naturally no expectation on the ES team for a release to be cut if this gets merged. Thanks!
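The difference between the two query shapes can be sketched as follows. This only builds request bodies as plain dictionaries and compares their size against the two limits quoted above (1024 and 65536); the helper names and the `id` field are hypothetical, and the check is a simplification of what the server enforces.

```python
from typing import Any, Dict, List

BOOL_MAX_CLAUSES = 1024   # bool clause limit quoted in the comment above
TERMS_MAX_COUNT = 65536   # terms count limit quoted in the comment above


def match_disjunction(field: str, values: List[str]) -> Dict[str, Any]:
    """One `match` clause per value, OR'd together in a bool query."""
    return {
        "bool": {
            "should": [{"match": {field: value}} for value in values],
            "minimum_should_match": 1,
        }
    }


def terms_query(field: str, values: List[str]) -> Dict[str, Any]:
    """A single `terms` query carrying all values at once."""
    return {"terms": {field: values}}


def within_limits(query: Dict[str, Any]) -> bool:
    """Simplified size check against the limits assumed above."""
    if "bool" in query:
        return len(query["bool"]["should"]) <= BOOL_MAX_CLAUSES
    (values,) = query["terms"].values()
    return len(values) <= TERMS_MAX_COUNT


ids = [f"id-{i}" for i in range(10_000)]
disjunction_ok = within_limits(match_disjunction("id", ids))
terms_ok = within_limits(terms_query("id", ids))
```

With 10,000 values the disjunction exceeds the bool clause limit while the single `terms` query stays well within its own limit, which is why swapping one for the other does not help here.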
Hey @jtibshirani, @jimczi, just wanted to follow up here: are there any other alternative solutions we should try? Thank you!!
@mdaudali after considering the options, I don't think we should backport this to 6.8. We really try to minimize changes on 6.8 in the rare case we need to quickly issue a patch release (usually for a security issue). Although the changes in the PR look small, to do a "proper fix" we'd need to backport the changes around […]. I'm sorry this didn't result in the outcome you were hoping for. I hope your migration to 7.x goes smoothly, and that in the meantime you find some way to move forward (perhaps by adjusting the mapping to avoid fields with the same name but with conflicting types?)
Hey!
Backports PR #48681 for issue #48473. As per the maintenance plan, hopefully it should be acceptable to backport this to 6.8!
Context:
When two or more indices share the same field name, with different, non-coercible field types, searches across these indices fail when searching on the shared field name, even when scoped to a single index via an "_index" term query. This happens frequently when documents have a common id field name (e.g. "id"), but with different types.
A more concrete example:
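One way to picture such a case, as a hypothetical sketch: the index names, mappings, and values below are illustrative assumptions, and `can_match` and `parses_on` are a toy model of the idea in the PR title, not Elasticsearch's implementation. Only the short form of the `_index` term clause and the concrete index name are handled.

```python
from typing import Any, Dict, List, Optional

# Hypothetical mappings: the same field name with incompatible types.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "index_a": {"id": "keyword"},
    "index_b": {"id": "long"},
}


def scoped_query(index: str, field: str, values: List[Any]) -> Dict[str, Any]:
    """A terms query on `field`, scoped to one index via an `_index` term."""
    return {
        "bool": {
            "must": [
                {"term": {"_index": index}},
                {"terms": {field: values}},
            ]
        }
    }


def target_index(query: Dict[str, Any]) -> Optional[str]:
    """Return the index named by an `_index` term clause under bool.must."""
    for clause in query.get("bool", {}).get("must", []):
        term = clause.get("term", {})
        if "_index" in term:
            return term["_index"]
    return None


def can_match(shard_index: str, query: Dict[str, Any]) -> bool:
    """Toy early abort: skip shards whose index the query cannot match."""
    target = target_index(query)
    return target is None or target == shard_index


def parses_on(shard_index: str, field: str, values: List[Any]) -> bool:
    """Toy stand-in for parsing: a long field rejects non-numeric input."""
    if MAPPINGS[shard_index][field] == "long":
        return all(str(value).lstrip("-").isdigit() for value in values)
    return True


query = scoped_query("index_a", "id", ["abc-123"])
shards_to_search = [name for name in MAPPINGS if can_match(name, query)]
would_fail_on_b = not parses_on("index_b", "id", ["abc-123"])
```

In this model, the value `"abc-123"` cannot be parsed against the `long` field in `index_b`, so searching both indices fails even though the query can only ever match `index_a`. Checking the `_index` clause first leaves only `index_a` to search.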
Why not move to ES7?: The migration from ES6 -> ES7 is in progress, but we would like to backport the fix as we still have clusters on ES6.8.
Thanks! Let me know if there are any further changes I should make, or any additional clarifying information I can provide :)
(Apologies if there's a separate SOP for requesting backports. I have checked previous issues and the forum for prior examples of backporting, and did not find any.)