Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a soft limit for trailing wildcard character length? #38861

Closed
ppf2 opened this issue Feb 13, 2019 · 8 comments
Closed

Add a soft limit for trailing wildcard character length? #38861

ppf2 opened this issue Feb 13, 2019 · 8 comments
Labels
>enhancement help wanted adoptme :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@ppf2
Copy link
Member

ppf2 commented Feb 13, 2019

We currently provide a configurable allow_leading_wildcard option as a way to safeguard against leading wildcards. Trailing wildcards can also be painful as observed many times in the past, esp. when users issue queries like a*, 1*, etc..

While we can ask users to add protection themselves to their apps to require at least N characters before wildcard queries are accepted, it can be helpful to provide a configurable soft limit option in Elasticsearch (where trailing wildcards will not be allowed until there is N characters provided). Not sure what's a good default is for this. 2 or 3 characters perhaps?

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@polyfractal polyfractal added :Search/Search Search-related issues that do not fall into other categories >enhancement labels Feb 13, 2019
@jimczi
Copy link
Contributor

jimczi commented Feb 19, 2019

We discussed this internally and we think it's a good way to promote the usage of the index_prefixes option on text fields. However we think that it would be better to add an hard limit to the number of expanded terms rather than a limit on the number of characters. The idea would be to throw an error if the expansion returns more than N terms, this limit could be applied by default to any wildcard query.

@jimczi jimczi added help wanted adoptme and removed team-discuss labels Feb 19, 2019
@lmiskiew
Copy link

Hello, I would like to work on this issue. Could I get some pointers as to which classes are involved in this?

@jimczi
Copy link
Contributor

jimczi commented May 20, 2019

@lmiskiew you can check StringFieldType#prefixQuery where the limit could be added. Since it should be configurable, an index setting is also needed. You can check how we use FieldMapper#IGNORE_MALFORMED_SETTING for an example of an index setting that is accessed in the field mappers.

@lmiskiew
Copy link

@jimczi Thank you.

@lmiskiew
Copy link

lmiskiew commented Jun 4, 2019

@jimczi Hello, I'm having trouble trying to get the number of expanded terms. I tried to use the PrefixQuery#rewrite method to be able to use QueryTermExtractor on the resulting Query but it only returns one empty term. Despite this the rewritten query works the same as the PrefixQuery without rewriting, so as if all the terms were there. I use the IndexReader from the QueryShardContext passed in the arguments and when I extract terms for a specific document ID manually it does return the expected terms so they are there. Is this the approach you had in mind when you said 'expansion returns more than N terms'? If so do you know why what I'm trying to do isn't working? I tried to figure it out on my own but I spent a lot of time on it and still can't get it to work.
I also found #27796 when searching for a way to implement this and wanted to make sure that this is still a problem.

@jimczi
Copy link
Contributor

jimczi commented Jun 5, 2019

Thanks @lmiskiew , counting the number of expansions is tricky. The prefix query has multiple rewrite methods that depend on the options provided by the user and they all expand terms differently. We could wrap the rewrite method or the query to count the terms lazily (when the query is executed) but considering the complexity I wonder if this is something we should pursue. Since we already have options to avoid prefix queries entirely (index_prefixes on the text field) I wonder if we could close this issue in favor of #29050 but I wonder what @elastic/es-search think ?

I also found #27796 when searching for a way to implement this and wanted to make sure that this is still a problem.

We have protection in place to limit the expansion for some rewrite methods (scoring_boolean and top_terms_N) but the default rewrite (constant_score) considers all terms and as such it can slowdown the query considerably which is why we discussed an hard limit that could be applied to all rewrite. As said above it might be tricky to add so I also wonder what @mayya-sharipova thinks since she made the initial investigation.

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@javanna
Copy link
Member

javanna commented Jun 24, 2024

This has been open for quite a while, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@javanna javanna closed this as not planned Won't fix, can't repro, duplicate, stale Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement help wanted adoptme :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
Development

No branches or pull requests

7 participants