Wildcard queries should not normalize/remove wildcards in queries #46300

Closed
talevy opened this issue Sep 3, 2019 · 3 comments · Fixed by #53127
Labels: >bug, :Search Relevance/Analysis, Team:Search Relevance

Comments

@talevy
Contributor

talevy commented Sep 3, 2019

Wildcard queries should not normalize wildcards from queries

Example scenario where wildcard queries do not exhibit expected behavior:

PUT my-index
{
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "no_wildcard"
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "no_wildcard": {
            "pattern": "[\\*]",
            "type": "pattern_replace",
            "replacement": ""
          }
        },
        "normalizer": {
          "no_wildcard": {
            "type": "custom",
            "char_filter": "no_wildcard"
          }
        }
      }
    }
  }
}

PUT /my-index/_doc/1
{
    "foo": "bar"
}

GET my-index/_search
{
  "profile": true,
  "query": {
    "wildcard": {
      "foo": {
        "value": "ba*"
      }
    }
  }
}

The normalizer is also applied to the query value, so the pattern actually passed to the wildcard query has the * removed.

// output from the query profiler for above query
...
             {
                "type" : "MultiTermQueryConstantScoreWrapper",
                "description" : "foo:ba",
...

Because of this, the search does not find any matching documents.
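
To make the failure concrete, here is a minimal Python sketch of the behavior described above. It is not Elasticsearch code: `no_wildcard_normalizer` is a hypothetical stand-in for the `no_wildcard` char filter from the mapping, and `fnmatchcase` only approximates Lucene's wildcard matching for simple `*` and `?` patterns.

```python
import re
from fnmatch import fnmatchcase


def no_wildcard_normalizer(text: str) -> str:
    """Hypothetical stand-in for the pattern_replace char filter: drop every '*'."""
    return re.sub(r"[\*]", "", text)


# At index time the keyword value goes through the normalizer.
indexed_term = no_wildcard_normalizer("bar")

# At search time the whole wildcard value goes through it too,
# which is the behavior reported in this issue.
query_pattern = no_wildcard_normalizer("ba*")

print(query_pattern)                             # ba
print(fnmatchcase(indexed_term, query_pattern))  # False: "ba" is now a literal
print(fnmatchcase(indexed_term, "ba*"))          # True: the intended pattern matches
```

Once the `*` is gone, the query is effectively an exact match on `ba`, which matches the `foo:ba` description in the profiler output.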

I believe this relates to #28894.

@talevy talevy added the discuss and :Search Relevance/Analysis labels Sep 3, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

@jimczi
Contributor

jimczi commented Sep 4, 2019

I think we can consider this a bug. The reserved characters should be excluded when normalization is applied. For comparison, query_string splits the wildcard input on the reserved characters and normalizes each piece independently to build the normalized pattern:
https://github.com/apache/lucene-solr/blob/master/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParserBase.java#L708
We should do the same in the wildcard query.
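
The split-and-normalize idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the Lucene or Elasticsearch implementation: `normalize_wildcard_pattern` and `no_wildcard_normalizer` are hypothetical names, escaped wildcards such as `\*` are not handled, and `fnmatchcase` is only a rough stand-in for Lucene's wildcard matching.

```python
import re
from fnmatch import fnmatchcase
from typing import Callable

# Runs of the two reserved wildcard characters; the capturing group makes
# re.split() return the wildcard runs alongside the literal pieces.
_WILDCARD_RUN = re.compile(r"([*?]+)")


def no_wildcard_normalizer(text: str) -> str:
    """Hypothetical stand-in for the issue's normalizer: drop every '*'."""
    return re.sub(r"[\*]", "", text)


def normalize_wildcard_pattern(pattern: str, normalizer: Callable[[str], str]) -> str:
    """Normalize only the literal pieces and pass wildcard runs through untouched."""
    pieces = _WILDCARD_RUN.split(pattern)
    return "".join(
        piece if _WILDCARD_RUN.fullmatch(piece) else normalizer(piece)
        for piece in pieces
    )


normalized = normalize_wildcard_pattern("ba*", no_wildcard_normalizer)
print(normalized)                      # ba*
print(fnmatchcase("bar", normalized))  # True

# The same helper works with other normalizers, e.g. lowercasing.
print(normalize_wildcard_pattern("B?R*", str.lower))  # b?r*
```

Because the normalizer never sees the `*` or `?` characters, even a normalizer that strips them from indexed values leaves the query's wildcards in place.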

@talevy
Contributor Author

talevy commented Sep 4, 2019

thanks Jim. I'll update the labels

@talevy talevy added >bug and removed discuss labels Sep 4, 2019
@talevy talevy changed the title from "Add ability to better handle normalization in wildcard queries" to "Wildcard queries should not normalize wildcards from queries" Sep 4, 2019
@talevy talevy changed the title from "Wildcard queries should not normalize wildcards from queries" to "Wildcard queries should not normalize/remove wildcards in queries" Sep 4, 2019
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 4, 2020
Wildcard queries on keyword fields get normalized; however, this normalization
step should exclude the two special characters * and ? in order to keep the
wildcard query itself intact.

Closes elastic#46300
@cbuescher cbuescher self-assigned this Mar 5, 2020
cbuescher pushed a commit that referenced this issue Mar 12, 2020
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 12, 2020
cbuescher pushed a commit that referenced this issue Mar 24, 2020
…53512)
@javanna javanna added the Team:Search Relevance label Jul 16, 2024