MatchNoDocsQuery from stop words with wildcards in query_string #28856
Labels: >bug; :Search Relevance/Analysis (How text is split into tokens); Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
Comments
javanna added the :Search/Search label on Mar 1, 2018
cc @elastic/es-search-aggs
javanna added the :Search Relevance/Analysis label and removed the :Search/Search label on Mar 1, 2018
jimczi added a commit to jimczi/elasticsearch that referenced this issue on Mar 1, 2018:
This change ensures that we ignore terms removed by the analysis rather than returning a match_no_docs query for the part that contains the stop word. For instance, a query like "the AND fox" should ignore "the" if it is considered a stop word instead of adding a match_no_docs query. This change also fixes the analysis of prefix terms that start with a stop word (e.g. `the*`). Previously, if `analyze_wildcard` was true and `the` was considered a stop word, this part of the query was rewritten into a match_no_docs query. Since it is a prefix query, this change forces the prefix query on `the` even if it is removed by the analysis. Fixes elastic#28855 Fixes elastic#28856
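The behavior described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Elasticsearch or Lucene code: the stop list, the `analyze` helper, and `build_clauses` are all hypothetical names, and boolean operators are simply skipped rather than parsed.

```python
from typing import List, Set

# Hypothetical stop list for illustration; the real "stop" analyzer has its own.
STOP_WORDS: Set[str] = {"the", "on", "a", "an", "of"}


def analyze(term: str) -> List[str]:
    """Toy analyzer: lowercase the term and drop it if it is a stop word."""
    token = term.lower()
    return [] if token in STOP_WORDS else [token]


def build_clauses(query: str, field: str = "content") -> List[str]:
    """Build clause strings the way the commit message describes the fix."""
    clauses: List[str] = []
    for raw in query.split():
        if raw in {"AND", "OR"}:
            continue  # operators are ignored in this toy model
        if raw.endswith("*"):
            prefix = raw[:-1]
            tokens = analyze(prefix)
            # Force the prefix query even when analysis removed the term.
            normalized = tokens[0] if tokens else prefix.lower()
            clauses.append(f"{field}:{normalized}*")
            continue
        # A term removed by analysis contributes no clause at all,
        # instead of a match_no_docs clause.
        clauses.extend(f"{field}:{t}" for t in analyze(raw))
    return clauses


if __name__ == "__main__":
    print(build_clauses("the AND fox"))
    print(build_clauses("the*"))
    print(build_clauses("on the run*"))
```

In this model, a stop word without a wildcard disappears from the clause list, while a stop word followed by `*` is kept as a prefix clause.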
jimczi added a commit that referenced this issue on Mar 4, 2018, with the same commit message (Fixes #28855 Fixes #28856)
jimczi added a commit that referenced this issue on Mar 4, 2018, with the same commit message (Fixes #28855 Fixes #28856)
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this issue on Mar 10, 2018, with the same commit message (Fixes elastic#28855 Fixes elastic#28856)
javanna added the Team:Search Relevance label on Jul 16, 2024
Elasticsearch version (`bin/elasticsearch --version`): 6.2.2, Build: 10b1edd/2018-02-16T19:01:30.685723Z
Plugins installed: ["analysis-icu"]
JVM version (`java -version`): 9.0.4
OS version (`uname -a` if on a Unix-like system): Mac OS Sierra 10.12.6; 16.7.0 Darwin Kernel Version 16.7.0: Thu Jan 11 22:59:40 PST 2018; root:xnu-3789.73.8~1/RELEASE_X86_64 x86_64
Description of the problem including expected versus actual behavior:
When running a query_string query with the "stop" analyzer and wildcard analysis enabled, if the query contains both stop words and wildcards (on the stop words or on other query terms), the expected behavior (at least, the behavior in Elasticsearch 5.5.0) is for the stop word to be removed from the token stream. The actual behavior is that it is converted to a MatchNoDocsQuery.
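A request body matching this description might look like the sketch below. It is an assumption, not the reporter's exact request: the `content` field name is inferred from the `content:run` output reported in this issue, and the helper name `make_query_body` is hypothetical. The `query_string` parameters used (`query`, `default_field`, `analyzer`, `analyze_wildcard`) are standard options of that query type.

```python
import json
from typing import Any, Dict


def make_query_body(query: str, analyze_wildcard: bool) -> Dict[str, Any]:
    """Build a query_string search body using the "stop" analyzer.

    The field name "content" is an assumption inferred from the issue's output.
    """
    return {
        "query": {
            "query_string": {
                "query": query,
                "default_field": "content",
                "analyzer": "stop",
                "analyze_wildcard": analyze_wildcard,
            }
        }
    }


if __name__ == "__main__":
    # The query from the report, with wildcard analysis enabled.
    print(json.dumps(make_query_body("on the run*", True), indent=2))
```

Sending such a body to an API that returns the rewritten query is one way to compare how each version handles the stop words.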
Steps to reproduce:
bin/elasticsearch
"on the run*"
:Response:
I also tested the following queries:
- `on the run` produces the expected `content:run`, without `MatchNoDocsQuery`
- `on* the run` produces `MatchNoDocsQuery(\"analysis was empty for content:on\") content:run`
Also tested with wildcard analysis off:
- `on the run*` produces `MatchNoDocsQuery(\"analysis was empty for content:on\") content:run*`
- `on the run` produces the expected `content:run`
- `on* the run` produces `content:on* content:run` (unlike with wildcard analysis on)
For comparison, here is the response to the same query from a fresh installation of 5.5.0 (+ analysis_icu for parity, but probably not relevant here?):
Provide logs (if relevant):
Not relevant.