[BUG] Wildcard query behavior for text fields changed in a backward-incompatible way #8711
Comments
@kartg looks broken :( If you have time, write a YAML REST test for this next.
This is expected behavior. The default tokenizer first lowercases the input string at index time. So running a That patch did yield another change that also provides the behavior you seek, along with a YAML test.
Thanks @nknize, sounds like I'm wrong about it being a regression. Should we close this?
@nknize reopening because I'd like to discuss whether such a behavior change breaks semver. As I outlined in the repro steps above, a user query on a
Note that the use case I'm describing in this issue is for the
It doesn't, because this is a bug that was fixed. We shouldn't set a precedent of retaining bug compatibility; that's a slippery slope, a door we should never open.
Users of Elasticsearch and OpenSearch should already be aware that what they index is not the raw string, and should know what tokens are created by the Standard Analyzer and Standard Tokenizer. If they don't, they'll have no idea what's in their index. This isn't breaking because capital letters should never have matched in the first place - that's why
So the user should set
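The mismatch described in this comment can be pictured with a toy model. This is only a sketch in plain Python, not OpenSearch code: `analyze` is a hypothetical stand-in for an analyzer that lowercases at index time, and `normalize` is a hypothetical switch for whether the wildcard pattern gets lowercased before matching.

```python
from fnmatch import fnmatchcase


def analyze(text):
    """Toy stand-in for index-time analysis: split on whitespace, lowercase each token."""
    return [token.lower() for token in text.split()]


def wildcard_hits(index_tokens, pattern, normalize):
    """Return the indexed tokens that match the wildcard pattern.

    When `normalize` is true the pattern is lowercased first; otherwise it is
    compared case-sensitively against the (already lowercased) tokens.
    """
    if normalize:
        pattern = pattern.lower()
    return [token for token in index_tokens if fnmatchcase(token, pattern)]


tokens = analyze("Quick Brown Fox")  # ['quick', 'brown', 'fox']
print(wildcard_hits(tokens, "Bro*", normalize=False))  # []
print(wildcard_hits(tokens, "Bro*", normalize=True))   # ['brown']
```

Because the index only holds lowercase tokens, a pattern containing capital letters can only match if something lowercases it first.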
I wholeheartedly agree, which is why my answer to the "expected behavior" section starts with "unclear" 😄
This assumes that the users/clients querying for data are the same as (or at least in lock step with) the users/clients indexing data. I don't necessarily agree with this assumption, but that's a separate concern. In the interest of consistent behavior, why do we want wildcard queries on
@nknize, in what version was the original bug you fixed introduced? I think we should document that the bug was introduced in some version and fixed in 2.5, and that users may have to change their implementation if they assumed the buggy behavior was expected. cc: @kolchfa-aws
+1, this should go into a "breaking changes" section of the docs.
Created a doc issue: opensearch-project/documentation-website#4788
Describe the bug
It appears that PR #5462 changed the behavior for wildcard queries against `text` fields. Queries that previously worked (due to the behavior documented in #5461) now no longer return any results. IMO, this is a backwards-incompatible change that should not have been backported to `2.x`.
To Reproduce
This can be repro'd using the OpenSearch docker images. First, start a node container:
Then, create a test index with a test field of type `text` and add data to it:
Now perform a wildcard search and observe that no hits are returned:
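The requests for these two steps are not shown here. A minimal Python sketch of what they might look like follows. The index name `testindex`, field name `field1`, sample document, wildcard pattern, and the unauthenticated `http://localhost:9200` endpoint are all assumptions for illustration, not details taken from the report.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, security plugin disabled
INDEX = "testindex"                 # hypothetical index name
FIELD = "field1"                    # hypothetical field name


def request(method, path, body=None):
    """Send a JSON request to the node and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def mapping_body():
    """Index-creation body declaring one field of type `text`."""
    return {"mappings": {"properties": {FIELD: {"type": "text"}}}}


def wildcard_body(pattern):
    """Search body containing a wildcard query against the test field."""
    return {"query": {"wildcard": {FIELD: {"value": pattern}}}}


def reproduce():
    """Create the index, add one document, run the wildcard search, return the hit count."""
    request("PUT", f"/{INDEX}", mapping_body())
    # Hypothetical sample document; refresh so it is searchable immediately.
    request("PUT", f"/{INDEX}/_doc/1?refresh=true", {FIELD: "Hello World"})
    result = request("POST", f"/{INDEX}/_search", wildcard_body("H*"))
    return result["hits"]["total"]["value"]
```

Calling `reproduce()` against each image in turn would give the two hit counts to compare.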
Next, repeat the same steps with the previous minor release Docker image, `opensearchproject/opensearch:2.4.1`, and observe that the wildcard query returns a result.
Expected behavior
Unclear. The PR seems to fix erroneous behavior but introduces a breaking change.
Plugins
N/A
Screenshots
N/A
Host/Environment (please complete the following information):
Additional context
A workaround for this bug is to use a query_string query instead:
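The workaround request itself is not shown here. As a rough sketch, a `query_string` search body could be built like this; the field name `field1` and the pattern are hypothetical, and the body is only printed, not sent.

```python
import json

FIELD = "field1"  # hypothetical field name, not taken from the report


def query_string_body(pattern, field=FIELD):
    """Build a search body that runs the wildcard pattern through a query_string query."""
    return {"query": {"query_string": {"query": pattern, "default_field": field}}}


# Print the JSON that would be sent as the body of a _search request.
print(json.dumps(query_string_body("H*"), indent=2))
```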
Finally, this bug only affects analyzed/normalized field types. AFAIK, this means only `keyword` and `text` are affected. The PR updates `KeywordFieldMapper` to override this value since keyword fields are always normalized, so `keyword` field types are not affected by this change.