Using empty string in range query 'gt' returns no documents #63386

Closed
FlorianKalinna opened this issue Oct 7, 2020 · 3 comments · Fixed by #63397
Labels: >bug, :Search/Search, Team:Search, v7.11.0, v8.0.0-alpha1

Comments

@FlorianKalinna

I created a new index and then added two documents to it.

PUT test_index

PUT /test_index/_doc/1
{ "A" : "5" }

PUT /test_index/_doc/2
{ "B" : "5" }

After that I searched that index for every document where A is greater than the empty string,

GET test_index/_search
{ "query": { "range": { "A": { "gt": "" } } }, "size": 2000 }

and received no documents.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 0, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  }
}

On the other hand, when I searched for every document where A is greater than or equal to the empty string,

GET test_index/_search
{ "query": { "range": { "A": { "gte": "" } } }, "size": 2000 }

I received one document.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "test_index", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "A" : "5" } }
    ]
  }
}

Why does the greater-than comparison behave differently and not return the document as well?
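
To make the expectation concrete, here is a toy model in plain Python. It is not Elasticsearch or Lucene code: Python's string comparison stands in for term ordering, and `range_match` is a hypothetical helper written only for this illustration. Under that assumption, both the exclusive and the inclusive query with an empty lower bound would be expected to return document 1, while document 2 never matches because it has no field A.

```python
# Toy model only: plain Python string ordering stands in for term ordering.
# The two documents mirror the ones indexed in the report above.
docs = {
    "1": {"A": "5"},
    "2": {"B": "5"},
}


def range_match(docs, field, lower, inclusive):
    """Return the sorted ids of docs whose `field` value lies above `lower`.

    `inclusive=True` models "gte", `inclusive=False` models "gt".
    """
    hits = []
    for doc_id, source in docs.items():
        if field not in source:
            continue  # a doc without the field cannot match a range on it
        value = source[field]
        if value > lower or (inclusive and value == lower):
            hits.append(doc_id)
    return sorted(hits)


gt_hits = range_match(docs, "A", "", inclusive=False)   # models "gt": ""
gte_hits = range_match(docs, "A", "", inclusive=True)   # models "gte": ""
print(gt_hits, gte_hits)
```

In this model the two result lists are identical, which is the behaviour the report expected from the real queries.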

@FlorianKalinna added the >bug and needs:triage labels Oct 7, 2020
@cbuescher added the :Search/Search label Oct 7, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine added the Team:Search label Oct 7, 2020
@cbuescher added the v7.11.0 and v8.0.0 labels and removed the needs:triage label Oct 7, 2020
@cbuescher
Member

This is indeed odd and seems a bit counterintuitive to me. At least for the lower bound of a text or keyword field, I would assume every value to be larger than the empty string. The problem with this edge case is that the empty string should be considered an open lower bound that matches everything. This works as expected for an "inclusive" lower bound (i.e. "gte") but not for an "exclusive" one (i.e. "gt"), because it's not completely clear which term to exclude. Internally we use Lucene's TermRangeQuery, which does the right thing for "null" values: it implicitly sets the inclusion flag to "true" in this case (see https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/TermRangeQuery.java#L52 and https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/TermRangeQuery.java#L81).
I think we should treat the empty string in the same way when constructing the range query and implicitly change the inclusion flag in this case to avoid this confusion.
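
The proposal can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the Elasticsearch or Lucene implementation: `TermRange` and `build_lower_bound` are hypothetical names, only the lower bound is modelled, and byte-wise comparison of UTF-8 encoded terms stands in for Lucene's term ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TermRange:
    """Hypothetical stand-in for a term range with only a lower bound."""

    lower: Optional[bytes]  # None models an open lower bound
    include_lower: bool

    def matches(self, term: bytes) -> bool:
        if self.lower is None:
            return True  # open lower bound: every term matches
        if self.include_lower:
            return term >= self.lower
        return term > self.lower


def build_lower_bound(lower: Optional[str], include_lower: bool) -> TermRange:
    """Treat a missing or empty lower bound as open and force inclusion.

    This mirrors the idea described above: an empty string is handled the
    same way as a null bound, so "gt" and "gte" behave identically for it.
    """
    if lower is None or lower == "":
        return TermRange(lower=None, include_lower=True)
    return TermRange(lower=lower.encode("utf-8"), include_lower=include_lower)


gt_empty = build_lower_bound("", include_lower=False)   # models "gt": ""
gte_empty = build_lower_bound("", include_lower=True)   # models "gte": ""
print(gt_empty == gte_empty, gt_empty.matches(b"5"))
```

One consequence of this sketch is that a term which is itself empty would also match an exclusive empty lower bound. Whether that is acceptable is a judgment call that this sketch does not settle.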

@cbuescher cbuescher self-assigned this Oct 7, 2020
@cbuescher changed the title from "Why do the greater than comparison and the greater than equal comparsion not return the same count of documents" to "Using empty string in range query 'gt' returns no documents" Oct 7, 2020
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Oct 7, 2020
Currently when searching with an empty string as lower bound for a range query
on text-based fields we return all documents when 'gte' is used (including the
lower bound) but no documents when 'gt' is used. This might seem
counterintuitive since every value should be greater than the empty string. This
PR fixes this special edge case by implicitly setting the "lower" include flag
in this case before constructing the TermRangeQuery.

Closes elastic#63386
cbuescher pushed a commit that referenced this issue Oct 21, 2020
Currently when searching with an empty string as lower bound for a range query
on text-based fields we return all documents when 'gte' is used (including the
lower bound) but no documents when 'gt' is used. This might seem
counterintuitive since every value should be greater than the empty string.
The bug has been fixed in Lucene and this PR adds a test to ensure we observe
the fixed behaviour on searches now.

Closes #63386
@cbuescher
Member

Should be fixed in 7.11.
