Correctly handle duplicates in unordered interval matching #49775

romseygeek · 2019-12-02T17:24:54Z

Currently, unordered interval matching does not check for duplicates,
which means that a query for `to be or not to be` can match a document
that contains only the phrase `to be or not`: the second `to be` matches
at the same positions as the first, and the AND interval algorithm does not
check for overlaps. This is counter-intuitive.
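
To make the failure mode concrete, here is a minimal sketch in plain Java. It is a deliberately simplified model and not the Lucene interval algorithm: terms are plain strings, the class and method names (`NaiveUnorderedMatch`, `naiveMatches`, `distinctPositionMatches`) are hypothetical, and gaps and interval widths are ignored. The naive check lets two query terms reuse the same document position, while the duplicate-aware check requires a separate position for each.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NaiveUnorderedMatch {

    // Simplified model: a query term matches if it occurs anywhere in the
    // document. Nothing stops two query terms from sharing one position.
    static boolean naiveMatches(List<String> query, List<String> doc) {
        for (String term : query) {
            if (!doc.contains(term)) {
                return false;
            }
        }
        return true;
    }

    // Duplicate-aware variant: every query term consumes its own document
    // position, so a term repeated n times must occur at least n times.
    static boolean distinctPositionMatches(List<String> query, List<String> doc) {
        Map<String, Integer> available = new HashMap<>();
        for (String token : doc) {
            available.merge(token, 1, Integer::sum);
        }
        for (String term : query) {
            int remaining = available.getOrDefault(term, 0);
            if (remaining == 0) {
                return false;
            }
            available.put(term, remaining - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("to", "be", "or", "not", "to", "be");
        List<String> doc = Arrays.asList("to", "be", "or", "not");
        System.out.println(naiveMatches(query, doc));            // prints "true"
        System.out.println(distinctPositionMatches(query, doc)); // prints "false"
    }
}
```

The first result corresponds to the counter-intuitive match described above; the second is the behaviour a user would expect.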

This commit adds a check to the interval builder: if it finds duplicates
when combining sources into an unordered AND, it first combines those
duplicates into an ORDERED interval, so `to be or not to be` becomes
`UNORDERED(ORDERED(to, to), ORDERED(be, be), or, not)`.
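
The grouping step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: it works on plain strings and returns a textual expression rather than building real interval sources, and the names `DuplicateGroupingSketch` and `rewriteUnordered` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateGroupingSketch {

    // Groups repeated terms into ORDERED(...) clauses, then wraps all
    // clauses in a single UNORDERED(...). Output is a string for illustration.
    static String rewriteUnordered(List<String> terms) {
        // Count occurrences, keeping first-seen order so the output is stable.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }

        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                // Unique terms are passed through unchanged.
                clauses.add(entry.getKey());
            } else {
                // A term seen n times becomes ORDERED(term, ..., term), which
                // can only match n distinct, successive occurrences.
                List<String> copies = Collections.nCopies(entry.getValue(), entry.getKey());
                clauses.add("ORDERED(" + String.join(", ", copies) + ")");
            }
        }
        return "UNORDERED(" + String.join(", ", clauses) + ")";
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("to", "be", "or", "not", "to", "be");
        // prints "UNORDERED(ORDERED(to, to), ORDERED(be, be), or, not)"
        System.out.println(rewriteUnordered(query));
    }
}
```

Because an ordered interval over two copies of the same term needs two separate occurrences, the repeated `to` and `be` can no longer be satisfied by a single position each.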

elasticmachine · 2019-12-02T17:24:56Z

Pinging @elastic/es-search (:Search/Search)

jimczi

LGTM

romseygeek · 2019-12-18T10:35:19Z

This really needs to be handled in Lucene, as this solution doesn't correctly handle internal gaps in intervals with repeats. I've opened https://github.com/apache/lucene-solr/pull/1097/files

Correctly handle duplicates in unordered interval matching

84bb45a

romseygeek added >bug :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.6.0 labels Dec 2, 2019

romseygeek requested a review from jimczi December 2, 2019 17:24

romseygeek self-assigned this Dec 2, 2019

romseygeek added 6 commits December 3, 2019 09:32

yaml tests

af4cec0

Merge remote-tracking branch 'origin/master' into unordered-dup-intervals

478da75

skip mixed-cluster yml tests until backport

bc25eb7

Merge remote-tracking branch 'origin/master' into unordered-dup-intervals

1a0cd78

Use maxwidth rather than maxgaps

d1b77d0

Still use maxgaps when building sources from DSL

9fde2f9

jimczi approved these changes Dec 12, 2019

romseygeek closed this Dec 18, 2019

romseygeek deleted the unordered-dup-intervals branch December 18, 2019 10:35

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

romseygeek removed v7.6.0 v8.0.0 labels Mar 2, 2020
