Add fuzzy intervals source #49762

Merged (3 commits) on Jan 3, 2020

romseygeek
Contributor

This intervals source will return terms that are similar to an input term, up to
an edit distance defined by fuzziness, similar to FuzzyQuery.

Closes #49595
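
For illustration only, a request body using the new source might look roughly like the sketch below, sent to a `_search` endpoint. The field name `my_text` and the misspelled term `elastik` are placeholders that do not come from this PR. The parameter names (`term`, `fuzziness`, `prefix_length`, `transpositions`) are assumed to match the `fuzzy` rule as documented for the intervals query, so check them against the docs for your version.

```json
{
  "query": {
    "intervals": {
      "my_text": {
        "fuzzy": {
          "term": "elastik",
          "fuzziness": "auto",
          "prefix_length": 1,
          "transpositions": true
        }
      }
    }
  }
}
```

With settings like these, the rule would be expected to match terms within the allowed edit distance of `elastik` that share its first character, in the same way a fuzzy query expands its input term.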

@romseygeek added the >feature, :Search/Search, v8.0.0, and v7.6.0 labels on Dec 2, 2019
@romseygeek requested a review from jimczi on December 2, 2019 14:03
@romseygeek self-assigned this on Dec 2, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

Contributor

@jimczi left a comment

The new source looks good to me. Should it be in Lucene too? I also wonder if we should allow for more than 128 expansions. Allowing up to the max boolean clause count should be enough in Lucene 9, because the limit is applied to the entire query.

@romseygeek
Contributor Author

should it be in Lucene too?

You can already build something like this directly using Intervals.multiterm, so I don't think we need a dedicated factory method in Lucene.

I also wonder if we should allow for more than 128 expansions?

I'll open a follow-up to move everything to use IndexSearcher.getMaxClauseCount(). The usual limits won't apply here though, because Lucene 9 uses a query visitor after rewrite to count leaf clauses, and multiterm intervals don't do a whole-index rewrite, so they don't yet know how many terms they will expand to. I don't think this is too much of a problem, as we limit expansions directly on multiterm intervals rather than relying on the overall clause count, but it does suggest to me that we should leave the Lucene default at 128.

@romseygeek
Contributor Author

@elasticmachine update branch

@romseygeek merged commit 32730cf into elastic:master on Jan 3, 2020
@romseygeek deleted the fuzzy-intervals branch on January 3, 2020 09:55
romseygeek added a commit that referenced this pull request Jan 3, 2020
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Labels
>feature, :Search/Search, v7.6.0, v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Fuzziness support in intervals query
4 participants