fix slop #2031

Merged
merged 1 commit into from
May 10, 2023

PSeitz
Contributor

@PSeitz PSeitz commented May 9, 2023

Fix slop by carrying the slop used so far for queries with more than 2 terms.
Define the slop contract in the API:

  • The query will match if its terms are separated by at most slop terms.
    The slop can be considered a budget shared between all terms.
    E.g. "A B C" with slop 1 allows "A X B C" and "A B X C", but not "A X B X C".

  • Transposition costs 2, e.g. "A B" with slop 1 will not match "B A", but it will with slop 2.
    Transposition is not a special case: in the example above, A is moved 1 position and B is moved 1 position, so the slop is 2.

  • As a result, slop works in both directions, so the order of the terms may change as long as they respect the slop.
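
The contract above can be checked with a small standalone sketch. This is not the code in this PR and does not use tantivy: `required_slop` and `matches_with_slop` are hypothetical helpers, and the cost model (sum of the displacements between consecutive terms) is a simplification that assumes every query term occurs exactly once in the document. It only reproduces the examples listed above.

```rust
/// Hypothetical helper (not part of tantivy): slop needed for `doc` to match
/// `query` under a simplified model. Assumes each query term occurs exactly
/// once in `doc`; returns `None` if a term is missing.
fn required_slop(query: &[&str], doc: &[&str]) -> Option<u32> {
    // Shifted position = position in the document minus offset in the query.
    // For an exact phrase match, all shifted positions are equal.
    let mut shifted: Vec<i64> = Vec::with_capacity(query.len());
    for (offset, term) in query.iter().enumerate() {
        let pos = doc.iter().position(|t| t == term)?;
        shifted.push(pos as i64 - offset as i64);
    }
    // The slop is a budget shared by all terms: add up how far each term
    // is displaced relative to the previous one, in either direction.
    let cost: i64 = shifted.windows(2).map(|w| (w[1] - w[0]).abs()).sum();
    Some(cost as u32)
}

/// Hypothetical helper: true if `doc` matches `query` within the `slop` budget.
fn matches_with_slop(query: &[&str], doc: &[&str], slop: u32) -> bool {
    required_slop(query, doc).map_or(false, |cost| cost <= slop)
}
```

Under this model, `matches_with_slop(&["A", "B", "C"], &["A", "X", "B", "X", "C"], 1)` is false because the two gaps together cost 2, and `matches_with_slop(&["A", "B"], &["B", "A"], 2)` is true because each term moved by one position.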

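The slop-carrying algorithm is only active for queries with more than 2 terms, as it is slower (compare bench_intersection_medium_slop with bench_intersection_medium_slop_carrying):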

running 6 tests
test query::phrase_query::phrase_scorer::bench::bench_intersection_count_short                                           ... bench:           2 ns/iter (+/- 0)
test query::phrase_query::phrase_scorer::bench::bench_intersection_medium                                                ... bench:         106 ns/iter (+/- 1)
test query::phrase_query::phrase_scorer::bench::bench_intersection_medium_slop                                           ... bench:         104 ns/iter (+/- 7)
test query::phrase_query::phrase_scorer::bench::bench_intersection_medium_slop_carrying                                  ... bench:         194 ns/iter (+/- 3)
test query::phrase_query::phrase_scorer::bench::bench_intersection_short                                                 ... bench:           3 ns/iter (+/- 0)
test query::phrase_query::phrase_scorer::bench::bench_intersection_short_slop                                            ... bench:          16 ns/iter (+/- 0)

Fix slop by carrying slop so far for multiterms.
Define slop contract in the API
@@ -66,6 +66,16 @@ impl PhraseQuery {
/// Slop allowed for the phrase.
///
/// The query will match if its terms are separated by `slop` terms at most.
/// The slop can be considered a budget between all terms.
/// E.g. "A B C" with slop 1 allows "A X B C", "A B X C", but not "A X B X C".
Collaborator

awesome explanation

@PSeitz PSeitz merged commit 0eafbaa into main May 10, 2023
@PSeitz PSeitz deleted the slop_test branch May 10, 2023 09:45

2 participants