Change initial size of DocIdSetBuilder #502
Signed-off-by: John Mazanec <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff            @@
##               main     #502   +/-   ##
=========================================
  Coverage     84.04%   84.04%
  Complexity     1019     1019
=========================================
  Files           146      146
  Lines          4188     4188
  Branches        373      373
=========================================
  Hits           3520     3520
  Misses          492      492
  Partials        176      176
Is it possible to add a unit test to check for the iterator type? We can check the exact implementation of the BulkAdder returned by the grow() method.
What would the purpose of this test be? I think the point of switching is so that we can defer to Lucene to give us the correct iterator.
I think the main purpose is to verify which exact implementation Lucene returns and whether it matches our expectations. Lucene has some internal logic for choosing the exact adder class, so if that changes in the future, we can catch it early in tests and act accordingly.
I think unit tests would need to test DocIdSetBuilder. Given that we don't implement it, I am not sure it makes sense to add a unit test for this functionality in our plugin. If we want more control over which iterator is built, we would need to implement our own DocIdSetIteratorBuilder and test that. Here, however, I just want to defer to Lucene's DocIdSetBuilder to decide which iterator to build. In general, this functionality is not unique to k-NN.
Changes the initial size of the DocIdSetBuilder used for iterating over results for k-NN queries. Originally, it was set to the maximum doc ID. This changes it to be the number of docs returned. Signed-off-by: John Mazanec <[email protected]> (cherry picked from commit 586958e)
Description
Currently, we grow the DocIdSetBuilder by the maximum doc ID in the results returned by the query. The grow function, however, is intended to take the number of docs that will be added to the iterator, not the maximum doc ID.
Internally, this method has logic to determine whether the iterator should be backed by a sparse set of doc IDs or a dense bit set, where each bit indicates whether a doc ID is present. Passing the maximum doc ID can cause the wrong iterator type to be created. For instance, assume our results are [200,000,000]: num docs = 1, but max doc = 200,000,000. This would cause a bit set of 200,000,000 entries to be created for a single doc. If we instead set the length to 1, the iterator would hold a single int in an int array.
This change allows Lucene to pick the best iterator for our use case.
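To make the size difference in the [200,000,000] example concrete, here is a rough back-of-the-envelope sketch in plain Java. It does not call Lucene. The class and method names are hypothetical, and it assumes a bit set stored in 64-bit words and a sparse buffer storing one int per doc, ignoring object headers and whatever buffer growth strategy Lucene actually uses.

```java
// Hypothetical sketch: names and sizing assumptions are illustrative, not Lucene's API.
final class DocIdSizingSketch {

    /** Bytes for a dense bit set with one bit per possible doc ID, stored in 64-bit words. */
    static long denseBitSetBytes(long numBits) {
        long words = (numBits + 63) / 64; // round up to whole 64-bit words
        return words * Long.BYTES;
    }

    /** Bytes for a sparse buffer that stores each matching doc ID as one int. */
    static long sparseBufferBytes(long numDocs) {
        return numDocs * Integer.BYTES;
    }

    public static void main(String[] args) {
        long maxDocId = 200_000_000L; // the single result from the example above
        long numDocs = 1L;            // only one doc was actually returned
        System.out.println("dense:  " + denseBitSetBytes(maxDocId) + " bytes");
        System.out.println("sparse: " + sparseBufferBytes(numDocs) + " bytes");
    }
}
```

Under these assumptions, a 200,000,000-bit set takes 25,000,000 bytes, while a single buffered int takes 4 bytes, which is why passing the number of docs rather than the maximum doc ID matters.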
Issues Resolved
#500
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.