Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-Call Issue Fix with Binary Quantized Vectors #2071

Merged
merged 3 commits into from
Sep 9, 2024

Conversation

Vikasht34
Copy link
Contributor

@Vikasht34 Vikasht34 commented Sep 8, 2024

Description

This PR addresses a critical issue that was identified during benchmarking, where the recall performance unexpectedly dropped below 1. The root cause of the issue was traced to two main problems in the quantization and vector handling process:

  1. Bit Packing of Quantized Vector: The quantized vector was not correctly updated during the bit-packing process. Instead, the same vector values were being reused, resulting in incorrect quantization of subsequent vectors.
  2. Vector Transfer to JNI: When transferring vectors to the JNI layer, vectors were passed by reference, causing all the vectors in the VectorTransfer list to reference the same object. This led to unintended behavior where all vectors became identical, severely affecting the recall accuracy.

Testing

Performed Benchmarking with NQ Data Set Results are here s3://disk-based-ann-bq/NQ-1M-768/
Re-Call for One Bit Quantization with 2x Oversampling us 0.94

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code looks good to me. There are some minor comments. See how you want to resolve it.

Apart from this I would recommend doing 2 things:

  1. Update the description of the PR with details on what are the bugs which was found.
  2. Also add the details on how you tested these, like if you have ran a benchmarks to see the recall add that too.

ryanbogan
ryanbogan previously approved these changes Sep 9, 2024
Copy link
Member

@ryanbogan ryanbogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit but not blocking. Otherwise, LGTM!

Copy link
Member

@ryanbogan ryanbogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are some merge conflicts that need to be fixed @Vikasht34

ryanbogan
ryanbogan previously approved these changes Sep 9, 2024
Signed-off-by: VIKASH TIWARI <[email protected]>
ryanbogan
ryanbogan previously approved these changes Sep 9, 2024
@navneet1v navneet1v merged commit ce735c4 into opensearch-project:main Sep 9, 2024
28 of 29 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 9, 2024
* Re-Call Issue Fix with Binary Quantized Vectors

Signed-off-by: VIKASH TIWARI <[email protected]>

* Feedback Fix

Signed-off-by: VIKASH TIWARI <[email protected]>

---------

Signed-off-by: VIKASH TIWARI <[email protected]>
Signed-off-by: Vikasht34 <[email protected]>
(cherry picked from commit ce735c4)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 9, 2024
* Re-Call Issue Fix with Binary Quantized Vectors

Signed-off-by: VIKASH TIWARI <[email protected]>

* Feedback Fix

Signed-off-by: VIKASH TIWARI <[email protected]>

---------

Signed-off-by: VIKASH TIWARI <[email protected]>
Signed-off-by: Vikasht34 <[email protected]>
(cherry picked from commit ce735c4)
@@ -33,4 +58,11 @@ public interface QuantizationOutput<T> {
* @return true if the quantized vector is already prepared, false otherwise.
*/
boolean isPrepared(int vectorLength);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We should be removing this as its a dead code at this point, we can add it back if we ever need it in the future

quantizationOutput.prepareQuantizedVector(vectorLength);

// Assert
assertNotNull(quantizationOutput.getQuantizedVector());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not null isn't tight enough. Can you check if the length is 10 and its all 0?

@Before
public void setUp() throws Exception {
super.setUp();
quantizationOutput = new BinaryQuantizationOutput(BITS_PER_COORDINATE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member variable seems to be creating interdependency between tests, can we create this for every test please

* Returns a copy of the quantized vector. This is because of during transfer same vectors was getting
* added due to reference.
*/
return indexBuildSetup.getQuantizationOutput().getQuantizedVectorCopy();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the unit test for this class to make sure we always return a copy? we need a assertNotSame check in the unit test of this class

ryanbogan pushed a commit that referenced this pull request Sep 9, 2024
* Re-Call Issue Fix with Binary Quantized Vectors

Signed-off-by: VIKASH TIWARI <[email protected]>

* Feedback Fix

Signed-off-by: VIKASH TIWARI <[email protected]>

---------

Signed-off-by: VIKASH TIWARI <[email protected]>
Signed-off-by: Vikasht34 <[email protected]>
(cherry picked from commit ce735c4)

Co-authored-by: Vikasht34 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants