Use the Lucene Distance Calculation Function in Script Scoring for doing exact search #1699

Merged
ryanbogan merged 7 commits into opensearch-project:main from lucene_vector_util
May 24, 2024

@ryanbogan ryanbogan commented May 13, 2024

Description

Continuation of #1287

This PR changes the implementation of our script scoring calculations to utilize Lucene's VectorUtil class.

Issues Resolved

#1032

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ryanbogan added 4 commits May 13, 2024 11:42
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Comment on lines 143 to 160
int numZeroInInput = 0;
int numZeroInQuery = 0;
float cosine = 0.0f;
for (int i = 0; i < inputVector.length; i++) {
if (inputVector[i] == 0) {
numZeroInInput++;
}

if (queryVector[i] == 0) {
numZeroInQuery++;
}
}
float normalizedProduct = normQueryVector * normInputVector;
if (normalizedProduct == 0) {
if (numZeroInInput == inputVector.length || numZeroInQuery == queryVector.length) {
return cosine;
}
try {
cosine = VectorUtil.cosine(queryVector, inputVector);
} catch (IllegalArgumentException e) {
Collaborator

Doesn't Lucene have a cosine function directly available that we can leverage?

Member Author

We use the Lucene cosine function on line 159. The rest just returns 0 if either the input or the query vector is all zeros.
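The guard described in this reply can be sketched in isolation as follows. This is a minimal illustration, not the plugin's code: the class name `ZeroVectorGuardSketch` and its helper methods are hypothetical, and the plain-Java `cosine` loop is only a stand-in for Lucene's `VectorUtil.cosine(float[], float[])` so that the snippet runs without Lucene on the classpath.

```java
/** Hypothetical sketch: return 0 for all-zero vectors, otherwise compute cosine. */
final class ZeroVectorGuardSketch {

    /** True when every component of v is exactly zero. */
    static boolean isZeroVector(float[] v) {
        for (float x : v) {
            if (x != 0.0f) {
                return false;
            }
        }
        return true;
    }

    /** Plain-Java stand-in (assumption) for VectorUtil.cosine(float[], float[]). */
    static float cosine(float[] a, float[] b) {
        float sum = 0.0f;
        float norm1 = 0.0f;
        float norm2 = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
            norm1 += a[i] * a[i];
            norm2 += b[i] * b[i];
        }
        return (float) (sum / Math.sqrt((double) norm1 * (double) norm2));
    }

    /** Returns 0 when either vector is all zeros, so the division by a zero norm never happens. */
    static float cosineOrZero(float[] queryVector, float[] inputVector) {
        if (queryVector.length != inputVector.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        if (isZeroVector(queryVector) || isZeroVector(inputVector)) {
            return 0.0f;
        }
        return cosine(queryVector, inputVector);
    }
}
```

Checking for the zero vector before calling the cosine routine keeps the minimum-score behavior in one place instead of relying on how the underlying routine reacts to a zero norm.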

normInputVector += inputVector[i] * inputVector[i];
}
float normalizedProduct = normQueryVector * normInputVector;
if (normalizedProduct == 0) {
logger.debug("Invalid vectors for cosine. Returning minimum score to put this result to end");
return 0.0f;
}
return (float) (dotProduct / (Math.sqrt(normalizedProduct)));
return (float) (VectorUtil.dotProduct(queryVector, inputVector) / (Math.sqrt(normalizedProduct)));
Collaborator

If we compute cosine as dotProduct divided by the norm product, this does one more iteration over the vectors, because the original L108 accumulated dotProduct in the same loop as the norms.

PS: I checked, and Lucene#DefaultVectorUtilSupport#cosine(float, float) does the cosine normalization itself.

Member Author

I used that below. I'll see if I can get it to work with the normVector present in this method.

Collaborator

I see that Lucene#DefaultVectorUtilSupport#cosine(float, float) contains return (float) (sum / Math.sqrt((double) norm1 * (double) norm2)); so we can use it directly in public static float cosinesimilOptimized, without using dotProduct.
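The iteration-count concern raised in this thread can be illustrated with a small sketch. It assumes nothing about the plugin's actual methods: `CosinePassesSketch`, `cosineTwoPass`, and `cosineSinglePass` are hypothetical names, and the local `dotProduct` is a plain-Java stand-in for `VectorUtil.dotProduct(float[], float[])`. The single-pass variant ends with the same expression quoted above.

```java
/** Hypothetical sketch contrasting two traversals with one when computing cosine. */
final class CosinePassesSketch {

    /** Plain-Java stand-in (assumption) for VectorUtil.dotProduct(float[], float[]). */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Two traversals: one loop for the squared norms, one more inside dotProduct. */
    static float cosineTwoPass(float[] query, float[] input) {
        float normQuery = 0.0f;
        float normInput = 0.0f;
        for (int i = 0; i < query.length; i++) {
            normQuery += query[i] * query[i];
            normInput += input[i] * input[i];
        }
        return (float) (dotProduct(query, input) / Math.sqrt((double) normQuery * (double) normInput));
    }

    /** One traversal: dot product and both squared norms accumulate in the same loop. */
    static float cosineSinglePass(float[] query, float[] input) {
        float sum = 0.0f;
        float norm1 = 0.0f;
        float norm2 = 0.0f;
        for (int i = 0; i < query.length; i++) {
            sum += query[i] * input[i];
            norm1 += query[i] * query[i];
            norm2 += input[i] * input[i];
        }
        return (float) (sum / Math.sqrt((double) norm1 * (double) norm2));
    }
}
```

Both variants return the same value for non-zero vectors; neither handles an all-zero vector, which would divide by zero and produce NaN.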

Member Author

Where are you seeing the DefaultVectorUtilSupport class? I've only been able to find VectorUtil so far, and that class doesn't have a cosine method that takes floats, only float[].

Collaborator

SIMD would be better, per our older SIMD experiments. Also, given that Lucene lacks that implementation, I am fine with removing this optimized cosine code for now.

Collaborator

I can see multiple versions of cosinesimil and cosineSimilarity. Let's just move towards one where we use Lucene functions to do the distance calculations, and remove all the others.

Some use the optimized version and some don't. Let's clean things up and move towards one implementation.

Member Author

Sure, I'll incorporate that with this PR then

Member

IMHO, those who are very serious about performance will normalize their data during preprocessing and use inner product directly. So I think it's okay to not change the cosine functionality for now and just focus on dot product and L2 for this optimization.
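The preprocessing approach suggested here can be sketched as follows: once both vectors are scaled to unit length, their inner product equals their cosine similarity, so no norm work is left for query time. The class and method names (`NormalizeThenDotSketch`, `l2Normalize`) are hypothetical, and the snippet uses plain Java rather than any plugin or Lucene API.

```java
/** Hypothetical sketch: unit-normalize at ingestion, then score with a plain inner product. */
final class NormalizeThenDotSketch {

    /** Returns a copy of v scaled to unit L2 length; rejects the zero vector. */
    static float[] l2Normalize(float[] v) {
        double normSquared = 0.0;
        for (float x : v) {
            normSquared += (double) x * x;
        }
        if (normSquared == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float inverseNorm = (float) (1.0 / Math.sqrt(normSquared));
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * inverseNorm;
        }
        return out;
    }

    /** Plain inner product of two equal-length vectors. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] doc = l2Normalize(new float[] {3f, 4f});   // done once, at ingestion
        float[] query = l2Normalize(new float[] {4f, 3f}); // done once, per query
        // The cosine of (3,4) and (4,3) is 24/25, and the inner product of the unit vectors matches it.
        System.out.println(dotProduct(query, doc));
    }
}
```

The normalization cost is paid once per vector instead of once per scored pair, which is why the inner product path is the one worth optimizing.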

dotProduct += queryVector[i] * inputVector[i];
normQueryVector += queryVector[i] * queryVector[i];
normInputVector += inputVector[i] * inputVector[i];
int numZeroInInput = 0;
Member Author

That method would still return true for a zero vector, right?

Member

I believe cosine will be infinite if one vector is finite

Member Author

As long as we validate that it's not a zero vector in the above method, we should be able to remove the other check because of the finite assert.

codecov bot commented May 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.92%. Comparing base (c315862) to head (540782c).
Report is 6 commits behind head on main.

Current head 540782c differs from pull request most recent head f5b76cf

Please upload reports for the commit f5b76cf to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1699      +/-   ##
============================================
- Coverage     84.93%   84.92%   -0.01%     
+ Complexity     1460     1459       -1     
============================================
  Files           177      178       +1     
  Lines          5860     5879      +19     
  Branches        597      594       -3     
============================================
+ Hits           4977     4993      +16     
- Misses          632      635       +3     
  Partials        251      251              

@ryanbogan
Member Author

Lucene VectorUtil CosineSimilOptimized microbenchmarks

No changes:

Benchmark                                               (dimension)  (vectorsPerTransfer)  Mode  Cnt  Score    Error  Units
TransferVectorsBenchmarks.transferVectors_withCapacity          128                100000    ss    3  0.659 ±  0.102   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128                500000    ss    3  0.682 ±  0.054   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128               1000000    ss    3  0.675 ±  0.048   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                100000    ss    3  1.008 ±  0.219   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                500000    ss    3  1.012 ±  0.120   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256               1000000    ss    3  1.015 ±  0.360   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                100000    ss    3  1.331 ±  0.383   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                500000    ss    3  1.346 ±  0.112   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384               1000000    ss    3  1.333 ±  0.226   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                100000    ss    3  1.693 ±  0.507   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                500000    ss    3  1.715 ±  1.113   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512               1000000    ss    3  1.677 ±  0.273   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                100000    ss    3  2.943 ±  1.095   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                500000    ss    3  3.132 ±  4.335   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960               1000000    ss    3  2.987 ±  3.950   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                100000    ss    3  3.572 ± 10.261   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                500000    ss    3  3.198 ±  1.984   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024               1000000    ss    3  3.379 ±  0.930   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                100000    ss    3  5.695 ±  6.103   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                500000    ss    3  4.757 ±  8.886   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536               1000000    ss    3  5.080 ±  6.601   s/op

Using VectorUtil:

Benchmark                                               (dimension)  (vectorsPerTransfer)  Mode  Cnt  Score    Error  Units
TransferVectorsBenchmarks.transferVectors_withCapacity          128                100000    ss    3  0.654 ±  0.033   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128                500000    ss    3  0.663 ±  0.043   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128               1000000    ss    3  0.662 ±  0.048   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                100000    ss    3  1.000 ±  0.210   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                500000    ss    3  0.996 ±  0.094   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256               1000000    ss    3  0.995 ±  0.191   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                100000    ss    3  1.363 ±  0.669   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                500000    ss    3  1.346 ±  0.132   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384               1000000    ss    3  1.356 ±  0.778   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                100000    ss    3  1.782 ±  3.863   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                500000    ss    3  1.699 ±  0.641   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512               1000000    ss    3  1.664 ±  0.183   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                100000    ss    3  3.032 ±  3.735   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                500000    ss    3  3.107 ±  5.165   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960               1000000    ss    3  3.209 ±  4.675   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                100000    ss    3  3.280 ±  5.766   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                500000    ss    3  3.310 ±  4.152   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024               1000000    ss    3  3.400 ± 14.473   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                100000    ss    3  5.076 ± 20.049   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                500000    ss    3  4.744 ±  8.573   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536               1000000    ss    3  4.974 ±  8.913   s/op
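One way to read the two tables is to check whether the reported ranges overlap row by row. The sketch below assumes that the Error column is the half-width of an interval around Score; the class name `BenchmarkOverlapSketch` and the method `intervalsOverlap` are hypothetical helpers, and the numbers in `main` are copied from the dimension 128, 100000 vectors row of each table.

```java
/** Hypothetical helper for comparing two "score ± error" benchmark rows. */
final class BenchmarkOverlapSketch {

    /** True when [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] intersect. */
    static boolean intervalsOverlap(double scoreA, double errA, double scoreB, double errB) {
        return (scoreA - errA) <= (scoreB + errB) && (scoreB - errB) <= (scoreA + errA);
    }

    public static void main(String[] args) {
        // "No changes" row versus "Using VectorUtil" row for dimension 128, 100000 vectors.
        System.out.println(intervalsOverlap(0.659, 0.102, 0.654, 0.033));
    }
}
```

When the ranges overlap, the rows alone do not show a measurable difference between the two runs.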

@navneet1v
Collaborator

@ryanbogan So what's the conclusion? Also, I see you are using a function named transferVectors_withCapacity. Is that a typo, i.e. you didn't change the function name while running the benchmarks?

@ryanbogan
Member Author

I just ran the base version of the microbenchmarks, which should run everything, right?

@jmazanec15
Member

@ryanbogan No, cosinesimil is not covered by those.

Let's just leave cosinesimilOptimized untouched for now. Users who are really concerned about performance should normalize vectors during ingestion and then use dotProduct.

This reverts commit f872d83.

Signed-off-by: Ryan Bogan <[email protected]>
@ryanbogan ryanbogan force-pushed the lucene_vector_util branch from dcb4f47 to f5b76cf Compare May 23, 2024 17:00
Collaborator

@luyuncheng luyuncheng left a comment

LGTM

@ryanbogan ryanbogan merged commit 7a88f40 into opensearch-project:main May 24, 2024
48 of 50 checks passed
@ryanbogan ryanbogan deleted the lucene_vector_util branch May 24, 2024 16:00
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 24, 2024
…ing exact search (#1699)

* Use the Lucene Distance Calculation Function in Script Scoring for doing exact search

Signed-off-by: Ryan Bogan <[email protected]>

* Add Changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Fix failing test

Signed-off-by: Ryan Bogan <[email protected]>

* fix test

Signed-off-by: Ryan Bogan <[email protected]>

* Fix test bug and remove unnecessary validation

Signed-off-by: Ryan Bogan <[email protected]>

* Remove cosineSimilOptimized

Signed-off-by: Ryan Bogan <[email protected]>

* Revert "Remove cosineSimilOptimized"

This reverts commit f872d83.

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 7a88f40)
ryanbogan added a commit that referenced this pull request May 24, 2024
…ing exact search (#1699) (#1717)

navneet1v added a commit that referenced this pull request Jun 1, 2024
* Fix flaky test in Faiss JNI range search (#1705)

Signed-off-by: Junqiu Lei <[email protected]>

* Support script score when doc value is disabled and fix misusing DISI (#1696)

* Revert "Revert 'Support script score when doc value is disabled' (#1662)"

This reverts commit bd2f403.

Signed-off-by: panguixin <[email protected]>

* fix misusing doc value

Signed-off-by: panguixin <[email protected]>

* add changelog

Signed-off-by: panguixin <[email protected]>

---------

Signed-off-by: panguixin <[email protected]>

* --- (#1712)

updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update threshold value after new result is added (#1715)

Signed-off-by: Heemin Kim <[email protected]>

* Use the Lucene Distance Calculation Function in Script Scoring for doing exact search (#1699)

* Add validation for pq m parameter before training starts (#1713)

* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>

* Updating the BWC test config after 2.14 release (#1724)

Signed-off-by: Navneet Verma <[email protected]>

---------

Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: panguixin <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Co-authored-by: Junqiu Lei <[email protected]>
Co-authored-by: panguixin <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Heemin Kim <[email protected]>
Co-authored-by: Ryan Bogan <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>
jmazanec15 pushed a commit to jmazanec15/k-NN-1 that referenced this pull request Jun 24, 2024
…ing exact search (opensearch-project#1699) (opensearch-project#1717)

luyuncheng pushed a commit to luyuncheng/k-NN-1 that referenced this pull request Jul 7, 2024
…ing exact search (opensearch-project#1699)
