Use the Lucene Distance Calculation Function in Script Scoring for doing exact search #1699

ryanbogan · 2024-05-13T18:43:24Z

Description

Continuation of #1287

This PR changes the implementation of our script scoring calculations to utilize Lucene's VectorUtil class.

Issues Resolved

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ing exact search Signed-off-by: Ryan Bogan <[email protected]>

Signed-off-by: Ryan Bogan <[email protected]>

navneet1v · 2024-05-13T23:27:03Z

src/main/java/org/opensearch/knn/plugin/script/KNNScoringUtil.java

+        int numZeroInInput = 0;
+        int numZeroInQuery = 0;
+        float cosine = 0.0f;
+        for (int i = 0; i < inputVector.length; i++) {
+            if (inputVector[i] == 0) {
+                numZeroInInput++;
+            }
+
+            if (queryVector[i] == 0) {
+                numZeroInQuery++;
+            }
        }
-        float normalizedProduct = normQueryVector * normInputVector;
-        if (normalizedProduct == 0) {
+        if (numZeroInInput == inputVector.length || numZeroInQuery == queryVector.length) {
+            return cosine;
+        }
+        try {
+            cosine = VectorUtil.cosine(queryVector, inputVector);
+        } catch (IllegalArgumentException e) {


Doesn't Lucene have a cosine function that we can leverage directly?

We use the Lucene cosine function on line 159. The rest just returns 0 if either the input or query vector is all 0's.
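For readers following the thread, the guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `plainCosine` is a hypothetical plain-Java stand-in for Lucene's `VectorUtil.cosine(float[], float[])` so the snippet runs without Lucene on the classpath, and `isZeroVector` exits early instead of counting zeros as the diff does.

```java
// Sketch only: class and method names are hypothetical, not taken from the PR.
final class ZeroVectorCosineSketch {

    // Returns true when every component is zero (an empty vector also counts as zero here).
    static boolean isZeroVector(float[] v) {
        for (float x : v) {
            if (x != 0.0f) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for VectorUtil.cosine: single pass computing the dot product and both squared norms.
    static float plainCosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
        }
        float sum = 0.0f;
        float norm1 = 0.0f;
        float norm2 = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
            norm1 += a[i] * a[i];
            norm2 += b[i] * b[i];
        }
        return (float) (sum / Math.sqrt((double) norm1 * (double) norm2));
    }

    // Returns 0 for an all-zero vector so that such documents sort to the end of the results.
    static float cosineOrZero(float[] queryVector, float[] inputVector) {
        if (isZeroVector(queryVector) || isZeroVector(inputVector)) {
            return 0.0f;
        }
        return plainCosine(queryVector, inputVector);
    }

    public static void main(String[] args) {
        System.out.println(cosineOrZero(new float[] {0f, 0f}, new float[] {1f, 2f}));
        System.out.println(cosineOrZero(new float[] {1f, 0f}, new float[] {1f, 0f}));
    }
}
```

Checking for zero vectors before the call keeps the division by a zero norm out of the scoring path entirely.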

luyuncheng · 2024-05-14T13:07:45Z

src/main/java/org/opensearch/knn/plugin/script/KNNScoringUtil.java

            normInputVector += inputVector[i] * inputVector[i];
        }
        float normalizedProduct = normQueryVector * normInputVector;
        if (normalizedProduct == 0) {
            logger.debug("Invalid vectors for cosine. Returning minimum score to put this result to end");
            return 0.0f;
        }
-        return (float) (dotProduct / (Math.sqrt(normalizedProduct)));
+        return (float) (VectorUtil.dotProduct(queryVector, inputVector) / (Math.sqrt(normalizedProduct)));


If we compute cosine as dotProduct divided by the norm product, this adds one more iteration over the vectors, because the original loop at L108 already computes the dot product.

PS: I checked, and Lucene#DefaultVectorUtilSupport#cosine(float, float) already does the cosine normalization.

I used that below. I'll see if I can get it to work with the normVector present in this method.

I see there is return (float) (sum / Math.sqrt((double) norm1 * (double) norm2)); in Lucene#DefaultVectorUtilSupport#cosine(float, float), so we can use it directly in public static float cosinesimilOptimized without using dotProduct.

Where are you seeing the DefaultVectorUtilSupport class? I've only been able to find VectorUtil so far and that class doesn't have a cosine method that takes floats, only float[]

Where are you seeing the DefaultVectorUtilSupport class?

https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultVectorUtilSupport.java#L65

Looks like it's also used in KNNScoringSpace as well: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/plugin/script/KNNScoringSpace.java#L106

SIMD should be better, as per our older SIMD experiments. Also, given that Lucene lacks that implementation, I am fine with removing this optimized cosine code for now.

I can see multiple versions of cosinesimil and cosineSimilarity. Let's just move towards one where we use Lucene functions to do the distance calculations and remove all others.

Some use the optimized version and some don't. Let's just clean things up and move towards one implementation.

Sure, I'll incorporate that into this PR then.

IMHO, those who are very serious about performance will normalize their data during preprocessing and use inner product directly. So I think it's okay to not change cosine functionality for now and just focus on dot product and L2 for this optimization.
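The point about normalizing during preprocessing can be illustrated with a small sketch: once both vectors have unit L2 norm, their inner product equals their cosine similarity, so the norms never need to be recomputed at query time. All names below are hypothetical and written in plain Java; in the plugin the dot product would come from Lucene's `VectorUtil.dotProduct`.

```java
// Sketch only: hypothetical helper names, plain Java with no Lucene dependency.
final class NormalizedDotProductSketch {

    // Scales a vector to unit L2 norm; this is the step done once at ingestion time.
    static float[] l2Normalize(float[] v) {
        double normSquared = 0.0;
        for (float x : v) {
            normSquared += (double) x * x;
        }
        if (normSquared == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        double norm = Math.sqrt(normSquared);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    // Plain inner product; the only work left at query time for normalized vectors.
    static float dotProduct(float[] a, float[] b) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Reference cosine on the raw vectors, for comparison.
    static float cosine(float[] a, float[] b) {
        float sum = 0.0f;
        float norm1 = 0.0f;
        float norm2 = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
            norm1 += a[i] * a[i];
            norm2 += b[i] * b[i];
        }
        return (float) (sum / Math.sqrt((double) norm1 * (double) norm2));
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f};
        float[] query = {4f, 3f};
        float viaDot = dotProduct(l2Normalize(query), l2Normalize(doc));
        float viaCosine = cosine(query, doc);
        // The two values agree up to floating-point rounding.
        System.out.println(viaDot + " vs " + viaCosine);
    }
}
```

The trade-off is that normalization discards vector magnitude, so it only suits workloads where cosine ranking is what is wanted in the first place.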

jmazanec15 · 2024-05-15T18:24:50Z

src/main/java/org/opensearch/knn/plugin/script/KNNScoringUtil.java

-            dotProduct += queryVector[i] * inputVector[i];
-            normQueryVector += queryVector[i] * queryVector[i];
-            normInputVector += inputVector[i] * inputVector[i];
+        int numZeroInInput = 0;


I think these are unnecessary: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java#L79. Can we just do this?

That method would still return true for a zero vector, right?

I believe cosine will not be finite if one vector is zero.

As long as we validate that it's not zero vector in the above method, we should be able to remove the other check because of the assert finite
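The behavior discussed here can be checked with a short sketch, assuming the linked `VectorUtil` line is a finiteness assertion on the result (as "assert finite" above suggests). `rawCosine` is a hypothetical plain-Java stand-in for the unguarded cosine formula; it is not Lucene's code.

```java
// Sketch only: rawCosine is a hypothetical stand-in, not Lucene's implementation.
final class FiniteCosineCheckSketch {

    // Unguarded cosine: a zero vector makes the denominator sqrt(0) = 0.
    static float rawCosine(float[] a, float[] b) {
        float sum = 0.0f;
        float norm1 = 0.0f;
        float norm2 = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
            norm1 += a[i] * a[i];
            norm2 += b[i] * b[i];
        }
        return (float) (sum / Math.sqrt((double) norm1 * (double) norm2));
    }

    public static void main(String[] args) {
        float[] zero = {0f, 0f, 0f};
        float[] query = {1f, 2f, 3f};
        float result = rawCosine(zero, query);
        // 0.0 / 0.0 is NaN in IEEE 754 arithmetic, and NaN is not finite.
        System.out.println("cosine = " + result);
        System.out.println("isFinite = " + Float.isFinite(result));
    }
}
```

One caveat worth keeping in mind: Java `assert` statements are only evaluated when assertions are enabled (for example with `-ea`), so an upstream zero-vector validation still matters when assertions are off.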

Signed-off-by: Ryan Bogan <[email protected]>

codecov · 2024-05-16T19:35:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.92%. Comparing base (c315862) to head (540782c).
Report is 6 commits behind head on main.

❗ Current head 540782c differs from pull request most recent head f5b76cf

Please upload reports for the commit f5b76cf to get more accurate results.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1699      +/-   ##
============================================
- Coverage     84.93%   84.92%   -0.01%     
+ Complexity     1460     1459       -1     
============================================
  Files           177      178       +1     
  Lines          5860     5879      +19     
  Branches        597      594       -3     
============================================
+ Hits           4977     4993      +16     
- Misses          632      635       +3     
  Partials        251      251

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Ryan Bogan <[email protected]>

ryanbogan · 2024-05-21T23:08:04Z

Lucene VectorUtil CosineSimilOptimized microbenchmarks

No changes:

Benchmark                                               (dimension)  (vectorsPerTransfer)  Mode  Cnt  Score    Error  Units
TransferVectorsBenchmarks.transferVectors_withCapacity          128                100000    ss    3  0.659 ±  0.102   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128                500000    ss    3  0.682 ±  0.054   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128               1000000    ss    3  0.675 ±  0.048   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                100000    ss    3  1.008 ±  0.219   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                500000    ss    3  1.012 ±  0.120   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256               1000000    ss    3  1.015 ±  0.360   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                100000    ss    3  1.331 ±  0.383   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                500000    ss    3  1.346 ±  0.112   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384               1000000    ss    3  1.333 ±  0.226   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                100000    ss    3  1.693 ±  0.507   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                500000    ss    3  1.715 ±  1.113   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512               1000000    ss    3  1.677 ±  0.273   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                100000    ss    3  2.943 ±  1.095   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                500000    ss    3  3.132 ±  4.335   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960               1000000    ss    3  2.987 ±  3.950   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                100000    ss    3  3.572 ± 10.261   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                500000    ss    3  3.198 ±  1.984   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024               1000000    ss    3  3.379 ±  0.930   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                100000    ss    3  5.695 ±  6.103   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                500000    ss    3  4.757 ±  8.886   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536               1000000    ss    3  5.080 ±  6.601   s/op

Using VectorUtil:

Benchmark                                               (dimension)  (vectorsPerTransfer)  Mode  Cnt  Score    Error  Units
TransferVectorsBenchmarks.transferVectors_withCapacity          128                100000    ss    3  0.654 ±  0.033   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128                500000    ss    3  0.663 ±  0.043   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          128               1000000    ss    3  0.662 ±  0.048   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                100000    ss    3  1.000 ±  0.210   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256                500000    ss    3  0.996 ±  0.094   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          256               1000000    ss    3  0.995 ±  0.191   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                100000    ss    3  1.363 ±  0.669   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384                500000    ss    3  1.346 ±  0.132   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          384               1000000    ss    3  1.356 ±  0.778   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                100000    ss    3  1.782 ±  3.863   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512                500000    ss    3  1.699 ±  0.641   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          512               1000000    ss    3  1.664 ±  0.183   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                100000    ss    3  3.032 ±  3.735   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960                500000    ss    3  3.107 ±  5.165   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity          960               1000000    ss    3  3.209 ±  4.675   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                100000    ss    3  3.280 ±  5.766   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024                500000    ss    3  3.310 ±  4.152   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1024               1000000    ss    3  3.400 ± 14.473   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                100000    ss    3  5.076 ± 20.049   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536                500000    ss    3  4.744 ±  8.573   s/op
TransferVectorsBenchmarks.transferVectors_withCapacity         1536               1000000    ss    3  4.974 ±  8.913   s/op

navneet1v · 2024-05-21T23:52:09Z

@ryanbogan So what is the conclusion? Also, I see that the benchmark function is named transferVectors_withCapacity. Is that an oversight where the function name was not changed when running the benchmarks?

ryanbogan · 2024-05-22T21:00:59Z

I just ran the base version of the microbenchmarks, which should run everything, right?

jmazanec15 · 2024-05-23T16:41:36Z

@ryanbogan No, the cosinesimil is not covered by those.

Let's just leave cosinesimilOptimized untouched for now. Users who are really concerned about performance should normalize vectors during ingestion and then use dotProduct.

This reverts commit f872d83. Signed-off-by: Ryan Bogan <[email protected]>

luyuncheng

LGTM

…ing exact search (#1699) * Use the Lucene Distance Calculation Function in Script Scoring for doing exact search Signed-off-by: Ryan Bogan <[email protected]> * Add Changelog entry Signed-off-by: Ryan Bogan <[email protected]> * Fix failing test Signed-off-by: Ryan Bogan <[email protected]> * fix test Signed-off-by: Ryan Bogan <[email protected]> * Fix test bug and remove unnecessary validation Signed-off-by: Ryan Bogan <[email protected]> * Remove cosineSimilOptimized Signed-off-by: Ryan Bogan <[email protected]> * Revert "Remove cosineSimilOptimized" This reverts commit f872d83. Signed-off-by: Ryan Bogan <[email protected]> --------- Signed-off-by: Ryan Bogan <[email protected]> (cherry picked from commit 7a88f40)

ryanbogan added 4 commits May 13, 2024 11:42

Use the Lucene Distance Calculation Function in Script Scoring for do…

1752bfd

…ing exact search Signed-off-by: Ryan Bogan <[email protected]>

Add Changelog entry

d21a24f

Signed-off-by: Ryan Bogan <[email protected]>

Fix failing test

f423779

Signed-off-by: Ryan Bogan <[email protected]>

fix test

e1ec3b9

Signed-off-by: Ryan Bogan <[email protected]>

ryanbogan marked this pull request as ready for review May 13, 2024 21:49

ryanbogan requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, martin-gaievski and luyuncheng as code owners May 13, 2024 21:49

navneet1v mentioned this pull request May 13, 2024

[FEATURE] Use the Lucene Distance Calculation Function in Script Scoring for doing exact search #1287

Closed

5 tasks

navneet1v reviewed May 13, 2024

View reviewed changes

luyuncheng reviewed May 14, 2024

View reviewed changes

jmazanec15 reviewed May 15, 2024

View reviewed changes

Fix test bug and remove unnecessary validation

540782c

Signed-off-by: Ryan Bogan <[email protected]>

Remove cosineSimilOptimized

f872d83

Signed-off-by: Ryan Bogan <[email protected]>

ryanbogan requested review from navneet1v, luyuncheng and jmazanec15 May 17, 2024 17:23

Revert "Remove cosineSimilOptimized"

f5b76cf

This reverts commit f872d83. Signed-off-by: Ryan Bogan <[email protected]>

ryanbogan force-pushed the lucene_vector_util branch from dcb4f47 to f5b76cf Compare May 23, 2024 17:00

jmazanec15 approved these changes May 23, 2024

View reviewed changes

luyuncheng approved these changes May 24, 2024

View reviewed changes

ryanbogan merged commit 7a88f40 into opensearch-project:main May 24, 2024
48 of 50 checks passed

ryanbogan added the backport 2.x label May 24, 2024

ryanbogan deleted the lucene_vector_util branch May 24, 2024 16:00

opensearch-trigger-bot bot mentioned this pull request May 24, 2024

[Backport 2.x] Use the Lucene Distance Calculation Function in Script Scoring for doing exact search #1717

Merged

This was referenced Jun 25, 2024

Vector efficient math operations for KNN distance functions #617

Closed

[FEATURE] Use the Lucene Distance Calculation Function in Script Scoring for doing exact search #1032

Closed
