Add validation for pq m parameter before training starts #1713
Signed-off-by: Ryan Bogan <[email protected]>
);
}

ValidationException methodValidation = methodComponent.validateWithData(
Do we have to validate further even if we don't support the space type?
Even though validation will ultimately fail if the space type is not supported, continuing lets us add error messages for any other problems with the method component context.
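To illustrate the point about collecting several error messages in one pass, here is a minimal sketch. The class name, method name, and message wording are hypothetical and are not taken from the plugin; it only assumes that a failed space type check should not stop the remaining checks from running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the k-NN plugin.
final class AccumulatingValidator {
    private AccumulatingValidator() {}

    /** Runs every check and returns all problems found, in order. */
    static List<String> validate(boolean spaceTypeSupported, int dimension, int m) {
        List<String> errors = new ArrayList<>();
        if (!spaceTypeSupported) {
            errors.add("space type is not supported");
        }
        // Keep going after the first failure so the caller sees every problem.
        // The m <= 0 test short-circuits, so the modulo never divides by zero.
        if (m <= 0 || dimension % m != 0) {
            errors.add(String.format("dimension [%d] is not divisible by m [%d]", dimension, m));
        }
        return errors;
    }
}
```

With this shape, a request that has both an unsupported space type and an invalid m reports two errors instead of one.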
src/main/java/org/opensearch/knn/training/TrainingDataSpec.java
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1713     +/-   ##
============================================
- Coverage     85.02%   84.96%   -0.07%
- Complexity     1463     1486     +23
============================================
  Files           178      178
  Lines          5898     6026    +128
  Branches        598      626     +28
============================================
+ Hits           5015     5120    +105
- Misses          632      649     +17
- Partials        251      257      +6
☔ View full report in Codecov by Sentry.
@@ -109,9 +109,6 @@ class Faiss extends NativeLibrary {
.build()
);

// TODO: To think about in future: for PQ, if dimension is not divisible by code count, PQ will fail. Right now,
nice - finally getting rid of it
src/main/java/org/opensearch/knn/training/TrainingDataSpec.java
* Add validation for pq code count before training starts
* Add integration test
* Add unit tests
* Clean up code
* Remove unnecessary lines
* Add changelog entry
* Change framework to add validation with data
* Remove unused error message
* Add unit tests
* Change space type check name for readability
* Add javadocs
* Modify validation error wording and add json structure to tests
* Change TrainingDataSpec to VectorSpaceInfo
* Add unit tests
---------
Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 3701d19)
Co-authored-by: Ryan Bogan <[email protected]>
* Fix flaky test in Faiss JNI range search (#1705)
  Signed-off-by: Junqiu Lei <[email protected]>
* Support script score when doc value is disabled and fix misusing DISI (#1696)
  * Revert "Revert 'Support script score when doc value is disabled' (#1662)" This reverts commit bd2f403.
  * fix misusing doc value
  * add changelog
  Signed-off-by: panguixin <[email protected]>
* --- (#1712) updated-dependencies: - dependency-name: requests dependency-type: direct:production ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update threshold value after new result is added (#1715)
  Signed-off-by: Heemin Kim <[email protected]>
* Use the Lucene Distance Calculation Function in Script Scoring for doing exact search (#1699)
  * Use the Lucene Distance Calculation Function in Script Scoring for doing exact search
  * Add Changelog entry
  * Fix failing test
  * fix test
  * Fix test bug and remove unnecessary validation
  * Remove cosineSimilOptimized
  * Revert "Remove cosineSimilOptimized" This reverts commit f872d83.
  Signed-off-by: Ryan Bogan <[email protected]>
* Add validation for pq m parameter before training starts (#1713)
  * Add validation for pq code count before training starts
  * Add integration test
  * Add unit tests
  * Clean up code
  * Remove unnecessary lines
  * Add changelog entry
  * Change framework to add validation with data
  * Remove unused error message
  * Add unit tests
  * Change space type check name for readability
  * Add javadocs
  * Modify validation error wording and add json structure to tests
  * Change TrainingDataSpec to VectorSpaceInfo
  * Add unit tests
  Signed-off-by: Ryan Bogan <[email protected]>
* Updating the BWC test config after 2.14 release (#1724)
  Signed-off-by: Navneet Verma <[email protected]>
---------
Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: panguixin <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Co-authored-by: Junqiu Lei <[email protected]>
Co-authored-by: panguixin <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Heemin Kim <[email protected]>
Co-authored-by: Ryan Bogan <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>
…project#1713) * Add validation for pq code count before training starts Signed-off-by: Ryan Bogan <[email protected]> * Add integration test Signed-off-by: Ryan Bogan <[email protected]> * Add unit tests Signed-off-by: Ryan Bogan <[email protected]> * Clean up code Signed-off-by: Ryan Bogan <[email protected]> * Remove unnecessary lines Signed-off-by: Ryan Bogan <[email protected]> * Add changelog entry Signed-off-by: Ryan Bogan <[email protected]> * Change framework to add validation with data Signed-off-by: Ryan Bogan <[email protected]> * Remove unused error message Signed-off-by: Ryan Bogan <[email protected]> * Add unit tests Signed-off-by: Ryan Bogan <[email protected]> * Change space type check name for readability Signed-off-by: Ryan Bogan <[email protected]> * Add javadocs Signed-off-by: Ryan Bogan <[email protected]> * Modify validation error wording and add json structure to tests Signed-off-by: Ryan Bogan <[email protected]> * Change TrainingDataSpec to VectorSpaceInfo Signed-off-by: Ryan Bogan <[email protected]> * Add unit tests Signed-off-by: Ryan Bogan <[email protected]> --------- Signed-off-by: Ryan Bogan <[email protected]>
Description
Currently, if the dimension of a model is not divisible by the pq m parameter (code count) provided in the training request, training fails with a detailed message in the logs. In addition, the model state is updated to failed and the model error is set to
"Failed to execute training. May be caused by an invalid method definition or not enough memory to perform training."
This PR adds a validation check for the above case before training starts or any memory is allocated. The model state is still set to failed, but the error message is more specific.
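The check described above comes down to a divisibility test run before any training work begins. The following is a minimal sketch of that idea only; `PqCodeCountCheck`, `validate`, and the message text are hypothetical names chosen for illustration and are not the plugin's actual classes, methods, or error wording.

```java
import java.util.Optional;

// Hypothetical helper, not the k-NN plugin's actual class or method names.
final class PqCodeCountCheck {
    private PqCodeCountCheck() {}

    /**
     * Returns an error message when the vector dimension cannot be split into
     * m equally sized sub-vectors, or an empty Optional when the pair is valid.
     */
    static Optional<String> validate(int dimension, int m) {
        if (dimension <= 0 || m <= 0) {
            return Optional.of(
                String.format("dimension [%d] and m [%d] must both be positive", dimension, m));
        }
        // PQ splits each vector into m sub-vectors, so m must divide the dimension.
        if (dimension % m != 0) {
            return Optional.of(
                String.format("dimension [%d] must be divisible by m [%d]", dimension, m));
        }
        return Optional.empty();
    }
}
```

For example, `PqCodeCountCheck.validate(128, 8)` returns an empty `Optional`, while `PqCodeCountCheck.validate(10, 3)` returns a message naming both values, which a caller could surface as the model error instead of the generic training failure.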
Issues Resolved
#1075
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.