-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added validations for score combination weights in Hybrid Search #265
Added validations for score combination weights in Hybrid Search #265
Conversation
Codecov Report
@@ Coverage Diff @@
## main #265 +/- ##
============================================
+ Coverage 85.44% 85.77% +0.32%
- Complexity 370 376 +6
============================================
Files 30 30
Lines 1079 1104 +25
Branches 165 168 +3
============================================
+ Hits 922 947 +25
Misses 83 83
Partials 74 74
|
6624246
to
71dd2e7
Compare
I wonder why |
71dd2e7
to
eeb1af0
Compare
Mostly to keep consistency with other features in OpenSearch, I've been told that we consider that a user expectations that all weights sum up to 1.0. For instance, here https://opensearch.org/docs/latest/search-plugins/search-pipelines/personalize-search-ranking/ we use two weights, one is set to Technically we can work with any number but I guess that can lead to a mess as there is no single standard then. |
return; | ||
} | ||
if (scores.length != weights.size()) { | ||
log.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need this logging.
scores.length | ||
) | ||
); | ||
throw new IllegalArgumentException("number of weights must match number of sub-queries in hybrid query"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning size of weights and queries should be fine here. So security concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I do it as a rule of thumb, but I guess that's ok here. Let me push a commit
private void validateWeights(final List<Float> weightsList) { | ||
boolean isOutOfRange = weightsList.stream().anyMatch(weight -> !Range.between(0.0f, 1.0f).contains(weight)); | ||
if (isOutOfRange) { | ||
log.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Returning submitted weights in exception should be fine rather than logging duplicated message because we already knows the value is float. And float won't cause any security risk by getting returned in exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
} | ||
float sumOfWeights = weightsList.stream().reduce(0.0f, Float::sum); | ||
if (!DoubleMath.fuzzyEquals(1.0f, sumOfWeights, DELTA_FOR_SCORE_ASSERTION)) { | ||
log.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. No need an additional logging here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
c408f44
to
84454f6
Compare
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
84454f6
to
614be2c
Compare
Signed-off-by: Martin Gaievski <[email protected]>
a3dc7bd
to
67e19ca
Compare
* Added strong check on number of weights equals number of sub-queries Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 685d5d6)
* Added strong check on number of weights equals number of sub-queries Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 685d5d6)
… (#268) * Added strong check on number of weights equals number of sub-queries Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 685d5d6) Co-authored-by: Martin Gaievski <[email protected]>
… (#269) * Added strong check on number of weights equals number of sub-queries Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 685d5d6) Co-authored-by: Martin Gaievski <[email protected]>
Description
Enforcing checks and limits of weights for score normalization.
Following rules added:
Weight value and total sum of all weights will be checked during search processor creation. Match between number of sub-queries and weights will be checked in search time.
Issues Resolved
#123
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.