Added validations for score combination weights in Hybrid Search #265

martin-gaievski · 2023-08-28T22:00:08Z

Description

Enforcing checks and limits of weights for score normalization.
Following rules added:

each weight must be between 0.0 and 1.0
total sum of weights for score combination must be 1.0
number of sub-queries and number of weights must match

Weight value and total sum of all weights will be checked during search processor creation. Match between number of sub-queries and weights will be checked in search time.

Issues Resolved

#123

Check List

New functionality includes testing.
- All tests pass
Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov · 2023-08-28T22:05:54Z

Codecov Report

Merging #265 (67e19ca) into main (75b59cd) will increase coverage by 0.32%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main     #265      +/-   ##
============================================
+ Coverage     85.44%   85.77%   +0.32%     
- Complexity      370      376       +6     
============================================
  Files            30       30              
  Lines          1079     1104      +25     
  Branches        165      168       +3     
============================================
+ Hits            922      947      +25     
  Misses           83       83              
  Partials         74       74

Files Changed	Coverage Δ
...ation/ArithmeticMeanScoreCombinationTechnique.java	`90.00% <100.00%> (+0.52%)`	⬆️
...nation/GeometricMeanScoreCombinationTechnique.java	`94.44% <100.00%> (+0.32%)`	⬆️
...ination/HarmonicMeanScoreCombinationTechnique.java	`94.44% <100.00%> (+0.32%)`	⬆️
...ch/processor/combination/ScoreCombinationUtil.java	`97.72% <100.00%> (+2.27%)`	⬆️

heemin32 · 2023-08-28T22:53:55Z

I wonder why each weight must be between 0.0 and 1.0. As long as we divide it by sum of weights during calculation, the the effective sum will be 1 and each weight will be between 0.0 and 1.0. Should we have to enforce it from customer input?

martin-gaievski · 2023-08-28T23:34:44Z

I wonder why each weight must be between 0.0 and 1.0. As long as we divide it by sum of weights during calculation, the the effective sum will be 1 and each weight will be between 0.0 and 1.0. Should we have to enforce it from customer input?

Mostly to keep consistency with other features in OpenSearch, I've been told that we consider that a user expectations that all weights sum up to 1.0. For instance, here https://opensearch.org/docs/latest/search-plugins/search-pipelines/personalize-search-ranking/ we use two weights, one is set to x and other will be automatically calculated as 1.0 - x.

Technically we can work with any number but I guess that can lead to a mess as there is no single standard then.

heemin32 · 2023-08-29T16:38:48Z

src/main/java/org/opensearch/neuralsearch/processor/combination/ScoreCombinationUtil.java

+            return;
+        }
+        if (scores.length != weights.size()) {
+            log.error(


No need this logging.

heemin32 · 2023-08-29T16:39:35Z

src/main/java/org/opensearch/neuralsearch/processor/combination/ScoreCombinationUtil.java

+                    scores.length
+                )
+            );
+            throw new IllegalArgumentException("number of weights must match number of sub-queries in hybrid query");


Returning size of weights and queries should be fine here. So security concern.

Yup, I do it as a rule of thumb, but I guess that's ok here. Let me push a commit

heemin32 · 2023-08-29T16:41:00Z

src/main/java/org/opensearch/neuralsearch/processor/combination/ScoreCombinationUtil.java

+    private void validateWeights(final List<Float> weightsList) {
+        boolean isOutOfRange = weightsList.stream().anyMatch(weight -> !Range.between(0.0f, 1.0f).contains(weight));
+        if (isOutOfRange) {
+            log.error(


Same here. Returning submitted weights in exception should be fine rather than logging duplicated message because we already knows the value is float. And float won't cause any security risk by getting returned in exception.

heemin32 · 2023-08-29T16:41:25Z

src/main/java/org/opensearch/neuralsearch/processor/combination/ScoreCombinationUtil.java

+        }
+        float sumOfWeights = weightsList.stream().reduce(0.0f, Float::sum);
+        if (!DoubleMath.fuzzyEquals(1.0f, sumOfWeights, DELTA_FOR_SCORE_ASSERTION)) {
+            log.error(


Same. No need an additional logging here.

Signed-off-by: Martin Gaievski <[email protected]>

* Added strong check on number of weights equals number of sub-queries Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 685d5d6)

… (#268) * Added strong check on number of weights equals number of sub-queries Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 685d5d6) Co-authored-by: Martin Gaievski <[email protected]>

… (#269) * Added strong check on number of weights equals number of sub-queries Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 685d5d6) Co-authored-by: Martin Gaievski <[email protected]>

martin-gaievski added Enhancements Increases software capabilities beyond original client specifications backport 2.x Label will add auto workflow to backport PR to 2.x branch backport 2.10 backport to 2.10 branch labels Aug 28, 2023

martin-gaievski changed the title ~~Added strong check on number of weights equals number of sub-queries for Hybrid Search~~ Added validations for score combination weights in Hybrid Search Aug 28, 2023

martin-gaievski force-pushed the add-checks-weights-must-match-num-of-subqueries branch from 6624246 to 71dd2e7 Compare August 28, 2023 22:45

martin-gaievski force-pushed the add-checks-weights-must-match-num-of-subqueries branch from 71dd2e7 to eeb1af0 Compare August 28, 2023 23:17

martin-gaievski marked this pull request as ready for review August 28, 2023 23:42

martin-gaievski requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, sean-zheng-amazon, model-collapse, wujunshen, zane-neo, ylwu-amzn and jngz-es as code owners August 28, 2023 23:42

heemin32 reviewed Aug 29, 2023

View reviewed changes

martin-gaievski force-pushed the add-checks-weights-must-match-num-of-subqueries branch from c408f44 to 84454f6 Compare August 29, 2023 20:54

martin-gaievski added 2 commits August 29, 2023 18:16

Added strong check on number of weights equals number of sub-queries

5a555f0

Signed-off-by: Martin Gaievski <[email protected]>

Put more info to exception messages and remove extra logs

614be2c

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the add-checks-weights-must-match-num-of-subqueries branch from 84454f6 to 614be2c Compare August 30, 2023 01:17

Fixed tests after rebase on recent main

67e19ca

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the add-checks-weights-must-match-num-of-subqueries branch from a3dc7bd to 67e19ca Compare August 30, 2023 01:42

martin-gaievski requested a review from heemin32 August 30, 2023 03:16

navneet1v approved these changes Aug 30, 2023

View reviewed changes

heemin32 approved these changes Aug 30, 2023

View reviewed changes

martin-gaievski merged commit 685d5d6 into opensearch-project:main Aug 30, 2023

opensearch-trigger-bot bot mentioned this pull request Aug 30, 2023

[Backport 2.x] Added validations for score combination weights in Hybrid Search #268

Merged

opensearch-trigger-bot bot mentioned this pull request Aug 30, 2023

[Backport 2.10] Added validations for score combination weights in Hybrid Search #269

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added validations for score combination weights in Hybrid Search #265

Added validations for score combination weights in Hybrid Search #265

martin-gaievski commented Aug 28, 2023

codecov bot commented Aug 28, 2023 •

edited

Loading

heemin32 commented Aug 28, 2023 •

edited

Loading

martin-gaievski commented Aug 28, 2023

heemin32 Aug 29, 2023

heemin32 Aug 29, 2023

martin-gaievski Aug 29, 2023

heemin32 Aug 29, 2023

martin-gaievski Aug 29, 2023

heemin32 Aug 29, 2023

martin-gaievski Aug 29, 2023

Added validations for score combination weights in Hybrid Search #265

Added validations for score combination weights in Hybrid Search #265

Conversation

martin-gaievski commented Aug 28, 2023

Description

Issues Resolved

Check List

codecov bot commented Aug 28, 2023 • edited Loading

Codecov Report

heemin32 commented Aug 28, 2023 • edited Loading

martin-gaievski commented Aug 28, 2023

heemin32 Aug 29, 2023

Choose a reason for hiding this comment

heemin32 Aug 29, 2023

Choose a reason for hiding this comment

martin-gaievski Aug 29, 2023

Choose a reason for hiding this comment

heemin32 Aug 29, 2023

Choose a reason for hiding this comment

martin-gaievski Aug 29, 2023

Choose a reason for hiding this comment

heemin32 Aug 29, 2023

Choose a reason for hiding this comment

martin-gaievski Aug 29, 2023

Choose a reason for hiding this comment

codecov bot commented Aug 28, 2023 •

edited

Loading

heemin32 commented Aug 28, 2023 •

edited

Loading