Add geometric mean normalization for scores #239
9c78771 to a62afaa
Codecov Report
@@               Coverage Diff                @@
##   feature/normalization   #239     +/-   ##
=================================================
+ Coverage     82.43%   86.23%   +3.80%
- Complexity      323      337      +14
=================================================
  Files            26       28       +2
  Lines           979      981       +2
  Branches        153      153
=================================================
+ Hits            807      846      +39
+ Misses          108       69      -39
- Partials         64       66       +2
Signed-off-by: Martin Gaievski <[email protected]>
a62afaa to 793f05a
...ensearch/neuralsearch/processor/combination/GeometricMeanScoreCombinationTechniqueTests.java
… values Signed-off-by: Martin Gaievski <[email protected]>
 * Verify score correctness by using alternative formula for geometric mean as n-th root of product of weighted scores,
 * more details in here https://en.wikipedia.org/wiki/Weighted_geometric_mean
 */
private float geometricMean(List<Float> scores, List<Double> weights) {
I still have doubts about the effectiveness of this test code. I believe we don't need test code based on random numbers. I would like to hear other opinions though.
I don't have any concerns as long as the test is able to fail if the formula changes. Let's just make sure that it doesn't become flaky because of floating-point precision loss.
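For reference, the precision concern can be illustrated with a minimal sketch. It is not the code from this PR: the class name `WeightedGeometricMeanSketch`, the method names `viaLogs` and `viaProduct`, and the `DELTA` tolerance are all hypothetical, and scores are assumed to be strictly positive. It computes the weighted geometric mean two ways, as `exp(sum(w_i * ln(x_i)) / sum(w_i))` and as the `sum(w_i)`-th root of `prod(x_i ^ w_i)`, and compares the results with a tolerance rather than for exact equality.

```java
import java.util.List;

// Hypothetical sketch, not the PR's implementation.
// Assumes scores are strictly positive and both lists have the same size.
final class WeightedGeometricMeanSketch {

    // Assumed tolerance for comparing float results; tune to the score range in use.
    static final float DELTA = 1e-4f;

    // Log form: exp( sum(w_i * ln(x_i)) / sum(w_i) )
    static float viaLogs(List<Float> scores, List<Double> weights) {
        double weightedLogSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < scores.size(); i++) {
            weightedLogSum += weights.get(i) * Math.log(scores.get(i));
            weightSum += weights.get(i);
        }
        return (float) Math.exp(weightedLogSum / weightSum);
    }

    // Product form: ( prod(x_i ^ w_i) ) ^ (1 / sum(w_i))
    static float viaProduct(List<Float> scores, List<Double> weights) {
        double product = 1.0;
        double weightSum = 0.0;
        for (int i = 0; i < scores.size(); i++) {
            product *= Math.pow(scores.get(i), weights.get(i));
            weightSum += weights.get(i);
        }
        return (float) Math.pow(product, 1.0 / weightSum);
    }

    public static void main(String[] args) {
        List<Float> scores = List.of(0.5f, 0.8f, 0.3f);
        List<Double> weights = List.of(0.4, 0.7, 0.2);
        float fromLogs = viaLogs(scores, weights);
        float fromProduct = viaProduct(scores, weights);
        // Compare with a tolerance; the two forms can differ in the last few bits.
        boolean close = Math.abs(fromLogs - fromProduct) <= DELTA;
        System.out.println(fromLogs + " vs " + fromProduct + ", within delta: " + close);
    }
}
```

In a JUnit test the same comparison would typically use an assertion that accepts a delta, so that a change in the formula still fails the test while rounding differences between the two forms do not.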
Signed-off-by: Martin Gaievski <[email protected]>
e91da54 to f2fdcbe
Description
Adding the geometric mean technique, a type of mean based on the product of N values and its N-th root (more details here). Weights are supported in the same way as for the arithmetic mean. Example of pipeline with processor config:
In addition to the main changes, there is some refactoring in the integ tests. I had to include it in this PR because, with the few new tests added for geometric mean, the auto redeploy feature started acting more aggressively and the tests became flaky.
Issues Resolved
#228, part of solution for #126
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.