-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ByFieldRerank Processor (ReRankProcessor enhancement) #932
ByFieldRerank Processor (ReRankProcessor enhancement) #932
Conversation
Signed-off-by: Brian Flores <[email protected]>
Previously the ByField Rerank processor was used to handle a field that was only within the first level of the source map, instead customers may have a requirement to have a nested object in their document. This commit adds functionality to be able to handle these scenarios as well as deleting empty objects that are a result of deleting the targetField. Three unit tests were onboarding to ensure that the scores are retrieved and that the search response was able to rerank sucessfully; manual testing was done to see that removal empty maps were done when removing the targetField Signed-off-by: Brian Flores <[email protected]>
…s/neural-search into reRankByField-analysis
The functionality has been implemented all that is left is more UTs, hence I left the PR as a draft. I can't think of integration testing for it as it doesn't interact with a remote process. |
We can interact with local model, it should be fine for the sake of integ test, something similar to how it's done in existing test for rerank processor https://github.com/opensearch-project/neural-search/blob/main/src/test/java/org/opensearch/neuralsearch/processor/rerank/MLOpenSearchRerankProcessorIT.java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the first round of review. Please address comments and let me know.
src/main/java/org/opensearch/neuralsearch/processor/factory/RerankProcessorFactory.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RerankProcessorFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RerankProcessorFactory.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RerankProcessorFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
…unctionality In this commit I added more JavaDocs on methods and the initial class to help developers understand more in depth how the ByField Processor works. I added four more UTs that interact with the sourceMap modifcation feature I also implemented the feedback given regarding naming and the uneeded singular parameter in the RerankProcessorFactory Signed-off-by: Brian Flores <[email protected]>
This commit adds a IT where the ByFieldRerank processor interacts with the search pipeline workflow. It creates an index with sample data and the runs a query using the Rerank processor. Lastly it observes the result and tests that is has worked by checking the source map and the updated score Signed-off-by: Brian Flores <[email protected]>
This commit adds 4 UTs for the creation of the processor, There wasn't much more needed testing as the processor only required 1 mandatory field and doesn't use the context map. Signed-off-by: Brian Flores <[email protected]>
In this commit, UTs were added to assert that exceptions were being thrown the way expected (cases include not having a source map to perform reranking or having invalid mappings). There was a section of code that was not able to get hit but couldn't be avoided because the signature of the rescoreSearchResponse that could not throw an exception. Signed-off-by: Brian Flores <[email protected]>
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RerankProcessorFactory.java
Outdated
Show resolved
Hide resolved
@@ -218,5 +229,65 @@ public void testCrossEncoder_whenTooManyDocFields_thenFail() { | |||
() -> factory.create(Map.of(), TAG, DESC, false, config, pipelineContext) | |||
); | |||
} | |||
// End of MLOpenSearchRerankProcessor Tests | |||
|
|||
// Start of ByFieldRerankProcessor Tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can make this comment part of the test method name
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/rerank/ByFieldRerankProcessor.java
Show resolved
Hide resolved
Signed-off-by: Brian Flores <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks.
Signed-off-by: Brian Flores <[email protected]>
CHANGELOG.md
Outdated
@@ -16,6 +16,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), | |||
|
|||
## [Unreleased 2.x](https://github.com/opensearch-project/neural-search/compare/2.18...2.x) | |||
### Features | |||
- Introduces ByFieldRerankProcessor for second level reranking on documents (This is done locally) ([#932](https://github.com/opensearch-project/neural-search/pull/932)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-gaievski should we add it in the changelog of main branch now as @brianf-aws already added in the release notes of 2.18?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @brianf-aws don't add change in changelog.md as the release notes are already cut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just keep the change in release notes 2.18. file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood will remove from changelog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martin-gaievski should we add it in the changelog of main branch now as @brianf-aws already added in the release notes of 2.18?
we can skip changelog if we do have entry in the release notes
Signed-off-by: Brian Flores <[email protected]>
Adding skip changelog label because the release notes for 2.18 has already been merged. |
release-notes/opensearch-neural-search.release-notes-2.18.0.0.md
Outdated
Show resolved
Hide resolved
Signed-off-by: Brian Flores <[email protected]>
a46beb1
into
opensearch-project:main
@@ -3,6 +3,9 @@ | |||
|
|||
Compatible with OpenSearch 2.18.0 | |||
|
|||
### Features | |||
- Introduces ByFieldRerankProcessor for second level reranking on documents ([#932](https://github.com/opensearch-project/neural-search/pull/932)) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this extra line 8. In the final formatting it will add 2 lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it in main while merging PR #956
* Implements initial By Field re rank Signed-off-by: Brian Flores <[email protected]> (cherry picked from commit a46beb1)
* Implements initial By Field re rank Signed-off-by: Brian Flores <[email protected]> (cherry picked from commit a46beb1)
…ject#932) * Implements initial By Field re rank Signed-off-by: Brian Flores <[email protected]>
…960) * ByFieldRerank Processor (ReRankProcessor enhancement) (#932) Signed-off-by: Brian Flores <[email protected]>
…960) * ByFieldRerank Processor (ReRankProcessor enhancement) (#932) Signed-off-by: Brian Flores <[email protected]> (cherry picked from commit 858ff28)
…960) (#962) * ByFieldRerank Processor (ReRankProcessor enhancement) (#932) Signed-off-by: Brian Flores <[email protected]> (cherry picked from commit 858ff28) Signed-off-by: Martin Gaievski <[email protected]> Co-authored-by: Brian Flores <[email protected]> Co-authored-by: Martin Gaievski <[email protected]>
* Implements initial By Field re rank Signed-off-by: Brian Flores <[email protected]>
Description
The ByFieldRerank Processor applies a 2nd level re ranking on a search hit by specifying a targetField in the
_source
mapping of a search hit. It will update the_score
field with what has been provided bytarget_field
, Whenkeep_previous_score
is asserted a new field is appended calledprevious_score
which was the score prior to reranking. Currently we support both shallow target and also a nested field target such asa.b.c
given that this mapping exists.When deleting a target field as specified by
"remove_target_field": true
the processor will delete any empty maps as a result of this action for example, when the targetField isml_info.score_report
. It will transforminto the following
It was chosen this way to give users the flexibility to clean up their searchHit, however the default behavior was disabled to let new users adapt to the feature.
Related Issues
Resolves #926
Resolves opensearch-project/OpenSearch#15631
Changelog
keep_previous_score
to be able to give possibility of adding theprevious_score
field to the search hit. Was introduced in commit 66866a4void validate(SearchHit hit) throws IllegalArgumentException;
in a new Utility classprocessorSearchResponseUtil
so that future rerankers can take advantage of validating their data before reranking.Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.