Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByFieldRerank Processor (ReRankProcessor enhancement) #932

Merged

Conversation

brianf-aws
Copy link
Contributor

@brianf-aws brianf-aws commented Oct 14, 2024

Description

{
    "response_processors": [
        {
            "rerank": {
                "by_field": {
                    "target_field": "ml_score",
                     "remove_target_field": true, ## Default false
                    "keep_previous_score" : true ## Default false
                }
            }
        }
    ]
}

The ByFieldRerank Processor applies a 2nd level re ranking on a search hit by specifying a targetField in the _source mapping of a search hit. It will update the _score field with what has been provided by target_field, When keep_previous_score is asserted a new field is appended called previous_score which was the score prior to reranking. Currently we support both shallow target and also a nested field target such as a.b.c given that this mapping exists.

When deleting a target field as specified by "remove_target_field": true the processor will delete any empty maps as a result of this action for example, when the targetField is ml_info.score_report. It will transform

{
   "my_field": "value"
   "ml_info" : {
       "score_report" : 27
   }
}

into the following

{
   "my_field": "value"
}

It was chosen this way to give users the flexibility to clean up their searchHit, however the default behavior was disabled to let new users adapt to the feature.

Related Issues

Resolves #926
Resolves opensearch-project/OpenSearch#15631

Changelog

  • 10/17/2024 : onboarded a new parameter to keep_previous_score to be able to give possibility of adding the previous_score field to the search hit. Was introduced in commit 66866a4
  • 10/21/2024: Added a new functional interface void validate(SearchHit hit) throws IllegalArgumentException; in a new Utility class processorSearchResponseUtil so that future rerankers can take advantage of validating their data before reranking.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Previously the ByField Rerank processor was used to handle a field that was only within the first level of the source map, instead customers may have a requirement to have a nested object in their document. This commit adds functionality to be able to handle these scenarios as well as deleting empty objects that are a result of deleting the targetField. Three unit tests were onboarding to ensure that the scores are retrieved and that the search response was able to rerank sucessfully; manual testing was done to see that removal empty maps were done when removing the targetField

Signed-off-by: Brian Flores <[email protected]>
@brianf-aws
Copy link
Contributor Author

The functionality has been implemented all that is left is more UTs, hence I left the PR as a draft. I can't think of integration testing for it as it doesn't interact with a remote process.

@martin-gaievski
Copy link
Member

The functionality has been implemented all that is left is more UTs, hence I left the PR as a draft. I can't think of integration testing for it as it doesn't interact with a remote process.

We can interact with local model, it should be fine for the sake of integ test, something similar to how it's done in existing test for rerank processor https://github.com/opensearch-project/neural-search/blob/main/src/test/java/org/opensearch/neuralsearch/processor/rerank/MLOpenSearchRerankProcessorIT.java

@martin-gaievski martin-gaievski added the Features Introduces a new unit of functionality that satisfies a requirement label Oct 15, 2024
Copy link
Member

@vibrantvarun vibrantvarun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the first round of review. Please address comments and let me know.

…unctionality

In this commit I added more JavaDocs on methods and the initial class to help developers understand more in depth how the ByField Processor works. I added four more UTs that interact with the sourceMap modifcation feature I also implemented the feedback given regarding naming and the uneeded singular parameter in the RerankProcessorFactory

Signed-off-by: Brian Flores <[email protected]>
This commit adds a IT where the ByFieldRerank processor interacts with the search pipeline workflow. It creates an index with sample data and the runs a query using the Rerank processor. Lastly it observes the result and tests that is has worked by checking the source map and the updated score

Signed-off-by: Brian Flores <[email protected]>
This commit adds 4 UTs for the creation of the processor, There wasn't much more needed testing as the processor only required 1 mandatory field and doesn't use the context map.

Signed-off-by: Brian Flores <[email protected]>
In this commit, UTs were added to assert that exceptions were being thrown the way expected (cases include not having a source map to perform reranking or having invalid mappings). There was a section of code that was not able to get hit but couldn't be avoided because the signature of the rescoreSearchResponse that could not throw an exception.

Signed-off-by: Brian Flores <[email protected]>
@@ -218,5 +229,65 @@ public void testCrossEncoder_whenTooManyDocFields_thenFail() {
() -> factory.create(Map.of(), TAG, DESC, false, config, pipelineContext)
);
}
// End of MLOpenSearchRerankProcessor Tests

// Start of ByFieldRerankProcessor Tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can make this comment part of the test method name

@martin-gaievski martin-gaievski added backport 2.x Label will add auto workflow to backport PR to 2.x branch v2.18.0 labels Oct 22, 2024
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thank you

Copy link
Member

@vibrantvarun vibrantvarun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks.

CHANGELOG.md Outdated
@@ -16,6 +16,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

## [Unreleased 2.x](https://github.com/opensearch-project/neural-search/compare/2.18...2.x)
### Features
- Introduces ByFieldRerankProcessor for second level reranking on documents (This is done locally) ([#932](https://github.com/opensearch-project/neural-search/pull/932))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski should we add it in the changelog of main branch now as @brianf-aws already added in the release notes of 2.18?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @brianf-aws don't add change in changelog.md as the release notes are already cut.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keep the change in release notes 2.18. file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood will remove from changelog

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski should we add it in the changelog of main branch now as @brianf-aws already added in the release notes of 2.18?

we can skip changelog if we do have entry in the release notes

@vibrantvarun
Copy link
Member

Adding skip changelog label because the release notes for 2.18 has already been merged.

@martin-gaievski martin-gaievski merged commit a46beb1 into opensearch-project:main Oct 22, 2024
36 of 37 checks passed
@@ -3,6 +3,9 @@

Compatible with OpenSearch 2.18.0

### Features
- Introduces ByFieldRerankProcessor for second level reranking on documents ([#932](https://github.com/opensearch-project/neural-search/pull/932))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this extra line 8. In the final formatting it will add 2 lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it in main while merging PR #956

opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 22, 2024
* Implements initial By Field re rank

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit a46beb1)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 22, 2024
* Implements initial By Field re rank

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit a46beb1)
brianf-aws added a commit to brianf-aws/neural-search that referenced this pull request Oct 22, 2024
…ject#932)

* Implements initial By Field re rank

Signed-off-by: Brian Flores <[email protected]>
martin-gaievski pushed a commit that referenced this pull request Oct 22, 2024
…960)

* ByFieldRerank Processor (ReRankProcessor enhancement) (#932)

Signed-off-by: Brian Flores <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 22, 2024
…960)

* ByFieldRerank Processor (ReRankProcessor enhancement) (#932)

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit 858ff28)
martin-gaievski added a commit that referenced this pull request Oct 22, 2024
…960) (#962)

* ByFieldRerank Processor (ReRankProcessor enhancement) (#932)

Signed-off-by: Brian Flores <[email protected]>
(cherry picked from commit 858ff28)

Signed-off-by: Martin Gaievski <[email protected]>
Co-authored-by: Brian Flores <[email protected]>
Co-authored-by: Martin Gaievski <[email protected]>
@brianf-aws brianf-aws deleted the reRankByField-analysis branch October 25, 2024 21:22
martin-gaievski pushed a commit that referenced this pull request Nov 18, 2024
* Implements initial By Field re rank

Signed-off-by: Brian Flores <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch backport 2.18 Features Introduces a new unit of functionality that satisfies a requirement skip-changelog v2.18.0
Projects
Status: Done
5 participants