[RFC] Improving Search relevancy through Generic Reranker interfaces #485
Comments
Hi Henry, thanks for putting this together. I have a few questions...
Thanks @dylan-tong-aws. I have a few responses!
This is a narrow use case. Just take all your docs and ask a (text-to-float) language model how similar they are to the query, then sort based on that. Nonetheless, this alone can give something like a 15-20% boost to recall in the top few results, so I think it's worth knocking out. P.S. I read up on the Cohere rerank API, and it should be able to connect to this work fairly readily.
Hi,
@navneet1v thanks!
Thanks @HenryL27. A few comments/questions:
@vamshin thanks
Implementation-wise, I think this becomes a single "rerank" processor, and depending on the type ("cross-encoder" here) it casts itself to whatever it needs to be.
If a user needs to do this, they can add the processor in the search request itself instead, so I would not provide multiple overrides.
As these inconsistencies arise because of
The main point in this was the name
I am not saying re-rank based on a vector field. When doing a query, the customer may put
My recommendation would be that these re-ranker models should run outside the OpenSearch cluster as remote models, where users can use GPU-based instances for re-ranking. The reason is that if the latency for re-ranking is in the hundreds of milliseconds for, say, 100 records, then the feature becomes unusable.
We should explore this more. Maybe our local models could be deployed in some other service like SageMaker, etc., and not specifically Cohere.
True, this is possible. But in a case where I already have a rather complicated search pipeline I might not want to rewrite it all, and I'm not sure that saying "if you want to use a different value for
Maybe. I guess the assumption with reranking is generally that anything not in the top k is irrelevant and therefore doesn't need to be reranked; as such why return it in the first place? That said, you might want to see things that are not in the top k - in high-latency cases where reranking is constrained, or in testing/tuning cases where you want to see what the reranker doesn't get to see to troubleshoot. We could also take @vamshin's suggestion and fix the inconsistencies by rescoring the docs outside of the top k or something.
Gotcha. Do you have a suggestion? I'm following what @austintlee did in the RAG processor.
Huh, I didn't know you could do this! I guess we'll look for the field in the "fields" array then? Those fields are still (key, value) pairs, right? Should be easy to look at both then. Regarding remote models, yep, I'm looking into that. Fundamentally the Connector interface should handle all the juicy API discrepancies between remote models. I'll make sure that the text-similarity model type I'm adding to ml-commons can talk to connectors, and then we should be all good, right?
I don't completely agree with this thought. We should be consistent: either we re-rank k documents and return only k, or we re-rank all the documents and return all of them. Being in a middle state is bad. One way to solve this is with an OverSampling processor: the customer asks for X, we retrieve, say, 2X, re-rank all 2X documents, and return X documents back.
Re-ranking field is one name that comes to my mind.
Yes, they are key-value pairs. But what I meant to say was that we need to handle this use case too, because doing queries like this provides a latency boost.
Yeah, maybe. Will wait for that to be out.
@HenryL27, another scenario we're looking to support is custom re-rank models that are hosted on an external model server like an Amazon SageMaker endpoint. The search pipeline will require more flexibility than what it takes to integrate with a managed API. The hosted model may simply be a classification/regression model that's trained for a re-ranking task (e.g., XGBoost). The gist is that we'll need some flexibility around the data transfer and protocol:
This should suffice. It's generally how second-stage re-ranking works, and what we were planning to support. I can re-validate this with our customers; I didn't receive requirements to normalize the re-scored results with results that weren't re-scored.
I like this idea. It is also easier to debug when we have both scores. The only concern/question I have is: might it impact customers using existing OpenSearch clients that do not know about rescore fields? We may need to validate this.
This LGTM! This can be the direction if rerankers cannot be generic.
@vamshin, what are your thoughts on @navneet1v's point about combining re-rank and original scores? I am not aware of use cases that require some way to normalize and combine scores. As far as I know, customers just expect to re-rank "k" results, or default to all the results retrieved by the initial retrieval. There's no need for anything fancy.
@dylan-tong-aws I am not sure if you understood what I was trying to say, but it's definitely not this. What I am trying to say is: let's say a customer retrieves 100 results and we re-rank only the first 50; then the scores of the first 50 documents and the later ones will not be consistent. We should be consistent in our result scores.
@navneet1v, right, so if a user says return K re-scored results, it just returns K results even if the first-stage retrieval had N > K results. It's my understanding that the proposal is to return N results and find a way to normalize the K re-scored results so they are consistent. I am in agreement that we can just return the K results. As far as I know, this sufficiently delivers on customer requirements.
Ok, consensus on the top k issue; can I get thumbs up? We will simply rerank every search result that goes through the processor. Top K is removed entirely. If you request 5000000 documents through a rerank processor and it kills your reranker, that's on you. (Doing this to an embedding ingest processor can also OOM your system, so I think that's okay.)
Yes, I am aligned with removing topK. If for any other reason a customer wants to fetch more results and re-rank only a few of them, they can use the Oversampling processor, as mentioned in my previous comment. (#485 (comment))
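For illustration, a rough sketch of that oversample-then-truncate pattern combined with the proposed rerank processor; the oversample and truncate_hits processors and their sample_factor / target_size parameters are assumptions used here for the example, not part of this proposal:

```
PUT /_search/pipeline/oversample_rerank_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 2.0
      }
    }
  ],
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "<text_similarity_model_id>"
        },
        "context": {
          "document_fields": ["text_representation"]
        }
      }
    },
    {
      "truncate_hits": {
        "target_size": 10
      }
    }
  ]
}
```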
This implementation still honors the search result limit safeguards and query timeout settings, correct?
I think so? That's more a question for Froh, I think.
Aligned on removing topK to keep consistent results. Also, this is not a one-way-door decision. If use cases arise to expose such params, we can always revisit.
instead taking something like
This one is a better abstraction. I am aligned on this.
On this, I am not that convinced about creating a processor. But we can defer this decision until the use case arrives. As of now, let's go with concatenation. This will have quite a lot in common with the Summarization Processor; there, too, we might want to summarize multiple fields, so we can think of a common, generic way.
This is a valid point if we just look at it from the predict API standpoint. I am not sure whether, going forward, the predict API will be integrated with the Agents framework of ML Commons; if so, it becomes counterintuitive to call the predict API a thin wrapper, because then we could build a re-ranking agent to which we pass the search response, model, and other information, and it makes sure to give the final re-ranked results. Again, it's not a use case for now, so let's park it for the future.
For this, can you comment with the interface you have in mind?
@navneet1v
In the future, when there are other function names in ml-commons for other kinds of rerank models (or we want to bypass ml-commons entirely), this is represented in the API.
I found 2 re-rankers that don't use ML models. Please check these: https://github.com/opensearch-project/search-processor/tree/main/amazon-personalize-ranking, https://github.com/opensearch-project/search-processor/tree/main/amazon-kendra-intelligent-ranking. I think we should see how we can merge all these interfaces, or whether the interface we are building is extensible enough to support those re-rankers in the future.
@navneet1v
Implementing this within the framework I've provided should be fairly straightforward (just implement a …). Oh, also, to update on the architecture in case you haven't seen the latest changes to the PR: the factory now creates the context source fetchers based on the configuration, and they are used by the top-level …
@HenryL27 yeah, this looks pretty neat. One more thing: in the original ML-model-based re-ranker I see we want to use …. Another thing: can we now update the proposal (by creating a new section with updated interfaces) and also add a comment describing the changes we have made to the proposal?
define "better". I used updated RFC |
Because I have some suggestions, but those are maybe also not that great.
@navneet1v How about |
Yes, that is fair. Hence I was saying my suggestions are not that great. :D @HenryL27, can you update the proposal with the final interfaces as a summary and the recommended approach? Once that is done we can ask @sean-zheng-amazon, @vamshin, @dylan-tong-aws to review.
updated. @navneet1v to your satisfaction? |
@navneet1v so, where are we at with this? |
@HenryL27 So, we did some discussion and here are some names that got suggested: … Among all the above options I am leaning towards … cc: @vamshin, @dylan-tong-aws
@navneet1v |
Is ml_opensearch the only provider of rerankers inside OpenSearch? I know I seem like the "LTR Champion" or whatever, but how do you see Learning-to-Rank fitting in here? It works at the shard level, so maybe it doesn't, but it might be good for users to think of the API for re-ranking as re-ranking, however it works under the covers. Is this feasible?
@macohen shard level does seem to imply that it wouldn't fit in well as a rerank response processor. But maybe at some point we make a reranking search phase results processor, or whatever it needs to be - and then I would simply give it a name like …
I had a chat with a customer with substantial OpenSearch usage and experience. We discussed their re-ranking pipelines. One major takeaway is that they need a way to communicate with feature stores. The features they send to the re-ranker aren't available in their OpenSearch cluster. They use the search results to look up features in various feature stores to construct the inputs (feature vectors) to the re-ranker. It would be great to have some connectors to feature stores so that they can be used to help construct the request payload for re-ranking within a pipeline. A simpler interim option (which isn't an ideal solution) is to allow users to provide feature vector(s) in the query context. So, re-ranking will likely involve a two-pass query on the client side: run a query to retrieve results, which they use to construct feature vector(s) on the client side using tools like existing feature stores, then run a second query to perform the re-ranking using the feature vector(s), and possibly search and user context, to construct the re-ranking request. A third option, which is a heavier lift, is to enable OpenSearch to operate as a feature store. Perhaps someone is interested in implementing OpenSearch as a storage option for Feast (https://docs.feast.dev/reference/online-stores)? Perhaps some users would be interested in having OpenSearch double up as a feature store?
@dylan-tong-aws the …
@dylan-tong-aws thanks for adding the info. The way I look at this, basically 1 and 3 are the same thing: we need to fetch the context for the reranker from a source. As provided by @HenryL27, the interface is currently …, and we can add these fetchers as the need arises.
@navneet1v are there any next steps for me? Or am I just waiting on security review? |
@HenryL27 following this issue, very interested in this feature as I would be looking to build on top of this to enable Cohere rerank. Has there been any progress?
@tianjing-li the code has been merged into the feature branch of the Neural Search plugin: https://github.com/opensearch-project/neural-search/tree/feature/reranker. Soon it will be merged into the main branch and backported to 2.x. You can use the feature branch and start your development for the Cohere re-ranker.
Problem statement
Addresses #248
Reranking the top search results with a cross-encoder has been shown to improve search relevance rather dramatically. We'd like to do that. Furthermore, we'd like to do it inside of OpenSearch, for a couple of reasons: 1/ it belongs there - it's a technique to make your search engine search better, and 2/ to integrate with RAG it needs to precede the generation - the retrieval that augments the generation needs to be as good as possible - and it obviously must come after the initial retrieval, so it should be in OpenSearch.
Goals
Non-goals
Proposed solution
Reranking will be implemented as a search response processor, similar to RAG. Cross-Encoders will be introduced into ml-commons to support this.
Architecture / Rerank Search Path
Rest APIs
Create Rerank Pipeline
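As a sketch of what such a request could look like (the pipeline name, model id, and document field names are placeholders, and the exact JSON layout may differ in the final implementation; the fields are described below):

```
PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "<text_similarity_model_id>"
        },
        "context": {
          "document_fields": ["title", "text_representation"]
        }
      }
    }
  ]
}
```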
"ml_opensearch" refers to the kind of rerank processor.
"model_id" should be the id of the
text_similarity
model in ml-commons"context" tells the pipeline how to construct the context it needs in order to rerank
"document_fields" are a list of fields of the document (in
_source
orfields
) to rerank based on. Multiple fields will be concatenated as strings.Query Rerank Pipeline
Provide the params for the reranker to the search pipeline as a search ext. Use either "query_text", which acts as the direct text to compare all the docs against, or "query_text_path", which is an xpath that points to another location in the query object. For example, with a neural query we might have "query_text_path": "query.neural.embedding.query_text".
The rerank processor will evaluate all the search results and then sort them based on the new scores.
Upload Cross Encoder Model
This is not a new API and all the other model-based APIs should still work for the cross encoder model/function name with minimal work to integrate.
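For illustration, registering a cross-encoder through the existing ml-commons model APIs might look roughly like this; the model name, version, and group id are placeholders, and the function name shown is an assumption about what the text-similarity work will register as:

```
POST /_plugins/_ml/models/_register
{
  "name": "huggingface/cross-encoders/ms-marco-MiniLM-L-6-v2",
  "version": "1.0.1",
  "model_group_id": "<model_group_id>",
  "model_format": "TORCH_SCRIPT",
  "function_name": "TEXT_SIMILARITY"
}
```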
Predict with Cross Encoder Model
See the Cross-Encoder PR
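As a rough sketch of the shape that PR is working toward (the request field names here are assumptions, not the final contract), scoring a query against a set of documents might look like:

```
POST /_plugins/_ml/models/<model_id>/_predict
{
  "query_text": "What is the capital of France?",
  "text_docs": [
    "Paris is the capital of France.",
    "Boise is the capital of Idaho."
  ]
}
```

The response would be expected to carry one similarity score per entry in text_docs, which is exactly what the rerank processor needs in order to sort.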
Risks
Implementation Details
The overall reranking flow will be:
1. Fetch the context needed to rerank (from the query ext, from the documents themselves, or from any other source).
2. Generate a new relevance score for each search result using that context.
3. Sort the search results by the new scores.
We will implement two main base classes for this work: RerankProcessor and ContextSourceFetcher.
ContextSourceFetcher
This will retrieve the context needed to rerank documents. Essentially, step 1. A particular rerank processor may make use of several of these, and they can get their context from any source.
RerankProcessor
Orchestrates the flow by combining all the context from the ContextSourceFetchers, then generates scores for the documents via an abstract score method, then does the sorting.
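As a rough, hypothetical sketch of how these two abstractions fit together (class names, types, and signatures are illustrative and synchronous for brevity; the actual classes operate asynchronously on OpenSearch's SearchRequest/SearchResponse types):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only - not the actual classes in the PR.
interface ContextSourceFetcher {
    // Step 1: gather whatever context this fetcher knows how to provide
    // (query text from the search ext, document fields, an external store, ...).
    Map<String, Object> fetchContext(Map<String, Object> searchContext);
}

abstract class RerankProcessor {
    private final List<ContextSourceFetcher> fetchers;

    protected RerankProcessor(List<ContextSourceFetcher> fetchers) {
        this.fetchers = fetchers;
    }

    // Subclasses supply only the scoring logic, e.g. by calling a text_similarity model.
    protected abstract float score(String document, Map<String, Object> context);

    // Orchestration: merge the context from every fetcher, score each document, sort descending.
    public List<String> rerank(List<String> documents, Map<String, Object> searchContext) {
        Map<String, Object> context = new HashMap<>();
        for (ContextSourceFetcher fetcher : fetchers) {
            context.putAll(fetcher.fetchContext(searchContext));
        }
        List<String> reranked = new ArrayList<>(documents);
        reranked.sort(Comparator.comparingDouble((String doc) -> (double) score(doc, context)).reversed());
        return reranked;
    }
}
```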
Extensibility
It is my hope that these interfaces are simple enough to extend and configure that we can create a rich ecosystem of rerank processors. To implement the cross-encoder reranker, all I need to do is create a NlpComparisonReranker subclass that says "score things with ml-commons", a DocumentContextSourceFetcher subclass that retrieves fields from documents, and a QueryContextSourceFetcher that retrieves context from the query ext. If I wanted to implement the Amazon Personalize reranker of the search-processors repo, I would implement an AmazonPersonalizeSourceContextFetcher and an AmazonPersonalizeReranker, which only have to do the minimal amount of work to make the logic functional. I also think it should be possible to incorporate some of the work from the Score Normalization and Combination feature, but that's outside the scope of this RFC.
Alternative solutions
Rerank Query Type
Another option is to implement some kind of rerank query. This would wrap another query and rerank it. For example
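A purely hypothetical shape for such a wrapper query (the rerank clause and its parameters are illustrative only; this alternative is not what the proposal implements):

```
POST /my-index/_search
{
  "query": {
    "rerank": {
      "query": {
        "match": {
          "text_representation": "What is the capital of France?"
        }
      },
      "model_id": "<text_similarity_model_id>",
      "context": {
        "document_fields": ["text_representation"]
      }
    }
  }
}
```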
Pros:
Cons: