Support for multi-stage ranking, per-shard or global #70877
Comments
Pinging @elastic/es-search (Team:Search)
I usually see this as an antipattern, IMO, given the large number of results you need to fetch, and the lack of access to the index statistics and queries you'd want to run. Though I can understand its simplicity for people who just want their numpy/pandas/model or whatever to run in an API as expected, without translating to Elasticsearch queries.
This would be great. Certainly happy to provide any advice here given our experience with the LTR plugin, though @joshdevins I know you have experience with the LTR plugin too. I'm not sure about the overall system you're proposing: you mention a loss function in Elasticsearch, though in my experience the training tends to happen offline. What would be ideal to me is for Elasticsearch to speak some standard model serialization format, so I could get as close to "deploy my Python code/model to Elasticsearch" as possible. It doesn't need to be Python code per se, but if I trained something in sklearn, I'd want it to be easily hosted in Elasticsearch with minimal fuss.
I'm referring to Elasticsearch's ML capabilities (Data Frame Analytics), which can train models based on data in an index, such as for regression and classification. The ranking loss is just a reference to what we could add on top of what we can do today. We can, however, already import XGBoost and LGBM models, for example through eland. So if you've trained an LGBM model with a ranking loss, we can do inference in Elasticsearch already (at ingest time only for now, not yet available to a reranker, but it could be), assuming you have the same features available to the model 😃
I agree with @softwaredoug on the "antipattern" part, not only because it is an API leak (reranking is better encapsulated so it can leverage the search engine's parallelism), but also because it adds another stage where interpretation of scores can be difficult without access to the document metadata and index statistics. In practice I've seen rare efforts to go beyond what the search engine offers, which limits progress on improving relevance. And to make it practically useful, adding debug info, such as time spent per stage, would help bring the model into the spotlight and prove its efficiency, as well as provide the tools to compare models.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
A common, modern approach to ranking is to perform multiple stages of reranking, with progressively more expensive but more effective rerankers [1][2][3]. For example, a BM25 first stage would rank all documents as is typical today, then a second-stage LTR (learning to rank) reranker could rerank the top-1000, and possibly a third, neural reranker would rerank the final top-100. This would be a simple cascade of rerankers, whereas today's functionality allows for only a single reranking, and only on the top-k per shard.
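The cascade described above can be sketched as a loop over (scorer, top-k) stages operating on a shrinking candidate set. This is a minimal illustration only; the documents and scoring functions are toy stand-ins for BM25, an LTR model, and a neural reranker, not actual Elasticsearch behavior.

```python
# Minimal sketch of a reranking cascade: each stage scores the surviving
# candidates with a (progressively more expensive) scorer and keeps its top-k.
# Docs and scorers here are hypothetical stand-ins.

def rerank_cascade(docs, stages):
    """Apply each (scorer, top_k) stage to a progressively smaller candidate set."""
    candidates = docs
    for scorer, top_k in stages:
        candidates = sorted(candidates, key=scorer, reverse=True)[:top_k]
    return candidates

# Toy corpus: (doc_id, first_stage_score, second_stage_score)
docs = [(i, i % 7, -i) for i in range(2000)]

stages = [
    (lambda d: d[1], 1000),  # first stage, e.g. BM25: keep top-1000
    (lambda d: d[2], 100),   # second stage, e.g. LTR reranker: keep top-100
]

top = rerank_cascade(docs, stages)  # 100 docs, ordered by the last stage's scorer
```

The key property is that only the first stage sees the whole corpus; each later, costlier stage sees only the previous stage's top-k.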
This should also allow a configurable "location" of reranking — you should be able to choose whether to rerank the top-k per shard, or to first collect the per-shard top-k and then rerank a global top-k [4]. This request was recently discussed in #60946 but was closed due to a perceived lack of need, on the assumption that global reranking could be done in the application layer. While this is true, there are a couple of possible benefits to reranking in the coordinating node that come to mind (perhaps there are more, and there are certainly some contrary arguments).
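To see why the two "locations" can produce different results, consider a toy sketch where each shard reranks its own first-stage window versus the coordinator reranking one global window. The shard layout, scores, and window/k choices below are entirely made up for illustration, not Elasticsearch behavior.

```python
# Toy contrast of per-shard vs global reranking.
# bm25 is the cheap first-stage score; model is the expensive reranker score.

def top_k(docs, k, key):
    return sorted(docs, key=key, reverse=True)[:k]

bm25 = lambda d: d["bm25"]
model = lambda d: d["model"]

shards = [
    [{"id": "a1", "bm25": 9.0, "model": 0.2},
     {"id": "a2", "bm25": 8.0, "model": 0.3}],
    [{"id": "b1", "bm25": 5.0, "model": 0.95}],
]
window, k = 2, 2

# Per-shard: each shard reranks its own top-`window` first-stage hits,
# and the coordinator merges the already-reranked results.
per_shard = top_k(
    [d for s in shards for d in top_k(top_k(s, window, bm25), window, model)],
    k, model)

# Global: the coordinator merges first-stage hits into one global window,
# then reranks once.
fetched = [d for s in shards for d in top_k(s, window, bm25)]
global_topk = top_k(top_k(fetched, window, bm25), k, model)

# [d["id"] for d in per_shard]   -> ['b1', 'a2']
# [d["id"] for d in global_topk] -> ['a2', 'a1']
```

Here `b1` has a globally mediocre BM25 score but a strong reranker score: per-shard reranking surfaces it (its shard's window includes it), while global reranking never scores it because the global first-stage window is filled by shard A. The total reranker cost also differs: per-shard reranks up to shards × window documents, global reranks exactly one window.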
[1] Learning to Efficiently Rank with Cascades
[2] Phased Ranking in Vespa.ai
[3] Multi-Stage Document Ranking with BERT
[4] Ranking Relevance in Yahoo Search, Section 3.2 Contextual Reranking