[FEATURE] Hit summaries (Hit highlighting 2.0) #372
Comments
@Jon-AtAWS thanks for creating this request. Is there a particular query or other component of the neural search repo that needs the hit summary?
No, this is not targeted at a specific query. My idea was to add `summarize: {}` to the query API, much like the existing `highlight: {}`. Users could request summarization based on fields and parameters specified in the body of the `summarize` part of the query. Overall, summarization is a generalization of hit highlighting in much the same way that semantic search is a generalization of lexical search. Hit highlights attempt to answer the question "why did this match?", but use terms to provide that answer. A summary also answers that question, but uses semantic content instead. As an alternative, we could consider extending the highlighter to use semantic information in some other way to select the highlighted term(s).
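To make the shape of the idea concrete, here is a minimal sketch of a request body with a `summarize` clause sitting next to `highlight`. The `summarize` clause and everything inside it (`fields`, `max_length`, `model_id`) are hypothetical and only mirror the structure of `highlight`; nothing like it exists in the query API today. The helper name `build_search_body` is also just for illustration.

```python
import json


def build_search_body(query_text, field, model_id, max_length=50):
    """Build a search request body with a proposed `summarize` clause.

    `query` and `highlight` follow the existing query DSL. `summarize` is
    hypothetical and is shaped like `highlight` purely for illustration.
    """
    return {
        "query": {"match": {field: {"query": query_text}}},
        # Existing feature: term-based highlighting on the matched field.
        "highlight": {"fields": {field: {}}},
        # Hypothetical feature: model-generated summary of the same field.
        "summarize": {
            "fields": {field: {"max_length": max_length}},
            "model_id": model_id,
        },
    }


body = build_search_body("wireless headphones", "description", "my-model-id")
print(json.dumps(body, indent=2))
```

Keeping `summarize` at the same level as `highlight` means it would sit outside the query clause itself, independent of whether the query is lexical, neural, or hybrid.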
I see what you mean now; it looks like a custom version of inner hits. If the ask is for a more generic feature, I suggest we move the issue to the core OpenSearch repository. Within the scope of the plugin repository we're limited by existing features. You can think of it this way: we can change the structure inside the query (neural or hybrid for this repo), but any clause outside the query should be a request to core OpenSearch.
Hi @Jon-AtAWS! Do you want to use an LLM to summarize the search results? If so, you could consider a workaround: create a flow agent with a Search Index tool and an ML Model tool. The Search Index tool has a hardcoded search query, and the ML Model tool has a prompt asking it to summarize the results returned by that query.
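A rough sketch of the request body such a flow agent might be registered with is shown below. Treat the details as assumptions: the tool type names (`SearchIndexTool`, `MLModelTool`), the parameter names (`input`, `model_id`, `prompt`), and the `${parameters.SearchIndexTool.output}` placeholder are believed to match the ML Commons agent framework but should be checked against its documentation. The agent name, index, and model ID are placeholders.

```python
import json


def build_summary_agent(index, search_body, model_id):
    """Build a register-agent request body for a two-tool flow agent.

    Assumption: tool types, parameter names, and placeholder syntax follow
    the ML Commons agent framework; verify them before use.
    """
    return {
        "name": "hit_summary_flow_agent",  # hypothetical name
        "type": "flow",
        "description": "Runs a fixed search, then asks a model to summarize the hits.",
        "tools": [
            {
                "type": "SearchIndexTool",
                # The search request is hardcoded into the tool input as a JSON string.
                "parameters": {
                    "input": json.dumps({"index": index, "query": search_body})
                },
            },
            {
                "type": "MLModelTool",
                "parameters": {
                    "model_id": model_id,
                    # Assumed placeholder for the previous tool's output.
                    "prompt": "Summarize each of these search results in one sentence:\n"
                    "${parameters.SearchIndexTool.output}",
                },
            },
        ],
    }


agent = build_summary_agent(
    "products", {"query": {"match": {"description": "headphones"}}}, "my-model-id"
)
print(json.dumps(agent, indent=2))
```

Because the query is fixed when the agent is registered, this produces one summary over a fixed result set rather than a per-hit summary for arbitrary queries.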
Is your feature request related to a problem?
Search engines have been offering hit highlighting for years. Highlighting matching terms helps end users understand why a particular search result is present and how it relates to their queries. Hit highlights have several shortcomings: they are costly, they require storing source data, and the matching terms are sometimes buried or truncated out of the highlight snippet.
What solution would you like?
Add a hit summarizer that uses an LLM to generate and return a summary of each hit as part of the search result. Users should be able to configure the model as a connector or local. They should be able to select a field or fields that will serve as the source text for the summary. We could even highlight matching terms from the query as a second pass on the summary. Summarization could happen at index time, since it will be driven by the contents of the selected fields, and/or could be generated at query time with the query as context for the LLM.
This seems especially relevant for sparse vector search, where the source terms might be mutated. It is equally applicable in cases where query pre-processing or elements in a search pipeline have mutated the query in a way that makes highlights less comprehensible.
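As a minimal sketch of the query-time variant, the snippet below attaches a summary to each hit from selected source fields and then highlights query terms in that summary as a second pass. All names here (`summarize_hits`, `highlight_terms`, `first_sentence`, the `summary` key) are hypothetical, and `first_sentence` is only a toy stand-in for a call to a local or connector-based model.

```python
import re


def summarize_hits(hits, fields, summarizer, query=None):
    """Attach a `summary` to each hit, built from the selected source fields.

    `summarizer` is any callable (text, query) -> str; in practice it would
    wrap a call to a local or connector-based model.
    """
    for hit in hits:
        source = hit.get("_source", {})
        text = " ".join(str(source[f]) for f in fields if f in source)
        hit["summary"] = summarizer(text, query)
    return hits


def highlight_terms(summary, query):
    """Second pass: wrap whole-word query terms found in the summary in <em> tags."""
    terms = [re.escape(t) for t in query.split() if t]
    if not terms:
        return summary
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<em>\1</em>", summary)


def first_sentence(text, query=None):
    """Toy stand-in for an LLM: return the first sentence of the text."""
    return text.split(". ")[0].rstrip(".") + "."


hits = [
    {"_source": {"title": "Trail shoes", "body": "Light shoes for rocky trails. Sizes 6 to 13."}}
]
for hit in summarize_hits(hits, ["title", "body"], first_sentence, query="trail shoes"):
    print(highlight_terms(hit["summary"], "trail shoes"))
```

Passing the query into the summarizer is what distinguishes the query-time path from index-time summarization, where only the field contents would be available.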