[FEATURE] Hit summaries (Hit highlighting 2.0) #372

Open
Jon-AtAWS opened this issue Oct 1, 2023 · 4 comments
Labels
Features Introduces a new unit of functionality that satisfies a requirement

Comments

@Jon-AtAWS
Member

Is your feature request related to a problem?

Search engines have been offering hit highlighting for years. Highlighting matching terms helps end users understand why a particular search result is present and how it relates to their queries. Hit highlights have several shortcomings. They are costly, require storing source data, and sometimes the matching terms are buried or truncated out of the highlight snippet.

What solution would you like?

Add a hit summarizer that uses an LLM to generate and return a summary of each hit as part of the search result. Users should be able to configure the model either through a connector or as a local model, and to select the field or fields that serve as the source text for the summary. We could even highlight matching terms from the query as a second pass on the summary. Summaries could be generated at index time, since they are driven by the contents of the selected fields, and/or at query time with the query as context for the LLM.
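As a rough illustration of the second-pass idea, the sketch below wraps query terms in highlight tags wherever they appear in a summary that has already been generated. The function name `highlight_terms`, the `<em>` tags, and the sample summary are assumptions made for illustration; nothing here is an existing OpenSearch API, and a real implementation would reuse the analyzer of the source field rather than splitting on whitespace.

```python
import re


def highlight_terms(summary: str, query: str,
                    pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap whole-word, case-insensitive matches of query terms in pre/post tags.

    Hypothetical helper: whitespace tokenization stands in for real analysis.
    """
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return summary
    # Try longer terms first so they win over shorter alternatives.
    terms.sort(key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", summary)


if __name__ == "__main__":
    # Placeholder summary text; in the proposal this would come from the LLM.
    summary = "Wireless headphones with noise cancelling."
    print(highlight_terms(summary, "noise headphones"))
```

Because the summary is short and generated per hit, this pass is cheap compared with highlighting the full source field.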

This seems especially relevant for sparse vector search, where the source terms might be mutated. It is equally applicable in cases where query pre-processing or elements of a search pipeline have mutated the query in a way that makes highlights less comprehensible.

@navneet1v added the Features label and removed the untriaged label on Oct 5, 2023
@martin-gaievski
Member

@Jon-AtAWS thanks for creating this request. Do you have a particular query or other component of the neural search repo in mind that needs the hit summary?

@Jon-AtAWS
Member Author

@martin-gaievski,

No, this is not targeted at a specific query. My idea was to add a summarize: {} clause to the query API, much like the existing highlight: {} clause. Users could request summarization based on the fields and parameters specified in the body of the summarize clause. Overall, summarization is a generalization of hit highlighting in much the same way that semantic search is a generalization of lexical search. Hit highlights attempt to answer the question "why did this match?" using terms. A summary answers the same question using semantic content instead.
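To make the proposed shape concrete, here is a minimal sketch of a search request body that carries both clauses side by side. The `highlight` clause is the existing one; the `summarize` clause and every key inside it (`fields`, `max_words`, `use_query_as_context`) are hypothetical names invented for this example and do not exist in OpenSearch today.

```python
import json


def build_search_body(query_text: str, fields: list[str],
                      max_words: int = 40) -> dict:
    """Build a search body with the existing highlight clause and a
    HYPOTHETICAL summarize clause next to it."""
    return {
        "query": {"multi_match": {"query": query_text, "fields": fields}},
        # Existing clause: term-based highlighting on the selected fields.
        "highlight": {"fields": {field: {} for field in fields}},
        # Hypothetical clause: not part of any released API.
        "summarize": {
            "fields": fields,
            "max_words": max_words,
            "use_query_as_context": True,
        },
    }


if __name__ == "__main__":
    body = build_search_body("noise cancelling headphones",
                             ["title", "description"])
    print(json.dumps(body, indent=2))
```

Keeping `summarize` as a sibling of `highlight`, outside the `query` object, mirrors how highlighting is requested today.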

As an alternative, we could consider extending the highlighter to use semantic information in some other way to select the highlighted term(s).

@martin-gaievski
Member

I see what you mean now, it looks like a custom version of inner hits.

If the ask is for a more generic feature, I suggest we move the issue to the core OpenSearch repository. Within the scope of the plugin repository we're limited by existing features. You can think of it in the following way: we can change the structure inside the query (neural or hybrid for this repo), but any clause outside the query should be a request to core OpenSearch.

@yuye-aws
Member

yuye-aws commented Nov 6, 2024

Hi @Jon-AtAWS! Do you want to use an LLM to provide a summary of the search results?

Maybe you can consider a workaround: create a flow agent with a Search Index tool and an ML Model tool.

The Search Index tool has a hardcoded search query and the ML Model tool has a prompt, asking it to summarize the results returned from the search query.
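A minimal sketch of what registering such a flow agent could look like, building only the request body. The tool type names (`SearchIndexTool`, `MLModelTool`), the parameter keys (`input`, `model_id`, `prompt`), and the `${parameters.SearchIndexTool.output}` placeholder reflect one reading of the ML Commons agent framework and should be verified against its documentation. The agent name, index, and model ID are made-up values.

```python
import json


def build_flow_agent_body(index: str, search_body: dict, model_id: str) -> dict:
    """Build a flow-agent registration body: a search step followed by an
    LLM summarization step. Field names are assumptions to be verified."""
    # The search tool is assumed to take its index and query as a JSON string.
    search_input = json.dumps({"index": index, "query": search_body})
    return {
        "name": "hit-summary-agent",  # hypothetical agent name
        "type": "flow",
        "description": "Summarize the hits returned by a fixed search query",
        "tools": [
            {"type": "SearchIndexTool", "parameters": {"input": search_input}},
            {
                "type": "MLModelTool",
                "parameters": {
                    "model_id": model_id,
                    # Assumed placeholder syntax for the previous tool's output.
                    "prompt": "Summarize these search results: "
                              "${parameters.SearchIndexTool.output}",
                },
            },
        ],
    }


if __name__ == "__main__":
    body = build_flow_agent_body(
        "products",
        {"query": {"match": {"title": "headphones"}}},
        "my-model-id",  # placeholder model ID
    )
    print(json.dumps(body, indent=2))
```

Note that this summarizes the result set as a whole rather than each hit individually, so it only partly covers the original request.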

Projects
Status: Backlog
4 participants