[FEATURE] Hit summaries (Hit highlighting 2.0) #372

Jon-AtAWS · 2023-10-01T19:44:22Z

Is your feature request related to a problem?

Search engines have been offering hit highlighting for years. Highlighting matching terms helps end users understand why a particular search result is present and how it relates to their queries. Hit highlights have several shortcomings. They are costly, require storing source data, and sometimes the matching terms are buried or truncated out of the highlight snippet.

What solution would you like?

Add a hit summarizer that uses an LLM to generate and return a summary of each hit as part of the search result. Users should be able to configure the model either as a remote connector or as a locally deployed model. They should be able to select a field or fields that will serve as the source text for the summary. We could even highlight matching terms from the query as a second pass on the summary. Summarization could happen at index time, since it will be driven by the contents of the selected fields, and/or at query time, with the query as context for the LLM.
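To make the "second pass" idea concrete, here is a minimal sketch, assuming the summary text has already been produced by some model. The function name `highlight_terms` and the `<em>` wrapper tags are illustrative choices, not part of any existing OpenSearch API.

```python
import re


def highlight_terms(summary, query_terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap whole-word, case-insensitive matches of query terms in tags.

    Hypothetical helper: a simple stand-in for a second highlighting pass
    over an LLM-generated summary.
    """
    terms = [t for t in query_terms if t]
    if not terms:
        # Nothing to highlight, return the summary unchanged.
        return summary
    # Escape each term so punctuation in a query is treated literally.
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(
        pattern,
        lambda m: f"{pre_tag}{m.group(0)}{post_tag}",
        summary,
        flags=re.IGNORECASE,
    )


highlighted = highlight_terms(
    "Wireless headphones with long battery life.",
    ["wireless", "battery"],
)
print(highlighted)
```

Matching on the summary rather than the source field means the highlighted terms are never truncated out of the snippet, although a term that the model paraphrased away will simply not be highlighted.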

This seems especially relevant for sparse vector search, where the source terms might be mutated. It is equally applicable in cases where query pre-processing or elements in a search pipeline have mutated the query in a way that makes highlights less comprehensible.

martin-gaievski · 2024-11-05T17:32:12Z

@Jon-AtAWS thanks for creating this request. Do you have a particular query or other component of the neural search repo in mind that needs the hit summary?

Jon-AtAWS · 2024-11-05T19:19:34Z

@martin-gaievski,

No, this is not targeted at a specific query. My idea was to add summarize: {} to the query API, much like the existing highlight: {} clause. Users could request summarization based on fields and parameters specified in the body of the summarize clause. Overall, summarization is a generalization of hit highlighting in much the same way that semantic search is a generalization of lexical search. Hit highlights attempt to answer the question of why a hit matched, but use terms to provide that answer. A summary also answers that question, but uses semantic content instead.
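As a rough illustration of the proposal, a request body might look like the sketch below. The highlight clause follows the existing search API shape; the summarize clause does not exist today, and every key inside it (`fields`, `model_id`, `use_query_as_context`) is a hypothetical name chosen only for this example.

```python
import json


def build_search_body(query_text, fields, model_id):
    """Build a search body with a real highlight clause and a
    HYPOTHETICAL summarize clause that mirrors it."""
    return {
        "query": {"multi_match": {"query": query_text, "fields": fields}},
        # Existing feature: term-based highlighting on the selected fields.
        "highlight": {"fields": {name: {} for name in fields}},
        # Proposed feature (not implemented): all keys below are assumptions.
        "summarize": {
            "fields": fields,
            "model_id": model_id,
            "use_query_as_context": True,
        },
    }


body = build_search_body(
    "wireless headphones",
    ["title", "description"],
    "my-summarizer-model",  # placeholder model ID
)
print(json.dumps(body, indent=2))
```

Keeping summarize as a sibling of highlight, outside the query clause, is what makes it look like a request for core OpenSearch rather than for a query type owned by a plugin.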

As an alternative, we could consider extending the highlighter to use semantic information in some other way to select the highlighted term(s).

martin-gaievski · 2024-11-05T19:50:03Z

I see what you mean now, it looks like a custom version of inner hits.

If the ask is for a generic feature, I suggest we move the issue to the core OpenSearch repository. Within the plugin repository we're limited by existing features. You can think of it this way: we can change the structure inside the query (neural or hybrid for this repo), but any clause outside the query should be a request to core OpenSearch.

yuye-aws · 2024-11-06T02:14:01Z

Hi @Jon-AtAWS! Do you want to use the LLM to provide a summary of the search results?

Maybe you can consider a workaround: create a flow agent with a Search Index tool and an ML Model tool.

The Search Index tool has a hardcoded search query and the ML Model tool has a prompt, asking it to summarize the results returned from the search query.
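A sketch of what such an agent definition could look like is below. It only builds the JSON body that would be sent to the ML Commons agent registration endpoint (`POST /_plugins/_ml/agents/_register`). The tool type names, the `input` parameter, and the `${parameters.SearchIndexTool.output}` placeholder are written from memory of the ML Commons agent documentation and should be treated as assumptions to verify; the index name, model ID, and agent name are placeholders.

```python
import json


def build_flow_agent_body(index_name, model_id):
    """Build a flow agent body: a hardcoded search followed by an LLM summary.

    Assumption: parameter names follow the ML Commons agent framework;
    check them against the current documentation before use.
    """
    # The search query is fixed when the agent is registered.
    search_query = {"query": {"match_all": {}}, "size": 3}
    return {
        "name": "hit-summary-workaround",  # placeholder agent name
        "type": "flow",
        "tools": [
            {
                "type": "SearchIndexTool",
                "parameters": {
                    "input": json.dumps(
                        {"index": index_name, "query": search_query}
                    ),
                },
            },
            {
                "type": "MLModelTool",
                "parameters": {
                    "model_id": model_id,
                    # Assumed placeholder syntax for the previous tool's output.
                    "prompt": (
                        "Summarize the following search results:\n"
                        "${parameters.SearchIndexTool.output}"
                    ),
                },
            },
        ],
    }


agent_body = build_flow_agent_body("products", "my-llm-model-id")
print(json.dumps(agent_body, indent=2))
```

Note that this produces one summary over the whole result set rather than one summary per hit, and the query is fixed at registration time, so it only approximates the feature requested above.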

Jon-AtAWS added the untriaged label Oct 1, 2023

navneet1v added this to Vector Search RoadMap Oct 5, 2023

github-project-automation bot moved this to Backlog in Vector Search RoadMap Oct 5, 2023

navneet1v added the Features label and removed the untriaged label Oct 5, 2023
