Unified highlighter: include additional context outside of highlighted sentence to reach target fragment_size #28089

Closed
marshalium opened this issue Jan 5, 2018 · 4 comments
Labels
>feature :Search Relevance/Highlighting How a query matched a document Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@marshalium

Describe the feature

Currently, the unified highlighter provides context only by including the sentence that contains the highlighted word. This sometimes produces a very short highlight. For example, given a field containing this text:

Some leading context. A short sentence. Some more content. And even more context around that sentence.

Running a query for the term "sentence" using the unified highlighter with fragment_size set to 300 results in a highlight that, while it includes the word we're looking for, provides little context and is nowhere close to the requested target size:

A short <em>sentence</em>.

In contrast, running the same query with the plain highlighter results in a highlight with much more useful context (and, in this case, another highlighted word!):

Some leading context. A short <em>sentence</em>. Some more content. And even more context around that <em>sentence</em>.

The unified highlighter should include as much context as possible without going over the target fragment size. This will result in more consistently sized highlights (which is nice for visual consistency) and will provide more useful context in cases where the highlight occurs in a short sentence.
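For anyone who wants to reproduce the comparison, here is a rough sketch of the two request bodies, built as Python dictionaries. The field name `content` and the helper `build_highlight_request` are assumptions made for illustration; only the `type` and `fragment_size` highlight settings come from the description above.

```python
import json


def build_highlight_request(field, term, highlighter_type, fragment_size):
    """Build a search body that matches `term` and highlights `field`.

    `field` is a hypothetical field name; adjust it to your own mapping.
    """
    return {
        "query": {"match": {field: term}},
        "highlight": {
            "fields": {
                field: {
                    "type": highlighter_type,        # "unified" or "plain"
                    "fragment_size": fragment_size,  # target fragment length in characters
                }
            }
        },
    }


# Same query and fragment_size; only the highlighter type differs.
unified_request = build_highlight_request("content", "sentence", "unified", 300)
plain_request = build_highlight_request("content", "sentence", "plain", 300)

print(json.dumps(unified_request, indent=2))
```

Sending each body to the index's `_search` endpoint should make it possible to compare the fragments side by side.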

cc: @colings86

@marshalium marshalium added :Search Relevance/Highlighting How a query matched a document >feature labels Jan 5, 2018
jimczi added a commit to jimczi/elasticsearch that referenced this issue Jan 8, 2018
…ighter

The unified highlighter selects a single sentence per fragment from the offset of the first highlighted term.
This change modifies this selection and allows more than one sentence in a single fragment.
The expansion is done forward (to the right of the matching offset): sentences are added to the current fragment iff
the overall size of the fragment is smaller than the maximum length (fragment_size).
We should also add a way to expand the left context with the surrounding sentences but this is currently avoided because the
unified highlighter in Lucene uses only the first offset that matches the query to derive the start and end offset of the next fragment.
If we expand on the left we could split multiple terms that would be grouped otherwise. Breaking this limitation implies some changes in
the core of the unified highlighter.

Closes elastic#28089
jimczi added a commit that referenced this issue Jan 11, 2018
…ighter (#28132)

The unified highlighter selects a single sentence per fragment from the offset of the first highlighted term.
This change modifies this selection and allows more than one sentence in a single fragment.
The expansion is done forward (to the right of the matching offset): sentences are added to the current fragment iff the overall size of the fragment is smaller than the maximum length (fragment_size).
We should also add a way to expand the left context with the surrounding sentences but this is currently avoided because the unified highlighter in Lucene uses only the first offset that matches the query to derive the start and end offset of the next fragment.
If we expand on the left we could split multiple terms that would be grouped otherwise. Breaking this limitation implies some changes in the core of the unified highlighter.

Closes #28089
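
The forward expansion described in the commit message can be illustrated with a small toy model. This is a sketch under stated assumptions, not the Lucene or Elasticsearch implementation: it splits sentences with a naive regex instead of a break iterator, measures size in characters, and the names `split_sentences` and `forward_fragment` are hypothetical.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?',
    # and keeps any whitespace that follows it.
    return re.findall(r"[^.!?]+[.!?]+\s*", text)


def forward_fragment(text, term, fragment_size):
    """Return a fragment that starts at the first sentence containing `term`
    and grows to the right while it stays within `fragment_size` characters."""
    sentences = split_sentences(text)

    # Find the sentence that holds the first match.
    start = next((i for i, s in enumerate(sentences) if term in s), None)
    if start is None:
        return None

    fragment = sentences[start]
    # Expand forward only; sentences to the left are never added.
    for following in sentences[start + 1:]:
        if len(fragment) + len(following) > fragment_size:
            break
        fragment += following

    return fragment.strip().replace(term, f"<em>{term}</em>")


text = (
    "Some leading context. A short sentence. Some more content. "
    "And even more context around that sentence."
)
print(forward_fragment(text, "sentence", 300))
print(forward_fragment(text, "sentence", 20))
```

With a budget of 300 the toy fragment starts at "A short sentence." and runs to the end of the text, while "Some leading context." is left out because nothing is added on the left. With a budget of 20 it falls back to the single matching sentence.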
@marshalium
Author

Thank you @jimczi!

@lplazas

lplazas commented Mar 22, 2018

Sorry to revive an old thread. Since which version has this fix been available?

@jimczi
Contributor

jimczi commented Mar 22, 2018

@lfplazas10 you can see the version on the linked PR:
#28132
It has been available since 6.2.0.

@zmays

zmays commented Mar 7, 2019

Note that the PR only expands context to the right of the match. Any sentences to the left (i.e. leading context) are not included at the moment.

@javanna javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024

5 participants