-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include all sentences smaller than fragment_size in the unified highlighter #28132
Merged
jimczi
merged 2 commits into
elastic:master
from
jimczi:enhancements/unified_expand_fragment_length
Jan 11, 2018
Merged
Include all sentences smaller than fragment_size in the unified highlighter #28132
jimczi
merged 2 commits into
elastic:master
from
jimczi:enhancements/unified_expand_fragment_length
Jan 11, 2018
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ighter The unified highlighter selects a single sentence per fragment from the offset of the first highlighted term. This change modifies this selection and allows more than one sentence in a single fragment. The expansion is done forward (on the right of the matching offset), sentences are added to the current fragment iff the overall size of the fragment is smaller than the maximum length (fragment_size). We should also add a way to expand the left context with the surrounding sentences but this is currently avoided because the unified highlighter in Lucene uses only the first offset that matches the query to derive the start and end offset of the next fragment. If we expand on the left we could split multiple terms that would be grouped otherwise. Breaking this limitation implies some changes in the core of the unified highlighter. Closes elastic#28089
jimczi
added
:Search Relevance/Highlighting
How a query matched a document
>enhancement
v6.2.0
v7.0.0
labels
Jan 8, 2018
LGTM |
romseygeek
approved these changes
Jan 11, 2018
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
jimczi
added a commit
that referenced
this pull request
Jan 11, 2018
…ighter (#28132) The unified highlighter selects a single sentence per fragment from the offset of the first highlighted term. This change modifies this selection and allows more than one sentence in a single fragment. The expansion is done forward (on the right of the matching offset), sentences are added to the current fragment iff the overall size of the fragment is smaller than the maximum length (fragment_size). We should also add a way to expand the left context with the surrounding sentences but this is currently avoided because the unified highlighter in Lucene uses only the first offset that matches the query to derive the start and end offset of the next fragment. If we expand on the left we could split multiple terms that would be grouped otherwise. Breaking this limitation implies some changes in the core of the unified highlighter. Closes #28089
jasontedor
added a commit
that referenced
this pull request
Jan 11, 2018
* master: (43 commits) Rename core module to server (#28180) upgraded jna from 4.4.0-1 to 4.5.1 (#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (#28038) [Docs] Correct response json in rank-eval.asciidoc Add scroll parameter to _reindex API (#28041) Include all sentences smaller than fragment_size in the unified highlighter (#28132) Modifies the JavaAPI docs related to AggregationBuilder [Docs] Improvements in script-fields.asciidoc (#28174) [Docs] Remove Kerberos/SPNEGO Shield plugin (#28019) Ignore null value for range field (#27845) (#28116) Fix environment variable substitutions in list setting (#28106) docs: Replaces indexed script java api docs with stored script api docs test: ensure we endup with a single segment Make sure that we don't detect files as maven coordinate when installing a plugin (#28163) [Tests] temporary disable meta plugin rest tests #28163 meta-plugin should install bin and config at the top level (#28162) Painless: Add public member read/write access test. (#28156) Docs: Clarify password protection support with keystore (#28157) [Docs] fix plugin properties inclusion for plugins authors ...
jasontedor
added a commit
that referenced
this pull request
Jan 11, 2018
* 6.x: (41 commits) Rename core module to server (#28190) [Test] Remove superfluous lines [Test] Add skip test for index range field with null values upgraded jna from 4.4.0-1 to 4.5.1 (#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads [Docs] Correct response json in rank-eval.asciidoc Fix environment variable substitutions in list setting (#28106) Add scroll parameter to _reindex API (#28041) Include all sentences smaller than fragment_size in the unified highlighter (#28132) [Docs] Improvements in script-fields.asciidoc (#28174) [Docs] Remove Kerberos/SPNEGO Shield plugin (#28019) Ignore null value for range field (#27845) (#28116) docs: Replaces indexed script java api docs with stored script api docs test: ensure we endup with a single segment Make sure that we don't detect files as maven coordinate when installing a plugin (#28163) [Tests] temporary disable meta plugin rest tests #28163 meta-plugin should install bin and config at the top level (#28162) Painless: Add public member read/write access test. (#28156) Docs: Clarify password protection support with keystore (#28157) [Docs] fix plugin properties inclusion for plugins authors ...
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Jan 13, 2018
* master: (30 commits) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (elastic#28038) [Docs] Correct response json in rank-eval.asciidoc Add scroll parameter to _reindex API (elastic#28041) Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132) Modifies the JavaAPI docs related to AggregationBuilder [Docs] Improvements in script-fields.asciidoc (elastic#28174) [Docs] Remove Kerberos/SPNEGO Shield plugin (elastic#28019) Ignore null value for range field (elastic#27845) (elastic#28116) Fix environment variable substitutions in list setting (elastic#28106) ...
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Jan 16, 2018
* compile-with-jdk-9: (56 commits) TEST: init unassigned gcp in testAcquireIndexCommit Replica start peer recovery with safe commit (elastic#28181) Truncate tlog cli should assign global checkpoint (elastic#28192) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (elastic#28038) [Docs] Correct response json in rank-eval.asciidoc Add scroll parameter to _reindex API (elastic#28041) Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132) Modifies the JavaAPI docs related to AggregationBuilder [Docs] Improvements in script-fields.asciidoc (elastic#28174) ...
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently the
unified
highlighter selects a single sentence per fragment from the offset of the first highlighted term no matter if this sentence is smaller than the providedfragment_size
.This change modifies this selection and allows more than one sentence in a single fragment.
The expansion is done forward (on the right of the matching offset), sentences are added to the current fragment iff the overall size of the fragment is smaller than the maximum length (fragment_size).
We should also add a way to expand the left context with the surrounding sentences but this is currently avoided because the unified highlighter in Lucene uses only the first offset that matches the query to derive the start and end offset of the next fragment.
If we expand on the left we could split multiple terms that would be grouped otherwise. Breaking this limitation implies some changes in the core of the unified highlighter.
Closes #28089