Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the analyzed text for highlighting #27934

Conversation

mayya-sharipova
Copy link
Contributor

  • Introduce index level settings to control the max number of character
    to be analyzed for highlighting
  • Throw an error if analysis is required on a larger text

Closes #27517

- Introduce index level settings to control the max number of character
to be analyzed for highlighting
- Throw an error if analysis is required on a larger text

Closes elastic#27517
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, I left some comment regarding naming.

* indexing with offsets or term vectors is recommended.
*/
public static final Setting<Integer> MAX_ANALYZED_OFFSET_SETTING =
Setting.intSetting("index.analyze.max_analyzed_offset", 10000, 1, Property.Dynamic, Property.IndexScope);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe index.highlight.max_analyzed_offset to indicate that it applies to highlighting only ?

/**
* Returns the maximum number of chars that will be analyzed in a highlight request
*/
public int getMaxAnalyzedOffset() { return this.maxAnalyzedOffset; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, getHighlightMaxAnalyzedOffset ?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mayya-sharipova mayya-sharipova merged commit cbd271e into elastic:master Dec 21, 2017
@mayya-sharipova mayya-sharipova deleted the limit-analyzed-text-for-highlight branch December 21, 2017 15:20
@mayya-sharipova
Copy link
Contributor Author

@jimczi thanks for the review, Jim

mayya-sharipova added a commit that referenced this pull request Dec 27, 2017
- Introduce index level settings to control the max number of character
to be analyzed for highlighting
- Create a deprecation warning if analysis is required on a larger text

Closes #27517
@Bargs
Copy link

Bargs commented Jan 9, 2018

Hi @mayya-sharipova and @jimczi. I was just testing this change and it looks like it's going to break Kibana for users with large fields. It's quite common for users to have fields with >10000 characters so with this change in 6.2 Kibana's Discover (which uses highlighting) will start throwing errors.

Is it necessary to throw an error? Without the error, users might see a change in highlighting on large fields but at least Kibana wouldn't stop working. The downside is that users may think highlighting is broken if they see matching text that is not highlighted. That might be an acceptable tradeoff and breaking change in a major release but I think either change might be surprising in a minor.

screen shot 2018-01-09 at 10 08 05 am

@jimczi
Copy link
Contributor

jimczi commented Jan 9, 2018

That might be an acceptable tradeoff and breaking change in a major release but I think either change might be surprising in a minor

The setting should be unlimited by default in 6.2. @mayya-sharipova can you change the value for that branch and maybe add a deprecation warning for texts bigger than the default limit in 7 ?

@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented Jan 9, 2018

@Bargs thank you for reporting this. Have you tested this on 6.2 and experienced the error?
Because for 6.2 we don't throw an error, we just issue a deprecation warning:
https://github.com/elastic/elasticsearch/blob/6.x/core/src/main/java/org/apache/lucene/search/uhighlight/CustomUnifiedHighlighter.java#L133

Errors are thrown only on 7.x

@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented Jan 9, 2018

@jimczi So, for 6.2 do we want to do following:
if the settings != LIMIT_7 and text exceeds the limit, we will issue an error
if the setting == LIMIT_7 and text exceeds the limit, we will issue a deprecation warning happening in the major version

Sorry, slack has connection problems for me, can't open it for 30 mins

@Bargs
Copy link

Bargs commented Jan 9, 2018

Sorry, like a fool I only tested quickly in master. If 6.2 doesn't respond with an error, Kibana should still work fine. Though we might end up spamming their deprecation logs on every search request.

@jimczi
Copy link
Contributor

jimczi commented Jan 10, 2018

So, for 6.2 do we want to do following:
if the settings != LIMIT_7 and text exceeds the limit, we will issue an error
if the setting == LIMIT_7 and text exceeds the limit, we will issue a deprecation warning happening in the major version

For 6.2 I would set the limit to -1 and allow updating this setting only if the value is > 0. This way if the value is -1 you know that the setting is not explicitly set. Then if the value is -1 you can issue a deprecation warning when the text exceeds the default value set in 7 (10,000) and if it's not -1 you can throw an error if the text exceeds the provided threshold.

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jan 12, 2018
- Introduce index level setting "index.highlight.max_analyzed_offset"
to control the max number of character to be analyzed for highlighting
- Make this setting to be unset by default (equal to -1)
- Issue a deprecation warning if setting is unset and analysis is required
 on a text larger than ES v.7.x max setting (10000)
- Throw IllegalArgumentException is setting is set by a user, and
analysis is required on a text larger than the user's set value.

Closes elastic#27517

Adding validator for index.highlight.max_analyzed_offset setting
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Mar 6, 2018
Increase the default limit of index.highlight.max_analyzed_offset to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to elastic#27934, elastic/kibana#16764
mayya-sharipova added a commit that referenced this pull request Mar 6, 2018
Increase the default limit of index.highlight.max_analyzed_offset to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to #27934, elastic/kibana#16764
mayya-sharipova added a commit that referenced this pull request Mar 6, 2018
Increase the default limit of index.highlight.max_analyzed_offset to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to #27934, elastic/kibana#16764
@kiju98
Copy link

kiju98 commented Mar 22, 2018

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants