Kibana use highlighting only on small fields or fields indexed with offsets #16764

mayya-sharipova · 2018-02-15T18:24:05Z

Kibana version: 6.2

Elasticsearch version: 6.2

Describe the feature:
Since 6.2, we have modified highlighting to limit the analyzed text for highlighting to 10000 chars. See elastic/elasticsearch#27934.
Unless a field is indexed with offsets ("index_options": "offsets" or "term_vector": "with_positions_offsets"), or unless index setting index.highlight.max_analyzed_offset is set to higher than 10K chars, an attempt to highlight a field with more than 10K chars will produce:

from 6.2 - a warning
from 7.0 - an error
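
For readers who want to see the options named above side by side, here is a minimal sketch that builds the relevant request bodies as plain Python dicts. The field name `message` and the helper names are hypothetical, and the fragments only show the `properties` section; in 6.x the mapping would sit under a mapping type, which is omitted here.

```python
import json


def offsets_mapping(field: str) -> dict:
    """Mapping fragment that stores offsets in the postings ("index_options": "offsets")."""
    return {"properties": {field: {"type": "text", "index_options": "offsets"}}}


def term_vector_mapping(field: str) -> dict:
    """Mapping fragment that stores term vectors with positions and offsets."""
    return {
        "properties": {
            field: {"type": "text", "term_vector": "with_positions_offsets"}
        }
    }


def raised_limit_settings(max_chars: int) -> dict:
    """Index settings body that raises the highlighting limit to max_chars."""
    return {"index.highlight.max_analyzed_offset": max_chars}


if __name__ == "__main__":
    # "message" is an illustrative field name, not one taken from the issue.
    print(json.dumps(offsets_mapping("message"), indent=2))
    print(json.dumps(term_vector_mapping("message"), indent=2))
    print(json.dumps(raised_limit_settings(50000), indent=2))
```

Either mapping option only takes effect for data indexed after the mapping is in place, which is why the thread keeps coming back to reindexing.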

We have seen people getting a lot of deprecation warnings from highlighting when trying to hit a default Kibana page in 6.2.
To avoid this and prevent future errors in 7.0, we propose that:

Kibana use highlighting only on small fields (less than 10K chars) or fields indexed with offsets,
and also that Kibana use the unified highlighter, unless it already does (this is for speed and may not be relevant here).


mayya-sharipova · 2018-02-15T18:25:40Z

pinging @Bargs who initially raised this issue
pinging @kovyrin who experienced this issue

kovyrin · 2018-02-15T18:46:19Z

This definitely needs to be a coordinated effort to ensure that Kibana does not simply break because of the change in ES. At the moment it breaks only in proxy scenarios (when there is a proxy between ES and Kibana or between Kibana and the user) like https://github.com/elastic/cloud/issues/10532 (still very painful when you hit it), but going forward it will break every deployment with large fields indexed by logstash or filebeat.

deanpeters · 2018-02-15T22:51:07Z

Would it make sense to somehow equip individuals in a deprecation path with an enumeration ... heck perhaps a visualization ... as to all the searches that need some highlight text truncation?

Would it make sense to offer a link to such a report within deprecation warnings?

(And if we're already doing it apologies in advance)

Bargs · 2018-02-15T23:20:57Z

Thanks for the heads up! I'll admit my highlighting knowledge is pretty rudimentary, so excuse me if these questions are pretty basic:

> we propose that Kibana uses highlighting only on small fields

Is there any API we can use to determine the average or median size of a given field?

> fields indexed with unified or fvh highlighter that should not produce these warnings.

What work does this require from the user ahead of time? Most Kibana users are used to being able to point Kibana at any index and get highlighting for their searches. Making any sort of assumption about the state of their index settings and mappings is difficult unless it's an extremely common default, like text/keyword multi-fields for example. It's also going to be tough to tell users that highlighting is going to suddenly stop working on their old indices unless they reindex.

mayya-sharipova · 2018-02-15T23:45:54Z

@Bargs very good questions. Adding @jimczi here to provide some suggestions for @Bargs's questions

jimczi · 2018-02-16T15:55:46Z

I think we need an explicit parameter to truncate highlighting. If I understand correctly, Kibana sends a query to highlight all fields (*) without any limit, and then I guess that the returned text is truncated in the visualization. For performance reasons, but also because it makes no sense to show a text with more than 10k chars in a visualization, we should have a way to limit the highlighting to the first N chars in the text. So instead of failing the request, this parameter would limit the size of the highlighted text. We don't need to remove the soft limit; this parameter would work on top of it by truncating the input text if it's bigger than the provided number.
@mayya-sharipova WDYT?

@Bargs what is the current behavior for large texts? Are they truncated in the visualization? What is the limit (if there is one)?
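
To make the proposed behavior concrete, here is a toy model of "highlight only the first N chars". It is not how Elasticsearch highlighters work internally (they return fragments and match analyzed terms, not raw substrings); the function name, the `limit` parameter, and the `<em>` tags are all assumptions for illustration.

```python
def highlight_prefix(text: str, term: str, limit: int,
                     pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap occurrences of `term` that lie entirely within text[:limit].

    Text past `limit` is returned unchanged, so matches there are never
    highlighted and no error is raised for long inputs.
    """
    if not term:
        return text  # nothing to highlight; also avoids an endless loop below
    head, tail = text[:limit], text[limit:]
    parts = []
    pos = 0
    while True:
        hit = head.find(term, pos)
        if hit == -1:
            break
        parts.append(head[pos:hit])
        parts.append(pre + term + post)
        pos = hit + len(term)
    parts.append(head[pos:])
    return "".join(parts) + tail


if __name__ == "__main__":
    # Only the first "foo" falls inside the 5-character window.
    print(highlight_prefix("foo bar foo", "foo", 5))
```

The sketch also shows the trade-off raised later in the thread: a match beyond the limit silently goes unhighlighted, which a user may read as highlighting being unreliable.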

mayya-sharipova · 2018-02-16T16:12:33Z

@jimczi I think this is a very good suggestion. Do you mean we should add another parameter to the highlight request, something like analyzed_text_limit? Or maybe a boolean parameter truncate_analyzed_text that will truncate the text to the value of index.highlight.max_analyzed_offset?

Bargs · 2018-02-16T16:51:45Z

We will actually show you the entire field in the Discover doc table. 10,000 characters may seem like a lot, but I could see this being useful if you're searching for a needle in a haystack. If highlighting suddenly stops working after n number of characters (a limit the user doing the querying may not have set up and may be unaware of), I expect they'll perceive highlighting as being broken or unreliable.

Should we be using the unified highlighter anyway? Does the unified highlighter require the user to set things up at index creation time or can we use it dynamically?

mayya-sharipova · 2018-02-16T17:08:07Z

@Bargs thanks for the update. Looks like the limit would not help in your current Kibana workflow.

About the unified highlighter: yes, we should use it by default. unified will use term_vectors or offsets if the field was indexed with them. If not, unified will analyze the text in memory in real time, and for large fields this may take substantial memory and time, which is what we were trying to prevent.

Bargs · 2018-02-21T16:05:28Z

I marked this as a blocker for 7.0 since we'll need to figure out what to do before then.

mayya-sharipova · 2018-02-21T19:34:43Z

@jimczi I wonder if we can change the behaviour of the unified/plain highlighter NOT to produce any errors/warnings, and instead just analyze and highlight ONLY the number of chars set in index.highlight.max_analyzed_offset. We will also update the documentation with a warning that only 10K chars by default are analyzed for highlighting, and a user will not see highlights past 10K chars.

jimczi · 2018-02-21T19:50:40Z

I think throwing an error is fine, but we should maybe revise the default limit. 10K is maybe too restrictive, especially for Kibana. Would 1MB be enough to consider that it's too big to present in a viz?
Just ignoring data after 10k would be weird, especially if the limit is extracted from a cluster setting with a default value. I think we should just catch outliers (with 1MB we should be good) and make the exception more verbose (adding the field name, index name and doc_id in the exception).

mayya-sharipova · 2018-02-21T20:01:24Z

@jimczi thanks Jim, sounds reasonable

Bargs · 2018-02-21T21:50:12Z

1MB sounds reasonable, users who really need highlighting on larger fields can handle the additional set up. In Kibana it would be nice if we could catch the error and provide a user friendly message about the failure and how it can be fixed. So my last question is, if someone runs into this limit, what do we need to ask them to change in their mappings and is there anything we need to update in Kibana's search request body?
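
As a rough sketch of the "catch the error and show a friendly message" idea, the snippet below matches on the setting name in an error reason string. How Kibana actually receives and unwraps the Elasticsearch error is not described in this thread, so the input here is assumed to be a plain string and `friendly_highlight_error` is a hypothetical helper.

```python
from typing import Optional

SETTING = "index.highlight.max_analyzed_offset"


def friendly_highlight_error(reason: str) -> Optional[str]:
    """Return a user-facing hint if `reason` is the highlighting limit error.

    Returns None for unrelated errors so the caller can fall back to its
    normal error handling.
    """
    if SETTING not in reason:
        return None
    return (
        "Highlighting failed because a field is longer than the limit set by "
        f"[{SETTING}]. Reindex the field with offsets or term vectors "
        "(recommended), or raise the limit in the index settings."
    )


if __name__ == "__main__":
    sample = (
        "The length of the text to be analyzed for highlighting has exceeded "
        "the allowed maximum of [10000]. This maximum can be set by changing "
        "the [index.highlight.max_analyzed_offset] index level setting."
    )
    print(friendly_highlight_error(sample))
```

Matching on the setting name rather than the full sentence keeps the check working if the wording of the message changes, for example when the field name, index name and doc_id are added.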

mayya-sharipova · 2018-02-21T23:24:07Z

@Bargs Answering your questions:

> is there anything we need to update in Kibana's search request body

We would recommend always using the unified highlighter (unless you are already using it).

> if someone runs into this limit, what do we need to ask them to change in their mappings

The thrown error will say it all. The current error message is something like this:
"The length of the text to be analyzed for highlighting has exceeded the allowed maximum of [10000]. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!"

So, basically, they would either need to increase index.highlight.max_analyzed_offset (not recommended), or reindex this field with "index_options": "offsets" or "term_vector": "with_positions_offsets" (recommended).
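
For illustration, a search body that explicitly asks for the unified highlighter on all fields could look like the sketch below. This is not Kibana's actual request; the `query_string` query and the helper name are assumptions, while `"type": "unified"` and the `"*"` field pattern are the options discussed in this thread.

```python
import json


def search_body(query: str) -> dict:
    """Build a search body that highlights every field with the unified highlighter."""
    return {
        "query": {"query_string": {"query": query}},
        "highlight": {
            "type": "unified",      # explicit, rather than relying on the default
            "fields": {"*": {}},    # highlight all fields, as Kibana is said to do
        },
    }


if __name__ == "__main__":
    print(json.dumps(search_body("error AND timeout"), indent=2))
```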

trevan · 2018-02-21T23:55:36Z

What happens if you have a field over 1MB but you don't want to increase the max_analyzed_offset or reindex the field, and you are OK with not having the search on it highlighted?

Any way to have Kibana ignore that field during searching so that an exception isn't thrown?

mayya-sharipova · 2018-02-22T00:24:46Z

@trevan It is a good question. In the ES search/highlight request there is no way to exclude certain fields, only to include specific fields: "fields" : {..}, I wonder if Kibana can use that.

trevan · 2018-02-22T00:28:25Z

@mayya-sharipova, using an include list might be a problem for those of us with 1000s of fields.

Bargs · 2018-02-22T17:45:05Z

Yeah it would be nice to have the option to exclude fields in ES's highlight API. That said, we might be able to fake it in Kibana by including all the fields stored in the index pattern except for the fields the user wants to exclude.
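
A minimal sketch of the workaround described above, assuming the index pattern's field names are available as a list. The helper name and the sample field names are hypothetical; the result is meant to be used as the `fields` object of a highlight request.

```python
from typing import Dict, Iterable


def highlight_fields(index_pattern_fields: Iterable[str],
                     excluded: Iterable[str]) -> Dict[str, dict]:
    """Emulate an exclude list by listing every known field except the excluded ones."""
    skip = set(excluded)
    return {name: {} for name in index_pattern_fields if name not in skip}


if __name__ == "__main__":
    fields = ["message", "host", "stack_trace"]
    # "stack_trace" stands in for a field that occasionally gets very large.
    print(highlight_fields(fields, excluded=["stack_trace"]))
```

With thousands of fields the resulting request body grows accordingly, which is the concern raised earlier about include lists.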

@trevan would you only want to exclude those large fields from highlighting, or would you want to exclude them from Discover completely? If the latter, we could tie this into the existing "Source filters". But perhaps it's a bad idea to couple those two concepts. It might also make more sense to persist this setting at the search level, it's hard to say.

In any event, I'm thinking highlight exclusions is more of a separate enhancement request. If a user has to increase the limit and they don't want to reindex, it doesn't put them in a different spot than where they are today where there's no limit at all. At least by having a limit, users will be more aware of the impact of highlighting and can choose to turn it off completely for the time being if they need to.

@mayya-sharipova we're not specifying a highlighter in our request, so it sounds like it should use the unified highlighter by default? If that's the case, I don't think there's anything we need to update on the Kibana side of things as long as the limit is a reasonable default and the error message returned from ES gives the user actionable advice.

trevan · 2018-02-22T17:57:23Z

@Bargs, no, I don't want to exclude them from Discover. My specific situation is that we have a few fields that occasionally get a value that size. The field value is normally less than 1k. It will be really frustrating to have Discover sometimes work and sometimes not work just because the really large field is sometimes present.

Bargs · 2018-02-22T18:06:12Z

@trevan gotcha. Do you agree that, once there's a limit, manually increasing the limit should effectively create the same situation we have today, where there is no limit? If so, could you create a separate ticket to track the enhancement request for excluding individual fields from highlighting? I think it's a great idea and I want to make sure we don't lose it, but I think it'll require some more discussion than we should get into in this comment thread.

trevan · 2018-02-22T19:08:15Z

@Bargs, good point. I agree with that. I created #16877.

Increase the default limit of `index.highlight.max_analyzed_offset` to 1M instead of previous 10K. Enhance an error message when offset increased to include field name, index name and doc_id. Relates to elastic/kibana#16764


Bargs · 2018-06-18T20:34:37Z

@mayya-sharipova @jimczi I was just re-reading this thread, trying to remember if there's anything we still need to update in Kibana, and it seems like there isn't. We're already using the unified highlighter since we just use the default. As I understand it, everything else is up to the user. Am I missing anything or can we close this ticket out?

mayya-sharipova · 2018-06-18T20:45:56Z

@Bargs I think we can close this ticket.
Using the unified highlighter in Kibana is good, but a user should also make sure to index a field with offsets if the field is really big (more than 1 million characters by default). Otherwise a user will get a warning or error from 7.x, but I guess this could be left to the user.

With the upgrade to ES 7 there are changes to avoid default limits on highlight with ES structural apps matching strings longer than 10k characters; more here: elastic/kibana#16764 (comment) Signed-off-by: Luis Guzman

mayya-sharipova changed the title ~~Kibana use Highlighting only on small fields or fields indexed with Unified or FVH Highlighters~~ Kibana use Highlighting only on small fields or fields indexed with offsets Feb 15, 2018

mayya-sharipova changed the title ~~Kibana use Highlighting only on small fields or fields indexed with offsets~~ Kibana use highlighting only on small fields or fields indexed with offsets Feb 15, 2018

Bargs added blocker :Discovery v7.0.0 labels Feb 21, 2018

trevan mentioned this issue Feb 22, 2018

Allow excluding of fields from highlighting in Discover #16877

Closed

mayya-sharipova mentioned this issue Feb 24, 2018

Limit analyzed text for highlighting (improvements) elastic/elasticsearch#28808

Merged

mayya-sharipova mentioned this issue Mar 5, 2018

Limit analyzed text for highlighting (improvements) elastic/elasticsearch#28907

Merged

mayya-sharipova closed this as completed Jun 18, 2018

Ark74 mentioned this issue Jan 7, 2020

Move to positions_offsets iot fix 10k highlight limit nextcloud/fulltextsearch_elasticsearch#91

Merged
