Kibana use highlighting only on small fields or fields indexed with offsets #16764

Closed
mayya-sharipova opened this issue Feb 15, 2018 · 24 comments

Comments

@mayya-sharipova

mayya-sharipova commented Feb 15, 2018

Kibana version: 6.2

Elasticsearch version: 6.2

Describe the feature:
As of 6.2, highlighting limits the text it analyzes to 10,000 chars. See elastic/elasticsearch#27934.
Unless a field is indexed with offsets ("index_options": "offsets" or "term_vector": "with_positions_offsets"), or the index setting index.highlight.max_analyzed_offset is raised above 10K chars, an attempt to highlight a field with more than 10K chars will produce:

  1. from 6.2 - a warning
  2. from 7.0 - an error
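
As a rough sketch of the rule described above (not Elasticsearch's actual code), the outcome for a single field value could be modeled as follows. The function name `highlight_outcome` and its parameters are made up for illustration; only the 10,000 default, the two offset options, and the warning/error split by version come from the description.

```python
from typing import Optional

DEFAULT_MAX_ANALYZED_OFFSET = 10_000  # default limit in 6.2, per the description


def highlight_outcome(
    text_length: int,
    es_major: int,
    index_options: Optional[str] = None,
    term_vector: Optional[str] = None,
    max_analyzed_offset: int = DEFAULT_MAX_ANALYZED_OFFSET,
) -> str:
    """Return "ok", "warning", or "error" for highlighting one field value.

    Hypothetical helper: it only restates the rule from this issue and
    simplifies "from 6.2" to "any 6.x".
    """
    # Fields indexed with offsets are not re-analyzed at highlight time.
    has_offsets = (
        index_options == "offsets" or term_vector == "with_positions_offsets"
    )
    if has_offsets or text_length <= max_analyzed_offset:
        return "ok"
    # Over the limit without offsets: deprecation warning in 6.x, error from 7.0.
    return "error" if es_major >= 7 else "warning"
```

For example, `highlight_outcome(20_000, 6)` gives `"warning"`, while the same call with `index_options="offsets"` gives `"ok"`.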

We have seen people getting a lot of deprecation warnings from highlighting when trying to hit a default Kibana page in 6.2.
To avoid this and prevent future errors in 7.0, we propose that:

  1. Kibana uses highlighting only on small fields (fewer than 10K chars) or fields indexed with offsets,
    and also
  2. (not sure if relevant, this is for speedups) Kibana uses the unified highlighter (unless it already does)
@mayya-sharipova
Author

pinging @Bargs who initially raised this issue
pinging @kovyrin who experienced this issue

@kovyrin
Contributor

kovyrin commented Feb 15, 2018

This definitely needs to be a coordinated effort to ensure that Kibana does not simply break because of the change in ES. At the moment it breaks only in proxy scenarios (when there is a proxy between ES and Kibana, or between Kibana and the user) like https://github.com/elastic/cloud/issues/10532 (still very painful when you hit it), but going forward it will break every deployment with large fields indexed by Logstash or Filebeat.

@deanpeters

Would it make sense to somehow equip individuals in a deprecation path with an enumeration ... heck perhaps a visualization ... as to all the searches that need some highlight text truncation?

Would it make sense to offer a link to such a report within deprecation warnings?

(And if we're already doing it apologies in advance)

@mayya-sharipova mayya-sharipova changed the title Kibana use Highlighting only on small fields or fields indexed with Unified or FVH Highlighters Kibana use Highlighting only on small fields or fields indexed with offsets Feb 15, 2018
@mayya-sharipova mayya-sharipova changed the title Kibana use Highlighting only on small fields or fields indexed with offsets Kibana use highlighting only on small fields or fields indexed with offsets Feb 15, 2018
@Bargs
Contributor

Bargs commented Feb 15, 2018

Thanks for the heads up! I'll admit my highlighting knowledge is pretty rudimentary, so excuse me if these questions are pretty basic:

we propose that Kibana uses highlighting only on small fields

Is there any API we can use to determine the average or median size of a given field?

fields indexed with unified or fvh highlighter that should not produce these warnings.

What work does this require from the user ahead of time? Most Kibana users are used to being able to point Kibana at any index and get highlighting for their searches. Making any sort of assumption about the state of their index settings and mappings is difficult unless it's an extremely common default, like text/keyword multi-fields for example. It's also going to be tough to tell users that highlighting is going to suddenly stop working on their old indices unless they reindex.

@mayya-sharipova
Author

@Bargs very good questions. Adding @jimczi here to provide some suggestions for @Bargs's questions

@jimczi

jimczi commented Feb 16, 2018

I think we need an explicit parameter to truncate highlighting. If I understand correctly, Kibana sends a query to highlight all fields (*) without any limit, and then I guess the returned text is truncated in the visualization. For performance reasons, but also because it makes no sense to show a text with more than 10k chars in a visualization, we should have a way to limit highlighting to the first N chars of the text. So instead of failing the request, this parameter would limit the size of the highlighted text. We don't need to remove the soft limit; this parameter would work on top of it by truncating the input text if it's longer than the provided number.
@mayya-sharipova WDYT?

@Bargs what is the current behavior for large texts? Are they truncated in the visualization? What is the limit (if there is one)?
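
To make the truncation idea concrete, here is a minimal sketch in plain Python. It is not an Elasticsearch API: `highlight_truncated` and `max_chars` are hypothetical names, and the naive case-insensitive substring match stands in for real analysis. The point is only that text beyond the first N chars is never scanned.

```python
import re


def highlight_truncated(text: str, term: str, max_chars: int) -> str:
    """Highlight `term` in the first `max_chars` characters of `text`.

    Hypothetical illustration of a truncation parameter: the input is cut
    before matching, so the cost is bounded by `max_chars`.
    """
    head = text[:max_chars]  # everything past max_chars is dropped
    if not term:
        return head
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # Wrap each match in <em> tags, a common highlight markup.
    return pattern.sub(lambda m: f"<em>{m.group(0)}</em>", head)
```

With `max_chars=10`, the string `"error at start, error at end"` is cut to `"error at s"` before matching, so only the first occurrence is highlighted and the request does not fail on length.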

@mayya-sharipova
Author

mayya-sharipova commented Feb 16, 2018

@jimczi I think this is a very good suggestion. Do you mean we should add another parameter to the highlight request, something like analyzed_text_limit? Or maybe a boolean parameter truncate_analyzed_text that will truncate the text to the value of index.highlight.max_analyzed_offset?

@Bargs
Contributor

Bargs commented Feb 16, 2018

We will actually show you the entire field in the Discover doc table. 10,000 characters may seem like a lot, but I could see this being useful if you're searching for a needle in a haystack. If highlighting suddenly stops working after n number of characters (a limit the user doing the querying may not have set up and may be unaware of), I expect they'll perceive highlighting as being broken or unreliable.

Should we be using the unified highlighter anyway? Does the unified highlighter require the user to set things up at index creation time or can we use it dynamically?

[Screenshot attached: "screen shot 2018-02-16 at 11 43 00 am"]

@mayya-sharipova
Author

mayya-sharipova commented Feb 16, 2018

@Bargs thanks for the update. Looks like the limit would not help in your current Kibana workflow.

About the unified highlighter: yes, we should use it by default. unified will use term_vectors or offsets if the field was indexed with them; if not, it will analyze the text in memory in real time, which for large fields may take substantial memory and time. That is what we were trying to prevent.
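
For reference, a search body that asks for the unified highlighter explicitly might look like the sketch below. Kibana's actual request is not shown in this thread, so the `query_string` query and the `build_search_body` helper are assumptions; `"type": "unified"` and the `"*"` field wildcard are standard highlight options.

```python
import json


def build_search_body(query_string: str) -> dict:
    """Build a search body that highlights all fields with the unified highlighter.

    Hypothetical helper; the query shape is an assumption, not Kibana's code.
    """
    return {
        "query": {"query_string": {"query": query_string}},
        "highlight": {
            "type": "unified",    # request the unified highlighter explicitly
            "fields": {"*": {}},  # highlight every matching field
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("status:500"), indent=2))
```

Setting the type explicitly only matters if the default highlighter differs from `unified` in the version being targeted.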

@Bargs
Contributor

Bargs commented Feb 21, 2018

I marked this as a blocker for 7.0 since we'll need to figure out what to do before then.

@mayya-sharipova
Author

mayya-sharipova commented Feb 21, 2018

@jimczi I wonder if we can change the behaviour of the unified/plain highlighter NOT to produce any errors/warnings, and instead just analyze and highlight ONLY the number of chars set in index.highlight.max_analyzed_offset. We would also update the documentation with a warning that only 10K chars are analyzed for highlighting by default, and that a user will not see highlights past 10K chars.

@jimczi

jimczi commented Feb 21, 2018

I think throwing an error is fine, but we should maybe revise the default limit. 10K is maybe too restrictive, especially for Kibana. Would 1MB be enough to consider a text too big to present in a viz?
Just ignoring data after 10k would be weird, especially if the limit is extracted from a cluster setting with a default value. I think we should just catch outliers (with 1MB we should be good) and make the exception more verbose (adding the field name, index name and doc_id to the exception).
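
A sketch of what "catch outliers and make the exception more verbose" could look like, written in Python rather than as the actual Elasticsearch change. The exception class, function name, and message wording are all hypothetical, and "1MB" is taken here as 1,000,000 chars.

```python
MAX_ANALYZED_OFFSET = 1_000_000  # proposed default; "1MB" read as 1M chars


class HighlightLimitExceeded(ValueError):
    """Raised when a field value is too long to analyze for highlighting."""


def check_highlight_length(
    text: str,
    field: str,
    index: str,
    doc_id: str,
    limit: int = MAX_ANALYZED_OFFSET,
) -> None:
    """Raise a verbose error if `text` is longer than `limit` characters."""
    if len(text) > limit:
        # Name the field, index, and document so the outlier is easy to find.
        raise HighlightLimitExceeded(
            f"The length [{len(text)}] of field [{field}] in doc [{doc_id}] "
            f"of index [{index}] exceeds the maximum [{limit}] allowed for "
            "highlighting. Raise [index.highlight.max_analyzed_offset] or "
            "index the field with offsets or term vectors."
        )
```

Values at or under the limit pass silently; only the occasional oversized document triggers the error, and the message says exactly which one.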

@mayya-sharipova
Author

@jimczi thanks Jim, sounds reasonable

@Bargs
Contributor

Bargs commented Feb 21, 2018

1MB sounds reasonable, users who really need highlighting on larger fields can handle the additional set up. In Kibana it would be nice if we could catch the error and provide a user friendly message about the failure and how it can be fixed. So my last question is, if someone runs into this limit, what do we need to ask them to change in their mappings and is there anything we need to update in Kibana's search request body?
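
One way the "catch the error and show a friendly message" idea could be sketched is below. This is Python for illustration, not Kibana code. It assumes the failure arrives as a JSON body with `error.root_cause[].reason`, and that the reason text names the `index.highlight.max_analyzed_offset` setting; both are assumptions, as is the helper name.

```python
from typing import Optional

SETTING = "index.highlight.max_analyzed_offset"


def friendly_highlight_error(error_body: dict) -> Optional[str]:
    """Return a user-facing hint if the error looks like a highlight-limit failure.

    Hypothetical helper: the response shape and the string match are assumptions.
    """
    error = error_body.get("error")
    if not isinstance(error, dict):
        return None
    for cause in error.get("root_cause", []):
        if SETTING in cause.get("reason", ""):
            return (
                "Highlighting failed because a field is too large. "
                f"Either raise the {SETTING} index setting, or reindex the "
                "field with offsets or term vectors."
            )
    return None  # some other error; let the normal handling deal with it
```

Matching on the setting name rather than the full sentence keeps the check from breaking if the wording of the message changes.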

@mayya-sharipova
Author

@Bargs Answering your questions:

is there anything we need to update in Kibana's search request body

We would recommend always using the unified highlighter (unless you are already using it)

if someone runs into this limit, what do we need to ask them to change in their mappings

The thrown error will say it all. The current error message is something like this:
"The length of the text to be analyzed for highlighting has exceeded the allowed maximum of [10000]. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!"

So, basically, they would either need to increase index.highlight.max_analyzed_offset (not recommended), or reindex this field with "index_options": "offsets" or "term_vector": "with_positions_offsets" (recommended).
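
The two options above could be expressed as request bodies roughly like this. The helper names are hypothetical and the snippet only builds dictionaries; it does not talk to a cluster. The mapping fragment omits any mapping type name, so it would need wrapping to match the mapping format of the version in use.

```python
def raise_limit_settings(new_limit: int) -> dict:
    """Body for an index settings update (not recommended for large texts)."""
    return {"index.highlight.max_analyzed_offset": new_limit}


def offsets_mapping(field: str, use_term_vectors: bool = False) -> dict:
    """Mapping fragment for a text field indexed with offsets (recommended)."""
    field_mapping = {"type": "text"}
    if use_term_vectors:
        field_mapping["term_vector"] = "with_positions_offsets"
    else:
        field_mapping["index_options"] = "offsets"
    return {"properties": {field: field_mapping}}


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for the reindex API, copying documents into the remapped index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}
```

Because the way a field is indexed cannot be changed in place, the second option means creating a new index with the new mapping and reindexing into it.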

@trevan
Contributor

trevan commented Feb 21, 2018

What happens if you have a field over 1MB but you don't want to increase the max_analyzed_offset or reindex the field, and you are OK with not having the search on it highlighted?

Any way to have Kibana ignore that field during searching so that an exception isn't thrown?

@mayya-sharipova
Author

@trevan It is a good question. In the ES search/highlight request there is no way to exclude certain fields, only to include specific fields: "fields" : {..}, I wonder if Kibana can use that.

@trevan
Contributor

trevan commented Feb 22, 2018

@mayya-sharipova, using an include list might be a problem for those of us with 1000s of fields.

@Bargs
Contributor

Bargs commented Feb 22, 2018

Yeah it would be nice to have the option to exclude fields in ES's highlight API. That said, we might be able to fake it in Kibana by including all the fields stored in the index pattern except for the fields the user wants to exclude.
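
A minimal sketch of faking exclusion with an include list, assuming the index pattern's field names are available as a plain list. The function name and the example field names are made up; this is not Kibana's implementation.

```python
from typing import Dict, Iterable, Set


def highlight_fields(
    index_pattern_fields: Iterable[str], excluded: Set[str]
) -> Dict[str, dict]:
    """Build the highlight section with every known field except the excluded ones."""
    return {
        "fields": {
            name: {}  # empty options: use the highlighter defaults per field
            for name in index_pattern_fields
            if name not in excluded
        }
    }
```

For example, `highlight_fields(["message", "stacktrace", "host"], {"stacktrace"})` yields a `fields` object containing only `message` and `host`. With thousands of fields the request body grows accordingly, which is the concern raised above about include lists.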

@trevan would you only want to exclude those large fields from highlighting, or would you want to exclude them from Discover completely? If the latter, we could tie this into the existing "Source filters". But perhaps it's a bad idea to couple those two concepts. It might also make more sense to persist this setting at the search level, it's hard to say.

In any event, I'm thinking highlight exclusions is more of a separate enhancement request. If a user has to increase the limit and they don't want to reindex, it doesn't put them in a different spot than where they are today where there's no limit at all. At least by having a limit, users will be more aware of the impact of highlighting and can choose to turn it off completely for the time being if they need to.

@mayya-sharipova we're not specifying a highlighter in our request, so it sounds like it should use the unified highlighter by default? If that's the case, I don't think there's anything we need to update on the Kibana side of things as long as the limit is a reasonable default and the error message returned from ES gives the user actionable advice.

@trevan
Contributor

trevan commented Feb 22, 2018

@Bargs, no, I don't want to exclude them from Discover. My specific situation is that we have a few fields that occasionally get a value that size. The field value is normally less than 1k. It will be really frustrating to have Discover work most of the time and then fail just because the really large field is sometimes present.

@Bargs
Contributor

Bargs commented Feb 22, 2018

@trevan gotcha. Do you agree that, once there's a limit, manually increasing the limit should effectively create the same situation we have today, where there is no limit? If so, could you create a separate ticket to track the enhancement request for excluding individual fields from highlighting? I think it's a great idea and I want to make sure we don't lose it, but I think it'll require some more discussion than we should get into in this comment thread.

@trevan
Contributor

trevan commented Feb 22, 2018

@Bargs, good point. I agree with that. I created #16877.

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Feb 24, 2018
Increase the default limit of `index.highlight.max_analyzed_offset` to 1M instead of previous 10K.

Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to elastic/kibana#16764
mayya-sharipova added a commit to elastic/elasticsearch that referenced this issue Mar 2, 2018
Increase the default limit of `index.highlight.max_analyzed_offset` to 1M instead of previous 10K.

Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to elastic/kibana#16764
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Mar 6, 2018
Increase the default limit of index.highlight.max_analyzed_offset to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to elastic#27934, elastic/kibana#16764
mayya-sharipova added a commit to elastic/elasticsearch that referenced this issue Mar 6, 2018
Increase the default limit of index.highlight.max_analyzed_offset to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to #27934, elastic/kibana#16764
sebasjm pushed a commit to sebasjm/elasticsearch that referenced this issue Mar 10, 2018
Increase the default limit of `index.highlight.max_analyzed_offset` to 1M instead of previous 10K.

Enhance an error message when offset increased to include field name, index name and doc_id.

Relates to elastic/kibana#16764
@Bargs
Contributor

Bargs commented Jun 18, 2018

@mayya-sharipova @jimczi I was just re-reading this thread, trying to remember if there's anything we still need to update in Kibana, and it seems like there isn't. We're already using the unified highlighter since we just use the default. As I understand it, everything else is up to the user. Am I missing anything or can we close this ticket out?

@mayya-sharipova
Author

@Bargs I think we can close this ticket.
Using the unified highlighter in Kibana is good, but a user should also make sure to index a field with offsets if the field is really big (more than 1 million characters by default). Otherwise the user will get a warning, or an error from 7.x, but I guess this could be left to the user.

Ark74 added a commit to Ark74/fulltextsearch_elasticsearch that referenced this issue Jan 6, 2020
With the upgrade to ES 7 there are changes to avoid default limits on highlight with ES structural apps matching strings longer than 10k characters; more here: elastic/kibana#16764 (comment)
Ark74 added a commit to Ark74/fulltextsearch_elasticsearch that referenced this issue Jan 7, 2020
With the upgrade to ES 7 there are changes to avoid default limits on highlight with ES structural apps matching strings longer than 10k characters; more here: elastic/kibana#16764 (comment)

Signed-off-by: Luis Guzman <[email protected]>
backportbot-nextcloud bot pushed a commit to nextcloud/fulltextsearch_elasticsearch that referenced this issue Jan 14, 2020
With the upgrade to ES 7 there are changes to avoid default limits on highlight with ES structural apps matching strings longer than 10k characters; more here: elastic/kibana#16764 (comment)

Signed-off-by: Luis Guzman <[email protected]>