Default terminate_after for autocomplete requests to unlimited #38073

Closed
flash1293 opened this issue Jun 5, 2019 · 14 comments

Labels: enhancement (New value added to drive a business result), Team:Visualizations (Visualization editors, elastic-charts and infrastructure)

@flash1293 (Contributor)

Currently, the ES queries that fetch terms for autocomplete suggestions in filter controls and the query bar have a hard-coded terminate_after limit to keep server load low.
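
For illustration, such a request could look like the sketch below. The terminate_after value of 100000 and the 1s timeout come from this discussion; the index, field, and typed-prefix regex are hypothetical, and the real query Kibana builds may differ:

```
POST /my-index/_search
{
  "size": 0,
  "timeout": "1s",            // soft per-request time budget (discussed below)
  "terminate_after": 100000,  // stop collecting after this many documents per shard
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "machine.os.keyword",  // hypothetical field being autocompleted
        "include": "win.*"              // hypothetical regex for the prefix the user typed
      }
    }
  }
}
```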

With #17388, a setting for this value is being introduced so that users can get better results in some circumstances.

IMHO it makes sense to default to no terminate_after limit for these requests but still provide a setting to cap it. The setting boils down to a tradeoff between accuracy and resource usage: with the default terminate_after of 100000, requests are very fast but often inaccurate, which is very annoying for users because they can't select terms they know are present in their dataset.

I'm always in favor of preventing a "Killbana" situation, but I think a more sensible default would be to be as accurate as possible and provide an option to regain speed in clusters where this becomes a problem. In addition to terminate_after there is also a timeout of 1 second, so even without terminate_after the request wouldn't run for 30s like requests from visualizations do. This change therefore doesn't seem like a cluster killer, and if it becomes one in certain situations, the warning label from #37643 pops up and hints at the problem.

Because this is a breaking behavior change, we should introduce it starting with 8.0 and label it as a breaking change.

There are some assumptions in this and I could be totally wrong about them. What do you think, @timroes @nreese @Bargs?

@elasticmachine (Contributor)

Pinging @elastic/kibana-app

@Bargs (Contributor) commented Jun 5, 2019

I'm wary of changing the default. We generally don't want to ship features that don't scale out of the box. If we make the setting configurable, I think the question we have to ask ourselves is what's worse:

A. Killing a cluster unexpectedly, so the admin has to figure out why, find this setting, change it, and get everything running again.

or

B. A user gets inaccurate results, complains to the admin and gives them the details of the warning message; the admin increases the setting carefully and makes their user happy.

> requests are very fast but often inaccurate, which is very annoying for users because they can't select terms they know are present in their dataset.

Do we have a lot of users actually complaining about this? I haven't heard of many, but maybe I'm out of the loop.

> The setting boils down to a tradeoff between accuracy and resource usage.

Not just resource usage but also usability. Autocomplete needs to be fast in order to be useful. If every update takes a full second, I think it will be annoying.

> there is also a timeout of 1 second

I might be wrong, or my knowledge might be out of date, but I think the timeout may not actually kill the task on the ES side.

> This change therefore doesn't seem like a cluster killer, and if it becomes one in certain situations, the warning label from #37643 pops up and hints at the problem.

If terminate_after is not set by default, what would trigger the warning label, and what would it say? I'm having a hard time imagining a warning that makes sense to the average user, since this would be more of an admin concern.

@flash1293 (Contributor, Author) commented Jun 6, 2019

Thanks for your input, @Bargs.

> We generally don't want to ship features that don't scale out of the box

Does the current default setting scale? Clusters with a lot of data won't provide accurate terms lists.

> I think the question we have to ask ourselves is what's worse:
> A. Killing a cluster unexpectedly, so the admin has to figure out why, find this setting, change it, and get everything running again.
> or
> B. A user gets inaccurate results, complains to the admin and gives them the details of the warning message; the admin increases the setting carefully and makes their user happy.

You are right, but we also have to consider how frequently these events happen. If 5000 people have to go through the effort of reconfiguring their cluster, it might be worth having one user experience a cluster crash that support can help figure out. This might be my lack of experience, but the chance of these requests killing the server, in contrast to big dashboards which fire lots of requests with 30s timeouts, seems pretty small. I'm aware there will be multiple autocomplete requests per user, but not multiple requests per second per user.

> Do we have a lot of users actually complaining about this? I haven't heard of many, but maybe I'm out of the loop.

I've heard of at least two support cases in the last two weeks, and issue #17388 has received some reactions and repeated questions.

> Not just resource usage but also usability. Autocomplete needs to be fast in order to be useful. If every update takes a full second, I think it will be annoying.

Not having complete results is also annoying. I might be wrong here, but I think users tolerate a slightly longer wait better than missing the option they are looking for. My point is: if we recommend that users increase the limit as soon as it gets hit, why bother setting it to this value at all? It just becomes an extra configuration step for them.

> I might be wrong, or my knowledge might be out of date, but I think the timeout may not actually kill the task on the ES side.

You are right, 1s is not a hard upper limit; the actual request might take a bit longer.

> If terminate_after is not set by default, what would trigger the warning label, and what would it say? I'm having a hard time imagining a warning that makes sense to the average user, since this would be more of an admin concern.

If the request times out instead of terminating early because of terminate_after, the user sees the same error message: "Terms list is incomplete. Adjust the autocomplete settings in kibana.yml for more results."

I totally see your point. What isn't clear to me, and what I might have underestimated, is how resource-intensive and/or slow these kinds of queries get on big clusters without the terminate_after setting. Will they run into the timeout in this case? If they regularly will on production systems, we should definitely keep our current default. Or are we optimizing too far here?

@Bargs (Contributor) commented Jun 6, 2019

> there will be multiple autocomplete requests per user, but not multiple requests per second per user

We have a 100ms debounce on the suggestions update, so technically there could be 10 requests per user per second.

> What isn't clear to me, and what I might have underestimated, is how resource-intensive and/or slow these kinds of queries get on big clusters without the terminate_after setting. Will they run into the timeout in this case?

This I can't answer for certain. @lukasolson and @jpountz helped with these optimizations so they might have some input on that.

@jpountz commented Jun 7, 2019

These requests can certainly be very resource-intensive on large datasets, so in my opinion we need some protection anyway. terminate_after was mostly used as a workaround for the fact that Elasticsearch doesn't honor the timeout parameter very well: it could take minutes before Elasticsearch figured out that a request had timed out and could stop processing it. In short, the issue is that we currently only check the elapsed execution time when moving on to a new segment. In the worst-case scenario of a very large force-merged shard (which is typically the case for warm shards), this means we never check the execution time, so the request never stops prematurely because of the timeout.

We recently improved this situation via elastic/elasticsearch#42291 by also checking the execution time while collecting documents within a segment, which is expected to land in 7.3. So we could look into removing the usage of terminate_after for this feature and relying only on timeout in Kibana 7.3+.
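
To make that direction concrete, dropping terminate_after and relying on the timeout alone would look roughly like this (a sketch with a hypothetical index and field, not the actual query Kibana sends):

```
POST /my-index/_search
{
  "size": 0,
  "timeout": "1s",  // no terminate_after; from ES 7.3 on, the timeout is also checked
                    // while collecting documents within a segment (elastic/elasticsearch#42291)
  "aggs": {
    "suggestions": {
      "terms": { "field": "machine.os.keyword" }  // hypothetical field
    }
  }
}
```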

@flash1293 (Contributor, Author)

Awesome, thanks for your input, @jpountz.
This change was planned for 8.0 anyway, but I think it makes sense to revisit it when we near 8.0 and see what makes sense then, because a lot will probably have changed by that point.

@0xtf commented May 19, 2020

I've been going through some issues to understand the current state of this one and others related to it. As a user who encounters the notice to adjust the autocomplete settings in Kibana in all my clusters, I'd love to be able to address it.

From my understanding, there is a 1s timeout for populating the fields that go into the control visualization. Is this still a hard-coded value, or can it be changed (as far as I can tell, the PR to allow this still hasn't made it to master)?

Sorry if this isn't the best way to talk about this.

@nreese (Contributor) commented May 19, 2020

These values can be configured via the kibana.yml settings kibana.autocompleteTimeout and kibana.autocompleteTerminateAfter. The ability to configure them was included in Kibana 7.3.
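
A minimal kibana.yml sketch of these settings (the values below are illustrative, and the units are assumptions: a timeout in milliseconds and a per-shard document count):

```yaml
# kibana.yml (Kibana 7.3+) — illustrative values, not recommendations
kibana.autocompleteTimeout: 1000            # assumed milliseconds
kibana.autocompleteTerminateAfter: 100000   # assumed max documents examined per shard
```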

@0xtf commented May 19, 2020

Guess I'm running into another "Not yet in Cloud" situation:

`'kibana.autocompleteTimeout': is not allowed`

Is there a way of seeing which options are available for cloud deployments?

@nreese (Contributor) commented May 19, 2020

cc @elastic/cloud-pms

@maggieghamry (Contributor)

I also had a similar issue; the only workaround was setting kibana.autocompleteTerminateAfter: 2000000.

@0xtf commented Mar 12, 2021

Was that in Cloud, @maggieghamry?

@maggieghamry (Contributor) commented Mar 12, 2021

@0xtf ECE 2.6 & Kibana 7.11.1

@markov00 (Member)

Closing, as it is now possible to configure this: kibana.autocompleteTerminateAfter has been replaced by unifiedSearch.autocomplete.valueSuggestions.terminateAfter and unifiedSearch.autocomplete.valueSuggestions.timeout.
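
For anyone landing here later, a sketch of the equivalent configuration under the newer setting names (names taken from the comment above; values and units are assumptions):

```yaml
# kibana.yml (newer Kibana) — illustrative values, not recommendations
unifiedSearch.autocomplete.valueSuggestions.terminateAfter: 100000  # assumed max docs per shard
unifiedSearch.autocomplete.valueSuggestions.timeout: 1000           # assumed milliseconds
```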
