Kibana keeps open sockets for timed out requests #30058

Closed
azasypkin opened this issue Feb 5, 2019 · 5 comments
Labels
performance · stale · Team:Core

Comments

@azasypkin (Member) commented Feb 5, 2019

Related to https://github.com/elastic/support-dev-help/issues/5765

It looks like when a request times out (based on the ES JS client timeout setting), the socket stays open for a while even though we abort the request. E.g. when running Kibana on master I see ~5 open sockets in an "idle" state, with short spikes for stats collection or Canvas pads, but if I slow down the ES connection (e.g. via nginx limit_rate to ~10 B/sec), the open socket count grows significantly, to ~50-90+, and stays at that level until the connection speed returns to an acceptable level. It can probably go even higher with our default values for maxSockets (Infinity) and maxFreeSockets (256).

@spalger do you know if this is how it's supposed to work, or should sockets be destroyed immediately for aborted requests?

Looks like stats collection from monitoring is what initiates that many requests; I'm going to check whether it does everything properly.
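
For reference, a minimal sketch of how a Node.js keep-alive agent pools sockets. This uses a plain `http.Agent` with the limits mentioned above, not the actual agent Kibana hands to the ES client, and the host/port are illustrative:

```js
// Sketch: a keep-alive agent parks idle sockets in a free pool instead of
// closing them after each request, which is why "idle" sockets linger.
const http = require('http');

const agent = new http.Agent({
  keepAlive: true,       // reuse sockets across requests
  maxSockets: Infinity,  // no cap on concurrent sockets per host
  maxFreeSockets: 256,   // keep up to 256 idle sockets per host
});

// Count sockets that are in use vs. parked in the free pool.
function poolStats(a) {
  const count = (byHost) =>
    Object.values(byHost).reduce((n, sockets) => n + sockets.length, 0);
  return { active: count(a.sockets), free: count(a.freeSockets) };
}

// Example: hit a local ES and inspect the pool once the response is drained.
const req = http.get({ host: 'localhost', port: 9200, path: '/', agent }, (res) => {
  res.resume(); // drain the body so the socket can go back to the pool
  res.on('end', () => setImmediate(() => console.log(poolStats(agent))));
});
req.on('error', () => {}); // ignore connection errors if no ES is running
```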

azasypkin added the Team:Core and performance labels on Feb 5, 2019
@elasticmachine (Contributor)

Pinging @elastic/kibana-platform

@azasypkin (Member, Author)

Re stats collection: not sure what values we use for elasticsearch.requestTimeout and xpack.monitoring.kibana.collection.interval in the cloud, but by default they are 30s and 10s respectively, and the monitoring bulk uploader (the one that issues stats requests) is implemented in a way that doesn't wait for the previous requests to complete before sending them again (~13 simultaneous requests on master).

So it may happen that in 30s we send ~25-40 requests just for stats. And taking into account that these sockets may hang around for some time, we can end up with a significant number of open sockets because of this one thing alone. It's not the root cause of the problem we're experiencing, but it'd be nice to reduce the impact anyway.

Hey @elastic/stack-monitoring and @tsullivan,

  • Is there any reason we don't want to send new stats requests only when the previous requests have completed (i.e. use setTimeout in promise callbacks instead of setInterval; see the sketch after this list)?
  • I see there is a handleConnectionLost method in the bulk uploader that forces it to stop sending requests, but it looks like it's not used anywhere. Is that by mistake or ..?
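
A sketch of the scheduling change the first bullet suggests; `fetchAndUploadStats` and the 10s interval are illustrative stand-ins, not the actual bulk uploader code:

```js
// Hypothetical stand-in for the bulk uploader's "collect and send stats" step.
async function fetchAndUploadStats() {
  /* gather Kibana stats and bulk-upload them to Elasticsearch */
}

const INTERVAL_MS = 10 * 1000; // xpack.monitoring.kibana.collection.interval default

// With setInterval, a new batch fires every 10s even if the previous one is
// still waiting on ES. Scheduling the next run from the completion of the
// previous one keeps at most one batch in flight:
async function uploadLoop() {
  try {
    await fetchAndUploadStats();
  } catch (err) {
    // log and keep going; a failed upload shouldn't kill the loop
  }
  setTimeout(uploadLoop, INTERVAL_MS);
}

uploadLoop();
```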

@spalger (Contributor) commented Feb 7, 2019

I expect an aborted request to release the socket, but we're using keepalive sockets, so rather than closing the socket it is probably just being kept in the free socket pool. The default keepAliveMaxFreeSockets config for the client is 256, which I don't think we provide a way to configure, and a quick search reveals 9 spots where we're creating client instances. I think that means we could have up to 2,304 open sockets at any time if every client was in use, sending regular requests, and those requests were taking long enough to overlap each other. The timeout for these free sockets is 1 minute by default, so if my theory is correct, then after a minute you should see the number of open sockets drop down to just the amount needed right now.
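
For context, a hedged sketch of where those numbers live in the legacy `elasticsearch` JS client config. The option names below are my understanding of the 16.x client's keep-alive settings (keepAliveMaxFreeSockets is named in this thread; keepAliveFreeSocketTimeout is assumed to be the 1-minute idle timeout), and the values are just the defaults being discussed:

```js
// Sketch: each legacy elasticsearch-js client instance keeps its own
// keep-alive socket pool, so ~9 instances * 256 free sockets = 2304 sockets
// in the worst case.
const elasticsearch = require('elasticsearch');

const client = new elasticsearch.Client({
  host: 'http://localhost:9200',
  requestTimeout: 30000,             // elasticsearch.requestTimeout default (30s)
  keepAlive: true,                   // pool sockets across requests (default)
  keepAliveMaxFreeSockets: 256,      // idle sockets kept per client (the 256 above)
  keepAliveFreeSocketTimeout: 60000, // idle sockets closed after 1 minute (default)
});
```

Exposing these in kibana.yml (they aren't configurable today, per the comment above) would let deployments cap the per-client pool, at the cost of more TCP handshakes under load.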

@azasypkin (Member, Author)

Thanks @spalger. Yeah, I believe your theory is correct; I just wasn't sure whether this is something we want to keep non-configurable.

I think the problem is that the request timeout in Cloud is 5 minutes (per @alexbrasetvik) and the default stats collection interval is 10s, so over those 5 minutes one idling Kibana instance (just monitoring) can send 5m / 10s * 13 requests = ~390 requests. These sockets aren't subject to the 1-minute idle timeout since they are busy waiting during those 5 minutes. After ~6 minutes the sockets will be disposed of (I guess 390 - 256 = 134 sockets will be disposed of as soon as their requests are aborted, and the rest after 1 minute), but Kibana will keep bombarding ES in the meantime.

In any case, it feels like we should fix stats collection and any other code we may have that doesn't adapt to ES availability.
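
A back-of-the-envelope version of that math, using the numbers from the comment above (purely illustrative):

```js
// Rough worst case for one idle Kibana instance doing only stats collection.
const requestTimeoutSec = 5 * 60;  // Cloud request timeout (~5 minutes)
const collectionIntervalSec = 10;  // xpack.monitoring.kibana.collection.interval
const requestsPerCollection = 13;  // simultaneous stats requests per tick on master

const ticksBeforeFirstTimeout = requestTimeoutSec / collectionIntervalSec; // 30
const inFlightRequests = ticksBeforeFirstTimeout * requestsPerCollection;  // 390

console.log({ ticksBeforeFirstTimeout, inFlightRequests });
// => { ticksBeforeFirstTimeout: 30, inFlightRequests: 390 }
```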

@tsullivan (Member)

Hi, my takes on the questions brought up:

  • Is there any reason we don't want to send new stats requests only when the previous requests have completed (i.e. use setTimeout in promise callbacks instead of setInterval)?

Sounds like a perfectly fine suggestion to me! I don't recall any specific reason why we went with setInterval.

  • I see there is a handleConnectionLost method in the bulk uploader that forces it to stop sending requests, but it looks like it's not used anywhere. Is that by mistake or ..?

Looks like that is by mistake.
