Kibana keeps open sockets for timed out requests #30058

Closed
azasypkin opened this issue Feb 5, 2019 · 5 comments
Labels
performance · stale · Team:Core

Comments

@azasypkin (Member) commented Feb 5, 2019

Related to https://github.com/elastic/support-dev-help/issues/5765

It looks like when a request times out (based on the ES JS client timeout setting), the socket stays open for a while even though we abort the request. E.g. when running Kibana on master I see ~5 open sockets in an "idle" state, with short spikes for stats collection or Canvas pads, but if I slow down the ES connection (e.g. via nginx limit_rate to ~10 B/sec), the open socket count grows significantly, to ~50-90+, and stays at that level until the connection speed returns to an acceptable level. It can probably go even higher with our default values for maxSockets (Infinity) and maxFreeSockets (256).

@spalger do you know if this is how it's supposed to work, or should sockets be destroyed immediately for aborted requests?

Looks like stats collection from monitoring is what initiates that many requests; I'm going to check whether it does everything properly.
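
For reference, a minimal sketch of how a Node.js keep-alive agent pools sockets. This uses a plain `http.Agent` with the limits mentioned above, not the actual agent Kibana hands to the ES client, and the host/port are illustrative:

```js
// Sketch: a keep-alive agent parks idle sockets in a free pool instead of
// closing them after each request, which is why "idle" sockets linger.
const http = require('http');

const agent = new http.Agent({
  keepAlive: true,       // reuse sockets across requests
  maxSockets: Infinity,  // no cap on concurrent sockets per host
  maxFreeSockets: 256,   // keep up to 256 idle sockets per host
});

// Count sockets that are in use vs. parked in the free pool.
function poolStats(a) {
  const count = (byHost) =>
    Object.values(byHost).reduce((n, sockets) => n + sockets.length, 0);
  return { active: count(a.sockets), free: count(a.freeSockets) };
}

// Example: hit a local ES and inspect the pool once the response is drained.
const req = http.get({ host: 'localhost', port: 9200, path: '/', agent }, (res) => {
  res.resume(); // drain the body so the socket can go back to the pool
  res.on('end', () => setImmediate(() => console.log(poolStats(agent))));
});
req.on('error', () => {}); // ignore connection errors if no ES is running
```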

azasypkin added the Team:Core and performance labels on Feb 5, 2019
@elasticmachine (Contributor)

Pinging @elastic/kibana-platform

@azasypkin (Member, Author)

Re stats collection: not sure what values we use for elasticsearch.requestTimeout and xpack.monitoring.kibana.collection.interval in the cloud, but by default they are 30s and 10s respectively, and the monitoring bulk uploader (the one that issues stats requests) is implemented in a way that doesn't wait for the previous requests to complete before sending them again (~13 simultaneous requests on master).

So it may happen that in 30s we send ~25-40 requests just for stats. And taking into account that these sockets may hang around for some time, we can end up with a significant number of open sockets because of this one thing alone. It's not the root cause of the problem we're experiencing, but it'd be nice to reduce the impact anyway.

Hey @elastic/stack-monitoring and @tsullivan,

  • Is there any reason we don't want to send new stats requests only when the previous requests have completed (i.e. use setTimeout in promise callbacks instead of setInterval; see the sketch after this list)?
  • I see there is a handleConnectionLost method in the bulk uploader that forces it to stop sending requests, but it looks like it's not used anywhere. Is that by mistake or ..?
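
A sketch of the scheduling change the first bullet suggests; `fetchAndUploadStats` and the 10s interval are illustrative stand-ins, not the actual bulk uploader code:

```js
// Hypothetical stand-in for the bulk uploader's "collect and send stats" step.
async function fetchAndUploadStats() {
  /* gather Kibana stats and bulk-upload them to Elasticsearch */
}

const INTERVAL_MS = 10 * 1000; // xpack.monitoring.kibana.collection.interval default

// With setInterval, a new batch fires every 10s even if the previous one is
// still waiting on ES. Scheduling the next run from the completion of the
// previous one keeps at most one batch in flight:
async function uploadLoop() {
  try {
    await fetchAndUploadStats();
  } catch (err) {
    // log and keep going; a failed upload shouldn't kill the loop
  }
  setTimeout(uploadLoop, INTERVAL_MS);
}

uploadLoop();
```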

@spalger (Contributor) commented Feb 7, 2019

I expect an aborted request to release the socket, but we're using keepalive sockets, so rather than closing the socket it is probably just being kept in the free socket pool. The default keepAliveMaxFreeSockets config for the client is 256, which I don't think we provide a way to configure, and a quick search reveals 9 spots where we're creating client instances. I think that means we could have up to 2,304 open sockets at any time if every client was in use, sending regular requests, and those requests were taking long enough to overlap each other. The timeout for these free sockets is 1 minute by default, so if my theory is correct, then after a minute you should see the number of open sockets drop down to just the amount needed right now.
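
For context, a hedged sketch of where those numbers live in the legacy `elasticsearch` JS client config. The option names below are my understanding of the 16.x client's keep-alive settings (keepAliveMaxFreeSockets is named in this thread; keepAliveFreeSocketTimeout is assumed to be the 1-minute idle timeout), and the values are just the defaults being discussed:

```js
// Sketch: each legacy elasticsearch-js client instance keeps its own
// keep-alive socket pool, so ~9 instances * 256 free sockets = 2304 sockets
// in the worst case.
const elasticsearch = require('elasticsearch');

const client = new elasticsearch.Client({
  host: 'http://localhost:9200',
  requestTimeout: 30000,             // elasticsearch.requestTimeout default (30s)
  keepAlive: true,                   // pool sockets across requests (default)
  keepAliveMaxFreeSockets: 256,      // idle sockets kept per client (the 256 above)
  keepAliveFreeSocketTimeout: 60000, // idle sockets closed after 1 minute (default)
});
```

Exposing these in kibana.yml (they aren't configurable today, per the comment above) would let deployments cap the per-client pool, at the cost of more TCP handshakes under load.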

@azasypkin (Member, Author)

Thanks @spalger. Yeah, I believe your theory is correct; I just wasn't sure whether this is something we want to keep non-configurable.

I think the problem is that the request timeout in Cloud is 5 minutes (per @alexbrasetvik) and the default stats collection interval is 10s, so over those 5 minutes one idling Kibana instance (just monitoring) can send 5m / 10s * 13 requests = ~390 requests. These sockets aren't subject to the 1-minute idle timeout since they are busy waiting during those 5 minutes. After ~6 minutes the sockets will be disposed of (I guess 390 - 256 = 134 sockets will be disposed of as soon as their requests are aborted, and the rest after 1 minute), but Kibana will keep bombarding ES in the meantime.

In any case, it feels like we should fix stats collection and any other code we may have that doesn't adapt to ES availability.
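
A back-of-the-envelope version of that math, using the numbers from the comment above (purely illustrative):

```js
// Rough worst case for one idle Kibana instance doing only stats collection.
const requestTimeoutSec = 5 * 60;  // Cloud request timeout (~5 minutes)
const collectionIntervalSec = 10;  // xpack.monitoring.kibana.collection.interval
const requestsPerCollection = 13;  // simultaneous stats requests per tick on master

const ticksBeforeFirstTimeout = requestTimeoutSec / collectionIntervalSec; // 30
const inFlightRequests = ticksBeforeFirstTimeout * requestsPerCollection;  // 390

console.log({ ticksBeforeFirstTimeout, inFlightRequests });
// => { ticksBeforeFirstTimeout: 30, inFlightRequests: 390 }
```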

@tsullivan (Member)

Hi, my takes on the questions brought up:

  • Is there any reason we don't want to send new stats requests only when the previous requests have completed (i.e. use setTimeout in promise callbacks instead of setInterval)?

Sounds like a perfectly fine suggestion to me! I don't recall any specific reason why we went with setInterval.

  • I see there is a handleConnectionLost method in the bulk uploader that forces it to stop sending requests, but it looks like it's not used anywhere. Is that by mistake or ..?

Looks like that is by mistake.
