
Statsite telemetry leaves socket lingering to CLOSE_WAIT after remote server restarted #3697

Open
vhartikainen opened this issue Nov 16, 2017 · 1 comment
Labels
theme/telemetry Anything related to telemetry or observability type/bug Feature does not function as expected
Milestone

Comments

@vhartikainen

vhartikainen commented Nov 16, 2017

Consul version for both Client and Server

Client: N/A
Server: Consul v0.9.3

Operating system and Environment details

Running as a Docker container based on the Debian 8.9 image
Consul container is running in AWS on top of ECS (can't reproduce in a local environment)
Consul connects to the telemetry server via an ELB

...
  "telemetry": {
    "statsite_address": "xyz:8125"
  },
...

Description of the Issue (and unexpected/desired result)

When the telemetry server is restarted, killed, or otherwise temporarily unavailable, Consul does not reconnect until the operating system cleans up the TCP socket left in the CLOSE_WAIT state. My guess is that on Linux this takes a little over two hours by default, given these keepalive settings:

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200

I couldn't get any logs from this module (https://github.com/hashicorp/consul/blob/v0.9.3/vendor/github.com/armon/go-metrics/statsite.go), even with the TRACE log level enabled.

Reproduction steps

  1. Only reproducible in AWS, not locally with a direct connection from one Docker container to another
  2. Telemetry server is restarted / killed / somehow temporarily unavailable
  3. Consul doesn't automatically reconnect, but leaves the connection lingering in the CLOSE_WAIT state:
    tcp        1      0 10.x.y.z:44764          10.x.y.w:8125          CLOSE_WAIT  19920/consul
  4. Restarting Consul does, of course, reconnect to the telemetry server on :8125

Log Fragments or Link to gist

I couldn't get any logs related to statsite, even at the TRACE level.

@slackpad
Contributor

This is an issue in the upstream go-metrics library (similar to hashicorp/go-metrics#39).

@slackpad slackpad added the type/bug Feature does not function as expected label Nov 16, 2017
@slackpad slackpad added this to the Unplanned milestone Nov 16, 2017
@jsosulska jsosulska added the theme/telemetry Anything related to telemetry or observability label Jul 21, 2020

3 participants