
Statsd telemetry doesn't recover from statsd outage #39

Open
johnrengelman opened this issue Sep 27, 2016 · 3 comments

@johnrengelman

From: hashicorp/vault#1932

It appears that go-metrics doesn't handle a disconnect of the statsd server, particularly if the address changes.

We are running a telegraf agent with a statsd listener and configuring vault to send data to a linked container with a hostname. When the linked container is restarted (generally getting a new IP address), we stop receiving statsd metrics from vault.

@armon
Member

armon commented Nov 5, 2016

The basic issue is that UDP provides no feedback. If statsd dies or its IP is relocated, we keep sending fire-and-forget packets without ever getting an error. This is unlike statsite, which runs over TCP: there we get an error and redial. The only potential workaround would be to periodically assume the connection is dead and redo the DNS lookup. I'm not sure there is any other robust mechanism given the lack of feedback.

@johnrengelman
Author

According to this http://serverfault.com/a/416269, if the server side of a UDP socket is disconnected, then there should be an error upon writing (Destination Unreachable), triggered by an ICMP packet.
Testing locally with nc, this is the case; establish a connection, terminate the server, and try writing on the client...packet sniffing shows the ICMP packet, and the nc client exits.

I'll have to do some testing next week to see whether I get the same behavior between Vault and statsd, or whether the ICMP packet is somehow being dropped on the network.

@armon
Member

armon commented Nov 7, 2016

@johnrengelman That's true! But ICMP is not necessarily reliable. It can be disabled or blocked by firewalls, and it is fire-and-forget like UDP, so it can simply be dropped. Error reporting is best-effort, but the UDP protocol makes no guarantee!
