Statsd telemetry doesn't recover from statsd outage #39
The basic issue is that UDP provides no feedback. If statsd dies or its IP is relocated, we keep firing and forgetting packets without ever getting an error. statsite, by contrast, runs over TCP, so we get an error and redial. The only potential workaround would be to periodically assume the connection is dead and redo the DNS lookup. I'm not sure there is any other robust mechanism given the lack of feedback.
According to http://serverfault.com/a/416269, if the server side of a UDP socket goes away, a write should fail with an error (Destination Unreachable) triggered by an ICMP packet. I'll do some testing next week to see whether I get that behavior between Vault and statsd, or whether the ICMP packet is somehow being dropped on the network.
@johnrengelman That's true! But ICMP is not necessarily reliable. It can be disabled or blocked by firewalls, and like UDP it is fire-and-forget, so it can simply be dropped. Delivery is best-effort, and UDP itself makes no guarantee!
From: hashicorp/vault#1932
It appears that go-metrics doesn't handle a disconnect of the statsd server, particularly if the address changes.
We are running a Telegraf agent with a statsd listener and configuring Vault to send data to a linked container by hostname. When the linked container is restarted (generally getting a new IP address), we stop receiving statsd metrics from Vault.