-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add udp_buffer_size option to udp_listener #1883
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure I want this in, the reason being that I think linux will silently fail if this number is larger than the max allowed size (net.core.rmem_max).
I just don't want users to expect to be able to increase this buffer from this config setting alone......they need to increase net.core.rmem_max
and increase this setting together, and it's easier to just provide the default way of increasing it via net.core.rmem_default.
for { | ||
select { | ||
case <-u.done: | ||
return nil | ||
default: | ||
u.listener.SetReadDeadline(time.Now().Add(time.Second)) | ||
|
||
if u.UDPBufferSize > 0 { | ||
u.listener.SetReadBuffer(u.UDPBufferSize) // if we want to move away from OS default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SetReadBuffer can return an error, probably best to check and log it if it errors
@@ -111,20 +119,29 @@ func (u *UdpListener) Stop() { | |||
func (u *UdpListener) udpListen() error { | |||
defer u.wg.Done() | |||
var err error | |||
|
|||
address, _ := net.ResolveUDPAddr("udp", u.ServiceAddress) | |||
u.listener, err = net.ListenUDP("udp", address) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the buffer only needs to be set once, up here
Background for this patch/request - changing net.core.rmem_default system side is not possible for us. This determines the buffer used for every new socket, and high buffer values causes high memory utilization and bad performance (many, many cache misses) system wide. See https://blog.cloudflare.com/the-story-of-one-latency-spike/ for a good reason why you dont want to set the default value too high for all applications, but sadly we need away to boost this for telegraf because - as demonstrated below - without this we drop frames for telegraf. I believe this is option read-buffer for the InfluxDB server; its there for the same reason I believe - this was put there by @sparrc in ./services/udp/service.go:
To show why we need this for anything that receives UDP metrics, I produced this test server:
And ran a test:
Over 3 runs, this clearly shows that we drop a % of metrics:
On a default box:
Even boosting rmem_default:
And restarting the server didnt help much:
I honestly dont know why, I probably need to change the middle value in /proc/sys/net/ipv4/udp_mem or something. Regardless, thats not what we want to do - we want this only to affect telegraf (as we are trying to run telegraf on every machine, so changing the default value for every telegraf machine is totally impossible for us). Making a one line change:
And now we drop nothing, ever:
Now I set it to a crazy value:
And it seems to simply have set it to the maximum value, because there are still no drops. There is certainly no silent failure:
We really believe this is pretty compelling. metric senders can burst to very, very high rates (as they are local) and are much more likely to overwhelm kernel buffers than most applications. It does not appear to silently drop anything if set to a crazy high value on Linux (its very possible that other OS's do something else - Google suggests BSD does odd things?) What we hope is to merge this with a clear warning that this is unnecessary to tune unless you have collectors injecting megabytes of metrics in very short periods of time (which we do) and that any changes should be tested very thoroughly. |
You may be right that it doesn't silently fail, but instead just sets it to the maximum. fwiw I don't think doing Anyways, I understand now the use-case of wanting to set the max for just a single server rather than for everything. I can merge the change but just put a note in the config explaining that the user first needs to adjust rmem_max using |
Some settings do require you to reboot, others don't (you can reload sysctl settings and any new sockets created will pick up the new settings, at least in normal linux distros hehe, see here). Made the err code change you requested and will add some docs about boosting the max and allocating something underneath this where possible. (Let me know if you want me to squash...) |
updating with errcode adding debug flags to temp msgs moving from debug to info
@sparrc writing into /proc/ should take effect for all future sockets, but as @sebito91 pointed out is not super relevant. Is there anything else we can provide in this ticket to help? We absolutely need this to make effective use of telegraf as a source for UDP data; updating the defaults is impossible for us and so merging this is important if we are not have to indefinitely patch telegraf. It seems very likely that anybody else who is using the UDP listener for bursty data sources (which UDP is likely to produce) will also want this. |
nope, I don't think there are any other changes needed, just need to get around to merging it |
Any chance this can be added to a v1.2 nightly early in the cycle (if so, we can roll out and test) |
yes, will do |
Required for all PRs:
Adding a config option to allow users to set the ReadBuffer for UDP sockets when created. This is intentionally separate from the OS settings which you may not want to make global for one socket call. If the user doesn't specify the setting, it defaults to the OS definition; otherwise it's set to udp_buffer_size.
NOTE: golint is blowing up with alerts for underscores, happy to change these if you'd like.
NOTE_2: @wrigtim contributed too!