splunk nozzle now doesn't block, won't disconnect for being slow #284
Conversation
(I may be able to clean up some of the code with respect to the eventrouter in testing.)
(I think you'll generally be safer the closer the dropping buffer is to the read call, with as little code in between as possible; but IIRC it looked fine, and I made the requested changes.) As for "having it be an option": if the nozzle blocks, those envelopes are getting dropped one way or another. There's no "not dropping" option; you're only forcing those envelopes to be dropped at the doppler because the nozzle can't deliver them fast enough.
Spent some time looking at this a bit more; this PR is actually not sufficient. Blocking could occur here: splunk-firehose-nozzle/eventrouter/default.go, lines 67 to 68 in ecf9398.
(In this case, I'm going to define blocking as "a step that can take on the order of milliseconds rather than nanoseconds".) I suspect most of the blocking we see is actually downstream blocking and not CAPI blocking. I think the code there should be mostly performant, but that's going to be hard to determine with the code instrumented as it is.
Hi @Benjamintf1, I get your point. I am trying to think of a way to free the main thread to receive the next event from the doppler immediately after receiving the current one. For example, starting a separate goroutine to route the event in this line, and then either limiting the goroutine count (bounded parallelism) or setting a fixed wait time before dropping events. Let me know your thoughts.
No, both of those will still allow backpressure to propagate even below 100% CPU. The only way to prevent this is to either balloon memory (very bad) or drop (less bad).
I can leave more context in the issue (#285); I think we may have alternative or changed PRs.
(As a short note on shortening timeouts: 1,000 logs per second is 1 log per millisecond. If processing is too slow, the buffer will eventually fill up, and even an extremely short 10ms timeout means 10 lost logs; 5 seconds means 5,000, which is half a doppler buffer. Even if we reduce that by a factor of 5, a slow nozzle will have dropped 1,000 logs at the doppler for being unable to read before it reports the one metric it tried to send downstream.)
Actually, based on the concerns I'm hearing about CAPI, I'm going to close this in favor of a solution that looks more like the first PR, or a merging of the two. I can make a new PR if wanted.