splunk nozzle now doesn't block, won't disconnect for being slow #284
Conversation
(I may be able to clean up some of the code with respect to the eventrouter in testing.)
(I think you'll generally be safer the closer the dropping buffer is to the read call, with as little code in between as possible; but IIRC it looked fine, and I made the requested changes.) As for "having it be an option": if the nozzle blocks, those envelopes are getting dropped one way or another. There's no "not dropping" option; you're only forcing those envelopes to be dropped at the doppler because the nozzle can't deliver them fast enough.
Spent some time looking at this a bit more; this PR is actually not sufficient. Blocking could occur here: splunk-firehose-nozzle/eventrouter/default.go, lines 67 to 68 in ecf9398.
(In this case, I'm going to define blocking as "a step that can take on the order of milliseconds rather than nanoseconds".) I suspect most of the blocking we see is actually downstream blocking and not CAPI blocking. I think the code there should be mostly performant, but that's going to be hard to determine with the code instrumented as it is.
Hi @Benjamintf1, I get your point. I am trying to think of a way to free the main thread to receive the next event from the doppler immediately after receiving the current one. For example, starting a separate goroutine to route the event in this line, and then either limiting the goroutine count (bounded parallelism) or setting a fixed wait time before dropping events. Let me know your thoughts.
No, both of those will still allow backpressure to propagate even below 100% CPU. The only way to prevent this is to either balloon memory (very bad) or drop (less bad).
I can leave more context in the issue (#285); I think we may have alternative or changed PRs.
(As a short note on shortening timeouts: 1,000 logs per second is 1 log per millisecond. If processing is too slow, the buffer will eventually fill up, and even an extremely short 10ms timeout means 10 lost logs; 5 seconds means 5,000, which is half a doppler buffer. Even if we reduce that by a factor of 5, a slow nozzle will have dropped 1,000 logs at the doppler for being unable to read before it reports the one metric it tried to send downstream.)
Actually, based on the concerns I'm hearing about CAPI, I'm going to close this in favor of a solution that looks more like the first PR, or a merging of the two. I can make a new PR if wanted.