-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add internal event queuing and flushing #227
Conversation
a9ff5d3
to
d16e73e
Compare
interesting, this somewhat relates to the discussion we had around measuring UDP packet drops here #196 (comment) what would happen with the different listeners if the queue overflows? |
I skimmed through that a while back, and I like the idea in general, but I haven't actually read the PR. If nothing else, being able to have metrics for dropped packets I think has value. Another approach to getting those metrics might be to switch the front end listeners to do non-blocking sends on the event queue. That would allow dropping events when the queue is full and there could be a metric for that. That might be a simplest change than adding another stage to the processing pipeline. I think the behavior regarding queue overflows stays mostly the same. The event queue that the front end listeners push events to now has a bigger buffer, and there is the small internal buffer, but eventually both of those will fill and it will block on trying to send on that channel. |
That is my preferred approach as well, because it's platform independent. Does it make sense to add it here? |
At high traffic levels, the locking around sending on channels can cause a large amount of blocking and CPU usage. These adds an event queue mechanism so that events are queued for short period of time, and flushed in batches to the main exporter goroutine periodically. The default is is to flush every 1000 events, or every 200ms, whichever happens first. Signed-off-by: Clayton O'Neill <[email protected]>
If you're ok with it, I'd prefer to address that in a future PR. I'm not sure if I'll have the time to incorporate that into this PR in the short term. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, my only concern is about visibility into what's happening with the queue. With only the flush counter, you can kinda sorta infer when it ran full because the rate goes over the baseline. I think it would be useful to get an earlier warning, and to be able to see when the queue is always nearly full, or there are short spikes that are over the batch size and could be handled in one batch with a slightly bigger queue size.
Co-Authored-By: Matthias Rampke <[email protected]> Signed-off-by: Clayton O'Neill <[email protected]>
@matthiasr I think that makes sense. I'd also like to get that metric in place also, but I'm tied up with another project at the moment. Your comment about the metric naming makes sense and I've updated the PR to incorporate that. |
and add the missing changelog entry for #227 Signed-off-by: Matthias Rampke <[email protected]>
At high traffic levels, the locking around sending on channels can cause a large amount of blocking and CPU usage. This adds an event queue mechanism so that events are queued for short period of time, and flushed in batches to the main exporter goroutine periodically.
The default is is to flush every 1000 events, or every 200ms, whichever happens first.