Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Wavefront internal sender's buffer size configurable #8002

Closed

Conversation

hanwavefront
Copy link

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Updating Wavefront Output Plugin with buffer size configuration option (#7409) and related config options.

@sjwang90 sjwang90 added area/wavefront feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin labels Aug 25, 2020
@sjwang90
Copy link
Contributor

Thanks for the PR! We'll take a look and let you know if we have any comments.

@prydin If you have any feedback that'd be great.

@prydin
Copy link
Contributor

prydin commented Aug 25, 2020

@prydin If you have any feedback that'd be great.

Looks good to me.

@sjwang90 sjwang90 added this to the 1.15.3 milestone Aug 25, 2020
Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess at this point I don't understand why we would want to define these on the plugin. These are settings that are globally available already, and it doesn't appear that the plugin needs to access these.

#batch_size = 10000

## Interval (in seconds) at which to flush data. Defaults to 5 seconds.
#flush_interval_seconds = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flush_interval, metric_buffer_limit and metric_batch_size are set automatically and override-able on outputs. Typically we don't add these to the output config readme, as they can be overridden on any output, and adding them to the config is going to conflict with those values.

@ssoroka
Copy link
Contributor

ssoroka commented Sep 8, 2020

Ok, I see the issue. To clarify, the problem is that wavefront-sdk-go's BatchSize, MaxBufferSize, (though not necessarily FlushIntervalSeconds) conflict with our naming of BatchSize, MaxBufferSize, and FlushInterval, and setting these on the output is going to block the ability to override these settings for this output, so we probably don't want to do that.

Possible options:
A) Name them WavefrontBatchSize, WavefrontMaxBufferSize, and WavefrontFlushInterval (the seconds part is kind of pointless if you have duration types).
B) Don't expose them and set these values to whatever the influx values are set to. This might be difficult because I don't think they're directly exposed to the plugin.

What's the problem we're trying to solve here? I don't see an open issue.

@hanwavefront
Copy link
Author

hanwavefront commented Sep 9, 2020

@ssoroka Thanks for the review. The corresponding issue is #7409 -- wavefront-sdk-go's config params (and specifically the buffer limit) need to be customizable to align with the influx values and prevent points from being dropped. I'm hesitant about option B because I can envision a scenario where we would want the values to be different than the influx values (e.g., internal metrics in wavefront-sdk-go). But I'm not clear on where the name conflict is and why option A is necessary? The names are different from the influx names (MetricBatchSize, MetricBufferLimit, and FlushInterval), and they would only be configured on [[outputs.wavefront]]. Let me know if I'm missing something.

@sjwang90 sjwang90 modified the milestones: 1.15.3, 1.15.4 Sep 14, 2020
@hanwavefront
Copy link
Author

@ssoroka Please take a look at my previous comment if you get a chance. These new config params pertain only to the Wavefront output plugin's behavior and I don't believe they conflict with their influx counterparts, but let me know if I'm mistaken.

@ssoroka
Copy link
Contributor

ssoroka commented Sep 21, 2020

After reading the issue, I'm concerned that this change won't resolve the problem. Wavefront should not have its own internal buffer. You end up with two buffers, and it doesn't matter what the size is, it's going to eventually drop metrics. Wavefront should instead be blocking and actually writing the metrics out when passed batches of 1000 (agent.metric_batch_size) metrics, because Telegraf is managing the buffer, and keeping track of retries on failures.

Said another way, because Wavefront is not blocking and applying backpressure, it's not communicating with Telegraf that the buffer is full and to slow down. Due to this lack of communication, Telegraf happily sends all the metrics in, resulting in overflowing the wavefront buffer.

I'd recommend removing the wavefront buffer and sending the metrics immediately when passed a batch, and blocking until it succeeds or fails. If for some reason you really must have double buffers (I still think this is a mistake), then wavefront needs to block when the buffer is full, until there is room in the buffer to accept the new batch, not return an error and drop messages.

cc @prydin

@ssoroka
Copy link
Contributor

ssoroka commented Nov 3, 2020

closing this in favor of #8165

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/wavefront feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants