OpenTSDB output can block Telegraf due to no timeout #3010

Closed
wei-hai opened this issue Jul 12, 2017 · 3 comments
Labels
area/opentsdb bug unexpected problem or unintended behavior

Comments

@wei-hai

wei-hai commented Jul 12, 2017

Let's say I have two output plugins, influxdb and opentsdb. When I take down opentsdb, Telegraf is unable to send metrics to either output, and it reports a connection failure. I suggest that if one of the outputs is unavailable, Telegraf should keep sending metrics to the available ones; for the unavailable output, Telegraf could either drop the data or keep it in a buffer (in memory or on disk) with a size limit.

@danielnelson
Contributor

This is a mostly known issue; there is a bit more discussion in #2919. However, as long as an output does not block for too long, the other outputs should continue working, though a slow output can still cause other performance issues. One of the challenges in fixing this is that, under normal use, this blocking behavior is what throttles service inputs that read from queuing systems.

The best thing that can be done is to ensure that timeouts are configured on all of your outputs and that they are not too large. This way an output cannot block the main process for too long. However, looking over the OpenTSDB output, I don't see any timeout configuration, so if it blocked, it would totally halt Telegraf.

I think we should change this ticket to be "OpenTSDB output can block Telegraf", does that sound alright?

@wei-hai
Author

wei-hai commented Jul 12, 2017

@danielnelson Agreed, please feel free to update it.

@danielnelson danielnelson changed the title from "[Bug?] Telegraf doesn't work when one of multiple outputs is not available" to "OpenTSDB output can block Telegraf due to no timeout" Jul 12, 2017
@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Aug 23, 2017
@reimda
Contributor

reimda commented Jun 13, 2022

This is currently expected behavior. When a metric is ready to be sent to outputs, it's added to the outputs one by one. If one output takes a long time to send, buffers can fill up and metrics can be dropped.

```go
for metric := range unit.src {
```

To change this, Telegraf would need to have a separate buffer per output plugin. We currently don't have plans to make that change.
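The idea of a separate buffer per output plugin can be sketched in Go. This is a simplified illustration under stated assumptions, not Telegraf's agent code: `Metric`, `output`, `fanOut`, and `simulate` are hypothetical names, and the writes are simulated. Each output drains its own bounded channel in its own goroutine, and the fan-out loop uses a non-blocking send, so a stalled output drops its own metrics while the loop keeps feeding the others.

```go
package main

import (
	"fmt"
	"sync"
)

// Metric stands in for a Telegraf metric (hypothetical type).
type Metric string

// output owns a bounded buffer drained by its own goroutine (hypothetical).
type output struct {
	name    string
	buf     chan Metric
	write   func(Metric) // simulated network write; may block
	sent    int          // touched only by this output's goroutine
	dropped int          // touched only by the fan-out loop
}

// start launches the goroutine that drains this output's buffer.
func (o *output) start(wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for m := range o.buf {
			o.write(m)
			o.sent++
		}
	}()
}

// fanOut copies each metric from src into every output's buffer. The send is
// non-blocking, so a full buffer drops the metric for that output only and the
// loop keeps feeding the others.
func fanOut(src <-chan Metric, outputs []*output) {
	for m := range src {
		for _, o := range outputs {
			select {
			case o.buf <- m:
			default:
				o.dropped++
			}
		}
	}
	for _, o := range outputs {
		close(o.buf)
	}
}

// simulate pushes n metrics through a healthy output and a stalled one whose
// writes block until release is closed, then returns both outputs.
func simulate(n int) (healthy, stalled *output) {
	release := make(chan struct{})
	healthy = &output{name: "influxdb", buf: make(chan Metric, n), write: func(Metric) {}}
	stalled = &output{name: "opentsdb", buf: make(chan Metric, 2), write: func(Metric) { <-release }}

	var wg sync.WaitGroup
	healthy.start(&wg)
	stalled.start(&wg)

	src := make(chan Metric)
	go func() {
		for i := 0; i < n; i++ {
			src <- Metric(fmt.Sprintf("cpu,host=a value=%d", i))
		}
		close(src)
	}()
	fanOut(src, []*output{healthy, stalled})

	close(release) // let the stalled output drain whatever it buffered
	wg.Wait()
	return healthy, stalled
}

func main() {
	h, s := simulate(10)
	fmt.Printf("%s: sent=%d dropped=%d\n", h.name, h.sent, h.dropped)
	fmt.Printf("%s: sent=%d dropped=%d\n", s.name, s.sent, s.dropped)
}
```

In this run the healthy output delivers all ten metrics, while the stalled one keeps only what fits in its two-slot buffer plus at most the one it is stuck writing, and drops the rest. Dropping on a full buffer is one possible policy; it gives up the blocking behavior that, as noted earlier in the thread, currently throttles service inputs reading from queuing systems.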

@reimda reimda closed this as completed Jun 13, 2022