outputs.influxdb not buffering points on telegraf 1.19.1 #9514
Comments
Maybe related to #9296.
I ran your exact config with nothing running on localhost:8086 and got a connection refused error.
When I bring up localhost:8086, I get metrics_written=0i every time. Here is my output:
Can you clarify what you mean by "No processes are listening on localhost:8086"?
The idea of this configuration is to test the buffering mechanism of outputs.influxdb in Telegraf 1.19.1, compared with 1.18.3 and with outputs.http. For the test, nothing is listening on localhost:8086 (InfluxDB) or on 127.0.0.1:8080 (the HTTP endpoint); only Telegraf and the OS run on the test machine. Because many metrics are collected, they should accumulate in the buffer, and no metrics should be reported as written.

Telegraf 1.18.3 does exactly this. Telegraf 1.19.1, judging by its internal_write metrics, does not seem to buffer metrics correctly for outputs.influxdb.

If internal_write is reporting real values, Telegraf 1.19.1 agents that lose connectivity with InfluxDB for any reason will not buffer metrics properly, and if the connection takes a long time to recover, most metrics will be lost. Because of this behavior, buffer_size never reaches metric_buffer_limit, no matter how long the connection is down; it does not even reach metric_batch_size.
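A toy Python model can make the expected buffering behavior concrete. It is an illustration under stated assumptions, not Telegraf's implementation: `OutputBuffer`, `simulate` and the `ack_failed_batches` switch are hypothetical names; the batch size of 1000 is an assumed metric_batch_size (consistent with errors=29 and metrics_written=29000 in the 1.19.1 log); and the limit of 100000 matches buffer_limit in the logs. In the expected model a failed write leaves its batch in the buffer for a retry. The faulty variant removes the batch as if it had been written. That reproduces the reported symptom, but it does not claim to be the actual cause.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OutputBuffer:
    """Toy model of one output's metric buffer; not Telegraf's actual Go code."""

    buffer_limit: int                  # stands in for metric_buffer_limit
    batch_size: int                    # stands in for metric_batch_size (assumed 1000)
    ack_failed_batches: bool = False   # hypothetical switch reproducing the 1.19.1 symptom
    buf: deque = field(default_factory=deque)
    added: int = 0
    written: int = 0
    dropped: int = 0
    errors: int = 0

    def add(self, metric) -> None:
        self.added += 1
        if len(self.buf) >= self.buffer_limit:
            self.buf.popleft()  # buffer full: drop the oldest metric
            self.dropped += 1
        self.buf.append(metric)

    def flush(self, endpoint_up: bool) -> None:
        n = min(self.batch_size, len(self.buf))
        if n == 0:
            return
        if not endpoint_up:
            self.errors += 1
            if not self.ack_failed_batches:
                return  # expected: the failed batch stays buffered for a retry
        # Successful write (or, in the faulty model, a failed one treated as success).
        for _ in range(n):
            self.buf.popleft()
        self.written += n

    @property
    def buffer_size(self) -> int:
        return len(self.buf)


def simulate(ack_failed_batches: bool) -> OutputBuffer:
    out = OutputBuffer(buffer_limit=100_000, batch_size=1_000,
                       ack_failed_batches=ack_failed_batches)
    for _ in range(29):            # 29 flush intervals with the endpoint down
        for i in range(1_023):     # arbitrary number of metrics gathered per interval
            out.add(i)
        out.flush(endpoint_up=False)
    return out


expected = simulate(ack_failed_batches=False)
faulty = simulate(ack_failed_batches=True)
print("expected:", expected.buffer_size, "buffered,", expected.written, "written")
print("faulty:  ", faulty.buffer_size, "buffered,", faulty.written, "written")
```

With these made-up inputs, the expected model ends with 29667 metrics buffered and 0 written. The faulty model ends with 667 buffered and 29000 written across 29 errors. That is the same shape as the 1.19.1 influxdb line: a buffer that never grows past one batch.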
Please try PR #9526 and see if it fixes your problem.
The PR #9526 artifacts behave as expected.
With Telegraf 1.19.1, internal_write reports buffer_size values for outputs.influxdb that are smaller than metric_batch_size, even when the InfluxDB instance is down and more points than that have been generated. The configuration used also includes an http output for comparison, which buffers points correctly (as in previous versions).
Relevant telegraf.conf:
System info:
Linux on AMD64 with telegraf 1.19.1 (https://dl.influxdata.com/telegraf/releases/telegraf-1.19.1_linux_amd64.tar.gz) and 1.18.3 (https://dl.influxdata.com/telegraf/releases/telegraf-1.18.3_linux_amd64.tar.gz)
No processes are listening on localhost:8086 or 127.0.0.1:8080.
Steps to reproduce:
$ tail telegraf-1.19.1.out | grep write | tail -3
internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.19.1 metrics_written=29000i,metrics_dropped=0i,buffer_size=682i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=1823419i,errors=29i,metrics_added=29682i 1626627449853000000
internal_write,host=xxxxx,output=http,region=eu-west-1,version=1.19.1 metrics_dropped=0i,buffer_size=29682i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=11507427i,errors=0i,metrics_added=29682i,metrics_written=0i 1626627449853000000
internal_write,host=xxxxx,output=file,region=eu-west-1,version=1.19.1 metrics_dropped=0i,buffer_size=682i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=17995062i,errors=0i,metrics_added=29682i,metrics_written=29000i 1626627449853000000
$ tail telegraf-1.18.3.out | grep write | tail -3
internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.18.3 metrics_dropped=0i,buffer_size=29718i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=3341971i,errors=29i,metrics_added=29718i,metrics_written=0i 1626627746465000000
internal_write,host=xxxxx,output=http,region=eu-west-1,version=1.18.3 buffer_limit=100000i,metrics_filtered=0i,write_time_ns=13574467i,errors=0i,metrics_added=29718i,metrics_written=0i,metrics_dropped=0i,buffer_size=29718i 1626627746465000000
internal_write,host=xxxxx,output=file,region=eu-west-1,version=1.18.3 metrics_added=29718i,metrics_written=29000i,metrics_dropped=0i,buffer_size=718i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=20608019i,errors=0i 1626627746465000000
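Before comparing versions, it helps to check that each output's counters are internally consistent. Assuming every added metric is either written, dropped or still buffered, the identity metrics_added = metrics_written + metrics_dropped + buffer_size should hold. The following minimal Python sketch checks that identity for the two influxdb lines above. `parse_line` and `unaccounted` are hypothetical helpers, not part of Telegraf. The parser only handles the simple form of these samples (no escaped spaces or commas, integer `...i` fields only), not InfluxDB line protocol in general.

```python
def parse_line(line: str) -> dict:
    """Parse one simple internal_write line: measurement,tags fields timestamp."""
    series, field_set, timestamp = line.split(" ")
    measurement, *tag_pairs = series.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_set.split(","):
        key, value = pair.split("=", 1)
        fields[key] = int(value.rstrip("i"))  # integer fields carry an "i" suffix
    return {"measurement": measurement, "tags": tags,
            "fields": fields, "timestamp": int(timestamp)}


def unaccounted(fields: dict) -> int:
    # Metrics that were added but are neither written, dropped, nor buffered.
    return (fields["metrics_added"] - fields["metrics_written"]
            - fields["metrics_dropped"] - fields["buffer_size"])


# The two outputs=influxdb lines from the report, copied verbatim.
LINES = [
    "internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.19.1 metrics_written=29000i,metrics_dropped=0i,buffer_size=682i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=1823419i,errors=29i,metrics_added=29682i 1626627449853000000",
    "internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.18.3 metrics_dropped=0i,buffer_size=29718i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=3341971i,errors=29i,metrics_added=29718i,metrics_written=0i 1626627746465000000",
]

for line in LINES:
    rec = parse_line(line)
    f = rec["fields"]
    print(rec["tags"]["output"], rec["tags"]["version"],
          "written =", f["metrics_written"], "buffered =", f["buffer_size"],
          "unaccounted =", unaccounted(f))
```

The identity holds for both versions (nothing is unaccounted). For 1.19.1 this suggests the 29000 metrics actually left the buffer as if they had been delivered, rather than only the metrics_written counter being wrong.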
Expected behavior:
internal_write should report similar numbers for output=influxdb and output=http, in particular for metrics_written and buffer_size, as it does with Telegraf 1.18.3. Gathered points should be buffered for both unavailable outputs, but with 1.19.1 only the http output buffers them.
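The expected behavior can be written as a simple rule. For an output whose endpoint is down, metrics_written should stay at 0, and buffer_size should equal everything gathered, capped at buffer_limit. The following Python sketch applies that rule to the field values reported in this issue. `SAMPLES` and `buffers_correctly` are illustrative names, and the rule itself is an assumption drawn from the expected behavior described here, not a Telegraf API.

```python
# Field values copied from the internal_write lines in this report.
SAMPLES = {
    ("influxdb", "1.19.1"): {"metrics_added": 29682, "metrics_written": 29000,
                             "metrics_dropped": 0, "buffer_size": 682, "buffer_limit": 100000},
    ("http", "1.19.1"): {"metrics_added": 29682, "metrics_written": 0,
                         "metrics_dropped": 0, "buffer_size": 29682, "buffer_limit": 100000},
    ("influxdb", "1.18.3"): {"metrics_added": 29718, "metrics_written": 0,
                             "metrics_dropped": 0, "buffer_size": 29718, "buffer_limit": 100000},
    ("http", "1.18.3"): {"metrics_added": 29718, "metrics_written": 0,
                         "metrics_dropped": 0, "buffer_size": 29718, "buffer_limit": 100000},
}


def buffers_correctly(f: dict) -> bool:
    """Expected behavior for an unreachable output: nothing written and
    every gathered metric still buffered, up to the buffer limit."""
    expected_buffer = min(f["metrics_added"] - f["metrics_dropped"], f["buffer_limit"])
    return f["metrics_written"] == 0 and f["buffer_size"] == expected_buffer


for (output, version), fields in SAMPLES.items():
    status = "ok" if buffers_correctly(fields) else "NOT BUFFERING"
    print(f"{output:<8} {version}: {status}")
```

Only the 1.19.1 influxdb output fails the check. That matches the comparison in this report.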
Actual behavior:
With the 1.19.1 binary, internal_write reports a high metrics_written count for the influxdb output even though InfluxDB is not running, and a low buffer_size, as if the gathered points were not being buffered.
internal_write,host=xxx,output=influxdb,...,version=1.19.1 metrics_written=29000i,...,buffer_size=682i
By contrast, the 1.18.3 binary works as expected.
internal_write,host=xxxxx,output=influxdb,...,version=1.18.3 ...buffer_size=29718i,...,metrics_written=0i
The http output works as expected with both versions:
internal_write,host=xxxxx,output=http,...,version=1.19.1 ...buffer_size=29682i,...,metrics_written=0i
internal_write,host=xxxxx,output=http,...,version=1.18.3 ...,metrics_written=0i,...,buffer_size=29718i