
Memory Leak in 1.9.0 #5035

Closed
benschweizer opened this issue Nov 26, 2018 · 3 comments · Fixed by #5052

benschweizer commented Nov 26, 2018

Hi there,

looks like telegraf 1.9.0 introduced a memory leak; I see steady growth here:

[chart: telegraf memory usage rising steadily over time]

Relevant telegraf.conf:

[global_tags]
  foo  = "bar"
[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 100000
  collection_jitter = "0s"
  flush_interval = "60s"
  flush_jitter = "5s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = "server.example.com"
  omit_hostname = false
[[outputs.influxdb]]
  urls = ['https://influxdb.example.com:8086']
  database = "telegraf"
  username = "telegraf"
  password = "s3cr3t"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.kernel_vmstat]]
[[inputs.net]]
[[inputs.netstat]]

System info:

Telegraf 1.9.0 release on Debian 9 AMD64

Steps to reproduce:

  1. start telegraf (with a mostly default config)
  2. watch memory consumption over time (see the sketch below)
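
(Sketch for step 2, assuming the process is named telegraf and ps is available; the log path is arbitrary:)

# sample telegraf's resident set size (KB) once a minute
while true; do
  echo "$(date +%FT%T) $(ps -o rss= -C telegraf)" >> /tmp/telegraf_rss.log
  sleep 60
done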

Expected behavior:

memory consumption should be steady

Actual behavior:

memory consumption grows

Additional info:

Growth rate is ~1.5 MB/hour

@glinton glinton added the bug unexpected problem or unintended behavior label Nov 26, 2018
@danielnelson danielnelson self-assigned this Nov 27, 2018
danielnelson (Contributor) commented:

Thanks for the report. Could you try enabling the internal input and checking the values of the internal_write,plugin=influxdb series, in particular these fields (see the sketch below the list):

  • metrics_added
  • metrics_written
  • metrics_dropped
  • metrics_filtered
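
A possible way to do that, as a sketch (assumes the Debian config path, the influx CLI, and the hostname/credentials from the config above):

# enable the internal plugin, reload, then query the write stats for the influxdb output
echo '[[inputs.internal]]' | sudo tee -a /etc/telegraf/telegraf.conf
sudo systemctl reload telegraf
influx -host influxdb.example.com -port 8086 -ssl \
  -username telegraf -password s3cr3t -database telegraf \
  -execute "SELECT metrics_added, metrics_written, metrics_dropped, metrics_filtered \
            FROM internal_write WHERE plugin = 'influxdb' AND time > now() - 1h"

metrics_added growing much faster than metrics_written (with few dropped or filtered) would suggest metrics accumulating in the output buffer.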


bolek2000 commented Nov 28, 2018

I also see this problem here. I just installed a new VM with Ubuntu 18.04 and telegraf 1.9, and after the initial startup of the service the memory usage rises quickly with just a couple of basic system metrics enabled. After running systemctl reload telegraf it drops back down, but starts rising again.
On another machine (Ubuntu 14.04), where I upgraded from 1.8.3 to 1.9, I don't see this behavior, but there I do periodic reloads of the agent because of other issues. I also disabled the automatic reload there, and the memory consumption stays stable.
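
A minimal sketch of that reload-and-check cycle (assuming the packaged systemd unit, which as far as I know reloads via SIGHUP):

sudo systemctl reload telegraf            # config is re-read, RSS drops
ps -o pid,rss,etime,args -C telegraf      # then RSS starts climbing again over time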

[charts: telegraf 1.9 memory usage on Ubuntu 18.04 vs. Ubuntu 14.04]

Ubuntu 18.04 host (some lines are missing here compared to 14.04):

2018-11-28T09:00:52Z I! Reloading Telegraf config
2018-11-28T09:00:52Z D! [agent] Stopping service inputs
2018-11-28T09:00:52Z D! [agent] Input channel closed
2018-11-28T09:00:52Z I! [agent] Hang on, flushing any cached metrics before shutdown
2018-11-28T09:00:52Z D! [outputs.influxdb] wrote batch of 33 metrics in 19.078282ms
2018-11-28T09:00:52Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics. 
2018-11-28T09:00:52Z D! [agent] Closing outputs
2018-11-28T09:00:52Z I! Loaded inputs: inputs.internal inputs.disk inputs.mem inputs.processes inputs.swap inputs.system inputs.diskio inputs.kernel inputs.linux_sysctl_fs inputs.netstat inputs.cpu inputs.kernel_vmstat inputs.net inputs.nginx inputs.procstat
2018-11-28T09:00:52Z I! Loaded aggregators: 
2018-11-28T09:00:52Z I! Loaded processors: 
2018-11-28T09:00:52Z I! Loaded outputs: influxdb

Ubuntu 14.04 host:

2018-11-27T06:50:02Z I! Reloading Telegraf config
2018-11-27T06:50:02Z D! [agent] Stopping service inputs
2018-11-27T06:50:02Z D! [inputs.tail] tail removed for file: /var/log/nginx/wp_access.log.sorted.10min
2018-11-27T06:50:02Z D! [agent] Input channel closed
2018-11-27T06:50:02Z D! [agent] Processor channel closed
2018-11-27T06:50:02Z I! [agent] Hang on, flushing any cached metrics before shutdown
2018-11-27T06:50:02Z D! [outputs.influxdb] wrote batch of 591 metrics in 41.72005ms
2018-11-27T06:50:02Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics. 
2018-11-27T06:50:02Z D! [agent] Closing outputs
2018-11-27T06:50:02Z I! Starting Telegraf 1.9.0
2018-11-27T06:50:02Z I! Loaded inputs: inputs.swap inputs.system inputs.procstat inputs.disk inputs.diskio inputs.kernel inputs.internal inputs.net inputs.netstat inputs.nginx inputs.cpu inputs.mem inputs.processes inputs.memcached inputs.mysql inputs.phpfpm inputs.tail
2018-11-27T06:50:02Z I! Loaded aggregators: 
2018-11-27T06:50:02Z I! Loaded processors: regex regex regex
2018-11-27T06:50:02Z I! Loaded outputs: influxdb

benschweizer (Contributor, Author) commented:

Hi @danielnelson

I've created a default config (--sample-config); the issue is present with the defaults.
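
(For reference, a sketch of generating such a default config with the --sample-config flag; the file name is arbitrary:)

telegraf --sample-config > telegraf.conf   # dump the defaults, then fill in the [[outputs.influxdb]] section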
Here's a chart of the requested metrics:
[chart: internal_write fields (metrics_added, metrics_written, metrics_dropped, metrics_filtered)]

Here's an update on the memory consumption (time range 14:12-14:50, growth +3 MB):
[chart: telegraf memory consumption, 14:12-14:50]

Here's the config used (comments stripped):

[global_tags]
  foo = "bar"
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = "server.example.com"
  omit_hostname = false
[[outputs.influxdb]]
  urls = ['https://influxdb.example.com:8086']
  database = "telegraf"
  username = "telegraf"
  password = "s3cr3t"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.internal]]
