Goroutine panic in docker.parseContainerStats #8692
Comments
I think it should be like:
|
Just upgraded to Docker version 20.10.2+dfsg1, same issue. |
@trygvis |
Telegraf is running in a Docker container through Docker Compose:
version: "3"
services:
  telegraf:
    image: telegraf:1.17
    privileged: true
    network_mode: host
    volumes:
      - /etc/XXX/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /sys:/rootfs/sys:ro
      - /proc:/rootfs/proc:ro
      - /etc:/rootfs/etc:ro
    command:
      - sh
      - -c
      - apt update && apt install -y --install-recommends=no smartmontools; exec telegraf
    environment:
      INFLUX_URL: "..."
      INFLUX_SKIP_DATABASE_CREATION: "true"
      HOST_PROC: "/rootfs/proc"
      HOST_SYS: "/rootfs/sys"
      HOST_ETC: "/rootfs/etc"
      HOST_MOUNT_PREFIX: "/rootfs"
Telegraf.conf:
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = "akili"
omit_hostname = false
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.docker]]
[[outputs.influxdb]]
urls = ["$INFLUX_URL"]
skip_database_creation = false |
@Aladex Is there anything else you need from me to debug? |
No, thanks. I will try with your config. |
What about default docker stat?
or just
What is the output of these commands? Try it with sudo too. |
I'm getting the same output with and without sudo; however, my user is in the
|
@Aladex Any progress on this issue? Did my output help or do you need more tests? |
Hi. |
Hi. Can you check this bug on bare metal, not in Docker? Just start Telegraf from the binary on your server or PC with your config. |
I have this problem on bare metal. Do you need any documentation? |
The problem stays the same; it doesn't matter whether Telegraf runs in a container or on bare metal. |
Sorry, but I can't reproduce this bug. @ssoroka, who can help? |
I "solved" this problem by running the container via |
I've got the exact same issue: Telegraf running in Docker, and the same trace whenever the docker plugin is enabled. |
I am also encountering the same issue. |
@Aladex Is it possible to fix the out-of-bounds problem even if you can't reproduce it? It is stopping me from getting any Docker statistics; anything would be better than nothing. |
This problem appeared when I ran a new instance on my PC. The exact same configuration on my other machine is running just fine. It has the same Docker and Telegraf versions and almost the same Telegraf config (except for a few paths and the currently disabled docker input). The only difference I can think of is that the machine which works has been upgraded from older versions (Docker and Telegraf), while this issue manifested after a clean install. I am running Debian Buster on the "good" machine and Debian Sid on the other. Same Docker from their official apt repo ( |
I am also having the same issue. InfluxDB 2.0.3 and Telegraf v 1.17.2 running in docker. |
I'll add on, I am having this issue as well. InfluxDB 1.8.4 and Telegraf 1.17.3 |
Still have same issue with Docker 20.10.4 and Telegraf 1.17.3
|
Same issue on Linux with:
and Telegraf 1.15.2. I changed the versions of Telegraf and Docker many times; it doesn't fix the issue. |
Has anyone had success with a lower version of Docker? |
Same issue. I think that it might be related to running docker with cgroups v2. |
I ran
From good server:
|
It seems I have the same issue with
In case you need more info please let me know. |
It also bombs when run outside of a container!
|
Same issue if I pass to the kernel command line
.... but not passing |
I am not sure what
|
without the change in the kernel command line I get the infamous:
but now it looks like
A preliminary check looks pretty good now. Seems not to crash anymore ;) |
@Aladex have you looked at @RobertBerger's findings about cgroup2? |
Hi all |
This is 100% a bug in Telegraf. Not only are Docker metrics not collected, Telegraf stops working entirely in such cases (nothing is collected). |
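The reason nothing at all is collected is a general property of Go: an unrecovered panic in any goroutine terminates the whole process. The sketch below shows one generic way to contain such a failure by recovering inside the goroutine and turning the panic into an error. It is not how Telegraf is implemented; `safeGather` and the collector name are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// safeGather (hypothetical) runs gather and converts a panic into an
// ordinary error, so a bug in one collector does not end the process.
func safeGather(name string, gather func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("collector %q panicked: %v", name, r)
		}
	}()
	return gather()
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		err := safeGather("docker", func() error {
			var percpu []uint64 // empty, as if no per-CPU data was returned
			n := 4
			_ = percpu[:n] // slicing past the capacity panics at run time
			return nil
		})
		fmt.Println("gather result:", err)
	}()
	wg.Wait()
	fmt.Println("process still running")
}
```

Recovering like this only limits the damage; the out-of-bounds access itself still needs a proper check.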
I tend to completely agree with you!
seems to throw the error. So I guess the only thing that needs to be done is to catch the 0/NULL case in the docker plugin. It is 100% reproducible with Telegraf native or Telegraf in Docker. Even better would be to check for cgroup v1 versus cgroup v2 and read different things accordingly. |
OK. I will reproduce this bug on a VM with Fedora 20 and try to fix it. |
@valodzka |
Bad:
Good:
|
As a workaround, with 1.18 the following config should fix this issue:
|
No. With this config you will switch off every kind of CPU stat. Otherwise you need to list all the other metrics from the Docker socket, except cpu, in this array. And don't forget about the other CPU metrics from this dictionary: |
Hi all |
I'm going to close this issue because the fix was merged. If the issue is still present, please reopen it.
Thanks for the fix. e.g.
|
I'm getting a panic when running Telegraf v1.15 and v1.17:
Stacktrace
Relevant source code:
telegraf/plugins/inputs/docker/docker.go
Lines 673 to 680 in fbd54e8
It seems that a double check of the data actually being present would be useful, even though it might hide other problems. Perhaps a warning is in order.
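A minimal sketch of such a check follows. It assumes the panic comes from slicing the per-CPU usage list by the reported number of online CPUs when the list is shorter or empty; the referenced lines are not reproduced here, so that is an assumption rather than a statement about the actual code. The type `cpuStats` and the function `onlinePercpuUsage` are hypothetical stand-ins, not Telegraf or Docker API names.

```go
package main

import "fmt"

// cpuStats is a hypothetical, trimmed-down stand-in for the CPU section
// of a Docker stats response; it is not the type Telegraf uses.
type cpuStats struct {
	OnlineCPUs  uint32
	PercpuUsage []uint64
}

// onlinePercpuUsage returns the per-CPU counters limited to the online
// CPUs. ok is false when fewer counters are present than CPUs reported
// online, so the caller can log a warning instead of panicking.
func onlinePercpuUsage(s cpuStats) (usage []uint64, ok bool) {
	n := int(s.OnlineCPUs)
	if n == 0 {
		return s.PercpuUsage, true // no online count: use what is there
	}
	if n > len(s.PercpuUsage) {
		return s.PercpuUsage, false // data missing: do not slice past the end
	}
	return s.PercpuUsage[:n], true
}

func main() {
	// Assumed failing shape: CPUs reported online, but no per-CPU counters.
	usage, ok := onlinePercpuUsage(cpuStats{OnlineCPUs: 4})
	if !ok {
		fmt.Println("warning: per-CPU usage missing, skipping per-CPU fields")
	}
	fmt.Println("per-CPU entries used:", len(usage))
}
```

Returning a flag rather than silently truncating keeps the warning visible, which addresses the concern that a bare check might hide other problems.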
Environment
Linux
OS: Debian unstable.
Docker