Telegraf runs /sbin/init --version unconditionally #6846

Closed
metalgrid opened this issue Jan 2, 2020 · 5 comments · Fixed by #6849
Labels
bug unexpected problem or unintended behavior

@metalgrid

System info:

Linux 3.18 buildroot with sninit (https://github.com/arsv/sninit/)
Telegraf version 1.13.0

Steps to reproduce:

  1. run telegraf --help or telegraf --version

Expected behavior:

Telegraf help or telegraf version is printed to stdout

Actual behavior:

Telegraf executes /sbin/init --version. sninit has no understanding of --version and begins executing system initialization.

Additional info:

The configuration file and command-line options passed to Telegraf are parsed only after the /sbin/init --version invocation.

@danielnelson
Contributor

I don't understand how this could be correct, why would Telegraf execute /sbin/init?

@metalgrid
Author

I'm not sure why Telegraf would call that either, but the fact is that it happens. The moment I kill that other init process, Telegraf resumes operation properly. I patched the init process to return an error when it is executed as anything other than PID 1, and that helps: Telegraf runs correctly now, but it's odd to have to do that in the first place. Below is some more information regarding the case.

The installed version is currently Telegraf 1.12.6 (git: HEAD 6c7f2d62), but this reproduces with the latest 1.13.0 as well.

[root][localhost][~]# sha256sum /usr/bin/telegraf 
2182f9734e7c536bf594b5a6253cde9a8f9c2e115806bdf1e025e417108ac1eb  /usr/bin/telegraf

strace-ing the main PID we can see it makes some checks for various binaries:

[root][localhost][~]# strace -e newfstatat telegraf
newfstatat(AT_FDCWD, "/usr/local/sbin/chronyc", 0xc00019a6b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/chronyc", 0xc00019a788, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/chronyc", 0xc00019a858, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/chronyc", {st_mode=S_IFREG|0755, st_size=80328, ...}, 0) = 0
newfstatat(AT_FDCWD, "/usr/local/sbin/getconf", 0xc0003aa378, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/getconf", 0xc0003aa448, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/getconf", 0xc0003aa518, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/getconf", 0xc0003aa5e8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/getconf", 0xc0003aa6b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/getconf", 0xc0003aa788, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/sbin/fail2ban-client", 0xc0003ab218, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/fail2ban-client", 0xc00019a6b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/fail2ban-client", 0xc00019a788, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/fail2ban-client", 0xc00019a858, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/fail2ban-client", 0xc00019a928, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/fail2ban-client", 0xc00019a9f8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/sbin/ipmitool", 0xc00019aac8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/ipmitool", 0xc00019ac68, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/ipmitool", 0xc00019ad38, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/ipmitool", {st_mode=S_IFREG|0755, st_size=825416, ...}, 0) = 0
newfstatat(AT_FDCWD, "/usr/local/sbin/sensors", 0xc0000bfa38, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/sensors", 0xc0000bfb08, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/sensors", 0xc0000bfbd8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/sensors", 0xc0000bfca8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/sensors", 0xc0000bfd78, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/sensors", 0xc0000bfe48, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/sbin/sadf", 0xc0000bff18, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/sadf", 0xc0004c0038, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/sadf", 0xc0000bfa38, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/sadf", {st_mode=S_IFREG|0755, st_size=289672, ...}, 0) = 0
newfstatat(AT_FDCWD, "/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/systemd/system", 0xc00019a9f8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/upstart-udev-bridge", 0xc00019aac8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/init", {st_mode=S_IFREG|0755, st_size=13160, ...}, 0) = 0
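
The last three checks in the trace above (/run/systemd/system, then /sbin/upstart-udev-bridge, then /sbin/init) look like an init-system probe. Below is a minimal Go sketch of that order, not the actual library code. The names detectInit, statFunc and fakeStat are hypothetical, and what each hit is taken to mean is an assumption; only the sequence of paths comes from the trace.

```go
package main

import (
	"fmt"
	"os"
)

// statFunc has the same shape as os.Stat, so the probe order can be
// exercised against a fake filesystem.
type statFunc func(name string) (os.FileInfo, error)

// detectInit is a hypothetical helper that mirrors the order of checks
// seen in the strace output. It only reports which step would be reached;
// it never executes anything.
func detectInit(stat statFunc) string {
	if _, err := stat("/run/systemd/system"); err == nil {
		return "systemd"
	}
	if _, err := stat("/sbin/upstart-udev-bridge"); err == nil {
		return "upstart"
	}
	if _, err := stat("/sbin/init"); err == nil {
		// Assumption: this is the step at which the binary gets executed.
		return "exec /sbin/init --version"
	}
	return "unknown"
}

// fakeStat pretends that only the listed paths exist.
func fakeStat(present ...string) statFunc {
	return func(name string) (os.FileInfo, error) {
		for _, p := range present {
			if p == name {
				return nil, nil
			}
		}
		return nil, os.ErrNotExist
	}
}

func main() {
	// A system like the buildroot one: no systemd, no upstart, but /sbin/init exists.
	fmt.Println(detectInit(fakeStat("/sbin/init")))
	// A systemd system: the probe stops at the first check.
	fmt.Println(detectInit(fakeStat("/run/systemd/system", "/sbin/init")))
	// The real filesystem of the machine running this sketch.
	fmt.Println(detectInit(os.Stat))
}
```

With only /sbin/init present the sketch falls through to the last branch, which matches what happens on the sninit system.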

And here's the execve that runs /sbin/init:

[root][localhost][~]# strace -e execve -f telegraf --help
execve("/usr/bin/telegraf", ["telegraf", "--help"], 0x7ffe1c69b380 /* 14 vars */) = 0
strace: Process 3220 attached
strace: Process 3221 attached
strace: Process 3222 attached
strace: Process 3223 attached
strace: Process 3224 attached
strace: Process 3225 attached
strace: Process 3226 attached
strace: Process 3227 attached
strace: Process 3228 attached
[pid  3228] execve("/sbin/init", ["/sbin/init", "--version"], 0xc00012a000 /* 14 vars */) = 0
<snip>

Here's an htop snapshot showing the process tree. As you can see, there's barely anything running on this system.
[Image: htop snapshot of the process tree]

@metalgrid
Author

While I was at it, I installed Telegraf on my workstation (Telegraf 1.13.0, git: HEAD 773e4ca, from the apt repo on Debian 10) and ran strace on it. If it finds /run/systemd/system/, it runs fine:

root@workstation:~# strace -e execve -f telegraf --help
execve("/usr/bin/telegraf", ["telegraf", "--help"], 0x7ffc82784040 /* 18 vars */) = 0
strace: Process 98563 attached
strace: Process 98564 attached
strace: Process 98565 attached
strace: Process 98566 attached
strace: Process 98567 attached
strace: Process 98568 attached
strace: Process 98569 attached
strace: Process 98570 attached
strace: Process 98571 attached
[pid 98571] execve("/usr/bin/getconf", ["/usr/bin/getconf", "CLK_TCK"], 0xc0002cee60 /* 18 vars */) = 0
[pid 98571] +++ exited with 0 +++
[pid 98562] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=98571, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 98572 attached
strace: Process 98573 attached
strace: Process 98574 attached
strace: Process 98575 attached
strace: Process 98576 attached
strace: Process 98577 attached
strace: Process 98578 attached
Telegraf, The plugin-driven server agent for collecting and reporting metrics.

Usage:
<snip>

But if we remove the /run/systemd/system directory we see the behaviour from earlier:

root@workstation:~# mv /run/systemd/system{,.old}
root@workstation:~# strace -e newfstatat telegraf --help
newfstatat(AT_FDCWD, "/run/systemd/system", 0xc000134858, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/upstart-udev-bridge", 0xc000134928, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/init", {st_mode=S_IFREG|0755, st_size=1489208, ...}, 0) = 0

And execves:

root@workstation:~# strace -e execve -f telegraf --help
execve("/usr/bin/telegraf", ["telegraf", "--help"], 0x7ffc7685b570 /* 18 vars */) = 0
strace: Process 98710 attached
strace: Process 98711 attached
strace: Process 98712 attached
strace: Process 98713 attached
strace: Process 98714 attached
strace: Process 98715 attached
strace: Process 98716 attached
[pid 98716] execve("/sbin/init", ["/sbin/init", "--version"], 0xc00015d400 /* 18 vars */) = 0
[pid 98716] execve("/bin/systemctl", ["/sbin/init", "--version"], 0x7ffd45e73b10 /* 18 vars */) = 0
[pid 98716] execve("/lib/sysvinit/telinit", ["/sbin/init", "--version"], 0x7ffc4ef65ea0 /* 18 vars */) = -1 ENOENT (No such file or directory)
[pid 98716] +++ exited with 1 +++
[pid 98713] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=98716, si_uid=0, si_status=1, si_utime=0, si_stime=1} ---
strace: Process 98717 attached
strace: Process 98718 attached
strace: Process 98719 attached
strace: Process 98720 attached
[pid 98720] execve("/usr/bin/getconf", ["/usr/bin/getconf", "CLK_TCK"], 0xc00023f860 /* 18 vars */) = 0
[pid 98720] +++ exited with 0 +++
[pid 98709] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=98720, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 98721 attached
strace: Process 98722 attached
strace: Process 98723 attached
Telegraf, The plugin-driven server agent for collecting and reporting metrics.

Usage:
<snip>

@danielnelson
Contributor

Okay, I see this too. I think it might be related to github.com/kardianos/service, a library we use to run as a service on Windows.

$ ack /sbin/init vendor/
vendor/github.com/kardianos/service/service_upstart_linux.go
24:     if _, err := os.Stat("/sbin/init"); err == nil {
25:             if out, err := exec.Command("/sbin/init", "--version").Output(); err == nil {
72:     out, err := exec.Command("/sbin/init", "--version").Output()

I'll keep investigating, but I expect we can remove this.
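
One general way to keep informational commands such as --help from triggering a probe like this is to defer detection until something actually needs the result. The sketch below illustrates that idea with sync.Once. It is not the upstream fix and not Telegraf's code; lazyPlatform, run and the probe function are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyPlatform defers an expensive or risky probe until first use.
type lazyPlatform struct {
	once   sync.Once
	probe  func() string // hypothetical stand-in for init-system detection
	result string
}

// Get runs the probe at most once and caches the answer.
func (l *lazyPlatform) Get() string {
	l.once.Do(func() { l.result = l.probe() })
	return l.result
}

// run is a toy command dispatcher: informational flags return before
// anything asks for the platform, so the probe never fires for them.
func run(args []string, p *lazyPlatform) string {
	for _, a := range args {
		if a == "--help" || a == "--version" {
			return "usage text"
		}
	}
	return "service platform: " + p.Get()
}

func main() {
	probes := 0
	p := &lazyPlatform{probe: func() string {
		probes++ // count how often the probe actually runs
		return "unknown"
	}}

	out := run([]string{"--help"}, p)
	fmt.Println(out, "| probes:", probes)

	// Illustrative arguments only; any non-informational command works here.
	out = run([]string{"--service", "install"}, p)
	fmt.Println(out, "| probes:", probes)
}
```

In this sketch the probe counter stays at zero for --help and becomes one only when the service path is taken.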

@danielnelson
Contributor

Looks like this has been fixed upstream, and we just need to update the library: kardianos/service#115

@danielnelson added this to the 1.13.1 milestone Jan 3, 2020
@danielnelson added the fix (pr to fix corresponding bug) and bug (unexpected problem or unintended behavior) labels and removed the need more info and fix labels Jan 3, 2020