[BUG] Node exporter does not work after the upgrade - unknown long flag '--collector.netdev.ignored-devices' #1681

Closed
przemyslavic opened this issue Sep 17, 2020 · 1 comment · Fixed by #1685

Comments

@przemyslavic
Collaborator

Describe the bug

  1. Node exporter service is not working after running the upgrade command. It looks like an old systemd service configuration has been applied.
  2. The upgrade checks the installed version by calling /opt/node_exporter/node_exporter --version. Run directly on the VM, the command prints multi-line build information; run through Ansible, the captured output is empty, so the "Installed version" message shows nothing:
root@ec2-xx-xx-xx-xx:~# /opt/node_exporter/node_exporter --version
node_exporter, version 0.16.0 (branch: HEAD, revision: d42bd70f4363dced6b77d8fc311ea57b63387e4f)
  build user:       root@a67a9bc13a69
  build date:       20180515-15:52:42
  go version:       go1.9.6
2020-09-17T12:56:19.9159473Z 12:56:19 INFO cli.engine.ansible.AnsibleCommand - TASK [upgrade : Node Exporter | Print version] *********************************
2020-09-17T12:56:20.0773448Z 12:56:20 INFO cli.engine.ansible.AnsibleCommand - ok: [ec2-34-247-70-225.eu-west-1.compute.amazonaws.com] => {
2020-09-17T12:56:20.0774876Z 12:56:20 INFO cli.engine.ansible.AnsibleCommand -     "msg": [
2020-09-17T12:56:20.0775705Z 12:56:20 INFO cli.engine.ansible.AnsibleCommand -         "Installed version: ",
2020-09-17T12:56:20.0781098Z 12:56:20 INFO cli.engine.ansible.AnsibleCommand -         "Target version: 1.0.1"
  3. Running the epicli upgrade command always starts the node exporter update. In future releases, where the node exporter version will probably not change, this is redundant. I think we should compare the installed version with the target version and run the update only when they differ.
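
The version comparison proposed in point 3 could look roughly like the Python sketch below. It is only an illustration, not the epicli implementation: the function names (`parse_version`, `needs_upgrade`, `read_version_output`) are hypothetical, and two assumptions are made. First, the sketch captures both stdout and stderr, because this issue does not say which stream carries the version text. Second, an unreadable installed version is treated as "upgrade needed".

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the "version 0.16.0" part of the first line printed by --version.
VERSION_RE = re.compile(r"version\s+(\d+(?:\.\d+)*)")


def read_version_output(binary: str = "/opt/node_exporter/node_exporter") -> str:
    """Run `<binary> --version` and return stdout and stderr combined.

    Assumption: merging both streams avoids an empty result when the
    version text is not written to stdout.
    """
    result = subprocess.run(
        [binary, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout


def parse_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the version as a tuple of ints, or None if not found."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(version_output: str, target: str) -> bool:
    """Return True when the installed version is older than the target."""
    installed = parse_version(version_output)
    if installed is None:
        # Assumption: if the version cannot be read, upgrade to be safe.
        return True
    wanted = tuple(int(part) for part in target.split("."))
    # Pad with zeros so that "1.0" and "1.0.0" compare as equal.
    width = max(len(installed), len(wanted))
    installed += (0,) * (width - len(installed))
    wanted += (0,) * (width - len(wanted))
    return installed < wanted


# First lines of the output quoted in this issue.
SAMPLE_OUTPUT = (
    "node_exporter, version 0.16.0 (branch: HEAD, "
    "revision: d42bd70f4363dced6b77d8fc311ea57b63387e4f)\n"
    "  build user:       root@a67a9bc13a69\n"
)

if __name__ == "__main__":
    print(parse_version(SAMPLE_OUTPUT))
    print(needs_upgrade(SAMPLE_OUTPUT, "1.0.1"))
```

With a check like this, the upgrade tasks would run only when `needs_upgrade` returns True, and the "Installed version" message could print the parsed value instead of the raw multi-line output.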

To Reproduce
Steps to reproduce the bug:

  1. Deploy a new cluster from v0.7 branch (or use the following image epiphanyplatform/epicli:0.7.1). One master vm should be enough to reproduce it.
  2. Run epicli upgrade -b /path/to/build/dir from develop branch.

Expected behavior
Node exporter service is working properly after upgrading to 1.0.1.

Actual behavior
Node exporter service failed to start.

root@ec2-xx-xx-xx-xx:/opt/node_exporter# systemctl status prometheus-node-exporter.service
● prometheus-node-exporter.service - Service that runs Prometheus Node Exporter
   Loaded: loaded (/etc/systemd/system/prometheus-node-exporter.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2020-09-17 13:35:07 UTC; 13min ago
  Process: 13791 ExecStart=/opt/node_exporter/node_exporter --collector.conntrack --collector.diskstats --collector.entropy --collector.filefd --collector.filesystem --collector.loadavg --collector.mdadm --colle
 Main PID: 13791 (code=exited, status=1/FAILURE)

Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: prometheus-node-exporter.service: Main process exited, code=exited, status=1/FAILURE
Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: prometheus-node-exporter.service: Failed with result 'exit-code'.
Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: prometheus-node-exporter.service: Service hold-off time over, scheduling restart.
Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: prometheus-node-exporter.service: Scheduled restart job, restart counter is at 5.
Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: Stopped Service that runs Prometheus Node Exporter.
Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: prometheus-node-exporter.service: Start request repeated too quickly.
Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: prometheus-node-exporter.service: Failed with result 'exit-code'.
Sep 17 13:35:07 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com systemd[1]: Failed to start Service that runs Prometheus Node Exporter.

OS (please complete the following information):

  • OS: [all]

Cloud Environment (please complete the following information):

  • Cloud Provider [all]

Logs:

Sep 17 12:55:54 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[23143]: time="2020-09-17T12:55:54Z" level=error msg="ERROR: diskstats collector failed after 0.000153s: invalid line for /proc/diskstats for nvme0n1p1" source="collector.go:132"
Sep 17 12:56:09 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[23143]: time="2020-09-17T12:56:09Z" level=error msg="Error reading textfile collector directory \"/var/lib/prometheus/node-exporter\": open /var/lib/prometheus/node-exporter: no such file or directory" source="textfile.go:192"
Sep 17 12:56:09 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[23143]: time="2020-09-17T12:56:09Z" level=error msg="ERROR: diskstats collector failed after 0.000145s: invalid line for /proc/diskstats for nvme0n1" source="collector.go:132"
Sep 17 12:56:40 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[22664]: node_exporter: error: unknown long flag '--collector.netdev.ignored-devices', try --help
Sep 17 12:56:40 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[22724]: node_exporter: error: unknown long flag '--collector.netdev.ignored-devices', try --help
Sep 17 12:56:40 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[22747]: node_exporter: error: unknown long flag '--collector.netdev.ignored-devices', try --help
Sep 17 12:56:41 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[22775]: node_exporter: error: unknown long flag '--collector.netdev.ignored-devices', try --help
Sep 17 12:56:41 ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com prometheus-node-exporter[22801]: node_exporter: error: unknown long flag '--collector.netdev.ignored-devices', try --help
@przemyslavic
Collaborator Author

Node Exporter upgrade works fine after applying the fix.
