inputs.cpu errors or return potential wrong values on Windows #4269

Closed
back2root opened this issue Jun 11, 2018 · 2 comments
Labels
bug (unexpected problem or unintended behavior), platform/windows, upstream (bug or issues that rely on dependency fixes)

Comments

@back2root

Relevant telegraf.conf:

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false

Not related to the error, but included for ease of reproduction:

[[outputs.file]]
#   Write output to "stdout"
files = ["stdout"]
data_format = "influx"

System info:

Affected operating System: Windows

Tested on:

  • Telegraf 1.6.4
  • Telegraf v1.8.0~87f711a1 (git: master 87f711a)

Steps to reproduce:

  1. Download Telegraf
  2. Ensure that inputs.cpu is enabled
  3. I suggest enabling outputs.file with files = ["stdout"] as the only output, although the choice of output should not matter
  4. Run Telegraf. E.g.: telegraf -config telegraf.conf (--test)

Expected behavior:

Telegraf runs smoothly and collects CPU metrics every interval seconds.

Actual behavior:

From time to time, Telegraf logs an error and does not report CPU metrics for all CPU cores:

E! Error: current total CPU time is less than previous total CPU time

The more CPU cores you have (with percpu = true), the more likely it is that the error occurs.

Additional info:

Telegraf uses github.com/shirou/gopsutil/cpu to gather CPU metrics and expects the returned values to be used CPU time. However, on the Windows platform the library returns percentage values that it has already gathered via WMI.
As a result, later checks on the returned values fail, because Telegraf expects that used CPU time can only increase under normal conditions. The returned CPU percentages do not follow this expectation. In addition, the later calculation of the CPU usage percentage makes no sense when applied to values that are already percentages.

So on the Windows platform, all the checks and calculations that Telegraf makes using the variable lastStats are unnecessary or problematic.

@danielnelson added the bug (unexpected problem or unintended behavior) and platform/windows labels Jun 11, 2018
@argerus

argerus commented Jul 5, 2018

Probably caused by this bug in github.com/shirou/gopsutil/cpu

@danielnelson added the upstream (bug or issues that rely on dependency fixes) label Jul 5, 2018
@zak-pawel
Collaborator

I easily reproduced it using Telegraf 1.6.4.
I couldn't reproduce it using Telegraf 1.16.2. It seems that the mentioned bug in github.com/shirou/gopsutil/cpu was fixed here in version v2.18.12. Telegraf currently uses v2.20.9.
