Windows input win_perf_counters #1291

Closed
discoduck2x opened this issue May 27, 2016 · 17 comments
Labels
area/windows Related to windows plugins (win_eventlog, win_perf_counters, win_services) help wanted Request for community participation, code, contribution platform/windows
Comments

@discoduck2x

First, this is starting to look really good: a small CPU footprint when collecting quite a few counters, almost as good as typeperf locally :)

Two problems I've encountered on my first run:

  1. Wildcards cannot be used on instance names. It would be nice to catch just a few chosen processes:
    Instance = ["beginningofanyprocessname"]
  2. The value written into InfluxDB maxes out at 100 even when the process is using more CPU%. On an 8-core machine, when running for example 7zip to create an archive, the 7zG process uses 500-600% of the total 800% available, but Telegraf only records 100 as the % usage for the 7zG process.
@sparrc
Contributor

sparrc commented Jun 1, 2016

any ideas @TheFlyingCorpse?

@TheFlyingCorpse
Contributor

@discoduck2x:

  1. Right, this feature IS missing iirc. It could be implemented in a few ways, any suggestions on how you would write a query for it?
  2. I am not doing anything to the values, it is how perfmon sees them.

One "issue" with my current implementation is that it doesn't regularly re-check for new matches on the perf counters, say every 10 minutes or so. Depending on how often you restart or start new instances, they might not show up in Telegraf at all, which should also be looked at in case this applies to you.

@discoduck2x
Author

@TheFlyingCorpse

  1. Unfortunately I'm not a developer, merely a metric-thirsty tech ops; I wish I could help.
  2. But then what are you using to check the counters? A "lib" or a function/third-party implementation? It feels like that implementation somehow isn't getting the correct value for the process CPU counters. If you check with perfmon manually, with typeperf, or with any other "standard" way to query performance counters, you get the "correct" values.

typeperf/perfmon gives this data:
[image]
...whereas Telegraf simply flatlines at 100.

Until now I've been using typeperf and dumping its output into CSV files, which I later parse and ship to InfluxDB. It's the least CPU-intrusive way to collect Windows counters (1-2% CPU and no spikes whatsoever while collecting a lot of counters), but the processing is "ugly", and a good agent like Telegraf would bring it into everybody's living room (Windows peeps).

@elvarb

elvarb commented Aug 8, 2016

  1. Right, this feature IS missing iirc. It could be implemented in a few ways, any suggestions on how you would write a query for it?

The easiest implementation and possibly the fastest is to just query * and then have telegraf do the filtering
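
A rough sketch of what "query * and filter afterwards" could look like. It assumes the full instance list has already been collected; `filterInstances` is a hypothetical helper, and the use of `path.Match` globbing with case-insensitive comparison is an assumption, not something the plugin is known to do.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// filterInstances keeps the instance names that match at least one glob
// pattern. Matching is case-insensitive, which is an assumption made for
// this sketch. Patterns that fail to parse are skipped.
func filterInstances(all []string, patterns []string) []string {
	var kept []string
	for _, name := range all {
		for _, p := range patterns {
			ok, err := path.Match(strings.ToLower(p), strings.ToLower(name))
			if err == nil && ok {
				kept = append(kept, name)
				break // one matching pattern is enough
			}
		}
	}
	return kept
}

func main() {
	// Hypothetical result of querying a counter with instance "*".
	all := []string{"7zG", "chrome", "chrome#1", "telegraf", "_Total"}
	fmt.Println(filterInstances(all, []string{"chrome*", "7z*"}))
	// prints "[7zG chrome chrome#1]"
}
```

This keeps the configuration simple, but every collection still pays the cost of reading all instances.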

@discoduck2x
Author

@elvarb But that's not sufficient. For counters with many instances you get a CPU spike every time you query them. Sure, * is fine for systems that aren't time-critical or low-latency, but on those it's not OK to bog down your hosts.

@elvarb

elvarb commented Aug 9, 2016

@discoduck2x Depends on the counter, some are extremely fast. The only other method is to query * when Telegraf starts and make independent queries out of the result.
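
The second approach could be sketched roughly as follows: enumerate the instances once, then build one explicit counter path per match so later collections never query *. `expandPaths` is a hypothetical helper, the prefix match is an assumed matching rule, and the discovered list is made-up sample data. The `\Object(Instance)\Counter` layout is the standard PDH counter path form.

```go
package main

import (
	"fmt"
	"strings"
)

// expandPaths builds one explicit PDH-style counter path for every
// discovered instance whose name starts with prefix (case-insensitive).
// The helper name and the prefix rule are assumptions for this sketch.
func expandPaths(object, counter, prefix string, discovered []string) []string {
	var paths []string
	for _, inst := range discovered {
		if strings.HasPrefix(strings.ToLower(inst), strings.ToLower(prefix)) {
			paths = append(paths, fmt.Sprintf(`\%s(%s)\%s`, object, inst, counter))
		}
	}
	return paths
}

func main() {
	// Hypothetical instance list obtained from a single "*" query at startup.
	discovered := []string{"7zG", "chrome", "7zFM"}
	for _, p := range expandPaths("Process", "% Processor Time", "7z", discovered) {
		fmt.Println(p)
	}
}
```

The trade-off is that instances started after the expansion are missed until the list is rebuilt.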

@wardboumans
Contributor

Any idea when this will be implemented? It would be really useful for instances whose names change, e.g. ones that include the runtime version in the name.

@discoduck2x
Author

@TheFlyingCorpse @sparrc
Any updates? I'm still not able to use Telegraf on Windows since both issues remain.

@VVvKamper
Contributor

  1. The value written into InfluxDB maxes out at 100 even when the process is using more CPU%. On an 8-core machine, when running for example 7zip to create an archive, the 7zG process uses 500-600% of the total 800% available, but Telegraf only records 100 as the % usage for the 7zG process.

This is because the PDH_FMT_NOCAP100 flag is missing from the pdh_GetFormattedCounterValue (source) and pdh_GetFormattedCounterArrayW (source) calls. I fixed this in my local installation, but I'm not sure how it will affect the rest of the win perf counters.

@discoduck2x
Author

@VVvKamper Nice catch. Any chance of you providing a test binary for download somewhere?

@VVvKamper
Contributor

Yes, I uploaded it to https://ufile.io/4ecb6. It was built from the 1.2.1 tag with the following fix.

The fix was just to change the format option from uintptr(PDH_FMT_DOUBLE) to uintptr(PDH_FMT_DOUBLE|PDH_FMT_NOCAP100).
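
For illustration, a small sketch of how the two format flags combine. The constant values are the ones defined in the Windows `pdh.h` header; `formatFlags` is a hypothetical helper, and the sketch only builds the flag word without calling into PDH, so it runs on any platform.

```go
package main

import "fmt"

// Format flags as defined in the Windows pdh.h header.
const (
	PDH_FMT_DOUBLE   = 0x00000200 // return the value as a double
	PDH_FMT_NOCAP100 = 0x00008000 // do not cap percentage values at 100
)

// formatFlags returns the format word that would be passed to the PDH
// formatting call. The helper itself is hypothetical.
func formatFlags(noCap bool) uintptr {
	flags := uintptr(PDH_FMT_DOUBLE)
	if noCap {
		flags |= PDH_FMT_NOCAP100
	}
	return flags
}

func main() {
	fmt.Printf("before fix: 0x%08X\n", formatFlags(false)) // prints "before fix: 0x00000200"
	fmt.Printf("after fix:  0x%08X\n", formatFlags(true))  // prints "after fix:  0x00008200"
}
```

Without PDH_FMT_NOCAP100, a process using several cores is reported as 100; with it, a value such as 550 on an 8-core machine can pass through unchanged.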

@sparrc
Contributor

sparrc commented Mar 2, 2017

wow, good catch, can you submit a PR @VVvKamper?

Windows is really something else 🤦‍♂️

@discoduck2x
Author

@sparrc - both species can coexist in perfect harmony...almost....well ok they should in theory

VVvKamper added a commit to VVvKamper/telegraf that referenced this issue Mar 2, 2017
added PDH_FMT_NOCAP100 format option
@VVvKamper
Contributor

@sparrc I submitted a PR. Feel free to ask for changes, since it's my first PR :)

sparrc pushed a commit that referenced this issue Mar 8, 2017
added PDH_FMT_NOCAP100 format option

closes #2483
@danielnelson danielnelson added area/windows Related to windows plugins (win_eventlog, win_perf_counters, win_services) help wanted Request for community participation, code, contribution and removed help wanted Request for community participation, code, contribution labels May 11, 2017
@discoduck2x
Author

Still no update on item 1 (wildcards on instance names) for this one?

@christianhill

christianhill commented Sep 28, 2017

Just started using Telegraf, quite impressed with what I'm seeing so far.

However, I too would like to see the ability to use wildcards on instance names, if possible. It's a pain to have to edit the config every time we add new queues to a server, but at the same time, we don't necessarily want to have to scrape the data for all queues.

@danielnelson
Contributor

Wildcard support should be fixed by #4189. It would be great if everyone could try it out using the nightly builds.

maxunt pushed a commit that referenced this issue Jun 26, 2018
added PDH_FMT_NOCAP100 format option

closes #2483