Add new performance counter metrics #14625

Merged 5 commits on Jun 15, 2023
15 changes: 15 additions & 0 deletions vsphere/datadog_checks/vsphere/metrics.py
@@ -77,6 +77,9 @@
'cpu.usagemhz.avg',
'cpu.used.sum',
'cpu.wait.sum',
'cpu.capacity.demand.avg',
'cpu.capacity.usage.avg',
'cpu.capacity.contention.avg',
'datastore.maxTotalLatency.latest',
'datastore.numberReadAveraged.avg',
'datastore.numberWriteAveraged.avg',
@@ -123,6 +126,8 @@
'mem.usage.avg',
'mem.vmmemctl.avg',
'mem.vmmemctltarget.avg',
'mem.capacity.usage.avg',
'mem.capacity.contention.avg',
'mem.zero.avg',
'mem.zipSaved.latest',
'mem.zipped.latest',
@@ -141,6 +146,7 @@
'net.received.avg',
'net.transmitted.avg',
'net.usage.avg',
'net.throughput.usage.avg',
'power.energy.sum',
'power.power.avg',
'rescpu.actav1.latest',
@@ -202,6 +208,8 @@
'cpu.used.sum',
'cpu.utilization.avg',
'cpu.wait.sum',
'cpu.capacity.usage.avg',
'cpu.capacity.contention.avg',
'datastore.datastoreIops.avg',
'datastore.datastoreMaxQueueDepth.latest',
'datastore.datastoreNormalReadLatency.latest',
@@ -286,6 +294,8 @@
'mem.totalCapacity.avg',
'mem.unreserved.avg',
'mem.usage.avg',
'mem.capacity.usage.avg',
'mem.capacity.contention.avg',
'mem.vmfs.pbc.capMissRatio.latest',
'mem.vmfs.pbc.overhead.latest',
'mem.vmfs.pbc.size.latest',
@@ -310,6 +320,7 @@
'net.transmitted.avg',
'net.unknownProtos.sum',
'net.usage.avg',
'net.throughput.usage.avg',
'power.energy.sum',
'power.power.avg',
'power.powerCap.avg',
@@ -436,11 +447,15 @@
'cpu.totalmhz.avg',
'cpu.usage.avg',
'cpu.usagemhz.avg',
'cpu.capacity.usage.avg',
'cpu.capacity.contention.avg',
'mem.consumed.avg',
'mem.overhead.avg',
'mem.totalmb.avg',
'mem.usage.avg',
'mem.vmmemctl.avg',
'mem.capacity.usage.avg',
'mem.capacity.contention.avg',
'vmop.numChangeDS.latest',
'vmop.numChangeHost.latest',
'vmop.numChangeHostDS.latest',
6 changes: 6 additions & 0 deletions vsphere/metadata.csv
@@ -31,6 +31,9 @@ vsphere.cpu.usagemhz.avg,gauge,,megahertz,,"CPU usage, as measured in megahertz"
vsphere.cpu.used.sum,gauge,,millisecond,,"Time accounted to the virtual machine. If a system service runs on behalf of this virtual machine, the time spent by that service (represented by cpu.system) should be charged to this virtual machine. If not, the time spent (represented by cpu.overlap) should not be charged against this virtual machine.",0,vsphere,cpu used sum,
vsphere.cpu.utilization.avg,gauge,,percent,,CPU utilization as a percentage during the interval (CPU usage and CPU utilization might be different due to power management technologies or hyper-threading),-1,vsphere,cpu utilization avg,
vsphere.cpu.wait.sum,gauge,,millisecond,,"Total CPU time spent in wait state.The wait total includes time spent the CPU Idle, CPU Swap Wait, and CPU I/O Wait states.",0,vsphere,cpu wait sum,
vsphere.cpu.capacity.demand.avg,gauge,,megahertz,,"The amount of CPU resources a virtual machine would use if there were no CPU contention or CPU limit.",0,vsphere,cpu capacity demand avg,
Contributor:

Would it make sense to remove the .avg suffix since it's not a percent?

Contributor Author:

I think we should keep the suffix, since we have a naming standard where existing percent metrics have the average suffix: https://github.com/DataDog/integrations-core/blob/master/vsphere/datadog_checks/vsphere/metrics.py#L10

The suffix corresponds to the rollup type in vSphere, and this metric uses the average rollup type: https://vdc-download.vmware.com/vmwb-repository/dcr-public/b50dcbbf-051d-4204-a3e7-e1b618c1e384/538cf2ec-b34f-4bae-a332-3820ef9e7773/cpu_counters.html
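
The naming rule described in this reply can be sketched roughly as follows. This is only an illustration, not the integration's actual code: `ROLLUP_SUFFIX` and `metric_name` are hypothetical names, and the `maximum`/`minimum` entries are assumed from the vSphere rollup types rather than taken from this diff.

```python
# Hypothetical mapping from vSphere rollup types to metric-name suffixes.
# 'avg', 'sum' and 'latest' appear in this diff; 'max' and 'min' are assumed.
ROLLUP_SUFFIX = {
    "average": "avg",
    "summation": "sum",
    "latest": "latest",
    "maximum": "max",
    "minimum": "min",
}


def metric_name(group: str, counter: str, rollup: str) -> str:
    """Join counter group, counter name, and rollup suffix into one name."""
    return f"{group}.{counter}.{ROLLUP_SUFFIX[rollup]}"


# The suffix reflects the rollup type, not the unit of the metric.
print(metric_name("cpu", "capacity.demand", "average"))
```

Under this scheme a megahertz metric and a percent metric both end in `.avg` when they share the average rollup type, which is the point made above.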

vsphere.cpu.capacity.usage.avg,gauge,,megahertz,,"CPU usage as a percent during the interval.",0,vsphere,cpu capacity usage avg,
Contributor:

Should the unit for this be percent?

Contributor Author:

I noticed this as well, but I wanted to follow the vSphere documentation exactly: https://vdc-download.vmware.com/vmwb-repository/dcr-public/b50dcbbf-051d-4204-a3e7-e1b618c1e384/538cf2ec-b34f-4bae-a332-3820ef9e7773/cpu_counters.html

I've gotten samples in the thousands, so I think the description is slightly inaccurate, since the data and the unit match.

vsphere.cpu.capacity.contention.avg,gauge,,percent,,"Percent of time the virtual machine is unable to run because it is contending for access to the physical CPU(s).",0,vsphere,cpu capacity contention avg,
vsphere.datastore.busResets.sum,gauge,,command,,Number of SCSI-bus reset commands issued,0,vsphere,datastore busResets sum,
vsphere.datastore.commandsAborted.sum,gauge,,command,,Number of SCSI commands aborted,-1,vsphere,commandsAborted sum,
vsphere.datastore.datastoreIops.avg,gauge,,operation,second,Storage I/O Control aggregated IOPS,-1,vsphere,datastoreIops avg,
@@ -156,6 +159,8 @@ vsphere.mem.vmmemctltarget.avg,gauge,,kibibyte,,"Target value set by VMkernal fo
vsphere.mem.zero.avg,gauge,,kibibyte,,"Memory that contains 0s only. Included in shared amount. Through transparent page sharing, zero memory pages can be shared among virtual machines that run the same operating system",0,vsphere,mem zero avg,
vsphere.mem.zipSaved.latest,gauge,,kibibyte,,Memory saved due to memory zipping,0,vsphere,mem zipSaved latest,
vsphere.mem.zipped.latest,gauge,,kibibyte,,Memory zipped,0,vsphere,mem zipped latest,
vsphere.mem.capacity.usage.avg,gauge,,kilobyte,,"Amount of physical memory actively used.",0,vsphere,mem capacity usage avg,
vsphere.mem.capacity.contention.avg,gauge,,percent,,"Percentage of time VMs are waiting to access swapped, compressed or ballooned memory.",0,vsphere,mem capacity contention avg,
vsphere.net.broadcastRx.sum,gauge,,packet,,Number of broadcast packets received,0,vsphere,net broadcastRx sum,
vsphere.net.broadcastTx.sum,gauge,,packet,,Number of broadcast packets transmitted,0,vsphere,net broadcastTx sum,
vsphere.net.bytesRx.avg,gauge,,kibibyte,second,Average amount of data received per second,0,vsphere,net bytesRx avg,
@@ -174,6 +179,7 @@ vsphere.net.received.avg,gauge,,kibibyte,second,Average rate at which data was r
vsphere.net.transmitted.avg,gauge,,kibibyte,second,Average rate at which data was transmitted during the interval. This represents the bandwidth of the network,0,vsphere,net transmitted avg,
vsphere.net.unknownProtos.sum,gauge,,kibibyte,second,Number of frames with unknown protocol received,0,vsphere,net unknownProtos sum,
vsphere.net.usage.avg,gauge,,kibibyte,second,Network utilization (combined transmit- and receive-rates),0,vsphere,net usage avg,
vsphere.net.throughput.usage.avg,gauge,,kibibyte,second,The current network bandwidth usage for the host.,0,vsphere,net throughput usage avg,
vsphere.network.received,rate,,kibibyte,,Number of kilobytes received by the host,-1,vsphere,net rx,
vsphere.network.transmitted,rate,,kibibyte,,Number of kilobytes transmitted by the host,-1,vsphere,net tx,
vsphere.power.energy.sum,gauge,,,,"Total energy (in joule) used since last stats reset.",-1,vsphere,power energy sum,
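
Since every metric name added to `metrics.py` needs a matching `vsphere.`-prefixed row in `metadata.csv`, a quick consistency check along these lines may be handy when reviewing changes like this one. It is a sketch under assumptions: `missing_metadata` is a hypothetical helper, the first CSV column is assumed to hold the metric name, and the rows are the six added in this diff.

```python
import csv
import io

# Unique metric names added to metrics.py in this change.
NEW_METRICS = [
    "cpu.capacity.demand.avg",
    "cpu.capacity.usage.avg",
    "cpu.capacity.contention.avg",
    "mem.capacity.usage.avg",
    "mem.capacity.contention.avg",
    "net.throughput.usage.avg",
]

# The six rows added to metadata.csv, copied from the diff.
METADATA_ROWS = '''\
vsphere.cpu.capacity.demand.avg,gauge,,megahertz,,"The amount of CPU resources a virtual machine would use if there were no CPU contention or CPU limit.",0,vsphere,cpu capacity demand avg,
vsphere.cpu.capacity.usage.avg,gauge,,megahertz,,"CPU usage as a percent during the interval.",0,vsphere,cpu capacity usage avg,
vsphere.cpu.capacity.contention.avg,gauge,,percent,,"Percent of time the virtual machine is unable to run because it is contending for access to the physical CPU(s).",0,vsphere,cpu capacity contention avg,
vsphere.mem.capacity.usage.avg,gauge,,kilobyte,,"Amount of physical memory actively used.",0,vsphere,mem capacity usage avg,
vsphere.mem.capacity.contention.avg,gauge,,percent,,"Percentage of time VMs are waiting to access swapped, compressed or ballooned memory.",0,vsphere,mem capacity contention avg,
vsphere.net.throughput.usage.avg,gauge,,kibibyte,second,The current network bandwidth usage for the host.,0,vsphere,net throughput usage avg,
'''


def missing_metadata(metric_names, metadata_text, prefix="vsphere."):
    """Return the metric names that have no row in the metadata CSV text."""
    reader = csv.reader(io.StringIO(metadata_text))
    documented = {row[0] for row in reader if row}
    return [name for name in metric_names if prefix + name not in documented]


print(missing_metadata(NEW_METRICS, METADATA_ROWS))  # prints []
```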
90 changes: 90 additions & 0 deletions vsphere/tests/fixtures/metrics_historical.json
@@ -3497,5 +3497,95 @@
],
"counterId": 279,
"instance": ""
},
{
"entity": "datastore-cluster",
"value": [
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205
],
"counterId": 19,
"instance": ""
},
{
"entity": "Cluster1",
"value": [
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205
],
"counterId": 19,
"instance": ""
},
{
"entity": "Cluster2",
"value": [
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205,
205
],
"counterId": 19,
"instance": ""
}
]
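
The raw samples of 205 in these fixture entries line up with the expected value of 2.05 for `vsphere.cpu.capacity.contention.avg` in `metrics_historical_values.json`, which is consistent with vSphere reporting percent counters in hundredths of a percent. A minimal sketch of that conversion follows. It assumes, without confirmation from the diff, that counterId 19 is the contention counter and that the check takes the most recent sample; `to_percent` is a hypothetical helper.

```python
def to_percent(raw_samples):
    """Convert the most recent raw sample (hundredths of a percent) to percent."""
    return raw_samples[-1] / 100


# Each fixture entry above carries 23 identical samples of 205.
raw = [205] * 23
print(to_percent(raw))  # prints 2.05
```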
36 changes: 36 additions & 0 deletions vsphere/tests/fixtures/metrics_historical_values.json
@@ -1563,6 +1563,42 @@
"vsphere_type:cluster"
]
},
{
"name": "vsphere.cpu.capacity.contention.avg",
"value": 2.05,
"tags": [
"vcenter_server:FAKE",
"vsphere_cluster:Cluster2",
"vsphere_datacenter:Dätacenter",
"vsphere_folder:Datacenters",
"vsphere_folder:host",
"vsphere_type:cluster"
]
},
{
"name": "vsphere.cpu.capacity.contention.avg",
"value": 2.05,
"tags": [
"vcenter_server:FAKE",
"vsphere_cluster:Cluster2",
"vsphere_datacenter:Datacenter2",
"vsphere_folder:Datacenters",
"vsphere_folder:host",
"vsphere_type:cluster"
]
},
{
"name": "vsphere.cpu.capacity.contention.avg",
"value": 2.05,
"tags": [
"vcenter_server:FAKE",
"vsphere_cluster:datastore-cluster",
"vsphere_datacenter:Datacenter2",
"vsphere_folder:Datacenters",
"vsphere_folder:host",
"vsphere_type:cluster"
]
},
{
"name": "vsphere.datacenter.count",
"value": 1.0,