Add support for basic system metrics for Windows. #554

Merged
merged 1 commit into from
May 10, 2021

@jeremyje commented May 8, 2021

This change introduces system stats monitor support for Windows (#461). Because node-problem-detector uses gopsutil, many metrics can be forwarded without any code changes. Other parts of NPD require access to /proc and related files, for which there is no Windows equivalent. Those will need to be accessed through WMI at a later time, but for now some were backfilled using gopsutil for Windows.

This change introduces a config/windows-system-stats-monitor.json configuration that can be used to expose the system metrics without errors or warnings. It covers a subset of the functionality exposed on Linux.
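
As a rough illustration of what a configuration like this might contain, the Go sketch below decodes a trimmed-down JSON document and lists the metrics it enables. The field names (metricsConfigs, displayName, invokeInterval) and the metric IDs are assumptions modelled on the existing Linux system stats monitor configuration, not a copy of the file added in this PR. The struct types and the enabledMetrics helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// metricConfig, componentConfig and monitorConfig are hypothetical,
// trimmed-down shapes used only for this illustration.
type metricConfig struct {
	DisplayName string `json:"displayName"`
}

type componentConfig struct {
	MetricsConfigs map[string]metricConfig `json:"metricsConfigs"`
}

type monitorConfig struct {
	CPU            componentConfig `json:"cpu"`
	Disk           componentConfig `json:"disk"`
	Host           componentConfig `json:"host"`
	Memory         componentConfig `json:"memory"`
	InvokeInterval string          `json:"invokeInterval"`
}

// sampleConfig is an assumed example, not the contents of
// config/windows-system-stats-monitor.json.
const sampleConfig = `{
  "cpu":    {"metricsConfigs": {"cpu/usage_time":    {"displayName": "cpu/usage_time"}}},
  "disk":   {"metricsConfigs": {"disk/bytes_used":   {"displayName": "disk/bytes_used"}}},
  "host":   {"metricsConfigs": {"host/uptime":       {"displayName": "host/uptime"}}},
  "memory": {"metricsConfigs": {"memory/bytes_used": {"displayName": "memory/bytes_used"}}},
  "invokeInterval": "60s"
}`

// enabledMetrics returns the sorted metric IDs found across all components.
func enabledMetrics(raw []byte) ([]string, error) {
	var cfg monitorConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	var names []string
	for _, c := range []componentConfig{cfg.CPU, cfg.Disk, cfg.Host, cfg.Memory} {
		for name := range c.MetricsConfigs {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := enabledMetrics([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

Metrics that have no Windows backing would simply be left out of such a file, which is one way a configuration can avoid the errors and warnings mentioned above.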

Lastly, this change adds Windows support for GetOSVersion(), which is required for running the system metrics handler. Each change also includes a smoke test.

With this change, the /metrics endpoint output looks like this:

# HELP cpu_usage_time CPU usage, in seconds
# TYPE cpu_usage_time counter
cpu_usage_time{state="guest"} 0
cpu_usage_time{state="guest_nice"} 0
cpu_usage_time{state="idle"} 302076.5625
cpu_usage_time{state="iowait"} 0
cpu_usage_time{state="irq"} 0
cpu_usage_time{state="nice"} 0
cpu_usage_time{state="softirq"} 0
cpu_usage_time{state="steal"} 0
cpu_usage_time{state="system"} 18818.75
cpu_usage_time{state="user"} 34785.9375
# HELP disk_avg_queue_len The average queue length on the disk
# TYPE disk_avg_queue_len gauge
disk_avg_queue_len{device_name="C:"} 0
# HELP disk_bytes_used Disk bytes used, in Bytes
# TYPE disk_bytes_used gauge
disk_bytes_used{device_name="C:",fs_type="NTFS",mount_option="rw.compress",state="free"} 9.1750907904e+10
disk_bytes_used{device_name="C:",fs_type="NTFS",mount_option="rw.compress",state="used"} 1.55016192e+10
# HELP disk_io_time The IO time spent on the disk, in ms
# TYPE disk_io_time counter
disk_io_time{device_name="C:"} 0
# HELP disk_merged_operation_count Disk merged operations count
# TYPE disk_merged_operation_count counter
disk_merged_operation_count{device_name="C:",direction="read"} 0
disk_merged_operation_count{device_name="C:",direction="write"} 0
# HELP disk_operation_bytes_count Bytes transferred in disk operations
# TYPE disk_operation_bytes_count counter
disk_operation_bytes_count{device_name="C:",direction="read"} 0
disk_operation_bytes_count{device_name="C:",direction="write"} 0
# HELP disk_operation_count Disk operations count
# TYPE disk_operation_count counter
disk_operation_count{device_name="C:",direction="read"} 0
disk_operation_count{device_name="C:",direction="write"} 0
# HELP disk_operation_time Time spent in disk operations, in ms
# TYPE disk_operation_time counter
disk_operation_time{device_name="C:",direction="read"} 0
disk_operation_time{device_name="C:",direction="write"} 0
# HELP disk_weighted_io The weighted IO on the disk, in ms
# TYPE disk_weighted_io counter
disk_weighted_io{device_name="C:"} 0
# HELP host_uptime The uptime of the operating system
# TYPE host_uptime gauge
host_uptime{kernel_version="10.0.17763 Build 17763",os_version="windows 10.0.17763.1697 (Windows Server 2019 Datacenter)"} 889
# HELP memory_bytes_used Memory usage by each memory state, in Bytes. Summing values of all states yields the total memory on the node.
# TYPE memory_bytes_used gauge
memory_bytes_used{state="free"} 1.5503225520128e+13
memory_bytes_used{state="used"} 2.0876099584e+12
# HELP problem_counter Number of times a specific type of problem have occurred.
# TYPE problem_counter counter
problem_counter{reason="ContainerCreationFailed"} 0
problem_counter{reason="CorruptContainerImageLayer"} 0
problem_counter{reason="DockerUnhealthy"} 0
problem_counter{reason="KubeletUnhealthy"} 0
# HELP problem_gauge Whether a specific type of problem is affecting the node or not.
# TYPE problem_gauge gauge
problem_gauge{reason="ContainerRuntimeIsHealthy",type="ContainerRuntimeUnhealthy"} 0
problem_gauge{reason="DockerUnhealthy",type="ContainerRuntimeUnhealthy"} 0
problem_gauge{reason="KubeletIsHealthy",type="KubeletUnhealthy"} 0
problem_gauge{reason="KubeletUnhealthy",type="KubeletUnhealthy"} 0
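
The help text for memory_bytes_used above states that summing the values of all states yields the total memory on the node. A minimal Go sketch of checking that against a scrape follows. The sumMetric helper is hypothetical and assumes samples carry no trailing timestamp, which matches the output shown here. It treats the last space-separated field as the value because label values such as os_version may themselves contain spaces.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up every sample of the named metric found in text,
// which is expected to be in the Prometheus text exposition format
// without timestamps.
func sumMetric(text, name string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and "# HELP" / "# TYPE" comments.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The value is the last field; label values may contain spaces.
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		series, value := line[:i], line[i+1:]
		// The metric name ends where the label set begins, if there is one.
		metric := series
		if j := strings.Index(series, "{"); j >= 0 {
			metric = series[:j]
		}
		if metric != name {
			continue
		}
		v, err := strconv.ParseFloat(value, 64)
		if err != nil {
			return 0, fmt.Errorf("parse %q: %w", line, err)
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	// The two memory samples copied from the output above.
	scrape := `# TYPE memory_bytes_used gauge
memory_bytes_used{state="free"} 1.5503225520128e+13
memory_bytes_used{state="used"} 2.0876099584e+12
`
	total, err := sumMetric(scrape, "memory_bytes_used")
	if err != nil {
		panic(err)
	}
	fmt.Printf("total memory reported: %.0f bytes\n", total)
}
```

Comparing the printed total with the memory actually installed on the node is a quick sanity check on the free and used values.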

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) May 8, 2021
jeremyje commented May 8, 2021

/cc @mcshooter

@k8s-ci-robot

@jeremyje: GitHub didn't allow me to request PR reviews from the following users: mcshooter.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @mcshooter

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) May 8, 2021
@Random-Liu self-assigned this May 10, 2021

@Random-Liu left a comment

@jeremyje

Update https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemstatsmonitor about the windows support?

Added some details about Windows support for system metrics to the README.md.

@jeremyje

/retest

@Random-Liu

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) May 10, 2021
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jeremyje, Random-Liu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) May 10, 2021
@k8s-ci-robot merged commit 228f0f5 into kubernetes:master May 10, 2021
@jeremyje

/sig windows
/sig node

@k8s-ci-robot added the sig/windows label (categorizes an issue or PR as relevant to SIG Windows) and the sig/node label (categorizes an issue or PR as relevant to SIG Node) May 11, 2021