Pod and container level io stats via cgroups #35218

Open · RainofTerra opened this issue Sep 16, 2024 · 3 comments

@RainofTerra

Component(s)

receiver/hostmetrics, receiver/kubeletstats

Is your feature request related to a problem? Please describe.

In the past we have used something like telegraf with an iostats plugin to monitor system-wide I/O statistics (IOPS, throughput, etc.) on servers running high-I/O services (like our internal datastore, or Kafka). In Kubernetes (we're using EKS) that data is available at the various cgroup levels via io.stat.
Pod level:

[root@ip-1-2-3-4 kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice]# pwd
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice
[root@ip-1-2-3-4 kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice]# cat io.stat
259:0 rbytes=158363648 wbytes=0 rios=8977 wios=0 dbytes=0 dios=0

Container level:

[root@ip-1-2-3-4 cri-containerd-a796269837fbc314e36e5d0b1997e558548c68bfc3c6819fec00d49abb9b4d90.scope]# pwd
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice/cri-containerd-a796269837fbc314e36e5d0b1997e558548c68bfc3c6819fec00d49abb9b4d90.scope
[root@ip-1-2-3-4 cri-containerd-a796269837fbc314e36e5d0b1997e558548c68bfc3c6819fec00d49abb9b4d90.scope]# cat io.stat
259:0 rbytes=135905280 wbytes=0 rios=7120 wios=0 dbytes=0 dios=0
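
Each io.stat line is keyed by device major:minor and reports rbytes/wbytes (bytes read/written), rios/wios (read/write operations), and dbytes/dios (discarded bytes/operations). As a rough illustration of how this could be gathered per pod and per container, here is a minimal shell sketch (assuming cgroup v2 with the systemd cgroup driver, matching the paths above; this is not part of any existing receiver):

# Walk every cgroup under kubepods.slice (QoS class, pod, and container levels)
# and print its io.stat counters, prefixed by the owning cgroup path.
find /sys/fs/cgroup/kubepods.slice -name io.stat | while read -r f; do
    echo "== ${f%/io.stat}"
    cat "$f"
done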

Describe the solution you'd like

It would be useful to be able to take something like system.disk.operations and group it by pod name and container name. Currently we can only get it for the overall node. This would let us do things like monitor the I/O of individual containers (we have both a reader and a writer container, and we'd like to be able to see their I/O separately).
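
For attribution, note that the pod UID is embedded in the cgroup directory name shown above (dashes in the UID become underscores in the slice name), so kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice corresponds to pod UID cfff9e92-9e21-41e7-b59e-59dfaeca3c2b, and the cri-containerd-….scope name carries the container ID. A quick, purely illustrative way to map such a UID back to a pod name (assuming kubectl access to the cluster):

# List "<uid> <namespace>/<name>" for all pods and grep for the UID from the slice name.
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.uid}{" "}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}' \
    | grep cfff9e92-9e21-41e7-b59e-59dfaeca3c2b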

Describe alternatives you've considered

No response

Additional context

No response

@RainofTerra added the enhancement and needs triage labels Sep 16, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth added the help wanted label and removed the needs triage label Sep 16, 2024
@ChrsMark (Member)

> It would be useful to be able to take something like system.disk.operations and group it by pod name and container name. Currently we can only get it for the overall node. This would let us do things like monitor the I/O of individual containers (we have both a reader and a writer container, and we'd like to be able to see their I/O separately).

If I understand this correctly, the proposal is to emit a metric called system.disk.operations with proper container and k8s metadata as attributes?

My concern here is that we should first come up with a valid data model. At the moment the system.* namespace is supposed to be used for metrics that relate to a system/host/VM as a whole, and the process.* namespace is used for per-process metrics. So in that case I assume we should emit per-container/per-pod metrics, right?

On another note, I wonder if this metric could come directly from scraping cAdvisor's Prometheus endpoint: https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics. In that case it would already be possible using the prometheus receiver?
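
One way to check whether that already covers the per-container disk I/O counters (illustrative commands, assuming kubectl access; the kubelet proxies the embedded cAdvisor's Prometheus metrics under /metrics/cadvisor):

# Hypothetical node name; look for per-container disk I/O series such as
# container_fs_reads_total / container_fs_writes_total (and their *_bytes_total variants).
NODE=ip-1-2-3-4.ec2.internal
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/metrics/cadvisor" | grep -E '^container_fs_(reads|writes)'

If those series are populated for the workloads in question, the prometheus receiver could scrape the same endpoint without a new scraper.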

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions bot added the Stale label Nov 18, 2024