Pod and container level io stats via cgroups #35218

Open · RainofTerra opened this issue Sep 16, 2024 · 3 comments

@RainofTerra

Component(s)

receiver/hostmetrics, receiver/kubeletstats

Is your feature request related to a problem? Please describe.

In the past we have used something like telegraf with an iostats plugin to monitor system-wide I/O statistics (IOPS, throughput, etc.) on servers running high-I/O services (like our internal datastore, or Kafka). In Kubernetes (we're using EKS) that data is available at the various cgroup levels via io.stat.
Pod level:

[root@ip-1-2-3-4 kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice]# pwd
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice
[root@ip-1-2-3-4 kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice]# cat io.stat
259:0 rbytes=158363648 wbytes=0 rios=8977 wios=0 dbytes=0 dios=0

Container level:

[root@ip-1-2-3-4 cri-containerd-a796269837fbc314e36e5d0b1997e558548c68bfc3c6819fec00d49abb9b4d90.scope]# pwd
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice/cri-containerd-a796269837fbc314e36e5d0b1997e558548c68bfc3c6819fec00d49abb9b4d90.scope
[root@ip-1-2-3-4 cri-containerd-a796269837fbc314e36e5d0b1997e558548c68bfc3c6819fec00d49abb9b4d90.scope]# cat io.stat
259:0 rbytes=135905280 wbytes=0 rios=7120 wios=0 dbytes=0 dios=0
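
Each io.stat line is keyed by device major:minor and reports rbytes/wbytes (bytes read/written), rios/wios (read/write operations), and dbytes/dios (discarded bytes/operations). As a rough illustration of how this could be gathered per pod and per container, here is a minimal shell sketch (assuming cgroup v2 with the systemd cgroup driver, matching the paths above; this is not part of any existing receiver):

# Walk every cgroup under kubepods.slice (QoS class, pod, and container levels)
# and print its io.stat counters, prefixed by the owning cgroup path.
find /sys/fs/cgroup/kubepods.slice -name io.stat | while read -r f; do
    echo "== ${f%/io.stat}"
    cat "$f"
done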

Describe the solution you'd like

It would be useful to be able to take something like system.disk.operations and group it by pod name and container name. Currently we can only get it for the overall node. This would let us do things like monitor the I/O of individual containers (we have both a reader and a writer container, and we'd like to be able to see their I/O separately).
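
For attribution, note that the pod UID is embedded in the cgroup directory name shown above (dashes in the UID become underscores in the slice name), so kubepods-burstable-podcfff9e92_9e21_41e7_b59e_59dfaeca3c2b.slice corresponds to pod UID cfff9e92-9e21-41e7-b59e-59dfaeca3c2b, and the cri-containerd-….scope name carries the container ID. A quick, purely illustrative way to map such a UID back to a pod name (assuming kubectl access to the cluster):

# List "<uid> <namespace>/<name>" for all pods and grep for the UID from the slice name.
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.uid}{" "}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}' \
    | grep cfff9e92-9e21-41e7-b59e-59dfaeca3c2b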

Describe alternatives you've considered

No response

Additional context

No response

@RainofTerra added the enhancement and needs triage labels Sep 16, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth added the help wanted label and removed the needs triage label Sep 16, 2024
@ChrsMark (Member)

> It would be useful to be able to take something like system.disk.operations and group it by pod name and container name. Currently we can only get it for the overall node. This would let us do things like monitor the I/O of individual containers (we have both a reader and a writer container, and we'd like to be able to see their I/O separately).

If I understand this correctly, the proposal is to emit a metric called system.disk.operations with proper container and k8s metadata as attributes?

My concern here is that we should first come up with a valid data model. At the moment the system.* namespace is supposed to be used for metrics that relate to a system/host/VM as a whole, and the process.* namespace is used for per-process metrics. So in that case I assume we should emit per-container/per-pod metrics, right?

On another note, I wonder if this metric could come directly from scraping cAdvisor's Prometheus endpoint: https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md#prometheus-container-metrics. In that case it would already be possible using the prometheus receiver?
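
One way to check whether that already covers the per-container disk I/O counters (illustrative commands, assuming kubectl access; the kubelet proxies the embedded cAdvisor's Prometheus metrics under /metrics/cadvisor):

# Hypothetical node name; look for per-container disk I/O series such as
# container_fs_reads_total / container_fs_writes_total (and their *_bytes_total variants).
NODE=ip-1-2-3-4.ec2.internal
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/metrics/cadvisor" | grep -E '^container_fs_(reads|writes)'

If those series are populated for the workloads in question, the prometheus receiver could scrape the same endpoint without a new scraper.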

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions bot added the Stale label Nov 18, 2024