Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/kubeletstats] Add new percent-based cpu and memory metrics #25835

Conversation

TylerHelmuth
Copy link
Member

@TylerHelmuth TylerHelmuth commented Aug 15, 2023

Description:
This PR adds 4 new metrics to represent pod and container cpu and memory consumption as a percentage of the set limits. The metric is only emitted for pods/containers that have defined resource limits. It takes advantage of the pod metadata to acquire the container limits. A pod limit is computed as the sum of all the container limits and if any container limit is zero or undefined the pod limit is also considered undefined.

Link to tracking Issue:

Closes #24905

Testing:

Added unit tests. Tested locally using the otel demo:

Screenshot 2023-08-16 at 2 25 43 PM

@TylerHelmuth
Copy link
Member Author

@dmitryax a pretty significant amount of these changes are generated files and tests. If it is too large I believe I could breakout the kubelet/metadata.go changes into their own PR.

@@ -185,6 +185,13 @@ metrics:
gauge:
value_type: double
attributes: [ ]
k8s.pod.cpu.usagePercent:
enabled: false
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we enable these by default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the naming. I don't believe we have camelCase recommended anywhere in sem conv. The spec also recommends using [0,1] ratio instead. I was thinking of making it configurable generally in the metrics builder tho. e.g via scale option

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean update mdatagen to allow the cpu.utilization and memory.usage to switch back and forth between the raw value and percentage value? Id prefer multiple metrics as it is useful to know both the exact value and how that compares to the limit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am open to any name changes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I agree using feature gates and waiting for the spec finalization is important here. @mx-psi has the topic of utilization vs usage come up in the system metrics conversations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax since this is not a breaking change for memory, how about I close this PR and open one only to add memory.utilization? Then I'll open another for CPU with feature gates and we can have the hard discussion about the impacts and breaking changes there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the utilization metrics will require additional k8s API calls, we should keep them optional going forward along with cpu.utilization. So we can probably start by disabling them before applying the feature gate. Changing optional metrics is less disruptive.

If it makes sense to you, the first PR would be to add an if_enabled_not_set warning to the cpu utilization metrics with a message like "This metric will be disabled soon. After that, it'll be changed to report the ratio of usage/limit applied via a feature gate. Use container.cpu.usage metric instead."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please take a look at the metrics dockerstats metrics. We need to be consistent with them

@dmitryax
Copy link
Member

Sorry, I missed #24905 when it was submitted. Let me think more about it and reply there

@TylerHelmuth TylerHelmuth marked this pull request as ready for review August 16, 2023 20:26
@TylerHelmuth TylerHelmuth requested a review from a team August 16, 2023 20:26
@TylerHelmuth TylerHelmuth force-pushed the kubeletstats-percentage-metrics branch from adae910 to 47e4652 Compare August 17, 2023 14:46
@TylerHelmuth
Copy link
Member Author

Closing this based on our conversation. Will open more PRs to work towards proper names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants