-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/kubeletstats] Add new percent-based cpu and memory metrics #25835
[receiver/kubeletstats] Add new percent-based cpu and memory metrics #25835
Conversation
@dmitryax a pretty significant amount of these changes are generated files and tests. If it is too large I believe I could breakout the |
@@ -185,6 +185,13 @@ metrics: | |||
gauge: | |||
value_type: double | |||
attributes: [ ] | |||
k8s.pod.cpu.usagePercent: | |||
enabled: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we enable these by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New metrics should be added as optional initially. See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/scraping-receivers.md#changing-the-emitted-metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the naming. I don't believe we have camelCase
recommended anywhere in sem conv. The spec also recommends using [0,1] ratio instead. I was thinking of making it configurable generally in the metrics builder tho. e.g via scale
option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean update mdatagen to allow the cpu.utilization and memory.usage to switch back and forth between the raw value and percentage value? Id prefer multiple metrics as it is useful to know both the exact value and how that compares to the limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to any name changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya I agree using feature gates and waiting for the spec finalization is important here. @mx-psi has the topic of utilization vs usage come up in the system metrics conversations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitryax since this is not a breaking change for memory, how about I close this PR and open one only to add memory.utilization? Then I'll open another for CPU with feature gates and we can have the hard discussion about the impacts and breaking changes there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the utilization
metrics will require additional k8s API calls, we should keep them optional going forward along with cpu.utilization
. So we can probably start by disabling them before applying the feature gate. Changing optional metrics is less disruptive.
If it makes sense to you, the first PR would be to add an if_enabled_not_set
warning to the cpu utilization metrics with a message like "This metric will be disabled soon. After that, it'll be changed to report the ratio of usage/limit applied via a feature gate. Use container.cpu.usage metric instead."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please take a look at the metrics dockerstats metrics. We need to be consistent with them
Sorry, I missed #24905 when it was submitted. Let me think more about it and reply there |
adae910
to
47e4652
Compare
Closing this based on our conversation. Will open more PRs to work towards proper names. |
Description:
This PR adds 4 new metrics to represent pod and container cpu and memory consumption as a percentage of the set limits. The metric is only emitted for pods/containers that have defined resource limits. It takes advantage of the pod metadata to acquire the container limits. A pod limit is computed as the sum of all the container limits and if any container limit is zero or undefined the pod limit is also considered undefined.
Link to tracking Issue:
Closes #24905
Testing:
Added unit tests. Tested locally using the otel demo: