[receiver/kubeletstats] Add pod and container state metrics #29157
@sirianni, are you available to bring this issue/PR up at a SIG meeting?
Here is the Datadog Agent PR I referenced in today's SIG call; it mentions the scalability issues of using a cluster-wide collector (i.e., kube-state-metrics) for this data.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Why has this been closed? If there are scaling issues with the current solution, wouldn't it make sense to include it in the kubelet receiver?
FYI, this looks more like resource attributes than metrics, and we recently added the last terminated state to the cluster receiver. See #31282 for the discussion.
I'm not following. Resources are not first-class signals in OTel. They are only relevant when attached to a metric, log, or trace. Without a metric, how do you envision the state change actually being encoded and transmitted? Would it piggyback on an existing metric? How would it be queried? Resources are nothing but metric tags in most vendors' data models. This seems to be leaking vendor-specific semantics into the OTel model. Splunk seems to be pushing some odd semantics with …
IMO the resource model is well documented in OTel; it's not really a signal but the way OTel represents entities.
Also, Prometheus is working on supporting the resource model: https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/#native-support-for-resource-attributes
Yes, I am aware of that. But can you answer my specific questions? OTel does not specify a mechanism to track, encode, and export "state changes" to resource attributes as first-class things.
I don't follow. The resource attributes would change, and we transmit them with logs, traces, and metrics.
But which metric would you transmit them with? All of them? And how would this be queried? In most backends you can't query "resource attributes" as first-class things. Resource attributes are simply extra tags on metrics. This is what I meant by "piggybacking on an existing metric".
Description
Add the following state metrics to the `kubeletstats` receiver:
* `k8s.pod.state`
* `k8s.container.state`
* `k8s.container.last_termination_state`
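The description does not say how each state is encoded as a metric value. One common pattern, used for example by kube-state-metrics for pod phase, is a "state set": one data point per possible state, with value 1 for the current state and 0 for the rest. The Go sketch below illustrates that pattern for `k8s.container.state`, assuming the three Kubernetes container states (waiting, running, terminated) and an attribute named `state`; the `ContainerState` and `DataPoint` types and the `stateDataPoints` helper are hypothetical and are not the Collector's `pdata` API or this PR's actual implementation.

```go
package main

import "fmt"

// ContainerState models the three mutually exclusive states the Kubernetes
// API reports for a container. (Hypothetical type for illustration.)
type ContainerState string

const (
	StateWaiting    ContainerState = "waiting"
	StateRunning    ContainerState = "running"
	StateTerminated ContainerState = "terminated"
)

// DataPoint is a simplified, hypothetical stand-in for a gauge data point:
// an integer value plus a set of attributes.
type DataPoint struct {
	Value      int64
	Attributes map[string]string
}

// stateDataPoints emits one data point per possible state, setting the value
// to 1 for the current state and 0 for every other state ("state set" encoding).
func stateDataPoints(current ContainerState) []DataPoint {
	all := []ContainerState{StateWaiting, StateRunning, StateTerminated}
	points := make([]DataPoint, 0, len(all))
	for _, s := range all {
		var v int64
		if s == current {
			v = 1
		}
		points = append(points, DataPoint{
			Value:      v,
			Attributes: map[string]string{"state": string(s)},
		})
	}
	return points
}

func main() {
	// Prints:
	// k8s.container.state{state="waiting"} 0
	// k8s.container.state{state="running"} 1
	// k8s.container.state{state="terminated"} 0
	for _, dp := range stateDataPoints(StateRunning) {
		fmt.Printf("k8s.container.state{state=%q} %d\n", dp.Attributes["state"], dp.Value)
	}
}
```

Emitting the inactive states as explicit zeros (rather than omitting them) lets a backend count containers per state with a plain sum, which is the kind of first-class querying discussed in the thread above.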
Note
There is potential overlap here with the following metrics from `k8sclusterreceiver`:
* `k8s.pod.phase`
* `k8s.pod.status_reason`
* `k8s.container.ready`
The above metrics don't work well for our use case for two main reasons:
* … `k8sclusterreceiver` singleton. Collecting these metrics directly from the `kubelet` via a DaemonSet OTel Collector deployment pattern scales nicely with the size of the k8s cluster.
Testing
A new unit test was added using the `pods.json` test fixture. This was also tested manually using the `debug` exporter.
[Screenshot: `k8s.container.state` metric]
[Screenshot: `k8s.pod.state` metric]
My company is also running this code in production. Here are some screenshots showing how we use it in the Datadog UI.
Documentation
New metrics are added automatically to `documentation.md`.