[integrations-core][kubernetes_state] Metrics update #853
Conversation
Good first iteration, please also update the metadata.csv
kubernetes_state/check.py
Outdated
@@ -68,6 +72,8 @@ def __init__(self, name, init_config, agentConfig, instances=None):
'kube_pod_container_status_running': 'container.running',
'kube_pod_container_status_terminated': 'container.terminated',
'kube_pod_container_status_waiting': 'container.waiting',
'kube_pod_container_resource_requests_nvidia_gpu_devices': 'container.gpu.resource_request',
Let's just rename them container.gpu.request / .limit, like for cpu and memory.
@@ -73,9 +82,9 @@ def assertMetricNotAllZeros(self, metric_name):

@mock.patch('checks.prometheus_check.PrometheusCheck.poll')
def test__update_kube_state_metrics(self, mock_poll):
    f_name = os.path.join(os.path.dirname(__file__), 'ci', 'fixtures', 'prometheus', 'protobuf.bin')
Please remove the protobuf payload if unused, we can still retrieve it from the git history
kubernetes_state/check.py
Outdated
tags = [self._format_tag(label.name, label.value) for label in metric.label]
self.gauge(metric_name, 0, tags)

def kube_cronjob_next_schedule_time(self, message, **kwargs):
Great use case; we must test during the RC phase that it recovers correctly (one could taint all the nodes with NoSchedule, make sure the cronjob goes critical, then remove the taint and observe it recovering).
You should add a message in the service check to compute the delay (now - scheduled)
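A minimal sketch of that suggestion, reusing the names that appear in the diff below (the message wording and the OK/CRITICAL mapping here are assumptions, not the final implementation):

import time

def kube_cronjob_next_schedule_time(self, message, **kwargs):
    # OK while the next scheduled run is still in the future, CRITICAL once it is
    # overdue, with the delay (now - scheduled) surfaced in the service check message.
    check_basename = self.NAMESPACE + '.cronjob.on_schedule_check'
    curr_time = int(time.time())
    for metric in message.metric:
        tags = [self._format_tag(label.name, label.value) for label in metric.label]
        on_schedule = int(metric.gauge.value) - curr_time
        if on_schedule >= 0:
            self.service_check(check_basename, self.OK, tags=tags)
        else:
            msg = "The cronjob run scheduled at {} is {}s overdue".format(int(metric.gauge.value), -on_schedule)
            self.service_check(check_basename, self.CRITICAL, tags=tags, message=msg)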
kubernetes_state/check.py
Outdated
@@ -231,6 +254,33 @@ def kube_pod_status_phase(self, message, **kwargs):
    status = self.pod_phase_to_status.get(phase, self.UNKNOWN)
    self.service_check(check_basename + phase, status, tags=tags)

def kube_cronjob_status_last_schedule_time(self, message, **kwargs):
Seems redundant with kube_cronjob_next_schedule_time, and the .delay name is misleading (I'd expect it to be positive if scheduling is lagging behind). I'd rather remove it unless we have a clear use case for this gauge.
Got it, I like the recommendation of adding a message in the SC kube_cronjob_next_schedule_time.
kubernetes_state/check.py
Outdated
@@ -326,6 +376,22 @@ def kube_node_spec_unschedulable(self, message, **kwargs):
    else:
        self.log.error("Metric type %s unsupported for metric %s" % (message.type, message.name))

def kube_pod_container_status_waiting_reason(self, message, **kwargs):
For these two metrics, I'm not sure whether we should tag by podname, or just increment a node counter. @hkaj what's your opinion on that?
We need the pod name, node name and reason for this metric. One reason a pod can be waiting is ErrImagePull; as a user I'd like to know if I fat-fingered the image name in my pod spec and all the pods of a same family are failing because of that.
I would also like to know if a node is making all its containers wait because it can't pull images or its containers are stuck in creation.
Now, how we expose this info is up for discussion. I think the metric name should be self.NAMESPACE + '.pod_container_status_waiting.' + reason. Reasons are finite and hardcoded here: https://github.com/kubernetes/kube-state-metrics/blob/b26cfb26fe5e858f8033a641a321abef25496c62/collectors/pod.go#L38 so it should be fine. If we add pods and nodes as tags we should be able to group/filter across the cluster quite easily. I'm open to other options though @CharlyF & @xvello
@hkaj I liked the option of not adding reason in the metric name so that one could monitor the metric reason and be alerted if there are more than X pods waiting because of {{reason}} on {{node.name}}.
Less cumbersome to set for a user imo.
Yeah that's a good point, then what do you think of removing .reason from the metric name? pod.containers_waiting with the reason tag?
So I think reason should remain in the name to distinguish the metric from the existing mapping: 'kube_pod_container_status_waiting': 'container.waiting'
hmm shouldn't we just add the reason tag to the existing metric then? Does the new one add any value?
I actually get the point. Indeed, the .reason metric will be at 1 only on the tag representing the reason, whereas the regular metric is at 1 whenever the container is waiting.
https://github.com/kubernetes/kube-state-metrics/blob/master/collectors/pod_test.go#L284-L292.
Now, I can't just remove kube_pod_container_status_waiting, and I don't know if using kube_pod_container_status_waiting_reason to override or even replace container.waiting is a good idea, because this metric only became available with 1.1.0, so people already monitoring the metric might start missing it. Thoughts?
could we keep the existing metric, but instead of adding the new one we match its pod with the existing metric and just add the reason as a tag to it?
The best solution I came up with is adding a meta crawl first: DataDog/dd-agent#3583
Why did I implement it this way:
The prometheus process_metric method goes through all the items in ignored, metrics_mapper and then getattr (i.e. methods in the k8s-state check with the metric name as the method name).
While processing the waiting_reason metric, we don't have access to the message of waiting (i.e. the pod name).
The approach is: if we see the new waiting_reason, populate the old metric name and add the reason as a tag.
We can't use a single boolean because we can't trust the order in which the metrics come from prometheus, and hence might miss the _reason and compute the old one first.
The solution:
We can't use one boolean for all the new metrics because waiting_reason might be exposed while terminated_reason is not, and we don't want to miss terminated.
Secondly, from my testing, the decision of whether to compute the new or old metrics has to be made before processing the metrics in the kube-state check, because getattr is non-blocking and metrics are missed otherwise.
Using a dict of meta metrics makes this re-usable should we want to process the prometheus output several times to associate more metadata with the underlying pods.
cc @xvello and @hkaj
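The real change lives in the sister dd-agent PR; the snippet below is only a hypothetical sketch of the "dict of meta metrics" pre-crawl idea (METADATA_METRICS and pre_crawl are illustrative names, not the actual implementation):

# Illustrative only: map the new prometheus metric names to the status they describe.
METADATA_METRICS = {
    'kube_pod_container_status_waiting_reason': 'waiting',
    'kube_pod_container_status_terminated_reason': 'terminated',
}

def pre_crawl(self, messages):
    # First pass over the scraped payload: remember which statuses expose a *_reason
    # metric, so the normal processing pass can decide whether to emit the legacy
    # metric, the new one, or both, regardless of the order metrics arrive in.
    self.reason_metrics_seen = set()
    for message in messages:
        status = METADATA_METRICS.get(message.name)
        if status is not None and len(message.metric) > 0:
            self.reason_metrics_seen.add(status)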
Force-pushed from 97d75ed to da26a02
check_basename = self.NAMESPACE + '.cronjob.on_schedule_check'
curr_time = int(time.time())
for metric in message.metric:
    on_schedule = int(metric.gauge.value) - curr_time
Just to be sure: there's no TZ issue with this?
There should not be a TZ issue as we rely on the epoch timestamp
kubernetes_state/check.py
Outdated
def kube_pod_container_status_waiting_reason(self, message, **kwargs):
    """ Reason a container is in a waiting state """
    # Capturing the pod name so the metric can be compared to kube_pod_container_status_waiting
    metric_name = self.NAMESPACE + '.pod_container_status_waiting.reason'
not sure about this metric name, see my comment above
Force-pushed from e158e97 to c53fa39
Sister PR: DataDog/dd-agent#3583
@hkaj I have rolled back my work on adding the reason tag to the legacy metric because of the following.
[Screenshot: run at T1 + 30 seconds]
As you can see, the .reason metric will be at 1 only for a few seconds and then drops back to 0, whereas the pod is still waiting, so the waiting metric stays at 1. I am implementing these back into the check as counts so they can be used along with the regular .waiting metrics, the point being to see, across the last hour, how many OOMs we have had (with the termination reason metric), etc.
cc: @xvello
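A minimal sketch of that count-based approach for the terminated case (the metric name here is assumed by symmetry with the .container.status_report.count.waiting name that appears later in the diff, and it assumes the base check exposes a count() helper):

def kube_pod_container_status_terminated_reason(self, message, **kwargs):
    metric_name = self.NAMESPACE + '.container.status_report.count.terminated'  # assumed name
    for metric in message.metric:
        tags = [self._format_tag(label.name, label.value) for label in metric.label]
        # Submitted as a count rather than a gauge so it can be summed over a time
        # window, e.g. "how many OOMKilled terminations in the last hour".
        self.count(metric_name, metric.gauge.value, tags=tags)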
kubernetes_state/metadata.csv
Outdated
@@ -34,6 +36,9 @@ kubernetes_state.limitrange.memory.max_limit_request_ratio,gauge,,,,Maximum memo
kubernetes_state.node.cpu_capacity,gauge,,cpu,,The total CPU resources of the node,0,kubernetes,k8s_state.node.cpu_capacity
kubernetes_state.node.memory_capacity,gauge,,byte,,The total memory resources of the node,0,kubernetes,k8s_state.node.memory_capacity
kubernetes_state.node.pods_capacity,gauge,,,,The total pod resources of the node,0,kubernetes,k8s_state.node.pods_capacity
kubernetes_state.node.gpu.cards_allocatable,gauge,,,The Nvidia GPU resources of a node that are available for scheduling,0,kubernetes,k8s_state.node.gpu.cards_allocatable
Let's remove the chameleon from our descriptions
@@ -273,6 +297,32 @@ def kube_pod_status_phase(self, message, **kwargs):
    status = self.pod_phase_to_status.get(phase, self.UNKNOWN)
    self.service_check(check_basename + phase, status, tags=tags)

def kube_pod_container_status_waiting_reason(self, message, **kwargs):
    metric_name = self.NAMESPACE + '.container.status_report.count.waiting'
Name LGTM, need to add it to the metadata though
kubernetes_state/check.py
Outdated
def kube_pod_container_status_waiting_reason(self, message, **kwargs):
    metric_name = self.NAMESPACE + '.container.status_report.count.waiting'
    for metric in message.metric:
        tags = [self._format_tag(label.name, label.value) for label in metric.label]
We should at least remove the container tag from these two metrics, see https://github.com/kubernetes/kube-state-metrics/blob/master/Documentation/pod-metrics.md
I'm still thinking even pod cardinality is overkill for that metric; I'd remove the pod label too, especially as we'll be submitting counts even on normal operations (terminated OK, waiting on ContainerCreating), and that'll spam a lot of contexts.
@hkaj can you chime in on that?
Let's rename container to kube_container_name to be consistent with the kubernetes integration.
Let's not submit the pod tag, and add a //TODO: in agent6, use the tagger to get pod creator tags.
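A hedged sketch of that tag handling (the helper name is hypothetical; the label names follow the kube-state-metrics pod metrics documentation linked above):

def _build_status_report_tags(self, metric):
    # Drop the pod label to keep cardinality down, and rename container to
    # kube_container_name for consistency with the kubernetes integration.
    tags = []
    for label in metric.label:
        if label.name == 'pod':
            # TODO(agent6): use the tagger to submit pod creator tags instead.
            continue
        name = 'kube_container_name' if label.name == 'container' else label.name
        tags.append(self._format_tag(name, label.value))
    return tags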
The //TODO should also be here so users know that we purposefully skipped it but it's coming.
Optimizations/nitpick, plus don't forget https://github.com/DataDog/integrations-core/pull/853/files#r151357487
Then we're good to go!
kubernetes_state/check.py
Outdated
@@ -172,6 +173,8 @@ def check(self, instance):
    self.job_succeeded_count = defaultdict(int)
    self.job_failed_count = defaultdict(int)

    self.whitelisted_reasons = {"waiting":["ErrImagePull"],"terminated":["OOMKilled","ContainerCannotRun","Error"]}
They'll be created at every run. You can put them as module constants at the top, like METRIC_TYPES. You should assign two separate arrays instead of a dict for performance.
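For instance, as module-level constants next to METRIC_TYPES (the names are illustrative; the values are copied from the dict in the diff above):

WHITELISTED_WAITING_REASONS = ['ErrImagePull']
WHITELISTED_TERMINATED_REASONS = ['OOMKilled', 'ContainerCannotRun', 'Error']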
kubernetes_state/check.py
Outdated
@@ -91,6 +91,7 @@ def __init__(self, name, init_config, agentConfig, instances=None):
'kube_replicationcontroller_status_replicas': 'replicationcontroller.replicas',
'kube_statefulset_replicas': 'statefulset.replicas_desired',
'kube_statefulset_status_replicas': 'statefulset.replicas',

let's remove that empty line
kubernetes_state/check.py
Outdated
tags.append(self._format_tag(label.name, label.value))
else:
reason = False
report = True
Do you mean skip_metric = True?
You can also start with skip_metric = True, then put it to False when encountering a valid reason. That way you don't need the else.
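A small sketch of that control flow for the waiting metric, assuming the WHITELISTED_WAITING_REASONS constant sketched above and the count-based submission discussed earlier:

for metric in message.metric:
    tags = [self._format_tag(label.name, label.value) for label in metric.label]
    skip_metric = True  # flipped only when a whitelisted reason is encountered
    for label in metric.label:
        if label.name == 'reason' and label.value in WHITELISTED_WAITING_REASONS:
            skip_metric = False
    if not skip_metric:
        self.count(metric_name, metric.gauge.value, tags=tags)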
:chameleon:
What does this PR do?
Adds new metrics from kubernetes_state.
Motivation
We have received requests from the community.
Testing Guidelines
I will add unit tests if the methods to collect data (e.g. terminated/waiting reason) are the way we choose to proceed.
Versioning
manifest.json
CHANGELOG.md. Please use Unreleased as the date in the title for the new section.