Kubernetes: Add CPU and memory capacity reporting #2935

markine · 2016-10-19T16:34:49Z

What does this PR do?

Add CPU and memory capacity reporting to the Kubernetes check.

This is a rebase of #2766 with part of the code moved to KubeUtil.

Motivation

Monitoring cluster resources from a scheduling/allocation perspective is important. For example, over-allocating CPU to deployments can cause pod scheduling failures even when actual CPU usage in the cluster is low. We want to get ahead of these errors by monitoring capacity data in Datadog.
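
To make the over-allocation case concrete, here is a minimal sketch with made-up numbers. It assumes a simplified model in which a pod fits on a node only if its CPU request fits within the capacity not yet requested by other pods. The function names and values are hypothetical and not part of this PR.

```python
def cpu_headroom(capacity_cores, requested_cores):
    """Return the cores still available for scheduling on a node."""
    return capacity_cores - requested_cores


def can_schedule(capacity_cores, requested_cores, new_pod_request):
    """A pod fits only if its request fits in the unrequested capacity.

    Actual CPU usage does not enter into this decision.
    """
    return new_pod_request <= cpu_headroom(capacity_cores, requested_cores)


# Hypothetical node: 4 cores, 3.8 already requested, only 0.5 actually in use.
capacity, requested, used = 4.0, 3.8, 0.5
print(can_schedule(capacity, requested, 0.5))  # False
print(round(requested / capacity, 2))  # 0.95
```

With capacity reported alongside requests and limits, the ratio of requested to available resources can be graphed and alerted on before pods start failing to schedule.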

Testing

Unit tests updated.
Live-tested on Kubernetes 1.2.4 in AWS with this diff against dd-agent 5.9.1.

markine · 2016-10-19T16:51:19Z

FYI I'll update the tests.

markine · 2016-10-19T17:57:00Z

Update: tests are ready, please review. I am not familiar with your build system but I don't think the Travis CI failure is related to my change.

masci

This LGTM, just one comment on metric names

masci · 2016-10-21T14:12:09Z

checks.d/kubernetes.py

+
+        tags = instance.get('tags', [])
+        self.publish_gauge(self, NAMESPACE + '.cpu.capacity', float(num_cores), tags)
+        self.publish_gauge(self, NAMESPACE + '.memory.capacity', float(memory_capacity), tags)


Could you add a node. prefix to those metrics, so that we have kubernetes.node.*?

Thanks for the review, @masci. Are you sure about this? The current names fit nicely with the other metrics:

kubernetes.cpu.limits
kubernetes.cpu.requests
kubernetes.cpu.capacity
kubernetes.memory.limits
kubernetes.memory.requests
kubernetes.memory.capacity

@markine you're right, thanks for pointing that out. I underestimated the number of node-related metrics we already have. Hopefully we'll be able to refactor the naming scheme eventually; for now let's keep those as they are in your code!
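
For readers following the naming discussion, here is a rough sketch of how the two capacity gauges could be derived and named. It assumes the machine info payload is a dict with `num_cores` and `memory_capacity` keys (matching the variable names in the diff above). The `capacity_metrics` helper and the sample payload are hypothetical illustrations, not the code merged in this PR.

```python
NAMESPACE = 'kubernetes'


def capacity_metrics(machine_info, tags=None):
    """Build (metric_name, value, tags) tuples for node capacity.

    machine_info is assumed to carry 'num_cores' and 'memory_capacity'.
    """
    tags = list(tags or [])
    return [
        (NAMESPACE + '.cpu.capacity', float(machine_info['num_cores']), tags),
        (NAMESPACE + '.memory.capacity', float(machine_info['memory_capacity']), tags),
    ]


# Hypothetical payload for a 4-core node.
sample = {'num_cores': 4, 'memory_capacity': 16827344896}
for name, value, metric_tags in capacity_metrics(sample, ['env:test']):
    print(name, value, metric_tags)
```

The resulting names sit next to kubernetes.cpu.limits and kubernetes.cpu.requests, which is the consistency argument made in the thread.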

markine · 2016-10-21T20:38:24Z

Thank you @masci

* Add patch from DataDog#2908 to better handle units.
* Port change from DataDog/dd-agent DataDog#2766
* Move machine info URL management into kubeutil
* Update kubernetes tests for capacity data.

markine added 4 commits October 18, 2016 13:00

Add patch from DataDog#2908 to better handle units.

67726f6

Port change from DataDog/dd-agent DataDog#2766

453a2ac

Merge branch 'master' into feature/capacity-details

fe21097

Move machine info URL management into kubeutil

0cde805

markine mentioned this pull request Oct 19, 2016

RFC: Kubernetes: Add cpu and memory requests, limits, and capacity info. #2766

Closed

Update kubernetes tests for capacity data.

f0b6133

masci self-assigned this Oct 21, 2016

masci suggested changes Oct 21, 2016

masci added checks community improvement labels Oct 21, 2016

masci added this to the 5.10.0 milestone Oct 21, 2016

masci approved these changes Oct 21, 2016

masci merged commit de381f7 into DataDog:master Oct 21, 2016

markine deleted the feature/capacity-details branch March 22, 2017 17:19
