-
Notifications
You must be signed in to change notification settings - Fork 814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kubernetes: Add CPU and memory capacity reporting #2935
Conversation
FYI I'll update the tests. |
Update: tests are ready, please review. I am not familiar with your build system but I don't think the Travis CI failure is related to my change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, just one comment on metric names
|
||
tags = instance.get('tags', []) | ||
self.publish_gauge(self, NAMESPACE + '.cpu.capacity', float(num_cores), tags) | ||
self.publish_gauge(self, NAMESPACE + '.memory.capacity', float(memory_capacity), tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add node.
prefix to those metrics, so that we have kubernetes.node.*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @masci . Are you sure about this? The current name goes nicely with the other metrics:
kubernetes.cpu.limits
kubernetes.cpu.requests
kubernetes.cpu.capacity
kubernetes.memory.limits
kubernetes.memory.requests
kubernetes.memory.capacity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markine you're right, thanks for pointing out, I underestimated the number of metrics regarding nodes we already have. Hope we'll be able to refactor the naming scheme eventually, for now let's keep those as they are in your code!
Thank you @masci |
* Add patch from DataDog#2908 DataDog#2908 to beter handle units. * Port change from DataDog/dd-agent DataDog#2766 * Move machine info URL management into kubeutil * Update kubernetes tests for capacity data.
What does this PR do?
Add CPU and memory capacity reporting to the Kubernetes check.
This is a rebase of #2766 with a part of the code moved to KubeUtil.
Motivation
Monitoring of cluster resources from a scheduling/allocation perspective is important. E.g. over-allocation of CPU resources to deployments may cause pod scheduling failures even if actual CPU usage in the cluster is low. We want to get ahead of these errors by monitoring capacity data in DataDog.
Testing