Support consumers to access the metrics through Kubernetes metrics API #279

Closed
ihac opened this issue May 20, 2019 · 14 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


ihac commented May 20, 2019

Hi @xueweiz @Random-Liu

Based on the proposal you shared recently, I'd like to share a typical use case in Alibaba. Hopefully it will help to perfect your design.

We all agree that reporting metrics has become an essential requirement for running NPD in production. NPD should be able to transform node status into metrics and expose them to Prometheus or any other external monitoring system. What's more, we expect to do more in the upper layers based on those metrics. One typical scenario: when something goes wrong on a node, we want to remove the node from the Kubernetes cluster and then repair it with our internal IDC management system. To solve this kind of problem, we'd like to extend NPD to collect and expose custom metrics, and also develop an NPD-specific implementation of the Kubernetes metrics API (named npd-metrics-adapter) which allows consumers (e.g. a remedy system) to access the metrics through the Kubernetes apiserver.
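Just to illustrate the consumer side (the metric name and node name below are placeholders, not our actual implementation), once the adapter registers a node-scoped metric with the custom metrics API, a remedy system can read it straight from the apiserver:

# hypothetical query against the custom metrics API served by npd-metrics-adapter
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/nodes/node-1/problem_counter"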

One thing to note though: sometimes we want to report custom metrics that describe a Kubernetes object other than the node, which places a higher demand on the abstraction of the underlying data model in NPD. I'm not sure whether your design is able to cover this need.

In fact, we've already implemented a simple but workable version of the extended NPD and the npd-metrics-adapter. Our design is in close agreement with your proposal: we proposed an abstract data model in NPD to cover all kinds of node status (events, conditions and simple metrics) and a unified exporter interface to adapt to multiple upstream systems (e.g. Kubernetes, Prometheus). But I think you've done a remarkable job by utilizing the OpenCensus library to support flexible metrics collection and aggregation. That is very close to the ability we want: letting users collect and export arbitrary metrics with NPD, so that our upstream systems (e.g. HPA controller, remedy system) can accomplish more complex tasks.

FYI, the figure below demonstrates how our system works based on the extended NPD and npd-metrics-adapter. We would greatly appreciate it if you could give us some feedback.

[Figure: npd-metrics-adapter-v2 (architecture of the extended NPD and npd-metrics-adapter)]

xueweiz (Contributor) commented May 20, 2019

@ihac Hi An, thanks for reviewing our proposal! I noted down some thoughts below.

(One thing I should mention first is that my experience is mostly on the node OS side (Container-Optimized OS); Lantao has much more experience with the Kubernetes ecosystem.)

we'd like to extend NPD to collect and expose custom metrics, and also develop an NPD-specific implementation of the Kubernetes metrics API (named npd-metrics-adapter) which allows consumers (e.g. a remedy system) to access the metrics through the Kubernetes apiserver.

It seems that npd-metrics-adapter sits on a different node from NPD. I wonder why not let NPD directly report to the apiserver using the metrics API? That would mean fewer dependencies for NPD and fewer single points of failure.

I'm happy to see that the metrics exported by NPD can be consumed by the apiserver as well. I totally agree that there could/should be an exporter that can export NPD metrics to the apiserver somehow. That way, even users who do not have a third-party metrics-based monitoring solution (Stackdriver, Datadog, etc.) can still have metrics-based monitoring support from Kubernetes.

I'm not sure what the right way to implement that is, but if we can find a good way, I think this kind of exporter will be valuable.

One thing to note though: sometimes we want to report custom metrics that describe a Kubernetes object other than the node, which places a higher demand on the abstraction of the underlying data model in NPD.

I'm not sure if this is the right purpose for NPD. NPD is designed to monitor only node health, for many reasons. One that I really care about is that allowing NPD to monitor other things (e.g. pods) may make NPD too big and complex, which makes it harder to guarantee its resource consumption and reliability.

Can you describe what kind of object you have in mind? In my opinion:

  • If it's "disk" or "GPU", then we may want to cover them (although I doubt this requires special changes to NPD's core data representation).
  • If it's "pod", then probably not.

I would also like to hear opinions from the current maintainers: @dchen1107 @andyxning @Random-Liu

we've already implemented a simple but workable version of extended-NPD and also the npd-metrics-adapter.

If the code is open sourced, could you share the extended-NPD code? I'd like to learn from the abstract data model and the unified exporter interface. They may help me improve my current PR #275. Thanks!

ihac (Author) commented May 21, 2019

Hi @xueweiz, thanks for the feedback.

It seems that npd-metrics-adapter sits on a different node from NPD. I wonder why not let NPD directly report to the apiserver using the metrics API? That would mean fewer dependencies for NPD and fewer single points of failure.

We're using npd-metrics-adapter to store the metrics in memory, to save the apiserver from handling such heavy traffic. I believe that's also why Metrics Server and the Microsoft Azure Adapter were developed.

I totally agree that there could/should be an exporter that can export NPD metrics to the apiserver somehow. That way, even users who do not have a third-party metrics-based monitoring solution (Stackdriver, Datadog, etc.) can still have metrics-based monitoring support from Kubernetes.

Couldn't agree more.

Can you describe what kind of object you have in mind? In my opinion:

Sorry for not being clear. We agree that NPD specializes in detecting and reporting node problems, and we're not going to handle pod metrics with NPD. However, some problems are related to the pods/containers running on the node, and we want to expose some basic information about those pods/containers if possible. Of course, the problem itself still lies within the node ecosystem, but it would be better if we could attach the extra information to it.

If the code is open sourced, could you share the extended-NPD code? I'd like to learn from the abstract data model and the unified exporter interface. They may help me improve my current PR #275. Thanks!

Currently it is for internal use and we're still trying to improve it (actually, we briefly discussed it with @Random-Liu a few days ago). Anyway, we do plan to share it soon (hopefully it will help), along with the npd-metrics-adapter.

@CodeJuan

I'm not sure if this is the right purpose for NPD. NPD is designed to monitor only node health, for many reasons. One that I really care about is that allowing NPD to monitor other things (e.g. pods) may make NPD too big and complex, which makes it harder to guarantee its resource consumption and reliability.

@xueweiz Thank you for your prompt reply. I totally agree that the goal of NPD is detecting node problems.
Here is a case: I found that the container end of a veth pair leaked to the host, caused by a kernel bug. If NPD could detect it and report the name of the Pod to the upstream system, that would be more helpful.

xueweiz (Contributor) commented May 23, 2019

@ihac

However, some problems are related to the pods/containers running on the node, and we want to expose some basic information about those pods/containers if possible. Of course, the problem itself still lies within the node ecosystem, but it would be better if we could attach the extra information to it.

I see. Thanks for the explanation! I think NPD has the capability to support this.
For example, I'm attaching the disk device name (e.g. "/dev/sda1") as a metric label to three metrics (disk/avg_queue_len, disk/weighted_io and disk/io_time) in PR #275.
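Just to illustrate how such a label surfaces to consumers (the endpoint variable and the sample values below are hypothetical; only the device-label idea comes from PR #275), the Prometheus exposition output would look roughly like this:

# assuming NPD serves Prometheus metrics at some endpoint $NPD_ENDPOINT
curl -s "$NPD_ENDPOINT/metrics" | grep '^disk_io_time'
# disk_io_time{device="/dev/sda1"} 123456
# disk_io_time{device="/dev/sdb1"} 98765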

Currently it is for internal use and we're still trying to improve it...we do have a plan to share it soon (hopefully it will help), along with the npd-metrics-adapter.

That would be great :) Thanks a lot!

And just for the reference. In my new proposal, I plan to allow flexible problem daemon registration in NPD. See the "More Pluggable Problem Daemon" section.

So in case your code has private API that is not suitable for open source NPD project, there is still an option to maintain a private pluggable problem daemon in a downstream-NPD while keeping rebasing pretty easy.

xueweiz (Contributor) commented May 23, 2019

@CodeJuan
Hi Juan,

Here is a case: I found that the container end of a veth pair leaked to the host, caused by a kernel bug. If NPD could detect it and report the name of the Pod to the upstream system, that would be more helpful.

Sounds fair. In today's NPD model, I guess you would report an event, and provide the Pod name in the message field.
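For instance (the reason and message below are made up, just to show how the pod name could ride along in the event):

kubectl get events --field-selector involvedObject.kind=Node,reason=VethPairLeak
# LAST SEEN   TYPE      REASON         OBJECT        MESSAGE
# 2m          Warning   VethPairLeak   node/node-1   leaked veth vethab12cd3 (pod: hello-pod)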

Do you want to report metrics for this problem as well? If so, what do you want the metric to look like?

In my mind, the metric could look like this:
problem_counter{"problem": "veth-pair-leak"} 17

Or it could look like this:
problem_counter{"problem": "veth-pair-leak", "pod-name": "hello-pod"} 3
problem_counter{"problem": "veth-pair-leak", "pod-name": "good-bye-pod"} 14

By the way, just curious, could you share some context about this "container end of veth pair leak to host caused by a kernel bug"? Thanks! For example, how do you detect it (by watching kernel log? docker log?)? Is it temporary or permanent?


CodeJuan commented May 23, 2019

Hi xuewei,

By the way, just curious, could you share some context about this "container end of veth pair leak to host caused by a kernel bug"? Thanks! For example, how do you detect it (by watching kernel log? docker log?)? Is it temporary or permanent?

I'm sorry for my inaccurate description. There are actually two cases. It's hard to put into words, so let me show some pseudocode (sketched here as shell; the container-to-veth lookup is left as a comment, since the exact method doesn't matter):

  1. Leaked veth

# collect the host-end veth of every running container
veths_in_use=""
for c in $(docker ps -q); do
    # look up the host-end veth of container $c and append it to veths_in_use
    :
done

# collect every veth attached to the bridge
veths_host=$(brctl show | grep -o 'veth[^[:space:]]*')

# compare the two lists: a veth that exists in veths_host but not in
# veths_in_use should be a leaked veth
for v in $veths_host; do
    echo "$veths_in_use" | grep -qw "$v" || echo "leaked veth: $v"
done

  2. Unattached veth

# get all veth interfaces on the host
veths=$(ip -o link show type veth | awk -F': ' '{print $2}' | cut -d@ -f1)

for v in $veths; do
    if [[ $(brctl show | grep -c "$v") -eq 0 ]]; then
        # unattached veth: not plugged into any bridge
        echo "unattached veth: $v"
    fi
done

In my mind, the metric could look like this:
problem_counter{"problem": "veth-pair-leak"} 17

Or it could look like this:
problem_counter{"problem": "veth-pair-leak", "pod-name": "hello-pod"} 3
problem_counter{"problem": "veth-pair-leak", "pod-name": "good-bye-pod"} 14

The second form is clearer. The remedy system could then re-attach the veth to the container, or migrate the container, based on the pod-name. But I'm not sure counter is the appropriate type to represent a problem; how about gauge? Prometheus defines a gauge as a metric that represents a single numerical value that can arbitrarily go up and down. If the problem is detected, the metric would be 1; if the problem is remedied by the remedy system, the metric would be 0.
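With a gauge (hypothetical metric name, mirroring the counter examples above), the exposed series would simply flip between 1 and 0:

problem_gauge{"problem": "veth-pair-leak", "pod-name": "hello-pod"} 1
# ...after the remedy system re-attaches the veth...
problem_gauge{"problem": "veth-pair-leak", "pod-name": "hello-pod"} 0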

xueweiz (Contributor) commented May 23, 2019

But I'm not sure counter is the appropriate type to represent a problem; how about gauge?

We do plan to support Gauge typed metrics. See the "Extending Today’s Problem Daemon to Report Metrics" section of the proposal.

Specifically:

And we plan to support some more customized behaviors using the config file. For example:

  • Report the metric as gauge, rather than counter.
  • Allow user to change the metric name.
  • ...

@CodeJuan

@xueweiz
Aha, it's a bit difficult for me to access Google Docs right now. I'll read the proposal again later.

xueweiz (Contributor) commented May 23, 2019

Oh, I didn't realize that :)
I've attached the current version of the proposal as a PDF to this comment. Hopefully that helps.

If you have any ideas, it would be best if you could comment on the Google Doc, so that everyone can participate in the discussion.

Node-Problem-Detector_ Metrics Exporting Option (EXTERNAL).pdf

@wangzhen127 (Member)

/cc @wangzhen127

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 28, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
