Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Add component node-problem-detector #1384

Merged
merged 6 commits into from
Feb 25, 2021

Conversation

knrt10
Copy link
Member

@knrt10 knrt10 commented Feb 18, 2021

This adds component node-problem-detector. For more information
refer to https://github.com/kubernetes/node-problem-detector.

closes: #39
Signed-off-by: knrt10 [email protected]

TODO

  • Add tests
  • Add component to CI

How to use

Pass the following in your cluster config. To use service_monitor feature, make sure Prometheus Operator component is installed

component "node-problem-detector" {
  service_monitor = true
}

@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch 2 times, most recently from e6fceea to 5e6659e Compare February 19, 2021 05:41
@knrt10 knrt10 changed the title [WIP] Add component node-problem-detector Add component node-problem-detector Feb 19, 2021
@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch from 5e6659e to ecb0fff Compare February 19, 2021 05:43
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, otherwise LGTM.

I wonder if we could have some e2e test to ensure that this detector is doing it's thing and has privileges to modify Node object etc.

pkg/components/node-problem-detector/component.go Outdated Show resolved Hide resolved
pkg/components/node-problem-detector/component.go Outdated Show resolved Hide resolved
pkg/components/node-problem-detector/template.go Outdated Show resolved Hide resolved
docs/concepts/components.md Outdated Show resolved Hide resolved
@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch 2 times, most recently from 4840333 to 48184bf Compare February 22, 2021 06:31
@knrt10
Copy link
Member Author

knrt10 commented Feb 22, 2021

Updated the PR, writing e2e test, to test the working of component

Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we could squash almost everything into single commit. Maybe we could keep asset import in a separate one, as it's a lot of files.

@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch from 48184bf to 88efcb9 Compare February 23, 2021 04:32
@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch 2 times, most recently from 8206269 to 3cf6629 Compare February 23, 2021 09:12
invidian
invidian previously approved these changes Feb 23, 2021
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we have a way of testing custom monitors? Did you test it @knrt10 ?

Copy link
Member

@surajssd surajssd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this integrating with the prometheus stack?

I am trying to understand once this is installed now what? How does the user get any alerts?

@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch from 3cf6629 to 7a4854c Compare February 23, 2021 11:25
@knrt10
Copy link
Member Author

knrt10 commented Feb 23, 2021

@surajssd the component has Prometheus exporter. I will test it with our Prometheus component and update the PR

@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch 2 times, most recently from 10611af to 4ff9aff Compare February 23, 2021 12:34
@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch from 4ff9aff to 7fce619 Compare February 23, 2021 16:25
@knrt10
Copy link
Member Author

knrt10 commented Feb 23, 2021

@surajssd @invidian I have added a new knob service_monitor. I have tested the working of metrics, it works fine. So the use of components will be like this

  • User installs the component with service_monitor enabled then they can scrape metrics from prometheus

Screenshot 2021-02-23 at 9 56 27 PM

@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch from 7fce619 to b7e1bcb Compare February 23, 2021 16:44
@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch from b7e1bcb to ae9271a Compare February 23, 2021 16:46
@invidian
Copy link
Member

We can test that this metric can be scraped via e2e tests then.

This add monitoring test for node-problem-detector component.

Signed-off-by: knrt10 <[email protected]>
@knrt10 knrt10 force-pushed the knrt10/new-component-node-problem-detector branch from ae9271a to 99d4024 Compare February 24, 2021 07:53
@knrt10 knrt10 requested review from surajssd and invidian February 24, 2021 07:53
@surajssd surajssd added this to the v0.7.0 milestone Feb 24, 2021
Copy link
Member

@surajssd surajssd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :-)

@knrt10 knrt10 merged commit 87e9c0d into master Feb 25, 2021
@knrt10 knrt10 deleted the knrt10/new-component-node-problem-detector branch February 25, 2021 12:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add component node problem detector
3 participants