Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serve metrics independently of manager.Start #336

Closed
lilic opened this issue Feb 25, 2019 · 8 comments
Closed

Serve metrics independently of manager.Start #336

lilic opened this issue Feb 25, 2019 · 8 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@lilic
Copy link

lilic commented Feb 25, 2019

Currently metrics are only started to be served when manager.Start() is called. In operator-sdk we have a use case as we implemented our own leader election, so we block Start() until that pod becomes a leader. Due to this problem we are not getting metrics for all the pods. We do want to still like to collect metrics for all the running pods even if the manager is not started.

Proposed solution:

Add MetricsServingDisabled option to ovveride serving of metrics. Expose the serveMetrics

func (cm *controllerManager) serveMetrics(stop <-chan struct{}) {
that can be used by users who want to start serving metrics independently of the Start() method.

@lilic
Copy link
Author

lilic commented Feb 25, 2019

Opened #337 to demonstrate the above proposed solution.

@DirectXMan12
Copy link
Contributor

For reference, any particular reason why you implemented your own leader election vs CR's leader election? If there's something it doesn't do, that'd be good to know :-).

If you want to disable metrics serving, you should just be able to do MetricsPort: "0", though.

@lilic
Copy link
Author

lilic commented Feb 27, 2019

For reference, any particular reason why you implemented your own leader election vs CR's leader election? If there's something it doesn't do, that'd be good to know :-).

cc @shawn-hurley think he knows more about the reason behind that.

If you want to disable metrics serving, you should just be able to do MetricsPort: "0", though.

We want to use metrics serving from the controller-runtime, but want to start it decoupled from starting the whole thing.

@shawn-hurley
Copy link

For reference, any particular reason why you implemented your own leader election vs CR's leader election? If there's something it doesn't do, that'd be good to know :-).

We implemented what we are calling a "Leader For Life" approach. This uses the built-in GC of k8s so that when the leader dies it will remove the lock and then the next leader can be selected by grabbing the lock.

The use case that we're trying to solve IIRC is:

  1. A Leased approach cannot guarantee that two controllers are acting for some (admittedly small) amount of time (from my understanding).
  2. A simple way to see which pod is the leader by exposing the lock as a k8s resource that is easily understandable (points to the pod that is the leader).
  3. We were ok with some amount of downtime in certain scenarios that a leased approach would recover more gracefully.

@mhrivnak is this your understanding as well?

@mhrivnak
Copy link
Contributor

mhrivnak commented Mar 5, 2019

That's all correct. I wrote up a summary and basic implementation here: https://github.com/mhrivnak/leaderelection

I've been meaning to propose it for controller-runtime, but just haven't gotten around to it. So here we are! The implementation changed slightly when I added it to operator-sdk, but it's fundamentally the same.

From the README, I'll just point out this excerpt, which I think highlights the key benefits: "There is no possibility for multiple pods to think they are the leader at the same time. The leader does not need to renew a lease, consider stepping down, or do anything related to election activity once it becomes the leader."

@lilic
Copy link
Author

lilic commented Apr 8, 2019

As suggested in the #367 (comment) PR comment we want to instead use the .Runnable approach.

@DirectXMan12 you mentioned you want to have metrics enabled by default, so how do we want to disable them then. Any prefrence on the approach there?

@DirectXMan12
Copy link
Contributor

Set MetricsListenAddress to 0 on the mnnager, and then just pass the desired actual listen address directly to the runnable.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

No branches or pull requests

6 participants