
Move away from encouraging Deployment for additional schedulers #12785

Closed

Adirio opened this issue Feb 22, 2019 · 8 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

Adirio commented Feb 22, 2019

This is a...
/sig scheduling
/kind feature

Problem:
Currently, Tasks/Administer a Cluster/Configure multiple schedulers (source) suggests deploying additional schedulers with a Deployment and disabling leader election.

Rolling updates of a Deployment can mean that multiple replicas of the scheduler run concurrently, and with leader election turned off they can interfere with each other.
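For reference, the pattern the page encourages looks roughly like this (a minimal sketch; the `my-scheduler` name, image, and flags loosely follow the docs page's example, the rest is trimmed for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  replicas: 1
  # No strategy is given, so the default RollingUpdate applies: during an
  # update the old and the new scheduler pod can briefly run at the same time.
  selector:
    matchLabels:
      component: scheduler
  template:
    metadata:
      labels:
        component: scheduler
    spec:
      containers:
      - name: kube-scheduler
        image: gcr.io/my-gcp-project/my-kube-scheduler:1.0
        command:
        - /usr/local/bin/kube-scheduler
        - --scheduler-name=my-scheduler
        - --leader-elect=false     # the setting this issue is about
```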

Proposed Solution:
A StatefulSet, or a Deployment with the Recreate deployment strategy, would solve this issue. I personally prefer a StatefulSet (a sketch follows the list) as it:

  1. Doesn't allow swapping the deployment strategy type afterwards.
  2. Avoids creating an additional abstraction layer (the ReplicaSet).
  3. Enforces the use of persistent volumes if any volume is needed, which I think should be encouraged.
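A minimal sketch of the StatefulSet variant (names and image are placeholders carried over from the Deployment example above; a StatefulSet deletes a pod before recreating it, so two copies never overlap):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  serviceName: my-scheduler   # StatefulSets require a (headless) Service name
  replicas: 1
  selector:
    matchLabels:
      component: scheduler
  template:
    metadata:
      labels:
        component: scheduler
    spec:
      containers:
      - name: kube-scheduler
        image: gcr.io/my-gcp-project/my-kube-scheduler:1.0
        command:
        - /usr/local/bin/kube-scheduler
        - --scheduler-name=my-scheduler
        - --leader-elect=false
```

The Deployment alternative would only need `spec.strategy.type: Recreate` to get the same delete-before-create behaviour.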

Page to Update:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/#define-a-kubernetes-deployment-for-the-scheduler

k8s-ci-robot added the sig/scheduling and kind/feature labels on Feb 22, 2019
sftim commented Feb 26, 2019

If this guide were to recommend enabling leader election, and explained why, would that make using a Deployment suitable / acceptable?

Adirio commented Feb 26, 2019

In my opinion, leader election set to false with a single replica is the right approach, as the other way round would only mean additional overhead without any gain. In this case I don't see a Deployment fitting.

On the other hand, if multiple replicas of the scheduler were to be deployed, leader election should be set to true. With leader election enabled, the Recreate-strategy argument is no longer valid. The second and third arguments are not that important, so the difference between a StatefulSet and a Deployment would be greatly reduced.

So for an HA scheduler with multiple replicas, both StatefulSet and Deployment would be valid, while for single-instance schedulers without leader election a StatefulSet should be preferred. I would also like to mention that for an HA replicated scheduler, a podAntiAffinity against itself should be set in order to avoid co-locating multiple replicas.
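Such an anti-affinity rule could look like this inside the scheduler's pod template (illustrative; the label selector is whatever labels the scheduler pods actually carry):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          component: scheduler
      topologyKey: kubernetes.io/hostname   # never two replicas on one node
```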

sftim commented Feb 26, 2019

It sounds like we could decide between:

  • illustrating using multiple kinds of scheduler, leaving aside topics such as HA so as to keep the page simple
    • and maybe signpost the reader to the HA topic for further reading
  • demonstrating a recommended / good practice approach: multiple replicas for each controller type, leader election, PodAntiAffinity, etc.

and then tweak the document based on which approach we favor. Does that sound reasonable?

Adirio commented Feb 26, 2019

I was sticking to the simple approach, as that document is usually the starting point for additional-scheduler setups. In the simplest case, a single instance with leader election disabled, as it is now, is enough. But with leader election turned off, a Deployment's default rolling updates are dangerous, and that's why I suggested changing to a StatefulSet.

We could actually also add a comment at the end mentioning that scheduler resilience can be achieved with multiple replicas, leader election turned on, and pod anti-affinity to itself. In this complex case StatefulSets do not offer a lot over Deployments, but they are also a nice fit, so I would not mention switching to a Deployment in this case (keeping it simple).
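For that closing comment, the HA variant would only change a few fields relative to the single-instance manifest above (a sketch showing just the deltas, under the same placeholder names):

```yaml
spec:
  replicas: 3                  # resilience through redundancy
  template:
    spec:
      affinity:
        podAntiAffinity:       # the rule sketched in the earlier comment
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                component: scheduler
            topologyKey: kubernetes.io/hostname
      containers:
      - name: kube-scheduler
        command:
        - /usr/local/bin/kube-scheduler
        - --scheduler-name=my-scheduler
        - --leader-elect=true  # only the elected leader schedules pods
```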

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on May 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jun 26, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
