The design doc of PodGroup Phase in Status. #533
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: k82cn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@k82cn: GitHub didn't allow me to request PR reviews from the following users: bsalamat, Zyqsempai, JeffWan, MaciekPytel. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
doc/design/podgroup-status.md (Outdated)
* there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.

For the first scenario, the Cluster-Autoscaler needs to check the PodGroup's `status.State` to know whether there are sufficient resources to run the whole group of pods.
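As a rough illustration of that check, here is a minimal sketch; the `PodGroup` type, `PodGroupState`, and the state values are assumptions drawn from this design doc's proposal, not an existing kube-batch or CA API:

```go
// Hypothetical sketch of the check described above. The PodGroup type
// and its Status.State field follow this design doc's proposal; none
// of these names are a finalized API.
package podgroup

type PodGroupState string

const (
	// PodGroupPending: the group has been accepted, but the cluster
	// cannot currently run the whole group of pods.
	PodGroupPending PodGroupState = "Pending"
	// PodGroupRunning: enough pods of the group are running.
	PodGroupRunning PodGroupState = "Running"
)

type PodGroupStatus struct {
	State PodGroupState
}

type PodGroup struct {
	Status PodGroupStatus
}

// NeedsScaleUp reports whether an autoscaler should consider adding
// nodes for this group: a Pending group signals that the cluster lacks
// resources to run the whole group.
func NeedsScaleUp(pg PodGroup) bool {
	return pg.Status.State == PodGroupPending
}
```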
This is not how CA works. It doesn't compute required resources at all.
Very roughly, the way it works is:

0. (Import scheduler predicates from the k/k repo.)
1. If there are pending pods (pods with the PodScheduled condition set to false, with reason Unschedulable):
   1.1 Create an in-memory node object that represents a new node that would result from scale-up.
   1.2 Run the imported scheduler predicates on the pending pods to "schedule" them on the in-memory node.
   1.3 If some pods remain pending, go to 1.1.
This way the autoscaler automatically takes into account all scheduling constraints: resources, taints, affinity, storage, etc. Unfortunately, it also means it's fundamentally incompatible with any form of custom scheduling. If you want CA to work with kube-batch, you need to either encapsulate the kube-batch logic in the form of predicates or else duplicate it in CA somehow. It certainly won't be a simple fix.
It would get even worse for things like gang scheduling. You can swap predicates for something different, but ultimately CA expects a black box that takes a (pod, node) pair and tells whether the pod could be scheduled on a given node (see the sketch below).
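A minimal sketch of that loop, assuming the black-box `(pod, node) -> bool` predicate shape described above; all names here are illustrative, not Cluster-Autoscaler's real code, and the real CA tracks per-node capacity through the scheduler's bookkeeping rather than a stateless predicate:

```go
// Illustrative sketch of the scale-up simulation described above.
// The Predicate is the black box CA expects: given a (pod, node)
// pair, can the pod be scheduled there?
package main

import "fmt"

type Pod struct{ Name string }
type Node struct{ Name string }

type Predicate func(pod Pod, node Node) bool

// simulateScaleUp repeatedly "adds" an in-memory copy of the node
// template (step 1.1), runs the predicate to place pending pods on it
// (step 1.2), and loops while pods remain pending (step 1.3). It is
// simplified: it ignores per-node capacity bookkeeping and caps the
// loop at maxNodes to stay bounded.
func simulateScaleUp(pending []Pod, template Node, fits Predicate, maxNodes int) (added int, unschedulable []Pod) {
	for len(pending) > 0 && added < maxNodes {
		node := template // step 1.1: in-memory node from scale-up
		added++
		var remaining []Pod
		for _, p := range pending {
			if !fits(p, node) { // step 1.2: run the predicate
				remaining = append(remaining, p)
			}
		}
		if len(remaining) == len(pending) {
			// Nothing fits even a fresh node; more nodes will not help.
			return added - 1, pending
		}
		pending = remaining // step 1.3: loop if pods remain pending
	}
	return added, pending
}

func main() {
	pods := []Pod{{"a"}, {"b"}, {"c"}}
	// Toy predicate standing in for the imported scheduler predicates.
	fits := func(p Pod, n Node) bool { return p.Name != "c" }
	n, left := simulateScaleUp(pods, Node{"template"}, fits, 10)
	fmt.Printf("nodes to add: %d, still unschedulable: %d\n", n, len(left))
}
```

Swapping `fits` for kube-batch's gang logic is exactly the hard part: a gang decision depends on the whole group, not on a single (pod, node) pair.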
What will happen if pods cannot be scheduled on the new node because of a customized plugin? Will CA continue to create new nodes?
I'm wondering whether CA can keep adding new nodes (with some rate limit and restrictions) if the PodGroup is pending.
CA will check what the default scheduler would do with the pending pods. If CA thinks the pods can be scheduled on existing nodes, it will not add new nodes.
That would make it hard to build plugins that are compatible with CA in the future. In another thread, you mentioned that marking pods unschedulable is just the tip of the iceberg. Do you think we can extract some common criteria from CA for other custom schedulers to meet? For example, the pod condition is just one of them.
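For reference, the pod-condition criterion mentioned here is the one CA keys on today: `PodScheduled=False` with reason `Unschedulable`. Here is a sketch of how a custom scheduler could set it, using the standard `k8s.io/api` types (the helper name is made up):

```go
package scheduler

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markUnschedulable is a hypothetical helper: it records the condition
// Cluster-Autoscaler looks for when deciding a pod is pending. A custom
// scheduler that wants CA to react would set the same condition (and,
// in practice, persist it via the pod's status subresource).
func markUnschedulable(pod *v1.Pod, message string) {
	pod.Status.Conditions = append(pod.Status.Conditions, v1.PodCondition{
		Type:               v1.PodScheduled,
		Status:             v1.ConditionFalse,
		Reason:             v1.PodReasonUnschedulable, // "Unschedulable"
		Message:            message,
		LastTransitionTime: metav1.Now(),
	})
}
```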
Hmm. I'm thinking of treating CA support as out-of-scope for now, as CA needs to work together with the default scheduler. We can consider enhancing that in a later release.
Signed-off-by: Da K. Ma <[email protected]>
/lgtm
Added PodGroup Phase in Status.
Signed-off-by: Da K. Ma [email protected]
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Part of #521
Release note: