cache assigned pod count #708
base: master
Conversation
Hi @KunWuLuan. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: KunWuLuan. It still needs approval from an approver in each of the relevant OWNERS files.
/ok-to-test
Could you help fix the CI failures?
switch t := obj.(type) {
case *corev1.Pod:
	pod := t
	// On a Pod delete event, release the pod from the pod group manager's bookkeeping.
	pgMgr.Unreserve(context.Background(), pod)
PodDelete event consists of 3 types of events:
- Pod failed
- Pod completed (successfully)
- Pod get deleted
but for a completed Pod, we should still count it as part of the gang, right? Could you also check whether the integration tests cover this case?
When a pod completes, it is removed from NodeInfo. CalculateAssignedPods counts pods in NodeInfo, so we did not count completed pods previously.
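For context, here is a minimal sketch of the NodeInfo-walking approach described above, which is the path this PR replaces with a cache. The package, function, and label names are illustrative assumptions, not code taken from this PR.

```go
package sketch

import (
	framework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// calculateAssignedPods walks the scheduler's NodeInfo snapshot and counts
// the pods carrying the pod-group label. Completed pods are already evicted
// from NodeInfo, which is why they were not counted previously.
func calculateAssignedPods(nodeInfos []*framework.NodeInfo, pgName, namespace string) int {
	count := 0
	for _, nodeInfo := range nodeInfos {
		for _, podInfo := range nodeInfo.Pods {
			pod := podInfo.Pod
			// "scheduling.x-k8s.io/pod-group" is the label the coscheduling
			// plugin uses to associate a Pod with its PodGroup.
			if pod.Namespace == namespace && pod.Labels["scheduling.x-k8s.io/pod-group"] == pgName {
				count++
			}
		}
	}
	return count
}
```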
OK, I will check whether the integration tests cover this case.
> so we did not count completed pods previously.

True. I'm wondering whether we should fix this glitch in this PR: in DeleteFunc(), additionally check if the Pod is completed, and if so, do NOT invalidate it from the assignedPodsByPG cache. WDYT?
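A minimal sketch of this suggestion, assuming a hypothetical manager interface and the completed-pod check being proposed; none of these names are taken from the actual PR:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// newDeleteHandler sketches the suggested DeleteFunc behavior: skip
// invalidating completed Pods from the (hypothetical) assignedPodsByPG cache,
// since they still count toward the gang. pgMgr stands in for the real manager.
func newDeleteHandler(pgMgr interface {
	Unreserve(ctx context.Context, pod *corev1.Pod)
}) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			pod, ok := obj.(*corev1.Pod)
			if !ok {
				// Deleted objects may arrive wrapped in DeletedFinalStateUnknown.
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					return
				}
				pod, ok = tombstone.Obj.(*corev1.Pod)
				if !ok {
					return
				}
			}
			if pod.Status.Phase == corev1.PodSucceeded {
				// A completed Pod remains part of the gang, so keep the cache entry.
				return
			}
			pgMgr.Unreserve(context.Background(), pod)
		},
	}
}
```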
We have already discussed in this issue whether we should count completed pods. Is there a new situation that requires counting completed pods?
I see. It seems restarting the whole Job is the more conventional approach for now, so let's postpone the idea until a new requirement emerges.
sure
@Huang-Wei Hi, I have fixed the CI failures. Please take a look when you have time, thanks.
I forgot one thing about the cache's consistency during one scheduling cycle. We will need to (see the sketch after this list):
- snapshot the pg->podNames map at the beginning of the scheduling cycle (PreFilter), so that we can treat it as source of truth during the whole scheduling cycle
- support preemption
- implement the Clone() function
- for each PodAddition dryrun, if the pod is hit, add it
- for each PodDeletion dryrun, if the pod is hit, remove it
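A rough sketch of that snapshot idea, under assumed names (none of this is code from the PR): a per-cycle CycleState entry captured at PreFilter that implements Clone(), so preemption dry-runs can add and remove pods on a copy without touching the original.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/sets"
	framework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// pgSnapshotState holds the pod group's assigned pod names, captured once in
// PreFilter, so the whole scheduling cycle sees one consistent view.
type pgSnapshotState struct {
	assignedPods sets.Set[string]
}

// Clone lets preemption dry-runs mutate a copy without touching the original
// snapshot, as required by framework.StateData.
func (s *pgSnapshotState) Clone() framework.StateData {
	return &pgSnapshotState{assignedPods: s.assignedPods.Clone()}
}

// addPod/removePod would back the AddPod/RemovePod dry-run hooks.
func (s *pgSnapshotState) addPod(p *corev1.Pod)    { s.assignedPods.Insert(p.Name) }
func (s *pgSnapshotState) removePod(p *corev1.Pod) { s.assignedPods.Delete(p.Name) }
```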
We only check the number of pods assigned in Permit, so I think there is no inconsistency during one scheduling cycle. And PostFilter will not check the Permit plugin, so implementing PodAddition and PodDeletion would have no effect on preemption, right? What we can do is return framework.Unschedulable if the PodDeletion would cause a pod group to be rejected, but I think that is not enough for Coscheduling preemption. Supporting preemption for Coscheduling is complicated; maybe we can handle it in another issue.
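For reference, the Permit-time check mentioned above boils down to comparing the assigned count (which this PR caches) against the PodGroup's MinMember. A hedged sketch with hypothetical names:

```go
package sketch

import (
	"time"

	framework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// permitGang sketches the Permit-time quorum check: assignedCount would come
// from the cached per-pod-group counter, minMember from the PodGroup spec.
func permitGang(assignedCount int, minMember int32) (*framework.Status, time.Duration) {
	// The pod currently passing through Permit counts toward the quorum as well.
	if int32(assignedCount)+1 >= minMember {
		return framework.NewStatus(framework.Success, ""), 0
	}
	// Not enough gang members are assigned yet; wait (timeout value is illustrative).
	return framework.NewStatus(framework.Wait, "waiting for gang members"), 30 * time.Second
}
```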
Yes, the current preemption skeleton code assumes each plugin only uses PreFilter to pre-calculate state. But for coscheduling, PreFilter can fail early (upon an inadequate quorum). I think the scheduler framework should open up a hook for out-of-tree plugins to choose whether or not to run PreFilter as part of preemption; otherwise, an out-of-tree plugin has to rewrite the PostFilter implementation to hack around that.
Let's consolidate all the cases and use a new PR to try to tackle it. Thanks.
@KunWuLuan are you OK with postponing this PR's merge until after I cut the release for v0.28, so that we have more time for soak testing? And could you add a release note to highlight that it's a performance enhancement?
Ok, no problem.
OK. I will try to design a preemption framework in PostFilter, and if an implementation in PostFilter is enough, I will create a new PR to track the KEP. Otherwise I will try to open a discussion in kubernetes/scheduling-sigs.
/cc
PR needs rebase.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle stale
/remove-lifecycle stale
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR speeds up the Coscheduling plugin's counting of Pods that have already been assumed by caching the assigned pod count per pod group instead of recomputing it on every scheduling cycle.
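Conceptually, the cache can be pictured like the sketch below: a per-pod-group record of assigned pods, maintained from informer add/delete events and read in Permit instead of walking every NodeInfo. All names are illustrative assumptions, not the PR's actual code.

```go
package sketch

import (
	"sync"

	"k8s.io/apimachinery/pkg/util/sets"
)

// assignedCache keeps a per-pod-group record of assigned pods so Permit can
// read a cached count instead of scanning the NodeInfo snapshot each cycle.
type assignedCache struct {
	mu               sync.RWMutex
	assignedPodsByPG map[string]sets.Set[string] // pod-group key -> assigned pod names
}

func newAssignedCache() *assignedCache {
	return &assignedCache{assignedPodsByPG: map[string]sets.Set[string]{}}
}

// add records a newly assigned (assumed or bound) pod for its pod group.
func (c *assignedCache) add(pgKey, podName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.assignedPodsByPG[pgKey] == nil {
		c.assignedPodsByPG[pgKey] = sets.New[string]()
	}
	c.assignedPodsByPG[pgKey].Insert(podName)
}

// delete drops a pod on deletion events (subject to the completed-pod
// discussion in the review thread).
func (c *assignedCache) delete(pgKey, podName string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pods, ok := c.assignedPodsByPG[pgKey]; ok {
		pods.Delete(podName)
	}
}

// count replaces the per-cycle CalculateAssignedPods walk over NodeInfos.
func (c *assignedCache) count(pgKey string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.assignedPodsByPG[pgKey].Len()
}
```

The trade-off, as the review thread discusses, is that the cache must be invalidated consistently on pod deletion and kept coherent within a single scheduling cycle.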
Which issue(s) this PR fixes:
Fix #707
Special notes for your reviewer:
Does this PR introduce a user-facing change?