TAS: TopologyUngater #3266
}

func (h *podHandler) queueReconcileForPod(pod *corev1.Pod, q workqueue.TypedRateLimitingInterface[reconcile.Request]) {
	if pod == nil {
-	if pod == nil {
+	if pod == nil || !utilpod.HasGate(pod, kueuealpha.TopologySchedulingGate) {
I have refactored this a bit:
- passing object instead of pod to the function to commonize between event handlers
- I don't need to check the scheduling gate (it was just a workaround to check "is it a TAS pod")
- proposed a new Label to use here: TAS: introduce dedicated TAS label #3271
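The refactoring described above (one helper shared by all event handlers, filtering on a label instead of the scheduling gate) could look roughly like the sketch below. It is only an illustration and not the code from this PR: `Object` and `Pod` are hypothetical stand-ins for controller-runtime's `client.Object` and `corev1.Pod`, `keyForObject` is an invented name, and `example.com/tas` is a placeholder key, since the actual label is defined in #3271.

```go
package main

// Object is a hypothetical stand-in for controller-runtime's client.Object.
type Object interface {
	GetLabels() map[string]string
}

// Pod is a minimal stand-in for corev1.Pod (assumption, not the real type).
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// GetLabels makes *Pod satisfy Object.
func (p *Pod) GetLabels() map[string]string { return p.Labels }

// tasLabel is a placeholder key; the real label is the one proposed in #3271.
const tasLabel = "example.com/tas"

// Key identifies the item to enqueue for reconciliation.
type Key struct {
	Namespace string
	Name      string
}

// keyForObject is shared by the create, update, and delete handlers.
// It returns false for nil objects, for objects that are not pods, and for
// pods that do not carry the TAS label, so none of those get enqueued.
func keyForObject(obj Object) (Key, bool) {
	pod, ok := obj.(*Pod)
	if !ok || pod == nil {
		return Key{}, false
	}
	if _, found := pod.Labels[tasLabel]; !found {
		return Key{}, false
	}
	return Key{Namespace: pod.Namespace, Name: pod.Name}, true
}
```

Accepting the generic object lets every event handler call the same helper, and the comma-ok type assertion keeps a nil or unexpected object from causing a panic.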
As I mentioned in #3271 (comment), we will go with the TAS label solution during the Alpha phase.
if wl.Status.Admission == nil {
	log.Info("workload is not admitted")
	return reconcile.Result{}, nil
}
-	if wl.Status.Admission == nil {
-		log.Info("workload is not admitted")
-		return reconcile.Result{}, nil
-	}
Duplicated check with the predicates.
I prefer to keep the duplicate checks to avoid panicking on race conditions:
- the predicates schedule a reconcile by workload key, but then the workload is evicted
- the workload is recreated with the same key, but admitted without TAS
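To illustrate why the reconciler re-checks what the predicates already verified, here is a minimal sketch using hypothetical stand-in types (`Workload`, `Admission`, `PodSetAssignment`, and `TopologyAssignment` only mimic the shape of the Kueue API, and `tasPodSets` is an invented helper). Between the predicate firing and the reconcile running, the object behind the key may have been replaced, so the state is validated again before any field is dereferenced.

```go
package main

// The types below are simplified stand-ins for the Kueue API types
// (assumptions for illustration only).
type TopologyAssignment struct {
	Levels []string
}

type PodSetAssignment struct {
	Name               string
	TopologyAssignment *TopologyAssignment
}

type Admission struct {
	PodSetAssignments []PodSetAssignment
}

type Workload struct {
	Admission *Admission
}

// tasPodSets returns the names of the pod sets that have a topology
// assignment. It tolerates a workload that was evicted (nil Admission) or
// re-admitted without TAS (nil TopologyAssignment) after the reconcile was
// queued, instead of assuming the predicates still hold.
func tasPodSets(wl *Workload) []string {
	if wl == nil || wl.Admission == nil {
		return nil
	}
	var names []string
	for _, psa := range wl.Admission.PodSetAssignments {
		if psa.TopologyAssignment != nil {
			names = append(names, psa.Name)
		}
	}
	return names
}
```

An empty result simply means there is nothing to ungate for this key, which covers both race scenarios listed above without a nil dereference.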
As I commented above, after refactoring, these verifications seem to be needed.
The whole mechanism looks excellent. But I left trivial comments.
@mimowo Could you open a follow-up PR?
/lgtm
/approve
allToUngate := make([]podWithUngateInfo, 0)
for _, psa := range wl.Status.Admission.PodSetAssignments {
	if psa.TopologyAssignment != nil {
That makes sense. After the refactoring, podHandler no longer checks whether the workload has a topologyAssignment.
	}
}
var err error
if len(allToUngate) > 0 {
I'm wondering if we can consolidate expectations and podWithUngateInfo into a dedicated TAS cache.
Then we could decouple the TASController (reconciler, eventHandler, and Predicator) from the TASUngator (this ungating process).
Ideally, in general, we should not spawn a separate goroutine in the reconciler. The separate goroutine could delay the next Reconcile, and the object information in the reconcile queue could become stale.
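The "expectations" mentioned here are not shown in this excerpt. As a rough sketch, assuming they follow the usual controller pattern of tracking writes that have been issued but not yet observed through the informer, a store could look like the following. All names (`expectations`, `Expect`, `Observe`, `Satisfied`) are hypothetical and not taken from the PR.

```go
package main

import "sync"

// expectations is a hypothetical store of pods that the controller has asked
// to ungate but has not yet observed as ungated, keyed by workload.
type expectations struct {
	mu      sync.Mutex
	pending map[string]map[string]struct{}
}

func newExpectations() *expectations {
	return &expectations{pending: make(map[string]map[string]struct{})}
}

// Expect records that the given pods of a workload are about to be ungated.
func (e *expectations) Expect(workload string, podUIDs ...string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if len(podUIDs) == 0 {
		return
	}
	set, ok := e.pending[workload]
	if !ok {
		set = make(map[string]struct{}, len(podUIDs))
		e.pending[workload] = set
	}
	for _, uid := range podUIDs {
		set[uid] = struct{}{}
	}
}

// Observe is meant to be called from an event handler once a pod is seen
// without the gate; it clears the matching expectation.
func (e *expectations) Observe(workload, podUID string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	set, ok := e.pending[workload]
	if !ok {
		return
	}
	delete(set, podUID)
	if len(set) == 0 {
		delete(e.pending, workload)
	}
}

// Satisfied reports whether every expected ungate for the workload has been
// observed, so a reconcile can trust what it reads from the cache.
func (e *expectations) Satisfied(workload string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return len(e.pending[workload]) == 0
}
```

A reconciler using such a store would return early while `Satisfied` is false, which limits the risk of acting on stale pod information regardless of whether the ungating itself runs in a separate goroutine.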
It would be better to implement unit tests for the exposed function.
LGTM label has been added. Git tree hash: 2fe42177c76b465861ca7a645f7b794f28f8d6c5
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mimowo, tenzen-y
Thanks, sure.
Though, thinking about it, I don't think the refactoring to split out TopologyUngater will be that trivial. Please also note that we were already spawning new goroutines in the pod group integration, and goroutines are also spawned by the k8s Job controller code when creating new pods. I agree that ideally we would avoid it, but I would prefer to prioritize the functional follow-ups and e2e tests first.
I agree. I do not claim decoupling should be done before 0.9.
* TAS: TopologyUngater
# Conflicts:
#	go.mod
* review comments
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Part of #2724
Special notes for your reviewer:
Done since the initial version:
I have also tested the handlers already in the e2e prototype PR: #3218
Does this PR introduce a user-facing change?