Add support for pod groups #1319
Conversation
Skipping CI for Draft Pull Request.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
cc @trasc
/test all
Force-pushed from d59e250 to 9e63490.
/test all
```go
	}); err != nil {
		return err
	}

	return jobframework.SetupWorkloadOwnerIndex(ctx, indexer, gvk)
}

func (p *Pod) Finalize(ctx context.Context, c client.Client) error {
```
Finalizers are tricky.
What sometimes happens is that you can enter this Finalize function before all the pods are there, and we are left with a Pod with a stuck finalizer.
But if we see this happen in integration tests, we can fix it in a follow-up.
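For context, a minimal sketch of what a group-aware Finalize can look like, and where the stuck-finalizer risk comes from. The label key, finalizer name, and the `Pod` wrapper fields here are assumptions for illustration, not necessarily what this PR uses:

```go
package pod

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Assumed names; the PR may use different constants.
const (
	GroupNameLabel = "kueue.x-k8s.io/pod-group-name"
	PodFinalizer   = "kueue.x-k8s.io/managed"
)

// Pod is a stand-in for the PR's composable-job wrapper around a pod.
type Pod struct {
	pod corev1.Pod
}

// Finalize strips the finalizer from every pod currently in the group.
// The failure mode discussed above: pods that haven't been created yet
// simply don't appear in the list, so a partially created group can
// still end up with a stuck finalizer on the pods that do exist.
func (p *Pod) Finalize(ctx context.Context, c client.Client) error {
	var podsInGroup corev1.PodList
	if err := c.List(ctx, &podsInGroup,
		client.InNamespace(p.pod.Namespace),
		client.MatchingLabels{GroupNameLabel: p.pod.Labels[GroupNameLabel]},
	); err != nil {
		return err
	}
	for i := range podsInGroup.Items {
		pod := &podsInGroup.Items[i]
		if controllerutil.RemoveFinalizer(pod, PodFinalizer) {
			if err := c.Update(ctx, pod); err != nil {
				return err
			}
		}
	}
	return nil
}
```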
```go
	var resultPodSets []kueue.PodSet

	for _, podInGroup := range podsInGroup.Items {
```
we need to ignore the Pods with phase Failed.
Actually, we can't fully ignore the Pods with phase Failed, otherwise we would think that the Workload object no longer matches.
One possible workaround is to ignore the Failed pods, but change the logic of equivalentToWorkload a little, to allow the number of pods to be lower than the counter saved in the Workload object.
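Roughly, the relaxed check could look like this. A sketch only: the real `equivalentToWorkload` presumably also compares pod specs per PodSet; only the counting rule under discussion is shown here:

```go
package pod

import (
	corev1 "k8s.io/api/core/v1"
	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// equivalentToWorkload (sketch): treat Failed pods as gone, but accept a
// live-pod count lower than the counter stored in the Workload, so a
// group waiting for replacement pods still matches its Workload.
func equivalentToWorkload(pods []corev1.Pod, wl *kueue.Workload) bool {
	active := 0
	for i := range pods {
		if pods[i].Status.Phase != corev1.PodFailed {
			active++
		}
	}
	saved := 0
	for _, ps := range wl.Spec.PodSets {
		saved += int(ps.Count)
	}
	// Fewer live pods than recorded is acceptable (failed pods awaiting
	// replacement); more than recorded is not.
	return active <= saved
}
```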
Changed this behavior in 09c09ce
Force-pushed from 92a15d7 to 676cecf.
Force-pushed from 0d878c0 to 4c035ab.
```go
}

if wl != nil && p.isGroup {
	if evCond := apimeta.FindStatusCondition(wl.Status.Conditions, kueue.WorkloadEvicted); evCond != nil && evCond.Status == metav1.ConditionTrue {
```
why do we need to check for WorkloadEvicted inside Stop? Wouldn't Stop only be called when the workload is evicted or unadmitted?
Stop could be called when:
In all of those cases, we should issue Pod deletes, if the Pods don't already have DeletionTimestamps.
That's what we're doing. On the other hand, if the workload is evicted and all pods are stopped, or at least one pod is running, we don't stop the group.
Ok, the logic is slightly off:
- If the wl is deleted or it doesn't exist: issue deletes for everything, if we didn't already.
- Otherwise (eviction): delete anything that isn't suspended. There is no need to return an error if there is a running Pod, as we won't remove the reservation yet:
`if !job.IsActive() {`
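In code, the suggested shape might be something like this, continuing the assumed wrapper from the sketch above. The `p.list` field, the scheduling-gate name, and the `hasSchedulingGate` helper are illustrative assumptions, not the PR's actual API:

```go
// Stop (sketch): the two branches described in the comment above.
func (p *Pod) Stop(ctx context.Context, c client.Client, wl *kueue.Workload) (bool, error) {
	for i := range p.list.Items {
		pod := &p.list.Items[i]
		if !pod.DeletionTimestamp.IsZero() {
			continue // a delete was already issued for this pod
		}
		if wl == nil || !wl.DeletionTimestamp.IsZero() {
			// Workload deleted or missing: delete everything.
			if err := c.Delete(ctx, pod); err != nil {
				return false, err
			}
			continue
		}
		// Eviction: delete anything that isn't suspended. A still-running
		// pod is not an error; the quota reservation is kept until the
		// pods actually terminate.
		if !hasSchedulingGate(pod) {
			if err := c.Delete(ctx, pod); err != nil {
				return false, err
			}
		}
	}
	return true, nil
}

// hasSchedulingGate is a stand-in for however the PR detects a suspended
// (gated) pod; the gate name is an assumption.
func hasSchedulingGate(pod *corev1.Pod) bool {
	for _, g := range pod.Spec.SchedulingGates {
		if g.Name == "kueue.x-k8s.io/admission" {
			return true
		}
	}
	return false
}
```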
```go
		return true, nil
	}
} else {
	if p.isStopped() {
```
We can't remove the Workload or Pod finalizers even if all the pods have a deletion timestamp.
Otherwise the Workload would be removed and it wouldn't be possible to send replacement pods.
The finalizers are removed here only if the workload is nil and all the pods have a deletion timestamp. There's nothing to replace if the workload has already been removed.
But this code needs a change anyway: FindMatchingWorkloads could return some workloads to delete, and in that case we shouldn't finalize the pods.
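A guard along those lines, assuming `FindMatchingWorkloads` returns the match plus a list of Workloads to delete, as the comment implies (sketch, not the PR's actual code):

```go
// Only strip the finalizers when nothing is left to reconcile:
// no matching Workload and no Workloads still pending deletion.
match, toDelete, err := jobframework.FindMatchingWorkloads(ctx, c, p)
if err != nil {
	return err
}
if match == nil && len(toDelete) == 0 && p.isStopped() {
	if err := p.Finalize(ctx, c); err != nil {
		return err
	}
}
```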
Ah gotcha. However, it looks like we are duplicating code. We already finalize the job in the following scenarios:
- If the workload is finished: `if err := r.finalizeJob(ctx, job); err != nil {`
- If there is no matching Workload: `if err := r.finalizeJob(ctx, job); err != nil {`
So it looks like this logic is unnecessary here.
This code is there for the case when the wl is deleted.
The first reconcile will hit this branch, stop the job, finalize the wl, and return:
`if wl != nil && !wl.DeletionTimestamp.IsZero() {`
The second reconcile will finalize the pods after the ComposableJob.Load call.
We could finalize the pods on the first reconcile, but I don't think it's right from the interface perspective: Stop != Delete for the generic job; that's only true for pods. That's why the logic to finalize the pods when wl is nil lives in ComposableJob.Load and not in the generic reconciler.
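A sketch of that division of labor, in the same assumed package as the earlier sketches. `Load`'s real signature in the PR, the `finalizePods` return, and `findWorkloadForGroup` are all hypothetical here:

```go
// Reconcile #1: wl exists but has a deletion timestamp. The generic
// reconciler hits `if wl != nil && !wl.DeletionTimestamp.IsZero()`,
// stops the job, and finalizes the Workload.
//
// Reconcile #2: the Workload is gone. ComposableJob.Load notices and
// the pod finalizers are removed from there, so Stop keeps meaning
// "stop", not "delete", for every other job type.
func (p *Pod) Load(ctx context.Context, c client.Client, key types.NamespacedName) (finalizePods bool, err error) {
	if err := c.List(ctx, &p.list, client.InNamespace(key.Namespace),
		client.MatchingLabels{GroupNameLabel: key.Name}); err != nil {
		return false, err
	}
	wl, err := findWorkloadForGroup(ctx, c, key) // hypothetical helper
	if err != nil {
		return false, err
	}
	// No Workload left: tell the caller to finalize the pods.
	return wl == nil, nil
}
```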
We could finalize the pods on the first reconcile.
Please let me know what you think about this idea.
I see what you mean for the case when the workload is deleted. However, the case when the Workload is finished is already covered.
Then, we need a single line:
`return wl != nil && p.isStopped(), nil`
And accompany it with a comment about why we need to finalize in that case.
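For instance, one possible wording for that comment, with the reasoning lifted from this thread (a sketch, not the final code):

```go
// The wl == nil case is handled in ComposableJob.Load, which finalizes
// the pods when no Workload is left. Here we only report done when a
// Workload still exists and every pod in the group has stopped, so the
// generic reconciler can finalize the job.
return wl != nil && p.isStopped(), nil
```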
```go
	},
	workloadCmpOpts: defaultWorkloadCmpOpts,
},
"workload is not deleted if one of the pods in the finished group is deleted": {
```
if ALL of the pods are deleted (have a finalizer and a deletionTimestamp)
```go
	wantWorkloads:   []kueue.Workload{},
	workloadCmpOpts: defaultWorkloadCmpOpts,
	deleteWorkloads: true,
},
```
Some of the added tests are not accurate. Left some comments.
```go
gomega.Consistently(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, pod1LookupKey, createdPod1)).To(gomega.Succeed())
	g.Expect(k8sClient.Get(ctx, pod2LookupKey, createdPod2)).To(gomega.Succeed())
}, util.ConsistentDuration, util.Interval).Should(gomega.Succeed())
```
No need for this. But we might want to check that the Pods get a DeletionTimestamp.
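For example, with the same helpers the test already uses (a sketch; `util.Timeout` is assumed to be the suite's usual timeout constant):

```go
// Instead of Consistently asserting the pods still exist, assert that
// they have been marked for deletion. With finalizers in place, Get
// still succeeds while the DeletionTimestamp is set.
gomega.Eventually(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, pod1LookupKey, createdPod1)).To(gomega.Succeed())
	g.Expect(createdPod1.DeletionTimestamp.IsZero()).To(gomega.BeFalse())
	g.Expect(k8sClient.Get(ctx, pod2LookupKey, createdPod2)).To(gomega.Succeed())
	g.Expect(createdPod2.DeletionTimestamp.IsZero()).To(gomega.BeFalse())
}, util.Timeout, util.Interval).Should(gomega.Succeed())
```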
```go
ginkgo.By("creating the replacement pod and readmitting the workload will unsuspend the replacement", func() {
	gomega.Expect(k8sClient.Create(ctx, replacementPod)).Should(gomega.Succeed())

	gomega.Expect(k8sClient.Get(ctx, wlLookupKey, createdWorkload)).To(gomega.Succeed())
```
worth checking that this is the same Workload that was initially created. The UID should match
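For example (sketch):

```go
// Capture the UID when the Workload is first created...
originalUID := createdWorkload.UID

// ...and after readmitting, assert the re-fetched Workload is the same
// object, not a deleted-and-recreated one.
gomega.Expect(k8sClient.Get(ctx, wlLookupKey, createdWorkload)).To(gomega.Succeed())
gomega.Expect(createdWorkload.UID).To(gomega.Equal(originalUID))
```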
/approve
Ready for squash :)
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: achernevskii, alculquicondor. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Force-pushed from ca3354b to c264f4c.
/lgtm
LGTM label has been added. Git tree hash: 9ddd6799a46ce66840939683fe2b40f79063d2ec
* Introduce new ComposableJob interface for jobs which have to be composed of different API objects.
* Add custom get: a composable job can get all of its elements at the beginning of the reconcile.
* Add ComposableJob implementation for pod groups.
* Add webhook checks for pod group labels and annotations.
* Update the Finished method for pod groups.
* The IsSuspended and Stop methods of the pod controller now interact with all the pods at once.
* Update the IsActive function to check whether at least one pod in the group is running.
* Change the podSuspended method.
* Skip stopping pods in the group that already have a deletion timestamp.
* Add IsComposableJobActive.
* Add UnretryableError, an error that doesn't require a reconcile retry.
* Add a ValidateLabelAsCRDName call for the pod-group label and make the label immutable.
* Add unit tests for the pod group integration.
Force-pushed from c264f4c to 4f3eb4f.
/lgtm
LGTM label has been added. Git tree hash: d7e27626543786f8d09179a504306d418cb4d139
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduce a new ComposableJob interface for jobs which have to be composed of different API objects.
Add ComposableJob implementation for pod groups.
Add webhook checks for pod group labels and annotations.
Which issue(s) this PR fixes:
Related issue: #976
Special notes for your reviewer:
Does this PR introduce a user-facing change?