Add support for pod groups #1319

Merged
merged 1 commit into from
Nov 24, 2023

achernevskii
Contributor

@achernevskii achernevskii commented Nov 10, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

  • Introduce a new ComposableJob interface for jobs that have to be composed of different API objects.

  • Add ComposableJob implementation for pod groups.

  • Add webhook checks for pod group labels and annotations.
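As a rough illustration of the webhook checks in the last bullet, the sketch below validates a pod-group label value against the Kubernetes DNS-1123 subdomain format (an assumption about what a CRD-name check enforces) and rejects updates that add, remove or change the label. The label key `example.com/pod-group-name` and the helper names are hypothetical, not Kueue's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// podGroupLabel is a hypothetical label key, used for illustration only.
const podGroupLabel = "example.com/pod-group-name"

// dns1123Subdomain is the DNS-1123 subdomain pattern that Kubernetes object
// names follow; the 253-character limit is checked separately.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateGroupName checks that the group label, if present, could be used
// as the name of the Workload object created for the group.
func validateGroupName(labels map[string]string) error {
	name, ok := labels[podGroupLabel]
	if !ok {
		return nil // not part of a group, nothing to check
	}
	if len(name) > 253 || !dns1123Subdomain.MatchString(name) {
		return fmt.Errorf("label %s: %q is not a valid DNS-1123 subdomain", podGroupLabel, name)
	}
	return nil
}

// validateUpdate makes the group label immutable: adding, removing or
// changing it after creation is rejected.
func validateUpdate(oldLabels, newLabels map[string]string) error {
	oldName, oldOK := oldLabels[podGroupLabel]
	newName, newOK := newLabels[podGroupLabel]
	if oldOK != newOK || oldName != newName {
		return errors.New("pod-group label is immutable")
	}
	return validateGroupName(newLabels)
}

func main() {
	fmt.Println(validateGroupName(map[string]string{podGroupLabel: "my-group"}))
	fmt.Println(validateGroupName(map[string]string{podGroupLabel: "My_Group"}))
	fmt.Println(validateUpdate(
		map[string]string{podGroupLabel: "a"},
		map[string]string{podGroupLabel: "b"},
	))
}
```

Checking immutability in the update path, rather than only on create, is what keeps a Pod from silently moving between groups after its Workload has been created.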

Which issue(s) this PR fixes:

Related issue: #976

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add support for groups of plain Pods.

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 10, 2023

netlify bot commented Nov 10, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 4f3eb4f
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6561207714dcdb00088ca4f8

@k8s-ci-robot k8s-ci-robot requested a review from mimowo November 10, 2023 03:33
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 10, 2023
@alculquicondor
Contributor

cc @trasc

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2023
@trasc
Contributor

trasc commented Nov 14, 2023

/test all

@trasc trasc force-pushed the feature/pod_groups branch from d59e250 to 9e63490 Compare November 14, 2023 11:38
@trasc
Contributor

trasc commented Nov 14, 2023

/test all

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 14, 2023
pkg/controller/jobframework/reconciler.go Outdated
pkg/controller/jobframework/interface.go
pkg/controller/jobframework/interface.go Outdated
pkg/controller/jobframework/reconciler.go Outdated
pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobs/pod/pod_controller.go Outdated
	}); err != nil {
		return err
	}

	return jobframework.SetupWorkloadOwnerIndex(ctx, indexer, gvk)
}

func (p *Pod) Finalize(ctx context.Context, c client.Client) error {
Contributor

Finalizers are tricky.

What sometimes happens is that this "Finalize" function is entered before all the pods are visible, and we are left with a Pod with a stuck finalizer.

But if we see this happen in integration tests, we can fix it in a follow-up.
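One way to avoid the stuck-finalizer case described above is to remove finalizers only once every expected member of the group has been observed, and to requeue otherwise. The sketch below assumes the expected group size is known up front (for example from an annotation on the pods); the finalizer name and helper names are hypothetical, and this is not necessarily how the PR handles it.

```go
package main

import "fmt"

// podFinalizer is a hypothetical finalizer name, used for illustration only.
const podFinalizer = "example.com/managed"

type pod struct {
	name       string
	finalizers []string
}

// removeFinalizer returns fins without f, keeping the order of the others.
func removeFinalizer(fins []string, f string) []string {
	var out []string
	for _, x := range fins {
		if x != f {
			out = append(out, x)
		}
	}
	return out
}

// finalizeGroup strips podFinalizer from every observed pod, but only if all
// expectedSize members are visible. If some are missing (not created yet, or
// not in the cache yet) it changes nothing and returns false, so the caller
// can requeue instead of leaving a late pod with a stuck finalizer.
func finalizeGroup(observed []*pod, expectedSize int) bool {
	if len(observed) < expectedSize {
		return false
	}
	for _, p := range observed {
		p.finalizers = removeFinalizer(p.finalizers, podFinalizer)
	}
	return true
}

func main() {
	a := &pod{name: "a", finalizers: []string{podFinalizer}}
	b := &pod{name: "b", finalizers: []string{podFinalizer}}

	ok := finalizeGroup([]*pod{a, b}, 3)
	fmt.Println(ok, a.finalizers) // false [example.com/managed]

	ok = finalizeGroup([]*pod{a, b}, 2)
	fmt.Println(ok, a.finalizers) // true []
}
```

The trade-off is that a pod which never shows up blocks finalization of the whole group, so a real controller would still need some escape hatch.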

pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobframework/interface.go Outdated
pkg/controller/jobframework/interface.go Outdated
pkg/controller/jobframework/interface.go Outdated
pkg/controller/jobframework/reconciler.go Outdated
pkg/controller/jobframework/reconciler.go Outdated
pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobframework/reconciler.go
pkg/controller/jobs/pod/pod_webhook.go Outdated
pkg/controller/jobs/pod/pod_webhook.go
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2023
pkg/controller/jobframework/reconciler.go Outdated
pkg/controller/jobframework/reconciler.go
pkg/controller/jobframework/reconciler.go Outdated
pkg/controller/jobframework/reconciler.go Outdated
pkg/controller/jobs/pod/pod_controller.go
pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobs/pod/pod_controller.go Outdated

var resultPodSets []kueue.PodSet

for _, podInGroup := range podsInGroup.Items {
Contributor

We need to ignore the Pods with phase Failed.

Contributor

Actually, we can't fully ignore the Pods with phase Failed, otherwise we would think that the Workload object no longer matches.

One possible workaround is to ignore the Failed Pods but relax the logic of equivalentToWorkload a little, so that it allows a number of Pods lower than the counter saved in the Workload object.
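The relaxed comparison suggested above could look roughly like this: Failed Pods are left out of the count, and a live count lower than the one recorded in the Workload still counts as a match, while extra Pods or an unknown pod set do not. Pods are grouped by a role string here, which is a simplifying assumption; `equivalent`, `podSet` and `livePod` are hypothetical stand-ins, not the real equivalentToWorkload.

```go
package main

import "fmt"

type phase string

const (
	running phase = "Running"
	pending phase = "Pending"
	failed  phase = "Failed"
)

// podSet is a simplified stand-in for a Workload pod set: a role and a count.
type podSet struct {
	name  string
	count int
}

// livePod is a simplified stand-in for a Pod in the group.
type livePod struct {
	role  string
	phase phase
}

// equivalent reports whether the non-Failed pods still fit the pod sets
// recorded in the Workload. Fewer pods than recorded is accepted, because
// Failed pods are ignored and may be waiting for replacements.
func equivalent(recorded []podSet, pods []livePod) bool {
	want := make(map[string]int, len(recorded))
	for _, ps := range recorded {
		want[ps.name] = ps.count
	}
	have := make(map[string]int)
	for _, p := range pods {
		if p.phase != failed {
			have[p.role]++
		}
	}
	for role, n := range have {
		w, ok := want[role]
		if !ok || n > w {
			return false
		}
	}
	return true
}

func main() {
	recorded := []podSet{{name: "workers", count: 2}}
	fmt.Println(equivalent(recorded, []livePod{{"workers", running}, {"workers", failed}}))
	fmt.Println(equivalent(recorded, []livePod{{"workers", running}, {"workers", running}, {"workers", pending}}))
}
```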

Contributor Author

Changed this behavior in 09c09ce

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 17, 2023
@achernevskii achernevskii changed the title Draft: Add support for pod groups Add support for pod groups Nov 17, 2023
@achernevskii achernevskii marked this pull request as ready for review November 17, 2023 04:45
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 17, 2023
@k8s-ci-robot k8s-ci-robot requested a review from trasc November 17, 2023 04:45
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Nov 17, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2023
}

if wl != nil && p.isGroup {
if evCond := apimeta.FindStatusCondition(wl.Status.Conditions, kueue.WorkloadEvicted); evCond != nil && evCond.Status == metav1.ConditionTrue {
Contributor

Why do we need to check for WorkloadEvicted inside Stop? Wouldn't Stop only be called when the workload is evicted or unadmitted?

Contributor Author

@achernevskii achernevskii Nov 24, 2023

Contributor

In all of those cases, we should issue Pod deletes, if the Pods don't already have DeletionTimestamps.

Contributor Author

That's what we're doing. On the other hand, if the workload is evicted and all pods are stopped, or at least one pod is running, we don't stop the group.

Contributor

OK, the logic is slightly off:

  • If the wl is deleted or doesn't exist: issue deletes for everything, if we haven't already.
  • Otherwise (eviction): delete anything that isn't suspended. There is no need to return an error if there is a running Pod, as we won't remove the reservation yet.
    if !job.IsActive() {
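The two rules above can be expressed as a small decision function. This is a sketch under simplifying assumptions: the Workload state is reduced to three hypothetical values, a Pod's suspension is modelled as a boolean flag, and Pods that already carry a deletion timestamp are skipped. Note that it returns no error for running Pods during eviction; they are simply deleted.

```go
package main

import "fmt"

type workloadState int

const (
	workloadMissing  workloadState = iota // Workload not found
	workloadDeleting                      // Workload has a DeletionTimestamp
	workloadEvicted                       // Workload exists but was evicted
)

type pod struct {
	name      string
	deleting  bool // already has a DeletionTimestamp
	suspended bool // assumed representation of a Pod that has not been released yet
}

// podsToDelete returns the Pods that Stop should issue deletes for.
func podsToDelete(state workloadState, pods []pod) []string {
	var out []string
	for _, p := range pods {
		if p.deleting {
			continue // a delete was already issued for this Pod
		}
		switch state {
		case workloadMissing, workloadDeleting:
			out = append(out, p.name) // delete everything
		case workloadEvicted:
			if !p.suspended {
				out = append(out, p.name) // keep suspended Pods, delete the rest
			}
		}
	}
	return out
}

func main() {
	pods := []pod{{name: "a"}, {name: "b", suspended: true}, {name: "c", deleting: true}}
	fmt.Println(podsToDelete(workloadEvicted, pods)) // [a]
	fmt.Println(podsToDelete(workloadMissing, pods)) // [a b]
}
```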

return true, nil
}
} else {
if p.isStopped() {
Contributor

We can't remove the Workload or Pod finalizers even if all the pods have deletion timestamp.

Otherwise the Workload would be removed and it wouldn't be possible to send replacement pods.

Contributor Author

The finalizers are removed here if the workload is nil and all the pods have a deletion timestamp. There's nothing to replace if the workload has already been removed.

But this code needs a change anyway: FindMatchingWorkloads could return some workloads to delete, and in that case we shouldn't finalize the pods.
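The condition described here, including the follow-up about FindMatchingWorkloads, boils down to a single predicate. The parameters below are hypothetical simplifications of the controller's state, not its real types.

```go
package main

import "fmt"

// canFinalizePods reports whether the pods' finalizers may be removed:
// no Workload exists for the group, no stale Workloads are still waiting
// to be deleted, and every pod already has a deletion timestamp.
func canFinalizePods(workloadExists bool, workloadsToDelete int, podDeleting []bool) bool {
	if workloadExists || workloadsToDelete > 0 {
		return false
	}
	for _, deleting := range podDeleting {
		if !deleting {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canFinalizePods(false, 0, []bool{true, true})) // true
	fmt.Println(canFinalizePods(false, 1, []bool{true, true})) // false, a stale Workload still has to be deleted
	fmt.Println(canFinalizePods(true, 0, []bool{true, true}))  // false, the Workload may still need replacement pods
}
```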

Contributor

Ah gotcha. However, it looks like we are duplicating code. We already finalize the job in the following scenarios:

So it looks like this logic is unnecessary here.

Contributor Author

This code handles the case when wl is deleted.

The first reconcile will hit this branch, stop the job, finalize the wl and return:

if wl != nil && !wl.DeletionTimestamp.IsZero() {

The second reconcile will finalize the pods after the ComposableJob.Load call.

We could finalize the pods on the first reconcile, but I don't think it's right from the interface perspective. Stop != Delete for the generic job. It's only true for pods. That's why the logic to finalize the pods if wl is nil is in ComposableJob.Load and not in the generic reconciler.
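The split described here, where the generic reconciler never assumes that Stop means Delete and the pod-specific finalization lives in Load, can be sketched as a toy two-pass reconcile. Everything below (`composableJob`, `podGroup`, `cluster`, the string actions) is a hypothetical simplification for illustration, not the real ComposableJob interface.

```go
package main

import "fmt"

// composableJob is a hypothetical, heavily simplified interface for a job
// made of several objects. Load inspects all members at the start of a
// reconcile and reports whether they should simply be finalized.
type composableJob interface {
	Load(workloadExists bool) (finalizeMembers bool)
	Stop() string
	Finalize() string
}

// podGroup is a toy pod-group implementation.
type podGroup struct{ podsDeleting bool }

// Load holds the pod-specific rule: once the Workload is gone and the pods
// are being deleted, there is nothing left to replace.
func (g *podGroup) Load(workloadExists bool) bool {
	return !workloadExists && g.podsDeleting
}

// Stop means Delete only for pods; other job kinds would merely suspend.
func (g *podGroup) Stop() string {
	g.podsDeleting = true
	return "stop pods"
}

func (g *podGroup) Finalize() string { return "remove pod finalizers" }

// cluster is the minimal Workload state the toy reconciler looks at.
type cluster struct{ workloadExists, workloadDeleting bool }

// reconcile is a toy generic reconciler that only talks to the interface.
func reconcile(job composableJob, c *cluster) []string {
	if job.Load(c.workloadExists) {
		return []string{job.Finalize()}
	}
	if c.workloadExists && c.workloadDeleting {
		stop := job.Stop()
		c.workloadExists = false // removing its finalizer lets the Workload disappear
		return []string{stop, "remove workload finalizer"}
	}
	return nil
}

func main() {
	c := &cluster{workloadExists: true, workloadDeleting: true}
	g := &podGroup{}
	fmt.Println(reconcile(g, c)) // [stop pods remove workload finalizer]
	fmt.Println(reconcile(g, c)) // [remove pod finalizers]
}
```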

Contributor Author

We could finalize the pods on the first reconcile.

Please let me know what you think about this idea.

Contributor

I see what you mean, for the case when the workload is deleted. However, the case when the Workload is finished is already covered.

Then, we need a single line:

return wl != nil && p.isStopped(), nil

Contributor

And accompany it with a comment about why we need to finalize in that case.

},
workloadCmpOpts: defaultWorkloadCmpOpts,
},
"workload is not deleted if one of the pods in the finished group is deleted": {
Contributor

if ALL of the pods are deleted (have a finalizer and a deletionTimestamp)

wantWorkloads: []kueue.Workload{},
workloadCmpOpts: defaultWorkloadCmpOpts,
deleteWorkloads: true,
},
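The review note above (the Workload should be deleted only when ALL Pods have a finalizer and a deletionTimestamp) could be captured with a predicate and a table-driven check in the same style as these test cases. `allPodsDeleted` and the simplified `pod` type are hypothetical.

```go
package main

import "fmt"

type pod struct {
	hasFinalizer bool
	deleting     bool // has a deletionTimestamp
}

// allPodsDeleted reports whether every pod in the finished group has both the
// finalizer and a deletionTimestamp; only then is the Workload deleted.
func allPodsDeleted(pods []pod) bool {
	for _, p := range pods {
		if !p.hasFinalizer || !p.deleting {
			return false
		}
	}
	return true
}

func main() {
	cases := map[string]struct {
		pods []pod
		want bool
	}{
		"workload is not deleted if one of the pods in the finished group is deleted": {
			pods: []pod{{hasFinalizer: true, deleting: true}, {hasFinalizer: true}},
			want: false,
		},
		"workload is deleted if all of the pods in the finished group are deleted": {
			pods: []pod{{hasFinalizer: true, deleting: true}, {hasFinalizer: true, deleting: true}},
			want: true,
		},
	}
	for name, tc := range cases {
		if got := allPodsDeleted(tc.pods); got != tc.want {
			fmt.Printf("%s: got %v, want %v\n", name, got, tc.want)
		}
	}
}
```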
Contributor

Some of the added tests are not accurate. Left some comments.

Comment on lines 632 to 635
gomega.Consistently(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, pod1LookupKey, createdPod1)).To(gomega.Succeed())
	g.Expect(k8sClient.Get(ctx, pod2LookupKey, createdPod2)).To(gomega.Succeed())
}, util.ConsistentDuration, util.Interval).Should(gomega.Succeed())
Contributor

No need for this. But we might want to check that the Pods get a DeletionTimestamp.

ginkgo.By("creating the replacement pod and readmitting the workload will unsuspend the replacement", func() {
	gomega.Expect(k8sClient.Create(ctx, replacementPod)).Should(gomega.Succeed())

	gomega.Expect(k8sClient.Get(ctx, wlLookupKey, createdWorkload)).To(gomega.Succeed())
Contributor

It's worth checking that this is the same Workload that was initially created: the UID should match.

pkg/controller/jobs/pod/pod_controller.go Outdated
pkg/controller/jobframework/reconciler.go
@alculquicondor
Contributor

/approve

Ready for squash :)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achernevskii, alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 24, 2023
@alculquicondor
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 24, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 9ddd6799a46ce66840939683fe2b40f79063d2ec

* Introduce new ComposableJob interface for jobs that
  have to be composed of different API objects.

* Add custom get. A composable job can get all its elements at the
  beginning of the reconcile.

* Add ComposableJob implementation for pod groups.

* Add webhook checks for pod group labels and
  annotations.

* Update Finished method for pod group

* IsSuspended and Stop methods of the pod controller now
  interact with all the pods at once.

* Update IsActive function to check if at least one pod in
  the group is running.

* Change podSuspended method.

* Add stop skip for pods in group that already have a
  deletion timestamp.

* Add IsComposableJobActive

* Add UnretryableError, an error that doesn't require a reconcile
  retry.

* Add ValidateLabelAsCRDName call for the pod-group,
  make pod-group label immutable.

* Add unit tests for pod group integration
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 24, 2023
@alculquicondor
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 24, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: d7e27626543786f8d09179a504306d418cb4d139

@k8s-ci-robot k8s-ci-robot merged commit 879d798 into kubernetes-sigs:main Nov 24, 2023
3 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.6 milestone Nov 24, 2023
@trasc trasc deleted the feature/pod_groups branch March 12, 2024 08:55
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants