KEP update: Allow replacement pods in groups of pods #1338
Conversation
Change-Id: I894785325d44ff3cc3287f9717e96add499a2b48
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alculquicondor. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Change-Id: I5b07d1ad71c99a23bdb612f1662df7dc8b991bec
Force-pushed from 4913c48 to 88aa7d5.
Change-Id: Ie0281d6ebd6a022abc85ae245a22b3282dd23881
Force-pushed from 88aa7d5 to f1b5f78.
/hold I'm rethinking whether all Pods owning the Workload is the best idea.
keps/976-plain-pods/README.md (Outdated)

### Retrying Failed Pods

The Pod group will generally only be considered finished if all the Pods finish
What about pod groups that do not replace failed pods? We have a use case where a user does not allow retries (and I imagine this is a common batch case). In that case, a pod group would be finished once all pods have exited, regardless of success or failure.
I think one way to do this is for every pod in the group to have the kueue.x-k8s.io/last-in-group: true annotation, assuming "group finished" does not mean the Workload is cleaned up or other running pods in the group are affected. This seems weird given the name, though, so would that be the recommendation? Is it worth some kind of alternative annotation to configure this, like pod-group-mode: Batch or pod-group-retry: false?
Just having one pod with last-in-group has the semantics you want: the pod group will be considered Finished when there are no more Running Pods and there is at least one pod with the annotation.
But I agree that the name is maybe not the best fit for this use case. I'm thinking of alternatives. pod-group-mode: Batch is definitely not accurate, as batch doesn't imply that retries are impossible, and pod-group-retry doesn't quite fit the semantics I originally wanted.
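As a rough illustration of that semantics, a Go sketch of the check might look like the following; the helper name and the exact handling of pod phases are assumptions for illustration, not part of the KEP.

```go
// Sketch only (not the KEP implementation): a group is Finished when no pod is
// still Running and at least one pod carries the last-in-group annotation.
package sketch

import corev1 "k8s.io/api/core/v1"

const lastInGroupAnnotation = "kueue.x-k8s.io/last-in-group"

func isGroupFinished(pods []corev1.Pod) bool {
	sawLastInGroup := false
	for _, p := range pods {
		if p.Status.Phase == corev1.PodRunning {
			// A pod is still running, so the group cannot be Finished yet.
			return false
		}
		if p.Annotations[lastInGroupAnnotation] == "true" {
			sawLastInGroup = true
		}
	}
	return sawLastInGroup
}
```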
I decided to go with retriable-in-group: false, similar to your proposal.
But I also added another mechanism for termination: just delete the Workload.
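For illustration only, a pod in a non-retriable group might be constructed like this; the full annotation key is an assumption based on the kueue.x-k8s.io prefix used for last-in-group, and is not confirmed by this thread.

```go
// Hypothetical example: building a pod that opts its group out of retries.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func nonRetriablePod(name string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: name,
			Annotations: map[string]string{
				// Assumed key: signals that failed pods in this group are not replaced.
				"kueue.x-k8s.io/retriable-in-group": "false",
			},
		},
		Spec: corev1.PodSpec{ /* containers omitted for brevity */ },
	}
}
```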
Change-Id: Ia36c32fdf6862881fe0002d3b565c11da5978a54
keps/976-plain-pods/README.md (Outdated)

Note that we are only removing Pod finalizers once the Workload is finished or if the Pods are
Failed. This is a simple way of managing finalizers, but it might lead to too many Pods lingering
Suggested change:

Note that we are only removing Pod finalizers once the Workload is finished or if the Pods are
Failed and replaced. This is a simple way of managing finalizers, but it might lead to too many Pods lingering
Oops, I actually wanted to say that we remove finalizers only when the workload is finished.
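To make the clarified rule concrete, here is a hedged sketch of a controller helper that releases pod finalizers only once the Workload is finished; the finalizer name and the "Finished" condition type are placeholders, not the KEP's API.

```go
// Sketch only: remove the kueue pod finalizer after the owning Workload finishes.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

const podFinalizer = "kueue.x-k8s.io/managed" // assumed finalizer name

func cleanupPodFinalizers(ctx context.Context, c client.Client, wl *kueue.Workload, pods []corev1.Pod) error {
	// Keep finalizers in place until the Workload itself reports Finished.
	if !apimeta.IsStatusConditionTrue(wl.Status.Conditions, "Finished") {
		return nil
	}
	for i := range pods {
		if controllerutil.RemoveFinalizer(&pods[i], podFinalizer) {
			if err := c.Update(ctx, &pods[i]); err != nil {
				return err
			}
		}
	}
	return nil
}
```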
Change-Id: If5cf5dc60dfbf1f7afb45beee974e92bffa979b8
Force-pushed from c0f87ba to e0a3af3.
Change-Id: I34fbfdac3e056f559167795938c3c2cab1f41977
oh, and @nstogner
/assign
lgtm, thanks Aldo
I'm wondering if it is possible to configure conditions that say the job failed. In a group of pods, users may want to mark the job as a failure if the driver pod fails to start.
However, I'm not sure if it would be worth it. Maybe we should suggest that users migrate to batch/Job or a custom job.
Note that fields like `env` and `command` can sometimes change among all the pods of a group and
they don't influence scheduling, so they are safe to skip. `volumes` can influence scheduling, but
they can be parameterized, like in StatefulSets, so we will ignore them for now.
Currently, we compare the entire container specs to verify whether the existing workload matches the desired workload:
kueue/pkg/util/equality/podset.go
Lines 29 to 34 in 1ecd79f
func comparePodTemplate(a, b *corev1.PodSpec) bool {
	if !equality.Semantic.DeepEqual(a.InitContainers, b.InitContainers) {
		return false
	}
	return equality.Semantic.DeepEqual(a.Containers, b.Containers)
}
So, I'm wondering if we should update envs and commands. WDYT?
We could, but I'm not sure about the usefulness. In Pod groups it is useful because each Pod might have a slightly different spec, but in Jobs there is only one template.
That is a separate discussion, nevertheless.
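As a sketch of what skipping those fields could look like for pod groups (not the current implementation, and the helper name is made up), the comparison could clear env and command on copies before the semantic DeepEqual:

```go
// Illustrative variant of comparePodTemplate that ignores env and command, so
// pods of a group that differ only in those fields still match the stored spec.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

func comparePodSpecIgnoringEnvAndCommand(a, b *corev1.PodSpec) bool {
	strip := func(in *corev1.PodSpec) *corev1.PodSpec {
		out := in.DeepCopy()
		for i := range out.InitContainers {
			out.InitContainers[i].Env = nil
			out.InitContainers[i].Command = nil
		}
		for i := range out.Containers {
			out.Containers[i].Env = nil
			out.Containers[i].Command = nil
		}
		return out
	}
	return equality.Semantic.DeepEqual(strip(a), strip(b))
}
```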
That sounds reasonable. I may start a separate discussion about this. Thanks.
Workload. If there is an existing Workload in the cache and it has smaller Pod counters than the
in-memory Workload, then it is considered unmatching and the Workload is evicted.
What happens if a cached existing Workload has larger Pod counters than the in-memory Workload? Will the reconciler evict the Workload the same way as in the smaller case?
We only evict if the counters in the Workload are smaller, not larger.
It makes sense. Thanks.
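A hedged sketch of that rule, with names and field access assumed for illustration: the cached Workload is treated as non-matching only when one of its PodSet counts is smaller than the one rebuilt from the live pods.

```go
// Sketch only: report whether the cached Workload has any PodSet count that is
// smaller than the in-memory Workload's; larger counts do not trigger eviction.
package sketch

import kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"

func hasSmallerPodCounts(existing, inMemory *kueue.Workload) bool {
	want := map[string]int32{}
	for _, ps := range inMemory.Spec.PodSets {
		want[string(ps.Name)] = ps.Count
	}
	for _, ps := range existing.Spec.PodSets {
		if ps.Count < want[string(ps.Name)] {
			return true
		}
	}
	return false
}
```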
Change-Id: I971e0e87fb09ad9fba8a3805a61d2cc267b29bda
/hold cancel
@tenzen-y anything to add before merging?
Thanks!
/lgtm
LGTM label has been added. Git tree hash: d235a6b17f4db2dfeab923b528fcb2a1a63778d8
…#1338)

* KEP: Simpler algorithm for admitting groups of Pods (Change-Id: I894785325d44ff3cc3287f9717e96add499a2b48)
* Clarify that workload is automatically cleaned up (Change-Id: I5b07d1ad71c99a23bdb612f1662df7dc8b991bec)
* Last attempt annotation and reclaimable quota (Change-Id: Ie0281d6ebd6a022abc85ae245a22b3282dd23881)
* Simplify design (Change-Id: Ia36c32fdf6862881fe0002d3b565c11da5978a54)
* fix note about failed pods (Change-Id: If5cf5dc60dfbf1f7afb45beee974e92bffa979b8)
* Clarify that some fields will be excluded (Change-Id: I34fbfdac3e056f559167795938c3c2cab1f41977)
* Add dynamic reclaim for non retriable groups (Change-Id: I971e0e87fb09ad9fba8a3805a61d2cc267b29bda)
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Which issue(s) this PR fixes:
Part of #976
Special notes for your reviewer:
Does this PR introduce a user-facing change?