Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-1282: Requeue Strategy #1311

Merged
merged 4 commits into from
Dec 27, 2023
Merged

Conversation

nstogner
Copy link
Contributor

@nstogner nstogner commented Nov 6, 2023

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Document how we could add support for configurable queue ordering strategies upon eviction.

Which issue(s) this PR fixes:

Part of #1282

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 6, 2023
Copy link

netlify bot commented Nov 6, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit dc9c32e
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/65824a42916aaa00087887ac

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 6, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 6, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @nstogner. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 6, 2023
@mimowo
Copy link
Contributor

mimowo commented Nov 15, 2023

/ok-to-test
/assign

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 15, 2023

Possible settings:

* `UseEvictionTimestamp` (Back of queue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is't it the other way round? UseEvictionTimestamp -> front of the queue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant UseEvictionTimestamp as the time in which to consider for entry to queue. Because the eviction would have occurred more recently than the initial .metadata.creationTimestamp, this would put the Workload at the "back of the queue" immediately following eviction. If you read this differently, perhaps it would be good to change the terminology.

keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
@alculquicondor
Copy link
Contributor

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 15, 2023
> The case of preemption might be more tricky: we want the preempted jobs to be readmitted as soon as possible. But if a job is waiting for more than one job to be preempted and the queueing strategy is BestEffortFIFO, we don't want the preempted pods to take the head of the queue.
> Maybe we need to hold them until the preemptor job is admitted, and then they should use the regular priority.

The `pkg/queue` package could have the existing `queueOrderingFunc()` modified to add sorting based on who is the preemptor (might need to add a condition to the Workload to track this).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like my concern was unfounded.
We handle this problem here:

return cq.requeueIfNotPresent(wInfo, reason == RequeueReasonFailedAfterNomination || reason == RequeueReasonPendingPreemption)

Basically, if the workload has preempted any workloads, it's put back in the queue immediately.

Since nothing changes about it and it will have a higher priority than the preempted workloads, it should be at the head of the queue.

keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
# ...
requeueStrategy:
priorityPreemption: UseEvictionTimestamp
podsReadyTimeout: UseCreationTimestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We consider multiple aspects for sorting, the timestamp being one of them.

Having UseCreationTimestamp as the name of the option seems to imply that we ignore any other aspects.

Maybe it should be something like:

requeueStrategy:
  podsReadyTimeout:
    timestampSource: Creation | Eviction

That is assuming we are going to offer configuration for every eviction mechanism, otherwise, I would put the setting under the existing waitForPodsReady:

waitForPodsReady:
  requeuingTimestamp: Creation | Eviction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to grasp how varying eviction methods could result in distinct types of queue sorting. Can you share some cases?

keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
#### Integration tests

- Add integration test that matches user story 1.
- Add an integration test to detect if flapping associated with preempted workloads being readmitted before the preemptor workload when `priorityPreemption: UseEvictionTimestamp` is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Add an integration test to detect if flapping associated with preempted workloads being readmitted before the preemptor workload when `priorityPreemption: UseEvictionTimestamp` is set.
- Add an integration test to detect if flapping associated with preempted workloads being readmitted before the preemptor workload when `requeuingTimestamp: Creation` is set.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, updated.


## Drawbacks

When used with `StrictFIFO`, the `requeuingTimestamp: Creation` (front of queue) policy could lead to a blocked queue. This was called out in the issue that set the hardcoded [back-of-queue behavior](https://github.com/kubernetes-sigs/kueue/issues/599). This could be mitigated by recommending administrators select `BestEffortFIFO` when using this setting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other drawbacks, for example, if you want to use waitForPodsReady to prevent jobs with invalid images to block the queue, then you would continuosly block the queue.
Until we have a proper mechanism to fail pods in this situation.
kubernetes/kubernetes#122300

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout, added an bullet point.

keps/1282-requeue-strategy/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 27, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: b0a5586f1b484b10a88ae5316c68c95ca66dce39

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, nstogner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 27, 2023
@k8s-ci-robot k8s-ci-robot merged commit 542da4f into kubernetes-sigs:main Dec 27, 2023
13 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.6 milestone Dec 27, 2023
@tenzen-y
Copy link
Member

@alculquicondor @nstogner I apologize for being too late in my comment.
Can we add a knob to set the limit on the number of requeueing?

I guess that infinite requeueing could happen when node resources are balanced (which means not bin-packing).
Not bin-packing means there are enough node resources in the cluster, although there aren't enough node resources in a single node.

@alculquicondor
Copy link
Contributor

What would you like to happen when the threshold is reached? In a way, it seems independent from what place in the queue the workload should have after being requeued.

@tenzen-y
Copy link
Member

What would you like to happen when the threshold is reached? In a way, it seems independent from what place in the queue the workload should have after being requeued.

When we use requeuingTimestamp=Creation, I meant the reached threshold job is 1. evicted and abandoned or 2. put on the tail of the queue.

@alculquicondor
Copy link
Contributor

I think it's simpler to abandon the job, but again, it seems like a separate feature.

But if you have a proposal based on a user story, maybe you can send an update to this KEP?

@tenzen-y
Copy link
Member

I think it's simpler to abandon the job, but again, it seems like a separate feature.

But if you have a proposal based on a user story, maybe you can send an update to this KEP?

Yes,sure. I can update this KEP.

@alculquicondor
Copy link
Contributor

alculquicondor commented Jan 3, 2024

@tenzen-y I was talking about this with @mwielgus, and we came to the conclusion that it might be better to introduce a form of backoff: for some period of time, the workload doesn't enter the queue. And the backoff could be exponential.

This might be better than reducing priority, as that would make the workload more susceptible to preemptions once admitted.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 3, 2024

@tenzen-y I was talking about this with @mwielgus, and we came to the conclusion that it might be better to introduce a form of backoff: for some period of time, the workload doesn't enter the queue. And the backoff could be exponential.

This might be better than reducing priority, as that would make the workload more susceptible to preemptions once admitted.

@alculquicondor Totally SGTM. Will we work on the backoff mechanism in the first iteration?

Also, I still think we should introduce a timeout mechanism, as I said above. For example, we could avoid jobs without proper image credentials staying in the queue forever. WDYT?

@alculquicondor
Copy link
Contributor

Will we work on the backoff mechanism in the first iteration?

I think it's doable. Let me get back to you tomorrow.

Also, I still think we should introduce a timeout mechanism, as I said above. For example, we could avoid jobs without proper image credentials staying in the queue forever. WDYT?

It becomes less important once we have an exponential backoff. Maybe we can add a max number of retries to the backoff, after which the Workload is considered failed.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 3, 2024

It becomes less important once we have an exponential backoff. Maybe we can add a max number of retries to the backoff, after which the Workload is considered failed.

That makes sense. I prefer a max number of retries to the backoff rather than timeout.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 5, 2024

Will we work on the backoff mechanism in the first iteration?

I think it's doable. Let me get back to you tomorrow.

@alculquicondor Is there any progress about iterations for the backoff mechanism?

@alculquicondor
Copy link
Contributor

@nstogner will first work on what is already designed and then he will work on a backoff mechanism. Hopefully we can have both for the release.

If you have some time to finalize the design, feel free to open a PR.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 5, 2024

@nstogner will first work on what is already designed and then he will work on a backoff mechanism. Hopefully we can have both for the release.

If you have some time to finalize the design, feel free to open a PR.

That makes sense. Thanks!

@tenzen-y
Copy link
Member

tenzen-y commented Jan 5, 2024

I will update this KEP for the backoff mechanism.

kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
* Add kep-1282

* Update toc

* Update based on comments

* Address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants