KEP-1282: Requeue Strategy #1311
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Hi @nstogner. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/ok-to-test
keps/1282-requeue-strategy/README.md (Outdated)

> Possible settings:
>
> * `UseEvictionTimestamp` (Back of queue)
Isn't it the other way round? `UseEvictionTimestamp` -> front of the queue?
I meant `UseEvictionTimestamp` as the timestamp to consider for entry into the queue. Because the eviction would have occurred more recently than the initial `.metadata.creationTimestamp`, this would put the Workload at the "back of the queue" immediately following eviction. If you read this differently, perhaps it would be good to change the terminology.
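To illustrate the intent, here is a minimal sketch of the timestamp selection I have in mind (the names are hypothetical, not Kueue's actual code):

```go
// Hypothetical sketch only: shows which timestamp would drive queue ordering
// under each setting. Names are made up for illustration.
package sketch

import "time"

type workloadInfo struct {
	CreationTime     time.Time
	LastEvictionTime time.Time // zero if the workload has never been evicted
}

// entryTimestamp returns the time used to order the workload in the queue.
// With UseEvictionTimestamp, a freshly evicted workload carries a timestamp
// newer than that of any workload waiting since creation, so it sorts behind
// them ("back of the queue").
func entryTimestamp(w workloadInfo, useEvictionTimestamp bool) time.Time {
	if useEvictionTimestamp && !w.LastEvictionTime.IsZero() {
		return w.LastEvictionTime
	}
	return w.CreationTime
}
```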
/release-note-none
keps/1282-requeue-strategy/README.md (Outdated)

> > The case of preemption might be more tricky: we want the preempted jobs to be readmitted as soon as possible. But if a job is waiting for more than one job to be preempted and the queueing strategy is BestEffortFIFO, we don't want the preempted pods to take the head of the queue.
> > Maybe we need to hold them until the preemptor job is admitted, and then they should use the regular priority.
>
> The `pkg/queue` package could have the existing `queueOrderingFunc()` modified to add sorting based on who is the preemptor (might need to add a condition to the Workload to track this).
It looks like my concern was unfounded.
We handle this problem here:
    return cq.requeueIfNotPresent(wInfo, reason == RequeueReasonFailedAfterNomination || reason == RequeueReasonPendingPreemption)
Basically, if the workload has preempted any workloads, it's put back in the queue immediately.
Since nothing changes about it and it will have a higher priority than the preempted workloads, it should be at the head of the queue.
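For reference, the ordering this relies on is roughly the following (a simplified sketch that assumes priority is compared before the timestamp; it is not the actual `queueOrderingFunc()`):

```go
// Simplified sketch of a priority-then-timestamp ordering; not Kueue's code.
package sketch

import "time"

type queuedWorkload struct {
	Priority  int32
	Timestamp time.Time // creation or eviction time, per the configured strategy
}

// less reports whether a should be dequeued before b. Priority is compared
// first, so a requeued preemptor with higher priority stays ahead of the
// lower-priority workloads it evicted, regardless of which timestamp those
// workloads carry.
func less(a, b queuedWorkload) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	return a.Timestamp.Before(b.Timestamp)
}
```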
keps/1282-requeue-strategy/README.md (Outdated)

> # ...
> requeueStrategy:
>   priorityPreemption: UseEvictionTimestamp
>   podsReadyTimeout: UseCreationTimestamp
We consider multiple aspects for sorting, the timestamp being one of them. Having `UseCreationTimestamp` as the name of the option seems to imply that we ignore any other aspects.
Maybe it should be something like:

    requeueStrategy:
      podsReadyTimeout:
        timestampSource: Creation | Eviction

That is assuming we are going to offer configuration for every eviction mechanism; otherwise, I would put the setting under the existing `waitForPodsReady`:

    waitForPodsReady:
      requeuingTimestamp: Creation | Eviction
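To make the second option concrete, the field could look roughly like this in the Configuration API types (a hypothetical Go sketch; the names, placement, and JSON tag are assumptions, not the final API):

```go
// Hypothetical sketch of where the proposed field could live in the
// Configuration API. Names and placement are assumptions for illustration.
package config

// RequeuingTimestamp selects which timestamp orders a workload when it is
// requeued after an eviction.
type RequeuingTimestamp string

const (
	// CreationTimestamp orders requeued workloads by .metadata.creationTimestamp.
	CreationTimestamp RequeuingTimestamp = "Creation"
	// EvictionTimestamp orders requeued workloads by the time of their last eviction.
	EvictionTimestamp RequeuingTimestamp = "Eviction"
)

// WaitForPodsReady sketches only the new field; the existing fields of this
// section are omitted here.
type WaitForPodsReady struct {
	// RequeuingTimestamp selects the timestamp used to order a workload that
	// is requeued after a PodsReady timeout eviction.
	RequeuingTimestamp *RequeuingTimestamp `json:"requeuingTimestamp,omitempty"`
}
```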
I'm struggling to grasp how varying eviction methods could result in distinct types of queue sorting. Can you share some cases?
keps/1282-requeue-strategy/README.md (Outdated)

> #### Integration tests
>
> - Add integration test that matches user story 1.
> - Add an integration test to detect if flapping associated with preempted workloads being readmitted before the preemptor workload when `priorityPreemption: UseEvictionTimestamp` is set.
Suggested change:

- Add an integration test to detect if flapping associated with preempted workloads being readmitted before the preemptor workload when `priorityPreemption: UseEvictionTimestamp` is set.
+ Add an integration test to detect if flapping associated with preempted workloads being readmitted before the preemptor workload when `requeuingTimestamp: Creation` is set.

?
Good catch, updated.
keps/1282-requeue-strategy/README.md (Outdated)

> ## Drawbacks
>
> When used with `StrictFIFO`, the `requeuingTimestamp: Creation` (front of queue) policy could lead to a blocked queue. This was called out in the issue that set the hardcoded [back-of-queue behavior](https://github.com/kubernetes-sigs/kueue/issues/599). This could be mitigated by recommending administrators select `BestEffortFIFO` when using this setting.
There are other drawbacks. For example, if you want to use waitForPodsReady to prevent jobs with invalid images from blocking the queue, then you would continuously block the queue until we have a proper mechanism to fail pods in this situation: kubernetes/kubernetes#122300
Good callout, added a bullet point.
/lgtm
/approve
LGTM label has been added. Git tree hash: b0a5586f1b484b10a88ae5316c68c95ca66dce39
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, nstogner

The full list of commands accepted by this bot can be found here. The pull request process is described here.
@alculquicondor @nstogner I apologize for being late with my comment. I guess that infinite requeueing could happen when node resources are balanced (which means not bin-packing).

What would you like to happen when the threshold is reached? In a way, it seems independent of what place in the queue the workload should have after being requeued.

When we use …

I think it's simpler to abandon the job, but again, it seems like a separate feature. But if you have a proposal based on a user story, maybe you can send an update to this KEP?
Yes, sure. I can update this KEP.
@tenzen-y I was talking about this with @mwielgus, and we came to the conclusion that it might be better to introduce a form of backoff: for some period of time, the workload doesn't enter the queue. And the backoff could be exponential. This might be better than reducing priority, as that would make the workload more susceptible to preemptions once admitted.

@alculquicondor Totally SGTM. Will we work on the backoff mechanism in the first iteration? Also, I still think we should introduce a timeout mechanism, as I said above. For example, we could avoid jobs without proper image credentials staying in the queue forever. WDYT?
I think it's doable. Let me get back to you tomorrow.
It becomes less important once we have an exponential backoff. Maybe we can add a max number of retries to the backoff, after which the Workload is considered failed.
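A minimal sketch of how an exponential backoff with a retry cap could behave (all names and constants here are assumptions for illustration, not a settled design):

```go
// Hypothetical sketch of an exponential requeue backoff with a retry cap.
// The parameters and the failure handling are assumptions, not Kueue's API.
package sketch

import "time"

const (
	baseDelay  = 10 * time.Second
	maxDelay   = 10 * time.Minute
	maxRetries = 10
)

// nextRequeue returns how long to keep a workload out of the queue after its
// n-th requeue, and whether it should instead be marked failed because the
// retry budget is exhausted.
func nextRequeue(retries int) (delay time.Duration, failed bool) {
	if retries >= maxRetries {
		return 0, true
	}
	delay = baseDelay << uint(retries) // exponential: baseDelay * 2^retries
	if delay > maxDelay {
		delay = maxDelay
	}
	return delay, false
}
```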
That makes sense. I prefer …

@alculquicondor Is there any progress on the iterations for the backoff mechanism?

@nstogner will first work on what is already designed and then he will work on a backoff mechanism. Hopefully we can have both for the release. If you have some time to finalize the design, feel free to open a PR.

That makes sense. Thanks!

I will update this KEP for the backoff mechanism.
* Add kep-1282
* Update toc
* Update based on comments
* Address comments
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Document how we could add support for configurable queue ordering strategies upon eviction.
Which issue(s) this PR fixes:
Part of #1282
Special notes for your reviewer:
Does this PR introduce a user-facing change?
NONE