When preempting, the released resources cannot meet the scheduling needs #1309
Comments
cc @B1F030 can you help take a look?
Could you please provide the YAML of your ResourceFlavor, ClusterQueue, and Job?
gpu-a-cq ClusterQueue spec:
gpu-b-cq ClusterQueue spec:
ResourceFlavor: apiVersion: kueue.x-k8s.io/v1beta1
The resource pool has two 4-GPU nodes. I submitted a 1 pod * 2 GPU job to gpu-a-cq; however, the resources released after preemption were 0.5 GPUs on one machine and 1.5 GPUs on the other, which cannot meet the job's scheduling needs.
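The manifests above did not survive intact, so here is a minimal sketch of what such a setup could look like. The cohort name, flavor name, and quota are assumptions; only reclaimWithinCohort: Any and the empty ResourceFlavor spec come from the issue itself:

```yaml
# Sketch only: names and quotas are assumed, not the reporter's actual manifests.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-a-cq
spec:
  cohort: gpu-cohort                 # assumed cohort shared with gpu-b-cq
  preemption:
    reclaimWithinCohort: Any         # from the issue description
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-flavor               # assumed flavor name
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 4              # assumed; the pool has two 4-GPU nodes
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-flavor
# spec left empty (nil), matching what was reported in the thread
```

gpu-b-cq would look the same apart from its name.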
Could you also paste the Job YAML?
OK, just a sample Pod YAML; the resource is:
Kueue's preemption strategy is to preempt workloads with lower priority first; among workloads of the same priority, those that started more recently are preempted first.
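The sample Pod YAML itself was not preserved; a hypothetical Job matching the description (one pod requesting 2 GPUs, submitted through an assumed LocalQueue gpu-a-lq) might look like:

```yaml
# Hypothetical reconstruction; the queue name and image are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: gpu-job-
  labels:
    kueue.x-k8s.io/queue-name: gpu-a-lq   # assumed LocalQueue pointing at gpu-a-cq
spec:
  parallelism: 1
  completions: 1
  suspend: true                           # Kueue admits the Job by unsuspending it
  template:
    spec:
      containers:
      - name: main
        image: nvidia/cuda:12.2.0-base-ubuntu22.04
        command: ["sleep", "3600"]
        resources:
          requests:
            nvidia.com/gpu: 2             # both GPUs must land on one node
          limits:
            nvidia.com/gpu: 2
      restartPolicy: Never
```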
Since your ResourceFlavor has a nil spec, I recommend using default-flavor. Would you like to try the YAMLs below and see if the problem happens again?
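The YAMLs posted here were not preserved in this copy of the thread; a sketch of the default-flavor setup the Kueue docs recommend (the cohort and quota are assumptions) would be:

```yaml
# Sketch following the Kueue documentation, not necessarily the exact YAML posted.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-a-cq
spec:
  cohort: gpu-cohort                 # assumed
  preemption:
    reclaimWithinCohort: Any
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 4              # assumed
```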
I have worked around this problem by adding node resource scheduling in our internal environment. Can Kueue solve this problem? Gratefully, @kerthcet
Does "node resource scheduling" mean the NodeResourcesFit plugin in kube-scheduler?
Yes. We can use this plugin to solve the problem of the released resources not meeting the scheduling needs after preemption.
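NodeResourcesFit is enabled by default in kube-scheduler, so the change presumably involves its scoring strategy. A sketch of a bin-packing configuration follows; the strategy choice and weights are assumptions, since the reporter's internal config is not shown:

```yaml
# Sketch: score nodes by MostAllocated so a 2-GPU pod prefers the node that
# packs tightest, rather than spreading. Weights are illustrative.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        resources:
        - name: nvidia.com/gpu
          weight: 100                # weight GPUs heavily (assumed value)
        - name: cpu
          weight: 1
```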
Generally, Kueue cannot solve this problem: Kueue and kube-scheduler are two different components and are not aware of each other, so let's close this for now. Thanks for your feedback.
@kerthcet: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
I created two ClusterQueues and set reclaimWithinCohort = Any. After preemption, the released resources cannot meet the scheduling needs.
For example, when I submit one task of 1 pod * 2 GPUs, I want the task to get 2 GPUs on the same machine; however, after preemption one machine has 0.5 GPUs free and the other has 1.5 GPUs free.
What you expected to happen:
I want this task to get 2 GPUs on the same machine.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Kubernetes version (use `kubectl version`):
Kueue version (use `git describe --tags --dirty --always`): v0.4.1
OS (e.g.: `cat /etc/os-release`):
Kernel (e.g. `uname -a`):