[preemption/reclaim] preemption/reclaim does not work properly when there is a gang job. #446
Labels: kind/bug, sig/scheduling
Is this a BUG REPORT or FEATURE REQUEST?: BUG REPORT
What happened:
Preemption/reclaim does not work properly when a gang job is involved.
What you expected to happen:
A job should not preempt another job if it still cannot run after the preemption, and cores released by completed pods should be reusable by the remaining pending pods.
How to reproduce it (as minimally and precisely as possible):
ENV: a cluster with 60 CPU cores in total.
===>>> From the above, we can see that job1 was preempted by job2 and 30 cores were freed by job1, but job2 still cannot run because of its minMember restriction. The expected behavior is: a job should not preempt another job if it still cannot run after the preemption.
Wait for a while and, after 30 pods have completed, an even worse condition appears. After 30 pods of job1 completed, the released cores should be reusable by the other 30 pods, but in fact only 1 pod continues to run. Weird.
>>> The remaining 30 cores could be used by either job2 or job3, yet the resources sit idle.
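The original specs and scheduler output are not included above. As a rough, hypothetical sketch of a setup with this shape (the image, pod counts, minMember value, API group/version, and group-name annotation key are all assumptions and depend on the kube-batch release), job2 as a gang job might look like:

```yaml
# Hypothetical reproduction sketch, not the original specs.
# job1 (not shown) is assumed to be 60 ordinary 1-core pods filling the 60-core cluster.
# job2 below is a gang job that can only start once all of its minMember pods fit,
# so preempting only 30 of job1's pods cannot make it runnable.
apiVersion: scheduling.incubator.k8s.io/v1alpha1   # assumed; varies by kube-batch release
kind: PodGroup
metadata:
  name: job2
spec:
  minMember: 60
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job2
spec:
  parallelism: 60
  completions: 60
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: job2   # assumed annotation key tying the pods to the PodGroup
    spec:
      schedulerName: kube-batch
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox                        # placeholder image
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "1"                          # 60 pods x 1 core = the whole 60-core cluster
```

Submitting a spec of this shape while job1 already occupies all 60 cores should reproduce the preemption scenario described above.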
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.11.3
- Kernel (e.g. `uname -a`):