
[preemption/reclaim] preemption/reclaim does not work properly when there is a gang job. #446

Closed
runqch opened this issue Oct 17, 2018 · 3 comments

runqch commented Oct 17, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened:
Preemption/reclaim does not work properly.

What you expected to happen:

  1. A job should not preempt other jobs if it still cannot run after the preemption.
  2. After resources are released, they should be usable by other jobs.

How to reproduce it (as minimally and precisely as possible):
ENV: 60 cores

  1. submit 1st job to occupy 60 cores, with minMember=1
  2. submit 2nd job, requiring 60 cores, with minMember=60 (gang job)
  3. continue to submit 3rd job, requiring 60 cores, with minMember=1
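
(The 60-core figure can be double-checked up front with plain kubectl; this is only a sanity check and nothing in it is kube-batch-specific:)

 kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.cpu}{"\n"}{end}'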

1st job:

 runqch@ib22b10-534: cat job1.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: qj-1
spec:
  backoffLimit: 60
  completions: 60
  parallelism: 60
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: qj-1
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        command:
           - sleep
           - "300"
        resources:
          requests:
            cpu: "1"
      restartPolicy: Never
      schedulerName: kube-batch
---
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: qj-1
spec:
  minMember: 1   

 runqch@ib22b10-547: kubectl create -f ./job1.yaml
job.batch/qj-1 created
podgroup.scheduling.incubator.k8s.io/qj-1 created

 runqch@ib22b10-557: kubectl get pods | grep qj-1 | wc -l
60
 runqch@ib22b10-558: kubectl get pods | grep qj-1 | grep Running | wc -l   <== all 60 pods running
60

2nd job:

 runqch@ib22b10-559: cat job2.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: qj-2
spec:
  backoffLimit: 60
  completions: 60
  parallelism: 60
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: qj-2
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        command:
           - sleep
           - "2000"
        resources:
          requests:
            cpu: "1"
      restartPolicy: Never
      schedulerName: kube-batch
---
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: qj-2
spec:
  minMember: 60

 runqch@ib22b10-560: kubectl create -f ./job2.yaml
job.batch/qj-2 created
podgroup.scheduling.incubator.k8s.io/qj-2 created


 runqch@ib22b10-564: kubectl get pods | grep qj-2 | grep Running | wc -l    
0
 runqch@ib22b10-565: kubectl get pods | grep qj-1 | grep Running | wc -l  
30
 runqch@ib22b10-563: kubectl get pods | grep qj-2 | wc -l   
60

===>>> From the above, we can see that job1 was preempted by job2: 30 cores were freed from job1, but job2 still cannot run because of its minMember restriction. The expected behavior is that a job should not preempt another job if it still cannot run after the preemption.
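
What I would expect is a gang-aware feasibility check before any victim is evicted: only preempt if the freed resources would actually let the preemptor's PodGroup reach its minMember. A minimal sketch of that idea is below; all type and function names are made up for illustration and are not kube-batch's real API, and the "30 preemptible cores" figure is simply what this repro produced.

package main

import "fmt"

// Illustrative types only; not kube-batch's actual data structures.
type PodGroup struct {
	Name      string
	MinMember int
	CPUPerPod int // requested cores per pod (simplified to CPU only)
}

type Cluster struct {
	FreeCPU        int // cores currently unallocated
	PreemptibleCPU int // cores that could be freed by evicting victim pods
}

// canGangRunAfterPreemption returns true only if evicting victims would
// actually let the gang reach MinMember; otherwise preemption is pointless
// and should be skipped entirely.
func canGangRunAfterPreemption(pg PodGroup, c Cluster) bool {
	needed := pg.MinMember * pg.CPUPerPod
	return c.FreeCPU+c.PreemptibleCPU >= needed
}

func main() {
	// The scenario above: qj-2 needs minMember=60 at 1 core per pod,
	// but only 30 cores would be freed, so preemption should be skipped.
	job2 := PodGroup{Name: "qj-2", MinMember: 60, CPUPerPod: 1}
	cluster := Cluster{FreeCPU: 0, PreemptibleCPU: 30}

	if !canGangRunAfterPreemption(job2, cluster) {
		fmt.Println("skip preemption: qj-2 cannot reach minMember even after evicting victims")
	}
}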

After waiting a while, an even worse condition appears: once 30 pods of job1 have completed, the released cores should be reusable by job1's remaining 30 pods, but in fact only 1 pod continues to run. Weird.

 runqch@ib22b10-573: kubectl get pods | grep qj-1 | grep Running | wc -l
1
 runqch@ib22b10-574: kubectl get pods | grep qj-1 | grep Completed | wc -l
30
 runqch@ib22b10-575: kubectl get pods | grep qj-1 | grep Pending | wc -l
29
 runqch@ib22b10-576: kubectl get pods | grep qj-2 | grep Pending | wc -l
60
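
For debugging, the scheduler's view while the cluster is stuck in this state can be inspected as follows. The PodGroup objects are ordinary CRD resources and the pending pods carry scheduling events; only the kube-batch namespace/deployment name below is an assumption and needs to be adjusted to the actual deployment:

 kubectl get podgroup qj-1 -o yaml
 kubectl get podgroup qj-2 -o yaml
 kubectl describe pod <one-pending-qj-1-pod> | tail -n 20
 kubectl logs -n kube-system deployment/kube-batch | tail -n 100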

3rd job:

 runqch@ib22b10-586: cat job3.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: qj-3
spec:
  backoffLimit: 60
  completions: 60
  parallelism: 60
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: qj-3
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        command:
           - sleep
           - "2000"
        resources:
          requests:
            cpu: "1"
      restartPolicy: Never
      schedulerName: kube-batch
---
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: qj-3
spec:
  minMember: 1

 runqch@ib22b10-577: kubectl create -f ./job3.yaml
job.batch/qj-3 created
podgroup.scheduling.incubator.k8s.io/qj-3 created

 runqch@ib22b10-578: kubectl get pods | grep qj-3 | grep Pending | wc -l
60
 runqch@ib22b10-582: kubectl get pods | grep qj-2 | grep Pending | wc -l
60
 runqch@ib22b10-583: kubectl get pods | grep qj-1 | grep Running | wc -l
1

>>> The leftover 30 cores could be used by either job2 or job3, but the resources just sit idle.
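
The idle capacity is also visible directly in the node summary (standard kubectl; the exact layout of this section varies between versions), which should show CPU requests well below the 60 allocatable cores while the pods stay Pending:

 kubectl describe nodes | grep -A 8 'Allocated resources'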

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.11.3
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Linux ib22b10 3.10.0-862.6.3.el7.x86_64 #1 SMP Fri Jun 15 17:57:37 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

k82cn commented Oct 17, 2018

thanks very much for your report :) That's an issue that we did not handle well, let me fix it.

/assign


k82cn commented Oct 17, 2018

/kind bug
/sig scheduling
/milestone v0.3

k8s-ci-robot added the kind/bug and sig/scheduling labels on Oct 17, 2018
k82cn added this to the v0.3 milestone on Oct 18, 2018

k82cn commented Dec 21, 2018

fixed by #457 #505

k82cn closed this as completed on Dec 21, 2018