
Pod resource leak when delete jobs #969

Closed
Garrybest opened this issue Nov 15, 2021 · 11 comments · Fixed by #970
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

Comments

@Garrybest (Member)

What happened:

  1. Create a Job in karmada and propagate it into member clusters.
  2. Then delete the job from karmada control plane: kubectl delete job pi.

You will see that the pods in the member clusters are not deleted. Their ownerRef is removed, even though it pointed to the Job in the member cluster before.
Note that if you delete the Job directly from a member cluster, pod garbage collection works without any problem.
This issue is a little strange; it may have something to do with how Karmada performs the deletion.

What you expected to happen:
The job deletions will not cause any resource leak.

How to reproduce it (as minimally and precisely as possible):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  parallelism: 10
  template:
    spec:
      containers:
        - name: pi
          image: perl:latest
          command: [ "perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)" ]
          resources:
            requests:
              cpu: "250m"
              memory: "64Mi"
            limits:
              cpu: "500m"
              memory: "128Mi"
      restartPolicy: Never
  backoffLimit: 4
  completions: 30
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx
spec:
  resourceSelectors:
    - apiVersion: batch/v1
      kind: Job
      name: pi
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
        - member3
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        dynamicWeight: AvailableReplicas

Anything else we need to know?:

Environment:

  • Karmada version:
  • Others:
@Garrybest added the kind/bug label on Nov 15, 2021
@Garrybest (Member Author)

/cc @mrlihanbo @RainbowMango

@mrlihanbo

mrlihanbo commented Nov 16, 2021

When you delete a Job via client-go, set a propagation policy in the delete options if you want the child pods to be deleted as well.
When you use the kubectl delete command, the default propagation policy is background, so the child pods are deleted as well. See: https://github.com/kubernetes/kubernetes/blob/16227cf09dcb6d1a71733d9fa20335007b0ca3d2/staging/src/k8s.io/kubectl/pkg/cmd/delete/delete_flags.go#L161
or: kubectl delete -h

--cascade='background': Must be "background", "orphan", or "foreground". Selects the deletion cascading strategy
for the dependents (e.g. Pods created by a ReplicationController). Defaults to background.

But when the resource is deleted by Karmada, we don't set the delete options in the ObjectWatcher:

err = dynamicClusterClient.DynamicClientSet.Resource(gvr).Namespace(desireObj.GetNamespace()).Delete(context.TODO(), desireObj.GetName(), metav1.DeleteOptions{})
if apierrors.IsNotFound(err) {
    err = nil
}
if err != nil {
    klog.Errorf("Failed to delete resource %v in cluster %s, err is %v ", desireObj.GetName(), clusterName, err)
    return err
}

The delete option therefore defaults to orphan, which does not delete the child pods.
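A minimal sketch of the kind of fix this implies: explicitly requesting background cascading instead of leaving the delete options empty. The types below are local stand-ins mirroring metav1.DeletionPropagation and metav1.DeleteOptions for illustration only; the real fix would pass the actual metav1 values to the dynamic client's Delete call.

```go
package main

import "fmt"

// DeletionPropagation mirrors metav1.DeletionPropagation (stand-in for illustration).
type DeletionPropagation string

const (
	PropagationOrphan     DeletionPropagation = "Orphan"
	PropagationBackground DeletionPropagation = "Background"
	PropagationForeground DeletionPropagation = "Foreground"
)

// DeleteOptions mirrors the relevant field of metav1.DeleteOptions.
type DeleteOptions struct {
	PropagationPolicy *DeletionPropagation
}

// backgroundDeleteOptions builds options that ask the API server to
// cascade the deletion to dependents (e.g. a Job's Pods).
func backgroundDeleteOptions() DeleteOptions {
	p := PropagationBackground
	return DeleteOptions{PropagationPolicy: &p}
}

func main() {
	opts := backgroundDeleteOptions()
	fmt.Println(*opts.PropagationPolicy) // Background
}
```

With options built this way, the Delete call would no longer fall back to the server-side default, so the Job's Pods would be garbage-collected.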


@RainbowMango (Member)

What kind of delete option are we talking about?
I guess the case @Garrybest mentioned is deleting the Job from the Karmada control plane with the command kubectl delete job xxx.

@Garrybest (Member Author)

Right, I used kubectl delete job pi.

@mrlihanbo

If the delete option is empty, the default for batch/v1 Jobs is orphan. I found the code here: https://github.com/kubernetes/kubernetes/blob/16227cf09dcb6d1a71733d9fa20335007b0ca3d2/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L742

// DefaultGarbageCollectionPolicy returns OrphanDependents for batch/v1 for backwards compatibility,
// and DeleteDependents for all other versions.
func (jobStrategy) DefaultGarbageCollectionPolicy(ctx context.Context) rest.GarbageCollectionPolicy {
	var groupVersion schema.GroupVersion
	if requestInfo, found := genericapirequest.RequestInfoFrom(ctx); found {
		groupVersion = schema.GroupVersion{Group: requestInfo.APIGroup, Version: requestInfo.APIVersion}
	}
	switch groupVersion {
	case batchv1.SchemeGroupVersion:
		// for back compatibility
		return rest.OrphanDependents
	default:
		return rest.DeleteDependents
	}
}
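The switch above can be condensed into a tiny stand-alone function. This is a paraphrase of the upstream logic for illustration, not the actual kube-apiserver code:

```go
package main

import "fmt"

// defaultGCPolicy paraphrases jobStrategy.DefaultGarbageCollectionPolicy:
// requests arriving via batch/v1 default to orphaning dependents (for
// backwards compatibility), while any other group/version defaults to
// cascading deletion.
func defaultGCPolicy(group, version string) string {
	if group == "batch" && version == "v1" {
		return "OrphanDependents"
	}
	return "DeleteDependents"
}

func main() {
	fmt.Println(defaultGCPolicy("batch", "v1")) // OrphanDependents
	fmt.Println(defaultGCPolicy("apps", "v1"))  // DeleteDependents
}
```

This is why an empty DeleteOptions behaves differently for Jobs than for most other resources.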

@Garrybest (Member Author)

Hey @mrlihanbo, these tips are interesting, but I have not yet figured out why the deletion works fine in a member cluster but has problems in the Karmada control plane.

@RainbowMango (Member)

I have just done testing on my side with the patch here. Hard to say if this is the final solution, but it can explain something.

When you delete Jobs by kubectl, the default cascading deletion option is background, which ensures all the dependents (Pods) are deleted along with the owner (Job).

But Karmada leaves the cascading deletion option empty when deleting the Job, which, as @mrlihanbo mentioned, defaults to Orphan, so the Pods become orphans after the Job is gone.

@mrlihanbo

Hey @mrlihanbo, these tips are interesting, but I have not yet figured out why the deletion works fine in a member cluster but has problems in the Karmada control plane.

Hi @Garrybest, there are three DeletionPropagation policies in Kubernetes: Orphan, Background, and Foreground. When you delete jobs in a member cluster with the kubectl delete command, kubectl sets the delete option to Background automatically.

@Garrybest (Member Author)

Thank you guys a lot, I think I got it. @RainbowMango @mrlihanbo
The kubectl delete command sets CascadingStrategy to Background here.
Meanwhile, Karmada deletes objects in member clusters using empty delete options here. That means the delete options are treated as the default, per #969 (comment) as @mrlihanbo mentioned. So the delete option is treated as Orphan, right?

@Garrybest (Member Author)

I have just done testing on my side with the patch here. Hard to say if this is the final solution, but it can explain something.

It works and does not seem to introduce any incompatibility. I think this patch could serve as a hot fix. @RainbowMango

@RainbowMango (Member)

/assign
