
The resourcebinding of the job has not been deleted. #4467

Closed
chaunceyjiang opened this issue Dec 22, 2023 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@chaunceyjiang
Member

chaunceyjiang commented Dec 22, 2023

What happened:

The resourcebinding of the job has not been deleted.

I have no name!@debug-network-pod:/tmp$ kubectl get jobs --kubeconfig kubeconfig  -n default
NAME   COMPLETIONS   DURATION   AGE
xxx    0/1           97s        98s
I have no name!@debug-network-pod:/tmp$ kubectl get jobs --kubeconfig kubeconfig  -n default xxx -oyaml
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    batch.kubernetes.io/job-tracking: ""
    propagationpolicy.karmada.io/name: xxx-pp-attes
    propagationpolicy.karmada.io/namespace: default
  creationTimestamp: "2023-12-22T08:41:12Z"
  generation: 1
  labels:
    app: xxx
    controller-uid: c10ca564-e0f7-4b40-8e4a-7df1ca0a077c
    job-name: xxx
    propagationpolicy.karmada.io/name: xxx-pp-attes
    propagationpolicy.karmada.io/namespace: default
    propagationpolicy.karmada.io/uid: 5043ba38-8615-450d-a057-66569adec0e0
  name: xxx
  namespace: default
  resourceVersion: "4551700"
  uid: c10ca564-e0f7-4b40-8e4a-7df1ca0a077c
I have no name!@debug-network-pod:/tmp$ kubectl get resourcebindings --kubeconfig kubeconfig  -n default
NAME                SCHEDULED   FULLYAPPLIED   AGE
xxx-job             True        True           2m24s
I have no name!@debug-network-pod:/tmp$ kubectl get resourcebindings --kubeconfig kubeconfig  -n default  xxx-job -oyaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  annotations:
    policy.karmada.io/applied-placement: '{"clusterAffinities":[{"affinityName":"default","clusterNames":["wawa-dev"]}],"clusterTolerations":[{"key":"cluster.karmada.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":30},{"key":"cluster.karmada.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":30}],"replicaScheduling":{"replicaSchedulingType":"Duplicated"}}'
    propagationpolicy.karmada.io/name: xxx-pp-attes
    propagationpolicy.karmada.io/namespace: default
    resourcebinding.karmada.io/dependencies: "null"
  creationTimestamp: "2023-12-22T08:41:12Z"
  finalizers:
  - karmada.io/binding-controller
  generation: 3
  labels:
    propagationpolicy.karmada.io/name: xxx-pp-attes
    propagationpolicy.karmada.io/namespace: default
    propagationpolicy.karmada.io/uid: 5043ba38-8615-450d-a057-66569adec0e0
  name: xxx-job
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: xxx
    uid: c10ca564-e0f7-4b40-8e4a-7df1ca0a077c
  resourceVersion: "4551698"
  uid: 56c9158d-2fa1-4776-8e7d-ec84f0d0d46d
spec:
  clusters:
  - name: wawa-dev
    replicas: 1
  conflictResolution: Abort
  placement:
    clusterAffinities:
    - affinityName: default
      clusterNames:
      - wawa-dev
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 30
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 30
    replicaScheduling:
      replicaSchedulingType: Duplicated
  propagateDeps: true
  replicaRequirements:
    resourceRequest:
      cpu: 250m
      memory: 512Mi
  replicas: 1
  resource:
    apiVersion: batch/v1
    kind: Job
    name: xxx
    namespace: default
    resourceVersion: "4551639"
    uid: c10ca564-e0f7-4b40-8e4a-7df1ca0a077c
  schedulerName: default-scheduler

Delete job xxx through client-go.
I have no name!@debug-network-pod:/tmp$ kubectl get resourcebindings  --kubeconfig kubeconfig  -n default
NAME                SCHEDULED   FULLYAPPLIED   AGE
xxx-job             True        True           14m
I have no name!@debug-network-pod:/tmp$
I have no name!@debug-network-pod:/tmp$ kubectl get jobs --kubeconfig kubeconfig  -n default
No resources found in default namespace.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The delete event seems to have triggered a re-match of policies, and the ownerReferences have been removed.

│ I1222 08:54:43.049281       1 detector.go:217] Reconciling object: batch/v1, kind=Job, default/xxx
│ I1222 08:54:43.049585       1 detector.go:380] Applying policy(default/xxx-pp-attes) for object: batch/v1, kind=Job, default/xxx
│ I1222 08:54:43.049631       1 configurable.go:68] Get replicas for object: batch/v1, Kind=Job default/xxx with configurable interpreter.
│ I1222 08:54:43.049650       1 customized.go:77] Get replicas for object: batch/v1, Kind=Job default/xxx with webhook interpreter.
│ I1222 08:54:43.049667       1 thirdparty.go:54] Get replicas for object: batch/v1, Kind=Job default/xxx with thirdparty configurable interpreter.
│ I1222 08:54:43.049680       1 default.go:78] Get replicas for object: batch/v1, Kind=Job default/xxx with build-in interpreter.
│ I1222 08:54:43.176048       1 detector.go:449] Update ResourceBinding(default/xxx-job) successfully.
│ I1222 08:54:43.176215       1 binding_controller.go:55] Reconciling ResourceBinding default/xxx-job.
│ I1222 08:54:43.176386       1 recorder.go:104] "events: Apply policy(default/xxx-pp-attes) succeed" type="Normal" object={Kind:Job Namespace:default Name:xxx UID:c10ca564-e0f7-4b40-8e4a-7df1ca0a077c APIVersion:batch/v1 ResourceVersion:4553571 FieldPath:} reason="ApplyPolicySucceed"
│ I1222 08:54:43.176484       1 dependencies_distributor.go:210] Start to reconcile ResourceBinding(default/xxx-job)
│ I1222 08:54:43.176590       1 configurable.go:143] Get dependencies of object: batch/v1, Kind=Job default/xxx with configurable interpreter.
│ I1222 08:54:43.176615       1 thirdparty.go:129] Get dependencies of object: batch/v1, Kind=Job default/xxx with thirdparty configurable interpreter.
│ I1222 08:54:43.176630       1 default.go:118] Get dependencies of object: batch/v1, Kind=Job default/xxx with build-in interpreter.
│ I1222 08:54:43.176670       1 overridemanager.go:162] No override policy for resource(default/xxx)
│ I1222 08:54:43.177585       1 recorder.go:104] "events: Get dependencies([]) succeed." type="Normal" object={Kind:Job Namespace:default Name:xxx UID:c10ca564-e0f7-4b40-8e4a-7df1ca0a077c APIVersion:batch/v1 ResourceVersion:4553571 FieldPath:} reason="GetDependenciesSucceed"
│ I1222 08:54:43.177623       1 recorder.go:104] "events: Sync schedule results to dependencies succeed." type="Normal" object={Kind:ResourceBinding Namespace:default Name:xxx-job UID:9da0c48f-2138-4f9b-97c9-d8bf1ac068e2 APIVersion:work.karmada.io/v1alpha2 ResourceVersion:4553572 FieldPath:} reason="SyncScheduleR
│ I1222 08:54:43.249978       1 dependencies_distributor.go:583] Dropping resource binding(default/xxx-job) as the Generation is not changed.
│ I1222 08:54:43.336335       1 service_export_controller.go:68] Reconciling Work karmada-es-wawa-dev/xxx-796b65b785
│ I1222 08:54:43.337198       1 work.go:79] Update work karmada-es-wawa-dev/xxx-796b65b785 successfully.
│ I1222 08:54:43.337234       1 binding_controller.go:123] Sync work of resourceBinding(default/xxx-job) successful.
│ I1222 08:54:43.337433       1 work_status_controller.go:65] Reconciling status of Work karmada-es-wawa-dev/xxx-796b65b785.
│ I1222 08:54:43.337873       1 recorder.go:104] "events: Sync work of resourceBinding(default/xxx-job) successful." type="Normal" object={Kind:ResourceBinding Namespace:default Name:xxx-job UID:9da0c48f-2138-4f9b-97c9-d8bf1ac068e2 APIVersion:work.karmada.io/v1alpha2 ResourceVersion:4553572 FieldPath:} reason="Sy
│ I1222 08:54:43.338078       1 recorder.go:104] "events: Sync work of resourceBinding(default/xxx-job) successful." type="Normal" object={Kind:Job Namespace:default Name:xxx UID:c10ca564-e0f7-4b40-8e4a-7df1ca0a077c APIVersion:batch/v1 ResourceVersion:4553571 FieldPath:} reason="SyncWorkSucceed"
│ I1222 08:54:43.416061       1 dependencies_distributor.go:583] Dropping resource binding(default/xxx-job) as the Generation is not changed.
│ I1222 08:54:43.636341       1 detector.go:217] Reconciling object: batch/v1, kind=Job, default/xxx
│ E1222 08:54:43.668206       1 detector.go:604] Failed to get object(batch/v1, kind=Job, default/xxx), error: jobs.batch "xxx" not found


I have no name!@debug-network-pod:/tmp$ kubectl get resourcebindings  --kubeconfig kubeconfig  -n default xxx-job -oyaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  annotations:
    policy.karmada.io/applied-placement: '{"clusterAffinities":[{"affinityName":"default","clusterNames":["wawa-dev"]}],"clusterTolerations":[{"key":"cluster.karmada.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":30},{"key":"cluster.karmada.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":30}],"replicaScheduling":{"replicaSchedulingType":"Duplicated"}}'
    propagationpolicy.karmada.io/name: xxx-pp-attes
    propagationpolicy.karmada.io/namespace: default
    resourcebinding.karmada.io/dependencies: "null"
  creationTimestamp: "2023-12-22T08:41:12Z"
  finalizers:
  - karmada.io/binding-controller
  generation: 4
  labels:
    propagationpolicy.karmada.io/name: xxx-pp-attes
    propagationpolicy.karmada.io/namespace: default
    propagationpolicy.karmada.io/uid: 5043ba38-8615-450d-a057-66569adec0e0
  name: xxx-job
  namespace: default
  resourceVersion: "4757798"
  uid: 56c9158d-2fa1-4776-8e7d-ec84f0d0d46d
spec:
  clusters:
  - name: wawa-dev
    replicas: 1
  conflictResolution: Abort
  placement:
    clusterAffinities:
    - affinityName: default
      clusterNames:
      - wawa-dev
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 30
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 30
    replicaScheduling:
      replicaSchedulingType: Duplicated
  propagateDeps: true
  replicaRequirements:
    resourceRequest:
      cpu: 250m
      memory: 512Mi
  replicas: 1
  resource:
    apiVersion: batch/v1
    kind: Job
    name: xxx
    namespace: default
    resourceVersion: "4553571"
    uid: 0925623f-bee7-4645-90a9-853fcbef376d
  schedulerName: default-scheduler
status:
  aggregatedStatus:
  - applied: true
    clusterName: wawa-dev
    health: Unknown
    status:
      active: 1
      startTime: "2023-12-22T08:54:10Z"
  conditions:
  - lastTransitionTime: "2023-12-22T08:53:25Z"
    message: Binding has been scheduled successfully.
    reason: Success
    status: "True"
    type: Scheduled
  - lastTransitionTime: "2023-12-22T08:53:37Z"
    message: All works have been successfully applied
    reason: FullyAppliedSuccess
    status: "True"
    type: FullyApplied
  schedulerObservedGeneration: 4
  schedulerObservingAffinityName: default

Environment:

  • Karmada version:
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:
@chaunceyjiang chaunceyjiang added the kind/bug Categorizes issue or PR as related to a bug. label Dec 22, 2023
@whitewindmills
Member

Did you use an orphan deletion strategy?

@chaunceyjiang
Member Author

Did you use an orphan deletion strategy?

[image: screenshot of the deletion code]

This is my code for deleting a job.

@whitewindmills
Member

Are you still able to reproduce this problem? I can't reproduce it. Only when I use an orphan deletion strategy does the behavior match the issue description.

@whitewindmills
Member

Since the ownerReferences field has been removed from the ResourceBinding object, the garbage collector clearly ran. But the ResourceBinding object still exists, which looks as if an orphan deletion strategy was used.

This is my code for deleting a job.

From your code, you are not using an orphan deletion strategy, so we'd better take a look at the detailed audit log for the deleted Job.

@chaunceyjiang
Member Author

Since this field ownerReferences has been removed from the resourcebinding object,

Yes.

which looks like an orphan deletion strategy was used.

I don't fully understand the orphan deletion strategy. I noticed the generation changed from 3 to 4. It feels like the GC isn't working properly, and the ownerReferences seem to have been accidentally removed.
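For readers unfamiliar with the term: "orphan" is one of the three `metav1.DeletionPropagation` values (Orphan, Background, Foreground). Orphan deletes only the owner and strips ownerReferences from its dependents (such as the Karmada ResourceBinding), which matches the symptom above. A small self-contained sketch (requires `k8s.io/apimachinery`; the output format is illustrative) showing the `propagationPolicy` field a client sends in the delete request body:

```go
package main

import (
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Orphan:     delete the owner only; dependents survive with their
	//             ownerReferences stripped.
	// Background: delete the owner immediately; the garbage collector
	//             removes dependents asynchronously via ownerReferences.
	// Foreground: dependents are deleted before the owner is removed.
	for _, p := range []metav1.DeletionPropagation{
		metav1.DeletePropagationOrphan,
		metav1.DeletePropagationBackground,
		metav1.DeletePropagationForeground,
	} {
		p := p // capture loop variable before taking its address
		body, _ := json.Marshal(metav1.DeleteOptions{PropagationPolicy: &p})
		fmt.Println(string(body))
	}
}
```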

@whitewindmills
Member

I feel like the GC isn't working properly. It seems the 'ownerReferences' were accidentally deleted.

Maybe, but it has nothing to do with Karmada.

@yanfeng1992
Member

yanfeng1992 commented Jan 26, 2024

Looks similar to #969 @chaunceyjiang

Try deleting the Job with the Background propagation policy.
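A minimal client-go sketch of that suggestion (the job name, namespace, and kubeconfig path are illustrative; assumes client-go >= v0.18, whose Delete takes a context). If I recall correctly, a direct API delete of a batch/v1 Job falls back to orphan propagation for backward compatibility, which is why setting the policy explicitly matters:

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a kubeconfig file (path is illustrative).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Without an explicit PropagationPolicy, the delete may be treated as
	// Orphan: dependents (e.g. the Karmada ResourceBinding) would keep
	// existing with their ownerReferences stripped. Background makes the
	// garbage collector remove them asynchronously.
	propagation := metav1.DeletePropagationBackground
	err = clientset.BatchV1().Jobs("default").Delete(context.TODO(), "xxx",
		metav1.DeleteOptions{PropagationPolicy: &propagation})
	if err != nil {
		log.Fatal(err)
	}
}
```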

@chaunceyjiang
Member Author

@yanfeng1992 Thanks for the reminder, I'll go check out #969.

@chaunceyjiang
Member Author

@yanfeng1992 @whitewindmills Thank you both, the problem has been resolved. It was indeed the situation you described.

@RainbowMango RainbowMango moved this to Planned In Release 1.9 in Karmada Overall Backlog Mar 13, 2024