Feature: Add option to leave image pull/delete Jobs running #101

Closed
senthilrch opened this issue Aug 14, 2021 · 3 comments

@senthilrch
Owner

When an ImageCache operation (create, modify, refresh, purge, etc.) is performed and some jobs fail, kube-fledged fetches the error information from the corresponding pods and records it in the ImageCache status section. It then deletes all the jobs that were created for the operation.

In several situations, leaving the failed jobs in place instead of deleting them would help with troubleshooting the exact cause of the failure. Add a new flag to the controller (and surface it in the helm values.yaml and the operator CR) that allows the controller to leave failed jobs in place. Default value: delete

The logs below are from an ImageCache modify operation in which two jobs expire:

I0814 06:01:57.297028       1 controller.go:430] Starting to sync image cache imagecache1(update)
I0814 06:01:57.351708       1 controller.go:633] Completed sync actions for image cache imagecache1(update)
I0814 06:01:57.370201       1 image_manager.go:428] Job imagecache1-dlxbz created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0814 06:01:57.378886       1 image_manager.go:428] Job imagecache1-clz99 created (pull:- quay.io/non-existent-job:latest --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0814 06:01:57.386584       1 image_manager.go:428] Job imagecache1-nthpm created (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0814 06:01:57.396789       1 image_manager.go:428] Job imagecache1-g8qjz created (pull:- quay.io/non-existent-job:latest --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0814 06:01:57.408205       1 image_manager.go:428] Job imagecache1-hrsfs created (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0814 06:01:57.408391       1 image_manager.go:430] Job not created (image-already-present:- quay.io/bitnami/mariadb:10.5.11 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0814 06:02:00.497296       1 image_manager.go:179] Job imagecache1-dlxbz succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0814 06:02:00.515951       1 image_manager.go:179] Job imagecache1-hrsfs succeeded (pull:- quay.io/bitnami/redis:6.2.5 --> aks-si03c8m32-81246184-vmss000009, runtime: docker://19.3.14)
I0814 06:02:00.569377       1 image_manager.go:179] Job imagecache1-nthpm succeeded (pull:- quay.io/bitnami/nginx:1.21.1 --> aks-si03c8m32-81246184-vmss000000, runtime: docker://19.3.14)
I0814 06:06:57.409247       1 image_manager.go:223] Job imagecache1-g8qjz expired (pull: quay.io/non-existent-job:latest --> aks-si03c8m32-81246184-vmss000000)
I0814 06:06:57.426260       1 image_manager.go:223] Job imagecache1-clz99 expired (pull: quay.io/non-existent-job:latest --> aks-si03c8m32-81246184-vmss000009)
I0814 06:06:57.500211       1 controller.go:430] Starting to sync image cache imagecache1(statusupdate)
I0814 06:06:57.638035       1 controller.go:633] Completed sync actions for image cache imagecache1(statusupdate)
I0814 06:06:57.638160       1 event.go:282] Event(v1.ObjectReference{Kind:"ImageCache", Namespace:"kube-fledged", Name:"imagecache1", UID:"80953422-7060-418f-a6ad-c24b403010b1", APIVersion:"kubefledged.io/v1alpha2", ResourceVersion:"131524659", FieldPath:""}): type: 'Warning' reason: 'ImageCacheUpdate' Image pull failed for some images. Please see "failures" section
^C
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get ic imagecache1 -o yaml
apiVersion: kubefledged.io/v1alpha2
kind: ImageCache
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubefledged.io/v1alpha2","kind":"ImageCache","metadata":{"annotations":{},"labels":{"app":"kubefledged","component":"imagecache"},"name":"imagecache1","namespace":"kube-fledged"},"spec":{"cacheSpec":[{"images":["quay.io/bitnami/nginx:1.21.1","quay.io/bitnami/tomcat:10.0.8"]},{"images":["quay.io/bitnami/redis:6.2.5","quay.io/bitnami/mariadb:10.5.11"],"nodeSelector":{"tier":"backend"}}],"imagePullSecrets":[{"name":"myregistrykey"}]}}
  creationTimestamp: "2021-08-13T15:27:34Z"
  generation: 129
  labels:
    app: kubefledged
    component: imagecache
  managedFields:
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:app: {}
          f:component: {}
      f:spec:
        .: {}
        f:imagePullSecrets: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-08-13T15:27:34Z"
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:cacheSpec: {}
    manager: kubectl-edit
    operation: Update
    time: "2021-08-13T15:29:08Z"
  - apiVersion: kubefledged.io/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:completionTime: {}
        f:failures:
          .: {}
          f:quay.io/non-existent-job:latest: {}
        f:message: {}
        f:reason: {}
        f:startTime: {}
        f:status: {}
    manager: kubefledged-controller
    operation: Update
    time: "2021-08-14T06:06:57Z"
  name: imagecache1
  namespace: kube-fledged
  resourceVersion: "131527064"
  selfLink: /apis/kubefledged.io/v1alpha2/namespaces/kube-fledged/imagecaches/imagecache1
  uid: 80953422-7060-418f-a6ad-c24b403010b1
spec:
  cacheSpec:
  - images:
    - quay.io/bitnami/nginx:1.21.1
    - quay.io/non-existent-job:latest
  - images:
    - quay.io/bitnami/redis:6.2.5
    - quay.io/bitnami/mariadb:10.5.11
    nodeSelector:
      tier: backend
  imagePullSecrets:
  - name: myregistrykey
status:
  completionTime: "2021-08-14T06:06:57Z"
  failures:
    quay.io/non-existent-job:latest:
    - message: 'Back-off pulling image "quay.io/non-existent-job:latest":Failed to
        pull image "quay.io/non-existent-job:latest": rpc error: code = Unknown desc
        = Error response from daemon: error parsing HTTP 404 response body: invalid
        character ''<'' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD
        HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The
        requested URL was not found on the server. If you entered the URL manually
        please check your spelling and try again.</p>\n":Error: ErrImagePull:Error:
        ImagePullBackOff'
      node: aks-si03c8m32-81246184-vmss000009
      reason: ImagePullBackOff
    - message: 'Back-off pulling image "quay.io/non-existent-job:latest":Failed to
        pull image "quay.io/non-existent-job:latest": rpc error: code = Unknown desc
        = Error response from daemon: error parsing HTTP 404 response body: invalid
        character ''<'' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD
        HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The
        requested URL was not found on the server. If you entered the URL manually
        please check your spelling and try again.</p>\n":Error: ErrImagePull:Error:
        ImagePullBackOff'
      node: aks-si03c8m32-81246184-vmss000000
      reason: ImagePullBackOff
  message: Image pull failed for some images. Please see "failures" section
  reason: ImageCacheUpdate
  startTime: "2021-08-14T06:01:57Z"
  status: Failed
eechens@EMB-Q6BUMD6N kube-fledged % kubectl get jobs
No resources found in kube-fledged namespace.
eechens@EMB-Q6BUMD6N kube-fledged %
@senthilrch senthilrch added the feature New feature label Aug 14, 2021
@senthilrch senthilrch self-assigned this Sep 1, 2021
@senthilrch senthilrch added this to the v0.9.0 milestone Sep 1, 2021
@senthilrch senthilrch changed the title Add option to leave failed image pull/delete Jobs running Feature: Add option to leave failed image pull/delete Jobs running Dec 27, 2021
@senthilrch senthilrch removed this from the v0.9.0 milestone Jan 14, 2022
@senthilrch
Owner Author

SODACODE22: Raise PR against "develop" branch.

@niladrih
Contributor

@senthilrch -- I'd like to work on this issue.
Are you going for retention of the Job object for only 'failed' Image Manager work items? Or can it be extended to successful items as well?

Here's the implementation I have in mind:
A --job-retention-policy flag for the controller. This translates to a string field in the CR spec called 'jobRetentionPolicy'.
The values for this could be 'Retain' and 'Delete'.
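
A minimal sketch of how such a flag could be parsed and validated. kube-fledged itself is written in Go; Python is used here only to model the logic. The flag name and the 'Retain'/'Delete' values come from the proposal above, while the JobRetentionPolicy enum, the parse_job_retention_policy helper, and the choice of 'Delete' as the default (matching today's behavior) are assumptions made for illustration.

```python
import argparse
from enum import Enum


class JobRetentionPolicy(Enum):
    """Hypothetical enum; the two values are the ones proposed above."""
    DELETE = "Delete"
    RETAIN = "Retain"


def parse_job_retention_policy(argv):
    """Parse --job-retention-policy from a list of command-line arguments.

    Defaults to Delete so that omitting the flag keeps the current behavior.
    argparse rejects any value other than 'Retain' or 'Delete'.
    """
    parser = argparse.ArgumentParser(prog="controller-sketch")
    parser.add_argument(
        "--job-retention-policy",
        choices=[policy.value for policy in JobRetentionPolicy],
        default=JobRetentionPolicy.DELETE.value,
    )
    args = parser.parse_args(argv)
    return JobRetentionPolicy(args.job_retention_policy)
```

Restricting the flag to a closed set of values means a typo such as 'retain' fails at start-up instead of silently falling back to deletion.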

@senthilrch senthilrch assigned niladrih and unassigned senthilrch Apr 10, 2022
@senthilrch senthilrch added this to the v0.10.0 milestone Apr 10, 2022
@senthilrch
Owner Author

@niladrih It's preferable to have the retention apply to both successful and failed jobs.
The new flag --job-retention-policy applies to all ImageCache CRs, so there is no need to map it to a field in the CR spec. The value of this flag should be passed to the Image manager when starting it. Based on this value, the Image manager would either delete or retain the jobs.
This new flag would not affect the existing pre-flight checks performed at controller start-up, i.e. when the controller is restarted, all existing jobs should still be deleted.
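
The behavior described in this reply can be sketched roughly as follows. kube-fledged is a Go project, so this Python model is not its implementation: FakeCluster, ImageManager.finish_operation, and pre_flight_checks are hypothetical names, and the in-memory set stands in for the Jobs in the kube-fledged namespace. Only the 'Retain'/'Delete' values and the rule that start-up cleanup ignores the policy come from the discussion.

```python
from dataclasses import dataclass, field

DELETE = "Delete"
RETAIN = "Retain"


@dataclass
class FakeCluster:
    """In-memory stand-in for the Jobs that exist in the namespace."""
    jobs: set = field(default_factory=set)

    def delete_job(self, name):
        self.jobs.discard(name)


class ImageManager:
    """Receives the policy once, when the controller starts it."""

    def __init__(self, cluster, job_retention_policy=DELETE):
        self.cluster = cluster
        self.job_retention_policy = job_retention_policy

    def finish_operation(self, job_names):
        """Run after job results and failure details have been recorded."""
        if self.job_retention_policy == RETAIN:
            # Leave both succeeded and failed Jobs for troubleshooting.
            return
        for name in job_names:
            self.cluster.delete_job(name)


def pre_flight_checks(cluster):
    """Controller start-up: delete every leftover Job, whatever the policy."""
    for name in list(cluster.jobs):
        cluster.delete_job(name)
```

Keeping the policy out of pre_flight_checks means retained Jobs survive only until the next controller restart, which bounds how many can accumulate.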

@senthilrch senthilrch changed the title Feature: Add option to leave failed image pull/delete Jobs running Feature: Add option to leave image pull/delete Jobs running Apr 13, 2022