Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finally tasks not triggered in case of task failing due to missing results #7330

Closed
ppitonak opened this issue Nov 3, 2023 · 19 comments · Fixed by #8314
Closed

Finally tasks not triggered in case of task failing due to missing results #7330

ppitonak opened this issue Nov 3, 2023 · 19 comments · Fixed by #8314
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@ppitonak
Copy link
Contributor

ppitonak commented Nov 3, 2023

Expected Behavior

I have a pipeline with two tasks - first one producing a result, second one using the result and finally task not using the result at all. Finally task should always run.

Actual Behavior

When first task doesn't produce the result, second one correctly fails. Finally task is skipped.

Steps to Reproduce the Problem

  1. Run the below pipeline
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: finally-pipeline
spec:
  tasks:
  - name: task-a
    taskSpec:
      steps:
      - name: hello
        image: registry.access.redhat.com/ubi8/ubi-minimal
        script: |
          #!/usr/bin/env bash
          echo "Hello from task A"
          # uncomment to fix the issue
          #echo "foo" > $(results.foo.path)
          sleep 5
      results:
        - name: foo
  - name: task-b
    params:
      - name: input
        value: $(tasks.task-a.results.foo)
    taskSpec:
      params:
        - name: input
      steps:
      - name: hello
        image: registry.access.redhat.com/ubi8/ubi-minimal
        script: |
          #!/usr/bin/env bash
          echo "Hello from task B"
          echo "Content of param: $(params.input)"
          sleep 5
  finally:
  - name: final-task
    taskSpec:
      steps:
      - name: finally
        image: registry.access.redhat.com/ubi8/ubi-minimal
        script: |
          #!/usr/bin/env bash
          echo "Hello from finally"
          sleep 5

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.9", GitCommit:"d1483fdf7a0578c83523bc1e2212a606a44fd71d", GitTreeState:"archive", BuildDate:"2023-09-16T00:00:00Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.9+636f2be", GitCommit:"e782f8ba0e57d260867ea108b671c94844780ef2", GitTreeState:"clean", BuildDate:"2023-10-20T19:28:29Z", GoVersion:"go1.19.13 X:strictfipsruntime", Compiler:"gc", Platform:"linux/arm64"}
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

Client version: 0.32.2
Chains version: v0.17.1
Pipeline version: v0.50.3
Triggers version: v0.25.2
Operator version: v0.68.1

Similar to #4438

@ppitonak ppitonak added the kind/bug Categorizes issue or PR as related to a bug. label Nov 3, 2023
@Yongxuanzhang
Copy link
Member

Thanks! I will take a look

@Yongxuanzhang
Copy link
Member

/assign

@Yongxuanzhang
Copy link
Member

I think this is caused by this pr #6792

@dmitry-mightydevops
Copy link

dmitry-mightydevops commented Dec 29, 2023

using tekton v0.54.0.

Same issue. In my case failed task had no image digest pushed in the result, because the build failed. Pipelinerun finished on the build-buildpack task.

buildpack-pipeline-build-buildpack-pod   0/3     Init:0/3            0          2s
buildpack-pipeline-build-buildpack-pod   0/3     Init:1/3            0          3s
buildpack-pipeline-build-buildpack-pod   0/3     Init:2/3            0          4s
buildpack-pipeline-build-buildpack-pod   0/3     PodInitializing     0          5s
buildpack-pipeline-build-buildpack-pod   3/3     Running             0          6s
buildpack-pipeline-build-buildpack-pod   3/3     Running             0          6s
buildpack-pipeline-build-buildpack-pod   2/3     NotReady            0          8s
buildpack-pipeline-build-buildpack-pod   0/3     Error               0          46s
affinity-assistant-a5f6edee78-0          1/1     Terminating         0          65s
affinity-assistant-a5f6edee78-0          0/1     Terminating         0          66s
affinity-assistant-a5f6edee78-0          0/1     Terminating         0          66s
affinity-assistant-a5f6edee78-0          0/1     Terminating         0          66s
affinity-assistant-a5f6edee78-0          0/1     Terminating         0          66s
buildpack-pipeline-build-buildpack-pod   0/3     Error               0          47s
buildpack-pipeline-build-buildpack-pod   0/3     Error               0          48s

output of the build-buildpack task:

[build-buildpack : build] Adding label 'org.opencontainers.image.source'
[build-buildpack : build] Adding label 'org.opencontainers.image.vendor'
[build-buildpack : build] Adding label 'org.opencontainers.image.version'
[build-buildpack : build] Setting default process type 'web'
[build-buildpack : build] Saving 11111111.dkr.ecr.us-east-1.amazonaws.com/project/prod/backend:fca18e...
[build-buildpack : build] *** Images (sha256:3a8aac28491b40f6709be20995013ec4dd2647643ec51d7116b54424326f4d01):
[build-buildpack : build]       11111111.dkr.ecr.us-east-1.amazonaws.com/project/prod/backend:fca18e - PUT https://11111111.dkr.ecr.us-east-1.amazonaws.com/v2/project/prod/backend/manifests/fca18e: TAG_INVALID: The image tag 'fca18e' already exists in the 'project/prod/backend' repository and cannot be overwritten because the repository is immutable.
[build-buildpack : build] ERROR: failed to export: failed to write image to the following tags: [11111111.dkr.ecr.us-east-1.amazonaws.com/project/prod/backend:fca18e: PUT https://11111111.dkr.ecr.us-east-1.amazonaws.com/v2/project/prod/backend/manifests/fca18e: TAG_INVALID: The image tag 'fca18e' already exists in the 'project/prod/backend' repository and cannot be overwritten because the repository is immutable.]

[build-buildpack : results] 2023/12/29 01:29:14 Skipping step because a previous step failed

the final task is not executed

  finally:
    # ┌─────────────────────────────────────────────────────────────────────
    # │ ⚫ Final Notify                                                     
    # └─────────────────────────────────────────────────────────────────────
    - name: notify
      displayName: "Notification from $(context.pipeline.name)"
      when:
        - input: $(params.workflow.notify)
          operator: in
          values: ['true']
      taskRef:
        name: notify
      workspaces:
        - name: source
          workspace: source
      params:
        - name:  working_directory
          value: $(params.git.checkout_directory)
        - name:  environment
          value: $(params.environment)
        - name:  application
          value:
            project:           $(params.application.project)
            component:         $(params.application.component)
            url:               $(params.application.url)
            argocd_app:        $(params.application.argocd_app)
            argocd_url:        $(params.application.argocd_url)
            tekton_url:        $(params.application.tekton_url)
            jira_url:          $(params.application.jira_url)
            jira_prefix:       $(params.application.jira_prefix)
            grafana_url:       $(params.application.grafana_url)
            karma_url:         $(params.application.karma_url)
            alertmanager_url:  $(params.application.alertmanager_url)
            sentry_url:        $(params.application.sentry_url)
            documentation_url: $(params.application.documentation_url)
            devops_team:       $(params.application.devops_team)
            pm_team:           $(params.application.pm_team)
            tm_team:           $(params.application.tm_team)
            lead_developers:   $(params.application.lead_developers)
        - name:  git
          value:
            repository_url:      $(params.git.repository_url)
            checkout_branch:     $(params.git.checkout_branch)
            checkout_revision:   $(params.git.checkout_revision)
            commit_author:       $(params.git.commit_author)
            commit_author_url:   $(params.git.commit_author_url)
            commit_message:      $(params.git.commit_message)
        - name:  status
          value: $(tasks.status)
        - name:  pipelinerun
          value:
            name:      $(context.pipelineRun.name)
            namespace: $(context.pipelineRun.namespace)
            uid:       $(context.pipelineRun.uid)
        - name: image_digest
          value: $(tasks.build-$(params.build.use).results.app_image_digest)

@Yongxuanzhang
Copy link
Member

Thanks! Just come back from holidays, will work on this issue

@dmitry-mightydevops
Copy link

@Yongxuanzhang any updates?

@Yongxuanzhang
Copy link
Member

@Yongxuanzhang any updates?

Sorry I'm kinda busy recently, 😢 , don't have enough time to come up with a solution yet.

@ppitonak
Copy link
Contributor Author

Is there any chance that anybody would work on this?

@divyansh42
Copy link
Member

divyansh42 commented May 21, 2024

@Yongxuanzhang If you are not working, shall I assign this issue to myself?

@chitrangpatel
Copy link
Contributor

@Yongxuanzhang is on leave for now so please go ahead @divyansh42. I don't think he will mind 😄

@ppitonak
Copy link
Contributor Author

ppitonak commented Aug 9, 2024

Does anybody plan to work on this?

@divyansh42
Copy link
Member

/assign
I will work on this

@divyansh42
Copy link
Member

I attempted to solve this bug but I think this is not actually a bug but more of design enhancement, therefore I would need others opinion as well.
With reference to the above pipeline example, taskrun for task-b is not getting created as task-b is failing at the validation level where it is trying to verify the result references. Looking at the code, we can't define the status of a task (even if it is failed at the validation step). We can only have the status (Success, Failure or Skip) when taskrun is created and in this case, taskrun is not getting created.
For finally to execute, DAG needs to be completed but it is not getting into the case where DAG is completed as task-b doesn't have any status.

So, I am considering this as more of a design enhancement where we want to know what should be the status when taskrun is not created and the task fails at the runtime validation.
To deal with this, the solution that I can think of is to create another status in parallel to Skipped, Failure, and Success. It can be named something like validationFailed where it will store the number of all the tasks which is failed at the runtime validation. We will mark these kinds of tasks as validation failed and check this condition when we are checking whether DAG is complete or not.

@pritidesai @afrittoli @vdemeester please let me know your thoughts on this. Also, please suggest if you think any other approach is feasible. Thanks.

@ppitonak
Copy link
Contributor Author

We can only have the status (Success, Failure or Skip) when taskrun is created and in this case, taskrun is not getting created.
For finally to execute, DAG needs to be completed but it is not getting into the case where DAG is completed as task-b doesn't have any status.

How is this different from tasks that are skipped because of when expressions?

@divyansh42
Copy link
Member

We can only have the status (Success, Failure or Skip) when taskrun is created and in this case, taskrun is not getting created.
For finally to execute, DAG needs to be completed but it is not getting into the case where DAG is completed as task-b doesn't have any status.

How is this different from tasks that are skipped because of when expressions?

@ppitonak In the case of the when expression we are marking the respective task as skipped and not failing the pipelinerun.
But we can't skip the task in this case as it is a failure.

@afrittoli
Copy link
Member

@divyansh42 thanks for looking into this. Regarding your comment:

With reference to the above pipeline example, taskrun for task-b is not getting created as task-b is failing at the validation level where it is trying to verify the result references. Looking at the code, we can't define the status of a task (even if it is failed at the validation step). We can only have the status (Success, Failure or Skip) when taskrun is created and in this case, taskrun is not getting created.

How Tekton is representing internally the inability to execute task-b should not affect the behaviour of the pipeline orchestration. The Tekton controller has reconciled all tasks: task-a has been executed, task-b has been evaluated but not executed because of the missing result, thus the DAG is complete and the finally task should be executed.

This definitely seems to me like a bug in Tekton, which we should address.

The only documented case where a finally task is not executed, is when it depends directly on a result which was not produced. Changing that would require implementing TEP-0048.

@divyansh42
Copy link
Member

Thank you @afrittoli for taking a look at this.

Initially, I had the same conclusion that task-b has been evaluated so DAG should be complete.
According to the implementation, DAG will be marked as complete if all the task under DAG is marked as Success, Failed, or Skipped. In the case of task-b, it is not marked as failed or Skipped.

By looking at the implementation it seems we can only mark this as failed when the corresponding Taskrun is created. In this case, taskrun is not getting created.

@afrittoli
Copy link
Member

Thank you @afrittoli for taking a look at this.

Initially, I had the same conclusion that task-b has been evaluated so DAG should be complete. According to the implementation, DAG will be marked as complete if all the task under DAG is marked as Success, Failed, or Skipped. In the case of task-b, it is not marked as failed or Skipped.

By looking at the implementation it seems we can only mark this as failed when the corresponding Taskrun is created. In this case, taskrun is not getting created.

There are two options to solve this that I see:

  1. Mark task-b as skipped instead of failed. This is user-facing, since the status will change, but I think it would be more precise than the current validation failure
  2. Keep the validation failure, and change the implementation so that the DAG may complete in that scenario

My preferred would be (1), although I have not checked the code to see what it would take to implement it.

@divyansh42
Copy link
Member

divyansh42 commented Sep 23, 2024

Mark task-b as skipped instead of failed. This is user-facing, since the status will change, but I think it would be more precise than the current validation failure

If we mark that task-b as skipped then as per the current logic PipelineRun will be marked as success.
I am not sure if we want to do this way, as there is some error due to which result was not produced and if we mark the pipeline as success then from the user perspective this might get hard to track the exact problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants