Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PipelineRun Cancellation is not working #2337

Closed
ghost opened this issue Apr 6, 2020 · 10 comments
Closed

PipelineRun Cancellation is not working #2337

ghost opened this issue Apr 6, 2020 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@ghost
Copy link

ghost commented Apr 6, 2020

Expected Behavior

Cancelling a PipelineRun should result in its TaskRuns being put into a cancelled state.

Our integration test, test/cancel_test.go should pass.

Actual Behavior

Cancelling a PipelineRun no longer seems to cancel the underlying TaskRun. It appears that each attempt to cancel the TaskRuns hit errors like this:

error(s) from cancelling TaskRun(s) from PipelineRun pear: Operation cannot be fulfilled on taskruns.tekton.dev \"pear-foo-cb9rg\": the object has been modified; please apply your changes to the latest version and try again

The e2e test/cancel_test.go fails.

Steps to Reproduce the Problem

  1. go test -v -count=1 -tags=e2e -timeout=20m -run ".*PipelineRunCancel.*" ./test
  2. Watch the PipelineRuns - they are cancelled almost immediately.
  3. Look at the TaskRuns - these never get cancelled and the test times out.

Additional Info

  • Kubernetes version:

    Output of kubectl version:

I've tested against a 1.18 server and a 1.15 server. Both exhibit the same issue.

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

Latest master

@ghost
Copy link
Author

ghost commented Apr 6, 2020

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 6, 2020
@ghost ghost added this to the Pipelines 0.11.1 🐱 bug-fix milestone Apr 6, 2020
@vincent-pli
Copy link
Member

@sbwsg I checked the code:

rprt.TaskRun.Spec.Status = v1alpha1.TaskRunSpecStatusCancelled
if _, err := clientSet.TektonV1alpha1().TaskRuns(pr.Namespace).UpdateStatus(rprt.TaskRun); err != nil {
errs = append(errs, err.Error())
}
if _, err := clientSet.TektonV1alpha1().TaskRuns(pr.Namespace).Update(rprt.TaskRun); err != nil {
errs = append(errs, err.Error())
}

It update the status then the spec, it will raise the error you mentioned.
Why we need update the status of taskrun which will be cancelled? since I'm not found other such update in reconcile of pipelinerun.

@ghost
Copy link
Author

ghost commented Apr 7, 2020

Ahhh I see. In order to make this work the TaskRun would have to be re-fetched after UpdateStatus to get the latest version?

Why we need update the status of taskrun which will be cancelled? since I'm not found other such update in reconcile of pipelinerun.

This is a good question. If a TaskRun is put into a cancelled state I would expect that the TaskRun reconciler would be responsible for this.

@ghost
Copy link
Author

ghost commented Apr 7, 2020

Actually now I'm confused - I would expect the PipelineRun reconciler to be responsible for putting each TaskRun into a Cancelled state. But I would expect the TaskRun reconciler to perform any non-Status Update that is necessary in response to this.

@ghost
Copy link
Author

ghost commented Apr 7, 2020

Sidenote: I find it weird that any PRs are passing in Pipelines right now given this...

@vdemeester
Copy link
Member

If you cancel a PipelineRun, the pipelinerun reconcilier needs to update the "wanted" status of all TaskRuns to cancel.. So for we the Update make sense, the UpdateStatus doesn't 🤔

@vincent-pli
Copy link
Member

I'm not sure, but I think the status update of taskrun should be take charge by reconcile of taskrun, pipelinerun just make the change...

@vdemeester
Copy link
Member

@vincent-pli yep, the PipelineRun controller should update the spec.status of TaskRun and let the TaskRun reconcilier handle the rest (hence we shouldn't update status.xxx of TaskRun from the PipelineRun controller).

@ghost ghost self-assigned this Apr 10, 2020
@ghost ghost mentioned this issue Apr 10, 2020
2 tasks
@ghost ghost removed their assignment Apr 10, 2020
@ghost
Copy link
Author

ghost commented Apr 10, 2020

I've been trying to fix this today but haven't made very much progress. I think I've managed to reduce the number of failures but I cannot get this test to pass consistently 100% of the time. I've opened a PR with my work-in-progress changes. If anyone else has an opportunity to look at it I'd be very appreciative, otherwise I will pick it up again on Monday.

@ghost
Copy link
Author

ghost commented Apr 16, 2020

Now that #2369 is closed I'm going to call this done. Let's reopen if it starts appearing again.

@ghost ghost closed this as completed Apr 16, 2020
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

3 participants