PipelineRun Cancellation is not working #2369

Merged
merged 1 commit into from Apr 14, 2020

@ghost ghost commented Apr 10, 2020

Changes

Cancelling a PipelineRun is supposed to also cancel the TaskRuns spawned by that PipelineRun. We do this by issuing Update and UpdateStatus calls on each TaskRun. Unfortunately, these calls fail often because concurrent modifications to the TaskRun race each other. This has caused many failed integration tests, making the PipelineRun cancellation e2e tests appear flaky. In fact, they were catching real problems!

This commit updates the PipelineRun reconciler's behaviour to PATCH TaskRuns associated with a PipelineRun. This updates the TaskRun's spec.status regardless of its current resourceVersion / generation.
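
As a rough illustration of the PATCH approach, the sketch below builds a JSON Patch (RFC 6902) payload that sets `spec.status` without carrying a resourceVersion. The `jsonPatchOp` type and `cancelPatch` function are hypothetical names, and the `"TaskRunCancelled"` value is an assumption about the cancellation marker; the actual change in this PR may build its patch differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonPatchOp is one RFC 6902 operation (hypothetical helper type).
type jsonPatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// cancelPatch returns a JSON Patch document that sets spec.status.
// The value "TaskRunCancelled" is assumed here for illustration.
func cancelPatch() ([]byte, error) {
	ops := []jsonPatchOp{{
		Op:    "add",
		Path:  "/spec/status",
		Value: "TaskRunCancelled",
	}}
	return json.Marshal(ops)
}

func main() {
	b, err := cancelPatch()
	if err != nil {
		panic(err)
	}
	// The payload names only the field to change, so it does not depend
	// on the version of the object the caller last read.
	fmt.Println(string(b))
}
```

With a Kubernetes client, a payload like this would typically be sent through the resource's Patch call using the JSON Patch type; that call is omitted here because it needs a live cluster.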

A related issue was happening in the PipelineRun cancel test itself. When the test submitted the cancellation status to its PipelineRun, the API server sometimes rejected the update and the test failed. This update has also been replaced with a PATCH.

The PipelineRun cancellation e2e tests now appear to be passing consistently.

Co-authored by @bobcatfish

Submitter Checklist

These are the criteria that every PR should meet. Please check them off as you
review them:

@ghost ghost requested review from vdemeester and vincent-pli April 10, 2020 18:35
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2020
@googlebot googlebot added the cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit label Apr 10, 2020
@tekton-robot tekton-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 10, 2020
@ghost ghost removed request for dlorenc and afrittoli April 10, 2020 18:38
@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 13, 2020
@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment "@googlebot I fixed it." If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no and removed cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit labels Apr 13, 2020
@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 13, 2020
@ghost ghost changed the title WIP PipelineRun Cancellation is not working PipelineRun Cancellation is not working Apr 13, 2020
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2020
@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit and removed cla: no labels Apr 13, 2020
@ghost ghost requested a review from bobcatfish April 13, 2020 17:20
@vincent-pli

This updates the TaskRun's spec.status regardless of its current resourceVersion / generation.

thanks.
/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2020
@vdemeester vdemeester left a comment

/lgtm

@tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 14, 2020
@tekton-robot tekton-robot merged commit 5818d59 into tektoncd:master Apr 14, 2020
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Apr 17, 2020
While working on tektoncd#2369 (flaky tests around cancellation that actually
revealed underlying bugs) we ran into a case where cancelling a
PipelineRun's TaskRuns failed (due to a race condition). In that case,
the PipelineRun would be marked as cancelled and done even though it
was still executing, because one or more of its TaskRuns were still
running.

Now when that happens we indicate that the PipelineRun is still
running and return an error, so that the Reconciler tries again on the
next reconcile and the PipelineRun conditions reflect what is actually
happening.

Co-authored-by: Sharon Jerop Kipruto <[email protected]>

Fixes tektoncd#2381
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Apr 27, 2020
@afrittoli afrittoli added the kind/bug Categorizes issue or PR as related to a bug. label Apr 30, 2020
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request May 11, 2020
tekton-robot pushed a commit that referenced this pull request May 11, 2020
Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
cla: yes (Trying to make the CLA bot happy with ppl from different companies work on one commit)
kind/bug (Categorizes issue or PR as related to a bug.)
lgtm (Indicates that a PR is ready to be merged.)
size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)