-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PipelineRun Cancellation is not working #2369
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
Cancelling a PipelineRun is supposed to also cancel the TaskRuns spawned by that PipelineRun. The way that we do this is to issue `Update` and `UpdateStatus` calls on each TaskRun. Unfortunately this can (and does) fail often because modifications to the TaskRun race each other. This has resulted in many many failed integration tests, giving the PipelineRun cancellation e2e tests the appearance of being flakey. In fact they were actually catching real problems! This commit updates the PipelineRun reconciler's behaviour to PATCH TaskRuns associated with a PipelineRun. This updates the TaskRun's `spec.status` regardless of its current resourceVersion / generation. A similar related issue was happening in the PipelineRun cancel test itself. When submitting the cancellation status to the test's PipelineRun the API server was sometimes rejecting that update and failing the test. This has also been replaced with a PATCH. The PipelineRun cancellation e2e tests now appear to be passing consistently. coauthored with @bobcatfish
thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
While working on tektoncd#2369 (flakey tests around cancellation that actually revealed underlying bugs) we were running into a case where trying to cancel a PipelineRun's TaskRuns was failing (due to a race condition). In that case, the PipelineRun would be marked as cancelled and done, even though the PipelineRun was actually still executing in that one or more TaskRuns were actually still running. Now when that happens we will indicate that the PipelineRun is still running and return an error, which will mean that on the next reconcile, the Reconciler will try to reconcile again, and the PipelineRun conditions will reflect what is actually happening. Co-authored-by: Sharon Jerop Kipruto <[email protected]> Fixes tektoncd#2381
While working on tektoncd#2369 (flakey tests around cancellation that actually revealed underlying bugs) we were running into a case where trying to cancel a PipelineRun's TaskRuns was failing (due to a race condition). In that case, the PipelineRun would be marked as cancelled and done, even though the PipelineRun was actually still executing in that one or more TaskRuns were actually still running. Now when that happens we will indicate that the PipelineRun is still running and return an error, which will mean that on the next reconcile, the Reconciler will try to reconcile again, and the PipelineRun conditions will reflect what is actually happening. Co-authored-by: Sharon Jerop Kipruto <[email protected]> Fixes tektoncd#2381
While working on tektoncd#2369 (flakey tests around cancellation that actually revealed underlying bugs) we were running into a case where trying to cancel a PipelineRun's TaskRuns was failing (due to a race condition). In that case, the PipelineRun would be marked as cancelled and done, even though the PipelineRun was actually still executing in that one or more TaskRuns were actually still running. Now when that happens we will indicate that the PipelineRun is still running and return an error, which will mean that on the next reconcile, the Reconciler will try to reconcile again, and the PipelineRun conditions will reflect what is actually happening. Co-authored-by: Sharon Jerop Kipruto <[email protected]> Fixes tektoncd#2381
While working on #2369 (flakey tests around cancellation that actually revealed underlying bugs) we were running into a case where trying to cancel a PipelineRun's TaskRuns was failing (due to a race condition). In that case, the PipelineRun would be marked as cancelled and done, even though the PipelineRun was actually still executing in that one or more TaskRuns were actually still running. Now when that happens we will indicate that the PipelineRun is still running and return an error, which will mean that on the next reconcile, the Reconciler will try to reconcile again, and the PipelineRun conditions will reflect what is actually happening. Co-authored-by: Sharon Jerop Kipruto <[email protected]> Fixes #2381
Changes
Cancelling a PipelineRun is supposed to also cancel the TaskRuns spawned by that PipelineRun. The way that we do this is to issue
Update
andUpdateStatus
calls on each TaskRun. Unfortunately this can (and does) fail often because modifications to the TaskRun race each other. This has resulted in many many failed integration tests, giving the PipelineRun cancellation e2e tests the appearance of being flakey. In fact they were actually catching real problems!This commit updates the PipelineRun reconciler's behaviour to PATCH TaskRuns associated with a PipelineRun. This updates the TaskRun's
spec.status
regardless of its current resourceVersion / generation.A similar related issue was happening in the PipelineRun cancel test itself. When submitting the cancellation status to the test's PipelineRun the API server was sometimes rejecting that update and failing the test. This has also been replaced with a PATCH.
The PipelineRun cancellation e2e tests now appear to be passing consistently.
Co-authored by @bobcatfish
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them: