Write TaskRun.Status.TaskSpec with replaced spec on every reconcile run #5576
/assign @piyush-garg @wlynch @vdemeester @jerop
The following is the coverage report on the affected files.
/cherry-pick release-v0.40.x
Would it be worth doing a deeper comparison here instead of just checking that the reconcile succeeded? 🤔
pipeline/pkg/reconciler/taskrun/taskrun_test.go
Lines 1316 to 1326 in 3e6e4dd
tr, err := clients.Pipeline.TektonV1beta1().TaskRuns(tc.taskRun.Namespace).Get(testAssets.Ctx, tc.taskRun.Name, metav1.GetOptions{})
if err != nil {
	t.Fatalf("getting updated taskrun: %v", err)
}
condition := tr.Status.GetCondition(apis.ConditionSucceeded)
if condition == nil || condition.Status != corev1.ConditionUnknown {
	t.Errorf("Expected invalid TaskRun to have in progress status, but had %v", condition)
}
if condition != nil && condition.Reason != v1beta1.TaskRunReasonRunning.String() {
	t.Errorf("Expected reason %q but was %s", v1beta1.TaskRunReasonRunning.String(), condition.Reason)
}
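A deeper comparison along the lines suggested above could look roughly like the following. This is only a sketch: `step`, `taskSpec`, and `checkStatusSpec` are hypothetical stand-ins invented for illustration, not the Tekton `v1beta1` types or any helper from the repository, and it uses `reflect.DeepEqual` from the standard library where a real test might prefer a diffing helper.

```go
package main

import (
	"fmt"
	"reflect"
)

// step and taskSpec are hypothetical, simplified stand-ins for the Tekton
// types; they exist only to keep this sketch self-contained.
type step struct {
	Script string
}

type taskSpec struct {
	Steps []step
}

// checkStatusSpec returns an error when the spec recorded in the status is
// missing or differs from the expected, fully substituted spec.
func checkStatusSpec(got *taskSpec, want taskSpec) error {
	if got == nil {
		return fmt.Errorf("status spec is nil, want %+v", want)
	}
	if !reflect.DeepEqual(*got, want) {
		return fmt.Errorf("status spec mismatch: got %+v, want %+v", *got, want)
	}
	return nil
}

func main() {
	want := taskSpec{Steps: []step{{Script: "echo foo"}}}

	// A status that still holds the unreplaced script should be reported.
	unreplaced := &taskSpec{Steps: []step{{Script: "echo $(inputs.params.myarg)"}}}
	fmt.Println(checkStatusSpec(unreplaced, want))

	// A status holding the substituted script should pass.
	fmt.Println(checkStatusSpec(&want, want))
}
```

Checking the stored spec itself, rather than only the `Succeeded` condition, is what would let a test notice a status that is unset or still contains placeholders.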
workspaceVolumes := workspace.CreateVolumes(tr.Spec.Workspaces)

ts, err := applyParamsContextsResultsAndWorkspaces(ctx, tr, rtr, workspaceVolumes)
if err != nil {
	logger.Errorf("Error updating task spec parameters, contexts, results and workspaces: %s", err)
	return err
}
Do these need to be computed each time, or does this only need to happen if `pod == nil`?
This might help clean things up a little bit if we can preserve the old structure so we can keep all the `if pod == nil` logic together - i.e.
if pod == nil {
	if tr.HasVolumeClaimTemplate() {
		...
	}
	workspaceVolumes := workspace.CreateVolumes(tr.Spec.Workspaces)
	...
	pod, err = c.createPod(ctx, ts, tr, rtr, workspaceVolumes)
	...
}
tr.Status.TaskSpec = ts
The problem is that we need `workspaceVolumes` as input to `applyParamsContextsResultsAndWorkspaces`, and the whole issue here stemmed from that not getting called and `tr.Status.TaskSpec` not getting updated, so I think we do need to make the `CreateVolumes` call every time. That's what was being done before 9cf7590.
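The ordering described here can be sketched with a toy model. Everything below is hypothetical and heavily simplified: `taskRun`, `fakeReconciler`, and their fields are invented for illustration and do not mirror the real reconciler. The sketch assumes, as the thread describes, that PVC and pod creation are one-time work guarded by a "no pod yet" check, while the replacement step and the status write happen on every reconcile.

```go
package main

import (
	"fmt"
	"strings"
)

// taskRun is a hypothetical stand-in for a TaskRun. statusScript plays the
// role of tr.Status.TaskSpec and may be nil.
type taskRun struct {
	script                 string
	paramName, paramValue  string
	hasVolumeClaimTemplate bool
	statusScript           *string
}

// fakeReconciler records which one-time actions it has performed.
type fakeReconciler struct {
	podExists   bool
	pvcsCreated int
	podsCreated int
}

func (r *fakeReconciler) reconcile(tr *taskRun) {
	// One-time work: create the PVC only while no pod exists yet.
	if !r.podExists && tr.hasVolumeClaimTemplate {
		r.pvcsCreated++
	}

	// Per-reconcile work: always recompute the substituted spec.
	resolved := strings.ReplaceAll(tr.script, "$(inputs.params."+tr.paramName+")", tr.paramValue)

	// One-time work: create the pod only if there is none.
	if !r.podExists {
		r.podExists = true
		r.podsCreated++
	}

	// Per-reconcile work: always write the substituted spec to the status.
	tr.statusScript = &resolved
}

func main() {
	r := &fakeReconciler{}
	tr := &taskRun{
		script:                 "echo $(inputs.params.myarg)",
		paramName:              "myarg",
		paramValue:             "foo",
		hasVolumeClaimTemplate: true,
	}

	r.reconcile(tr)
	tr.statusScript = nil // simulate the status being lost between reconciles
	r.reconcile(tr)

	fmt.Println(*tr.statusScript, r.pvcsCreated, r.podsCreated)
}
```

In this model the second reconcile restores the status even though the pod already exists, while the PVC and pod counters stay at one.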
bleh - that's unfortunate. Might be something worth looking into refactoring (i.e. regrouping things that only need to be done once vs per reconcile), but won't hold up this PR for it.
/lgtm Holding in case you wanted to make any changes to taskrun_test.go before submission, else feel free to remove and submit.
I think I'm going to write some new tests to exercise all of this properly, but that'll probably take me a day or two, so we can merge this before then if it's urgent enough. But let's see what I can get done in the next hour or two...
(the reason for new tests is that that particular test, for example, would always pass even without this fix, even if it was actually doing param/etc replacement, because it's only getting reconciled once and is creating a pod - the problem was reconciling when there isn't a pod)
3e6e4dd to 637026d
name: myarg
type: string
steps:
- script: echo $(inputs.params.myarg)
I'm only testing for replacement of a param here because that's good enough to prove the point, and is simplest to do. Note that this issue does not show up with an inline `tr.Spec.TaskSpec`, interestingly.
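For readers unfamiliar with what "replacement of a param" means here, the following sketch shows the effect on the step script from the test fixture. `replaceParams` is a hypothetical helper written for this illustration, not Tekton's implementation; it assumes only that both the `$(params.NAME)` and `$(inputs.params.NAME)` forms refer to the same parameter.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceParams is a hypothetical helper, not Tekton's implementation. It
// substitutes both the $(params.NAME) and $(inputs.params.NAME) forms with
// the parameter's value.
func replaceParams(script string, params map[string]string) string {
	pairs := make([]string, 0, len(params)*4)
	for name, value := range params {
		pairs = append(pairs,
			"$(params."+name+")", value,
			"$(inputs.params."+name+")", value,
		)
	}
	return strings.NewReplacer(pairs...).Replace(script)
}

func main() {
	params := map[string]string{"myarg": "foo"}
	// prints "echo foo"
	fmt.Println(replaceParams("echo $(inputs.params.myarg)", params))
}
```

A status that still shows `echo $(inputs.params.myarg)` after reconciliation is the symptom the new test looks for.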
/lgtm
value: foo
taskRef:
name: test-task-with-replacements
status:
I initially tried this with a pre-set `tr.Status.TaskSpec`, but that kept passing when it shouldn't have. I'm guessing it might have eventually flaked into failure, but it's very possible that the specific conditions that triggered the flaky behavior would never actually show up in a unit test, so I just went with testing to make sure that a nil `tr.Status.TaskSpec` gets written properly.
@wlynch Ok, I think I'm good with this test. It's not exercising whatever the specific trigger is that sets off the flaky behavior where
taskRunWorkspaces := applyVolumeClaimTemplates(tr.Spec.Workspaces, *kmeta.NewControllerRef(tr))
// This is used by createPod below. Changes to the Spec are not updated.
tr.Spec.Workspaces = taskRunWorkspaces
if pod == nil && tr.HasVolumeClaimTemplate() {
I think it might be worth adding a comment here on the conditions to avoid the conditions moving around again when the next change happens in the code.
👍
fixes tektoncd#5574

With tektoncd#5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer
637026d to 8bb5e52
@wlynch ping? =)
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: vdemeester, wlynch
@wlynch Sorry to bug you again, but I didn't want to remove the hold without your approval.
/hold cancel Hold was more for you than for me. :) /gogogo
Yeah, but I just wanted to be polite. =) Thanks!
/cherry-pick release-v0.40.x
@abayer: new pull request created: #5584
Changes
fixes #5574

With #5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Note that I never managed to figure out what conditions resulted in the flaky behavior - I was deep in that rabbit hole when @wlynch flagged the Chains issue and I realized what the root cause was. Since I never nailed down the flake trigger, I don't have a new e2e test to add here - but then again, since I started looking at this in the first place due to `TestPipelineRunStatusSpec/pipeline_status_spec_updated` flaking, we do know that it's covered to some extent. =)

/kind bug