Write TaskRun.Status.TaskSpec with replaced spec on every reconcile run #5576

abayer · 2022-09-28T18:25:30Z

Changes

With #5537, the call to applyParamsContextsResultsAndWorkspaces, and the assignment of its output to tr.Status.TaskSpec, moved from running on every call to reconcile to running only when no pod had been created yet. This didn't cause any problems with execution, but it sometimes resulted in tr.Status.TaskSpec not getting set properly, or getting reset to nil on a subsequent reconcile run. The end result was that tr.Status.TaskSpec would sometimes contain the original TaskSpec, without parameter, result, context, and workspace references replaced by their corresponding values. This caused flaky failures in the TestPipelineRunStatusSpec/pipeline_status_spec_updated e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the if pod == nil { block that creates the pod if needed, while still checking that pod is nil before creating the PVC. It also brings the applyParamsContextsResultsAndWorkspaces call and the setting of tr.Status.TaskSpec back out of the pod creation block, placing them after the possible PVC creation.
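The control flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual reconciler code: the TaskRun, TaskSpec, Pod and reconciler types, the counters, and the applyReplacements helper (standing in for applyParamsContextsResultsAndWorkspaces) are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, heavily simplified stand-ins for the Tekton types.
type TaskSpec struct{ Script string }

type TaskRun struct {
	Param                  string    // value substituted into the script
	HasVolumeClaimTemplate bool      // stands in for tr.HasVolumeClaimTemplate()
	StatusTaskSpec         *TaskSpec // stands in for tr.Status.TaskSpec
}

type Pod struct{ Name string }

// reconciler only counts side effects so the flow can be observed.
type reconciler struct {
	pvcCreations int
	podCreations int
}

// applyReplacements is a toy stand-in for applyParamsContextsResultsAndWorkspaces.
func applyReplacements(tr *TaskRun, original TaskSpec) *TaskSpec {
	return &TaskSpec{Script: strings.ReplaceAll(original.Script, "$(params.myarg)", tr.Param)}
}

func (r *reconciler) reconcile(tr *TaskRun, original TaskSpec, pod *Pod) *Pod {
	// PVC creation sits outside the pod creation block, but is still
	// guarded by pod == nil so it only happens before the pod exists.
	if pod == nil && tr.HasVolumeClaimTemplate {
		r.pvcCreations++
	}

	// Replacements run on every reconcile, so the status always carries
	// the replaced spec, whether or not a pod already exists.
	tr.StatusTaskSpec = applyReplacements(tr, original)

	if pod == nil {
		r.podCreations++
		pod = &Pod{Name: "pod-1"}
	}
	return pod
}

func main() {
	r := &reconciler{}
	tr := &TaskRun{Param: "foo", HasVolumeClaimTemplate: true}
	spec := TaskSpec{Script: "echo $(params.myarg)"}

	pod := r.reconcile(tr, spec, nil) // first run: creates PVC and pod
	tr.StatusTaskSpec = nil           // simulate the status being lost
	r.reconcile(tr, spec, pod)        // second run: pod already exists

	fmt.Println(tr.StatusTaskSpec.Script, r.pvcCreations, r.podCreations)
	// prints "echo foo 1 1"
}
```

In this sketch the second reconcile restores the replaced spec without creating another PVC or pod, which is the property the change is after.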

Note that I never managed to figure out what conditions resulted in the flaky behavior - I was deep in that rabbit hole when @wlynch flagged the Chains issue and I realized what the root cause was. Since I never nailed down the flake trigger, I don't have a new e2e test to add here - but then again, since I started looking at this in the first place due to TestPipelineRunStatusSpec/pipeline_status_spec_updated flaking, we do know that it's covered to some extent. =)

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

Has Docs included if any changes are user facing
Has Tests included if any functionality added or changed
Follows the commit message standard
Meets the Tekton contributor standards (including functionality, content, code)
Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings)
Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Fix TaskRun parameter etc replacement logic to persist in the TaskRun's Status properly

abayer · 2022-09-28T18:26:27Z

/assign @piyush-garg @wlynch @vdemeester @jerop

tekton-robot · 2022-09-28T18:30:08Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/taskrun/taskrun.go	81.3%	81.4%	0.1

vdemeester

/cherry-pick release-v0.40.x

wlynch

Would it be worth doing a deeper comparison here instead of just checking that the reconcile succeeded? 🤔

pipeline/pkg/reconciler/taskrun/taskrun_test.go

Lines 1316 to 1326 in 3e6e4dd

    
	tr, err := clients.Pipeline.TektonV1beta1().TaskRuns(tc.taskRun.Namespace).Get(testAssets.Ctx, tc.taskRun.Name, metav1.GetOptions{})
	if err != nil {
		t.Fatalf("getting updated taskrun: %v", err)
	}
	condition := tr.Status.GetCondition(apis.ConditionSucceeded)
	if condition == nil || condition.Status != corev1.ConditionUnknown {
		t.Errorf("Expected invalid TaskRun to have in progress status, but had %v", condition)
	}
	if condition != nil && condition.Reason != v1beta1.TaskRunReasonRunning.String() {
		t.Errorf("Expected reason %q but was %s", v1beta1.TaskRunReasonRunning.String(), condition.Reason)
	}
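A deeper comparison along the lines suggested here could look something like the sketch below. It is only an illustration: Step, TaskSpec and checkTaskSpec are hypothetical simplified names, and reflect.DeepEqual from the standard library is used to keep the example self-contained, where a real test might prefer a diffing helper that reports exactly which fields differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// Step and TaskSpec are simplified, hypothetical stand-ins for the Tekton types.
type Step struct {
	Name   string
	Script string
}

type TaskSpec struct {
	Steps []Step
}

// checkTaskSpec returns an error describing the mismatch, or nil when the
// stored spec matches the expected one field for field. want must be non-nil.
func checkTaskSpec(want, got *TaskSpec) error {
	if got == nil {
		return fmt.Errorf("status task spec is nil, want %+v", *want)
	}
	if !reflect.DeepEqual(want, got) {
		return fmt.Errorf("status task spec mismatch: want %+v, got %+v", *want, *got)
	}
	return nil
}

func main() {
	want := &TaskSpec{Steps: []Step{{Name: "s", Script: "echo foo"}}}
	unreplaced := &TaskSpec{Steps: []Step{{Name: "s", Script: "echo $(inputs.params.myarg)"}}}

	fmt.Println(checkTaskSpec(want, want))       // matching spec
	fmt.Println(checkTaskSpec(want, unreplaced)) // references were not replaced
	fmt.Println(checkTaskSpec(want, nil))        // status was never written
}
```

A check like this would fail both when the status spec is missing and when it still contains unreplaced references, whereas checking only the Succeeded condition would pass in either case.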

wlynch · 2022-09-28T18:53:11Z

pkg/reconciler/taskrun/taskrun.go

+	workspaceVolumes := workspace.CreateVolumes(tr.Spec.Workspaces)

+	ts, err := applyParamsContextsResultsAndWorkspaces(ctx, tr, rtr, workspaceVolumes)
+	if err != nil {
+		logger.Errorf("Error updating task spec parameters, contexts, results and workspaces: %s", err)
+		return err
+	}


Do these need to be computed each time, or does this only need to happen if pod == nil?

This might help clean things up a little bit if we can preserve the old structure so we can keep all the if pod == nil logic together - i.e.

	if pod == nil {
		if tr.HasVolumeClaimTemplate() {
			...
		}
		workspaceVolumes := workspace.CreateVolumes(tr.Spec.Workspaces)
		...
		pod, err = c.createPod(ctx, ts, tr, rtr, workspaceVolumes)
		...
	}
	tr.Status.TaskSpec = ts

The problem is that we need workspaceVolumes as input to applyParamsContextsResultsAndWorkspaces, and the whole issue here stemmed from that not getting called and tr.Status.TaskSpec not getting updated, so I think we do need to make the CreateVolumes call every time. That's what was being done before 9cf7590

bleh - that's unfortunate. Might be something worth looking into refactoring (i.e. regrouping things that only need to be done once vs per reconcile), but won't hold up this PR for it.

wlynch · 2022-09-28T19:17:32Z

/lgtm
/hold

Holding in case you wanted to make any changes to taskrun_test.go before submission, else feel free to remove and submit.

abayer · 2022-09-28T19:27:06Z

> Holding in case you wanted to make any changes to taskrun_test.go before submission, else feel free to remove and submit.

I think I'm going to write some new tests to exercise all of this properly, but that'll probably take me a day or two, so we can merge this before then if it's urgent enough. But let's see what I can get done in the next hour or two...

abayer · 2022-09-28T19:29:16Z

(the reason for new tests is that that particular test, for example, would always pass even without this fix, even if it was actually doing param/etc replacement, because it's only getting reconciled once and is creating a pod - the problem was reconciling when there isn't a pod)
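To make the point about test coverage concrete, here is a toy model of the two scenarios. Everything in it is hypothetical (the taskRun type, reconcileOnce, and the alwaysApply switch that toggles between the regressed and the fixed behavior); it only mirrors the shape of the problem, not the real reconciler.

```go
package main

import "fmt"

// taskRun is a hypothetical stand-in; statusSpec plays the role of tr.Status.TaskSpec.
type taskRun struct {
	statusSpec *string
}

// reconcileOnce models a single reconcile run. With alwaysApply set to false it
// mimics the regressed behavior, where the replaced spec was only written while
// creating the pod. With alwaysApply set to true it mimics the fixed behavior.
func reconcileOnce(tr *taskRun, podExists, alwaysApply bool) {
	replaced := "echo foo"
	if alwaysApply || !podExists {
		tr.statusSpec = &replaced
	}
}

func main() {
	for _, alwaysApply := range []bool{false, true} {
		// Scenario covered by the existing test: one reconcile, no pod yet.
		fresh := &taskRun{}
		reconcileOnce(fresh, false, alwaysApply)

		// Scenario that exposes the bug: a pod exists and the status spec is nil.
		withPod := &taskRun{}
		reconcileOnce(withPod, true, alwaysApply)

		fmt.Printf("alwaysApply=%v fresh=%v withPod=%v\n",
			alwaysApply, fresh.statusSpec != nil, withPod.statusSpec != nil)
	}
	// prints:
	// alwaysApply=false fresh=true withPod=false
	// alwaysApply=true fresh=true withPod=true
}
```

The first scenario gives the same result with and without the fix, which is why a test that reconciles once and creates a pod cannot catch the regression.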

abayer · 2022-09-28T19:53:37Z

pkg/reconciler/taskrun/taskrun_test.go

+    name: myarg
+    type: string
+  steps:
+  - script: echo $(inputs.params.myarg)


I'm only testing for replacement of a param here because that's good enough to prove the point, and is simplest to do. Note that this issue does not show up with an inline tr.Spec.TaskSpec, interestingly.
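For readers unfamiliar with what "replacement of a param" means here, the following is a minimal sketch of the idea, assuming only simple string parameters. The replaceParams function is hypothetical and is not Tekton's implementation; it just substitutes $(params.NAME) and the older $(inputs.params.NAME) form with the supplied values.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceParams is a toy illustration of parameter substitution. It replaces
// $(params.NAME) and $(inputs.params.NAME) references with the matching value
// and leaves unknown references untouched.
func replaceParams(script string, params map[string]string) string {
	pairs := make([]string, 0, len(params)*4)
	for name, value := range params {
		pairs = append(pairs,
			"$(params."+name+")", value,
			"$(inputs.params."+name+")", value,
		)
	}
	return strings.NewReplacer(pairs...).Replace(script)
}

func main() {
	params := map[string]string{"myarg": "foo"}
	fmt.Println(replaceParams("echo $(inputs.params.myarg)", params))
	// prints "echo foo"
	fmt.Println(replaceParams("echo $(params.other)", params))
	// prints "echo $(params.other)"
}
```

With the bug described in this PR, the status could still hold the first form of the script (with the reference intact) rather than the substituted one.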

wlynch

/lgtm

abayer · 2022-09-28T19:55:08Z

pkg/reconciler/taskrun/taskrun_test.go

+    value: foo
+  taskRef:
+    name: test-task-with-replacements
+status:


I initially tried this with a pre-set tr.Status.TaskSpec, but that kept passing when it shouldn't have. I'm guessing it might have eventually flaked into failure, but it's very possible that the specific conditions that triggered the flaky behavior would never actually show up in a unit test, so I just went with testing to make sure that a nil tr.Status.TaskSpec gets written properly.

tekton-robot · 2022-09-28T19:57:38Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/taskrun/taskrun.go	81.3%	81.4%	0.1

abayer · 2022-09-28T20:08:07Z

@wlynch Ok, I think I'm good with this test. It's not exercising whatever the specific trigger is that sets off the flaky behavior where tr.Status.TaskSpec ends up nil on reconcile, but it is making sure that the replacements are being applied to tr.Status.TaskSpec when there already is a pod.

afrittoli · 2022-09-28T20:38:51Z

pkg/reconciler/taskrun/taskrun.go

-			taskRunWorkspaces := applyVolumeClaimTemplates(tr.Spec.Workspaces, *kmeta.NewControllerRef(tr))
-			// This is used by createPod below. Changes to the Spec are not updated.
-			tr.Spec.Workspaces = taskRunWorkspaces
+	if pod == nil && tr.HasVolumeClaimTemplate() {


I think it might be worth adding a comment here on the conditions to avoid the conditions moving around again when the next change happens in the code.

fixes tektoncd#5574

With tektoncd#5537, the call to `applyParamsContextsResultsAndWorkspaces` and setting of `tr.Status.TaskSpec` to the output of that call was moved from happening with every call to `reconcile` to only happening if there wasn't already a pod created. This didn't cause any problems with execution, but it sometimes resulted in `tr.Status.TaskSpec` not getting set properly or getting reset to `nil` on a subsequent `reconcile` run, with the end result that, sometimes, `tr.Status.TaskSpec` would contain the original `TaskSpec` without parameter, result, context, and workspace references being replaced with the corresponding value. This caused flaky failures in the `TestPipelineRunStatusSpec/pipeline_status_spec_updated` e2e test, and integration test failures in Chains (tektoncd/chains#577).

This change moves the PVC creation out of the `if pod == nil {` block that creates the pod if needed, while still checking if `pod` is `nil` before creating the PVC, and brings the `applyParamsContextsResultsAndWorkspaces` call and setting of `tr.Status.TaskSpec` back out of the pod creation block, but after the possible PVC creation.

Signed-off-by: Andrew Bayer <[email protected]>

tekton-robot · 2022-09-29T12:12:59Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/taskrun/taskrun.go	81.3%	81.4%	0.1

abayer · 2022-09-29T15:53:35Z

@wlynch ping? =)

wlynch

/lgtm

tekton-robot · 2022-09-29T15:56:16Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester, wlynch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

abayer · 2022-09-29T15:59:07Z

@wlynch Sorry to bug you again, but I didn't want to remove the hold without your approval.

wlynch · 2022-09-29T15:59:46Z

/hold cancel

Hold was more for you than for me. :)

/gogogo

abayer · 2022-09-29T16:11:03Z

Yeah, but I just wanted to be polite. =) Thanks!

abayer · 2022-09-29T18:18:06Z

/cherry-pick release-v0.40.x

tekton-robot · 2022-09-29T18:18:43Z

@abayer: new pull request created: #5584

In response to this:

/cherry-pick release-v0.40.x

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

abayer added the kind/bug label (Categorizes issue or PR as related to a bug.) Sep 28, 2022

tekton-robot added the release-note label (Denotes a PR that will be considered when it comes time to generate release notes.) Sep 28, 2022

tekton-robot requested review from dibyom and lbernick September 28, 2022 18:25

tekton-robot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Sep 28, 2022

tekton-robot assigned jerop, piyush-garg, vdemeester and wlynch Sep 28, 2022

abayer mentioned this pull request Sep 28, 2022

Integration tests failing tektoncd/chains#577

Closed

abayer added the priority/critical-urgent label (Highest priority. Must be actively worked on as someone's top priority right now.) Sep 28, 2022

vdemeester approved these changes Sep 28, 2022

tekton-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Sep 28, 2022

wlynch reviewed Sep 28, 2022

tekton-robot added the lgtm (Indicates that a PR is ready to be merged.) and do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) labels Sep 28, 2022

abayer force-pushed the taskrun-status-taskspec-dont-overwrite branch from 3e6e4dd to 637026d September 28, 2022 19:52

tekton-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed the lgtm (Indicates that a PR is ready to be merged.) and size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) labels Sep 28, 2022

abayer commented Sep 28, 2022

wlynch approved these changes Sep 28, 2022

tekton-robot added the lgtm label (Indicates that a PR is ready to be merged.) Sep 28, 2022

abayer commented Sep 28, 2022

afrittoli reviewed Sep 28, 2022

abayer force-pushed the taskrun-status-taskspec-dont-overwrite branch from 637026d to 8bb5e52 September 29, 2022 12:07

tekton-robot removed the lgtm label (Indicates that a PR is ready to be merged.) Sep 29, 2022

wlynch approved these changes Sep 29, 2022

tekton-robot added the lgtm label (Indicates that a PR is ready to be merged.) Sep 29, 2022

tekton-robot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Sep 29, 2022

tekton-robot merged commit 4b53ae5 into tektoncd:main Sep 29, 2022

tekton-robot mentioned this pull request Sep 29, 2022

[release-v0.40.x] Write TaskRun.Status.TaskSpec with replaced spec on every reconcile run #5584

Merged
