
Sort pod container statuses based on Step order in taskSpec #3256

Merged (1 commit) on Sep 24, 2020

Peaorl
Contributor

Peaorl commented Sep 18, 2020

Changes

This commit closes #3239

Tekton determines the TaskRun status message of a failed TaskRun based on the results of the first terminated Step (pod container). Until now, Tekton sorted pod container statuses based on the FinishedAt and StartedAt timestamps set by Kubernetes.
Occasionally, a Step that was terminated in response to the first terminated Step could report the same timestamps as that Step.
As a result, Tekton could not always determine which Step terminated first, and could set an incorrect TaskRun status message.

In this commit, pod container statuses are sorted according to the container order that Tekton specifies in the podSpec.
Tekton derives this order from the user-provided taskSpec and the Steps it adds internally, so internally added Steps are accounted for when sorting pod container statuses.
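A minimal Go sketch of the new approach, assuming simplified types: the hypothetical `sortByContainerOrder` looks up each status's position in the pod spec's container list and sorts on that index. Placing statuses whose names are not in the spec last is a choice made for this sketch, not something the PR specifies; the actual change touches `pkg/pod/status.go` (per the coverage report) and may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// containerStatus is a hypothetical, trimmed-down corev1.ContainerStatus.
type containerStatus struct {
	Name string
}

// sortByContainerOrder returns a copy of statuses ordered to match
// containerNames, i.e. the container order in the pod spec. Statuses whose
// names are not in the spec are placed last, keeping their relative order.
func sortByContainerOrder(statuses []containerStatus, containerNames []string) []containerStatus {
	index := make(map[string]int, len(containerNames))
	for i, name := range containerNames {
		index[name] = i
	}
	position := func(s containerStatus) int {
		if i, ok := index[s.Name]; ok {
			return i
		}
		return len(containerNames) // unknown containers sort after known ones
	}

	out := append([]containerStatus(nil), statuses...)
	sort.SliceStable(out, func(i, j int) bool {
		return position(out[i]) < position(out[j])
	})
	return out
}

func main() {
	// Hypothetical pod spec order: an internally added Step, then the user's Steps.
	podSpecOrder := []string{"step-git-source-repo", "step-a", "step-b"}
	reported := []containerStatus{{Name: "step-b"}, {Name: "step-a"}, {Name: "step-git-source-repo"}}

	// Prints step-git-source-repo, step-a, step-b, one per line.
	for _, s := range sortByContainerOrder(reported, podSpecOrder) {
		fmt.Println(s.Name)
	}
}
```

Because every container name has a distinct index in the pod spec, this ordering cannot tie the way identical timestamps can.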

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you review them:

  • Includes tests (if functionality changed/added)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Steps in the TaskRun status field are now sorted according to the Step order specified in the taskSpec

@tekton-robot tekton-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Sep 18, 2020
@tekton-robot tekton-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 18, 2020
@tekton-robot
Collaborator

Hi @Peaorl. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@tekton-robot tekton-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 18, 2020
@vdemeester
Member

/kind bug
/ok-to-test
@Peaorl this might need a release note (other than NONE) 😛

@tekton-robot tekton-robot added kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 18, 2020
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/pod/status.go | 87.0% | 93.4% | 6.4 |
| pkg/termination/parse.go | 100.0% | 84.2% | -15.8 |

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Sep 18, 2020
@GregDritschler
Contributor

> In this commit, pod container statuses are sorted based on the Step order set in the taskSpec. This order ought to be correct as Tekton enforces Steps to be scheduled in this order. In case Tekton adds extra Steps (such as for PipelineResources), Tekton already updates the taskSpec with these Steps. Therefore, Tekton accounts for these internally added Steps when sorting.

The taskSpec passed to MakeTaskRunStatus does not have the internal steps. It is the user's taskSpec.

Perhaps you are looking at the steps in the TaskRun's status? Those do have the internal steps because the code fabricates steps for them from the container statuses. You'll notice those steps are not in order relative to the user's steps.
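To illustrate the concern, here is a hypothetical Go sketch (the names `firstFailed`, `step-build`, and `step-git-source-repo` are made up for illustration) that picks the first failed Step by walking a list of container names in order. With only the user's taskSpec Steps, an internally added Step never appears in the list, so its failure is skipped; with the pod spec's container order, which includes the internal Step, it is found. The example assumes, purely for illustration, that the later Step also reports a non-zero exit code.

```go
package main

import "fmt"

// stepStatus is a hypothetical, simplified view of a terminated container.
type stepStatus struct {
	Name     string
	ExitCode int32
}

// firstFailed walks container names in the given order and returns the first
// one whose status reports a non-zero exit code. Containers whose names are
// missing from order are never considered.
func firstFailed(order []string, statuses []stepStatus) (string, bool) {
	byName := make(map[string]stepStatus, len(statuses))
	for _, s := range statuses {
		byName[s.Name] = s
	}
	for _, name := range order {
		if s, ok := byName[name]; ok && s.ExitCode != 0 {
			return name, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical: an internally added input Step runs before the user's Step.
	userTaskSpecOrder := []string{"step-build"}
	podSpecOrder := []string{"step-git-source-repo", "step-build"}
	statuses := []stepStatus{
		{Name: "step-build", ExitCode: 1},           // assumed non-zero after the earlier failure
		{Name: "step-git-source-repo", ExitCode: 1}, // the actual first failure
	}

	fromTaskSpec, _ := firstFailed(userTaskSpecOrder, statuses)
	fromPodSpec, _ := firstFailed(podSpecOrder, statuses)
	fmt.Println("user taskSpec order:", fromTaskSpec) // step-build
	fmt.Println("pod spec order:", fromPodSpec)       // step-git-source-repo
}
```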

I suggest testing a TaskRun with a PipelineResource that has an error (e.g. a git resource with a repo that doesn't exist), to confirm the behavior.

Also this change needs to be separated from your other PR #3138 (or, if it depends on it, you need to hold this one until that one is approved and merged).

@Peaorl
Contributor Author

Peaorl commented Sep 22, 2020

Thanks @GregDritschler! Good catch, I made sure the updated taskSpec is passed on to the appropriate function (MakeTaskRunStatus).

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 22, 2020
@GregDritschler
Contributor

> Thanks @GregDritschler! Good catch, I made sure the updated taskSpec is passed on to the appropriate function (MakeTaskRunStatus).

So now the updated taskSpec is passed to MakeTaskRunStatus when the pod is created, but what happens on subsequent reconcile calls when the pod exists?

@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 22, 2020
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/pod/status.go | 93.3% | 93.3% | -0.0 |

@Peaorl
Contributor Author

Peaorl commented Sep 22, 2020

I changed it so that the container order in the podSpec is used. Tekton sets this container order when it creates the pod, basing it on the taskSpec plus the internally added Steps.

@Peaorl
Contributor Author

Peaorl commented Sep 23, 2020

/unhold

@imjasonh
Member

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2020
@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg


@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 23, 2020
@GregDritschler
Contributor

/lgtm

@tekton-robot
Collaborator

@GregDritschler: changing LGTM is restricted to collaborators

In response to this:

/lgtm

@imjasonh
Member

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2020
@tekton-robot tekton-robot merged commit 46f1e00 into tektoncd:master Sep 24, 2020
@Peaorl Peaorl deleted the sortPodContainerStatuses branch November 3, 2020 13:27

Successfully merging this pull request may close these issues.

Incorrect TaskRun status due to different Steps having the same StartedAt and FinishedAt times
5 participants