Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for more cloud events #2677

Merged
merged 2 commits into from
Jun 8, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 66 additions & 0 deletions docs/pipelineruns.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ weight: 4
- [Specifying `Workspaces`](#specifying-workspaces)
- [Specifying `LimitRange` values](#specifying-limitrange-values)
- [Configuring a failure timeout](#configuring-a-failure-timeout)
- [Monitoring execution status](#monitoring-execution-status)
- [Cancelling a `PipelineRun`](#cancelling-a-pipelinerun)
- [Events](events.md#pipelineruns)

Expand Down Expand Up @@ -362,6 +363,71 @@ The `timeout` value is a `duration` conforming to Go's
values are `1h30m`, `1h`, `1m`, and `60s`. If you set the global timeout to 0, all `PipelineRuns`
that do not have an idividual timeout set will fail immediately upon encountering an error.

## Monitoring execution status

As your `PipelineRun` executes, its `status` field accumulates information on the execution of each `TaskRun`
as well as the `PipelineRun` as a whole. This information includes the name of the pipeline `Task` associated
to a `TaskRun`, the complete [status of the `TaskrRun`](taskruns.md#monitoring-execution-status) and details
about `Conditions` that may be associated to a `TaskRun`.

The following example shows an extract from the `status` field of a `PipelineRun` that has executed successfully:

```yaml
completionTime: "2020-05-04T02:19:14Z"
conditions:
- lastTransitionTime: "2020-05-04T02:19:14Z"
message: 'Tasks Completed: 4, Skipped: 0'
reason: Succeeded
status: "True"
type: Succeeded
startTime: "2020-05-04T02:00:11Z"
taskRuns:
triggers-release-nightly-frwmw-build-ng2qk:
pipelineTaskName: build
status:
completionTime: "2020-05-04T02:10:49Z"
conditions:
- lastTransitionTime: "2020-05-04T02:10:49Z"
message: All Steps have completed executing
reason: Succeeded
status: "True"
type: Succeeded
podName: triggers-release-nightly-frwmw-build-ng2qk-pod-8vj99
resourcesResult:
- key: commit
resourceRef:
name: git-source-triggers-frwmw
value: 9ab5a1234166a89db352afa28f499d596ebb48db
startTime: "2020-05-04T02:05:07Z"
steps:
- container: step-build
imageID: docker-pullable://golang@sha256:a90f2671330831830e229c3554ce118009681ef88af659cd98bfafd13d5594f9
name: build
terminated:
containerID: docker://6b6471f501f59dbb7849f5cdde200f4eeb64302b862a27af68821a7fb2c25860
exitCode: 0
finishedAt: "2020-05-04T02:10:45Z"
reason: Completed
startedAt: "2020-05-04T02:06:24Z"
```

The following tables shows how to read the overall status of a `PipelineRun`:

`status`|`reason`|`completionTime` is set|Description
:-------|:-------|:---------------------:|--------------:
Unknown|Started|No|The `PipelineRun` has just been picked up by the controller.
Unknown|Running|No|The `PipelineRun` has been validate and started to perform its work.
Unknown|PipelineRunCancelled|No|The user requested the PipelineRun to be cancelled. Cancellation has not be done yet.
True|Succeeded|Yes|The `PipelineRun` completed successfully.
True|Completed|Yes|The `PipelineRun` completed successfully, one or more Tasks were skipped.
False|Failed|Yes|The `PipelineRun` failed because one of the `TaskRuns` failed.
False|\[Error message\]|No|The `PipelineRun` encountered an non-permanent error, but it's still running and it may ultimately succeed.
False|\[Error message\]|Yes|The `PipelineRun` failed with a permanent error (usually validation).
False|PipelineRunCancelled|Yes|The `PipelineRun` was cancelled successfully.
False|PipelineRunTimeout|Yes|The `PipelineRun` timed out.

When a `PipelineRun` changes status, [events](events.md#pipelineruns) are triggered accordingly.

Copy link
Member

@pritidesai pritidesai Jun 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @afrittoli for the table, its very helpful and easier to understand, few comments here:

  1. it will be easier to understand PipelineRunCancelled if both the rows with that reason are in a sequence.
  2. We are missing Completed with status True in this table. I think we should include it since this table represents the overall status of PipelineRun.
  3. May be rename column heading PipelineRun Status to description or something similar.
  4. This is misleading "The PipelineRun encountered an error, but it's still running." This is happening because PipelineRun Status is updated with one failure without waiting for all the tasks in nextRprts to finish. We can add little more to this description until we fix issue (*PipelineRun).IsDone() is incorrect #1680, "The PipelineRun encountered an error, but waiting for already running tasks to finish."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @afrittoli for the table, its very helpful and easier to understand, few comments here:

1. it will be easier to understand `PipelineRunCancelled` if both the rows with that reason are in a sequence.

They are sorted by status. I could sort them by reason, that would make it more readable from a Reason pov, less readable from a status pov.... not sure which is best.
I thought status should go first as it's the primary indicator of the condition.

2. We are missing `Completed` with status `True` in this table. I think we should include it since this table represents the overall status of `PipelineRun`.

Nice catch, thank you

3. May be rename column heading `PipelineRun Status` to description or something similar.

Fixed

4. This is misleading "The PipelineRun encountered an error, but it's still running." This is happening because [PipelineRun Status](https://github.com/tektoncd/pipeline/blob/be9ac3bf6b27f6beb3f4b7c828ef8388f9b323c3/pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go#L437) is updated with one failure without waiting for all the tasks in `nextRprts` to finish. We can add little more to this description until we fix issue #1680, "The PipelineRun encountered an error, but waiting for already running tasks to finish."

That's not the case I was referring to here.
If any non-permanent error happens during reconcile, the reason is set to the error description, the status is set to failed, but it may be switch back to succeeded if the next reconcile cycle clears the error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, I checked this again, and there does not seem to be any transient error reported by the pipelinerun controller, so I will remove this one.
I think I will add the point you made as a general notice to the pipeline run docs.

## Cancelling a `PipelineRun`

To cancel a `PipelineRun` that's currently executing, update its definition
Expand Down
21 changes: 19 additions & 2 deletions docs/taskruns.md
Original file line number Diff line number Diff line change
Expand Up @@ -155,7 +155,7 @@ point for the `Pod` in which the container images specified in your `Task` will
customize the `Pod` configuration specifically for that `TaskRun`.

In the following example, the `Task` specifies a `volumeMount` (`my-cache`) object, also provided by the `TaskRun`,
using a `PersistentVolumeClaim` volume. A specific scheduler is also configured in the `SchedulerName` field.
using a `PersistentVolumeClaim` volume. A specific scheduler is also configured in the `SchedulerName` field.
The `Pod` executes with regular (non-root) user permissions.

```yaml
Expand Down Expand Up @@ -281,7 +281,7 @@ For more information, see [`ServiceAccount`](auth.md).
## Monitoring execution status

As your `TaskRun` executes, its `status` field accumulates information on the execution of each `Step`
as well as the `TaskRun` as a whole. This information includes start and stop times, exit codes, the
as well as the `TaskRun` as a whole. This information includes start and stop times, exit codes, the
fully-qualified name of the container image, and the corresponding digest.

**Note:** If any `Pods` have been [`OOMKilled`](https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/)
Expand Down Expand Up @@ -311,6 +311,23 @@ steps:
startedAt: "2019-08-12T18:22:54Z"
```

The following tables shows how to read the overall status of a `TaskRun`:

`status`|`reason`|`completionTime` is set|Description
:-------|:-------|:---------------------:|--------------:
Unknown|Started|No|The TaskRun has just been picked up by the controller.
Unknown|Pending|No|The TaskRun is waiting on a Pod in status Pending.
Unknown|Running|No|The TaskRun has been validate and started to perform its work.
Unknown|TaskRunCancelled|No|The user requested the TaskRun to be cancelled. Cancellation has not be done yet.
True|Succeeded|Yes|The TaskRun completed successfully.
False|Failed|Yes|The TaskRun failed because one of the steps failed.
False|\[Error message\]|No|The TaskRun encountered a non-permanent error, and it's still running. It may ultimately succeed.
False|\[Error message\]|Yes|The TaskRun failed with a permanent error (usually validation).
False|TaskRunCancelled|Yes|The TaskRun was cancelled successfully.
False|TaskRunTimeout|Yes|The TaskRun timed out.

When a `TaskRun` changes status, [events](events.md#taskruns) are triggered accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar observation as before, may be add a little more description to "The TaskRun encountered an error, but it's still running.", change it "The TaskRun encountered an error, but it's still running since waiting for other tasks or steps (need to check 🔍 ) to finish executing." 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, this is again about transient errors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like?

{"level":"info","logger":"tekton.taskrun-controller.event-broadcaster","caller":"record/event.go:274","msg":"Event(v1.ObjectReference{Kind:\"TaskRun\", Namespace:\"default\", Name:\"pipelinerun-one-failure-two-success-task-c-9fmnd\", UID:\"a788c615-9058-45ae-8dea-de1310343208\", APIVersion:\"tekton.dev/v1beta1\", ResourceVersion:\"4190777\", FieldPath:\"\"}): type: 'Warning' reason: 'Error' Operation cannot be fulfilled on taskruns.tekton.dev \"pipelinerun-one-failure-two-success-task-c-9fmnd\": the object has been modified; please apply your changes to the latest version and try again","commit":"ac439fb","knative.dev/controller":"taskrun-controller"}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, failures on update are the only case. Hopefully with the genreconciler we'll see less of that

### Monitoring `Steps`

If multiple `Steps` are defined in the `Task` invoked by the `TaskRun`, you can monitor their execution
Expand Down
47 changes: 46 additions & 1 deletion pkg/apis/pipeline/v1beta1/pipelinerun_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,11 @@ func (pr *PipelineRun) GetTaskRunRef() corev1.ObjectReference {
}
}

// GetStatusCondition returns the task run status as a ConditionAccessor
func (pr *PipelineRun) GetStatusCondition() apis.ConditionAccessor {
return &pr.Status
}

// GetOwnerReference gets the pipeline run as owner reference for any related objects
func (pr *PipelineRun) GetOwnerReference() metav1.OwnerReference {
return *metav1.NewControllerRef(pr, groupVersionKind)
Expand Down Expand Up @@ -101,6 +106,11 @@ func (pr *PipelineRun) GetRunKey() string {

// IsTimedOut returns true if a pipelinerun has exceeded its spec.Timeout based on its status.Timeout
func (pr *PipelineRun) IsTimedOut() bool {
return pr.HasTimedOut()
}

// HasTimedOut returns true if a pipelinerun has exceeded its spec.Timeout based on its status.Timeout
func (pr *PipelineRun) HasTimedOut() bool {
pipelineTimeout := pr.Spec.Timeout
startTime := pr.Status.StartTime

Expand Down Expand Up @@ -201,6 +211,32 @@ type PipelineRunStatus struct {
PipelineRunStatusFields `json:",inline"`
}

// PipelineRunReason represents a reason for the pipeline run "Succeeded" condition
type PipelineRunReason string

const (
// PipelineRunReasonStarted is the reason set when the PipelineRun has just started
PipelineRunReasonStarted PipelineRunReason = "Started"
// PipelineRunReasonRunning is the reason set when the PipelineRun is running
PipelineRunReasonRunning PipelineRunReason = "Running"
// PipelineRunReasonSuccessful is the reason set when the PipelineRun completed successfully
PipelineRunReasonSuccessful PipelineRunReason = "Succeeded"
// PipelineRunReasonCompleted is the reason set when the PipelineRun completed successfully with one or more skipped Tasks
PipelineRunReasonCompleted PipelineRunReason = "Completed"
// PipelineRunReasonFailed is the reason set when the PipelineRun completed with a failure
PipelineRunReasonFailed PipelineRunReason = "Failed"
// PipelineRunReasonCancelled is the reason set when the PipelineRun cancelled by the user
// This reason may be found with a corev1.ConditionFalse status, if the cancellation was processed successfully
// This reason may be found with a corev1.ConditionUnknown status, if the cancellation is being processed or failed
PipelineRunReasonCancelled PipelineRunReason = "Cancelled"
// PipelineRunReasonTimedOut is the reason set when the PipelineRun has timed out
PipelineRunReasonTimedOut PipelineRunReason = "PipelineRunTimeout"
)

func (t PipelineRunReason) String() string {
return string(t)
}

var pipelineRunCondSet = apis.NewBatchConditionSet()

// GetCondition returns the Condition matching the given type.
Expand All @@ -211,13 +247,22 @@ func (pr *PipelineRunStatus) GetCondition(t apis.ConditionType) *apis.Condition
// InitializeConditions will set all conditions in pipelineRunCondSet to unknown for the PipelineRun
// and set the started time to the current time
func (pr *PipelineRunStatus) InitializeConditions() {
started := false
if pr.TaskRuns == nil {
pr.TaskRuns = make(map[string]*PipelineRunTaskRunStatus)
}
if pr.StartTime.IsZero() {
pr.StartTime = &metav1.Time{Time: time.Now()}
started = true
}
conditionManager := pipelineRunCondSet.Manage(pr)
conditionManager.InitializeConditions()
// Ensure the started reason is set for the "Succeeded" condition
if started {
initialCondition := conditionManager.GetCondition(apis.ConditionSucceeded)
initialCondition.Reason = PipelineRunReasonStarted.String()
conditionManager.SetCondition(*initialCondition)
}
pipelineRunCondSet.Manage(pr).InitializeConditions()
}

// SetCondition sets the condition, unsetting previous conditions with the same
Expand Down
19 changes: 18 additions & 1 deletion pkg/apis/pipeline/v1beta1/pipelinerun_types_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ func TestPipelineRun_TaskRunref(t *testing.T) {
}
}

func TestInitializeConditions(t *testing.T) {
func TestInitializePipelineRunConditions(t *testing.T) {
p := &v1beta1.PipelineRun{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
Expand All @@ -100,12 +100,29 @@ func TestInitializeConditions(t *testing.T) {
t.Fatalf("PipelineRun StartTime not initialized correctly")
}

condition := p.Status.GetCondition(apis.ConditionSucceeded)
if condition.Reason != v1beta1.PipelineRunReasonStarted.String() {
t.Fatalf("PipelineRun initialize reason should be %s, got %s instead", v1beta1.PipelineRunReasonStarted.String(), condition.Reason)
}
p.Status.TaskRuns["fooTask"] = &v1beta1.PipelineRunTaskRunStatus{}

// Change the reason before we initialize again
p.Status.SetCondition(&apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
Reason: "not just started",
Message: "hello",
})

p.Status.InitializeConditions()
if len(p.Status.TaskRuns) != 1 {
t.Fatalf("PipelineRun status getting reset")
}

newCondition := p.Status.GetCondition(apis.ConditionSucceeded)
if newCondition.Reason != "not just started" {
t.Fatalf("PipelineRun initialize reset the condition reason to %s", newCondition.Reason)
}
}

func TestPipelineRunIsDone(t *testing.T) {
Expand Down
61 changes: 54 additions & 7 deletions pkg/apis/pipeline/v1beta1/taskrun_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -72,10 +72,6 @@ const (
// TaskRunSpecStatusCancelled indicates that the user wants to cancel the task,
// if not already cancelled or terminated
TaskRunSpecStatusCancelled = "TaskRunCancelled"

// TaskRunReasonCancelled indicates that the TaskRun has been cancelled
// because it was requested so by the user
TaskRunReasonCancelled = "TaskRunCancelled"
)

// TaskRunInputs holds the input values that this task was invoked with.
Expand All @@ -102,6 +98,43 @@ type TaskRunStatus struct {
TaskRunStatusFields `json:",inline"`
}

// TaskRunReason is an enum used to store all TaskRun reason for
// the Succeeded condition that are controlled by the TaskRun itself. Failure
// reasons that emerge from underlying resources are not included here
type TaskRunReason string

const (
// TaskRunReasonStarted is the reason set when the TaskRun has just started
TaskRunReasonStarted TaskRunReason = "Started"
// TaskRunReasonRunning is the reason set when the TaskRun is running
TaskRunReasonRunning TaskRunReason = "Running"
// TaskRunReasonSuccessful is the reason set when the TaskRun completed successfully
TaskRunReasonSuccessful TaskRunReason = "Succeeded"
// TaskRunReasonFailed is the reason set when the TaskRun completed with a failure
TaskRunReasonFailed TaskRunReason = "Failed"
// TaskRunReasonCancelled is the reason set when the Taskrun is cancelled by the user
TaskRunReasonCancelled TaskRunReason = "TaskRunCancelled"
// TaskRunReasonTimedOut is the reason set when the Taskrun has timed out
TaskRunReasonTimedOut TaskRunReason = "TaskRunTimeout"
)

func (t TaskRunReason) String() string {
return string(t)
}

// GetStartedReason returns the reason set to the "Succeeded" condition when
// InitializeConditions is invoked
func (trs *TaskRunStatus) GetStartedReason() string {
return TaskRunReasonStarted.String()
}

// GetRunningReason returns the reason set to the "Succeeded" condition when
// the RunsToCompletion starts running. This is used indicate that the resource
// could be validated is starting to perform its job.
func (trs *TaskRunStatus) GetRunningReason() string {
return TaskRunReasonRunning.String()
}

// MarkResourceNotConvertible adds a Warning-severity condition to the resource noting
// that it cannot be converted to a higher version.
func (trs *TaskRunStatus) MarkResourceNotConvertible(err *CannotConvertError) {
Expand All @@ -116,11 +149,11 @@ func (trs *TaskRunStatus) MarkResourceNotConvertible(err *CannotConvertError) {

// MarkResourceFailed sets the ConditionSucceeded condition to ConditionFalse
// based on an error that occurred and a reason
func (trs *TaskRunStatus) MarkResourceFailed(reason string, err error) {
func (trs *TaskRunStatus) MarkResourceFailed(reason TaskRunReason, err error) {
taskRunCondSet.Manage(trs).SetCondition(apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionFalse,
Reason: reason,
Reason: reason.String(),
Message: err.Error(),
})
}
Expand Down Expand Up @@ -185,6 +218,11 @@ func (tr *TaskRun) GetOwnerReference() metav1.OwnerReference {
return *metav1.NewControllerRef(tr, taskRunGroupVersionKind)
}

// GetStatusCondition returns the task run status as a ConditionAccessor
func (tr *TaskRun) GetStatusCondition() apis.ConditionAccessor {
return &tr.Status
}

// GetCondition returns the Condition matching the given type.
func (trs *TaskRunStatus) GetCondition(t apis.ConditionType) *apis.Condition {
return taskRunCondSet.Manage(trs).GetCondition(t)
Expand All @@ -193,10 +231,19 @@ func (trs *TaskRunStatus) GetCondition(t apis.ConditionType) *apis.Condition {
// InitializeConditions will set all conditions in taskRunCondSet to unknown for the TaskRun
// and set the started time to the current time
func (trs *TaskRunStatus) InitializeConditions() {
started := false
if trs.StartTime.IsZero() {
trs.StartTime = &metav1.Time{Time: time.Now()}
started = true
}
conditionManager := taskRunCondSet.Manage(trs)
conditionManager.InitializeConditions()
// Ensure the started reason is set for the "Succeeded" condition
if started {
initialCondition := conditionManager.GetCondition(apis.ConditionSucceeded)
initialCondition.Reason = TaskRunReasonStarted.String()
conditionManager.SetCondition(*initialCondition)
}
taskRunCondSet.Manage(trs).InitializeConditions()
}

// SetCondition sets the condition, unsetting previous conditions with the same
Expand Down
34 changes: 34 additions & 0 deletions pkg/apis/pipeline/v1beta1/taskrun_types_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -340,3 +340,37 @@ func TestHasTimedOut(t *testing.T) {
})
}
}

func TestInitializeTaskRunConditions(t *testing.T) {
tr := &v1beta1.TaskRun{
ObjectMeta: metav1.ObjectMeta{
Name: "test-name",
Namespace: "test-ns",
},
}
tr.Status.InitializeConditions()

if tr.Status.StartTime.IsZero() {
t.Fatalf("TaskRun StartTime not initialized correctly")
}

condition := tr.Status.GetCondition(apis.ConditionSucceeded)
if condition.Reason != v1beta1.TaskRunReasonStarted.String() {
t.Fatalf("TaskRun initialize reason should be %s, got %s instead", v1beta1.TaskRunReasonStarted.String(), condition.Reason)
}

// Change the reason before we initialize again
tr.Status.SetCondition(&apis.Condition{
Type: apis.ConditionSucceeded,
Status: corev1.ConditionUnknown,
Reason: "not just started",
Message: "hello",
})

tr.Status.InitializeConditions()

newCondition := tr.Status.GetCondition(apis.ConditionSucceeded)
if newCondition.Reason != "not just started" {
t.Fatalf("PipelineRun initialize reset the condition reason to %s", newCondition.Reason)
}
}
Loading