
TEP-0004: Tekton Integrations - Annotations and Statuses #147

Closed
wlynch wants to merge 7 commits

Conversation

wlynch
Member

@wlynch wlynch commented Jul 13, 2020

TEP proposal for configuring post-pipeline integrations for Tekton
Pipelines via annotations and statuses.

/cc @afrittoli @dlorenc @imjasonh @bobcatfish @dibyom @bigkevmcd

@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign imjasonh
You can assign the PR to them by writing /assign @imjasonh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 13, 2020
@wlynch wlynch changed the title TEP-0004: Tekton Integrations TEP-0004: Tekton Integrations - Annotations and Statuses Jul 14, 2020
@bobcatfish
Contributor

Update from an offline discussion: we have been discussing similarities between this and tektoncd/pipeline#1740; it might make sense to have this proposal directly tackle notification-related requirements.

@imjasonh
Member

Do we need to do something to ensure that updating the Run with a new integration status doesn't trigger new reconciliations for all watchers?

For example, a Run starts, and the GitHub status reconciler is notified; it POSTs to GitHub and updates .status.integrations[name=github] to indicate that GitHub was notified of the status change. This results in a new version of the object (new observedGeneration), which means the GitHub status reconciler gets another update, and so on.
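To make the concern concrete, here is a minimal Go sketch (not part of the proposal) of one way a status reconciler could break such a loop: skip the notification when an entry for the Run's current generation and outcome has already been recorded. All type and field names here (IntegrationStatus, Status.Integrations, …) are hypothetical stand-ins for the TEP's proposed fields, not real Tekton API surface.

```
package integrations

import "context"

// IntegrationStatus is a hypothetical per-integration record on a Run.
type IntegrationStatus struct {
	Name               string // e.g. "github.integrations.tekton.dev"
	ObservedGeneration int64  // object generation this entry reflects
	Outcome            string // e.g. "Succeeded", "Failed"
}

// RunStatus is a simplified stand-in for a TaskRun/PipelineRun status.
type RunStatus struct {
	Succeeded    string // stand-in for the knative Succeeded condition
	Integrations []IntegrationStatus
}

// Run is a simplified stand-in for a TaskRun/PipelineRun.
type Run struct {
	Generation int64
	Status     RunStatus
}

// Reconcile notifies GitHub only when the Run's state differs from what this
// integration last recorded, so its own status writes do not re-trigger it.
func Reconcile(ctx context.Context, run *Run, notify func(context.Context, *Run) error) error {
	for _, s := range run.Status.Integrations {
		if s.Name == "github.integrations.tekton.dev" &&
			s.ObservedGeneration == run.Generation &&
			s.Outcome == run.Status.Succeeded {
			return nil // already reported this state; break the feedback loop
		}
	}
	if err := notify(ctx, run); err != nil {
		return err
	}
	run.Status.Integrations = append(run.Status.Integrations, IntegrationStatus{
		Name:               "github.integrations.tekton.dev",
		ObservedGeneration: run.Generation,
		Outcome:            run.Status.Succeeded,
	})
	return nil
}
```

Whether such a guard keys off the generation, the Succeeded condition, or something else entirely would depend on how the status field is ultimately modeled.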

Member

@vdemeester vdemeester left a comment


One thing that is only "half" clear to me is what this TEP proposes overall:

  • a standard reserved annotation *.integration.tekton.dev
  • a new integration status field (optional)
  • more than this? (who is responsible for what, …)

As commented below, I don't see much value in the integration status part of the *Run status. Or at least, the only value I see is telling, from a *Run, which integrations ran; I don't see much of a use case for that (enlighten me please 👼), and I don't see why we would "couple" integrations with the *Runs.

More general comments:

  • We could use we instead of I 😛
  • I think we can, most of the time, use the present tense (instead of the future), since, if the TEP gets accepted and is implementable, it will describe "the present" (i.e. what it does).

Comment on lines 230 to 246
**Examples:**

- GitHub
```
github.integrations.tekton.dev/owner: wlynch
github.integrations.tekton.dev/repo: test
github.integrations.tekton.dev/sha: 4d2f673c600a0f08c3f6737bc3f5c6a6a15d62e6
```
- Webhook
```
webhook.integrations.tekton.dev/url: https://example.com:8080
```
- Slack
```
slack.integrations.tekton.dev/url: https://tektoncd.slack.com
slack.integrations.tekton.dev/channel: build-cop
```
Member

Who is responsible for adding those? Are those annotations provided by the user (or something like Triggers, or both), by the integration, or by both?

Member Author

Initially, my expectation is that the integration would document the annotations it supports, and users would then set these fields on their Runs (filling them in with params / TriggerTemplates).

Eventually, I'd like to get to a point where Triggers can do this automatically for users: initially via ObjectMeta templating, but eventually via interceptors (or possibly some other mechanism) that can also be provided by an integration. For example, if we know that a Trigger is for GitHub events, and we can filter for these types of events, we should be able to provide the relevant annotations to configure related integration functionality automatically. If users want to configure one-off builds for testing, they can still set the annotations manually.

This is also designed in a way that other, non-Tekton trigger mechanisms could leverage if they wanted, and we could even mix and match mechanisms to trigger builds and related integrations (e.g. want to build a Slack bot that retests pull requests? Just set these annotations and you're good to go).

I think the exact details of how Triggers implements these features are a bit hand-wavy at the moment, since a lot of that work is actively being discussed / designed, so I didn't want to go too deep in this proposal. The main goal here is to carve out the areas within the Runs API to define this data.
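To illustrate the consumption side, here is a rough sketch of how an integration's reconciler might collect its reserved annotations off a Run's metadata. The annotation prefix and keys follow the GitHub examples quoted above, but the helper itself is purely illustrative:

```
package integrations

import "strings"

const githubPrefix = "github.integrations.tekton.dev/"

// GitHubConfig collects the github.integrations.tekton.dev/* annotations that
// a user (or, eventually, Triggers) set on the Run's ObjectMeta, stripping
// the prefix so the integration sees keys like "owner", "repo", and "sha".
func GitHubConfig(annotations map[string]string) map[string]string {
	cfg := map[string]string{}
	for k, v := range annotations {
		if strings.HasPrefix(k, githubPrefix) {
			cfg[strings.TrimPrefix(k, githubPrefix)] = v
		}
	}
	return cfg
}
```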

Comment on lines 265 to 269
- What integration the status is for (e.g. `github.integrations.tekton.dev`).
- What object generation the status is for.
- What was the outcome of the integration for this object (e.g. success,
failure).
- (optional) More details of status outcome.
Member

One thing to keep in mind is that TaskRun and PipelineRun are more or less "immutable", so I am not sure how useful the object generation will be: aside from the cancel field in the spec, the rest of the spec won't change.

Member Author

@wlynch wlynch Jul 15, 2020


For TaskRun and PipelineRun spec, I 100% agree. status is where I see room for flexibility (since these are mutated during Pipeline execution anyway).

Re object/observed generation: I'm not particularly tied to this. I'm just borrowing it from our existing TaskRunStatus type, which uses the Knative Status duck type.
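For context, the Knative Status duck type being borrowed from looks roughly like this (paraphrased and simplified from knative.dev/pkg/apis/duck/v1; the real Conditions type lives in knative.dev/pkg/apis and carries a few more fields):

```
// Status is the shape TaskRunStatus embeds; the TEP's observedGeneration
// idea is borrowed from here.
type Status struct {
	// ObservedGeneration is the generation of the resource that was last
	// processed by the controller.
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
	// Conditions are the latest available observations of the resource's state.
	Conditions []Condition `json:"conditions,omitempty"`
	// Annotations carry additional status details.
	Annotations map[string]string `json:"annotations,omitempty"`
}

// Condition is simplified here; knative's version also carries a severity
// and a last-transition timestamp.
type Condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"` // "True", "False", or "Unknown"
	Reason  string `json:"reason,omitempty"`
	Message string `json:"message,omitempty"`
}
```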

1. Push a commit, notice pull request wasn't updated for a run.
2. Look for all Runs with Integration annotation for the repo / commit (if
empty, then you know there was no build kicked off).
3. Find latest Run, `kubectl get` it, inspect integrations status field(s) for
Member

This only works when you have access to the cluster (which might "usually" not be the case).

Member Author

True, but I think it's a reasonable assumption that users should be able to get at the Run object (if not through the k8s API, then via Tekton Results), and that this will generally be a common pattern for debugging.

Updated to mention results / alternative mechanisms.

pipeline run itself. This gives us an easy mechanism for users to do the
following:

1. Push a commit, notice pull request wasn't updated for a run.
Member

Independently of this (but maybe related): an integration should make sure to do the proper thing whenever the *Run is created, e.g. for GitHub, as soon as the *Run is created, the checks that this *Run provides should be reset (to pending). But then the question is: how do you know which checks this *Run provides?

Member Author

an integration should ensure to do the proper thing whenever the *Run is created

Agreed. There should definitely be some common behavior here to reset the state. As a first pass, I think clearing out the integration data should be sufficient, and should allow integrations to pick up a Run as if it were new.

But then, the question is how do you know which checks this *Run provides ?

I don't think we can or should enforce that the Run must declare the integrations it provides (though it can declare some integrations inline if it wants to). This is to support integrations that are enforced automatically for users (e.g. CloudEvents, Results).

teps/0004-tekton-integrations-annotations-statuses.md (outdated; resolved)
retryCount: 1
```

IntegrationStatus:
Member

This is the first time we introduce this type/field; the details about it come lower down, so it feels a bit weird: at that point of reading, I don't know what this type is, where it lives, or what its goal is.

Member Author

Yeah, I agree this is weird. Rearranged this a bit to not reference the type exactly until later. Let me know if this makes more sense or if I should make more changes.

Comment on lines 470 to 474
Similar to `TaskRunResults` and `PipelineResourceResults`, integration statuses
are another form of external artifact generated as a result of the Pipeline.
Because of this, it makes sense to store these statuses as a part of the
Pipeline/TaskRun status, so that it is easy to find and see this data when
something goes wrong.
Member

I am not entirely sure about this. TaskRunResults and PipelineResourceResults are, by definition, results that come from the execution of the Task/Pipeline through a TaskRun/PipelineRun (i.e. what the task is supposed to do, like cloning a git repository, …). This would not be the case with integrations: they do not necessarily result from what the task/pipeline does, only from whether it executed or not, …

Member Author

I felt that this section was redundant with User Stories, so I took it out for the time being.

The reason why I feel this belongs in the Run status is because these integration actions are directly caused by the Run itself. Storing related data with it helps us couple the data to the object lifecycle, and makes it easy for users to get this information regardless of the tool they use.

I found out that this actually models an existing pattern of subresources (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#types-kinds), so I added another section about this under design details.


## Design Details

### Integration Status
Member

I have a little bit of trouble with this new status field and how useful it is from the PipelineRun/TaskRun perspective. I would tend to think that I would need to look at the integration's service/dashboard/… to see what failed and dig from there.

In my head, I am comparing this to the "food supply traceability chain" 🙃. When you buy your cereal, you can trace back where/when/how each of its ingredients got in (where it was planted, where it was transformed, by whom it was shipped, …). From the top you can get to the bottom, but from the bottom you don't really have a way (nor the need) to get to the top. I see PipelineRun and TaskRun as "the bottom" here (from the integration perspective).

Let's take GitHub status checks as an example, and let's assume a GitHub integration (à la commit-status-tracker). I submit my PR, and the checks appear as pending. If there is any failure (before the pod execution, during it, …), it's the integration's responsibility to give the user the information back, pointing to what went wrong: which PipelineRun it is, its state, how to dig into it, … The PipelineRun itself doesn't need to know which integration is watching it or how it handles things.

In addition, writing the status from the integration side means possible races, possible reconcile loops for all the watchers, etc.

Member Author

So I actually agree that users will generally start from the integration dashboard and work their way back to the Run.

That said, I think providing information the other way is equally important for investigating when things do not work as expected. For example, say the access token we are using for GitHub exceeded quota (or the token was revoked, GitHub is down, etc.): this could manifest as statuses never showing up in GitHub, so we lose that link from the integration back to the Run. We currently don't have a mechanism to let users know that something is wrong.
Today our answer is "go look at the controller logs for errors", but this isn't really a good solution if the user doesn't have access to the pipeline controller (and they likely won't in a multitenant environment). Having a place to say "we tried to update the status, but we got this error" within a Run helps integrations give users transparency about what is happening as a result of their Task/PipelineRun, even though these things aren't necessarily executing as part of their Run, regardless of the state of the external integration.

I concede that in cases where everything worked this is less meaningful, but I think it still provides a consistency benefit, helping users verify that something actually happened.

To use your cereal analogy: this is like distributors keeping a log of outgoing shipments to grocery stores, so that if stores claim they never received a shipment, they can prove when the shipment left the warehouse, what was inside, tracking numbers, etc.
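To make that concrete, here is a sketch of how an integration might record such a failed delivery attempt on the Run. The type and field names are hypothetical, mirroring the illustrative IntegrationStatus shape sketched earlier in this thread, not a proposed API:

```
package integrations

import "fmt"

// FailureEntry is a hypothetical per-integration record with a message field
// for surfacing delivery errors to users who cannot read controller logs.
type FailureEntry struct {
	Name               string
	ObservedGeneration int64
	Outcome            string
	Message            string
}

// RecordFailure appends an entry saying the integration attempted an update
// and why it failed, e.g. a GitHub quota or revoked-token error.
func RecordFailure(entries []FailureEntry, gen int64, err error) []FailureEntry {
	return append(entries, FailureEntry{
		Name:               "github.integrations.tekton.dev",
		ObservedGeneration: gen,
		Outcome:            "Failed",
		Message:            fmt.Sprintf("updating commit status: %v", err),
	})
}
```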

Comment on lines 618 to 628
### Automatically injecting Tasks

This alternative keeps the idea of running user tasks inside of the Pipeline,
but we make it easier for users by automatically appending integration tasks to
the end of their pipeline / adding them as finally tasks.

This has a few drawbacks:

- We can't call tasks if the pipeline never runs.
- The user pipeline needs to have access to all secrets, which limits what we
can do with shared multitenant secrets.
Member

This describes what PipelineResources are doing 👼

Member Author

Yes, but with the caveats:

  • PipelineResources can't do anything if the Pipeline fails to start (e.g. invalid TaskRefs)
  • They can't do anything in response to "accepted, but not running yet" (e.g. GitHub CheckRuns make a distinction between Queued and In Progress pending statuses)
  • Requires users to insert all integration lifecycle steps (the eventual goal is for Triggers to make this easier for users).

Comment on lines +632 to +636
As an alternative to storing integration status information within a
Task/PipelineRun status, we could store this information in another CRD type
that references back to the run. The advantage of this is that it avoids making
an API change to the Task/PipelineRun object directly and the consequences that
may arise.
Member

I don't think we need another type for this. It would be on the integration itself to decide how it wants to store/display/… the status.

@bobcatfish
Contributor

What do you think of addressing the notifications feature before we implement this TEP, and then revisiting this proposal?

This TEP is aiming to solve these problems:

  • If the pipeline fails to start because of a configuration error, the external
    integration is never notified.
  • If the pipeline fails a step and never reaches the notification step, the
    external integration is never notified.
  • If a transient error (e.g. GitHub is down) prevents the task from completing
    successfully, this may result in the pipeline failing.

My understanding is that the notifications feature is all about creating integrations with external systems, but not just any integrations: integrations with a level of importance greater than other integrations such as Tasks within a Pipeline. Since notifications are about "notifying", i.e. telling the outside world about something that is happening, the problems you listed above feel to me like legitimate requirements for the notifications feature, e.g.:

  • "If the pipeline fails to start because of a configuration error, the external
    integration is never notified" = if a pipeline fails to start, notifications should still occur
  • "If the pipeline fails a step and never reaches the notification step, the
    external integration is never notified." = we don't really have a "notification step" yet since we have no notifications features, but again I think this goes back to the requirement above, notifications should occur even when other things fail. We might even want to go further and say that failure of a notification should not be tolerated, e.g. it should be retried, there should be an external alerting system triggered in this case
  • "If a transient error (e.g. GitHub is down) prevents the task from completing successfully, this may result in the pipeline failing." = failures of notifications should not change the status of the pipeline / task execution they are notifying about

@wlynch
Member Author

wlynch commented Jul 16, 2020

@n3wscott You might be interested in this proposal, given your involvement in kubernetes/kubernetes#72637 (comment) and https://docs.google.com/document/d/1b1hRpJpSUE3xIG0v2IHFD0NBaxvRBTBnw3An7oZBV9I 😄

@wlynch
Member Author

wlynch commented Jul 16, 2020

@bobcatfish

What do you think of addressing the notifications feature before we implement this TEP, and then revisiting this proposal?

It depends on how notifications intend to handle status / receipt-of-delivery data. To me, notifications are just a type of integration, so there's an argument to be made that this should happen sooner rather than later, so it's consistent with other integrations we add later on.

I need to learn more about the current state/plans of the notifications effort to have a more informed opinion about this.
(/cc @afrittoli It'd be great to chat more about this!)

This TEP is aiming to solve these problems:

  • If the pipeline fails to start because of a configuration error, the external
    integration is never notified.
  • If the pipeline fails a step and never reaches the notification step, the
    external integration is never notified.
  • If a transient error (e.g. GitHub is down) prevents the task from completing
    successfully, this may result in the pipeline failing.

I would say this TEP is aiming to define a space within the Run API to enable other tools to address these problems, but not necessarily solve them directly (but I 100% agree that we should solve them). Even if we're only focused on in-pipeline notifications/actions for the time being, I think there's a benefit for having a space within the Run status / subresource to allow writing of status information and its effect on external services.

I'll look into adding a subsection to detail a more concrete example that can leverage this design (likely PR status updates and/or uploading of Tekton Results), but I don't want to get too bogged down by the exact details of the example.

@bobcatfish
Contributor

To me, notifications are just a type of integration, so there's an argument to be made that this should happen sooner rather than later, so it's consistent with other integrations we add later on.

Without a specific "integration" to talk about, I find it very hard to endorse this proposal.

If you start looking at other parts of Tekton Pipelines there are a lot of pieces you could call "integrations", for example:

  • Triggers are "integrations" with outside sources of triggering
  • Emitting cloud events enables "integration" with whatever is consuming the cloud events
  • PipelineResources (which sometimes we talk about renaming as "plugins") "integrate" with the outside world by 'representing it on disk'
  • Custom Tasks allow "integration" with other non-tekton systems, e.g. writing a custom controller that interacts with GCB
  • Tasks themselves allow "integration" with many of the common integrations listed above, e.g. code review tools, chat tools, bug trackers

So this proposal being about "integrations" in general isn't clear to me.

I don't think this is a direction we should go in until we have a clear need for it, which I don't see. Any tools outside of Tekton can store information in whatever way they see fit; I'm not convinced they need to be able to mutate Tekton types to do that.

One situation which I think would convince me is if we built a tool inside of Tekton that needs it, for example notifications, and I think you've highlighted some very important requirements for that feature, but I don't think it makes sense to address those requirements on their own.

@bobcatfish
Contributor

Any tools outside of Tekton can store information in whatever way they see fit; I'm not convinced they need to be able to mutate Tekton types to do that.

Side note that the annotations part of the proposal is something that a tool can do today, without requiring us to explicitly accept a TEP about it.

update Tasks into their Pipelines. In practice, this ends up not being a good
mechanism for ensuring that external integration notifications always occur, for
example:

Contributor

+100 to all of these problems!

Annotations / how to run integrations is a separate problem, and can
come later. This helps address shorter term ideas with existing
Notifications / CloudEvent work, while still enabling additional
integrations to build on top of it independently.
@wlynch
Member Author

wlynch commented Jul 20, 2020

Hi everyone! After reading more into the current CloudEvents implementation, I decided to table the annotations / reconciler bits included in this proposal and instead focus on just integration statuses and the discoverability of this information in Runs.

The CloudEvents implementation actually already follows much of the reconciler pattern previously described for some of the concerns raised above (e.g. running on Pipeline failure), since it's currently baked into the Pipelines controller, with a separate discussion underway to pull it out into its own controller. I think the piece still missing is exposing this information and making it discoverable from Runs in a way that can also be used for other integrations.
I've also tried to add more context to this proposal on where it fits in with the existing Notifications/Actions/CloudEvents work.

I'll likely raise the GitHub annotations/reconciler bits again in the future as a different TEP, reframed towards improving UX for GitHub CI statuses. 😃

@wlynch
Member Author

wlynch commented Jul 21, 2020

After some more discussion with @bobcatfish, I'm going to close this proposal. It's still early days for notifications / integrations, and it's not clear that modifying the Run status / using sub-resources is necessary, or the best way to represent this information.

I'll continue to think about how we surface this kind of information (e.g. whether a webhook/status update happened successfully), and see where we can fit this in with other notification efforts. Stay tuned!
