Add TEP-0046: Colocation of Tasks and Workspaces (formerly PipelineRun in a Pod) #318
Conversation
/assign
Thanks @jlpettersson for this TEP. Linking to #316 as both are somewhat similar. I like this approach, but I have a few comments.
teps/0046-pipelinerun-in-a-pod.md (outdated)
This TEP describes an alternative way to run a Pipeline composed of Tasks sharing data through Workspaces.

The way we run Tasks in a Pipeline that share data through a Workspace today has several problems, some of which are related to the need to use Kubernetes Persistent Volumes to back Workspaces and to schedule the different TaskRun Pods in an appropriate way, especially to allow multiple of them to run concurrently while accessing the same Workspace volume.
Is it an alternative way or an additional way? Maybe it's a nit, but I would rather use the word additional, as I don't think we should stop supporting the way PipelineRuns execute today.
The intention was not to change the current way, but to add an alternative way - a PipelineRun will use either, not both. But this TEP only describes the alternative way :) Don't know how I would phrase it in better English :)
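For context, a minimal sketch of the situation the problem statement above describes, assuming the standard v1beta1 `PipelineRun` API (the Pipeline and workspace names are hypothetical): two parallel PipelineTasks share one PVC-backed workspace, and the volume's access mode and zone placement decide whether their TaskRun Pods can actually run concurrently.

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: parallel-build-run        # hypothetical name
spec:
  pipelineRef:
    name: parallel-build          # hypothetical Pipeline with two parallel Tasks
  workspaces:
    - name: shared-data
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce       # forces all TaskRun Pods onto the same Node
          resources:
            requests:
              storage: 1Gi
```

With `ReadWriteOnce`, the two TaskRun Pods must land on the Node where the volume is attached (which is what the affinity assistant enforces); in a multi-zone cluster a placement mismatch can leave the run stuck.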
teps/0046-pipelinerun-in-a-pod.md (outdated)
Currently the only way to run a Pipeline composed of Tasks is that each Task is executed in its own Pod. If you combine Tasks in a Pipeline and they need to share data (beyond a simple [result](https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#using-results)), you'll need to provision a PVC or do some other similar, cloud specific thing,
s/cloud specific thing/cloud-specific provisioning/ ?
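For reference, a "simple result" in the sense linked above is a small string a Task writes to a declared path; a minimal sketch using the standard v1beta1 API (the Task name is hypothetical):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: compute-digest            # hypothetical
spec:
  results:
    - name: digest
      description: Small string output; larger data needs a Workspace volume
  steps:
    - name: write-result
      image: alpine
      script: |
        # Results are capped at a few KB in total, which is why anything
        # bigger (e.g. a source tree) needs a PVC-backed Workspace instead.
        echo -n "sha256:abc123" > $(results.digest.path)
```

A downstream PipelineTask consumes it as `$(tasks.compute-digest.results.digest)`; anything larger than this string-sized channel is what forces the PVC provisioning discussed here.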
teps/0046-pipelinerun-in-a-pod.md (outdated)
- Make it possible to run a whole PipelineRun (including all TaskRuns) within a single Pod so that the workspace can be within the Pod:
  - At runtime in a PipelineRun
- This should not change anything for the author of Tasks or Pipelines:
  - No changes in the Pipeline or Task API for authors
We are only talking about the `Pipeline` and `Task` CRDs here, right? (because the `PipelineRun` CRD will most likely change)
Yeah, at least `Pipeline` and `Task` for the author. Didn't think that `PipelineRun` would need changes, but I might be wrong. And when using this alternative, the `TaskRun` CRD will not be used.
teps/0046-pipelinerun-in-a-pod.md (outdated)
### Goals

- Make it possible to run a whole PipelineRun (including all TaskRuns) within a single Pod so that the workspace can be within the Pod:
Should the goal be to run the whole `PipelineRun` in a single `Pod`, or do we see ourselves needing to support running across multiple Pods (but with some Tasks on the same Pod)? My initial thought is that it makes sense to go with the goal of "the whole `PipelineRun` in a single `Pod`" for now, and we can definitely see later if we need more, and whether something like a Pipeline `CustomTask` would be a good way to "group" Tasks into separate pipelines.
Yes, I tried to keep the scope narrow. I think support for multiple Pods is too broad in scope - that can be done in so many different ways - and can perhaps be reasoned about in other TEPs.
teps/0046-pipelinerun-in-a-pod.md (outdated)
to speed up the build and see the full PipelineRun run in a single Pod, on a single Node, WITHOUT NEEDING to mount external volumes and without worrying that its Pipeline is composed of several Pods that might be scheduled to different cloud Availability Zones and reach a deadlocked state.
- A user wants to create a Pipeline composed of Tasks from the catalog, to check out code, run unit tests and upload results,
This use-case is shared with TEP-0044 👼🏼
Something we'll need to think about carefully here is that Kubernetes doesn't support the notion of per-Container Service Accounts. A ServiceAccount attached to a Pod is given to every Container, and there isn't really a way to isolate them from each other (that I'm aware of?). This means that a PipelineRun with different Service Accounts attached to different Tasks would need to (I guess?) attach all the service accounts to the single Pod. Maybe there's another way to tackle this using per-Container Projected Volumes for the Service Account tokens that each "PipelineTask" needs? Whatever the solution, we'll need to make sure we document differences in the way PipelineRuns execute when in "single-Pod" mode versus "multi-Pod" mode.
/assign
@jlpettersson do you feel like we definitely need this TEP as well as #316?
My feeling is that TEP 44 (#316) describes overlapping (if not the same) problems; this PR adds some more use cases and requirements (and specifies configuring this at runtime, while TEP 44 has focused on authoring time), but it also describes a solution: PipelineRun in a Pod.
I'm wondering if we can add the use cases from this, expand 44 to include the ability to control this at runtime, and then have "PipelineRun in a Pod" as a possible solution.
teps/0046-pipelinerun-in-a-pod.md (outdated)
### Goals

- Make it possible to run a whole PipelineRun (including all TaskRuns) within a single Pod so that the workspace can be within the Pod:
  - At runtime in a PipelineRun
hmm interesting, this is a difference with #316
@bobcatfish we probably eventually converge to one, I think? But this is written from a different problem-perspective.
Sure, it can be added there, but then the whole TEP must also be aligned with this, including goals and requirements, if that is ok? They are slightly different in some parts.
I think the problem for this TEP has been described in Design doc: Task parallelism when using workspace, and we have been in and out of multiple alternatives, including the Affinity Assistant and a Custom Scheduler. I feel that this TEP is about "PipelineRun in a Pod", not only as a possible solution? The problem and alternatives have been discussed back and forth since May 2020.
Some evolution context here:

The high level problem description is in Design doc: Task parallelism when using workspace. Perhaps this is a bit more abstract enhancement proposal, not strictly tied to implementation:

To avoid the problems related to pod-external volumes, this enhancement proposal proposes a solution where the workspace volume can be inside the pod, and the Tasks that use that workspace volume need to be within the same scheduling unit, the Pod, to avoid the scheduling problems that the affinity assistant tried to solve but had shortcomings with when it comes to scheduling, scaling and resource allocation. For a Pipeline, the Tasks that do not use a workspace, or that use an unrelated workspace, can potentially be within a different scheduling unit (a Pod in Kubernetes).

It would be beneficial if this can be done without any API additions for the Task and Pipeline author (to reduce cognitive load for the author), similar to how it works when using the affinity assistant (it only co-schedules pods using the same PVC-workspace to the same node).
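To make the Pod-internal workspace idea concrete, a minimal plain-Kubernetes sketch (all names hypothetical): the workspace is an `emptyDir` that never leaves the Node, so no PVC access modes or zone placement apply. Note that containers in a Pod all start together, so sequencing the "Tasks" would be the controller's job, similar to how Tekton orders Steps within a TaskRun today.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pipeline-in-a-pod        # hypothetical
spec:
  restartPolicy: Never
  volumes:
    - name: workspace
      emptyDir: {}               # Pod-internal; no external storage involved
  containers:
    - name: clone                # "Task 1"
      image: alpine/git
      command: ["git", "clone", "https://github.com/tektoncd/pipeline", "/workspace/src"]
      volumeMounts:
        - name: workspace
          mountPath: /workspace
    - name: test                 # "Task 2"; ordering would be the controller's job
      image: golang:1.16
      workingDir: /workspace/src
      command: ["sh", "-c", "go test ./..."]
      volumeMounts:
        - name: workspace
          mountPath: /workspace
```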
Yes, that definitely sounds like a plan that I would like 👼🏼
Sorry I've written a long post here, but TL;DR: can we add a non-goal of handling the security consequences of running a pipeline in a pod? Or can we include the security implications of introducing this mode as part of the TEP?

Full version: I would like to focus on the fact that we are specifically talking about having different "modes" for Pipelines. There's quite a lot of literature on the ways modes can cause different kinds of problems for users, and I think some of that might apply here. Just one example - here's an article from the Nielsen Norman Group that discusses this in the context of UIs. I want to call out the following parts:

^ This could be a good argument in favor of adding a "Pipeline-in-a-Pod" mode. Are we in a situation where we have too many options and not enough available "types" (CRDs? fields?) to capture all the variations that a user may want to employ? Are we unwilling, unable or uninterested to add more "types of input"?

^ Are we going to introduce a large new suite of problems for users to navigate if we introduce this kind of execution mode toggle? To me this is made more complicated by the fact that the "user" here might actually be multiple people - the Pipeline author, the PipelineRun user, etc. As a Pipeline author I will now need to be aware that all my Tasks may be consolidated into a single Pod on a single Node. That means any purposeful isolation I've introduced (e.g. separating credential Secrets so they're only accessed by a Task with images that my org trusts) could be violated by a PipelineRun. This ability for a PipelineRun to subvert the intention of a Pipeline seems to me worth exploring more in depth. Moving on in the article, I want to call out this part:

The one I identified above (all service accounts will need to be exposed to the Pod) seems to me quite a considerable potential source of a) embarrassment and b) safety or security consequences. So if we proceed down this path I really do want us to go in eyes-wide-open on what specifically we are going to expose unknowing users to when we recommend they switch their PipelineRuns into "Pod mode" because they're struggling to use Workspaces. Sooo I think there should be more detail in this TEP on the specific set of tradeoffs that we would be willing to accept by running in Pipeline-in-a-Pod mode before we commit to introducing it. It could be that we need to add more Non-Goals, or we may want to describe the acceptable solutions. I'm personally particularly concerned about all Secrets, Service Accounts and ConfigMaps being exposed to a single Pod when the Pipeline author may have intended to isolate them into different Tasks. If we're not going to propose a solution to that as part of the TEP, can we at least say that it's a non-goal to specify any security consequences related to running everything in a single unit of execution?
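To make the isolation concern concrete: in today's multi-Pod mode, a PipelineRun can already hand each PipelineTask its own identity via `taskRunSpecs` (a real v1beta1 field; the names below are hypothetical). In a single-Pod mode these distinct identities would presumably collapse into one Pod-level ServiceAccount.

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: release-run              # hypothetical
spec:
  pipelineRef:
    name: release-pipeline       # hypothetical
  serviceAccountName: default    # fallback for unlisted tasks
  taskRunSpecs:
    - pipelineTaskName: fetch-source
      taskServiceAccountName: readonly-sa         # low-privilege identity
    - pipelineTaskName: push-image
      taskServiceAccountName: registry-pusher-sa  # only this Task holds push rights
```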
+1 to this from my pov too. Would we put this mode straight into beta, or keep it in alpha while we work on the general solution? It might be useful to include its flagged-ness or gated-ness as part of this TEP too?
For me it would definitely not be exposed in beta directly.
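If it stays alpha, gating would presumably follow Tekton's existing feature-flag pattern; a sketch assuming the standard `feature-flags` ConfigMap, with an invented flag name used purely for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  # Hypothetical flag name, for illustration only; the real gate (if any)
  # would be defined by the TEP. Defaults to "false" while alpha.
  enable-pipelinerun-in-a-pod: "false"
```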
That is good input @sbwsg, thanks. I have added a Non-goal about addressing the security implications that follow when most Secrets are mounted by the same Pod. I have also clarified that only a single ServiceAccount can be used for a Pod - or for multiple Tasks, in this case.
@jlpettersson: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
In the most recent API working group we decided to keep this TEP and [TEP-0046](tektoncd#318) separate because they are coming at a similar problem from different angles. In the WG @jerop suggested that we update the TEPs with some info on what the overlaps + differences are and that's what this TEP is adding!
I've updated TEP-0044 with a compare + contrast with this TEP: 75d8537
From my perspective I'm happy to merge this TEP problem statement as-is and move forward
/approve
@vdemeester is also a reviewer on this one, and @sbwsg and @imjasonh had some feedback also (maybe addressed now that the TEP isn't assuming that PipelineRun in a Pod is the solution we're going with?)
/hold
- Make it possible to use a Pod-internal workspace to share data in a PipelineRun
  - The Tasks that use the workspace are scheduled to run within the same scheduling unit (Pod)
- Pipeline features in use today are still usable, e.g. concurrency and `When`-expressions
- No changes in the Pipeline or Task API for authors
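For reference, the kind of `When`-expression named in the goals above that a single-Pod mode would still need to honor (standard v1beta1 syntax; the names are hypothetical):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: conditional-pipeline     # hypothetical
spec:
  params:
    - name: run-tests
      type: string
      default: "true"
  tasks:
    - name: unit-tests
      when:
        - input: "$(params.run-tests)"
          operator: in
          values: ["true"]       # unit-tests is skipped unless this matches
      taskRef:
        name: go-test            # hypothetical Task
```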
@jlpettersson I forgot to ask you in the API working group: does this mean that you're proposing that this is something that is configured only at runtime (i.e. in the PipelineRun only)? or would you imagine the pipeline author expressing this also/instead?
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bobcatfish. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@bobcatfish @jlpettersson should we close this TEP and make sure TEP-0044 takes into account the requirements from this one?
This TEP probably addresses a slightly broader problem, since it also requires parallel/concurrent Task execution. But the TEP-0046 number is stolen now, so this cannot be merged. I am closing it.
@jlpettersson I don't want this to get lost just due to a conflicting TEP number - please let me know if I can help at all - I DO think that TEP-0044 is going to address what you mentioned around parallel and concurrent task execution in the long run.
This TEP describes an alternative way to run Pipelines composed of Tasks and Workspaces without any changes for an author and without the need for external storage to back the Workspaces. Without external storage, problems with scheduling of Pods that use the same volume are avoided.
This is a formalization of tektoncd/pipeline#3638
Some wording is stolen from @bobcatfish and TEP-0044 (these two TEPs might eventually converge?)