status | title | creation-date | last-updated | authors | see-also
---|---|---|---|---|---
implemented | Larger Results via Sidecar Logs | 2022-11-30 | 2022-12-15 | |

This TEP builds on the hard work of many people who have been tackling the problem over the past couple years, including but not limited to:
- '@abayer'
- '@afrittoli'
- '@bobcatfish'
- '@dibyom'
- '@imjasonh'
- '@pritidesai'
- '@ScrapCodes'
- '@skaegi'
- '@tlawrie'
- '@tomcli'
- '@vdemeester'
- '@wlynch'
Today, Results have a size limit of 4KB per Step and 12KB per TaskRun in the best case - see issue.
The goal of TEP-0086 is to support larger Results beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP proposes experimenting with one of the alternatives - Sidecar logs. This
allows us to support larger Results which are stored within TaskRun CRDs.

Results are too small - see issue. The current implementation of Results involves parsing from disk and
storing as part of the Termination Message which has a limit of 4KB per Container and 12KB per Pod. As such,
the size limit of Results is 12KB per TaskRun and 4KB per Step at best.

To make matters worse, the limit is divided equally among all Containers in a Pod - see issue. The more Steps
there are in a Task, the smaller the size limit for the Results of each Step. For example, if there are 12 Steps
then each has 1KB in Termination Message storage to produce Results.

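The arithmetic above can be sketched as follows. This is a simplified model, not Tekton's actual code: `perStepLimit` and both constants are hypothetical names, and the model assumes every container in the Pod is a Step and that each share is capped at the 4KB per-Container limit.

```go
package main

import "fmt"

// Limits taken from the text above: 12KB per Pod, 4KB per Container.
const (
	podTerminationBytes       = 12 * 1024
	containerTerminationBytes = 4 * 1024
)

// perStepLimit is a hypothetical helper. It splits the Pod budget equally
// among the containers and caps each share at the per-Container limit.
func perStepLimit(containers int) int {
	if containers <= 0 {
		return 0
	}
	share := podTerminationBytes / containers
	if share > containerTerminationBytes {
		return containerTerminationBytes
	}
	return share
}

func main() {
	for _, n := range []int{1, 3, 6, 12} {
		fmt.Printf("%2d steps -> %d bytes per step\n", n, perStepLimit(n))
	}
}
```

With 12 Steps the model gives 1024 bytes per Step, matching the 1KB figure in the example above.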
TEP-0086 aims to support larger Results. It has many alternatives but no proposal
because there's no obvious "best" solution that would meet all the requirements.

This TEP proposes experimenting with Sidecar logs to support larger Results that are stored within TaskRun CRDs.
This allows us to provide an immediate solution to the urgent needs of users, while not blocking pursuit of the other
alternatives.

In addition, the documented guidance is that Results are used for outputs less than 1KB while Workspaces
are used for larger data. Supporting larger Results up to the CRD limit allows users to reuse Tasks in more
scenarios without having to change the specification to use Workspaces upon hitting the current low size limit of
4KB per TaskRun.

> As a general rule-of-thumb, if a Result needs to be larger than a kilobyte, you should likely use a Workspace
> to store and pass it between Tasks within a Pipeline.

The main goal of this TEP is to support larger Results via Sidecar logs. The Results are stored in the TaskRun,
therefore they are limited by the size limit of a TaskRun CRD - 1.5MB.

The following are out of scope for this TEP:
- Solving use cases that require really large Results beyond the size limit of a CRD - 1.5MB.
- Addressing other alternatives for larger Results that are listed in TEP-0086. However, this approach should
  co-exist with the other alternatives when they are implemented as experiments as well.
- Supporting signing of Results using SPIRE for non-falsifiable provenance that's required for SLSA L3. As described
  in TEP-0089, the signatures and certificates used to verify Results are stored alongside the Results in the
  Termination Message. This exacerbates the size limit issues for Results. The size of the certificate is 800 bytes
  and the size of signatures is approximately 100 bytes per Result.

  > For now, signatures of the Results will be contained within the Termination Message of the Pod, alongside any
  > additional material required to perform verification. One consideration of this is the size of the additional
  > fields required. The size of the certificate needed for verification is about 800 bytes, and the size of the
  > signatures is about 100 bytes * (number of Results + 1). The current Termination Message size is 4K, but there
  > is TEP-0086 looking at supporting larger results.

- Supporting emitting structured Results. For example, store the images produced by ko and all their copies to the
  various regional registries in a TaskRun. The release Pipeline has this use case, and it ran into the current size
  limit of 4096 bytes. As described in the issue, the images produced were 4572 bytes which didn't fit into Results.
  For further details about structured Results, see TEP-0075 and TEP-0076.

To support larger Results, we propose using stdout logs from a dedicated Sidecar to return a JSON Result object.
The Pipeline controller would wait for the Sidecar to exit and then read the logs based on a particular query and
append Results to the TaskRun.

The dedicated Sidecar will be injected alongside other Steps. The Sidecar will watch the Results paths of the
Steps. This Sidecar will output the name of each Result and its contents to stdout in a parsable pattern. The
TaskRun controller will access the stdout logs of the Sidecar, then extract the Results and their contents.

After the Steps have terminated, the TaskRun controller will write the Results from the logs of the Sidecar
instead of using the Termination Message, hence bypassing the 4KB limit. This approach keeps the rest of the existing
functionality the same and does not require any external storage mechanism.

For further details, see the demonstration.
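The controller-side extraction described above could look roughly like the sketch below. The log format is an assumption made for illustration: one JSON object per line with hypothetical `name` and `value` fields. The `sidecarResult` type and `parseSidecarLogs` function are likewise hypothetical and are not taken from the Tekton codebase.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// sidecarResult is an assumed shape for one Result line in the Sidecar logs.
type sidecarResult struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// parseSidecarLogs scans the log text line by line and keeps every line that
// decodes into a sidecarResult with a non-empty name. Other lines are skipped.
func parseSidecarLogs(logs string) (map[string]string, error) {
	results := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(logs))
	// Raise the scanner's default 64KB line limit so larger Results fit.
	scanner.Buffer(make([]byte, 0, 64*1024), 2*1024*1024)
	for scanner.Scan() {
		var r sidecarResult
		if err := json.Unmarshal(scanner.Bytes(), &r); err != nil || r.Name == "" {
			continue
		}
		results[r.Name] = r.Value
	}
	return results, scanner.Err()
}

func main() {
	logs := "starting\n" +
		"{\"name\":\"digest\",\"value\":\"sha256:abc\"}\n" +
		"{\"name\":\"url\",\"value\":\"example.test/img\"}\n"
	results, err := parseSidecarLogs(logs)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(results), results["digest"], results["url"])
}
```

Skipping lines that do not decode keeps the parser tolerant of unrelated output in the same log stream, at the cost of silently ignoring malformed Result lines.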
This proposal provides an opportunity to experiment with this solution to provide Results within the CRDs as we
continue to explore other alternatives, including those that involve external storage.

This feature will be gated using a results-from feature flag. This feature flag defaults to "termination-message"
for backwards compatibility - Results will continue to pass through Termination Message.

Users can set the results-from feature flag to "sidecar-logs" to enable the larger Results through Sidecar logs:

kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"results-from":"sidecar-logs"}}'

Other alternatives can use the results-from feature flag to introduce other approaches. For now, the field will only
accept "termination-message" or "sidecar-logs".

This feature requires that the Pipeline controller has access to Pod logs.
Users have to grant get access to all pods/log to the Pipeline controller:

kubectl apply -f config/enable-log-access-to-controller/

The size limit per Result can be configured using the max-result-size feature flag, which takes an integer number
of bytes.

The max-result-size feature flag defaults to 4096 bytes. This ensures that we support existing Tasks with only one
Result that uses up the whole size limit of the Termination Message.

If users want the size limit per Result to be something other than 4096 bytes, they can set
max-result-size: <VALUE-IN-BYTES> in the feature-flags ConfigMap. The value set here cannot exceed the CRD size
limit of 1.5MB; if it does, the controller logs an error and uses the default value.

kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"max-result-size":"<VALUE-IN-BYTES>"}}'
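
The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the controller's actual code: `parseMaxResultSize` and the constant names are hypothetical, 1.5MB is assumed to mean 1536 * 1024 bytes, and falling back to the default for non-numeric or non-positive values is an additional assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultMaxResultSize = 4096        // default stated in the text above
	crdSizeLimit         = 1536 * 1024 // 1.5MB, assuming binary units
)

// parseMaxResultSize is a hypothetical helper. It returns the size to use,
// plus an error explaining why the default was chosen when the raw value is
// not a positive integer or exceeds the CRD size limit.
func parseMaxResultSize(raw string) (int, error) {
	size, err := strconv.Atoi(raw)
	if err != nil {
		return defaultMaxResultSize, fmt.Errorf("invalid max-result-size %q: %w", raw, err)
	}
	if size <= 0 || size > crdSizeLimit {
		return defaultMaxResultSize, fmt.Errorf("max-result-size %d is outside (0, %d]", size, crdSizeLimit)
	}
	return size, nil
}

func main() {
	for _, raw := range []string{"8192", "2000000", "not-a-number"} {
		size, err := parseMaxResultSize(raw)
		fmt.Printf("%q -> %d (err: %v)\n", raw, size, err)
	}
}
```

Returning both the fallback value and an error lets the caller log the problem while continuing with the default, as the text describes.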

Even though the size limit per Result is configurable, the total size of Results is limited by the CRD size limit
of 1.5MB. If the size of Results exceeds this limit, then the TaskRun will fail with a message indicating the size
limit has been exceeded.

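A size check along these lines might look like the sketch below. It is a simplification under stated assumptions: `checkResultSizes` and `errResultsTooLarge` are hypothetical names, only the Result values are counted (a real TaskRun CRD holds other data too), 1.5MB is taken as 1536 * 1024 bytes, and treating a single Result above max-result-size as an error is an assumption the text does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

const crdSizeLimit = 1536 * 1024 // 1.5MB, assuming binary units

// errResultsTooLarge is a hypothetical sentinel a caller could use to fail the TaskRun.
var errResultsTooLarge = errors.New("results exceed the size limit")

// checkResultSizes is a hypothetical helper. It rejects any single Result
// larger than maxResultSize and any set of Results whose combined value
// size is larger than the CRD size limit.
func checkResultSizes(results map[string]string, maxResultSize int) error {
	total := 0
	for name, value := range results {
		if len(value) > maxResultSize {
			return fmt.Errorf("result %q is %d bytes, per-Result limit is %d: %w",
				name, len(value), maxResultSize, errResultsTooLarge)
		}
		total += len(value)
	}
	if total > crdSizeLimit {
		return fmt.Errorf("results total %d bytes, CRD limit is %d: %w",
			total, crdSizeLimit, errResultsTooLarge)
	}
	return nil
}

func main() {
	results := map[string]string{"digest": "sha256:abc", "url": "example.test/img"}
	fmt.Println(checkResultSizes(results, 4096) == nil) // both values fit
	fmt.Println(checkResultSizes(results, 8) == nil)    // values longer than 8 bytes
}
```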
In TEP-0100, we proposed changes to PipelineRun status to reduce the amount of information stored about
the status of TaskRuns and Runs. Now, the PipelineRun status is set up to handle larger Results in TaskRuns
without storage issues.

For ChildReferences to be populated, the embedded-status must be set to "minimal". We recommend that the minimal
embedded status - ChildReferences - is enabled while migration is ongoing until it becomes the only supported status.
This will ensure that larger Results from its TaskRuns will not bloat the PipelineRun CRD.

The Sidecar will run a binary that:
- receives arguments for the Results' paths and names, which are identified from the taskSpec.results field - this
  allows the Sidecar to know the Results it needs to read.
- has the /tekton/run volume mounted as read-only, where the status of each Step is written.
- periodically checks for Step status in the path /tekton/run.
- when all Steps have completed, immediately parses all the Results in the paths and prints them to stdout in a
  parsable pattern.

For further details, see the demonstration.
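The behaviour listed above can be sketched as a small program. Everything specific here is an assumption made for illustration: the per-Step completion marker named `out`, the one-JSON-object-per-line output with `name` and `value` fields, and the helper names `stepsDone` and `emitResults`. The `main` function simulates the directories in a temporary location instead of using /tekton/run.

```go
package main

import (
	"encoding/json"
	"io"
	"os"
	"path/filepath"
	"time"
)

// sidecarResult is an assumed output shape: one JSON object per line.
type sidecarResult struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// stepsDone reports whether every Step directory under runDir contains a
// completion marker. The marker file name "out" is an assumption.
func stepsDone(runDir string, steps []string) bool {
	for _, s := range steps {
		if _, err := os.Stat(filepath.Join(runDir, s, "out")); err != nil {
			return false
		}
	}
	return true
}

// emitResults polls until all Steps are done, then writes each Result file
// that exists under resultsDir to w as one JSON line.
func emitResults(w io.Writer, runDir, resultsDir string, steps, names []string, poll time.Duration) error {
	for !stepsDone(runDir, steps) {
		time.Sleep(poll)
	}
	enc := json.NewEncoder(w)
	for _, name := range names {
		data, err := os.ReadFile(filepath.Join(resultsDir, name))
		if os.IsNotExist(err) {
			continue // this Result was never written
		}
		if err != nil {
			return err
		}
		if err := enc.Encode(sidecarResult{Name: name, Value: string(data)}); err != nil {
			return err
		}
	}
	return nil
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	root, err := os.MkdirTemp("", "sidecar-sketch")
	must(err)
	defer os.RemoveAll(root)
	runDir := filepath.Join(root, "run")
	resultsDir := filepath.Join(root, "results")
	// Simulate one finished Step that wrote one Result.
	must(os.MkdirAll(filepath.Join(runDir, "0"), 0o755))
	must(os.MkdirAll(resultsDir, 0o755))
	must(os.WriteFile(filepath.Join(runDir, "0", "out"), nil, 0o644))
	must(os.WriteFile(filepath.Join(resultsDir, "digest"), []byte("sha256:abc"), 0o644))
	must(emitResults(os.Stdout, runDir, resultsDir, []string{"0"}, []string{"digest", "missing"}, 10*time.Millisecond))
}
```

Polling keeps the sketch simple; a file watcher would react faster but adds a dependency and more failure modes.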
This proposal does not introduce any API changes to the specification of Results. The changes are in implementation
details of Results. The existing Tasks will continue to function as they are, only that they can support larger
Results.

Even more, supporting larger Results up to the CRD limit allows users to reuse Tasks in more scenarios without
having to change the specification to use Workspaces upon hitting the current low size limit of 4KB per TaskRun.
This allows users to control execution, as needed by their context, without having to modify Tasks and Pipelines.

Users may write Tasks that assume larger Results support. These Tasks would only work on Tekton Pipelines
installations that are configured to support it. This is a risk to Task interoperability which is mitigated by:
- Hard limit on the size of Results which is the CRD size limit - 1.5MB.
- Plan to support larger Results in the long run regardless of the implementation details.

This proposal provides a simple solution that solves most use cases:
- Users don't need additional infrastructure, such as server or object storage, to support larger Results.
- Existing Tasks will continue to function as they do now, while supporting larger Results, without any API changes.

Performance benchmarking with 20-30 PipelineRuns, each with 3 TaskRuns each with two Steps:
- Average Pipeline controller's CPU difference during PipelineRun execution: 1%
- Average Pipeline controller's Memory usage difference during PipelineRun execution: 0.2%
- Average Pod startup time (time to get to running state) difference: 3s per TaskRun

In the experiment, we will continue to measure the startup overhead and explore ways to improve it.
For further details, see the performance metrics.

This approach requires that the Pipeline controller has access to Pod logs. The Pipeline controller already has
extensive permissions in the cluster, such as read access to Secrets. Expanding the access even further is a concern
for some users, but is also acceptable for some users given the advantages. We will document the extended permissions
so that users can make the right choice for their own use cases and requirements.

There will be a validation check to ensure that users cannot inject their own Sidecar on top of the one specified by
Tekton.

These are some questions we plan to answer in the experiment:
- What impact does this change have on the startup and execution time of TaskRuns and PipelineRuns? Can we improve
  the performance impact?
- How reliable is using Sidecar logs to process Results?
- How many users adopt this solution? How many are satisfied with it given the advantages and disadvantages? We will
  conduct a user survey soon after the feature has been released.

- Implementation:
- Tekton Enhancement Proposals:
- Issues:
- Prior Work: