Publish task fails, IMAGES result too large #4282

Open
afrittoli opened this issue Oct 6, 2021 · 19 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@afrittoli
Member

Expected Behavior

It is possible to release Tekton Pipelines

Actual Behavior

The publish task fails because the IMAGES result is too large:

{"level":"fatal","ts":1633523339.3185616,"caller":"entrypoint/entrypointer.go:203","msg":"Error while handling results: Termination message is above max allowed size 4096, caused by large task result.","stacktrace":"github.com/tektoncd/pipeline/pkg/entrypoint.Entrypointer.Go\n\tgithub.com/tektoncd/pipeline/pkg/entrypoint/entrypointer.go:203\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/entrypoint/main.go:126\nruntime.main\n\truntime/proc.go:225"}

Steps to Reproduce the Problem

  1. Trigger a nightly release

Additional Info

The IMAGES result is used by Tekton Chains to sign the container images.
The result includes all the container images produced by ko plus all their copies to the various regional registries.

  • Kubernetes version:
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.13-gke.1900", GitCommit:"ee714a7b695ca42b9bd0c8fe2c0159024cdcba5e", GitTreeState:"clean", BuildDate:"2021-08-11T09:19:42Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:
Client version: 0.19.0
Pipeline version: v0.27.3
Triggers version: v0.16.0
Dashboard version: v0.19.0
@afrittoli afrittoli added the kind/bug Categorizes issue or PR as related to a bug. label Oct 6, 2021
@vdemeester
Member

Ah 😅 This brings some light and urgency to the TEP around this problem then 🙃

@afrittoli
Member Author

afrittoli commented Oct 6, 2021

Heh, indeed... but we'll need a solution before the TEP though. Using multiple results would not help, we would need to use multiple tasks 😅

@afrittoli
Member Author

The result looks like this:

gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/controller@sha256:8e749dc794d6c26b54842599eaa61b6ecbc1161d4c8207f6227089a74272d838,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/controller@sha256:8e749dc794d6c26b54842599eaa61b6ecbc1161d4c8207f6227089a74272d838,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/controller@sha256:8e749dc794d6c26b54842599eaa61b6ecbc1161d4c8207f6227089a74272d838,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/controller@sha256:8e749dc794d6c26b54842599eaa61b6ecbc1161d4c8207f6227089a74272d838,
gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:fa6706ae3562ddaa3cf1efbfe3bf56cb1a07bcf9bdfbb191dc79b0b7cf3bd889,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:fa6706ae3562ddaa3cf1efbfe3bf56cb1a07bcf9bdfbb191dc79b0b7cf3bd889,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:fa6706ae3562ddaa3cf1efbfe3bf56cb1a07bcf9bdfbb191dc79b0b7cf3bd889,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:fa6706ae3562ddaa3cf1efbfe3bf56cb1a07bcf9bdfbb191dc79b0b7cf3bd889,
gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/git-init@sha256:64cfa7edd4243ecac8287b475ddd7745b44b0b2be2a21065aea5b202762d0bad,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/git-init@sha256:64cfa7edd4243ecac8287b475ddd7745b44b0b2be2a21065aea5b202762d0bad,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/git-init@sha256:64cfa7edd4243ecac8287b475ddd7745b44b0b2be2a21065aea5b202762d0bad,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/git-init@sha256:64cfa7edd4243ecac8287b475ddd7745b44b0b2be2a21065aea5b202762d0bad,
gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/entrypoint@sha256:ae20b7863effaa2cc620acc9cf6ff1f80681aab7e84419a388f3579a6392cb2c,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/entrypoint@sha256:ae20b7863effaa2cc620acc9cf6ff1f80681aab7e84419a388f3579a6392cb2c,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/entrypoint@sha256:ae20b7863effaa2cc620acc9cf6ff1f80681aab7e84419a388f3579a6392cb2c,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/entrypoint@sha256:ae20b7863effaa2cc620acc9cf6ff1f80681aab7e84419a388f3579a6392cb2c,
gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/nop@sha256:22308e68d9d550ea3d5af81f289529e6ab2b2d0f4e34b419aa3b4c867c8d7cbc,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/nop@sha256:22308e68d9d550ea3d5af81f289529e6ab2b2d0f4e34b419aa3b4c867c8d7cbc,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/nop@sha256:22308e68d9d550ea3d5af81f289529e6ab2b2d0f4e34b419aa3b4c867c8d7cbc,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/nop@sha256:22308e68d9d550ea3d5af81f289529e6ab2b2d0f4e34b419aa3b4c867c8d7cbc,
gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/imagedigestexporter@sha256:5f2ddfddf0930cd1907bec0006a613dbfce2d69184d7ed552acbec1d769e50dc,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/imagedigestexporter@sha256:5f2ddfddf0930cd1907bec0006a613dbfce2d69184d7ed552acbec1d769e50dc,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/imagedigestexporter@sha256:5f2ddfddf0930cd1907bec0006a613dbfce2d69184d7ed552acbec1d769e50dc,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/imagedigestexporter@sha256:5f2ddfddf0930cd1907bec0006a613dbfce2d69184d7ed552acbec1d769e50dc,
gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/pullrequest-init@sha256:c43f269ea4e66e85bb9611c89e7d2fe681b520286243a77e75479d338d0a84bc,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/pullrequest-init@sha256:c43f269ea4e66e85bb9611c89e7d2fe681b520286243a77e75479d338d0a84bc,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/pullrequest-init@sha256:c43f269ea4e66e85bb9611c89e7d2fe681b520286243a77e75479d338d0a84bc,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/pullrequest-init@sha256:c43f269ea4e66e85bb9611c89e7d2fe681b520286243a77e75479d338d0a84bc,
gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/webhook@sha256:6b9b7afe486afb7f71e84958a53013603b32dff3cc90c140d3b5c0606fe291c2,
us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/webhook@sha256:6b9b7afe486afb7f71e84958a53013603b32dff3cc90c140d3b5c0606fe291c2,
eu.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/webhook@sha256:6b9b7afe486afb7f71e84958a53013603b32dff3cc90c140d3b5c0606fe291c2,
asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/webhook@sha256:6b9b7afe486afb7f71e84958a53013603b32dff3cc90c140d3b5c0606fe291c2,

That is 4572 characters, which exceeds the 4096 maximum allowed size for the termination message. I think the only alternative for now is to sign only the images on gcr.io; we can start signing the geo copies once we solve the issue with results size.
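
As a rough check of the numbers above, the following sketch rebuilds the list with placeholder digests and compares its size with the 4096 limit from the error message. The digests are placeholders with the same length as real sha256 digests, the one-reference-per-line layout with a trailing comma is an assumption taken from the listing, and the real termination message carries extra framing that is not modelled here.

```python
# Rough size check for the IMAGES result; digests are placeholders, not real values.
MAX_TERMINATION_MESSAGE = 4096  # limit quoted in the entrypoint error above

REGISTRIES = ["gcr.io", "us.gcr.io", "eu.gcr.io", "asia.gcr.io"]
IMAGE_PATH = "tekton-nightly/github.com/tektoncd/pipeline/cmd"
COMMANDS = [
    "controller", "kubeconfigwriter", "git-init", "entrypoint",
    "nop", "imagedigestexporter", "pullrequest-init", "webhook",
]
PLACEHOLDER_DIGEST = "sha256:" + "0" * 64  # same length as a real digest


def images_result(registries):
    """Build one 'ref,' line per image and registry, as in the listing above."""
    lines = [
        f"{registry}/{IMAGE_PATH}/{command}@{PLACEHOLDER_DIGEST},"
        for command in COMMANDS
        for registry in registries
    ]
    return "\n".join(lines) + "\n"


def fits(result):
    """True if the result alone is within the termination message limit."""
    return len(result.encode("utf-8")) <= MAX_TERMINATION_MESSAGE


all_regions = images_result(REGISTRIES)
gcr_only = images_result(["gcr.io"])
print(len(all_regions), fits(all_regions))
print(len(gcr_only), fits(gcr_only))
```

Under these assumptions the four-registry list is over the limit while the gcr.io-only list is well under it, which is the motivation for the workaround.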

afrittoli added a commit to afrittoli/pipeline that referenced this issue Oct 6, 2021
The release process generates a number of container images and
publishes them to gcr.io as well as three regional versions of the
registry. Today all those images are added to the IMAGES result,
for signing by chains, however that causes the nightly build to
fail as we hit the termination message size limit.

Until we have a way to store larger results, we shall only sign
images on gcr.io as a workaround.

Workaround to tektoncd#4282

Signed-off-by: Andrea Frittoli <[email protected]>
@afrittoli afrittoli self-assigned this Oct 6, 2021
@afrittoli afrittoli added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Oct 6, 2021
@vdemeester
Member

> Heh, indeed... but we'll need a solution before the TEP though. Using multiple results would not help, we would need to use multiple tasks 😅

It wouldn't because of the termination message limit thingy, right?

@afrittoli
Member Author

> > Heh, indeed... but we'll need a solution before the TEP though. Using multiple results would not help, we would need to use multiple tasks 😅
>
> It wouldn't because of the termination message limit thingy, right?

Yes, indeed. We store results in the Pod termination message, so having multiple results or multiple steps does not help.
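
To illustrate why splitting does not help, here is a simplified model. It assumes all results of a TaskRun are serialized together and must fit in one shared budget; the JSON layout, the result names `IMAGES_1` and `IMAGES_2`, and the helper `message_size` are hypothetical and are not Tekton's actual format.

```python
import json

MAX_TERMINATION_MESSAGE = 4096  # limit quoted in the entrypoint error


def message_size(results):
    """Size in bytes of a hypothetical message carrying every result."""
    payload = [{"key": name, "value": value} for name, value in results.items()]
    return len(json.dumps(payload).encode("utf-8"))


# Placeholder payload of the size reported in this issue.
images = "x" * 4572

single = {"IMAGES": images}
half = len(images) // 2
split = {"IMAGES_1": images[:half], "IMAGES_2": images[half:]}

# Splitting only adds framing; the total still exceeds the shared budget.
print(message_size(single) > MAX_TERMINATION_MESSAGE)
print(message_size(split) > MAX_TERMINATION_MESSAGE)
print(message_size(split) >= message_size(single))
```

In this model the only ways out are to shrink the payload or to move part of it to a different TaskRun, which matches the "multiple tasks" remark earlier in the thread.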

@pritidesai
Member

> > > Heh, indeed... but we'll need a solution before the TEP though. Using multiple results would not help, we would need to use multiple tasks 😅
> >
> > It wouldn't because of the termination message limit thingy, right?
>
> Yes, indeed. We store results in the Pod termination message, so having multiple results or multiple steps does not help.

So indirectly Chains has the limitation and does not support a taskRun producing so many images 😞

@priyawadhwa

> Heh, indeed... but we'll need a solution before the TEP though. Using multiple results would not help, we would need to use multiple tasks 😅

We ended up having to use multiple tasks for distroless, but it's pretty hacky 😕 Signing only gcr.io seems like a good short-term solution. We also have a branch in Chains with a prototype Chains API, which could be reworked a little to also accept large results 🤔

tekton-robot pushed a commit that referenced this issue Oct 6, 2021
@afrittoli afrittoli added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Oct 7, 2021
@afrittoli
Member Author

Now that the workaround has been merged, I have downgraded the priority of this issue.

@afrittoli
Member Author

Successful nightly run: https://dashboard.dogfooding.tekton.dev/#/namespaces/tekton-nightly/pipelineruns/pipeline-release-nightly-gp4wm?pipelineTask=git-clone&step=clone

@pritidesai
Member

pritidesai commented Oct 7, 2021

@priyawadhwa any thoughts on using something other than a single task result IMAGES? Anything I can think of sounds hacky, for example, a dedicated section in a taskRun to maintain the list of images other than the task result?

EDIT: something like taskRun.status.images in addition to IMAGES task result

@afrittoli
Member Author

> @priyawadhwa any thoughts on using something other than a single task result IMAGES? Anything I can think of sounds hacky, for example, a dedicated section in a taskRun to maintain the list of images other than the task result?
>
> EDIT: something like taskRun.status.images in addition to IMAGES task result

@pritidesai if we go down that route we might consider adding an artifact section instead, as it's not only container images that we might work with - which feels a bit like going back to PipelineResources :]

@priyawadhwa

> @pritidesai if we go down that route we might consider adding an artifact section instead, as it's not only container images that we might work with - which feels a bit like going back to PipelineResources :]

From what I remember you also had to specify your PipelineResources upfront (but I might have that wrong!) If that's the case it can get pretty inconvenient if you're building more than 3 images in a task. The nice thing about IMAGES result is that it's dynamic in that way.

@pritidesai
Member

The image resource might not work with such dynamism. The outputs.resources has to list the number of images a task is going to produce in advance.

  outputs:
    resources:
    - name: builtImage
      type: image

I was thinking of a solution which is a little more structured than a task result.

chenbh pushed a commit to chenbh/pipeline that referenced this issue Oct 27, 2021
@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2022
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 4, 2022
@pritidesai
Member

pritidesai commented Feb 4, 2022

This still needs resolution; it could be addressed by #4012 and TEP-0086.

/lifecycle frozen

@tekton-robot tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Feb 4, 2022
@afrittoli
Member Author

We discussed this in the Tekton Data Interface working group.
@wlynch commented:

> We probably don't want to be doing individual signing events for each image in each region.
>
> These are going to be different signing events, so you'll end up with different signature checksums because of embedded timestamps, which throws people off sometimes. We likely want to promote each image to each region with their existing signatures via cosign cp.
> See https://github.com/kubernetes/registry.k8s.io/issues/187 for a similar issue.

@afrittoli
Member Author

Thanks @wlynch - good point, I agree we should not sign regional copies separately.

Since the signing happens out of band (performed by Chains), we cannot really copy the signature to the regional copies unless we trigger another pipeline after the signing happens. This is probably OK since signature files are much smaller than the images.

We could copy the SBOM files around, but that's a separate issue. I would propose we close this one.
