mutate.Time should be lazy #1168

imjasonh · 2021-11-05T15:47:41Z

pkg/v1/mutate.Time sets all timestamps in an image to the given time, typically to make the image more reproducible. For timestamps in layers, it calls layerTime which reads the layer contents and writes them to a buffered tar.Writer. This means that if the layer is huge, we may run out of memory buffering its contents.

We should consider making this a lazy transformation, so that layer mutations aren't made until they're read. We did a similar thing in pkg/v1/cache, where layer contents aren't cached until they're read via Compressed or Uncompressed.


jonjohnsonjr · 2021-11-08T17:36:48Z

At one point I experimented with this. The API was gross due to streaming layers, but that was so long ago there might be a less naive way to do it: jonjohnsonjr@542d74b

imjasonh · 2021-11-08T17:40:38Z

Yeah I like this Mutation concept. It doesn't help that much with mutate.Time though since that wants to deal with files in the tar archive, and that's still a huge pain to extract from the v1.Image, especially streamily.

Maybe we could expose some adapter that takes a func(tar.Header) (tar.Header, error)?

jonjohnsonjr · 2021-11-08T19:38:09Z

For the layerTime, I ended up having a ReadCloser -> ReadCloser: jonjohnsonjr@542d74b#diff-05a233d3122c90ba0cf547c623e3a34194cf56c4ff05cdf414d5a265300b31e6R39

github-actions · 2022-02-07T01:24:16Z

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Keep fresh with the 'lifecycle/frozen' label.

This change adds a new flag to zero timestamps in layer tarballs without making a fully reproducible image.

My use case for this is maintaining a large image with build tooling. I have a multi-stage Dockerfile that generates an image containing several toolchains for cross-compilation, with each toolchain being prepared in a separate stage before being COPY'd into the final image. This is a very large image, and while it's incredibly convenient for development, making a change as simple as adding one new tool tends to invalidate caches and force the devs to download another 10+ GB image. If timestamps were removed from each layer, these images would be mostly unchanged with each minor update, greatly reducing disk space needed for keeping old versions around and time spent downloading updated images.

I wanted to use Kaniko's --reproducible flag to help with this, but ran into issues with memory consumption (GoogleContainerTools#862) and build time (GoogleContainerTools#1960). Additionally, I didn't really care about reproducibility - I mainly cared about the layers having identical contents so Docker could skip pulling and storing redundant layers from a registry.

This solution works around these problems by stripping out timestamps as the layer tarballs are built. It removes the need for a separate postprocessing step, and preserves image metadata so we can still see when the image itself was built.

An alternative solution would be to use mutate.Time much like Kaniko currently uses mutate.Canonical to implement --reproducible, but that would not be a satisfactory solution for me until [issue 1168](google/go-containerregistry#1168) is addressed by go-containerregistry. Given my lack of Go experience, I don't feel comfortable tackling that myself, and this seems like a simple and useful workaround in the meantime.

As a bonus, I believe that this change also fixes GoogleContainerTools#2005 (though that should really be addressed in go-containerregistry itself).


bh-tt · 2024-10-23T12:19:03Z

I've partially improved this in GoogleContainerTools/kaniko#3347, but I wonder whether I should submit the PR here instead. I see the value in using lazy logic here; it is likely a better method than what I've requested in kaniko. Would you be open to a PR implementing the lazy transformation, much like the one @jonjohnsonjr had?

imjasonh added the good first issue label Nov 5, 2021

imjasonh mentioned this issue Nov 11, 2021

Crane operations on Windows containers #1111

Closed

github-actions bot added the lifecycle/stale label Feb 7, 2022

github-actions bot closed this as completed Mar 9, 2022

imjasonh reopened this Mar 9, 2022

imjasonh added lifecycle/frozen and removed lifecycle/stale labels Mar 9, 2022

zx96 mentioned this issue Apr 22, 2023

Add --zero-file-timestamps flag GoogleContainerTools/kaniko#2477

Open
