--reproducible flag massively increases build time #1960
For your second question, see here. Using the flag, all timestamps are stripped off and no history is available. I will have to dig deeper into the performance issue. Can you clarify your use case a little bit more?
Thank you for your answer @tejal29! So this behavior of completely stripping off the timestamps and the history should be expected. The reproducible flag is not preferred because of the overhead in build time and the stripped timestamps. We value those timestamps and the history of the image.
Hey, just to add some voice to the issue: I am also seeing super slow builds with `--reproducible`. I suspect (but have no proof here) that it is because it takes some time to strip the timestamp metadata after the image is built; the image/layers need to be extracted, changed, and then repacked. To be fair, I have noticed that I use a very old version of Kaniko.
Ok, I have tested and see that it is still very slow. Let me show an example:

```dockerfile
# cat Dockerfile
FROM quay.io/pypa/manylinux2014_x86_64
ENV FOO=BAR
```

With `--reproducible` the build takes far longer; without it, it takes 8 seconds.
This change adds a new flag to zero timestamps in layer tarballs without making a fully reproducible image.

My use case for this is maintaining a large image with build tooling. I have a multi-stage Dockerfile that generates an image containing several toolchains for cross-compilation, with each toolchain being prepared in a separate stage before being COPY'd into the final image. This is a very large image, and while it's incredibly convenient for development, making a change as simple as adding one new tool tends to invalidate caches and force the devs to download another 10+ GB image. If timestamps were removed from each layer, these images would be mostly unchanged with each minor update, greatly reducing the disk space needed for keeping old versions around and the time spent downloading updated images.

I wanted to use Kaniko's --reproducible flag to help with this, but ran into issues with memory consumption (GoogleContainerTools#862) and build time (GoogleContainerTools#1960). Additionally, I didn't really care about reproducibility - I mainly cared about the layers having identical contents so Docker could skip pulling and storing redundant layers from a registry.

This solution works around these problems by stripping out timestamps as the layer tarballs are built. It removes the need for a separate postprocessing step, and preserves image metadata so we can still see when the image itself was built. An alternative solution would be to use mutate.Time much like Kaniko currently uses mutate.Canonical to implement --reproducible, but that would not be a satisfactory solution for me until [issue 1168](google/go-containerregistry#1168) is addressed by go-containerregistry. Given my lack of Go experience, I don't feel comfortable tackling that myself, and this seems like a simple and useful workaround in the meantime.

As a bonus, I believe that this change also fixes GoogleContainerTools#2005 (though that should really be addressed in go-containerregistry itself).
With kaniko-project/executor:v1.19.2-debug, building the same image:
Activating profiling (https://github.com/GoogleContainerTools/kaniko#kaniko-builds---profiling), I see a lot of traces in inflate/deflate with `--reproducible` enabled.
Can confirm that I have this same issue.
Fixes GoogleContainerTools#862, may mitigate GoogleContainerTools#1960.

The layerTime function returns a layer backed by an in-memory buffer, which means that during the stripping of timestamps the entire image is loaded into memory. Additionally, it runs gzip after the layer is created, resulting in an even larger in-memory blob.

This commit changes the method to use a temporary file instead of an in-memory buffer, and to apply gzip compression while writing to this layer file, instead of compressing during read.

Signed-off-by: bh <[email protected]>
I am building one large docker image and I am experiencing a somewhat weird behavior that I would like to address.
Host info:
So we are talking about a large-ish image (~2.5 GB), and when I build it with the cache it completes in ~1 min.
So far so good, but the problem that I am facing is that this cached build always creates a new artifact with a new sha256.
I found out that for that issue the `--reproducible` flag exists, and it does exactly that. When I build the image with the `--reproducible` flag it does not create a new artifact with a new sha256, but the build time increases from ~1 min to ~7 mins. That's a huge overhead imho and I would like to figure it out.

I got logs (with trace verbosity) with the `--reproducible` flag enabled and with it disabled. The main difference found in the logs: if you look closely at the timestamps, the `mapping digest...` stage takes ~5 mins longer when `--reproducible` is enabled than when it is disabled.

Why does this happen, and why does it add so much overhead to the build process? Is there any other way to use the cache to build an image without creating a new artifact, like docker does?

`--reproducible` also completely strips the timestamps, which makes the output of `docker images` unhelpful. E.g. in the code block below, when using `--reproducible` the CREATED output is N/A, which is not the desired behavior.

Another issue that was found is that when `--reproducible` is used on an image, running `docker history` for that image produces output that is not useful at all.
As you can see, with the reproducible flag every piece of useful information was lost from the docker history.