
Reproducible builds broken in 1.8.0 #2005

Closed
NullHypothesis opened this issue Mar 21, 2022 · 14 comments · Fixed by #2384

Comments

@NullHypothesis

Actual behavior
Consider the Go program main.go and its corresponding Dockerfile (both listed below). Using kaniko in version 1.7.0, two subsequent reproducible builds using the command listed below result – as expected – in two identical Docker images. In version 1.8.0, however, two subsequent builds are no longer identical.

Expected behavior

I expect two subsequent reproducible builds to result in identical images.

To Reproduce
Steps to reproduce the behavior:

  1. Build an image by running:
$ docker run -v $(pwd):/src --network=host gcr.io/kaniko-project/executor:v1.8.0 --reproducible --dockerfile /src/Dockerfile --no-push --tarPath /src/image-file-main-00.tar --destination main:00 --cache=false --context dir:///src/
  2. Build a second image by running:
$ docker run -v $(pwd):/src --network=host gcr.io/kaniko-project/executor:v1.8.0 --reproducible --dockerfile /src/Dockerfile --no-push --tarPath /src/image-file-main-01.tar --destination main:01 --cache=false --context dir:///src/
  3. Import both images by running:
$ cat image-file-main-00.tar | docker load
$ cat image-file-main-01.tar | docker load
  4. Compare the image IDs:
$ docker image ls main
REPOSITORY   TAG       IMAGE ID       CREATED   SIZE
main         00        e65d80240143   N/A       1.75MB
main         01        77fc4150ed91   N/A       1.75MB

The Go program is identical in both builds, but the surrounding tar archive isn't. I compared the hexdumps of both builds' tar archives and noticed atime and ctime fields containing a Unix timestamp, which is why the builds differ. Could this regression have been caused by ee95be1?
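For reference, the per-entry timestamps can also be inspected without a hexdump. Below is a minimal sketch using only Go's standard library; the layer path is a placeholder for a layer tar extracted from one of the image tars above.

package main

import (
	"archive/tar"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	// Placeholder path: a layer tarball extracted from image-file-main-00.tar.
	f, err := os.Open("layer.tar")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	tr := tar.NewReader(f)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// With PAX headers, AccessTime and ChangeTime carry wall-clock values
		// and therefore differ between otherwise identical builds.
		fmt.Printf("%s\tmtime=%v\tatime=%v\tctime=%v\tformat=%v\n",
			hdr.Name, hdr.ModTime, hdr.AccessTime, hdr.ChangeTime, hdr.Format)
	}
}

Running this against the corresponding layer of each build makes the varying atime/ctime values visible directly.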

Additional Information

  • Dockerfile
FROM golang:1.18 as builder
WORKDIR /src
COPY main.go ./
RUN CGO_ENABLED=0 GO111MODULE=off go build -trimpath -o main
FROM scratch as artifact
COPY --from=builder /src/main /bin/
CMD [ "/" ]
  • Build Context
package main

import "fmt"

func main() {
	fmt.Println("Hello!")
}

Triage Notes for the Maintainers

Description Yes/No
Please check if this is a new feature you are proposing
Please check if the build works in docker but not in kaniko
Please check if this error is seen when you use --cache flag
Please check if your dockerfile is a multistage dockerfile
@imjasonh
Collaborator

Just to check, do you get reproducible builds if your builder is golang:1.17 or even :1.16? There were some build stamping changes in 1.18 that may be causing reproducibility to suffer.

@NullHypothesis
Author

Just to check, do you get reproducible builds if your builder is golang:1.17 or even :1.16? There were some build stamping changes in 1.18 that may be causing reproducibility to suffer.

No, the problem remains with a 1.17 builder.

somdoron added a commit to somdoron/action-kaniko that referenced this issue Apr 30, 2022
Version 1.8.0 and above breaks reproducible builds.

GoogleContainerTools/kaniko#2005
@hmemcpy

hmemcpy commented Apr 30, 2022

My colleague and I stumbled on this exact issue and can now confirm that downgrading to v1.7.0 of kaniko solved the problem: the produced images had the same digest! So it's indeed a regression.

@fernandrone

FWIW, I'll add another piece of anecdata here. I'm testing Kaniko at my workplace and had the exact same issue building a Golang 1.17 project. I could not figure out why Kaniko's reproducible builds were not working, even though I verified that my Go binary was the same between builds. Downgrading to v1.7.0 of Kaniko fixed the issue. I could try to add some data here if that would be helpful, but my situation was pretty much the one described in the issue description.

@Sineaggi

Sineaggi commented May 5, 2022

Just encountered this today. We use kaniko --reproducible for our base images. Downgrading to kaniko 1.7.0 worked for this project.

aexvir pushed a commit to aevea/action-kaniko that referenced this issue May 13, 2022
Version 1.8.0 and above breaks reproducible builds.

GoogleContainerTools/kaniko#2005
@suicide

suicide commented Aug 9, 2022

I think #1809 causes this issue. I compared the layer tars produced by 1.7.0 and 1.8.1 and noticed that the tar format (PAX header) changed between these versions. 1.8.1 contains ctime and atime, as @NullHypothesis noted; the old version does not.

Maybe --reproducible should switch back to the old format?
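For illustration only (this is not kaniko's or go-containerregistry's actual code), a sketch of what that suggestion could look like when rewriting layer entries: clearing atime/ctime and forcing the pre-PAX USTAR format leaves a single, fixed timestamp per header.

package reproducible

import (
	"archive/tar"
	"time"
)

// canonicalizeHeader is a sketch of the suggestion above: force the old
// USTAR header format, which carries only a modification time.
func canonicalizeHeader(hdr *tar.Header) *tar.Header {
	out := *hdr
	out.ModTime = time.Unix(0, 0) // pin mtime to a fixed epoch
	out.AccessTime = time.Time{}  // USTAR cannot encode atime/ctime, so they
	out.ChangeTime = time.Time{}  // must be cleared before forcing the format
	out.Format = tar.FormatUSTAR  // pre-PAX format, no extended time records
	return &out
}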

@paisleyrob

Narrowing down @suicide's comment: the --reproducible flag broke between v1.7.0 and v1.8.0. The only change I saw in v1.8.0 related to this was the atime/ctime change @NullHypothesis mentioned.

angusjfw added a commit to pi-top/kaniko that referenced this issue Dec 14, 2022
This reverts commit ee95be1.
Due to issues with reproducible builds: GoogleContainerTools#2005
@hrobertson

I can confirm that building 1.9.1 but with #1809 reverted results in reproducible image builds working as expected.

@zx96

zx96 commented Jan 2, 2023

I believe that the most sensible place to resolve this would be in the go-containerregistry project; I've opened an issue there.

zx96 added a commit to zx96/kaniko that referenced this issue Jan 2, 2023
This change adds a new flag to zero timestamps in layer tarballs without
making a fully reproducible image.

My use case for this is maintaining a large image with build tooling.
I have a multi-stage Dockerfile that generates an image containing
several toolchains for cross-compilation, with each toolchain being
prepared in a separate stage before being COPY'd into the final image.
This is a very large image, and while it's incredibly convenient for
development, making a change as simple as adding one new tool tends to
invalidate caches and force the devs to download another 10+ GB image.

If timestamps were removed from each layer, these images would be mostly
unchanged with each minor update, greatly reducing disk space needed for
keeping old versions around and time spent downloading updated images.

I wanted to use Kaniko's --reproducible flag to help with this, but ran
into issues with memory consumption (GoogleContainerTools#862) and build time (GoogleContainerTools#1960).
Additionally, I didn't really care about reproducibility - I mainly
cared about the layers having identical contents so Docker could skip
pulling and storing redundant layers from a registry.

This solution works around these problems by stripping out timestamps as
the layer tarballs are built. It removes the need for a separate
postprocessing step, and preserves image metadata so we can still see
when the image itself was built.

An alternative solution would be to use mutate.Time much like Kaniko
currently uses mutate.Canonical to implement --reproducible, but that
would not be a satisfactory solution for me until
[issue 1168](google/go-containerregistry#1168)
is addressed by go-containerregistry. Given my lack of Go experience, I
don't feel comfortable tackling that myself, and this seems like a
simple and useful workaround in the meantime.

As a bonus, I believe that this change also fixes GoogleContainerTools#2005 (though that
should really be addressed in go-containerregistry itself).
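For context, a rough sketch of the two go-containerregistry entry points the commit message refers to. The tarball path is a placeholder taken from the repro steps above; per the message, kaniko applies mutate.Canonical to the image it builds, not to a tarball on disk.

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/google/go-containerregistry/pkg/v1/mutate"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

func main() {
	// Placeholder: one of the image tars produced in the repro steps.
	img, err := tarball.ImageFromPath("image-file-main-00.tar", nil)
	if err != nil {
		log.Fatal(err)
	}

	// mutate.Canonical strips timestamps and build-specific config fields;
	// per the commit message, this is what --reproducible relies on.
	canonical, err := mutate.Canonical(img)
	if err != nil {
		log.Fatal(err)
	}

	// mutate.Time rewrites only the timestamps, to a fixed value.
	timed, err := mutate.Time(img, time.Unix(0, 0))
	if err != nil {
		log.Fatal(err)
	}

	cd, err := canonical.Digest()
	if err != nil {
		log.Fatal(err)
	}
	td, err := timed.Digest()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("canonical:", cd, "timed:", td)
}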
@jglynn

jglynn commented Jan 28, 2023

Ran into this defect yesterday with 1.9.1 -- any known workarounds for now, short of downgrading to 1.7.0?

@BronzeDeer
Contributor

BronzeDeer commented Jan 31, 2023

Ran into this defect yesterday with 1.9.1 -- any known workarounds for now, short of downgrading to 1.7.0?

I've submitted a pull request to go-containerregistry based on @zx96's initial investigation.

I've confirmed locally that if I build kaniko against that commit, the bug vanishes. (Note that I also had to bump the version of cloud.google.com/go/storage to at least v1.27.0, rather than the automatically resolved v1.21.1, before kaniko would compile again; this probably comes from jumping 5 minor versions in go-containerregistry.)

As soon as the PR is merged in the upstream project, I'll create a PR for kaniko.

@jglynn

jglynn commented Jan 31, 2023

That's fantastic, thanks @BronzeDeer and kudos to @zx96 as well.

Given that change, could we then avoid using --reproducible in kaniko because PAX headers would be corrected in a way that would consistently produce identical shas?

The reason I ask is that I'm trying to optimize our build process for thousands of Spring Boot apps that use a multi-stage Dockerfile to separate out the app layer from the 3rd-party jar layer, etc.

My concern with --reproducible is described very clearly in zx96@bad2f94, and my goals are the same.

...issues with memory consumption (#862) and build time (#1960).
Additionally, I didn't really care about reproducibility - I mainly
cared about the layers having identical contents so Docker could skip
pulling and storing redundant layers from a registry.

@BronzeDeer
Contributor

BronzeDeer commented Jan 31, 2023

That's fantastic, thanks @BronzeDeer and kudos to @zx96 as well.

Given that change, could we then avoid using --reproducible in kaniko because PAX headers would be corrected in a way that would consistently produce identical shas?

The reason I ask is that I'm trying to optimize our build process for thousands of Spring Boot apps that use a multi-stage Dockerfile to separate out the app layer from the 3rd-party jar layer, etc.

My concern with --reproducible is described very clearly in zx96@bad2f94, and my goals are the same.

...issues with memory consumption (#862) and build time (#1960).
Additionally, I didn't really care about reproducibility - I mainly
cared about the layers having identical contents so Docker could skip
pulling and storing redundant layers from a registry.

The bug fix makes --reproducible work as intended: it makes the layers identical by setting the timestamps in the tar headers to a static time. The bug stemmed from the fact that if the PAX or GNU format was used for the underlying tars, the code needed to change not only the modification timestamp but also the access time and change time in the header. PAX tars are not reproducible by default; quite the opposite, they include more time information, which varies between builds and which broke the existing code that naively assumed there was only one timestamp in a tar header.

TL;DR: --reproducible will start producing layers with identical shas again, but you need to use the flag; not using it will result in different shas even for the same content.

P.S.: On caching: Kaniko's own build-time caching seems mostly unaffected by the bug, since it caches based on the content of the Dockerfile rather than raw digests; varying digests only hinder external tools (which, sadly, also includes container runtimes pulling images).
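To make that concrete, a minimal standard-library sketch (not the upstream patch itself) of the difference described above: pinning only the modification time leaves PAX atime/ctime records that vary per build, while handling all three timestamps makes the header deterministic.

package reproducible

import (
	"archive/tar"
	"time"
)

// epoch is the fixed timestamp used for reproducible layers.
var epoch = time.Unix(0, 0)

// naiveCanonical mirrors what the broken code effectively did: only the
// modification time is pinned, so PAX headers still carry varying
// atime/ctime records.
func naiveCanonical(hdr *tar.Header) {
	hdr.ModTime = epoch
}

// fixedCanonical pins all three timestamps so the header is deterministic
// (the actual upstream patch may clear atime/ctime rather than pin them).
func fixedCanonical(hdr *tar.Header) {
	hdr.ModTime = epoch
	hdr.AccessTime = epoch
	hdr.ChangeTime = epoch
}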

@jglynn

jglynn commented Jan 31, 2023

Thanks for the clarity on --reproducible; looking forward to giving it a go after the fix.

I suppose the performance issues with this feature can be explored via #862 and #1960. I see a comment in #862 concerning the possible use of --compressed-caching=false?

On caching -- I've also struggled to get kaniko to generate matching shas for the cached COPY layers. I'm using the remote --cache-repo option, and I see the cache layers are published with just their contents (as you noted), but the shas change and I get misses at build time.

zx96 added a commit to zx96/kaniko that referenced this issue Apr 22, 2023
NullHypothesis pushed a commit to NullHypothesis/tokenizer that referenced this issue Oct 31, 2023
This commit adds ia2 to our general docker build pipeline by adding a
GitHub action.  The commit also adds a Dockerfile that allows for
reproducible, kaniko-based builds.  Simply run "make docker" to create a
reproducible image.

We currently use kaniko in version 1.7.0 because 1.8.0 (the newest
version as of 2022-03-22) has a bug that breaks reproducible builds:
GoogleContainerTools/kaniko#2005