Skip downloading base image layers if they exist in target repository #1840

Merged
chanseokoh merged 69 commits into master from i1673-skip-download-base-layers
Jul 31, 2019


@chanseokoh chanseokoh commented Jul 11, 2019

Fixes #1673.

This will give a substantial speedup in CI/CD environments, where people usually don't persist the Jib base image cache.

Design

I tried to do intelligent, fine-grained BLOb checking on each layer. For example, if only a subset of the layers are missing in the target repo, it downloads and caches only those layers.

I also made it avoid checking the layer twice in the situation where it should push missing layers, because currently PushBlobStep always does a Blob check before pushing.
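To illustrate the two points above, here is a minimal sketch of the "check once, then decide" idea. It is not the code in this PR: LayerCheckSketch, Action, planLayers, and the existsInTarget predicate are all hypothetical names, and the predicate stands in for whatever registry call performs the BLOb existence check.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names are Jib's actual classes. */
final class LayerCheckSketch {

  /** What to do with a layer after the single existence check. */
  enum Action {
    SKIP,
    DOWNLOAD_AND_PUSH
  }

  /**
   * Checks each distinct layer digest against the target repository once and records the
   * decision, so a later push step can reuse it instead of checking again.
   *
   * @param digests layer digests listed in the base image manifest
   * @param existsInTarget assumed stand-in for the registry BLOb existence check
   */
  static Map<String, Action> planLayers(List<String> digests, Predicate<String> existsInTarget) {
    Map<String, Action> plan = new LinkedHashMap<>();
    for (String digest : digests) {
      // computeIfAbsent guarantees at most one remote check per digest.
      plan.computeIfAbsent(
          digest, d -> existsInTarget.test(d) ? Action.SKIP : Action.DOWNLOAD_AND_PUSH);
    }
    return plan;
  }
}
```

With a plan like this, only the DOWNLOAD_AND_PUSH entries need to touch the base image registry, and the push step does not have to repeat the check.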

I spent a lot of time tinkering with several different designs and implementations in order to maintain generality and loose coupling between build steps and components, and I think I can settle on this design. Other advantages of this design are

  • generality: any layer can be checked and skipped using the same framework with no change; whether or not the layer is from a base image doesn't matter (hence future-proof)
  • parallelism: because of this generality, checking/downloading/pushing layers maintains the same level of parallelization as before. For example, Jib can be downloading one missing base image layer while pushing another missing base image layer and building/pushing application layers, all at the same time. In theory, the same could be applied to application layers in the future. (I say "in theory" because you have to cache application layers once anyway.)
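The per-layer independence described in the list above could look roughly like the following. This is only a sketch using plain CompletableFuture, not Jib's actual step framework; ParallelLayerSketch, processLayers, and the existsInTarget/download/push callbacks are hypothetical placeholders for the real registry operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch; Jib's real build steps are structured differently. */
final class ParallelLayerSketch {

  /**
   * Starts an independent "check, then download and push if missing" pipeline for every layer.
   * Pipelines do not wait on each other, so one layer can be downloading while another is
   * being pushed.
   *
   * @return the digests that were actually downloaded and pushed, in manifest order
   */
  static List<String> processLayers(
      List<String> digests,
      Predicate<String> existsInTarget,
      Consumer<String> download,
      Consumer<String> push,
      ExecutorService executor) {
    List<CompletableFuture<Boolean>> futures = new ArrayList<>();
    for (String digest : digests) {
      futures.add(
          CompletableFuture.supplyAsync(() -> existsInTarget.test(digest), executor)
              .thenApplyAsync(
                  exists -> {
                    if (exists) {
                      return false; // already in the target repo: nothing to do
                    }
                    download.accept(digest);
                    push.accept(digest);
                    return true;
                  },
                  executor));
    }

    // Wait for all pipelines and collect the layers that had to be transferred.
    List<String> pushed = new ArrayList<>();
    for (int i = 0; i < digests.size(); i++) {
      if (futures.get(i).join()) {
        pushed.add(digests.get(i));
      }
    }
    return pushed;
  }
}
```

The caller owns the executor and is expected to shut it down; nothing here is specific to base image layers, which is the generality point made above.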

Speedup

For a large image like adoptopenjdk/openjdk9-openj9,

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  19.681 s
[INFO] Finished at: 2019-07-11T17:53:30-04:00
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  3.420 s
[INFO] Finished at: 2019-07-11T17:53:36-04:00
[INFO] ------------------------------------------------------------------------

(However, I admit adoptopenjdk/openjdk9-openj9 is 466MB and way larger than openjdk:8-slim or Distroless.)

Interaction with --offline

As pointed out before, not caching the base image layers may cause a problem for --offline when layers are missing. However, in reality, this would be extremely rare, because --offline can be used only with jib:dockerBuild and jib:buildTar; for those goals, Jib always downloads base image layers. And people do understand they have to do an online build at least once before they can do --offline.

But in an extremely rare case where the user

  1. never did jib:dockerBuild or jib:buildTar on their local dev machine
  2. and the base image exists in the target repo
  3. but did jib:build which skipped downloading layers (i.e., manifest JSON is cached but layers are not)
  4. now does --offline jib:dockerBuild without ever trying online jib:dockerBuild

then this case is still covered, as Jib will show the following error message.

[ERROR] Failed to execute goal com.google.cloud.tools:jib-maven-plugin:1.3.1-SNAPSHOT:dockerBuild (default-cli) on project helloworld: Cannot run Jib in offline mode; local Jib cache for base image is missing image layer sha256:.... Rerun Jib in online mode with "-Djib.cacheBaseImage=always" to re-download the base image layers. -> [Help 1]

(-Djib.cacheBaseImage=always is actually unnecessary because jib:dockerBuild and jib:buildTar always download layers and --offline is effective only for those goals, but I added it just for the unlikely case someone tries jib:build instead of jib:dockerBuild to do one-time online caching in this situation.)
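For illustration, the offline guard described above amounts to something like the following check. This is a sketch under assumptions, not the PR's code: OfflineCacheCheckSketch, MissingLayerException, and verifyLayersCached are hypothetical names, and the local cache is modeled as a plain set of digests. Only the message text is taken from the error shown above.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the offline-mode guard; not Jib's actual classes. */
final class OfflineCacheCheckSketch {

  /** Thrown when an offline build cannot find a base image layer in the local cache. */
  static final class MissingLayerException extends Exception {
    MissingLayerException(String message) {
      super(message);
    }
  }

  /**
   * Verifies that every layer listed in the cached manifest is also present in the local
   * layer cache. The manifest JSON may be cached without its layers if an earlier jib:build
   * skipped downloading them.
   */
  static void verifyLayersCached(List<String> manifestLayerDigests, Set<String> cachedDigests)
      throws MissingLayerException {
    for (String digest : manifestLayerDigests) {
      if (!cachedDigests.contains(digest)) {
        throw new MissingLayerException(
            "Cannot run Jib in offline mode; local Jib cache for base image is missing image "
                + "layer "
                + digest
                + ". Rerun Jib in online mode with \"-Djib.cacheBaseImage=always\" to "
                + "re-download the base image layers.");
      }
    }
  }
}
```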


chanseokoh commented Jul 23, 2019

I have addressed all the comments, AFAICT. Good for another review pass.

I decided to update CHANGELOG in a follow-up PR.

.newTargetImageRegistryClientFactory()
.setAuthorization(pushAuthorization)
.newRegistryClient();
// TODO: also check if cross-repo blob mount is possible.

I wonder if this is as easy as adding buildConfiguration.getBaseImageRegistry().equals(buildConfiguration.getTargetImageRegistry()) in the existing-checker below?


I thought it involves more than that, like checking

JibSystemProperties.useCrossRepositoryBlobMounts()
    && canAttemptBlobMount(authorization, sourceRepository))

to ensure that we will attempt the blob mount. And even then, it only means that we can attempt the blob mount; a server may not allow or support the mount and may return 202 Accepted instead of 201 Created, in which case RegistryClient.pushBlob() will fall back to the usual push. So I think this is actually complicated.

But I think we are already pretty good without blob mount. Most of the time, the base image will be in the target repository. The blob mount can be a further optimization, but I think it does not give a huge value.
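To make the 201-versus-202 point concrete, here is a small sketch of why "can attempt" is not the same as "will succeed". The names BlobMountSketch, MountOutcome, canAttemptMount, and interpretMountResponse are hypothetical, and canAttemptMount is a deliberately simplified stand-in (same-registry check plus a feature flag), not the real canAttemptBlobMount logic.

```java
/** Hypothetical sketch; not RegistryClient's actual implementation. */
final class BlobMountSketch {

  /** Result of a cross-repository mount request. */
  enum MountOutcome {
    MOUNTED,
    FALL_BACK_TO_UPLOAD
  }

  /**
   * Simplified precondition: mounts are enabled and both images live on the same registry.
   * The real check also involves authorization, which is omitted here.
   */
  static boolean canAttemptMount(
      boolean mountsEnabled, String baseRegistry, String targetRegistry) {
    return mountsEnabled && baseRegistry.equals(targetRegistry);
  }

  /**
   * Interprets the registry's response to a mount request: 201 Created means the blob was
   * mounted, 202 Accepted means the server opened a normal upload instead.
   */
  static MountOutcome interpretMountResponse(int statusCode) {
    switch (statusCode) {
      case 201:
        return MountOutcome.MOUNTED;
      case 202:
        return MountOutcome.FALL_BACK_TO_UPLOAD;
      default:
        throw new IllegalArgumentException(
            "Unexpected status code for blob mount: " + statusCode);
    }
  }
}
```

Because the outcome is only known after the request, a pre-check that wants to skip the download would have to predict the server's answer, which is the complication described above.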

@chanseokoh chanseokoh merged commit 795a442 into master Jul 31, 2019
@chanseokoh chanseokoh deleted the i1673-skip-download-base-layers branch July 31, 2019 20:47
chanseokoh added a commit that referenced this pull request Jul 31, 2019

raizoor commented Aug 5, 2019

Hi team,

Is there a timeline for releasing this? It sounds like a good fit for my CI/CD setup.

Thanks !!!

Successfully merging this pull request may close these issues.

Skip downloading and caching base image layers (BLOb) if target registry already has them
5 participants