Base image development encounters load metadata errors (darwin-arm64) #4932

Closed
colelawrence opened this issue Sep 4, 2021 · 14 comments · Fixed by #4934
Labels
bug Something isn't working

Comments

@colelawrence

Expected Behavior

Should be able to build base images from any image.

Using k3d, macOS M1 aarch64, running tilt up in the following repo:
colelawrence/nodejs-express-k8s@33b9e64

Current Behavior

STEP 1/5 — Building Dockerfile: [tilt.dev/nodejs-express-base]
Building Dockerfile:
  FROM node:14-alpine
  
  WORKDIR '/var/www/app'
  ADD package.json package.json
  RUN npm install
  ENTRYPOINT node server.js


     Tarring context…
     Building image
     copy /context / [done: 45ms]
     [1/4] FROM docker.io/library/node:14-alpine@sha256:8c94a0291133e16b92be5c667e0bc35930940dfa7be544fb142e25f8e4510a45
     [2/4] WORKDIR /var/www/app [cached]
     [3/4] ADD package.json package.json [cached]
     [4/4] RUN npm install [cached]
     exporting to image

STEP 2/5 — Pushing localhost:52780/tilt.dev_nodejs-express-base:tilt-858fb587e21f53c7
     Skipping push: base image does not need deploy

STEP 3/5 — Building Dockerfile: [tilt.dev/nodejs-express-app]
Building Dockerfile:
  FROM localhost:52780/tilt.dev_nodejs-express-base:tilt-858fb587e21f53c7
  
  WORKDIR '/var/www/app'
  
  ADD . .


     Tarring context…
     Building image
     copy /context / [done: 43ms]
     
     ERROR IN: [internal] load metadata for localhost:52780/tilt.dev_nodejs-express-base:tilt-858fb587e21f53c7

Build Failed: ImageBuild: failed to create LLB definition: localhost:52780/tilt.dev_nodejs-express-base:tilt-858fb587e21f53c7: not found

Steps to Reproduce

colelawrence/nodejs-express-k8s@33b9e64

  1. Create a k3d cluster with a registry (k3d cluster create --config $configfile), using the latest Docker Desktop for macOS on an M1 machine.
  2. git clone https://github.com/colelawrence/nodejs-express-k8s
  3. tilt up

Context

tilt doctor Output

$ tilt doctor
Tilt: v0.22.7, built 2021-09-03
System: darwin-arm64
---
Docker
- Host: [default]
- Server Version: 20.10.8
- API Version: 1.41
- Builder: 2
---
Kubernetes
- Env: k3d
- Context: k3d-tilt
- Cluster Name: k3d-tilt
- Namespace: default
- Container Runtime: containerd
- Version: v1.21.3+k3s1
- Cluster Local Registry: {Host:localhost:52780 hostFromCluster:k3d-tilt-registry:5000 SingleName:}
---
Thanks for seeing the Tilt Doctor!
Please send the info above when filing bug reports. 💗

The info below helps us understand how you're using Tilt so we can improve,
but is not required to ask for help.
---
Analytics Settings
--> (These results reflect your personal opt in/out status and may be overridden by an `analytics_settings` call in your Tiltfile)
- User Mode: opt-out
- Machine: eebd79741ae00518360169d555d4ce7e
- Repo: gCGWHHiUiupQMoOekqknDQ==

About Your Use Case

I cannot successfully use base images with tilt up.

In my real code, I actually hit this ERROR IN: [internal] load metadata for localhost:52780/tilt.dev_nodejs-express-base:tilt-858fb587e21f53c7 when basing from rust:buster, so I suspect it's some kind of syntax property or BuildKit issue, maybe related to aarch64 (since that's usually the issue I end up hitting).

I've been debugging this for about 3 hours trying to figure out what the difference was between the example and my code. I really don't know how to pry into the ERROR IN: [internal] load metadata for error, but I am happy to help figure it out.

@colelawrence colelawrence added the bug Something isn't working label Sep 4, 2021
@colelawrence colelawrence changed the title Base image development breaks between node:13-alpine to node:14-alpine Base image development breaks between node:13-alpine to node:14-alpine (k3d registry) Sep 4, 2021
@colelawrence colelawrence changed the title Base image development breaks between node:13-alpine to node:14-alpine (k3d registry) Base image development breaks between node:14.17.5-alpine to node:14.17.6-alpine (k3d registry) Sep 4, 2021
@colelawrence
Author

colelawrence commented Sep 4, 2021

I suspect this is related to when the Docker image was published to Docker Hub. Perhaps there is some kind of issue with the k3d registry?

  • node:14.17.5-alpine3.13 from 23 days ago works

while

  • node:14.17.5-alpine3.14 from 8 days ago DOES NOT work

  • node:14.17.6-alpine3.14 from 4 days ago DOES NOT work

@colelawrence
Author

I've been trying an assortment of additional little things, and I'm not so sure anymore what the issue is... I wish I knew how to troubleshoot this more deeply than just getting ERROR IN: [internal] load metadata for ....

@colelawrence
Author

It appears that multi-stage builds add a whole other layer of complexity, with this error at its center.

@colelawrence colelawrence changed the title Base image development breaks between node:14.17.5-alpine to node:14.17.6-alpine (k3d registry) Base image development encounters load metadata errors (k3d registry) Sep 4, 2021
@colelawrence
Copy link
Author

colelawrence commented Sep 4, 2021

I've opened another branch on my repro repo. It demonstrates a slightly more complex setup with multi-stage builds, which fail with similar errors.

I'm not sure whether the two problems are part of the same issue, but they prevent me from using Tilt with both multi-stage builds and base images.

For the record, standalone multi-stage images work fine, even when pointing at latest tags.

tilt-dev/tilt-example-base-image@70d2bc4

@nicks
Member

nicks commented Sep 4, 2021

I tried your example (colelawrence/nodejs-express-k8s@33b9e64) on Linux/Kind and Linux/k3d, and it built OK for me. We can poke around on it more after the weekend.

Given some of the symptoms you're describing, in particular:

  • node:14.17.5-alpine3.13 works but node:14.17.5-alpine3.14 does not work
  • weird errors about the image not being found
  • the fact that you're on an M1 Mac

my educated guess is you've run into some sort of multi-arch image bug (i.e., you're dealing with an image for multiple chipsets, and some part of the system is handling it incorrectly). But it's hard to say from here which components are misbehaving (whether it's tilt, or the k3d registry, or the docker build, or some interop between any two of them) 😢
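To make the multi-arch hypothesis concrete: a multi-arch image is published as an index (manifest list) whose entries each carry a platform (os, architecture, and an optional variant), and the client picks the entry matching the platform it wants. The sketch below uses made-up digests and a deliberately naive exact-match rule. That rule is an assumption for illustration, not how Docker, BuildKit, or k3d actually match platforms; it only shows how an arm64 request can come back empty when the index labels its entry arm64/v8.

```python
from typing import Optional

# Hypothetical image index: the digests are made up for illustration.
INDEX = [
    {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
    {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}},
]


def platform_string(p: dict) -> str:
    """Render a platform dict as os/arch[/variant], e.g. 'linux/arm64/v8'."""
    parts = [p["os"], p["architecture"]]
    if p.get("variant"):
        parts.append(p["variant"])
    return "/".join(parts)


def select_naive(index: list, wanted: str) -> Optional[str]:
    """Return the digest whose platform string equals `wanted` exactly.

    The strict string comparison is the deliberate flaw: 'linux/arm64'
    never equals 'linux/arm64/v8', so the arm64 entry is missed.
    """
    for entry in index:
        if platform_string(entry["platform"]) == wanted:
            return entry["digest"]
    return None


print(select_naive(INDEX, "linux/arm64/v8"))  # sha256:bbb
print(select_naive(INDEX, "linux/arm64"))     # None
```

Real resolvers normalize platforms before comparing, so a gap like this would only appear if some component skipped or disagreed on that step.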

@colelawrence
Author

@nicks, thanks so much! I'll be happy to help diagnose this during the week. For now, I might just have to switch operating systems, or I might see if I can force a more portable build with QEMU.

@colelawrence colelawrence changed the title Base image development encounters load metadata errors (k3d registry) Base image development encounters load metadata errors (darwin-arm64) Sep 5, 2021
@colelawrence
Copy link
Author

colelawrence commented Sep 5, 2021

I've also tested with a kind cluster created by ctlptl, without luck.

I'm experiencing largely the same issues. It appears this happens even without Tilt or Kubernetes, so it may need to be reported elsewhere.

I've created a few minimal test cases at https://github.com/colelawrence/nodejs-express-k8s/tree/7923b335a4dcc3689fcd800e47122fba56c04d82

See the outputs between the following:

In this repo, I have a set of x-*.sh test scripts. Each one starts with docker system prune -a -f and then runs a few docker build commands.
At the head of every log output, I print the contents of the script, followed by the script's combined stderr and stdout.
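A harness in the spirit of these x-*.sh tests could look like the following Python sketch. The function names and the use of `sh` as the interpreter are assumptions, not taken from the repo; the scripts themselves are what call docker system prune and docker build.

```python
import subprocess
from pathlib import Path
from typing import Iterable


def run_and_log(script: Path) -> str:
    """Run one test script with `sh` and return its source followed by its output.

    stderr is redirected into stdout so both streams end up in a single log.
    """
    source = script.read_text()
    result = subprocess.run(
        ["sh", str(script)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return (
        f"$ cat {script}\n{source}\n"
        f"--- output (exit code {result.returncode}) ---\n{result.stdout}"
    )


def log_all(scripts: Iterable[Path]) -> str:
    """Concatenate the logs of several scripts, e.g. every x-*.sh in a directory."""
    return "\n".join(run_and_log(path) for path in scripts)
```

For example, `print(log_all(sorted(Path(".").glob("x-*.sh"))))` would run every script in the current directory in name order. Because each script starts with docker system prune -a -f, run it only on a machine where deleting all unused images is acceptable.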

The gist so far is that the commands work when I export DOCKER_DEFAULT_PLATFORM=linux/arm64.

Setting that variable may affect several things, but one thing I found very interesting was this git diff between the failing and passing output of docker image inspect tilt.dev/demo-base | jq:

[Screenshot: git diff of the failing vs. passing docker image inspect output]

The passing version does NOT have the "Variant": "v8". I'm not sure if this is related, but it seems interesting.
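One plausible reading of the "Variant": "v8" difference is that linux/arm64 and linux/arm64/v8 name the same platform, and components may disagree on whether to treat them as equal. The sketch below normalizes platform strings so that the two compare equal. The rules are loosely modeled on containerd's platform normalization (aarch64 as an alias for arm64, v8 as the default arm64 variant); treat them as an illustrative assumption, not a copy of any tool's actual code.

```python
def normalize_platform(spec: str) -> tuple:
    """Normalize 'os/arch[/variant]' into an (os, arch, variant) tuple.

    Illustrative rules (an assumption, loosely modeled on containerd):
      * 'aarch64' is an alias for 'arm64'
      * 'x86_64' is an alias for 'amd64'
      * for arm64, a missing variant, '8', and 'v8' all mean the same thing,
        so they all normalize to an empty variant
    """
    parts = spec.lower().split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected os/arch[/variant], got {spec!r}")
    os_name, arch = parts[0], parts[1]
    variant = parts[2] if len(parts) == 3 else ""

    arch = {"aarch64": "arm64", "x86_64": "amd64"}.get(arch, arch)
    if arch == "arm64" and variant in ("8", "v8"):
        variant = ""
    return (os_name, arch, variant)


def same_platform(a: str, b: str) -> bool:
    """True if two platform strings describe the same platform after normalizing."""
    return normalize_platform(a) == normalize_platform(b)


print(same_platform("linux/arm64", "linux/arm64/v8"))  # True
print(same_platform("linux/arm64", "linux/amd64"))     # False
```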

@colelawrence
Author

colelawrence commented Sep 5, 2021

Possibly related "docker pull --platform=linux/arm64 silently falls back to linux/amd64 image" docker/for-mac#5625
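Because that issue describes a silent fallback to the wrong architecture, it can help to check what actually landed in the local image store. docker image inspect prints a JSON array whose objects include Os, Architecture, and (when set) Variant, the same output that showed "Variant": "v8" earlier. A minimal sketch, with a hypothetical function name and a trimmed, made-up sample instead of a real docker call:

```python
import json
from typing import Optional


def check_image_platform(inspect_json: str, expected_arch: str) -> Optional[str]:
    """Return a warning if an inspected image is not built for `expected_arch`.

    `inspect_json` is the text printed by `docker image inspect <ref>`,
    which is a JSON array with one object per image.
    """
    for image in json.loads(inspect_json):
        arch = image.get("Architecture", "")
        if arch != expected_arch:
            variant = image.get("Variant")
            actual = f"{image.get('Os', '?')}/{arch}" + (f"/{variant}" if variant else "")
            return f"{image.get('Id', '<unknown id>')}: built for {actual}, not {expected_arch}"
    return None


# Trimmed, made-up sample output; real inspect results contain many more fields.
SAMPLE = '[{"Id": "sha256:123", "Os": "linux", "Architecture": "amd64"}]'
print(check_image_platform(SAMPLE, "arm64"))
# sha256:123: built for linux/amd64, not arm64
```

In practice you would feed it the captured output of docker image inspect for the image you just pulled or built.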

@colelawrence
Author

This looks very related: docker/for-mac#5873 (comment)

@colelawrence
Author

colelawrence commented Sep 5, 2021

One thing that could immediately unblock me is a way to pass Docker settings to docker_build: either a --platform linux/arm64 option, or having docker_build respect DOCKER_DEFAULT_PLATFORM=linux/arm64 from the environment.

Otherwise, I can use custom_build and set --platform manually, but I think I'd end up missing out on the live_update tools.

@nicks
Member

nicks commented Sep 5, 2021

Does setting DOCKER_BUILDKIT=0 fix the problem? Tilt should respect that.

@colelawrence
Author

colelawrence commented Sep 5, 2021

Setting DOCKER_BUILDKIT=0 fixes the minimal repro (see logs), but I rely on it for the --mount features in one of my images.

I can look into re-writing images that were requiring --mount.

I just deleted a follow-up comment about having to prune, because I was using the wrong env var when I hit that issue.

@milas
Contributor

milas commented Sep 7, 2021

Thanks for all the debugging @colelawrence!

I just opened #4934 which will allow you to either explicitly pass platform as an arg to docker_build or respect DOCKER_DEFAULT_PLATFORM if not explicitly set (i.e. matching the Docker CLI behavior).

You could handle this automatically by putting something like the following at the beginning of your Tiltfile after upgrading to the next release once it's out:

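# Only pick a default if the user hasn't already chosen a platform.
if 'DOCKER_DEFAULT_PLATFORM' not in os.environ:
    # `uname -m` on macOS/Linux; %PROCESSOR_ARCHITECTURE% on Windows.
    arch = str(local('uname -m', command_bat='echo %PROCESSOR_ARCHITECTURE%', quiet=True, echo_off=True)).rstrip().lower()
    # Apple silicon Macs report "arm64".
    if arch == 'arm64':
        print('Setting default architecture for Docker builds to "linux/arm64"')
        # docker_build falls back to this env var when no platform is given.
        os.putenv('DOCKER_DEFAULT_PLATFORM', 'linux/arm64')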
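The Tiltfile snippet above matches only arm64, which is what uname -m reports on an Apple silicon Mac; Linux on 64-bit ARM usually reports aarch64 instead. Below is a plain-Python sketch of the same decision that also accepts aarch64. It uses platform.machine() in place of Tilt's local(), and the function name is hypothetical.

```python
import os
import platform
from typing import Mapping, Optional

# Machine names (as reported by `uname -m` / platform.machine(), lowercased)
# that should default Docker builds to linux/arm64. Other machines are left alone.
_ARM64_MACHINES = {"arm64", "aarch64"}


def default_docker_platform(environ: Mapping[str, str], machine: str) -> Optional[str]:
    """Pick a value for DOCKER_DEFAULT_PLATFORM, or None to leave it alone.

    An existing DOCKER_DEFAULT_PLATFORM always wins, mirroring the Tiltfile
    snippet, which only sets the variable when it is missing.
    """
    if "DOCKER_DEFAULT_PLATFORM" in environ:
        return None
    if machine.strip().lower() in _ARM64_MACHINES:
        return "linux/arm64"
    return None


# Example: apply it to this process's environment (child processes inherit it).
suggested = default_docker_platform(os.environ, platform.machine())
if suggested is not None:
    os.environ["DOCKER_DEFAULT_PLATFORM"] = suggested
    print(f'Setting default architecture for Docker builds to "{suggested}"')
```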

milas added a commit that referenced this issue Sep 7, 2021
Docker CLI has a `--platform` argument and `DOCKER_DEFAULT_PLATFORM`
environment variable for default.

This is useful when dealing with multi-architecture images in
different scenarios, e.g. using an M1-based macOS machine.

This adds a `platform` argument to `docker_build` which will get
passed through to Docker/BuildKit. Similar to the Docker CLI, if
not specified, `DOCKER_DEFAULT_PLATFORM` will be used as a default
if set.

Fixes #4932.
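The precedence this commit message describes (an explicit platform argument first, then DOCKER_DEFAULT_PLATFORM, otherwise no platform at all) can be sketched as follows. This is not Tilt's code; the function names are hypothetical, treating an empty variable as unset is an assumption, and the argv only shows what an equivalent docker build invocation with --platform would look like.

```python
from typing import List, Mapping, Optional


def resolve_platform(explicit: Optional[str], environ: Mapping[str, str]) -> Optional[str]:
    """Explicit platform wins; else DOCKER_DEFAULT_PLATFORM; else None.

    An empty DOCKER_DEFAULT_PLATFORM is treated as unset (an assumption).
    """
    if explicit:
        return explicit
    return environ.get("DOCKER_DEFAULT_PLATFORM") or None


def docker_build_argv(tag: str, context: str, explicit: Optional[str],
                      environ: Mapping[str, str]) -> List[str]:
    """Build a `docker build` command line, adding --platform only when one resolves."""
    argv = ["docker", "build", "-t", tag]
    resolved = resolve_platform(explicit, environ)
    if resolved:
        argv += ["--platform", resolved]
    argv.append(context)
    return argv


print(docker_build_argv("tilt.dev/demo-base", ".", None,
                        {"DOCKER_DEFAULT_PLATFORM": "linux/arm64"}))
# ['docker', 'build', '-t', 'tilt.dev/demo-base', '--platform', 'linux/arm64', '.']
```

Leaving --platform off entirely when nothing resolves keeps the default behavior unchanged for users who never set either option.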
@milas
Contributor

milas commented Sep 13, 2021

v0.22.8+ is out, which supports a platform argument for docker_build. If you run into any problems with it, feel free to comment here or open a new issue.

See https://docs.tilt.dev/api.html#api.docker_build for usage.
