[Feature Request] Support any number of `ARG` being declared before `FROM` and available during build #37622

aitorpazos · 2018-08-10T11:25:51Z

Now that parameter support has been introduced into the FROM instruction through ARG, I think the following would improve consistency and help reduce the number of layers in our images.

Consider the following Dockerfile:

ARG CENTOS_VERSION=7.4.1708
FROM centos:${CENTOS_VERSION}
ARG MY_TAG="my tag value"
LABEL arg_version="${CENTOS_VERSION}" \
      my_tag="${MY_TAG}"

You'll get the following build output:

$ docker build --rm -t arg_issue -f Dockerfile .
Sending build context to Docker daemon  3.072kB
Step 1/4 : ARG CENTOS_VERSION=7.4.1708
Step 2/4 : FROM centos:${CENTOS_VERSION}
7.4.1708: Pulling from library/centos
Digest: sha256:de88676c62e619d07f081be9e518bb201c03fa554882aa7cf7ca72fe3ca7846d
Status: Downloaded newer image for centos:7.4.1708
 ---> d3949e34634c
Step 3/4 : ARG MY_TAG="my tag value"
 ---> Using cache
 ---> 57794f23d33d
Step 4/4 : LABEL arg_version="${CENTOS_VERSION}"       my_tag="${MY_TAG}"
 ---> Using cache
 ---> cdc00d203730
Successfully built cdc00d203730
Successfully tagged arg_issue:latest

and the following image labels:

$ docker inspect arg_issue | jq '.[].Config.Labels'
{
  "arg_version": "",
  "build-date": "20170911",
  "license": "GPLv2",
  "my_tag": "my tag value",
  "name": "CentOS Base Image",
  "vendor": "CentOS"
}

You can see that the same ARG instruction behaves differently depending on where it appears:

CENTOS_VERSION will only be visible to the FROM instruction but not to any following instruction
The CENTOS_VERSION instruction creates no new layer in the resulting image
MY_TAG will be available for any following build container
The MY_TAG instruction creates a new layer, and multiple ARG instructions can't be merged into one the way ENV instructions can (see "Multiline ARG - docker failed to build: ARG requires exactly one argument" #35950)

This behavior is problematic for the following reasons:

I can't record the value of CENTOS_VERSION in my image without duplicating the instruction declaration after FROM, introducing inconsistency risks and complexity
ARG being a build argument, I don't see a strong need to generate a layer for each of them (e.g.: we don't generate new layers for proxy configuration)

What I propose is:

Support any number of ARG instructions to be declared before FROM and make them available to instructions following FROM
ARG instructions declared after FROM can remain supported for backward compatibility purposes but I'd encourage the new behavior.

I think it would bring the following benefits:

I could reduce the number of layers by moving ARG before FROM.
It would establish what I think is a more intuitive difference between ARG and ENV, which I see colleagues getting confused about.
I'm not so sure about this, but I think those ARG values could be available to all stages of a multi-stage build.

Docker versions (Docker Toolbox in Win 8.1):

$ docker-machine ssh default docker -v
Docker version 18.04.0-ce, build 3d479c0

$ docker -v
Docker version 18.06.0-ce, build 0ffa8257ec

thaJeztah · 2018-08-10T14:37:27Z

Thanks for opening this request!

Let me start with "layers". In Docker before version 1.10, every Dockerfile instruction created a new container; the instruction (whether it be RUN, ENV, LABEL, or any other instruction) was applied to that container, and the result was committed to a new (intermediate) image.

Docker created a chain of "parent" images, and when pushing (or pulling) the final image, all intermediate images had to be pushed or pulled as well, even if a Dockerfile instruction made no filesystem changes (for example, when it only changed metadata, as LABEL or ENV do).

Docker 1.10 and up uses a new, content-addressable store. The new store does keep track of intermediate images during build (so that you can docker run each intermediate step for debugging, and also as a caching mechanism), but when pushing or pulling the image to/from a registry, only the final resulting image is used.

The builder is also "smart" enough to decide if a new layer has to be created; if a Dockerfile instruction doesn't make changes to the container's filesystem, then only the metadata is updated, and no new layer is created.

For example, this Dockerfile only has a single layer:

ARG BASE=busybox:latest
FROM $BASE
ENV foobar=baz
ENV hello=world
VOLUME /something
LABEL iam=a-label
ENTRYPOINT ["/bin/sh"]
CMD ["-c", "echo hello"]

$ docker build -t foo .

$ docker image inspect --format='{{json .RootFS}}' foo
{
  "Type": "layers",
  "Layers": [
    "sha256:f9d9e4e6e2f0689cd752390e14ade48b0ec6f2a488a05af5ab2f9ccaf54c299d"
  ]
}

So, while combining (for example) multiple LABEL or ENV instructions in a single line resulted in a slight performance gain in the past (only a single container had to be started, and only a single intermediate image was created instead of multiple), with current versions of Docker, combining those instructions will not improve performance or image size. It could even make caching less efficient: changing one label invalidates the cache for all labels on the same line, instead of only invalidating the cache for that label.
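As a rough illustration of the caching point, here is a minimal sketch. The label names and values are made up for the example and do not come from any real image:

```dockerfile
FROM busybox

# Separate instructions: changing the value of "revision" only invalidates
# the build cache from the second LABEL onward; the first LABEL step can
# still be served from cache.
LABEL maintainer="someone@example.com"
LABEL revision="1"

# Combined alternative (commented out): changing either value invalidates
# the cache for the whole instruction.
# LABEL maintainer="someone@example.com" \
#       revision="1"
```

Either form should result in the same image metadata; the difference is only in how much of the build cache can be reused.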

> Support any number of ARG instructions to be declared before FROM and make them available to instructions following FROM

This is really by design; build stages should be scoped and independent. Globally defined build args (those declared before the first FROM) should not be able to override ARG or ENV variables inside a stage, unless the stage explicitly declares that they should be.

Because of this design, it's possible to define a build-arg globally, but only consume it in some build stages. It also allows the builder (even more so the next builder, powered by BuildKit) to build stages in parallel: the builder creates a dependency graph and, based on that, runs stages in parallel, or even skips stages if they're not needed for the final image.

Yes; this design requires you to define every ARG that is to be consumed by a build stage (which I reckon can result in extra work when writing the Dockerfile), but brings many improvements, both in performance, and in flexibility.
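To make the scoping concrete, here is a small sketch of a multi-stage Dockerfile in which only one stage opts in to a global build-arg. The stage names (builder, docs), the file paths, and the ALPINE_VERSION arg are assumptions made for illustration:

```dockerfile
# Global build-arg: usable in FROM lines, and in stages that opt in.
ARG ALPINE_VERSION=3.8

# The "builder" stage opts in by re-declaring the ARG without a value.
FROM alpine:${ALPINE_VERSION} AS builder
ARG ALPINE_VERSION
RUN echo "built on alpine ${ALPINE_VERSION}" > /build-info

# The "docs" stage never declares ALPINE_VERSION, so the variable is not set
# inside it, and nothing in this stage depends on its value.
FROM alpine:${ALPINE_VERSION} AS docs
RUN echo "docs" > /docs.txt

# The final stage only depends on "builder".
FROM alpine:${ALPINE_VERSION}
COPY --from=builder /build-info /build-info
```

Because the final stage does not reference the docs stage, a builder that works from the dependency graph (such as BuildKit) can skip that stage entirely, while the classic builder still runs every stage in order.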

> MY_TAG will be available for any following build container

(Following the explanation above) doing so would force the build to always run sequentially; no "following stage" could start until all previous stages had executed (because they could potentially modify the value of an ARG). Detecting whether a stage uses a specific ARG is also not possible if those args are automatically available in a stage; any RUN instruction could (possibly hidden in a shell script) depend on the value of a build-arg, so the cache for those steps would have to be invalidated on every change to the build arg's value.

So, how to re-use a globally defined build-arg in multiple stages

A globally defined build-arg can be used in a stage by declaring it inside that stage. If you want to inherit the value that's set globally, omit a value, and the ARG will inherit the value that's set on a global level (or, if a value is set on the command-line, use the value that's specified there).

For example, consider the following Dockerfile:

# default value for `FOO`
ARG FOO=bar

# no default value
ARG BAR

FROM busybox
ARG FOO

# only $FOO is set in this stage
RUN echo $FOO; echo $BAR


FROM busybox
ARG FOO
ARG BAR

# both $FOO and $BAR are set in this stage
RUN echo $FOO; echo $BAR

FROM busybox
ARG FOO="default for this stage"
ARG BAR

# both $FOO and $BAR are set in this stage
RUN echo $FOO; echo $BAR

FROM busybox
ENV FOO="I am an env"
ARG BAR="default for this stage but overridden"

# both $FOO and $BAR are set in this stage
RUN echo $FOO; echo $BAR

$ docker build --no-cache --build-arg BAR="command-line value" -t foo .

Step 1/17 : ARG FOO=bar
Step 2/17 : ARG BAR
Step 3/17 : FROM busybox
 ---> e1ddd7948a1c
Step 4/17 : ARG FOO
 ---> Running in 9dcdf1666365
Removing intermediate container 9dcdf1666365
 ---> e280ff0655db
Step 5/17 : RUN echo $FOO; echo $BAR
 ---> Running in 42d999bb0e24
bar

Removing intermediate container 42d999bb0e24
 ---> c1513a5bad04
Step 6/17 : FROM busybox
 ---> e1ddd7948a1c
Step 7/17 : ARG FOO
 ---> Running in f533eb867ee3
Removing intermediate container f533eb867ee3
 ---> f8108c428e99
Step 8/17 : ARG BAR
 ---> Running in bbe3af4236d9
Removing intermediate container bbe3af4236d9
 ---> d99dd8158aec
Step 9/17 : RUN echo $FOO; echo $BAR
 ---> Running in aa5b02eba9a2
bar
command-line value
Removing intermediate container aa5b02eba9a2
 ---> 1452606ca620
Step 10/17 : FROM busybox
 ---> e1ddd7948a1c
Step 11/17 : ARG FOO="default for this stage"
 ---> Running in 86dcdd11ac14
Removing intermediate container 86dcdd11ac14
 ---> 5010924fc5ce
Step 12/17 : ARG BAR
 ---> Running in e83a57b16402
Removing intermediate container e83a57b16402
 ---> 43fffe5388d6
Step 13/17 : RUN echo $FOO; echo $BAR
 ---> Running in a2eb9624ad0b
default for this stage
command-line value
Removing intermediate container a2eb9624ad0b
 ---> e97abc690b75
Step 14/17 : FROM busybox
 ---> e1ddd7948a1c
Step 15/17 : ENV FOO="I am an env"
 ---> Running in c5a74b625158
Removing intermediate container c5a74b625158
 ---> dde4eb645836
Step 16/17 : ARG BAR="default for this stage but overridden"
 ---> Running in 084f854fc113
Removing intermediate container 084f854fc113
 ---> c22dcc4c81ae
Step 17/17 : RUN echo $FOO; echo $BAR
 ---> Running in 0c656c75ed35
I am an env
command-line value
Removing intermediate container 0c656c75ed35
 ---> e434a711f19e
Successfully built e434a711f19e
Successfully tagged foo:latest

Inspecting the image shows that it still consists of a single layer:

$ docker image inspect --format='{{json .RootFS}}' foo
{
  "Type": "layers",
  "Layers": [
    "sha256:f9d9e4e6e2f0689cd752390e14ade48b0ec6f2a488a05af5ab2f9ccaf54c299d"
  ]
}
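Applied to the Dockerfile from the original report, the same approach would look roughly like the sketch below. The only change is the extra ARG CENTOS_VERSION line after FROM; everything else is taken from the report as-is:

```dockerfile
ARG CENTOS_VERSION=7.4.1708
FROM centos:${CENTOS_VERSION}

# Re-declare the global ARG without a value so that this stage inherits the
# global default (or the value passed with --build-arg on the command line).
ARG CENTOS_VERSION
ARG MY_TAG="my tag value"

LABEL arg_version="${CENTOS_VERSION}" \
      my_tag="${MY_TAG}"
```

With the ARG re-declared inside the stage, the arg_version label should carry the CentOS version instead of an empty string, and the default value is still defined in only one place.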

aitorpazos · 2018-08-10T15:42:22Z

Thanks so much for your reply and it's great to find out there is no need to change anything :)
Cheers!

GordonTheTurtle added the area/builder label Aug 10, 2018

aitorpazos changed the title ~~[Feature Request] Support any number of ARG being declared before FROM~~ [Feature Request] Support any number of ARG being declared before FROM and available during build Aug 10, 2018

thaJeztah added the kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. label Aug 10, 2018

aitorpazos closed this as completed Aug 10, 2018

thaJeztah mentioned this issue Dec 9, 2018

ARG before FROM in Dockerfile doesn't behave as expected #34129

Closed

h-vetinari mentioned this issue Apr 10, 2019

BUG: global ARG not shown as build step containers/buildah#1503

Closed

keineahnung2345 mentioned this issue Jun 12, 2019

update dockerfile according to the new INSTALL.md facebookresearch/maskrcnn-benchmark#883

Merged

buehner mentioned this issue Nov 9, 2023

fix: Use consistent version number in Dockerfile geoserver/docker#33

Merged
