Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerfile heredocs #2132

Merged
merged 5 commits into from
Jun 10, 2021
Merged

Dockerfile heredocs #2132

merged 5 commits into from
Jun 10, 2021

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented May 25, 2021

relates to moby/moby#34423

As mentioned in #2121, I've been making progress towards implementing heredocs in Dockerfiles, and thought it might be time to open a PR for it 🎉

I've essentially got all the functionality I think we'd need before wanting to merge, though I'm sure there's some fixes/tests to write before that.

Things that definitely need resolving before a merge is really possible:

  • Gate the feature behind a build tag, as suggested by @tonistiigi
  • Warn/error/do something if a heredoc is used in a place it's not expected (e.g. an ENV command)
  • Handle RUN heredocs in Windows more elegantly (doesn't look particularly doable with cmd so the current hacky approach, might be the best?)
  • Tests! Currently only the parsing stages are tested, so we some some more complex integration tests.

I'd really appreciate any feedback anyone has on the current design and implementation!

@jedevc jedevc force-pushed the dockerfile-heredocs branch from 2f41375 to fa64578 Compare May 25, 2021 16:27
@tonistiigi tonistiigi requested a review from thaJeztah May 25, 2021 22:35
@tonistiigi
Copy link
Member

Warn/error/do something if a heredoc is used in a place it's not expected (e.g. an ENV command)

Yes, this needs to be improved if I understand the current code correctly. Heredoc should be only allowed in certain commands and in certain positions. Heredoc in a random command or in a random place needs to cause an error. Heredoc prefix inside the string of run command (or any other command) should behave as it did before.

@jedevc jedevc force-pushed the dockerfile-heredocs branch from aec3d64 to eb95843 Compare May 26, 2021 08:20
@jedevc
Copy link
Member Author

jedevc commented May 26, 2021

Sorry for all the commit spam, I'll try and squash into a better series of commits once I've got some code review 😄

I've gone and added some more complete tests (and caught some weird behavior in the process), and specifically allowed only the COPY, ADD and RUN commands for heredoc parsing. Heredocs are parsed for those commands, and completely ignored for other commands, which are parsed exactly as before (I've added a test for this as well). Annoyingly, this is done at the parser level, which feels a bit hacky, but I don't think it's feasible to do it later on in the process, at least with the current parsing architecture.

Still need to work on this bit though and handle the case where a heredoc appears as part of an existing RUN command; this is a bit tricky, as is looks very similar to the ADD/COPY case, so somehow, the parser has to differentiate between them, probably in a similar manner to how I've done the allow-list for commands. Related to that do you think about the (admittedly, rather strange) case if a Dockerfile has done something like COPY <<EOF /test to copy a file called <<EOF (which is the current behavior)? Or is just RUN that's the main concern, since it's most likely that a heredoc would appear there?

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! I left some comments / thoughts in line (haven't looked really closely at everything, so mostly "thoughts")

Perhaps not for the first iteration, but expanding this to also support ENV, ARG, LABEL would be interesting (I know some users have asked for multi-line values in ENV (e.g.), which would be easier with here docs).

Also, reading https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04, which was linked from moby/moby#34423 (comment), I noticed that multiple here docs can be combined (tbh, didn't know that worked);

cat <<eof1; cat <<eof2
Hi,
eof1
Helene.
eof2

I'm not exactly sure if we want to add support for this, but if we want to keep the option open, we could look at how that would fit in the Dockerfile syntax.

One possibly useful case for this would be to set (e.g.) multiple environment variables using heredoc;

ENV ONE=<<EOF1; TWO=<<EOF2
content of
env ONE
EOF1
content of
env TWO
EOF2

Definitely one we don't need for the first iteration / implementation, as long as we keep our options open.


Finally, moby/moby#34423 (comment) contains two suggestions (not sure if we want those, but leaving them here);

Using | sh to pipe the output (not sure if it adds value);

# 2, is this to change the shell that runs the script?
RUN <<EOF | sh
echo aa
EOF

Or alternatively, supporting shebang;

# Is there a need for this special sytax when you have shebang and the `SHELL` command
# the parser would have to check the first line for it so that it can exec the right thing
RUN <<EOF
#!/bin/sh
echo aa
EOF

(I'm not sure the above examples add much value, so leaning towards "don't implement those")

@thaJeztah
Copy link
Member

Whoops didn't see your last comment when posting

Or is just RUN that's the main concern, since it's most likely that a heredoc would appear there?

I (personally) think that it's ok to iterate over this feature (we can start small, and extend after that). That said; it's good to at least think of what this feature would look like for other commands to make sure we have not painted ourself in a corner.

I left some comments along those lines. It's also good to read the discussion on moby/moby#34423 - I gave it a quick "once over" for use-cases and examples, but perhaps I missed some.

RUN definitely would be my "1" priority for this feature, but the feature can definitely be useful for other commands (COPY, ENV, ARG, LABEL as mentioned).

@jedevc
Copy link
Member Author

jedevc commented May 26, 2021

Woo 🎉 thanks so much for your initial thoughts, they're really helpful! I'm expecting that given the size of the feature (and the number of potential use cases we want to solve for), there might be a few rounds of review 😅

Have replied to most of your inline comments, most of them are pretty quick fixes.

So in my imagination the the development of heredocs would look something like this:

  • Development and modification of the low-level code parser to support attaching heredocs to any explicitly allow-listed command, as well as some utilities for heredocs
  • Addition of heredoc primitives to higher level command extraction, giving heredocs actual meaning within their appropriate command contexts, specifically, would really like to include ADD/COPY and RUN in a first iteration. Doing both requires that the generic heredoc code stays as broad as possible, so that we're not forcing the direction of heredocs into a corner.
  • Finally, probably in another PR(s), heredocs in other commands like ARG, ENV, and LABEL. No reason these couldn't be done in this PR, but once the primitives are implemented, these ones should be fairly easy to do, and it should help keep this PR from getting too big!

So I really really like the idea of supporting multiple heredocs in a command, had no idea that that was even a possible feature! From an implementation point-of-view, this is quite doable - the code just switches from using a single possible heredoc attached to each command AST node to a map[string] of heredocs. The scenario I imagine it in:

COPY <<file1 <<file2 /destfolder/
contents for the first file
file1
contents for the second file
file2

Implementing the ability to do this in the parser primitives would be a really good idea for future flexibility, regardless of whether it ends up being used in ENV or COPY or anywhere else.


For the piping suggestion, while I'm not sold on it personally, it should be quite trivial to actually implement, see here. Currently, it's expressed quite neatly as a pipe into sh, so it makes sense to allow customization of it.

A possible use case, inspired by the Base64 encoding of files in moby#34423 (comment):

RUN <<EOF | base64 -d > /res1
n8un0dwuxO/vcAgj/wgcWIUvYLGN6GPcpr45veKL/WiRBOZfXkqsEE+vX4w385Zwh1IsBS0KHU4I
ByaSHRys6TnaZmhxQSitGwzDr48QDDX4Fft/L9pW401WyK10Q5uPAOrhEHuPfEEQyHGK8xpc+3Nu
V7pZ4Cx5kRARZkg8zhyBsurzG6eCvFoIKJr2/Ps+AEwUiuAMGjzPmZ2f9/Gmu1I9m2I7u9EBfJTE
l+HqgqIl3xZMj488CNui0v/bO7OBkvC7xA0zwzhDWN+CC+VSicOsIc5VPvOOn1lAmS0k1SBoocKB
tBY67fETzeAEQf5Vplg997igTO7XIkxSTkvhlNhffe/QSOUYjSdR5Wzm/KF0X4ztwGsJsgn/viMH
oEbzdvmeZ/N6fh9DRwQmX2LXUUEYRjLn1RyyYhpWOd9EmhndL1yxcXu7CRGunHxvbmUVgCFO0k0+
EuQF1FhYlaSq/cZty5sEBTdc2K508DEtOjyt8dCwz2H8dTl0n4mrZDH0Tuftpu2ji/VyBjoMKCUx
AFQkAdVJhx1nHr2u5KTxvq35hLeVicPcSlv0DAXzbXoEgrYYSFyx16bCW02T1ZKd9RSfTcWhUXLo
WvpNg3y0sIUs1kFsAqO43lbYmVdMqDn/qOMHH+HuQf+/xXd2NWL9M9uZbDjLxBYstA7SMo5oqLQ=
EOF

This syntax wouldn't affect how the low-level parser would work, and would just be contained to the Run command. Because it's a relatively simple change to make, I think it might be worth including.


Supporting shebangs, I'm not very sold on. Mostly because, (I think) shebangs are interpreted by the kernel, and not by the shell - so to get a script like this to execute, we'd need to do some messing around by saving the file before running it, instead of being able to express it in a single command line.

If we do want to support different shells, then we could do that with the piping syntax above.


Sorry for the wall of text, just my views on what the syntax could look like. With COPY/ADD and RUN (maybe with the piping) and the possibility for multiple heredocs, I think the door's left wide-open for future expansion of heredocs into other commands, and would really like to work on those myself, once we've got some initial functionality going!

Glancing over the original proposal, I think that about 80% of the use cases in there can be solved with just heredoc support in those 3 commands. The only major thing that's not covered is heredocs stored as ARGs, which does look a little involved to get working, since a stored ARG has to "remember" whether it was defined using a plain = or a heredoc =.

Thanks for your review again, really having quite a lot of fun deving this feature 😄

@jedevc
Copy link
Member Author

jedevc commented May 26, 2021

Glancing through the original proposal again, it seems like there might be other options for this kind of feature, in terms of syntax.

To summarize some of the options:

  1. Heredocs (as implemented in this PR, and explained in the initial comment in the proposal), this is my favorite (and also seems the most flexible).
  2. Introduce new Dockerfile commands (which could potentially compile to other commands internally in the parser)
    e.g.
    RUNS
    command1
    command2
    ENDRUNS
    This would be good for backwards compatibility, since nothing would need changing, but does have the downside of being much more brittle, and possibly introducing duplicates for quite a few commands that could benefit with heredocs.
  3. Alternative heredoc-style syntax, e.g.
    RUN {
        command1
        command2
    }
    
    COPY {
        line1
        line2
    } /dest
    This feels a bit simpler (to me), though I do prefer the classic heredoc style. Additionally, I imagine that backwards compatibility for this might not be very enjoyable to do: I'd guess more users have { and } in their RUNs than heredocs.

@jedevc jedevc force-pushed the dockerfile-heredocs branch from eb95843 to a552ee2 Compare May 26, 2021 14:17
@thaJeztah
Copy link
Member

So in my imagination the the development of heredocs would look something like this

Sounds good from my perspective (but input from @tonistiigi is always welcome as he's way more familiar with this repository)

Finally, probably in another PR(s), heredocs in other commands like ARG, ENV, and LABEL. No reason these couldn't be done in this PR, but once the primitives are implemented, these ones should be fairly easy to do, and it should help keep this PR from getting too big!

👍 that sound great, and I agree, it's fine to start with "something" and iterate.
If we can include it in the docker/dockerfile:labs frontend / syntax, people can start using it, and provide feedback.

So I really really like the idea of supporting multiple heredocs in a command, had no idea that that was even a possible feature!

:fist-bump: I literally learned it this morning when I glanced over those docs (... wait .. that's possible ??)

regardless of whether it ends up being used in ENV or COPY or anywhere else.

👍 yup! It's good to keep the door open for this

A possible use case, inspired by the Base64 encoding of files in moby/moby#34423 (comment)

Yes. Trying my head around it a bit to think of pros/cons. For "non-from-scratch" images, it could be solved by just doing that inside a shell, but could be useful for from scratch images. OTOH, sometimes existing features such as a multi-stage build could also help for those scenarios. This one (for me) falls into the category; needs to be explored (and outweigh PROS (feature!) and CONS (complexity / complicated syntax / ambiguity)

Supporting shebangs, I'm not very sold on

Same. I think it's preferable to keep the "content" of the here-doc (basically) an "opaque" bit to BuildKit (except for substituting variables). Adding support for shebangs etc may take us "down the rabbithole", better left to the container's process (we don't want to re-invent Bash or the kernel).

The only major thing that's not covered is heredocs stored as ARGs, which does look a little involved (...)

I'll defer that one to @tonistiigi 😅

Introduce new Dockerfile commands (which could potentially compile to other commands internally in the parser)

and

Alternative heredoc-style syntax, e.g.

Yes, I don't think we should take those approaches. I recall there have been long (looong) discussions about "begin/end" syntaxes like that (before we had multi-stage builds); many of those were to "group actions in a single layer". I'd like to avoid going down that rabithole with this feature, as there's a lot of things involved there (including "caching").

Overall, I think the heredoc approach is the "right" one;

  • it can solve many cases where line-continuation symbols made up for ugly, hard-to-read dockerfiles
  • it's a proven, well-known concept
  • "boilerplating" is limited (syntax isn't too verbose), and with that allows users to "structure" their dockerfile (indenting), choose between substituting variables yes/no
  • as discussed above, it also is useful for other commands (COPY, ENV, ARG, LABEL), allowing us to fix both "multi line variables", "inline files" and more

Only downside (somewhat) is that it's really a Linux concept, so somewhat "foreign" to Windows users. I think the benefits outweigh that downside.

@thaJeztah
Copy link
Member

For Windows (probably Linux as well), the only thing we should check is (possibly) line continuation symbols within heredoc. Make sure (and document) that within heredoc (in the RUN case), those are handled by the shell, and "opaque" to BuildKit / Docker itself: i.e.;

The \ here is not (should not) be handled by the Dockerfile; everything between EOF ... EOF is executed as-is

RUN <<EOF
echo hello \
    world
EOF

Likewise:

# escape=`
FROM busybox
# here ` is the line-continuation symbol (as set in `# escape=`
RUN echo hello`
world
RUN <<EOF
# here, \ is the shell's line-continuation, not ` (backtick)
echo hello \
world
EOF

(Input welcome on this)

@jedevc jedevc force-pushed the dockerfile-heredocs branch from 1f5595a to 6a4ad6d Compare May 26, 2021 15:15
@jedevc
Copy link
Member Author

jedevc commented May 26, 2021

Yes. Trying my head around it a bit to think of pros/cons. For "non-from-scratch" images, it could be solved by just doing that inside a shell, but could be useful for from scratch images. OTOH, sometimes existing features such as a multi-stage build could also help for those scenarios. This one (for me) falls into the category; needs to be explored (and outweigh PROS (feature!) and CONS (complexity / complicated syntax / ambiguity)

Ok, might be good to shelve that in the "maybe another PR later" category. However, for now, should we try and use the shell configured by SHELL for that heredoc? That's the most logical behavior to me, but of course, it falls down if you configure a shell that you can't pipe to.

Another use case for it that I just thought of: might be nice to be able to run python scripts easily if python is installed? i.e. instead of:

RUN python -c 'print("hello world"); print("foo bar")'

we could have,

RUN <<EOF | python
print("hello world")
print("foo bar")
EOF

Obviously this could be extendable to other languages 😝

For Windows (probably Linux as well), the only thing we should check is (possibly) line continuation symbols within heredoc.

For line continuations, atm, the parser ignores every single character till it gets to the heredoc line, so the continuation works as you described. However, this only happens on Linux: on Windows, it's a bit odd since I'm not sure how to pipe multiple commands into cmd (or even if it's easily possible).

@tonistiigi
Copy link
Member

as discussed above, it also is useful for other commands (COPY, ENV, ARG, LABEL), allowing us to fix both "multi line variables", "inline files" and more

Let's not expand this too quickly. "ENV, ARG, LABEL" would be a completely different concept. The logical place to use it is where currently path is required and can be replaced with an anonymous file. Shell doesn't support anything like this either. Some people may find it as a cool alternative syntax but it is not consistent with anything else.

@thaJeztah
Copy link
Member

for now, should we try and use the shell configured by SHELL for that heredoc?

Yes, I think the expectation would be that all RUN commands are executed using SHELL (or the default)

That's the most logical behavior to me, but of course, it falls down if you configure a shell that you can't pipe to.

That's a good point; not sure we'd have a real solution for that though, other than "improve the error message" (if possible)?

Another use case for it that I just thought of: might be nice to be able to run python scripts easily if python is installed?

Ah! Yes, I like that idea. I'd still be fine if that would be added as a "follow-up", but I like the general idea

Before that's implemented, the equivalent would be this (correct?)

SHELL ["/bin/python", "-c"]

RUN <<EOF
print("hello world")
print("foo bar")
EOF

# restore default shell
SHELL ["/bin/sh", "-c"]

For line continuations, atm, the parser ignores every single character till it gets to the heredoc line, so the continuation works as you described.

👍 perfect

However, this only happens on Linux: on Windows, it's a bit odd since I'm not sure how to pipe multiple commands into cmd (or even if it's easily possible).

It would be (mostly) theoretical (for now) as BuildKit is not yet supported on Windows. That said; it's something to think about for "when it will be supported". That said; if this code will also be used by the classic builder, we will need a way to indicate "this is using heredoc", and produce a sensible error if it's used.

Perhaps @TBBle or @olljanat know about the "pipe into cmd"?

Let's not expand this too quickly. "ENV, ARG, LABEL" would be a completely different concept. The logical place to use it is where currently path is required and can be replaced with an anonymous file. Shell doesn't support anything like this either. Some people may find it as a cool alternative syntax but it is not consistent with anything else.

That's fair. OK to exclude those.

@jedevc
Copy link
Member Author

jedevc commented May 27, 2021

Before that's implemented, the equivalent would be this (correct?)

Yup, either you'd have to switch shells midway through the build, or write a file containing the script, and RUN that...

Actually I think I oversimplified the idea of just being able to use the SHELL for heredocs - the problem is, a shell like sh -c doesn't read from stdin, only sh or sh -s reads from stdin, likewise for python -c and python. You'd almost want a way to configure an "interactive" and "non-interactive" shell, since not every shell neccessarily follows the -c convention.

# maybe something like this?
SHELL ["/bin/sh", "-c"]
SHELL --interactive ["/bin/sh", "-s"]

Maybe for now, we could have sh as the only allowed shell for heredocs (error-ing otherwise), and have some follow-up for expansion to other shells, maybe as part of the proposed shell piping syntax for RUN?

It would be (mostly) theoretical (for now) as BuildKit is not yet supported on Windows. That said; it's something to think about for "when it will be supported". That said; if this code will also be used by the classic builder, we will need a way to indicate "this is using heredoc", and produce a sensible error if it's used.

Ah interesting, I wasn't aware of that! Thankfully it shouldn't be too difficult to add support later, if it doesn't make it in for now.

Also, the idea of getting this into the classic builder - would be happy to work towards that kind of state. However, I've just realized that the classic builder uses the buildkit parser, so buildkit should probably expose some sort of interface to enable/disable heredocs, at least until the classic builder would be ready to take those changes (we don't want heredocs parsed and then ignored by the evaluator).

@thaJeztah
Copy link
Member

the problem is, a shell like sh -c doesn't read from stdin, only sh or sh -s reads from stdin, likewise for python -c and python. You'd almost want a way to configure an "interactive" and "non-interactive" shell, since not every shell neccessarily follows the -c convention.

Hmm.. that's a good point. I didn't dive into the technical implications yet (in my mind I was naively thinking heredoc would be equivalent to sh -c "<content of heredoc>"). So the default /bin/sh -c can't be used for heredoc and we'd have to switch it to /bin/sh -s for heredocs, correct?

This would indeed be an issue with custom SHELL situations. Doing a quick search for SHELL in Dockerfiles; https://grep.app/search?q=%5ESHELL&regexp=true&case=true&filter[lang][0]=Dockerfile shows various examples, e.g.;

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

While bash would support -s, I guess "blindly" doing s/-c/-s/ would be potentially dangerous, and the "safe" options here would be to either;

  • Ignore SHELL and always use the default shell (and document that it always uses the default)
    • PRO: still allows adding the <<EOF | somecommand in future
    • CON: adding support for SHELL in future would be a breaking change (existing Dockerfiles would now pick up the custom shell, which it didn't do before that)
  • Do not ignore SHELL, but produce an error if a custom SHELL is used and doesn't match the default (which could be in a parent image).
    • PRO/CON forces users to (temporarily) override the custom shell "before" the heredoc, and restore it back to the desired shell afterwards. This is annoying, but:
    • PRO: would allow adding support for custom shells in future without breaking those Dockerfiles
    • PRO: still allows adding the <<EOF | somecommand in future (same as "ignore SHELL")
    • PRO: would also produce an error if the base image would update the shell (allowing for a more informative error message)
    • CON: annoying to require "resetting" the SHELL in some cases
    • CON: does not allow for "I use a custom shell, but I know what I'm doing" (SHELL ["/bin/bash", "-o", "pipefail", "-s"])

@tonistiigi WDYT?

However, I've just realized that the classic builder uses the buildkit parser, so buildkit should probably expose some sort of interface to enable/disable heredocs, at least until the classic builder would be ready to take those changes (we don't want heredocs parsed and then ignored by the evaluator).

Yes, there was some discussion about that on #1954 (and the related moby/moby#41912). That PR was mostly considered a "quick fix", and ideally there'd be a better solution. That said; our intent is to make BuildKit the default builder (having to maintain two separate builders is a bit of a waste of time), so we should not invest "too much time" into this (worst case; document that it's not supported without BuildKit) but of course without support for Windows, we may still need some patching to keep it alive for that platform.

@thaJeztah
Copy link
Member

Let me also /cc @tianon @yosifkit - perhaps they have useful input on some of the things discussed above

@jedevc
Copy link
Member Author

jedevc commented May 27, 2021

Hmm.. that's a good point. I didn't dive into the technical implications yet (in my mind I was naively thinking heredoc would be equivalent to sh -c ""). So the default /bin/sh -c can't be used for heredoc and we'd have to switch it to /bin/sh -s for heredocs, correct?

Welp, I feel a bit silly now 😄 I did a bit of experimentation, and we can still use sh -c. That's my bad, sorry!

RUN <<EOF
whoami
ls /
EOF

could become (I think)

["sh", "-c", "whoami\nls /"]

For the possible future piping this still works ok, though we have to fiddle around a bit to keep it as close as possible to a Linux-style pipe:

RUN <<EOF | python
print("hello")
print("world")
EOF

becomes (as an example, base64 used here to stop quoting and such being a problem)

["sh", "-c", "echo cHJpbnQoImhlbGxvIikKcHJpbnQoIndvcmxkIik= | base64 -d | python"]

Apologies, I got confused! Keeping with sh -c keeps life much easier.

@thaJeztah
Copy link
Member

Welp, I feel a bit silly now 😄 I did a bit of experimentation, and we can still use sh -c. That's my bad, sorry!

Don't feel silly! It's good to call out such things (even if they turn out to be "wrong"); better to mention than regret later for not mentioning them (weird corner cases are easy to overlook, and reviewers may not think of them as well). So far, it's been a great discussion, and I learn new things every day from discussions like this.

For the possible future piping this still works ok, though we have to fiddle around a bit to keep it as close as possible to a Linux-style pipe:

Good point; wondering if the expectation should be for RUN <<EOF | python to be run in a shell or "exec(ish) form. In other words (thinking out loud here); how would I get the equivalent of python -c <heredoc content> (without it running in a shell?)

I think we'd want to avoid the combination of here docs with the JSON syntax (for "exec" form) 😬

RUN <<EOF | ["/bin/python", "-c"]

If we would default to "not run in a shell" in the pipe case, I expect many users to be confused that no variable expansion (by the shell) takes place. Also not sure how to (in that case) switch to execute it with a shell.

So yes, perhaps defaulting to sh -c (echo) <heredoc> | <command> would make most sense. Definitely one to let sink down a bit and give some thinking. (Which could be a good motivation to implementing it in a follow-up)

@justincormack
Copy link
Contributor

I don't think ["sh", "-c", "echo cHJpbnQoImhlbGxvIikKcHJpbnQoIndvcmxkIik= | base64 -d | python"] is right above, the heredoc should be piped to stdin for this syntax, without a shell or echo, the implementation should just start reading lines and copying to the pipe until it gets to the EOF. I think having redirects to pipe eg to python is better than people setting SHELL to python to achieve the same thing.

@jedevc
Copy link
Member Author

jedevc commented May 27, 2021

I don't think ["sh", "-c", "echo cHJpbnQoImhlbGxvIikKcHJpbnQoIndvcmxkIik= | base64 -d | python"] is right above, the heredoc should be piped to stdin for this syntax, without a shell or echo, the implementation should just start reading lines and copying to the pipe until it gets to the EOF.

Unfortunately, I can't see a way to express that kind of piping in LLB - I think doing so probably would require making some changes to LLB and the protobuf declarations? Not super familiar with it though, so I could be very wrong here 😄 Maybe getting that worked out is a pre-requisite for doing the shell piping syntax (and another good reason for doing it as a follow-up).

@TBBle
Copy link
Collaborator

TBBle commented May 27, 2021

I haven't fully grokked the conversation, just skimmed it a bit.

It's possible to convince cmd to run a script from stdin, but it's a little awful, and it's actually running cmd /c for each command rather than as a single file as bash -c does, i.e.

echo "whoami
whoami" | cmd /c "for /F %G in ('more') do @cmd /c %G"

does this when run under PowerShell (since echoing a string with a newline is a hassle under cmd)

> echo "whoami
>> whoami" | cmd /c "for /F %G in ('more') do @cmd /c %G"
keitaro\paulh
keitaro\paulh

As an aside, Windows PowerShell and PowerShell Core both have -Command "-" to execute a script from stdin though, like bash -s. I try to set SHELL=powershell in Dockerfiles for Windows unless I'm on nanoserver and don't have PowerShell.

It looks like the current leaning is not for piping the script through stdin though, but building the script as a parameter string?

As this PR already does, the best you can do for cmd /c is &-join commands if you want them to run irrespective of the return status of the previous command. Again, both PowerShell implementations are happy with newlines in the string passed to -Command "-".

@jedevc jedevc force-pushed the dockerfile-heredocs branch from bc513d5 to 272acb9 Compare May 28, 2021 10:17
@jedevc
Copy link
Member Author

jedevc commented May 28, 2021

Following conversations above, I've changed up a couple things:

  • RUN doesn't do the weird hacky thing with pipes and multiple shells, and invokes it much more clearly
  • COPY/ADD support multiple heredocs, so you can do things like:
    COPY <<FILE1 <<FILE2 /destination/
    first content
    FILE1
    second content
    FILE2
    This creates two files, FILE1 and FILE2 in /destination, with the contents from the heredocs.
    Whether we actually want to allow this is another question 😄 but it's nice that the current setup allows this to be possible, and the future flexibility is convenient.

@jedevc
Copy link
Member Author

jedevc commented May 28, 2021

If anyone's interested in taking a look, at the current setup, you can give it a try using jedevc/dockerfile:labs.

For example:

# syntax=jedevc/dockerfile:labs
FROM alpine:latest

ARG name=world
WORKDIR /demo

COPY <<EOF greetings/expanded
Hello ${name}!
EOF
COPY <<'EOF' greetings/not-expanded
Hello ${name}!
EOF

ADD <<file1 <<file2 files/
first bit of content
file1
second bit of content
file2

RUN <<EOF
mkdir -p cmds/
echo $(whoami) >> cmds/user
echo $(ls) >> cmds/directories
EOF

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heredocs need to be handled through stdin (or other input files). Splitting heredoc value by \n and then doing some semantics to join them back together is not valid heredoc behavior.

The echo | base64 -d hack is a nice idea but I think we can do better.

A run command is parsed as currently(eg. multi-line handling and comment skipping). If it is not in JSON format then we check if the command contains [n]<<-?name indicators and if it does parse, then collect them into separate objects by n by parsing next lines(this is mostly handled by this PR already).

Now when converting to llb.Exec, for every n we make a new file with llb.Mkfile on top of llb.Scratch and add it as a mount, eg. to /dev/inputs or /run/entrypoint etc.

The command for llb.Exec will mostly remain the same, except the heredoc indicators are replaced with the [n]</dev/input/name.

Eg.

RUN <<eof cat
foobar
eof

runs:

sh -c "</dev/pipes/0 cat"

Some additional conversions are required. If there are two heredocs on the same line with same n then they need to be manually merged together, and one of them needs to be removed from the command line. Or is there any other solution for this?

If the replaced command starts with | then that argument is just removed. I think this is heredoc-specific behavior and works the same without a pipe. I don't think we need to care about handling other redirects (eg. >) without a command.

The additional FileOp containing the Mkfile actions will create a new progress line that we can name [internal] preparing here-document. We can potentially extend LLB in the future with input files support so that this would not be needed on newer daemons.

If RUN command is just the heredoc, then the file created from it is invoked directly. Mkfile needs to make sure the file is created with +x permissions.

RUN <<eof
#!/usr/bin/env python
print("hello")
eof
sh -c "/dev/pipes/0"

Shebang is not required for shell scripts. This is automatic fallback in sh that we do not need to handle.

More examples:

RUN <<eof python
print("hello")
eof
sh -c "</dev/pipes/0 python"
RUN <<eof | python
print("hello")
eof
sh -c "</dev/pipes/0 python"
RUN <<eof
echo foo
eof
sh -c "/dev/pipes/0"
RUN <<eof 3<<eof2 myapp 
foo
eof
bar
eof2

sh -c "</dev/pipes/0 3</dev/pipes/3 myapp"

I do not know about windows support or if any of this unix-style heredoc logic applies there. It is possible Windows side has a different/custom implementation, or if Windows doesn't have any heredoc logic, then it might not be possible to use it in such context. This should not affect our design decision for the unix side.

@tonistiigi
Copy link
Member

Actually, thinking about it more, I think it is better not to keep index by n and not manually merge heredocs for them. I think it is much more useful for 2 commands within the same RUN to both use stdin form separate heredoc than for the single command to use 2 heredocs for the same input file. The latter seems like a gimmick case that we can just choose to ignore as I don't see solution for it atm without parsing shell semantics.

So ignore indexing by n. Every heredoc creates a separate file.

RUN <<eof cat > foo; <<eof2 cat >> bar
i am foo
eof
i am bar
eof2
sh -c "</dev/pipes/eof cat > foo; </dev/pipes/eof2 cat >> bar"

@tonistiigi
Copy link
Member

Also, am I missing something but why doesn't it just work if we place all the heredocs as-is into sh -c. Basically, just parse it as a single string block. I don't think this is what is happening atm though as iiuc after the split+join the individual heredocs are not preserved.

@tonistiigi tonistiigi force-pushed the dockerfile-heredocs branch 3 times, most recently from c86feac to 8cead73 Compare June 10, 2021 00:50
@tonistiigi tonistiigi force-pushed the dockerfile-heredocs branch from 8cead73 to 2d7f4ee Compare June 10, 2021 04:35
@jedevc
Copy link
Member Author

jedevc commented Jun 10, 2021

Documentation looks good to me 🎉

@thaJeztah
Copy link
Member

Sorry for going AWOL on this one; I kept half-an-eye on the conversation (and very happy to see it progressing); I'll try to have another look at this, and give it a try 🤗 🎉

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick first look at the docs, and tried some dockerfile (using # syntax=jedevc/dockerfile:labs) 🎉

left some comments.

Comment on lines +261 to +262
apt-get update
apt-get install -y vim
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we describe tab stripping behaviour in this section as well? My fear is a bit that from these examples, users may conclude that (by default) leading whitespace is stripped. For these RUN examples, the shell will ignore the whitespace, so effectively they don't really affect the outcome, but there may be situations where it's important. (I might be remembering wrong, but ISTR having <TAB> in a heredoc could cause things to be evaluated differently, which may be an issue for users that use <TAB> for indentation of their Dockerfile)

Not sure what's best though (the indentation for sure "looks" nice, and makes it more readable).

@tonistiigi tonistiigi force-pushed the dockerfile-heredocs branch from 2d7f4ee to fa632c5 Compare June 10, 2021 15:08
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs LGTM

tried some builds, and those worked as expected (but not an expert on the codebase, so I'll leave more in-depth review of that to others)

Thanks!

@tonistiigi tonistiigi merged commit 4518627 into moby:master Jun 10, 2021
@jedevc jedevc deleted the dockerfile-heredocs branch June 10, 2021 16:43
@jedevc
Copy link
Member Author

jedevc commented Jun 10, 2021

Woo! 🚀 Thanks all for your help in getting this through!

At some point soon, I'll take a look at this again, and open some issues/comment on #2121 with the aim of working out what to work out for follow ups 🎉

@thaJeztah
Copy link
Member

Thank you so much, @jedevc !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants