-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dockerfile heredocs #2132
Dockerfile heredocs #2132
Conversation
2f41375
to
fa64578
Compare
Yes, this needs to be improved if I understand the current code correctly. Heredoc should be only allowed in certain commands and in certain positions. Heredoc in a random command or in a random place needs to cause an error. Heredoc prefix inside the string of run command (or any other command) should behave as it did before. |
aec3d64
to
eb95843
Compare
Sorry for all the commit spam, I'll try and squash into a better series of commits once I've got some code review 😄 I've gone and added some more complete tests (and caught some weird behavior in the process), and specifically allowed only the Still need to work on this bit though and handle the case where a heredoc appears as part of an existing RUN command; this is a bit tricky, as is looks very similar to the ADD/COPY case, so somehow, the parser has to differentiate between them, probably in a similar manner to how I've done the allow-list for commands. Related to that do you think about the (admittedly, rather strange) case if a Dockerfile has done something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! I left some comments / thoughts in line (haven't looked really closely at everything, so mostly "thoughts")
Perhaps not for the first iteration, but expanding this to also support ENV
, ARG
, LABEL
would be interesting (I know some users have asked for multi-line values in ENV
(e.g.), which would be easier with here docs).
Also, reading https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04, which was linked from moby/moby#34423 (comment), I noticed that multiple here docs can be combined (tbh, didn't know that worked);
cat <<eof1; cat <<eof2
Hi,
eof1
Helene.
eof2
I'm not exactly sure if we want to add support for this, but if we want to keep the option open, we could look at how that would fit in the Dockerfile syntax.
One possibly useful case for this would be to set (e.g.) multiple environment variables using heredoc;
ENV ONE=<<EOF1; TWO=<<EOF2
content of
env ONE
EOF1
content of
env TWO
EOF2
Definitely one we don't need for the first iteration / implementation, as long as we keep our options open.
Finally, moby/moby#34423 (comment) contains two suggestions (not sure if we want those, but leaving them here);
Using | sh
to pipe the output (not sure if it adds value);
# 2, is this to change the shell that runs the script?
RUN <<EOF | sh
echo aa
EOF
Or alternatively, supporting shebang;
# Is there a need for this special sytax when you have shebang and the `SHELL` command
# the parser would have to check the first line for it so that it can exec the right thing
RUN <<EOF
#!/bin/sh
echo aa
EOF
(I'm not sure the above examples add much value, so leaning towards "don't implement those")
Whoops didn't see your last comment when posting
I (personally) think that it's ok to iterate over this feature (we can start small, and extend after that). That said; it's good to at least think of what this feature would look like for other commands to make sure we have not painted ourself in a corner. I left some comments along those lines. It's also good to read the discussion on moby/moby#34423 - I gave it a quick "once over" for use-cases and examples, but perhaps I missed some.
|
Woo 🎉 thanks so much for your initial thoughts, they're really helpful! I'm expecting that given the size of the feature (and the number of potential use cases we want to solve for), there might be a few rounds of review 😅 Have replied to most of your inline comments, most of them are pretty quick fixes. So in my imagination the the development of heredocs would look something like this:
So I really really like the idea of supporting multiple heredocs in a command, had no idea that that was even a possible feature! From an implementation point-of-view, this is quite doable - the code just switches from using a single possible heredoc attached to each command AST node to a COPY <<file1 <<file2 /destfolder/
contents for the first file
file1
contents for the second file
file2 Implementing the ability to do this in the parser primitives would be a really good idea for future flexibility, regardless of whether it ends up being used in For the piping suggestion, while I'm not sold on it personally, it should be quite trivial to actually implement, see here. Currently, it's expressed quite neatly as a pipe into A possible use case, inspired by the Base64 encoding of files in moby#34423 (comment): RUN <<EOF | base64 -d > /res1
n8un0dwuxO/vcAgj/wgcWIUvYLGN6GPcpr45veKL/WiRBOZfXkqsEE+vX4w385Zwh1IsBS0KHU4I
ByaSHRys6TnaZmhxQSitGwzDr48QDDX4Fft/L9pW401WyK10Q5uPAOrhEHuPfEEQyHGK8xpc+3Nu
V7pZ4Cx5kRARZkg8zhyBsurzG6eCvFoIKJr2/Ps+AEwUiuAMGjzPmZ2f9/Gmu1I9m2I7u9EBfJTE
l+HqgqIl3xZMj488CNui0v/bO7OBkvC7xA0zwzhDWN+CC+VSicOsIc5VPvOOn1lAmS0k1SBoocKB
tBY67fETzeAEQf5Vplg997igTO7XIkxSTkvhlNhffe/QSOUYjSdR5Wzm/KF0X4ztwGsJsgn/viMH
oEbzdvmeZ/N6fh9DRwQmX2LXUUEYRjLn1RyyYhpWOd9EmhndL1yxcXu7CRGunHxvbmUVgCFO0k0+
EuQF1FhYlaSq/cZty5sEBTdc2K508DEtOjyt8dCwz2H8dTl0n4mrZDH0Tuftpu2ji/VyBjoMKCUx
AFQkAdVJhx1nHr2u5KTxvq35hLeVicPcSlv0DAXzbXoEgrYYSFyx16bCW02T1ZKd9RSfTcWhUXLo
WvpNg3y0sIUs1kFsAqO43lbYmVdMqDn/qOMHH+HuQf+/xXd2NWL9M9uZbDjLxBYstA7SMo5oqLQ=
EOF This syntax wouldn't affect how the low-level parser would work, and would just be contained to the Run command. Because it's a relatively simple change to make, I think it might be worth including. Supporting shebangs, I'm not very sold on. Mostly because, (I think) shebangs are interpreted by the kernel, and not by the shell - so to get a script like this to execute, we'd need to do some messing around by saving the file before running it, instead of being able to express it in a single command line. If we do want to support different shells, then we could do that with the piping syntax above. Sorry for the wall of text, just my views on what the syntax could look like. With Glancing over the original proposal, I think that about 80% of the use cases in there can be solved with just heredoc support in those 3 commands. The only major thing that's not covered is heredocs stored as Thanks for your review again, really having quite a lot of fun deving this feature 😄 |
Glancing through the original proposal again, it seems like there might be other options for this kind of feature, in terms of syntax. To summarize some of the options:
|
eb95843
to
a552ee2
Compare
Sounds good from my perspective (but input from @tonistiigi is always welcome as he's way more familiar with this repository)
👍 that sound great, and I agree, it's fine to start with "something" and iterate.
:fist-bump: I literally learned it this morning when I glanced over those docs (... wait .. that's possible ??)
👍 yup! It's good to keep the door open for this
Yes. Trying my head around it a bit to think of pros/cons. For "non-from-scratch" images, it could be solved by just doing that inside a shell, but could be useful for
Same. I think it's preferable to keep the "content" of the here-doc (basically) an "opaque" bit to BuildKit (except for substituting variables). Adding support for shebangs etc may take us "down the rabbithole", better left to the container's process (we don't want to re-invent Bash or the kernel).
I'll defer that one to @tonistiigi 😅
and
Yes, I don't think we should take those approaches. I recall there have been long (looong) discussions about "begin/end" syntaxes like that (before we had multi-stage builds); many of those were to "group actions in a single layer". I'd like to avoid going down that rabithole with this feature, as there's a lot of things involved there (including "caching"). Overall, I think the heredoc approach is the "right" one;
Only downside (somewhat) is that it's really a Linux concept, so somewhat "foreign" to Windows users. I think the benefits outweigh that downside. |
For Windows (probably Linux as well), the only thing we should check is (possibly) line continuation symbols within heredoc. Make sure (and document) that within heredoc (in the The RUN <<EOF
echo hello \
world
EOF Likewise: # escape=`
FROM busybox
# here ` is the line-continuation symbol (as set in `# escape=`
RUN echo hello`
world
RUN <<EOF
# here, \ is the shell's line-continuation, not ` (backtick)
echo hello \
world
EOF (Input welcome on this) |
1f5595a
to
6a4ad6d
Compare
Ok, might be good to shelve that in the "maybe another PR later" category. However, for now, should we try and use the shell configured by Another use case for it that I just thought of: might be nice to be able to run python scripts easily if python is installed? i.e. instead of: RUN python -c 'print("hello world"); print("foo bar")' we could have, RUN <<EOF | python
print("hello world")
print("foo bar")
EOF Obviously this could be extendable to other languages 😝
For line continuations, atm, the parser ignores every single character till it gets to the heredoc line, so the continuation works as you described. However, this only happens on Linux: on Windows, it's a bit odd since I'm not sure how to pipe multiple commands into |
Let's not expand this too quickly. "ENV, ARG, LABEL" would be a completely different concept. The logical place to use it is where currently path is required and can be replaced with an anonymous file. Shell doesn't support anything like this either. Some people may find it as a cool alternative syntax but it is not consistent with anything else. |
Yes, I think the expectation would be that all
That's a good point; not sure we'd have a real solution for that though, other than "improve the error message" (if possible)?
Ah! Yes, I like that idea. I'd still be fine if that would be added as a "follow-up", but I like the general idea Before that's implemented, the equivalent would be this (correct?) SHELL ["/bin/python", "-c"]
RUN <<EOF
print("hello world")
print("foo bar")
EOF
# restore default shell
SHELL ["/bin/sh", "-c"]
👍 perfect
It would be (mostly) theoretical (for now) as BuildKit is not yet supported on Windows. That said; it's something to think about for "when it will be supported". That said; if this code will also be used by the classic builder, we will need a way to indicate "this is using heredoc", and produce a sensible error if it's used. Perhaps @TBBle or @olljanat know about the "pipe into cmd"?
That's fair. OK to exclude those. |
Yup, either you'd have to switch shells midway through the build, or write a file containing the script, and Actually I think I oversimplified the idea of just being able to use the # maybe something like this?
SHELL ["/bin/sh", "-c"]
SHELL --interactive ["/bin/sh", "-s"] Maybe for now, we could have
Ah interesting, I wasn't aware of that! Thankfully it shouldn't be too difficult to add support later, if it doesn't make it in for now. Also, the idea of getting this into the classic builder - would be happy to work towards that kind of state. However, I've just realized that the classic builder uses the buildkit parser, so buildkit should probably expose some sort of interface to enable/disable heredocs, at least until the classic builder would be ready to take those changes (we don't want heredocs parsed and then ignored by the evaluator). |
Hmm.. that's a good point. I didn't dive into the technical implications yet (in my mind I was naively thinking heredoc would be equivalent to This would indeed be an issue with custom SHELL ["/bin/bash", "-o", "pipefail", "-c"] While bash would support
@tonistiigi WDYT?
Yes, there was some discussion about that on #1954 (and the related moby/moby#41912). That PR was mostly considered a "quick fix", and ideally there'd be a better solution. That said; our intent is to make BuildKit the default builder (having to maintain two separate builders is a bit of a waste of time), so we should not invest "too much time" into this (worst case; document that it's not supported without BuildKit) but of course without support for Windows, we may still need some patching to keep it alive for that platform. |
Welp, I feel a bit silly now 😄 I did a bit of experimentation, and we can still use RUN <<EOF
whoami
ls /
EOF could become (I think) ["sh", "-c", "whoami\nls /"] For the possible future piping this still works ok, though we have to fiddle around a bit to keep it as close as possible to a Linux-style pipe: RUN <<EOF | python
print("hello")
print("world")
EOF becomes (as an example, base64 used here to stop quoting and such being a problem) ["sh", "-c", "echo cHJpbnQoImhlbGxvIikKcHJpbnQoIndvcmxkIik= | base64 -d | python"] Apologies, I got confused! Keeping with |
Don't feel silly! It's good to call out such things (even if they turn out to be "wrong"); better to mention than regret later for not mentioning them (weird corner cases are easy to overlook, and reviewers may not think of them as well). So far, it's been a great discussion, and I learn new things every day from discussions like this.
Good point; wondering if the expectation should be for I think we'd want to avoid the combination of here docs with the JSON syntax (for "exec" form) 😬 RUN <<EOF | ["/bin/python", "-c"] If we would default to "not run in a shell" in the So yes, perhaps defaulting to |
I don't think |
Unfortunately, I can't see a way to express that kind of piping in LLB - I think doing so probably would require making some changes to LLB and the protobuf declarations? Not super familiar with it though, so I could be very wrong here 😄 Maybe getting that worked out is a pre-requisite for doing the shell piping syntax (and another good reason for doing it as a follow-up). |
I haven't fully grokked the conversation, just skimmed it a bit. It's possible to convince cmd to run a script from stdin, but it's a little awful, and it's actually running
does this when run under PowerShell (since echoing a string with a newline is a hassle under cmd)
As an aside, Windows PowerShell and PowerShell Core both have It looks like the current leaning is not for piping the script through stdin though, but building the script as a parameter string? As this PR already does, the best you can do for |
bc513d5
to
272acb9
Compare
Following conversations above, I've changed up a couple things:
|
If anyone's interested in taking a look, at the current setup, you can give it a try using jedevc/dockerfile:labs. For example: # syntax=jedevc/dockerfile:labs
FROM alpine:latest
ARG name=world
WORKDIR /demo
COPY <<EOF greetings/expanded
Hello ${name}!
EOF
COPY <<'EOF' greetings/not-expanded
Hello ${name}!
EOF
ADD <<file1 <<file2 files/
first bit of content
file1
second bit of content
file2
RUN <<EOF
mkdir -p cmds/
echo $(whoami) >> cmds/user
echo $(ls) >> cmds/directories
EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heredocs need to be handled through stdin (or other input files). Splitting heredoc value by \n
and then doing some semantics to join them back together is not valid heredoc behavior.
The echo | base64 -d
hack is a nice idea but I think we can do better.
A run command is parsed as currently(eg. multi-line handling and comment skipping). If it is not in JSON format then we check if the command contains [n]<<-?name
indicators and if it does parse, then collect them into separate objects by n
by parsing next lines(this is mostly handled by this PR already).
Now when converting to llb.Exec
, for every n
we make a new file with llb.Mkfile
on top of llb.Scratch
and add it as a mount, eg. to /dev/inputs
or /run/entrypoint
etc.
The command for llb.Exec
will mostly remain the same, except the heredoc indicators are replaced with the [n]</dev/input/name
.
Eg.
RUN <<eof cat
foobar
eof
runs:
sh -c "</dev/pipes/0 cat"
Some additional conversions are required. If there are two heredocs on the same line with same n
then they need to be manually merged together, and one of them needs to be removed from the command line. Or is there any other solution for this?
If the replaced command starts with |
then that argument is just removed. I think this is heredoc-specific behavior and works the same without a pipe. I don't think we need to care about handling other redirects (eg. >
) without a command.
The additional FileOp
containing the Mkfile
actions will create a new progress line that we can name [internal] preparing here-document
. We can potentially extend LLB in the future with input files support so that this would not be needed on newer daemons.
If RUN
command is just the heredoc, then the file created from it is invoked directly. Mkfile
needs to make sure the file is created with +x
permissions.
RUN <<eof
#!/usr/bin/env python
print("hello")
eof
sh -c "/dev/pipes/0"
Shebang is not required for shell scripts. This is automatic fallback in sh
that we do not need to handle.
More examples:
RUN <<eof python
print("hello")
eof
sh -c "</dev/pipes/0 python"
RUN <<eof | python
print("hello")
eof
sh -c "</dev/pipes/0 python"
RUN <<eof
echo foo
eof
sh -c "/dev/pipes/0"
RUN <<eof 3<<eof2 myapp
foo
eof
bar
eof2
sh -c "</dev/pipes/0 3</dev/pipes/3 myapp"
I do not know about windows support or if any of this unix-style heredoc logic applies there. It is possible Windows side has a different/custom implementation, or if Windows doesn't have any heredoc logic, then it might not be possible to use it in such context. This should not affect our design decision for the unix side.
Actually, thinking about it more, I think it is better not to keep index by So ignore indexing by
|
Also, am I missing something but why doesn't it just work if we place all the heredocs as-is into |
c86feac
to
8cead73
Compare
8cead73
to
2d7f4ee
Compare
Documentation looks good to me 🎉 |
Sorry for going AWOL on this one; I kept half-an-eye on the conversation (and very happy to see it progressing); I'll try to have another look at this, and give it a try 🤗 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a quick first look at the docs, and tried some dockerfile (using # syntax=jedevc/dockerfile:labs
) 🎉
left some comments.
apt-get update | ||
apt-get install -y vim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we describe tab
stripping behaviour in this section as well? My fear is a bit that from these examples, users may conclude that (by default) leading whitespace is stripped. For these RUN
examples, the shell will ignore the whitespace, so effectively they don't really affect the outcome, but there may be situations where it's important. (I might be remembering wrong, but ISTR having <TAB>
in a heredoc could cause things to be evaluated differently, which may be an issue for users that use <TAB>
for indentation of their Dockerfile)
Not sure what's best though (the indentation for sure "looks" nice, and makes it more readable).
Signed-off-by: Tonis Tiigi <[email protected]>
2d7f4ee
to
fa632c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs LGTM
tried some builds, and those worked as expected (but not an expert on the codebase, so I'll leave more in-depth review of that to others)
Thanks!
Woo! 🚀 Thanks all for your help in getting this through! At some point soon, I'll take a look at this again, and open some issues/comment on #2121 with the aim of working out what to work out for follow ups 🎉 |
Thank you so much, @jedevc ! |
relates to moby/moby#34423
As mentioned in #2121, I've been making progress towards implementing heredocs in Dockerfiles, and thought it might be time to open a PR for it 🎉
I've essentially got all the functionality I think we'd need before wanting to merge, though I'm sure there's some fixes/tests to write before that.
Things that definitely need resolving before a merge is really possible:
cmd
so the current hacky approach, might be the best?)I'd really appreciate any feedback anyone has on the current design and implementation!