Add include_sources to pex_binary target #16215

Merged
merged 1 commit into from
Jul 20, 2022

Conversation

thejcannon
Member

@thejcannon thejcannon commented Jul 19, 2022

At first glance this might seem like a nonsensical yin to include_requirements's yang. However... that's exactly what it is (minus the nonsensical part).

Consider the multi-stage build documented here: https://pex.readthedocs.io/en/latest/recipes.html#pex-app-in-a-container.

Now consider if each stage consumed not a single all-in-one PEX, but the deps stage used a PEX built with include_requirements=True and include_sources=False, and the srcs stage used one built with the flags flipped. The COPY instruction in each stage wouldn't be invalidated unless something going into that stage truly changed.

For PEXs with large reqs, this cache re-use can save a lot of time, as the compilation of deps might take a long time.
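
As a rough sketch of the split described above, a BUILD file could declare two pex_binary targets over the same entry point. The target names and main.py are hypothetical; only include_requirements and include_sources come from this PR.

```python
# BUILD (hypothetical target names and entry point, for illustration only)

# Third-party requirements only: this PEX changes only when the resolved deps change.
pex_binary(
    name="app-deps",
    entry_point="main.py",
    include_requirements=True,
    include_sources=False,
)

# First-party sources only: this PEX changes on every source edit, but is small.
pex_binary(
    name="app-srcs",
    entry_point="main.py",
    include_requirements=False,
    include_sources=True,
)
```

Each stage of the multi-stage Dockerfile would then COPY only its own PEX, so a sources-only edit leaves the deps stage's input untouched.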

[ci skip-rust]
[ci skip-build-wheels]

@thejcannon
Member Author

FYI @jsirois the PEX-meister.

@thejcannon thejcannon changed the title Add include_sources to pex_binary target Add include_sources to pex_binary target Jul 19, 2022
Member

@kaos kaos left a comment

Nice 👍🏽

default = True
help = softwrap(
"""
Whether to include your first party sources the binary uses in the packaged PEX file.
Member

I think expanding this doc with why this could be beneficial would make sense. It could be the difference between "what, why!?" and "oh, aha!!" for a novice user.

Member Author

I was planning on a blog post later, then linking it in the docker docs, and probably here too.

Member

I agree that explaining how this is used and actually sketching an example might be good. It might help clarify that there are other UXs available to solve the problem.

Member Author

@thejcannon thejcannon Jul 19, 2022

I will say this value mirrors include_requirements directly, which also doesn't include potential use cases.

Member

@stuhood stuhood left a comment

Definitely non-blocking comment.

I have no idea what the UX would look like, but is this something that could/should be done natively instead, maybe by something like a docker_pex_image, or a multi_stage_pex_binary target?

It's a bit odd, because currently our expectation is that the package goal produces roughly one file... although it can totally produce more. And so a multi-stage PEX build could produce multiple files with well known ... suffixes, maybe?

Also, the packed layout is potentially useful for this kind of usecase... I wonder if it would be possible to pull in the relevant portions of that layout into the two layers, rather than building two independent files?

Would be interested in John's thoughts, but again: not blocking.

@thejcannon
Member Author

> Definitely non-blocking comment.
>
> I have no idea what the UX would look like, but is this something that could/should be done natively instead, maybe by something like a docker_pex_image, or a multi_stage_pex_binary target?
>
> It's a bit odd, because currently our expectation is that the package goal produces roughly one file... although it can totally produce more. And so a multi-stage PEX build could produce multiple files with well known ... suffixes, maybe?
>
> Also, the packed layout is potentially useful for this kind of usecase... I wonder if it would be possible to pull in the relevant portions of that layout into the two layers, rather than building two independent files?
>
> Would be interested in John's thoughts, but again: not blocking.

Personally, I'd love to see support for "synthetic" targets, that way we expose the low-layer API like this (for this use-case and others), but also a plugin which can synthetically create these targets and build more interesting ones (while still having minimal rule piping). I've thought a lot about this very particular issue and landed on this change because we're exposing all the bits necessary to do the interesting thing, and allowing devs to take it over from there.

It'd be really hard to get the secret sauce just right on a docker_pex_image target because you still need customization points for other things like copying loose files or installing system packages. Of course multi-stage helps this, but then you need to reference the stage in your instructions. Not impossible, but either the assumptions or acrobatics start piling up and I decided to punt.

Using packed layout is certainly useful for this kind of case, but there are two stumbling blocks:

  • the COPY instruction doesn't have rich "match" support (e.g. match .deps and x, y, and z but not a), so you either hand-roll the COPY instructions (plural!) or end up COPYing things you don't need
  • using the venv command on the built PEX requires PEX-INFO, which contains a code hash, so the "deps" stage gets invalidated by source-only changes.
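
To make the second point concrete, here is a minimal sketch of why a code hash inside PEX-INFO defeats layer caching. It builds two in-memory zips standing in for PEXes whose dependencies are identical but whose sources differ. The code_hash key mirrors the field mentioned above; the helper names and the pinned requirement are made up for illustration, and real PEX-INFO files carry many more fields.

```python
import hashlib
import io
import json
import zipfile


def make_fake_pex(code_hash: str) -> bytes:
    """Build an in-memory zip standing in for a PEX; only PEX-INFO matters here."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Hypothetical, heavily trimmed PEX-INFO: same requirements, varying code hash.
        info = {"code_hash": code_hash, "requirements": ["example-dist==1.0"]}
        zf.writestr("PEX-INFO", json.dumps(info, sort_keys=True))
    return buf.getvalue()


def pex_info_digest(pex_bytes: bytes) -> str:
    """Return the SHA-256 of the PEX-INFO member's content."""
    with zipfile.ZipFile(io.BytesIO(pex_bytes)) as zf:
        return hashlib.sha256(zf.read("PEX-INFO")).hexdigest()


# Same dependencies, different first-party code.
before = pex_info_digest(make_fake_pex(code_hash="aaa"))
after = pex_info_digest(make_fake_pex(code_hash="bbb"))
print("PEX-INFO changed:", before != after)
```

Any stage that COPYs PEX-INFO sees different bytes after a sources-only edit, so its cache key changes even though the dependencies did not.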

I don't think this is the end of the road for this line of optimization, but it is certainly a useful stepping stone we can provide today. Once we can leverage the facets of this design, we can see where the best next step is for Pants and/or PEX to make it simpler.

@stuhood stuhood merged commit d3918db into pantsbuild:main Jul 20, 2022
@thejcannon thejcannon deleted the pex_include_source branch July 20, 2022 18:21
@thejcannon thejcannon added this to the 2.13.x milestone Jul 20, 2022
thejcannon added a commit to thejcannon/pants that referenced this pull request Jul 20, 2022
thejcannon added a commit that referenced this pull request Jul 20, 2022
…16252)
@jsirois
Contributor

jsirois commented Jul 21, 2022

> For PEXs with large reqs, this cache re-use can save a lot of time, as the compilation of deps might take a long time.

Do you have numbers? If it truly is the compilation time, another approach is to amend the advice in the Pex recipes so the deps layer is not compiled, and then run venv/bin/python -mcompileall venv in a new layer. If this change was 90% about saving compile time, that approach would have avoided adding a feature that we may churn users on in the future by deprecating or removing it.
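
For reference, the compileall step can also be driven from Python rather than via -m. This is only a sketch using the standard library on a throwaway directory, not on a real venv; compile_tree and mod.py are illustrative names.

```python
import compileall
import pathlib
import tempfile


def compile_tree(root: str) -> bool:
    """Bytecode-compile every .py file under root; True if all files compiled."""
    # quiet=1 suppresses the per-file listing but still reports errors.
    return bool(compileall.compile_dir(root, quiet=1))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a venv tree: one trivial module.
    (pathlib.Path(tmp) / "mod.py").write_text("x = 1\n")
    ok = compile_tree(tmp)
    pyc_files = list(pathlib.Path(tmp).rglob("*.pyc"))
    print(ok, [p.name for p in pyc_files])
```

Running this in its own image layer, after the deps venv is created, is what lets the compile cost be cached separately from venv creation.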

@jsirois
Contributor

jsirois commented Jul 22, 2022

Thanks @thejcannon for providing the PEX-INFO offline. This helped me build an apples-to-apples repro for timing's sake.

I've attached the requirements.txt distilled from that PEX-INFO and I used the following lock to (re-)build PEXes from for the tests:

pex3 lock create --resolver-version pip-2020-resolver --style strict --python python3.8 -o lock.json --indent 2 -r requirements.txt

I then used an initially empty src/empty.py and appended comment lines to it to simulate a sources-only Pex change, like so:

echo "# comment" >> src/empty.py
pex --lock lock.json --python python3.8 -D src/ -o context/my-app.pex --include-tools

Ok, so, using existing technology you can eliminate the time taken to bytecode-compile 3rdparty deps.

Amending the Pex recipe with a new compiled-deps layer to separate out deps --compile so we can avoid it when deps are unchanged:

FROM python:3.8-slim as deps
COPY /my-app.pex /
RUN PEX_TOOLS=1 /usr/local/bin/python3.8 /my-app.pex venv --scope deps --rm all /my-app

FROM python:3.8-slim as compiled-deps
COPY --from=deps /my-app /my-app
RUN /my-app/bin/python -m compileall /my-app

FROM python:3.8-slim as srcs
COPY /my-app.pex /
RUN PEX_TOOLS=1 /usr/local/bin/python3.8 /my-app.pex venv --scope srcs --rm all --compile /my-app

FROM python:3.8-slim
COPY --from=compiled-deps /my-app /my-app
COPY --from=srcs /my-app /my-app
ENTRYPOINT ["/my-app/pex"]

You observe the expected cache hit / skip of compilation on a sources-only Pex change:

^jsirois@gill ~/Downloads/Josh-include_sources $ docker build context/
Sending build context to Docker daemon  1.023GB
Step 1/13 : FROM python:3.8-slim as deps
 ---> 642a8290c35e
Step 2/13 : COPY /my-app.pex /
 ---> Using cache
 ---> 0d4aa874ccf5
Step 3/13 : RUN PEX_TOOLS=1 /usr/local/bin/python3.8 /my-app.pex venv --scope deps --rm all /my-app
 ---> Using cache
 ---> 9c83e6ddd0bb
Step 4/13 : FROM python:3.8-slim as compiled-deps
 ---> 642a8290c35e
Step 5/13 : COPY --from=deps /my-app /my-app
 ---> Using cache
 ---> 3270ef78b373
Step 6/13 : RUN /my-app/bin/python -m compileall /my-app
 ---> Using cache
 ---> e4679019f0fb
Step 7/13 : FROM python:3.8-slim as srcs
 ---> 642a8290c35e
Step 8/13 : COPY /my-app.pex /
 ---> Using cache
 ---> 0d4aa874ccf5
Step 9/13 : RUN PEX_TOOLS=1 /usr/local/bin/python3.8 /my-app.pex venv --scope srcs --rm all --compile /my-app
 ---> Using cache
 ---> 74d6395d476c
Step 10/13 : FROM python:3.8-slim
 ---> 642a8290c35e
Step 11/13 : COPY --from=compiled-deps /my-app /my-app
 ---> Using cache
 ---> d977b0abe224
Step 12/13 : COPY --from=srcs /my-app /my-app
 ---> Using cache
 ---> 266851a97813
Step 13/13 : ENTRYPOINT ["/my-app/pex"]
 ---> Using cache
 ---> b5a20b6ab172
Successfully built b5a20b6ab172
^jsirois@gill ~/Downloads/Josh-include_sources $ echo "# comment" >> src/empty.py 
^jsirois@gill ~/Downloads/Josh-include_sources $ pex --lock lock.json --python python3.8 -D src/ -o context/my-app.pex --include-tools
^jsirois@gill ~/Downloads/Josh-include_sources $ docker build context/
Sending build context to Docker daemon  1.023GB
Step 1/13 : FROM python:3.8-slim as deps
 ---> 642a8290c35e
Step 2/13 : COPY /my-app.pex /
 ---> e2fa5bde8cb9
Step 3/13 : RUN PEX_TOOLS=1 /usr/local/bin/python3.8 /my-app.pex venv --scope deps --rm all /my-app
 ---> Running in 015d60a0db26
Removing intermediate container 015d60a0db26
 ---> fe6b0e57765c
Step 4/13 : FROM python:3.8-slim as compiled-deps
 ---> 642a8290c35e
Step 5/13 : COPY --from=deps /my-app /my-app
 ---> Using cache
 ---> 3270ef78b373
Step 6/13 : RUN /my-app/bin/python -m compileall /my-app
 ---> Using cache
 ---> e4679019f0fb
Step 7/13 : FROM python:3.8-slim as srcs
 ---> 642a8290c35e
Step 8/13 : COPY /my-app.pex /
 ---> Using cache
 ---> e2fa5bde8cb9
Step 9/13 : RUN PEX_TOOLS=1 /usr/local/bin/python3.8 /my-app.pex venv --scope srcs --rm all --compile /my-app
 ---> Running in fb30174d8e7e
Removing intermediate container fb30174d8e7e
 ---> 84a03fe8b10f
Step 10/13 : FROM python:3.8-slim
 ---> 642a8290c35e
Step 11/13 : COPY --from=compiled-deps /my-app /my-app
 ---> Using cache
 ---> d977b0abe224
Step 12/13 : COPY --from=srcs /my-app /my-app
 ---> f9de77d60a48
Step 13/13 : ENTRYPOINT ["/my-app/pex"]
 ---> Running in 8b2b274c396e
Removing intermediate container 8b2b274c396e
 ---> aca7da3518d8
Successfully built aca7da3518d8

That's not super-enlightening re timings, though. BuildKit makes this better and also allows leveraging a persistent PEX_ROOT cache to speed up venv creation run over run.

That Dockerfile:

# syntax=docker/dockerfile:1.2

FROM python:3.8-slim as deps
RUN \
    --mount=type=cache,target=/pex-root \
    --mount=target=/context \
    PEX_ROOT=/pex-root \
    PEX_TOOLS=1 \
    /usr/local/bin/python3.8 /context/my-app.pex venv --scope deps /my-app

FROM python:3.8-slim as compiled-deps
COPY --from=deps /my-app /my-app
RUN /my-app/bin/python -m compileall /my-app

FROM python:3.8-slim as srcs
RUN \
    --mount=type=cache,target=/pex-root \
    --mount=target=/context \
    PEX_ROOT=/pex-root \
    PEX_TOOLS=1 \
    /usr/local/bin/python3.8 /context/my-app.pex venv --scope srcs --compile /my-app

FROM python:3.8-slim
COPY --from=compiled-deps /my-app /my-app
COPY --from=srcs /my-app /my-app
ENTRYPOINT ["/my-app/pex"]

And the similar build result after a sources-only change:

^jsirois@gill ~/Downloads/Josh-include_sources $ DOCKER_BUILDKIT=1 docker build -f context/Dockerfile.run-mount context/
[+] Building 0.6s (14/14) FINISHED                                                                                                                                                                                                                                               
 => [internal] load build definition from Dockerfile.run-mount                                                                                                                                                                                                              0.0s
 => => transferring dockerfile: 48B                                                                                                                                                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                           0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                             0.0s
 => resolve image config for docker.io/docker/dockerfile:1.2                                                                                                                                                                                                                0.5s
 => CACHED docker-image://docker.io/docker/dockerfile:1.2@sha256:e2a8561e419ab1ba6b2fe6cbdf49fd92b95912df1cf7d313c3e2230a333fdbcc                                                                                                                                           0.0s
 => [internal] load metadata for docker.io/library/python:3.8-slim                                                                                                                                                                                                          0.0s
 => [srcs 1/2] FROM docker.io/library/python:3.8-slim                                                                                                                                                                                                                       0.0s
 => [internal] load build context                                                                                                                                                                                                                                           0.0s
 => => transferring context: 105B                                                                                                                                                                                                                                           0.0s
 => CACHED [deps 2/2] RUN     --mount=type=cache,target=/pex-root     --mount=target=/context     PEX_ROOT=/pex-root     PEX_TOOLS=1     /usr/local/bin/python3.8 /context/my-app.pex venv --scope deps /my-app                                                             0.0s
 => CACHED [compiled-deps 2/3] COPY --from=deps /my-app /my-app                                                                                                                                                                                                             0.0s
 => CACHED [compiled-deps 3/3] RUN /my-app/bin/python -m compileall /my-app                                                                                                                                                                                                 0.0s
 => CACHED [stage-3 2/3] COPY --from=compiled-deps /my-app /my-app                                                                                                                                                                                                          0.0s
 => CACHED [srcs 2/2] RUN     --mount=type=cache,target=/pex-root     --mount=target=/context     PEX_ROOT=/pex-root     PEX_TOOLS=1     /usr/local/bin/python3.8 /context/my-app.pex venv --scope srcs --compile /my-app                                                   0.0s
 => CACHED [stage-3 3/3] COPY --from=srcs /my-app /my-app                                                                                                                                                                                                                   0.0s
 => exporting to image                                                                                                                                                                                                                                                      0.0s
 => => exporting layers                                                                                                                                                                                                                                                     0.0s
 => => writing image sha256:ca51a65259c4d56696803d5d8ada8d1674eb59c7a463f606912225858921ed98                                                                                                                                                                                0.0s
^jsirois@gill ~/Downloads/Josh-include_sources $ echo "# comment" >> src/empty.py 
^jsirois@gill ~/Downloads/Josh-include_sources $ pex --lock lock.json --python python3.8 -D src/ -o context/my-app.pex --include-tools
^jsirois@gill ~/Downloads/Josh-include_sources $ DOCKER_BUILDKIT=1 docker build -f context/Dockerfile.run-mount context/
[+] Building 18.1s (14/14) FINISHED                                                                                                                                                                                                                                              
 => [internal] load build definition from Dockerfile.run-mount                                                                                                                                                                                                              0.0s
 => => transferring dockerfile: 48B                                                                                                                                                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                           0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                             0.0s
 => resolve image config for docker.io/docker/dockerfile:1.2                                                                                                                                                                                                                0.4s
 => CACHED docker-image://docker.io/docker/dockerfile:1.2@sha256:e2a8561e419ab1ba6b2fe6cbdf49fd92b95912df1cf7d313c3e2230a333fdbcc                                                                                                                                           0.0s
 => [internal] load metadata for docker.io/library/python:3.8-slim                                                                                                                                                                                                          0.0s
 => CACHED [srcs 1/2] FROM docker.io/library/python:3.8-slim                                                                                                                                                                                                                0.0s
 => [internal] load build context                                                                                                                                                                                                                                           3.1s
 => => transferring context: 1.02GB                                                                                                                                                                                                                                         3.0s
 => [deps 2/2] RUN     --mount=type=cache,target=/pex-root     --mount=target=/context     PEX_ROOT=/pex-root     PEX_TOOLS=1     /usr/local/bin/python3.8 /context/my-app.pex venv --scope deps /my-app                                                                    7.7s
 => [srcs 2/2] RUN     --mount=type=cache,target=/pex-root     --mount=target=/context     PEX_ROOT=/pex-root     PEX_TOOLS=1     /usr/local/bin/python3.8 /context/my-app.pex venv --scope srcs --compile /my-app                                                          1.8s
 => CACHED [compiled-deps 2/3] COPY --from=deps /my-app /my-app                                                                                                                                                                                                             0.0s
 => CACHED [compiled-deps 3/3] RUN /my-app/bin/python -m compileall /my-app                                                                                                                                                                                                 0.0s
 => CACHED [stage-3 2/3] COPY --from=compiled-deps /my-app /my-app                                                                                                                                                                                                          0.0s
 => [stage-3 3/3] COPY --from=srcs /my-app /my-app                                                                                                                                                                                                                          0.0s
 => exporting to image                                                                                                                                                                                                                                                      1.0s
 => => exporting layers                                                                                                                                                                                                                                                     1.0s
 => => writing image sha256:a841e55228271267640782b78c257556486c36c5b3e29eafc2f56e8aa35b1d36

Note that BuildKit is smart enough to run the [deps 2/2] and [srcs 2/2] RUN instructions in parallel, so the wasted time is 7.7s - 1.8s = 5.9s: the time spent building the unchanged deps venv, which gets thrown away when the new compiled-deps layer hits its cache.
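
The arithmetic above can be spelled out as a tiny model, assuming the two RUN steps overlap fully and that the srcs step has to run after any source edit regardless. The durations are the ones reported in the BuildKit output; the variable names are illustrative.

```python
# Durations reported by BuildKit for the two uncached RUN steps, in seconds.
deps_seconds = 7.7
srcs_seconds = 1.8

# Run in parallel, the pair costs as much wall time as the slower step.
parallel_wall = max(deps_seconds, srcs_seconds)

# The srcs step is unavoidable, so only the excess beyond it counts as waste.
wasted = round(parallel_wall - srcs_seconds, 1)
print(f"wasted wall time: {wasted}s")
```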

This may not be a good enough result to undo the utility of the new include_sources feature, but I just wanted to run this down since I was already uneasy about the include_requirements feature it extends the tradition of: #13894 (comment)

There are even more ways to do this, of course. On a Linux host the venv could be pre-split, for example into venv.deps/ and venv.srcs/, by running PEX_ROOT=~/.cache/pants/named_caches/pex_root PEX_TOOLS=1 /this/python/matching/container/major/minor dist/my.pex venv --scope {deps,srcs} context/venv.{deps,srcs}; those two venvs could then be used as the COPY inputs. This works using Pex + Docker, but presumably not using Pants, since Pants currently only deals in ./pants package outputs as inputs to its Docker integration IIUC. So maybe it's this PEX-as-Dockerfile-input limitation you're ultimately working around here.
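
The brace-expanded command above amounts to two invocations, one per scope. A small sketch that only assembles them (it does not run anything) might look like this; the python3.8 interpreter name stands in for "the Python matching the container", and venv_split_commands is a hypothetical helper.

```python
import os
import shlex


def venv_split_commands(python, pex, out_dir, pex_root):
    """Return (env, argv) pairs that would create one venv per --scope value."""
    env = {"PEX_ROOT": pex_root, "PEX_TOOLS": "1"}
    commands = []
    for scope in ("deps", "srcs"):
        argv = [python, pex, "venv", "--scope", scope, f"{out_dir}/venv.{scope}"]
        commands.append((env, argv))
    return commands


for env, argv in venv_split_commands(
    python="python3.8",  # assumption: must match the container's major.minor
    pex="dist/my.pex",
    out_dir="context",
    pex_root=os.path.expanduser("~/.cache/pants/named_caches/pex_root"),
):
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    print(prefix, shlex.join(argv))
```

To actually execute a pair, something like subprocess.run(argv, env={**os.environ, **env}, check=True) would do; the resulting context/venv.deps and context/venv.srcs directories are what the Dockerfile would COPY.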

@thejcannon
Member Author

Aside from mounting (which I'll have to muse about), we have similar findings.

I also mused about extracting the venv on the host and COPYing it, but decided against it since it bakes in additional assumptions.

In general, I think there's PEX the tool, with all its bells and whistles and levers and knobs. Then there's PEX, Pants' Python courier. In this instance "the user" doesn't care about the PEX as anything but transportation from repo to container. Maybe it's unfortunate we put all use cases in a single target?

@jsirois
Contributor

jsirois commented Jul 22, 2022

> Maybe it's unfortunate we put all use cases in a single target?

Perhaps, yes. That's what I was getting at with:

> This works using Pex + Docker, but presumably not using Pants since Pants currently only deals in ./pants package outputs as inputs to its Docker integration IIUC; so maybe it's this PEX as Dockerfile input limitation you're ultimately working around here.

If your intent as a Pants user is to get a runnable venv that accurately reflects tested code installed in a Docker container all using Pants, it seems having to reference a pex_binary at all as an end user is an abstraction leak.

jyggen pushed a commit to jyggen/pants that referenced this pull request Jul 27, 2022