Docker multi-stage build Dockerfile best practices #1178

Closed
2 tasks done
hozn opened this issue Jun 20, 2019 · 15 comments

@hozn

hozn commented Jun 20, 2019

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Question

I continue to try to replace our pip/setuptools-based system with poetry, but hit a new snag when it comes to how we build our Docker images.

Here's the basic pattern we use for our Docker images, in a build and a deploy stage:

  1. (build) Resolve all dependencies and build wheels for them
  2. (build) Build the actual project as a wheel too
  3. (deploy) Take all of those wheels we built and install them into a lightweight image (that has no build tools)

Here's how this translates into a Dockerfile:

# ---------------------------------------------------------------------------------------
# BUILD
# ---------------------------------------------------------------------------------------

FROM gitlab.example.com:4567/namespace/build-base:py37-1.0.4 as builder

RUN mkdir -p /build/wheels

# This is separated out to take advantage of caching
ADD requirements.txt /tmp/requirements.txt

RUN pip3.7 wheel --trusted-host pypi.example.com  \
    --wheel-dir=/tmp/python-wheels --index-url http://pypi.example.com/simple/ \
    -r /tmp/requirements.txt

ADD . /src
WORKDIR /src

RUN pip3.7 wheel --find-links /tmp/python-wheels --trusted-host=pypi.example.com --wheel-dir=/build/wheels .

# ---------------------------------------------------------------------------------------
# DEPLOY
# ---------------------------------------------------------------------------------------

FROM gitlab.example.com:4567/namespace/deploy-base:py37-1.0.0 as deploy

WORKDIR /opt/app

# Copy the already-built wheels
COPY --from=builder /build/wheels /tmp/wheels

# Install into main system python.
RUN pip3.7 install --no-cache-dir /tmp/wheels/* && rm -rf /tmp/wheels

CMD ["myproject-server"]

How do I do this with poetry? -- in the most short-sighted form, I'd like to know how to collect all dependencies as wheels in order to match this pattern.

However, my real requirement here is just to have separate build and deploy stages where the deploy image has no python (or lower-level) build-related tools installed (but does have pip) and simply takes artifacts from the build image.

(I suppose one idea would be to treat the entire virtualenv from the build stage as an artifact? That seems a little dirty, but provided the base OS images were the same, might work?)

@hozn hozn changed the title Docker best practices / collect dependency wheels? Dockerfile multi-stage build best practices Jun 20, 2019
@hozn hozn changed the title Dockerfile multi-stage build best practices Docker multi-stage build Dockerfile best practices Jun 20, 2019
@dojeda

dojeda commented Jul 24, 2019

I too have this exact use case (the short-sighted one: use an intermediate image to generate wheels); I could not find how to do this with poetry. I haven't been able to create wheels of all dependencies, but creating a wheel of my main package does work with poetry build --format wheel.
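For later readers: one way to collect every dependency as a wheel is to export the lock file to a requirements file and hand that to `pip wheel`. The sketch below assumes a Poetry version that ships `poetry export` (it did not exist when this comment was written, and from Poetry 2.0 it comes from the separate `poetry-plugin-export` package). The image tags, paths, and the `myproject-server` entry point are placeholders, not a tested setup.

```dockerfile
# Sketch only: image tags and paths are placeholders, not taken from the thread.
FROM python:3.11 AS builder

# poetry-plugin-export provides `poetry export`; installing it explicitly is
# harmless on Poetry versions that already bundle it.
RUN pip install --no-cache-dir poetry poetry-plugin-export

WORKDIR /src

# Copy only the dependency metadata first so this layer stays cached
# until pyproject.toml or poetry.lock changes.
COPY pyproject.toml poetry.lock ./
RUN poetry export --format requirements.txt --output /tmp/requirements.txt \
    && pip wheel --wheel-dir=/build/wheels -r /tmp/requirements.txt

# Build the project itself as a wheel (poetry writes it to ./dist).
COPY . .
RUN poetry build --format wheel \
    && cp dist/*.whl /build/wheels/

FROM python:3.11-slim AS deploy

COPY --from=builder /build/wheels /tmp/wheels

# --no-index keeps pip away from any package index; everything comes from /tmp/wheels.
RUN pip install --no-cache-dir --no-index --find-links=/tmp/wheels /tmp/wheels/*.whl \
    && rm -rf /tmp/wheels

CMD ["myproject-server"]
```

Both stages use Debian-based images here so that wheels with compiled extensions built in the first stage remain loadable in the second.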

@hozn
Author

hozn commented Jul 24, 2019

Yeah, this and a couple other issues caused us to switch back to good-old pip/setuptools, which at least provides the flexibility to support our development workflow -- if a bit less elegant.

@dpraul

dpraul commented Aug 2, 2019

We've had relatively good success copying virtualenvs between images with pip. I'm just beginning to see if we can transition to poetry - we've just been using it for small utilities right now but we're using the following code to do what you're describing:

FROM python:3.7.4-slim as python-base
ENV PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_PATH=/opt/poetry \
    VENV_PATH=/opt/venv \
    POETRY_VERSION=0.12.17
ENV PATH="$POETRY_PATH/bin:$VENV_PATH/bin:$PATH"

FROM python-base as poetry
RUN apt-get update \
    && apt-get install --no-install-recommends -y \
        # deps for installing poetry
        curl \
        # deps for building python deps
        build-essential \
    \
    # install poetry - uses $POETRY_VERSION internally
    && curl -sSL https://raw.githubusercontent.com/sdispater/poetry/master/get-poetry.py | python \
    && mv /root/.poetry $POETRY_PATH \
    && poetry --version \
    \
    # configure poetry & make a virtualenv ahead of time since we only need one
    && python -m venv $VENV_PATH \
    && poetry config settings.virtualenvs.create false \
    \
    # cleanup
    && rm -rf /var/lib/apt/lists/*

COPY poetry.lock pyproject.toml ./
RUN poetry install --no-interaction --no-ansi -vvv

FROM python-base as runtime
WORKDIR /app

COPY --from=poetry $VENV_PATH $VENV_PATH
COPY . ./

ENTRYPOINT ["python", "-m", "app"]

Haven't figured out a clean way to work with prod vs. dev dependencies yet.
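On the prod vs. dev question, Poetry can skip development dependencies at install time. A minimal sketch of how the install step in the `poetry` stage above might look; which flag applies depends on the Poetry version, so treat this as an assumption to check against the version you pin:

```dockerfile
# Poetry before 1.2 (including the 0.12.x pinned above): skip dev dependencies.
RUN poetry install --no-dev --no-interaction --no-ansi

# Poetry 1.2 and later deprecate --no-dev in favour of dependency groups:
# RUN poetry install --only main --no-interaction --no-ansi
```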

@hozn
Author

hozn commented Aug 2, 2019

Thanks, @dpraul , this is a very helpful reference.

@dpraul

dpraul commented Aug 2, 2019

No problem! Disclaimer: not heavily tested, so YMMV. Open to suggestions for how to improve it.

@stale

stale bot commented Nov 13, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 13, 2019
@stale

stale bot commented Nov 20, 2019

Closing this issue automatically because it has not had any activity since it has been marked as stale. If you think it is still relevant and should be addressed, feel free to open a new one.

@abersheeran

https://github.com/Docker-s-IMAGES/no-pypoetry

FROM abersh/no-pypoetry as requirements

FROM python:3.7

# ... your own commands

COPY --from=requirements /src/requirements.txt .

RUN pip install -r requirements.txt

# ... your own commands

You can use this multi-stage Dockerfile so that you don't need to install Poetry in your final image. This avoids installing many unnecessary dependencies.
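The same idea also works without a third-party image by running the export in a throwaway stage of your own. A sketch, assuming the installed Poetry provides `poetry export` (from Poetry 2.0 this needs the separate `poetry-plugin-export` package); the stage name mirrors the example above and the rest is illustrative:

```dockerfile
# Throwaway stage: only used to turn poetry.lock into requirements.txt.
FROM python:3.7 AS requirements
RUN pip install --no-cache-dir poetry
WORKDIR /src
COPY pyproject.toml poetry.lock ./
RUN poetry export --format requirements.txt --output requirements.txt

# Final image: plain pip, no Poetry installed.
FROM python:3.7
COPY --from=requirements /src/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```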

@foosinn

foosinn commented Dec 21, 2021

Hey,

I found this solution works best for me:

FROM python:3.9-alpine AS builder
WORKDIR /app
ADD pyproject.toml poetry.lock /app/

RUN apk add build-base libffi-dev
RUN pip install poetry
RUN poetry config virtualenvs.in-project true
RUN poetry install --no-ansi

# ---

FROM python:3.9-alpine
WORKDIR /app

COPY --from=builder /app /app
ADD . /app

RUN addgroup -g 1000 app
RUN adduser app -h /app -u 1000 -G 1000 -DH
USER 1000

# change this to match your application; a Dockerfile honours only the last CMD,
# so keep exactly one of these
CMD /app/.venv/bin/python -m module_name
# or
# CMD /app/.venv/bin/python app.py

Don't forget a .dockerignore:

.git/
__pycache__/
**/__pycache__/
*.py[cod]
*$py.class

Ticks all my boxes:

  • no need for a requirements file
  • the virtualenv is managed by poetry
  • no poetry in the final image
  • application and venv contained in one folder
  • python application cannot write to its files or the virtualenv
  • virtualenv is only rebuilt if pyproject.toml or poetry.lock change

Just make sure to use the same path in the builder and the final image, because the virtualenv contains some hardcoded paths. Change the CMD to match your application.

EDIT: Updated to reflect some of @alexpovel's critiques
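One caveat for newer Poetry releases: with only `pyproject.toml` and `poetry.lock` present in the builder, `poetry install` also tries to install the project itself, which has not been copied yet. Recent versions may warn or fail in that case. A hedged variant of the builder stage passes `--no-root` so that only the dependencies are installed; the rest mirrors the Dockerfile above:

```dockerfile
FROM python:3.9-alpine AS builder
WORKDIR /app
COPY pyproject.toml poetry.lock /app/

RUN apk add --no-cache build-base libffi-dev
RUN pip install --no-cache-dir poetry
RUN poetry config virtualenvs.in-project true
# --no-root: install dependencies only, skip the (not yet copied) project package.
RUN poetry install --no-ansi --no-root
```

Because the application is started from `/app` with the venv's interpreter, the project code does not need to be installed into the venv. If a `.venv` directory exists locally, adding `.venv/` to `.dockerignore` keeps `ADD . /app` from overwriting the one copied from the builder.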

@alexpovel
Contributor

alexpovel commented Sep 6, 2022

Building on top of @foosinn 's approach:

# Global ARG, available to all stages (if renewed)
ARG WORKDIR="/app"

FROM python:3.10 AS builder

# Renew (https://stackoverflow.com/a/53682110):
ARG WORKDIR

# Don't buffer `stdout`:
ENV PYTHONUNBUFFERED=1
# Don't create `.pyc` files:
ENV PYTHONDONTWRITEBYTECODE=1

RUN pip install poetry && poetry config virtualenvs.in-project true

WORKDIR ${WORKDIR}
COPY . .

RUN poetry install --only main

FROM python:3.10-alpine

ARG WORKDIR

WORKDIR ${WORKDIR}

COPY --from=builder ${WORKDIR} .

# For options, see https://boxmatrix.info/wiki/Property:adduser
RUN adduser app -DHh ${WORKDIR} -u 1000
USER 1000

# App-specific settings:
EXPOSE 8080
ENTRYPOINT [ "./.venv/bin/python", "-m", "ancv" ]
CMD [ "serve", "api", "--port", "8080" ]

from here. As opposed to simply using python:3.10 (1GB) or python:3.10-slim and then installing build tools (c. 600 MB) manually, this now lands at 200 MB. A decent win. python:3.10-alpine is c. 50 MB by itself, so in this example, I am still 150 MB above that baseline, but it's good enough.

Changes from the previous post:

  • global ARG makes it DRY, only having to specify /app once

  • I reckon using python:3.10 to build is easier than pulling some slim or alpine version, then installing build tools manually:

    • all build tools are already there, no need to guess (do I need gcc and/or g++?)
    • caching and layering work better; python:3.10 etc. are probably cached and used a ton the world over, whereas RUN install build-tools and friends will always have to run and won't be easily cached
  • development and other dependencies aren't installed, --only main (the core) is; this requires poetry v1.2.0 and higher. This cuts a lot of build time in my case

  • didn't bother with being surgical with COPY, just COPY . .; I prefer excluding using a .dockerignore to filter COPY. It's more maintainable. For example, you wouldn't want to copy __pycache__ from your package on disk into the image, yet that's gonna happen with any COPY anyway. So you'd want a .dockerignore anyway probably. It looks similar to a .gitignore (but don't quote me exactly how it works...), for example:

    .git/
    # Byte-compiled / optimized / DLL files
    __pycache__/
    **/__pycache__/
    *.py[cod]
    *$py.class
  • I believe the -g flag to busybox's adduser isn't what it looks like: it sets the GECOS field, not the group ID. That's -G, but group 1000 doesn't exist, so -G 1000 errors out. It's not needed anyway: it seems the group ID is then set automatically:

    ~ $ id
    uid=1000(app) gid=1000(app)

@neersighted
Member

Thanks for bumping this -- the pattern you describe here is another one to polish and roll up into the work described at #6398 (which is mostly about containers despite ostensibly being not specific to them, as containers are usually the only place install into system site-packages comes up).

@foosinn

foosinn commented Sep 7, 2022

@alexpovel thanks for your input. I've updated the points where I fully agree.

thanks a lot!

@max-pfeiffer

max-pfeiffer commented Oct 8, 2022

I provide two docker images that you can use as builder base for multistage builds. They also contain an example application which demonstrates multistage builds with Poetry:

I aim for the best practices there. Please check them out and leave a comment if you have additional suggestions.

@willbush

willbush commented Oct 4, 2023

@foosinn thanks, that helped me a lot. I had to modify the adduser / addgroup lines, and I adapted mine for Flask.

FROM python:3.10-alpine AS builder

WORKDIR /app

ADD pyproject.toml poetry.lock /app/

RUN apk add build-base libffi-dev

RUN pip install poetry
RUN poetry config virtualenvs.in-project true
RUN poetry install --no-ansi

FROM python:3.10-alpine
WORKDIR /app

COPY --from=builder /app /app
ADD . /app

# Create a new group `app` with Group ID `1000`.
RUN addgroup --gid 1000 app
# Create a new user `app`, set its home directory to `/app` and User ID to
# `1000`, and put it in the group `app`. `-D` skips assigning a password and
# `-H` skips creating the home directory.
RUN adduser app -h /app -u 1000 -G app -DH
# Change the user for subsequent commands in Dockerfile to the user with ID
# `1000`.
USER 1000

CMD ["/app/.venv/bin/gunicorn", "--bind", ":80", "app:app"]

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024