-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make API image smaller by not including dev dependencies #4285
Conversation
It's a 7.5% reduction, I'd say that's pretty meaningful. That said, I'm having a hard time following how this works based on the diff, but I think I need to look at the broader CI/CD workflow build to understand when Instead of building two images, I wonder if we could always The general idea would be, always only build a production image, and then use entrypoint/command changes in the compose stack to augment the production image with development dependencies. That's a lot easier to understand than two distinct builds, at least to me. It's a layered rather than a branched approach. |
I will have to update the CI+CD docs. What you're saying @sarayourfriend is completely true and this PR does make the workflow quite a lot more complex that it previous was (which was already a lot). I wanted to not put time into that if the team found the size reduction quite less and not worth pursuing.
That is correct. In the CI+CD workflow, we build all the images we would need in
Whenever the API needs to be tested, we build Overall, I agree with your review, that the dev dependencies should be added in another layer but in our current setup, the issue with building a layered image is that we do not include PDM in the final image (as an agressive optimisation, and just the |
In the context of the local compose stack, we map the api directory into the container, so the pdm lockfile will be available. But I take your point that PDM itself isn't available in the "prod" build, so it's irrelevant whether the lockfile is. After reviewing the CI/CD workflow again, and in particular refreshing my understanding of the difference between the The only way to solve that would be to have a single image, with the development environment augmented in the compose context. For what it's worth, I don't see anything wrong with doing If we wanted something "pure" for at least some tests, it's worth considering whether those integration tests could be extracted into a simpler PDM project that can be pointed to any API instance, rather than running them inside the container. Then at least we could theoretically run a significant number of tests against a "pure" container, unmodified from the image that we'd push to a live environment. That's an improvement over our current testing, which always happens in an environment where the development dependencies are available. I suppose we don't actually even know whether we have a strict separation between development and non-development dependencies with our current setup (at least not in any way that's confirmed by automated testing). If there is simply no way to avoid building two images, then there's no justification for this that makes sense. The published image is smaller, but at the cost of a nearly actually doubling of the data and build time... and that on every push to a branch. That simply doesn't compare to the number of times the image is downloaded, which would be the only thing improving with this approach. |
If we build a dev API image in the only place it's used, we can skip the artifact upload, download and cache steps and at the cost of some DRYness, reduce the artifact volume. I'm interested in your thoughts on 2d9dca2. |
It looks good to me. Am I correct in understanding that the basic difference is we wouldn't upload/load the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this increase in complexity. If there are additional improvements we can make towards reusing the layers through shared caching or other approaches, that's fine. Even ignoring the size improvements, it is a basic security improvement not to include dependencies that aren't required for the deployed application. In that sense, this is a win to me regardless of whether it ultimately increases the volume of images/bytes-on-disk/bytes-on-wire.
Yes @sarayourfriend. The The production image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting discussion in this thread! I agree with the conclusions made here and support the change. Code LGTM. Do you think there's anywhere we should document this change?
Yes, there is some documentation about the CI + CD workflow that needs updating. I'll update the docs, push to this PR and then merge it. |
Full-stack documentation: https://docs.openverse.org/_preview/4285 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. Changed files 🔄: |
Fixes
Fixes #1008 by @krysal
Description
This PR separates the API image into
api
andapi_dev
, the former of which is published to GHCR and does not include dev-dependencies, while the latter is used in CI and includes dev-dependencies.It also updates the
set_matrix_images.py
code to enforce consistency in the outputs so that fallbacks and null-handling is not needed inci.yml
.Testing Instructions
api
andapi_dev
.api_dev
to run the tests (proven by the CI passing, the prod image does not have pytest).api
(cannot test without merging the PR unfortunately).Checklist
Update index.md
).main
) or a parent feature branch.just catalog/generate-docs
for catalogPRs) or the media properties generator (
just catalog/generate-docs media-props
for the catalog or
just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin