caching: Enable caching & loading of container images in Forklift's cache #245
An alternative UX could be to make use of Forklift's container image cache optional, treating it more like a transparent/read-through cache than as the primary location where container images are stored: for example, maybe it's only used if there's no internet access; or it's preferentially used, but with a fallback when there's no internet; and then there could be optional flags about whether to actually save images in the Forklift cache when running the relevant commands. Or we could add a command to re-copy all needed container images from docker/containerd into the Forklift container image cache, and run that on first boot (after multi-user.target). This way, we can keep SD card images under 2 GB while still populating Forklift's container image cache for subsequent use.
PlanktoScope/PlanktoScope#520 is relevant to this issue: it attempts to migrate off QEMU by running Docker natively in an unbooted systemd-nspawn container on an arm64 GitHub-hosted runner. If/when it's merged, we can just interface with Docker and forget about anything related to containerd; that way, we don't need to configure Docker to use the containerd storage driver. The drawback is that we can only load container images while the Docker daemon is running, but I think that's worth the simplicity we get that way.
If gzip compression makes a big difference in the cached size of the container images, we could store the cached archives gzipped, since it's relatively easy to read gzipped tar archives with dive (e.g. …); a rough sketch of gzipped caching follows.
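For illustration, here's a minimal Go sketch of what gzipped caching could look like, using the go-containerregistry library that crane is built on (the package and helper names here are hypothetical, not existing Forklift code):

```go
// Package imgcache is a hypothetical package name, for illustration only.
package imgcache

import (
	"compress/gzip"
	"os"

	"github.com/google/go-containerregistry/pkg/crane"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// SaveGzippedArchive pulls an image and writes it to path as a
// gzip-compressed docker-archive tarball. (Hypothetical helper.)
func SaveGzippedArchive(src, path string) error {
	img, err := crane.Pull(src)
	if err != nil {
		return err
	}
	ref, err := name.ParseReference(src)
	if err != nil {
		return err
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zw := gzip.NewWriter(f)
	// tarball.Write streams the image in docker-archive format, so
	// compression happens on the fly without a temporary uncompressed file.
	if err := tarball.Write(ref, img, zw); err != nil {
		return err
	}
	return zw.Close()
}
```

Note that `docker load` accepts gzip-compressed archives directly, so the cached files wouldn't need to be decompressed before loading.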
Currently, Forklift treats the Docker daemon's image storage as the only place where container images are cached. This means that Forklift can only download container images when the Docker daemon is running, and only when we have permissions to talk to the Docker daemon (i.e. with `root` permissions or membership in the `docker` user group).

If we want to pre-cache container images (e.g. to re-use them across many CI jobs without hitting Docker Hub's public rate limits), currently we temporarily download crane (and also rush), run a shell script to download container images, and then run another shell script to load those container images into the Docker daemon. Forcing the OS maintainer to include and maintain these scripts exposes a lot of complexity which we could instead hide in Forklift (and which would allow that functionality to be reused much more conveniently).
We could modify the `[dev] plt cache-img`/`stage cache-img` subcommands so that they download all required container images to a local cache (e.g. `/var/cache/forklift/downloads/docker-archives` or `~/.cache/forklift/downloads/docker-archives`) in a format which can be loaded into Docker (see the sketch after this list):

- `crane pull` does what we need (and is ideal, since we already depend on crane); if we want to stop using crane, then we'd have to wrap skopeo or https://github.com/containers/image - though to get static builds of Forklift with either of those two options, we'd need to build with dynamically-linked dependencies disabled (see https://github.com/containers/skopeo/blob/main/install.md#building-a-static-binary for additional details). On the other hand, if we can use skopeo instead of crane, then maybe we can store container images with deduplication of shared layers (i.e. in the `containers-storage` format rather than the `docker-archive` format) to save disk space.
- We could also (perhaps optionally) have the `cache-img` subcommands attempt to load the cached images into the Docker daemon.
- This relates to "`cache rm-img` shouldn't delete images which might be needed" (#228). In this case, we'd probably want `cache rm-img` and `cache rm-all` to only touch Forklift's cache of downloaded container images, and then we'd add a new `host prune-img` command to touch the Docker daemon's image storage.
- We may also want a way to cache images for a CPU architecture other than the one `forklift` was compiled for. This could be an `--override-arch` flag on the `cache-img` subcommands.
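As a sketch of what the download step could look like in Forklift's own Go code if we stay with crane (the helper, its archive-naming scheme, and the platform handling are illustrative assumptions, not the actual Forklift API):

```go
// Package imgcache is a hypothetical package name, for illustration only.
package imgcache

import (
	"path/filepath"
	"strings"

	"github.com/google/go-containerregistry/pkg/crane"
	v1 "github.com/google/go-containerregistry/pkg/v1"
)

// CacheImage pulls an image (optionally for a different CPU architecture,
// as a potential --override-arch flag might) and saves it under cacheDir as
// a docker-archive tarball. (Hypothetical helper.)
func CacheImage(src, cacheDir, overrideArch string) (string, error) {
	var opts []crane.Option
	if overrideArch != "" {
		opts = append(opts, crane.WithPlatform(&v1.Platform{
			OS:           "linux",
			Architecture: overrideArch,
		}))
	}
	img, err := crane.Pull(src, opts...)
	if err != nil {
		return "", err
	}
	// Derive a filesystem-safe archive name from the image reference.
	base := strings.NewReplacer("/", "_", ":", "_").Replace(src)
	path := filepath.Join(cacheDir, base+".tar")
	// crane.Save writes the image in docker-archive format, which
	// `docker load` (and the Docker API's image-load endpoint) can consume.
	if err := crane.Save(img, src, path); err != nil {
		return "", err
	}
	return path, nil
}
```

Nothing here talks to the Docker daemon, which is the point: this step could run without `root` permissions, without the `docker` group, and without the daemon running at all.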
Then we could add another subcommand (maybe `cache load-img`) to load cached images into Docker's image storage using https://pkg.go.dev/github.com/docker/docker/client#Client.ImageLoad (sketched below). Maybe we should also have `[dev] plt load-img` and `stage load-img` subcommands to do the same thing, but only for the cached images required by the pallet or staged pallet bundle?

It would also be useful if we could hide all the complexity currently at https://github.com/PlanktoScope/PlanktoScope/blob/0fde2af225f238380a27b07d5a31cdfa82a75402/.github/workflows/build-os.yml#L167 in a GitHub Action for downloading (with caching) all container images required by a particular pallet.
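And a sketch of the loading side via the Docker client API linked above (the helper is hypothetical, and the exact `ImageLoad` signature has shifted between docker/docker versions, so treat this as an approximation):

```go
// Package imgcache is a hypothetical package name, for illustration only.
package imgcache

import (
	"context"
	"io"
	"os"

	"github.com/docker/docker/client"
)

// LoadCachedImage streams a docker-archive tarball from Forklift's cache
// into the Docker daemon's image storage. (Hypothetical helper.)
func LoadCachedImage(ctx context.Context, archivePath string) error {
	cli, err := client.NewClientWithOpts(
		client.FromEnv, client.WithAPIVersionNegotiation(),
	)
	if err != nil {
		return err
	}
	defer cli.Close()

	f, err := os.Open(archivePath)
	if err != nil {
		return err
	}
	defer f.Close()

	// ImageLoad is the programmatic equivalent of `docker load`.
	resp, err := cli.ImageLoad(ctx, f, true)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Drain the response so the daemon finishes processing the archive.
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}
```

Because this goes through the daemon's API socket, it has the same constraints as today: it only works while the daemon is running and we have permission to talk to it - which is why it makes sense for the download step above to be separate from the load step.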
For exporting files from OCI container images, ideally those container images should be downloaded into the container image cache, and files should be loaded from the container image cache for export. Currently they're downloaded and saved to `~/.cache/forklift/downloads/oci-image-fs-tarballs` instead. A sketch of reading exports out of the image cache follows.
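Here's a rough sketch of how file export could read from cached docker-archive tarballs as above, again using go-containerregistry (the helper is hypothetical):

```go
// Package imgcache is a hypothetical package name, for illustration only.
package imgcache

import (
	"archive/tar"
	"io"
	"os"
	"strings"

	"github.com/google/go-containerregistry/pkg/crane"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// ExportFileFromCachedImage opens a cached docker-archive, flattens the
// image's filesystem, and copies one file out. (Hypothetical helper.)
func ExportFileFromCachedImage(archivePath, innerPath string, out io.Writer) error {
	// A nil tag works when the archive contains exactly one image.
	img, err := tarball.ImageFromPath(archivePath, nil)
	if err != nil {
		return err
	}
	pr, pw := io.Pipe()
	go func() {
		// crane.Export writes the image's flattened root filesystem
		// as a tar stream.
		pw.CloseWithError(crane.Export(img, pw))
	}()
	tr := tar.NewReader(pr)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if strings.TrimPrefix(hdr.Name, "/") == strings.TrimPrefix(innerPath, "/") {
			_, err := io.Copy(out, tr)
			return err
		}
	}
	return os.ErrNotExist
}
```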