
caching: Enable caching & loading of container images in Forklift's cache #245

Open
ethanjli opened this issue Jun 13, 2024 · 3 comments
ethanjli commented Jun 13, 2024

Currently, Forklift treats the Docker daemon's image storage as the only place where container images are cached. This means that Forklift can only download container images when the Docker daemon is running, and only when we have permission to talk to the Docker daemon (i.e. with root permissions or membership in the docker group).

If we want to pre-cache container images (e.g. to re-use them across many CI jobs without hitting Docker Hub's public rate limits), currently we temporarily download crane (and also rush) and run a shell script which uses them to download container images; then we run another shell script to load those images into the Docker daemon. Forcing the OS maintainer to include and maintain these scripts exposes a lot of complexity which we could instead hide inside Forklift (which would also allow that functionality to be reused much more conveniently).

We could modify the [dev] plt cache-img/stage cache-img subcommands so that they download all required container images to a local cache (e.g. /var/cache/forklift/downloads/docker-archives or ~/.cache/forklift/downloads/docker-archives) in a format which can be loaded into Docker.
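For example, here's a minimal sketch (not Forklift's actual code) of how such a subcommand could pull an image and save it as a docker-archive tarball without talking to the Docker daemon, using go-containerregistry (the library behind the crane CLI); the image ref and filename are illustrative:

```go
package main

import (
	"log"
	"path/filepath"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	ref := "docker.io/library/alpine:3.20" // stand-in for an image required by a pallet
	dest := filepath.Join("/var/cache/forklift/downloads/docker-archives", "alpine-3.20.tar")

	// Pull directly from the registry; no Docker daemon or docker group needed:
	img, err := crane.Pull(ref)
	if err != nil {
		log.Fatal(err)
	}
	// Save as a docker-archive tarball, the format which `docker load` accepts:
	if err := crane.Save(img, ref, dest); err != nil {
		log.Fatal(err)
	}
}
```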

Then we could add another subcommand (maybe cache load-img) to load cached images into Docker's image storage using https://pkg.go.dev/github.com/docker/docker/client#Client.ImageLoad. Maybe we should also have [dev] plt load-img and stage load-img subcommands to do the same thing, but only for cached images required by the pallet or staged pallet bundle?
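A rough sketch of that loading step, assuming the quiet-bool signature of Client.ImageLoad linked above (newer versions of the Docker client take functional options instead) and an illustrative cache path:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/docker/docker/client"
)

func loadCachedImage(ctx context.Context, path string) error {
	// This requires a running Docker daemon and permission to talk to it:
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return err
	}
	defer cli.Close()

	f, err := os.Open(path) // a docker-archive tarball from the Forklift cache
	if err != nil {
		return err
	}
	defer f.Close()

	resp, err := cli.ImageLoad(ctx, f, true) // quiet mode: suppress progress output
	if err != nil {
		return fmt.Errorf("couldn't load %s into Docker: %w", path, err)
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	path := "/var/cache/forklift/downloads/docker-archives/alpine-3.20.tar"
	if err := loadCachedImage(context.Background(), path); err != nil {
		log.Fatal(err)
	}
}
```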

It would also be useful if we could hide all the complexity currently at https://github.com/PlanktoScope/PlanktoScope/blob/0fde2af225f238380a27b07d5a31cdfa82a75402/.github/workflows/build-os.yml#L167 behind a GitHub Action for downloading (with caching) all container images required by a particular pallet.

For exporting files from OCI container images, ideally those container images should be downloaded into the container image cache, and files should be loaded from the container image cache for export. Currently they're downloaded and saved to ~/.cache/forklift/downloads/oci-image-fs-tarballs instead.
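For illustration, a sketch (hypothetical paths, not Forklift's actual implementation) of exporting a flattened filesystem tarball from a cached docker-archive file with go-containerregistry:

```go
package main

import (
	"log"
	"os"

	"github.com/google/go-containerregistry/pkg/crane"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

func main() {
	// Read the image from the container image cache instead of re-downloading it:
	img, err := tarball.ImageFromPath(
		"/var/cache/forklift/downloads/docker-archives/alpine-3.20.tar", nil)
	if err != nil {
		log.Fatal(err)
	}

	out, err := os.Create("rootfs.tar") // flattened filesystem, ready for file export
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	// Flatten all of the image's layers into a single filesystem tarball:
	if err := crane.Export(img, out); err != nil {
		log.Fatal(err)
	}
}
```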

@ethanjli ethanjli added the enhancement New feature or request label Jun 13, 2024
@ethanjli ethanjli self-assigned this Jun 14, 2024
@ethanjli ethanjli changed the title from "Enable caching & loading of container images in Forklift's cache" to "caching: Enable caching & loading of container images in Forklift's cache" Jun 14, 2024
@ethanjli ethanjli added this to the Backlog milestone Jun 14, 2024

ethanjli commented Dec 4, 2024

An alternative UX could be to make use of Forklift's container image cache optional, treating it more like a transparent/read-through cache than as the primary location where container images are stored: for example, maybe it's only used when there's no internet access; or it's preferentially used, with a fallback to the registry when an image isn't cached; and then there could be optional flags controlling whether to actually save images into the Forklift cache when running cache-img.
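A sketch of the "preferentially used, with a fallback" variant; cachePathFor is a hypothetical helper, and none of this reflects Forklift's actual code:

```go
package main

import (
	"fmt"
	"log"
	"path/filepath"
	"strings"

	"github.com/google/go-containerregistry/pkg/crane"
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// cachePathFor maps an image ref to a file in the Forklift cache (illustrative).
func cachePathFor(ref string) string {
	flat := strings.NewReplacer("/", "_", ":", "_").Replace(ref)
	return filepath.Join("/var/cache/forklift/downloads/docker-archives", flat+".tar")
}

func resolveImage(ref string) (v1.Image, error) {
	// Prefer the cache, so images still resolve without internet access:
	if img, err := tarball.ImageFromPath(cachePathFor(ref), nil); err == nil {
		return img, nil
	}
	// Cache miss: fall back to pulling from the registry over the network.
	img, err := crane.Pull(ref)
	if err != nil {
		return nil, fmt.Errorf("%s is neither cached nor pullable: %w", ref, err)
	}
	return img, nil
}

func main() {
	if _, err := resolveImage("docker.io/library/alpine:3.20"); err != nil {
		log.Fatal(err)
	}
}
```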

Or we could add a command to re-copy all needed container images from docker/containerd into the Forklift container image cache, and run that on first boot (after multi-user.target). This way, we can keep SD card images under 2 GB while still populating Forklift's container image cache for subsequent use.
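Such a command could boil down to something like this sketch, using the Docker client's ImageSave (assuming its two-argument form; recent client versions add variadic options); the image ref and destination path are stand-ins:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"

	"github.com/docker/docker/client"
)

// saveToCache streams one image out of the Docker daemon into a cache file.
func saveToCache(ctx context.Context, cli *client.Client, ref, dest string) error {
	rc, err := cli.ImageSave(ctx, []string{ref}) // emits a docker-archive stream
	if err != nil {
		return err
	}
	defer rc.Close()

	f, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, rc)
	return err
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	if err := saveToCache(
		context.Background(),
		cli,
		"docker.io/library/alpine:3.20", // stand-in for each image needed by the pallet
		"/var/cache/forklift/downloads/docker-archives/alpine-3.20.tar",
	); err != nil {
		log.Fatal(err)
	}
}
```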


ethanjli commented Jan 18, 2025

PlanktoScope/PlanktoScope#520 is relevant to this issue: it attempts to migrate off QEMU by running Docker natively in an unbooted systemd-nspawn container on an arm64 GitHub-hosted runner. If/when it's merged, we can just interface with docker and forget about anything related to containerd; that way, we don't need to configure Docker to use the containerd storage driver. The drawback is that we can only load container images when the Docker daemon is running, but I think the simplicity we gain is worth that tradeoff.


ethanjli commented Jan 24, 2025

Since it's relatively easy to read gzipped tar archives with dive (e.g. dive --source docker-archive <(gunzip -c container-image.tar.gz)), if gzip compression makes a big difference in the cached size of the container images, then it might be worth storing the archives as .tar.gz files to minimize the size of the container image cache in any OS images (such as the PlanktoScope OS SD card images). On the other hand, leaving them uncompressed might enable better deduplication of data when the overall OS image file is compressed; that should be tested with PlanktoScope OS SD card images before a final decision is made.
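If compression does win, here's a sketch of writing the gzipped variant with go-containerregistry (the image ref and output path are illustrative; docker load also accepts gzipped archives):

```go
package main

import (
	"compress/gzip"
	"log"
	"os"

	"github.com/google/go-containerregistry/pkg/crane"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

func main() {
	ref := "docker.io/library/alpine:3.20" // stand-in for a cached image
	img, err := crane.Pull(ref)
	if err != nil {
		log.Fatal(err)
	}
	tag, err := name.NewTag(ref)
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Create("container-image.tar.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	gz := gzip.NewWriter(f) // compress the docker-archive stream as we write it
	defer gz.Close()        // runs before f.Close(), flushing the gzip trailer

	if err := tarball.Write(tag, img, gz); err != nil {
		log.Fatal(err)
	}
}
```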
