cache backend for GitHub Actions #1947

Closed
tonistiigi opened this issue Jan 16, 2021 · 4 comments · Fixed by #1974

@tonistiigi
Member

Currently, a common method to reuse build cache in GitHub Actions is to use the cache action together with the local cache exporter/importer.

This is inefficient because the full cache needs to be saved and loaded every time, and tracking does not happen per blob. It also leads to oddities like double compression, because the two systems do not know about each other.

There is also currently a problem in the local exporter where the cache is appended to, meaning the directory keeps growing unless it is manually replaced. That should be fixed separately.

We can attempt to write a special cache backend for the GitHub Actions case where layer blobs are written to the cache service in a similar way to how they are currently written to the registry for the remote cache.

I started writing a Go library that allows writing and loading blobs from the cache service: https://github.com/tonistiigi/go-actions-cache There are no docs AFAICS, so it is reverse engineered from the TypeScript implementation.
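For reference, a minimal sketch of the "reserve" call as reverse engineered so far; the endpoint path, the api-version header and the JSON shape are assumptions based on the TypeScript @actions/cache code, not a documented API:

```go
// Minimal sketch of the reverse-engineered "reserve" call. ACTIONS_CACHE_URL
// and ACTIONS_RUNTIME_TOKEN are injected by the runner; the endpoint path,
// api-version header and JSON shape are assumptions based on the TypeScript
// @actions/cache implementation, not a documented API.
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

func reserve(key, version string) (*http.Response, error) {
	base := os.Getenv("ACTIONS_CACHE_URL")      // only set inside a workflow run
	token := os.Getenv("ACTIONS_RUNTIME_TOKEN") // only exposed to actions, not inline scripts

	body := strings.NewReader(fmt.Sprintf(`{"key":%q,"version":%q}`, key, version))
	req, err := http.NewRequest("POST", base+"_apis/artifactcache/caches", body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json;api-version=6.0-preview.1")
	// On success the response carries a cacheId that the follow-up chunk
	// uploads (PATCH) and the final commit (POST) refer to.
	return http.DefaultClient.Do(req)
}

func main() {
	resp, err := reserve("buildkit-blob-sha256-deadbeef", "1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("reserve status:", resp.Status)
}
```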

Problems/Open topics:

  • When parallel jobs are exporting cache together, there needs to be some kind of synchronization mechanism. Not sure what the best method to achieve locking is.
  • The GitHub cache service can not be accessed when doing local builds, even if you provide your personal GitHub token. So it is not a replacement for sharing cache from your CI with your users.
  • Tokens for the cache service seem to be exposed only to actions and not to inline workflow scripts. This means we can use it in docker/build-push-action but not when docker buildx is run inline. Maybe we need a helper action that just makes the tokens available to the next commands. I don't see a security implication.
  • Ideally, the cache should use zstd compression by default as gzip is very slow. This requires some low-level changes, as multiple compression variants for a blob are not currently allowed. Because gzip is not deterministic, once a gzip blob has been created it has to be used in the cache as it can't be recreated (see the sketch after this list).
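
To illustrate the gzip determinism point in the last bullet, a small sketch (standard library only; a real zstd path would pull in something like github.com/klauspost/compress/zstd): the same uncompressed layer gives different compressed bytes, and therefore a different digest, depending on compressor settings or implementation.

```go
// Same layer bytes, different gzip settings: the compressed digest differs,
// so it can only be known once a particular gzip blob has been created.
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
)

func gzipDigest(data []byte, level int) string {
	var buf bytes.Buffer
	w, _ := gzip.NewWriterLevel(&buf, level)
	w.Write(data)
	w.Close()
	return fmt.Sprintf("sha256:%x", sha256.Sum256(buf.Bytes()))
}

func main() {
	layer := bytes.Repeat([]byte("layer data "), 1<<16)
	// A different level (or a different gzip implementation) produces
	// different compressed bytes for the same content, hence a different
	// content-addressed digest.
	fmt.Println(gzipDigest(layer, gzip.BestSpeed))
	fmt.Println(gzipDigest(layer, gzip.BestCompression))
}
```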

@crazy-max
Member

crazy-max commented Jan 16, 2021

@tonistiigi I've created a helper action to expose the ACTIONS_ vars injected into the docker/node context: https://github.com/crazy-max/ghaction-github-runtime/runs/1714290329?check_suite_focus=true#step:5:8

@tonistiigi
Member Author

Pushed new commits to the library. Fixed some bugs and added (fake) mutable blob support that is needed to upload the cache manifest and update it later. The atomic guarantees of the "reserve API" seem to be good enough for the synchronization issue atm. One problem, though, is that there does not seem to be a way to handle upload errors and release the reserved key so it can be reused again. I added some retry and timeout logic to cover these cases.
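
Roughly what that retry/timeout around the reservation looks like, as a sketch; reserveBlob and the conflict error here are placeholders, not the actual go-actions-cache API:

```go
// Sketch of retry/timeout around a cache-key reservation, relying on the
// reserve call being atomic so that only one parallel job wins the key.
// reserveBlob and errKeyReserved are placeholders, not the real
// go-actions-cache API.
package cachelock

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errKeyReserved = errors.New("key already reserved by another job")

// reserveBlob stands in for the real reservation call against the cache
// service; it fails while another job holds the key.
func reserveBlob(ctx context.Context, key string) (int, error) {
	// ... POST to the reserve endpoint, return the cache entry id ...
	return 0, errKeyReserved
}

func reserveWithRetry(ctx context.Context, key string) (int, error) {
	// If an earlier upload died without releasing the key, the reservation
	// stays stuck; the overall timeout is what eventually unblocks us.
	ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
	defer cancel()
	for {
		id, err := reserveBlob(ctx, key)
		if err == nil {
			return id, nil
		}
		if !errors.Is(err, errKeyReserved) {
			return 0, err
		}
		select {
		case <-ctx.Done():
			return 0, fmt.Errorf("gave up waiting for key %q: %w", key, ctx.Err())
		case <-time.After(5 * time.Second):
			// another job may still be uploading this key; retry
		}
	}
}
```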

@crazy-max
Member

I just took a look at your commits and it looks great, thanks!

I wonder about the maximum size accepted for the cache at the API level. From what I've read, the total size of all caches in a repository is limited to 5 GB, so if one cache entry exceeds 5 GB it should fail. I haven't found any implementation reflecting this in the GitHub toolkit repo though.

Maybe we should add a test for that and see how the API behaves?

@tonistiigi
Member Author

@crazy-max Yes, we could check it once to see how it fails and whether the error is handled cleanly. I also haven't tested what would happen if you, for example, push 3x2 GB. Does the first blob get deleted right away? Is it always the oldest blob that gets deleted, or are there some internal stats?
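
A throwaway test along these lines could answer both questions; uploadBlob is a placeholder for whatever reserve/upload/commit entry point the library ends up with:

```go
// Sketch of a one-off test to see how the cache service behaves once the
// 5 GB per-repository limit is exceeded. uploadBlob is a placeholder for
// the library's reserve + chunked upload + commit path, not its real API.
package cachelimit

import (
	"bytes"
	"context"
	"fmt"
	"testing"
)

func uploadBlob(ctx context.Context, key string, data []byte) error {
	// ... would talk to the real cache service here ...
	return nil
}

func TestExceedCacheLimit(t *testing.T) {
	ctx := context.Background()
	blob := bytes.Repeat([]byte{0xaa}, 2<<30) // 2 GiB of filler per entry
	// Push 3x2 GiB: the third upload takes the repository past the 5 GB
	// limit, so we can see whether the API fails cleanly, evicts the oldest
	// entry, or does something else entirely.
	for i := 0; i < 3; i++ {
		if err := uploadBlob(ctx, fmt.Sprintf("limit-test-%d", i), blob); err != nil {
			t.Logf("upload %d failed: %v", i, err)
		}
	}
}
```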
