Add an "eager fetch/validate" mode to remote caching #11330

Closed
stuhood opened this issue Dec 17, 2020 · 0 comments · Fixed by #11396
stuhood commented Dec 17, 2020

When using only remote caching (and not remote execution), the single Store instance is configured with a remote backend as well as a local backend. This means that all usage of the Store (by design) will potentially result in remote lookups if a Digest cannot be found locally.

Having the Store backed by a remote is currently necessary because, when the remote_cache::Runner hits the cache, it does not eagerly download all of the content represented by the cache entry. This is generally good for performance: as with remote execution, if you hit the cache a few times in sequence, you can avoid ever downloading the actual cache entries to disk. And if you do need the content later, you can fetch it lazily.
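The lookup order described above can be sketched with toy types (a HashMap standing in for each backend; these are hypothetical stand-ins, not the real Pants Store API):

```rust
use std::collections::HashMap;

// Toy model of the Store: a local backend plus an optional remote backend.
struct Store {
    local: HashMap<String, Vec<u8>>,
    remote: Option<HashMap<String, Vec<u8>>>,
}

impl Store {
    fn load(&self, digest: &str) -> Option<Vec<u8>> {
        // Local hit: no network involved.
        if let Some(bytes) = self.local.get(digest) {
            return Some(bytes.clone());
        }
        // Local miss: lazily fall back to the remote backend, if configured.
        self.remote.as_ref().and_then(|r| r.get(digest).cloned())
    }
}

fn main() {
    let mut remote = HashMap::new();
    remote.insert("digest-1".to_string(), b"output".to_vec());
    let store = Store { local: HashMap::new(), remote: Some(remote) };

    // Found only via the remote backend.
    assert_eq!(store.load("digest-1"), Some(b"output".to_vec()));
    // Missing everywhere: the caller sees None.
    assert_eq!(store.load("digest-2"), None);
    println!("lazy fallback works");
}
```

The point of the sketch is that any code path calling `load` may transparently trigger a remote lookup, which is exactly what makes missing remote content hard to recover from outside the remote_cache module.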

But fetching artifacts lazily from the cache can be problematic if any of the cache content is missing from the remote store: the remote_cache module recovers from errors when data is missing, converting the error into a cache miss. But elsewhere in the codebase, usages of the Store currently have no way to recover from a missing Digest (a longer-term ticket on this is forthcoming).

To resolve this in the short term, we should add either an "eager validate" or "eager fetch" toggle for remote caching.

  • To "eagerly fetch", the remote_cache::Runner would be adjusted to eagerly download all of the cache content when it hits (very similar to what the local cache does).
    • Additionally, when constructing the Store instance for remote caching, we would give the remote_cache::Runner a Store with a remote backend configured, but we would give the wrapped/inner Runner a clone of the Store without the remote backend via the Store::into_local_only method: let local_store = remote_and_local_store.clone().into_local_only();.
  • To "eagerly validate", the remote_cache::Runner would make network calls to validate that all content existed, without fetching all of it. This would reduce the risk of lazily fetching cache content, because errors due to missing content could only happen if content disappeared from the cache between the cache hit and when it was used later.
    • This approach is not as foolproof as the "eagerly fetch" mode: 1) there is still a chance that the content drops out of the cache before it can be lazily fetched, and 2) the Store would still need to be configured with a remote backend, so infrastructure errors in that backend elsewhere in the code could still cause issues. The latter issue could be mitigated with retries, but the former would require deeper changes.
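As a rough illustration of the "eagerly fetch" wiring, here is a toy sketch in which `into_local_only` drops the remote backend; the types are hypothetical stand-ins, with only the method name taken from the proposal above:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins (not the real Pants types): a cloneable Store whose
// remote backend can be dropped, as the proposed Store::into_local_only would.
#[derive(Clone)]
struct Store {
    local: HashMap<String, Vec<u8>>,
    remote: Option<HashMap<String, Vec<u8>>>,
}

impl Store {
    fn load(&self, digest: &str) -> Option<Vec<u8>> {
        self.local
            .get(digest)
            .cloned()
            .or_else(|| self.remote.as_ref().and_then(|r| r.get(digest).cloned()))
    }

    // Consume this Store and return one without the remote backend.
    fn into_local_only(self) -> Store {
        Store { remote: None, ..self }
    }
}

fn main() {
    let mut remote = HashMap::new();
    remote.insert("abc123".to_string(), b"cached stdout".to_vec());
    let remote_and_local_store = Store { local: HashMap::new(), remote: Some(remote) };

    // The remote_cache::Runner keeps the remote-backed Store for cache lookups...
    assert!(remote_and_local_store.load("abc123").is_some());

    // ...while the wrapped/inner Runner gets a local-only clone, so its Store
    // usage can never fail due to content missing from the remote backend.
    let local_store = remote_and_local_store.clone().into_local_only();
    assert!(local_store.load("abc123").is_none());
    println!("inner Runner's store is local-only");
}
```

Because the inner Runner's clone has no remote backend, eager fetching in the remote_cache::Runner becomes the only place remote lookups (and their failures) can occur.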
@Eric-Arellano Eric-Arellano self-assigned this Dec 17, 2020
Eric-Arellano added a commit that referenced this issue Jan 7, 2021
Closes #11330. As explained there, lazy fetching is best for performance, but as currently factored, it can result in unrecoverable errors when remote caching fails, e.g. a digest is missing.

By eagerly fetching, we limit the surface area of where remote caching is used; this allows for our error catching logic to always be used, which gracefully falls back to re-running the process locally.

This is likely a temporary solution, and the option will likely be deprecated in the future. The more robust solution is to implement #11331.