When using only remote caching (and not remote execution), the single `Store` instance is configured with a remote backend as well as a local backend. This means that all usage of the `Store` (by design) will potentially result in remote lookups if a `Digest` cannot be found locally.
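A minimal sketch of that lookup order, with hypothetical backend types (the real `Store` API is richer than this): reads consult the local backend first, and fall back to the remote backend only when the `Digest` is absent locally.

```rust
#[derive(Clone, Copy, Debug)]
struct Digest([u8; 32]);

struct LocalBackend;
struct RemoteBackend;

impl LocalBackend {
    fn load(&self, _digest: Digest) -> Option<Vec<u8>> {
        None // illustrative: pretend nothing is on disk
    }
}

impl RemoteBackend {
    fn load(&self, _digest: Digest) -> Result<Option<Vec<u8>>, String> {
        Ok(None) // illustrative: a network call in the real Store
    }
}

struct Store {
    local: LocalBackend,
    remote: Option<RemoteBackend>,
}

impl Store {
    fn load(&self, digest: Digest) -> Result<Option<Vec<u8>>, String> {
        if let Some(bytes) = self.local.load(digest) {
            return Ok(Some(bytes));
        }
        // With a remote backend configured, every local miss becomes a
        // remote lookup; this is the behavior the issue wants to scope down.
        match &self.remote {
            Some(remote) => remote.load(digest),
            None => Ok(None),
        }
    }
}
```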
Having the `Store` backed with a remote is currently necessary, because when the `remote_cache::Runner` hits the cache, it does not eagerly download all of the content represented by the cache entry. This is generally good for performance: as with remote execution, if you hit the cache a few times in sequence, you can skip ever downloading the actual cache entries to disk. And if you do need to download the content later, you can do so lazily.
But fetching artifacts lazily from the cache can be problematic if any of the cache content is missing from the remote store: the `remote_cache` module recovers from errors if data is missing, and converts the error into a cache miss. But elsewhere in the codebase, usages of the `Store` will not have any way (currently: longer term ticket forthcoming) to recover from a missing `Digest`.
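A hedged sketch of the recovery that the `remote_cache` module performs (the enum and function names here are hypothetical, not the actual pants types): a missing-digest error is downgraded to a cache miss, while other errors still propagate.

```rust
#[derive(Debug)]
enum StoreError {
    MissingDigest(String), // content referenced by the cache entry is gone
    Other(String),         // e.g. a transport or infrastructure failure
}

enum CacheLookup<T> {
    Hit(T),
    Miss,
}

fn downgrade_missing<T>(
    result: Result<T, StoreError>,
) -> Result<CacheLookup<T>, StoreError> {
    match result {
        Ok(value) => Ok(CacheLookup::Hit(value)),
        // Recoverable here, but only here: callers of the Store elsewhere
        // in the codebase have no equivalent fallback.
        Err(StoreError::MissingDigest(_)) => Ok(CacheLookup::Miss),
        Err(other) => Err(other),
    }
}
```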
To resolve this in the short term, we should add either an "eager validate" or "eager fetch" toggle for remote caching.
To "eagerly fetch", the remote_cache::Runner would be adjusted to eagerly download all of the cache content when it hits (very similar to what the local cache does).
Additionally, when constructing the `Store` instance for remote caching, we would give the `remote_cache::Runner` a `Store` with a remote backend configured, but we would give the wrapped/inner `Runner` a clone of the `Store` without the remote backend via the `Store::into_local_only` method: `let local_store = remote_and_local_store.clone().into_local_only();`.
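A sketch of that construction; only `Store::into_local_only` and the `local_store` line come from the issue, and the surrounding types are illustrative. The cache `Runner` keeps the remote-capable `Store`, while the wrapped `Runner` can never trigger a remote lookup.

```rust
#[derive(Clone)]
struct Store {
    has_remote: bool,
}

impl Store {
    fn into_local_only(mut self) -> Store {
        // Drop the remote backend: a missing Digest in this clone is a
        // hard local miss, never a network call.
        self.has_remote = false;
        self
    }
}

struct LocalRunner {
    store: Store, // local-only
}

struct RemoteCacheRunner {
    inner: LocalRunner,
    store: Store, // remote-capable, used only on the cache path
}

fn build(remote_and_local_store: Store) -> RemoteCacheRunner {
    let local_store = remote_and_local_store.clone().into_local_only();
    RemoteCacheRunner {
        inner: LocalRunner { store: local_store },
        store: remote_and_local_store,
    }
}
```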
To "eagerly validate", the remote_cache::Runner would make network calls to validate that all content existed, without fetching all of it. This would reduce the risk of lazily fetching cache content, because errors due to missing content could only happen if content disappeared from the cache between the cache hit and when it was used later.
This approach is not as foolproof as the "eagerly fetch" mode: 1) there is still a chance that the content drops out before it can be lazily fetched, and 2) the `Store` would still need to be configured with a remote backend, and infrastructure errors in that backend elsewhere in the code could still cause issues. The latter issue could be ameliorated with retries, but the former would require deeper changes.
Closes #11330. As explained there, lazy fetching is best for performance, but as currently factored, it can result in unrecoverable errors when remote caching fails, e.g. when a digest is missing.
By eagerly fetching, we limit the surface area where remote caching is used; this allows our error-catching logic to always be used, which gracefully falls back to re-running the process locally.
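A hedged sketch of the resulting control flow (the names here are hypothetical): with eager fetching, every failure on the cache path degenerates to a local run.

```rust
struct Process;
struct Output;

struct Runner;

impl Runner {
    fn check_cache_eager(&self, _req: &Process) -> Result<Option<Output>, String> {
        Ok(None) // illustrative
    }

    fn run_locally(&self, _req: Process) -> Result<Output, String> {
        Ok(Output)
    }

    fn run(&self, req: Process) -> Result<Output, String> {
        match self.check_cache_eager(&req) {
            Ok(Some(output)) => Ok(output), // hit, already fully fetched
            // Miss, missing content, or a remote error: fall back to a
            // local run rather than surfacing an unrecoverable error.
            Ok(None) | Err(_) => self.run_locally(req),
        }
    }
}
```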
This is likely a temporary solution, and the option will likely be deprecated in the future. The more robust solution is to implement #11331.