Dedupe load_bytes_with calls #15524

Closed
stuhood opened this issue May 18, 2022 · 1 comment · Fixed by #15901

@stuhood
Member

stuhood commented May 18, 2022

In order to fix #11331, we'll need to dedupe load_bytes_with calls (similar to what #12087 did for store_bytes).

Since Store::load_bytes_with backfills to the local store, the deduping should likely involve a hashmap of async mutexes keyed by digest (or a cheaper equivalent) that prevents multiple callers from concurrently fetching a single digest, with each caller re-checking the local store after acquiring the mutex.
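
A minimal sketch of that idea, using only the standard library so it can be run as-is. Everything here is an assumption made for illustration: the toy `Store`, the `Digest = u64` alias, the `fetch_locks` field, and the use of blocking threads and `std::sync::Mutex` in place of async tasks and async mutexes. It is not the actual Pants implementation.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a content digest.
pub type Digest = u64;

/// Toy store: `local` plays the local store, `remote_fetches` counts remote round-trips.
#[derive(Default)]
pub struct Store {
    local: Mutex<HashMap<Digest, Vec<u8>>>,
    /// One lock per digest. Entries are never removed in this sketch.
    fetch_locks: Mutex<HashMap<Digest, Arc<Mutex<()>>>>,
    remote_fetches: AtomicUsize,
}

impl Store {
    fn load_local(&self, digest: Digest) -> Option<Vec<u8>> {
        self.local.lock().unwrap().get(&digest).cloned()
    }

    /// Pretend remote fetch: the "content" is just the digest's bytes.
    fn fetch_remote(&self, digest: Digest) -> Vec<u8> {
        self.remote_fetches.fetch_add(1, Ordering::SeqCst);
        digest.to_be_bytes().to_vec()
    }

    pub fn remote_fetch_count(&self) -> usize {
        self.remote_fetches.load(Ordering::SeqCst)
    }

    pub fn load_bytes(&self, digest: Digest) -> Vec<u8> {
        // Fast path: already present locally.
        if let Some(bytes) = self.load_local(digest) {
            return bytes;
        }
        // Get or create the per-digest lock; the map lock is released at the
        // end of this statement, so other digests are not blocked.
        let lock = self.fetch_locks.lock().unwrap().entry(digest).or_default().clone();
        let _guard = lock.lock().unwrap();
        // Re-check: another caller may have backfilled while we waited.
        if let Some(bytes) = self.load_local(digest) {
            return bytes;
        }
        let bytes = self.fetch_remote(digest);
        self.local.lock().unwrap().insert(digest, bytes.clone());
        bytes
    }
}

/// Runs `callers` concurrent loads of one digest and returns the number of remote fetches.
pub fn concurrent_loads(callers: usize, digest: Digest) -> usize {
    let store = Arc::new(Store::default());
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.load_bytes(digest))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    store.remote_fetch_count()
}
```

The re-check after taking the per-digest lock is what makes waiting callers find the backfilled content instead of fetching it again.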

@stuhood
Member Author

stuhood commented Jun 17, 2022

Having reviewed the code, I think that most of #12087 should be reusable, with some exceptions. The implementation for uploads:

  • Is not cancellation safe: if an attempt to upload bytes is cancelled, there is no Drop-guard that will remove the attempt from the in_flight_uploads map.
  • Doesn't allow concurrent attempts to wait. Because it short-circuits a duplicated attempt rather than causing the caller to "block" for the attempt to complete, a caller would need to poll the method until a condition was true.
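
To illustrate the first point, here is a rough sketch of what a Drop-guard could look like. The `InFlightGuard` type, the `HashSet`-based `InFlight` map, and the never-completing `upload` function are hypothetical names invented for this example (the real `in_flight_uploads` structure may differ). It uses only the standard library and polls the future by hand, rather than using a real async runtime.

```rust
use std::collections::HashSet;
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Wake, Waker};

/// Hypothetical stand-in for a content digest.
pub type Digest = u64;
/// Hypothetical shape for the set of digests with an attempt in flight.
pub type InFlight = Arc<Mutex<HashSet<Digest>>>;

/// Removes its digest from the in-flight set when dropped, whether the
/// attempt completed, failed, or was cancelled part-way through.
pub struct InFlightGuard {
    in_flight: InFlight,
    digest: Digest,
}

impl InFlightGuard {
    /// Returns `None` if another attempt is already registered for `digest`.
    pub fn register(in_flight: &InFlight, digest: Digest) -> Option<Self> {
        if in_flight.lock().unwrap().insert(digest) {
            Some(Self { in_flight: Arc::clone(in_flight), digest })
        } else {
            None
        }
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.in_flight.lock().unwrap().remove(&self.digest);
    }
}

/// Pretend upload that never completes, standing in for a slow network call.
pub async fn upload(in_flight: InFlight, digest: Digest) {
    let _guard = InFlightGuard::register(&in_flight, digest);
    std::future::pending::<()>().await;
}

/// Waker that does nothing; enough to poll a future once by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls an upload once, then cancels it by dropping the future.
/// Returns whether the digest was registered (while pending, after cancellation).
pub fn cancel_midway(digest: Digest) -> (bool, bool) {
    let in_flight = InFlight::default();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(upload(Arc::clone(&in_flight), digest));
    assert!(fut.as_mut().poll(&mut cx).is_pending());
    let while_pending = in_flight.lock().unwrap().contains(&digest);

    // Cancellation in async Rust is just dropping the future, which drops the guard.
    drop(fut);
    let after_cancel = in_flight.lock().unwrap().contains(&digest);
    (while_pending, after_cancel)
}
```

Without the guard, the entry would stay in the set after the drop, and later attempts for the same digest would be short-circuited indefinitely.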

To resolve these, I'd suggest:

  1. Refactor in_flight_uploads into a parking_lot::Mutex<HashMap<Digest, tokio::sync::OnceCell<()>>>. The actual upload attempt would be guarded under the OnceCell, which would allow a caller to wait for the upload attempt to complete successfully, or otherwise execute it itself.
  2. Add a second in_flight_downloads map of the same shape, and use it in Store::load_bytes_with, with the OnceCell body wrapping downloading, verifying, and storing the content.
    • To keep using OnceCell<()> rather than having the OnceCell hold the content in memory, add an infallible local.load_bytes_with call after local.store_bytes. This copies the content to local disk, and then loads it back from disk to actually run the user function.
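
The download half of this proposal could be sketched roughly as follows. To keep the example runnable without external crates, it swaps in `std::sync::Mutex` and `std::sync::OnceLock` with blocking threads where the proposal names `parking_lot::Mutex` and `tokio::sync::OnceCell`. The toy `Store`, the `Digest = u64` alias, the `Arc` around each cell, and the cleanup of finished entries are all assumptions made for illustration, and verification is elided.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

/// Hypothetical stand-in for a content digest.
pub type Digest = u64;

#[derive(Default)]
pub struct Store {
    /// Plays the role of the local (on-disk) store.
    local: Mutex<HashMap<Digest, Vec<u8>>>,
    /// One cell per digest currently being downloaded; the cell holds no content.
    in_flight_downloads: Mutex<HashMap<Digest, Arc<OnceLock<()>>>>,
    downloads: AtomicUsize,
}

impl Store {
    fn has_local(&self, digest: Digest) -> bool {
        self.local.lock().unwrap().contains_key(&digest)
    }

    /// Pretend remote download: the "content" is just the digest's bytes.
    fn download(&self, digest: Digest) -> Vec<u8> {
        self.downloads.fetch_add(1, Ordering::SeqCst);
        digest.to_be_bytes().to_vec()
    }

    pub fn download_count(&self) -> usize {
        self.downloads.load(Ordering::SeqCst)
    }

    pub fn load_bytes_with<T>(&self, digest: Digest, f: impl FnOnce(&[u8]) -> T) -> T {
        if !self.has_local(digest) {
            // The map lock is held only long enough to get or create the cell.
            let cell = self.in_flight_downloads.lock().unwrap().entry(digest).or_default().clone();
            // Exactly one caller per cell runs the body; the rest block until it finishes.
            cell.get_or_init(|| {
                // Re-check, in case an earlier cell for this digest already backfilled.
                if !self.has_local(digest) {
                    let bytes = self.download(digest);
                    self.local.lock().unwrap().insert(digest, bytes);
                }
            });
            // The content is now local, so the entry is no longer needed.
            self.in_flight_downloads.lock().unwrap().remove(&digest);
        }
        // Load from the local store to run the user function, so the cell
        // never has to hold the content in memory.
        let local = self.local.lock().unwrap();
        let bytes = local.get(&digest).expect("content was stored locally above");
        f(bytes.as_slice())
    }
}

/// Runs `callers` concurrent loads of one digest and returns the number of downloads.
pub fn concurrent_downloads(callers: usize, digest: Digest) -> usize {
    let store = Arc::new(Store::default());
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.load_bytes_with(digest, |bytes| bytes.len()))
        })
        .collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 8);
    }
    store.download_count()
}
```

One difference worth noting: `OnceLock::get_or_init` is infallible, whereas a fallible initializer (such as tokio's `get_or_try_init`) would let a waiting caller retry the download itself if the first attempt failed, which is closer to the behaviour described above.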

@stuhood stuhood self-assigned this Jun 22, 2022
stuhood added a commit that referenced this issue Jun 23, 2022
As described in #15524: `remote::ByteStore::load_bytes_with` calls are not deduped currently, meaning that if multiple consumers identify a `Digest` which is missing from the local store, they might concurrently fetch it from the remote store.

This is primarily an issue with `--remote-cache-eager-fetch=false`, as the laziness means that all consumers of a process output might consider whether to download it simultaneously (rather than the output always being downloaded before the process is called complete).

Fixes #15524.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Jun 24, 2022
stuhood added a commit that referenced this issue Jun 24, 2022
…) (#15915)
