Reject/warn/monitor on large objects entering the cache #440

Closed
alexeagle opened this issue May 14, 2021 · 5 comments

Comments

@alexeagle
Contributor

Our cache deployment has suffered badly from Docker images being added by users as inputs to actions. An object of around 600MB then has to be fetched by CI agents, which overloads the cache with requests.

I'm not exactly sure what the cache could do differently to help. At the very least, it could show the largest objects, with some hint about the actions that generated them.

That feedback loop would let us make sure these targets are tagged no-remote-cache, so the actions they produce don't upload their outputs.

@mostynb
Collaborator

mostynb commented May 14, 2021

Hi,

In cache-only mode (i.e. without remote execution), the client doesn't upload actions to the cache server, so the only information the server could report is the sizes of the blobs and their SHA-256 hashes. If it were to scan the entire cache lookup table, it could also report the lookup keys of ActionResult messages that refer to large blobs, but that would be slow, and it would still be difficult to find the corresponding actions on the client side.
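To make the cost of that scan concrete, here is a minimal Go sketch. The Digest, ActionResult and Finding types are hypothetical simplifications of the REAPI messages, not bazel-remote's storage format. The point is that every action cache entry has to be visited, and the result can only name AC keys, hashes and sizes.

```go
package main

import (
	"fmt"
	"sort"
)

// Digest is a simplified stand-in for an REAPI Digest (hash + size).
type Digest struct {
	Hash      string
	SizeBytes int64
}

// ActionResult is a hypothetical, heavily simplified action cache entry
// that only keeps the digests of the CAS blobs it refers to.
type ActionResult struct {
	OutputDigests []Digest
}

// Finding records one large blob and the AC key whose result refers to it.
type Finding struct {
	ActionKey string
	Hash      string
	SizeBytes int64
}

// findLargeOutputs visits every AC entry and reports blobs larger than
// threshold, largest first. Its cost grows with the size of the action cache.
func findLargeOutputs(ac map[string]ActionResult, threshold int64) []Finding {
	var found []Finding
	for key, ar := range ac {
		for _, d := range ar.OutputDigests {
			if d.SizeBytes > threshold {
				found = append(found, Finding{ActionKey: key, Hash: d.Hash, SizeBytes: d.SizeBytes})
			}
		}
	}
	// Sort by size (descending), then by key and hash for deterministic output.
	sort.Slice(found, func(i, j int) bool {
		a, b := found[i], found[j]
		if a.SizeBytes != b.SizeBytes {
			return a.SizeBytes > b.SizeBytes
		}
		if a.ActionKey != b.ActionKey {
			return a.ActionKey < b.ActionKey
		}
		return a.Hash < b.Hash
	})
	return found
}

func main() {
	ac := map[string]ActionResult{
		"ac-key-1": {OutputDigests: []Digest{{"small", 1 << 10}, {"image", 600 << 20}}},
		"ac-key-2": {OutputDigests: []Digest{{"layer", 200 << 20}}},
	}
	for _, f := range findLargeOutputs(ac, 100<<20) {
		// Only the AC key, hash and size are known here; mapping the AC key
		// back to a build target still has to happen on the client side.
		fmt.Printf("%s -> %s (%d bytes)\n", f.ActionKey, f.Hash, f.SizeBytes)
	}
}
```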

I suppose the cache server could choose to reject blobs over a certain size limit, but I think most clients would just log a warning and continue, and then the cache hit rate would drop until someone notices. So this wouldn't be a great solution either.

Perhaps this is best solved on the client side? I don't know if Bazel has a way to do this.

@alexeagle
Contributor Author

Thanks for the reply, @mostynb. That makes sense; I agree that Bazel is the more likely place for such a feature.

@alexeagle
Contributor Author

Based on the discussion in the linked Bazel issue, I think this is actually a desirable feature.

@mostynb, would you accept a PR adding a flag to reject large WriteResource requests, basically at this spot? https://gist.github.com/alexeagle/71d39f470ba45c7f8a7fcc38a70bfd8c
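For the gRPC side, one place such a check could live is the ByteStream Write handler. Per the Remote Execution API, the first WriteRequest names the resource as {instance}/uploads/{uuid}/blobs/{hash}/{size} (or the compressed-blobs variant), so the declared size is known before any data arrives. The sketch below assumes that naming scheme; declaredSize and rejectLargeWrite are hypothetical helpers, not code from the linked gist or from bazel-remote.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// declaredSize extracts the blob size from a ByteStream upload resource
// name, following the REAPI naming scheme:
//
//	{instance}/uploads/{uuid}/blobs/{hash}/{size}[/...]
//	{instance}/uploads/{uuid}/compressed-blobs/{compressor}/{hash}/{size}[/...]
//
// For compressed uploads the size is the uncompressed size.
func declaredSize(resourceName string) (int64, error) {
	parts := strings.Split(resourceName, "/")
	for i, p := range parts {
		if p != "uploads" || i+2 >= len(parts) {
			continue
		}
		rest := parts[i+2:] // skip "uploads" and the upload uuid
		var sizeField string
		switch {
		case len(rest) >= 3 && rest[0] == "blobs":
			sizeField = rest[2]
		case len(rest) >= 4 && rest[0] == "compressed-blobs":
			sizeField = rest[3]
		default:
			return 0, fmt.Errorf("unrecognised upload resource name %q", resourceName)
		}
		size, err := strconv.ParseInt(sizeField, 10, 64)
		if err != nil || size < 0 {
			return 0, fmt.Errorf("invalid size in resource name %q", resourceName)
		}
		return size, nil
	}
	return 0, fmt.Errorf("no uploads segment in resource name %q", resourceName)
}

// rejectLargeWrite is a hypothetical check a Write handler could run on the
// first WriteRequest, before receiving any data. maxBlobSize 0 = unlimited.
func rejectLargeWrite(resourceName string, maxBlobSize int64) error {
	size, err := declaredSize(resourceName)
	if err != nil {
		return err
	}
	if maxBlobSize > 0 && size > maxBlobSize {
		return fmt.Errorf("blob of %d bytes exceeds limit of %d bytes", size, maxBlobSize)
	}
	return nil
}

func main() {
	name := "uploads/4f1b5a7e/blobs/0123abcd/629145600" // ~600 MiB
	fmt.Println(rejectLargeWrite(name, 100<<20))
}
```

A real handler would still need to count the bytes actually received, since a client could declare a size that doesn't match its data, and it would have to map the error to a gRPC status code; which code to use isn't settled here.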

@mostynb
Collaborator

mostynb commented May 20, 2021

Sure, PRs welcome.

I guess --max_blob_size <bytes> (default 0, unlimited) would be a reasonable command line flag/config file setting.

Please make this apply to the HTTP interface as well as gRPC.

There's a special case to consider: what should we do if the client uploads an ActionCache blob with inlined CAS blobs whose total size is greater than the limit, but each individual blob is below it? gRPC UpdateActionResult calls are limited to 4 MiB in practice by convention (probably lower than any max_blob_size value that would be set), but HTTP AC uploads can be arbitrarily large.
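To make the question concrete, here is a sketch that computes both the largest inlined blob and the total inlined bytes of an ActionResult. OutputFile and ActionResult are simplified, hypothetical versions of the REAPI messages (only the inlined Contents, StdoutRaw and StderrRaw fields are modelled). Which policy to apply is exactly the open question, so it's left as a parameter.

```go
package main

import "fmt"

// OutputFile and ActionResult are simplified, hypothetical stand-ins for the
// REAPI messages; only the inlined byte fields matter here.
type OutputFile struct {
	Path     string
	Contents []byte // inlined CAS blob, if the client chose to inline it
}

type ActionResult struct {
	OutputFiles []OutputFile
	StdoutRaw   []byte
	StderrRaw   []byte
}

// inlinedSizes returns the total number of inlined bytes and the size of
// the largest single inlined blob.
func inlinedSizes(ar ActionResult) (total, largest int64) {
	add := func(b []byte) {
		n := int64(len(b))
		total += n
		if n > largest {
			largest = n
		}
	}
	for _, f := range ar.OutputFiles {
		add(f.Contents)
	}
	add(ar.StdoutRaw)
	add(ar.StderrRaw)
	return total, largest
}

// checkActionResult applies maxBlobSize (0 = unlimited) to an AC upload.
// Each inlined blob is always checked individually; limitTotal selects the
// stricter policy of also limiting their sum.
func checkActionResult(ar ActionResult, maxBlobSize int64, limitTotal bool) error {
	if maxBlobSize <= 0 {
		return nil
	}
	total, largest := inlinedSizes(ar)
	if largest > maxBlobSize {
		return fmt.Errorf("inlined blob of %d bytes exceeds limit of %d bytes", largest, maxBlobSize)
	}
	if limitTotal && total > maxBlobSize {
		return fmt.Errorf("inlined blobs total %d bytes, exceeding limit of %d bytes", total, maxBlobSize)
	}
	return nil
}

func main() {
	ar := ActionResult{OutputFiles: []OutputFile{
		{Path: "out/a", Contents: make([]byte, 6)},
		{Path: "out/b", Contents: make([]byte, 6)},
	}}
	fmt.Println(checkActionResult(ar, 10, false)) // per-blob policy: each 6-byte blob passes
	fmt.Println(checkActionResult(ar, 10, true))  // total policy: 12 bytes is rejected
}
```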

alexeagle added a commit to alexeagle/bazel-remote that referenced this issue May 24, 2021
This causes both the gRPC and HTTP endpoints to reject blob writes that are larger than the configured size (in MB).

Fixes buchgr#440
alexeagle added a commit to aspect-forks/bazel-remote that referenced this issue Jun 2, 2021
mostynb pushed a commit to mostynb/bazel-remote that referenced this issue Jun 4, 2021
This flag specifies the maximum (logical) blob size that the cache
will accept from clients. This limit is not applied to preexisting
blobs in the cache.

Implements buchgr#440
mostynb pushed a commit that referenced this issue Jun 10, 2021
This flag specifies the maximum (logical) blob size that the cache
will accept from clients. This limit is not applied to preexisting
blobs in the cache.

Implements #440
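The two properties in that commit message can be illustrated with a toy in-memory cache. This is a hypothetical sketch, not bazel-remote's implementation, and it reads "logical" as the uncompressed size: the limit is compared against that logical size rather than the bytes on the wire, and it is only applied in Put, so blobs already in the cache stay readable.

```go
package main

import "fmt"

// blobCache is a hypothetical in-memory cache illustrating where the limit
// applies: on uploads only, never on blobs that are already stored.
type blobCache struct {
	maxBlobSize int64 // 0 = unlimited
	blobs       map[string][]byte
}

// newBlobCache loads preexisting blobs without any size check, mirroring
// "this limit is not applied to preexisting blobs in the cache".
func newBlobCache(maxBlobSize int64, preexisting map[string][]byte) *blobCache {
	c := &blobCache{maxBlobSize: maxBlobSize, blobs: make(map[string][]byte)}
	for k, v := range preexisting {
		c.blobs[k] = v
	}
	return c
}

// Put stores data under hash. logicalSize is the uncompressed size, which
// is what the limit is compared against even if data arrives compressed.
func (c *blobCache) Put(hash string, logicalSize int64, data []byte) error {
	if c.maxBlobSize > 0 && logicalSize > c.maxBlobSize {
		return fmt.Errorf("blob %s: logical size %d exceeds limit %d", hash, logicalSize, c.maxBlobSize)
	}
	c.blobs[hash] = data
	return nil
}

// Get serves any stored blob, regardless of its size.
func (c *blobCache) Get(hash string) ([]byte, bool) {
	b, ok := c.blobs[hash]
	return b, ok
}

func main() {
	c := newBlobCache(8, map[string][]byte{"old": make([]byte, 100)})
	_, ok := c.Get("old")
	fmt.Println(ok) // the preexisting 100-byte blob is still served
	// A 3-byte compressed payload whose logical size is 100 bytes is rejected.
	fmt.Println(c.Put("new", 100, []byte{1, 2, 3}))
}
```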
@mostynb
Collaborator

mostynb commented Jun 10, 2021

This feature is now available in the v2.1.0 release.

mostynb closed this as completed Jun 10, 2021
alexeagle added a commit to bazelbuild/rules_docker that referenced this issue Mar 16, 2022
These were introduced to reduce load on a remote-cache instance to avoid network saturation.
A month later, a feature was added in one remote-cache implementation which provides a different fix:
buchgr/bazel-remote#440 rejects large input files on upload.

In practice, while these actions do often produce huge outputs, they are also slow to re-execute.
In many cases it's worth it to use a remote-cache for RunAndCommitLayer in particular to avoid a local rebuild
even though it's a large network fetch.
Currently users can't configure this because we've hardcoded the values. If they do want to keep the
no-remote-cache execution requirement, they can do this via a tag (provided they opt-in to
experimental_allow_tags_propagation, see bazelbuild/bazel#8830)

#1856 (comment) is an example of a user
asking for these to be removed.
gravypod pushed a commit to bazelbuild/rules_docker that referenced this issue Mar 16, 2022