Reject/warn/monitor on large objects entering the cache #440
Hi, in cache-only mode (i.e. without remote execution) the client doesn't upload actions to the cache server, so the only information the server could report is the sizes of the blobs and their SHA-256 hashes. If it were to scan the entire cache lookup table, it could report the lookup keys for ActionResult messages that refer to large blobs, but that would be slow, and it would still be difficult to find the corresponding actions on the client side. I suppose the cache server could choose to reject blobs over a certain size limit, but I think most clients would just log a warning and continue, and then the cache hit rate would drop until someone notices. So this wouldn't be a great solution either. Perhaps this is best solved on the client side? I don't know if Bazel has a way to do this.
Thanks for the reply, @mostynb. That makes sense; I agree that Bazel is the more likely place for such a feature.
Based on the discussion on that linked Bazel issue, I think this is actually a desirable feature. @mostynb, would you accept a PR adding a flag to reject large WriteResource requests, basically at this spot: https://gist.github.com/alexeagle/71d39f470ba45c7f8a7fcc38a70bfd8c ?
Sure, PRs welcome. Please make this apply to the HTTP interface as well as gRPC. There's a special case to consider: what should we do if the client uploads an ActionCache blob with inlined CAS blobs whose total size is greater than the limit, even though each individual blob is below it? gRPC UpdateActionResult calls are limited to 4 MB in practice by convention (and that's probably lower than any max_blob_size value that would be set), but HTTP AC uploads can be arbitrarily large.
This causes both the gRPC and HTTP endpoints to reject blob writes that are larger than the configured size (in MB). Fixes buchgr#440
This flag specifies the maximum (logical) blob size that the cache will accept from clients. This limit is not applied to preexisting blobs in the cache. Implements buchgr#440
This flag specifies the maximum (logical) blob size that the cache will accept from clients. This limit is not applied to preexisting blobs in the cache. Implements #440
This feature is now available in the v2.1.0 release.
These were introduced to reduce load on a remote-cache instance and avoid network saturation. A month later, a feature was added in one remote-cache implementation that provides a different fix: buchgr/bazel-remote#440 rejects large input files on upload. In practice, while these actions do often produce huge outputs, they are also slow to re-execute. In many cases it's worth using a remote cache for RunAndCommitLayer in particular, to avoid a local rebuild even though it means a large network fetch. Currently users can't configure this because the values are hardcoded. Users who do want to keep the no-remote-cache execution requirement can set it via a tag (provided they opt in to experimental_allow_tags_propagation; see bazelbuild/bazel#8830). #1856 (comment) is an example of a user asking for these to be removed.
Our cache deployment has suffered badly from Docker images being added by users as inputs to actions. An object of something like 600 MB then needs to be fetched by CI agents, and the resulting requests overload the cache.
I'm not exactly sure what the cache could do differently to help. At the very least it could show the largest objects, with some hint about the action that generated each one.
The feedback loop would result in us making sure these targets are tagged no-remote-cache, so the actions they produce don't upload.
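The "monitor" option in the title could start as simply as remembering the N largest blobs seen on upload and exposing them, for example on a status page. Below is a minimal sketch using a min-heap from Go's `container/heap`. `largestBlobs`, `blobInfo` and their methods are hypothetical names, not part of bazel-remote. As the first reply points out, a cache-only server knows only each blob's hash and size, so the sketch tracks just those.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// blobInfo is what a cache-only server knows about an uploaded blob.
type blobInfo struct {
	Hash string
	Size int64
}

// sizeHeap is a min-heap ordered by Size, so the smallest tracked blob
// sits at index 0 and can be replaced cheaply.
type sizeHeap []blobInfo

func (h sizeHeap) Len() int           { return len(h) }
func (h sizeHeap) Less(i, j int) bool { return h[i].Size < h[j].Size }
func (h sizeHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *sizeHeap) Push(x any)        { *h = append(*h, x.(blobInfo)) }
func (h *sizeHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// largestBlobs (hypothetical) keeps the n largest blobs observed so far.
type largestBlobs struct {
	n int
	h sizeHeap
}

// Observe records an upload, replacing the smallest tracked blob when full.
func (l *largestBlobs) Observe(b blobInfo) {
	if l.h.Len() < l.n {
		heap.Push(&l.h, b)
		return
	}
	if l.n > 0 && b.Size > l.h[0].Size {
		l.h[0] = b
		heap.Fix(&l.h, 0)
	}
}

// Top returns a copy of the tracked blobs, largest first.
func (l *largestBlobs) Top() []blobInfo {
	out := append([]blobInfo(nil), l.h...)
	sort.Slice(out, func(i, j int) bool { return out[i].Size > out[j].Size })
	return out
}

func main() {
	lb := &largestBlobs{n: 2}
	// Placeholder hashes; real CAS keys would be SHA-256 hex digests.
	for _, b := range []blobInfo{
		{"aaa", 10}, {"bbb", 600 << 20}, {"ccc", 5}, {"ddd", 40 << 20},
	} {
		lb.Observe(b)
	}
	for _, b := range lb.Top() {
		fmt.Printf("%s %d MiB\n", b.Hash, b.Size>>20)
	}
	// Output:
	// bbb 600 MiB
	// ddd 40 MiB
}
```

Each observation costs O(log N), so this could sit on the upload path cheaply. Mapping a reported hash back to the action that produced it would still have to happen on the client side.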