-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax input tree requirements #207
Conversation
To reduce the workload of the remote execution clients, the requirement of constructing Merkle trees is removed. Instead, the client can choose any structure of it input file tree it would like. A client might choose to replicate the internal dependency graph. The down side is that different clients, even different versions of the clients, might not get cache hit on effectively the same action. The use case of sharing the cache for different clients is not a realistic use case. Also, clients are most likely stable in their serialization between versions. Fixes bazelbuild#141.
a70b2c6
to
ff9cba7
Compare
// The paths MUST use `/` as path component separator, even if the environment | ||
// where the Action is executed use other separators. If the path starts with | ||
// `/` it is considered to be refering to the root of the file tree, otherwise | ||
// the path is relative to the parent directory. The path MUST NOT contain any | ||
// `.` or `..` components. An empty string refers to the same directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the RelaxedFileTree
contains a base path
, one can require all children paths to be relative to that base path.
At least long term (v3) I'm in favor of relaxing the input tree requirements. However, my preferred approach would be to modify |
Any thoughts on the output side? I think if we add support for a flexible directory tree, we should fix the asymmetry between inputs and outpus and also replace This may need a flag in the |
Call me crazy, but I've never understood this discussion. Nobody seems interested in presenting in actual detail what the problem is. And also very importantly: nobody has even described why this cannot be fixed by optimizing the client. For example, I can imagine that Bazel has a very hard time with setups like the following. Assume there are a million files named N = 1000
[
filegroup(
name = "fg%d" % j,
srcs = [
"%d/%d" % (i, j)
for i in range(N)
],
)
for j in range(N)
] If we then merge these file groups all over the place, stuff gets complicated. Computing Merkle trees for that is simply hard. But that's not how any normal source tree works, right? In reality, dependency graphs roughly mirror the file system layout. Because they do, you rarely need to construct full Merkle trees. You can often get away with merging existing ones together. Ones that don't tend to have too much nested overlap, meaning that merging is cheap. My observation has always been that the slowness is not caused by using a Merkle tree per se. It's because the Merkle tree generation is currently just bolted on as part of Bazel's REv2 client code. For every action that gets run, Bazel starts from a virtually clean slate when generating the Merkle tree. This is just silly, as actions often share large parts of the file system. If Bazel's core data structures could somehow memoize (abridged versions of) REv2 Merkle trees, then the process would become a lot more light-weight than what it currently is. I think we should first go back and write a report on what the problem actually is. What is the (amortized) algorithmic complexity of Bazel's Merkle tree computation right now? What is optimal, given the existing protocol? What is the expected improvement, if any, of having this change applied? |
I appreciate that you are honest. Thank you!
The profiling I've done in bazelbuild/bazel#10875 points to two specific functions in Bazel that lists all files and creates the Merkle tree. The most expensive operation there is My idea has been to only create the Assuming 20 files per folder, 200 folders => 4000 files where up to 40 merges are needed, then Measurements show that we save 30% of Bazel's execution time when removing the current Merkle tree code (measured on our production CI over two days). Let's see what I end up with when merging Merkle trees. |
I've created bazelbuild/bazel#13879 which implements Merkle tree memoization and merging in Bazel. One example of 3000 inputs cut the calculation time from 78 ms to 0.7 ms. With the fast alias hash I've tried before, I got down to 0.1 ms. Bazel has more overhead, in the range of 10 ms for my example action, so 1 ms or 0.1 ms doesn't really make any difference. Now, let me do some measurements on this patch for a few days before I post real world results. |
Wow! That's great news, @moroten. Thanks a lot for working on this. |
Closing this issue as bazelbuild/bazel#13879 solves the problem. Further discussions about closing #141 can be held there. |
To reduce the workload of the remote execution clients, the requirement of constructing Merkle trees is removed. Instead, the client can choose any structure of it input file tree it would like. A client might choose to replicate the internal dependency graph.
The down side is that different clients, even different versions of the clients, might not get cache hit on effectively the same action. The use case of sharing the cache for different clients is not a realistic use case. Also, clients are most likely stable in their serialization between versions.
Fixes #141.
This first draft basically duplicates the file tree messages, i.e.
DirectoryNode
gets a copy calledRelaxedDirectoryNode
. Another option is to reuse the existing file tree messages (Directory
,FileNode
etc.) but letname
represent a "relaxed path" instead.