This repository has been archived by the owner on Sep 24, 2021. It is now read-only.

Bazel Remote Cache (macOS only)

This repository implements a proof of concept (POC) for a Bazel wrapper script that automatically sets shared remote HTTP cache flags based on a fingerprint of the local environment. It is implemented for macOS only.

Why Use Remote Cache?

Building large projects can take a lot of time, even with Bazel. Remote caching allows any build machine to reuse outputs that other (or the same) build machines have already built, which can reduce build time significantly. This is especially true in projects that share many in-house frameworks and libraries as source code. However, to share a cache safely, the binaries uploaded to it must be compatible with every build machine that might consume them, and that is not guaranteed when those targets are built in incompatible environments. Here, "incompatible environments" means any difference in the build environment that can affect the built targets, such as the shell environment, installed OS packages, programming language toolchains, and so on.

This POC demonstrates an approach that gives correctness top priority, at the cost of more cache misses and more storage space. It segregates the cache into a separate area per build environment, keyed by a fingerprint derived from the macOS version and the Clang version. The fingerprint is calculated in real time, every time a Bazel command is executed. The fingerprint calculation shown here is only an example; it can easily be changed or extended with more variables, as long as that does not have a significant performance impact.
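
A fingerprint calculation of this kind could look roughly like the Python sketch below. It is not the repository's remotecache.py: the helper names, the use of SHA-256, the 16-character truncation and the per-fingerprint URL layout under storage.googleapis.com are all assumptions made for illustration.

```python
import hashlib
import platform
import subprocess


def macos_version() -> str:
    """Return the macOS product version (e.g. "11.6"), or "" on other systems."""
    # platform.mac_ver() returns ('', ('', '', ''), '') when not running on macOS.
    return platform.mac_ver()[0]


def clang_version(clang: str = "clang") -> str:
    """Return the first line of `clang --version`, or "" if clang is unavailable."""
    try:
        result = subprocess.run(
            [clang, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    lines = result.stdout.splitlines()
    return lines[0].strip() if lines else ""


def environment_fingerprint(os_version: str, compiler_version: str) -> str:
    """Hash the environment components into a short, path-safe fingerprint."""
    digest = hashlib.sha256()
    for component in (os_version, compiler_version):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(f"{len(component)}:{component}".encode("utf-8"))
    return digest.hexdigest()[:16]


def remote_cache_url(fingerprint: str, bucket: str = "bazel-dev-remote-cache") -> str:
    """Build a per-fingerprint cache URL inside the bucket (hypothetical layout)."""
    return f"https://storage.googleapis.com/{bucket}/{fingerprint}"
```

A wrapper could call environment_fingerprint(macos_version(), clang_version()) on each invocation and pass the result to remote_cache_url. Hashing keeps the cache path short and free of spaces at the cost of making it opaque; keeping a readable prefix such as the OS version would be an equally reasonable choice.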

Other approaches worth considering:

  1. Have all build and development machines fully managed and aligned on the same OS and tool versions
  2. Run all builds in managed remote environments

How To Use

  1. [prerequisites] This implementation assumes your workstation has access to a Google Cloud Storage bucket named bazel-dev-remote-cache. You can either create that bucket or change the implementation to use another cache target. See Remote Caching for more details about how it works.
  2. Make sure you have Python 3 in your PATH
  3. Copy bazel, bazelwrapper.py and remotecache.py into your <REPO_HOME>/tools directory
  4. [recommended] Add build --incompatible_strict_action_env to .bazelrc so that actions see the same PATH on all workstations. With this flag, Bazel uses a fixed PATH made of /bin, /usr/bin and /usr/local/bin on non-Windows systems. Keep the PATH as short as possible; if you need additional entries, set them explicitly with --action_env=PATH=<your path>
  5. That's it! The Bazel launcher will execute the tools/bazel script every time you run a Bazel command.
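
How a wrapper can hand the cache flags to the real Bazel binary can be sketched as follows. This assumes the launcher exports the path of the real binary in BAZEL_REAL (the Bazel launcher and Bazelisk both do so when they delegate to tools/bazel); the set of commands that receive the flags and the function names are hypothetical, not taken from bazelwrapper.py.

```python
import os
from typing import List, Sequence

# Hypothetical: the commands assumed to accept remote cache flags.
CACHE_AWARE_COMMANDS = {"build", "test", "run", "coverage"}


def inject_flags(argv: Sequence[str], flags: Sequence[str]) -> List[str]:
    """Insert `flags` right after the Bazel command name.

    Startup options (e.g. --output_base=...) precede the command, so the first
    argument that does not start with "-" is treated as the command. This is a
    simplification: a startup option passed as two tokens would confuse it.
    """
    args = list(argv)
    for i, arg in enumerate(args):
        if not arg.startswith("-"):
            if arg in CACHE_AWARE_COMMANDS:
                return args[: i + 1] + list(flags) + args[i + 1 :]
            break  # Some other command (e.g. "version"): leave argv untouched.
    return args


def exec_real_bazel(argv: Sequence[str], flags: Sequence[str]) -> None:
    """Replace the current process with the real Bazel binary."""
    # Assumes BAZEL_REAL is set by the launcher; raises KeyError otherwise.
    real_bazel = os.environ["BAZEL_REAL"]
    os.execv(real_bazel, [real_bazel] + inject_flags(argv, flags))
```

A tools/bazel entry point might then call exec_real_bazel(sys.argv[1:], [f"--remote_cache={url}", "--google_default_credentials"]), where url is the per-fingerprint cache URL. The flags go after the command rather than before it because arguments placed before the command are parsed as startup options.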

Additional Considerations

You can disable remote caching for specific actions, selected by their mnemonics, using the --modify_execution_info flag. This approach is somewhat hacky, but because it operates at the metadata level, it is very powerful and can be elegant, especially in large projects. I have successfully applied this technique to do the following:

Turning off caching for Docker layers and images

--modify_execution_info=JoinLayers=+no-remote-cache,ImageLayer=+no-remote-cache,GUNZIP=+no-remote-cache,GZIP=+no-remote-cache

This significantly improved build performance, because uploading and downloading such large files evidently takes more time than building them locally.

Turning off caching for all C++-related actions

--modify_execution_info=Cpp.*=+no-remote-cache,CcStrip=+no-remote-cache

This helped me avoid cache poisoning on macOS and Linux systems that primarily build Java/Scala code, and allowed all machines in the same OS family to share a single cache (unlike this wrapper, which gives each environment fingerprint its own cache area).
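
To see which actions the two flag values above affect, the Python sketch below models the documented regex=[+-]key syntax of --modify_execution_info. It is an approximation, not Bazel's implementation: it starts from empty execution info, assumes each regex must match the whole mnemonic, and splits the value naively on commas. The function names are hypothetical.

```python
import re
from typing import List, Set, Tuple

# One parsed rule: (compiled regex, "+" or "-", execution info key).
Rule = Tuple[re.Pattern, str, str]


def parse_modify_execution_info(value: str) -> List[Rule]:
    """Parse a 'regex=[+-]key,regex=[+-]key,...' flag value into rules.

    Simplification: entries are split on ',', so regexes that contain commas
    are not supported by this sketch.
    """
    rules: List[Rule] = []
    for entry in value.split(","):
        # Split on the last '=' so regexes containing '=' still parse.
        regex, _, change = entry.rpartition("=")
        if not regex or len(change) < 2 or change[0] not in "+-":
            raise ValueError(f"malformed entry: {entry!r}")
        rules.append((re.compile(regex), change[0], change[1:]))
    return rules


def execution_info_for(mnemonic: str, rules: List[Rule]) -> Set[str]:
    """Return the execution info keys the rules leave on an action, in order."""
    keys: Set[str] = set()
    for pattern, op, key in rules:
        # Assumption: the regex must match the entire mnemonic.
        if pattern.fullmatch(mnemonic):
            if op == "+":
                keys.add(key)
            else:
                keys.discard(key)
    return keys
```

For example, execution_info_for("CppCompile", parse_modify_execution_info("Cpp.*=+no-remote-cache,CcStrip=+no-remote-cache")) returns {"no-remote-cache"}, while the same rules leave a Javac action untouched, which is why Java/Scala actions stay cacheable.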