Update docs on how to run Pants in CI. #16503

Merged
merged 6 commits into from
Aug 13, 2022

9 changes: 8 additions & 1 deletion docs/markdown/Using Pants/remote-caching-execution.md
@@ -34,11 +34,18 @@ Server compatibility

In order to use remote caching or remote execution, Pants will need access to a server that complies with REAPI. Pants is known to work with:

**SaaS**:
- [Toolchain](https://www.toolchain.com), a remote caching service designed by several of the lead maintainers of Pants specifically to work seamlessly with it.

benjyw marked this conversation as resolved.
**Self-hosted**:
- [BuildBarn](https://github.com/buildbarn/bb-remote-execution)
- [Buildfarm](https://github.com/bazelbuild/bazel-buildfarm/)
- [BuildGrid](https://buildgrid.build/)

**Note**: Setup of a remote execution server is beyond the scope of this documentation. All three server projects have support channels on the BuildTeamWorld Slack. [Go here to obtain an invite to that Slack.](https://bit.ly/2SG1amT)
**Note**: Setup of a self-hosted REAPI server is beyond the scope of this documentation. All these server projects have support channels on the BuildTeamWorld Slack. [Go here to obtain an invite to that Slack.](https://bit.ly/2SG1amT)

There are a few [other](https://github.com/bazelbuild/remote-apis) systems and services in this space, but
they have not, to our knowledge, been tested with Pants. Let us know if you have any experience with them!

Resources
=========
51 changes: 42 additions & 9 deletions docs/markdown/Using Pants/using-pants-in-ci.md
@@ -15,15 +15,46 @@ Directories to cache

In your CI's config file, we recommend caching these directories:

- `$HOME/.cache/pants/setup`: the initial bootstrapping of Pants.
- `$HOME/.cache/pants/named_caches`: caches of tools like pip and PEX.
- `$HOME/.cache/pants/lmdb_store`: cached content for prior Pants runs, e.g. prior test results.

- `$HOME/.cache/pants/setup`<br>
This is the Pants bootstrap directory. Cache this against the version, as specified
in `pants.toml`. E.g., if using GitHub Actions you can use:
Contributor

The named caches are harder, but for this one it seems we could provide a step or two that uses Python + toml to parse out the version and set it as an output, so that the cache step can use it as the rightmost key component instead. In fact, we could can that as an action in pantsbuild/actions and just point folks to it in this setup recommendation. For later.

Contributor Author

Maybe, but let's get this in for now, as it's a big improvement on what we had before, and we're not trying to be everyone's devops people, just give a sense of what can profitably be cached.

```yaml
path: |
  ~/.cache/pants/setup
key: pants-setup-${{ runner.os }}-${{ hashFiles('pants.toml') }}
restore-keys: |
  pants-setup-${{ runner.os }}-${{ hashFiles('pants.toml') }}
  pants-setup-${{ runner.os }}-
```

Member

This comment is left as a marker for all restore-key related comments which I won't repeat from pantsbuild/example-python#104 (comment)

Member

(same comment about not being comfortable with using restore-keys)

- `$HOME/.cache/pants/named_caches`<br>
Caches used by some underlying tools. Cache this against the inputs to those tools.
For the `pants.backend.python` backend, named caches are used by PEX, and therefore
its inputs are your lockfiles. For example:
```yaml
path: |
  ~/.cache/pants/named_caches
key: named-caches-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('python-default.lock') }}
restore-keys: |
  named-caches-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('python-default.lock') }}
  named-caches-${{ runner.os }}-${{ hashFiles('pants.toml') }}-
  named-caches-${{ runner.os }}-
```
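
The review suggestion above, keying the setup cache on the Pants version itself rather than on a hash of the whole `pants.toml`, could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `read_pants_version` and `setup_cache_key` are hypothetical helper names (not part of Pants or GitHub Actions), the sample `pants.toml` content is made up, and `pants_version` is assumed to be set in the `[GLOBAL]` section.

```python
import re

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    tomllib = None


def read_pants_version(toml_text: str) -> str:
    """Return the pants_version value from the [GLOBAL] section of pants.toml text."""
    if tomllib is not None:
        return tomllib.loads(toml_text)["GLOBAL"]["pants_version"]
    # Fallback for older interpreters: a simple pattern match, not a full TOML parse.
    match = re.search(r'^\s*pants_version\s*=\s*"([^"]+)"', toml_text, re.MULTILINE)
    if match is None:
        raise KeyError("pants_version")
    return match.group(1)


def setup_cache_key(runner_os: str, toml_text: str) -> str:
    """Build a cache key that changes only when the OS or the Pants version changes."""
    return f"pants-setup-{runner_os}-{read_pants_version(toml_text)}"


# Hypothetical pants.toml content, used only for this example.
EXAMPLE_TOML = '''
[GLOBAL]
pants_version = "2.13.0"
backend_packages = ["pants.backend.python"]
'''

print(setup_cache_key("Linux", EXAMPLE_TOML))
```

With a key like this, unrelated edits to `pants.toml` would not invalidate the bootstrap cache. Wiring the value into a workflow step output is left out of the sketch.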

If you're not using a fine-grained [remote caching](doc:remote-caching-execution) service,
then you may also want to preserve the local Pants cache at `$HOME/.cache/pants/lmdb_store`.
This cache must be invalidated whenever any file that can affect any process changes, e.g., by keying it on `hashFiles('**/*')`.
Computing such a coarse hash, and saving and restoring large directories, can be unwieldy,
so this approach may be impractical and slow on medium and large repos.
A [remote cache service](doc:remote-caching-execution) integrates with Pants's fine-grained
invalidation and avoids these problems, and is recommended for the best CI performance.
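
To make the cost of such a coarse key concrete, here is a minimal Python sketch of a whole-tree hash. It is not the algorithm that `hashFiles` uses; `coarse_tree_hash` is a hypothetical helper that reads every file under a root directory, which is why a single edit anywhere changes the key and why the work grows with the size of the repo.

```python
import hashlib
import tempfile
from pathlib import Path


def coarse_tree_hash(root: Path) -> str:
    """Hash the relative path and contents of every file under root."""
    combined = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        combined.update(path.relative_to(root).as_posix().encode())
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return combined.hexdigest()


# Demonstrate on a throwaway directory: one small edit changes the whole key.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.py").write_text("print('hi')\n")
    (root / "README.md").write_text("docs\n")
    before = coarse_tree_hash(root)
    (root / "README.md").write_text("docs, edited\n")
    after = coarse_tree_hash(root)

print("key changed after a one-line edit:", before != after)
```
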
Member

Another way to skin this cat that I don't see being discussed is self-hosted GitHub Runners, which can leverage a persistent filesystem.

https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners

Contributor Author

I have zero experience doing that, and again, it is out of scope for this change I think. We couldn't document that until we tried it.

Member

I'll file an issue and we can document later

Contributor Author

Is this something you've actually done? It's a really interesting approach if so!


See [Troubleshooting](doc:troubleshooting#how-to-change-your-cache-directory) for how to change these cache locations.

> 📘 Nuking the cache when too big
>
> In CI, the cache must be uploaded and downloaded every run. This takes time, so there is a tradeoff where too large of a cache will slow down your CI.
> In CI, the cache must be uploaded and downloaded every run. This takes time, so there is a tradeoff where too
> large a cache will slow down your CI.
>
> You can use this script to nuke the cache when it gets too big:
>
@@ -37,8 +68,7 @@ See [Troubleshooting](doc:troubleshooting#how-to-change-your-cache-directory) fo
> rm -rf ${path}
> fi
> }
>
> nuke_if_too_big ~/.cache/pants/lmdb_store 2048
>
> nuke_if_too_big ~/.cache/pants/setup 256
> nuke_if_too_big ~/.cache/pants/named_caches 1024
> ```
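
The size check in the script above can also be expressed in Python, for CI setups where that is more convenient. This is a sketch, not a port of the original shell function (whose body is not fully shown here): the Python `nuke_if_too_big` and its `dir_size_mb` helper are assumptions, and sizes are measured by summing file sizes rather than disk blocks.

```python
import shutil
import tempfile
from pathlib import Path


def dir_size_mb(path: Path) -> float:
    """Total size of all regular files under path, in MiB."""
    total = sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
    return total / (1024 * 1024)


def nuke_if_too_big(path: Path, limit_mb: int) -> bool:
    """Delete path if it exceeds limit_mb; return True if it was deleted."""
    if not path.is_dir():
        return False
    if dir_size_mb(path) > limit_mb:
        shutil.rmtree(path)
        return True
    return False


# Exercise the helper on a temporary directory rather than a real cache.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "named_caches"
    cache.mkdir()
    (cache / "blob").write_bytes(b"\0" * (2 * 1024 * 1024))  # 2 MiB of data
    kept = not nuke_if_too_big(cache, 1024)  # under the 1024 MiB limit, so kept
    removed = nuke_if_too_big(cache, 1)      # over the 1 MiB limit, so removed
    still_exists = cache.exists()

print(kept, removed, still_exists)
```
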
@@ -58,7 +88,8 @@ See [Troubleshooting](doc:troubleshooting#how-to-change-your-cache-directory) fo

> 👍 Remote caching
>
> Rather than storing your cache with your CI provider, remote caching stores the cache in the cloud, using gRPC and the open-source Remote Execution API for low-latency and fine-grained caching.
> Rather than storing your cache with your CI provider, remote caching stores the cache in the cloud,
> using gRPC and the open-source Remote Execution API for low-latency and fine-grained caching.
>
> This brings several benefits over local caching:
>
@@ -67,12 +98,14 @@ See [Troubleshooting](doc:troubleshooting#how-to-change-your-cache-directory) fo
> - No download and upload stage for your cache.
> - No need to "nuke" your cache when it gets too big.
>
> See [Remote Caching](doc:remote-caching) for more information.
> See [Remote Caching and Execution](doc:remote-caching-execution) for more information.

Recommended commands
--------------------

With both approaches, you may want to shard the input targets into multiple CI jobs, for increased parallelism. See [Advanced Target Selection](doc:advanced-target-selection#sharding-the-input-targets). (This is typically less necessary when using [remote caching](doc:remote-caching).)
With both approaches, you may want to shard the input targets into multiple CI jobs, for increased parallelism.
See [Advanced Target Selection](doc:advanced-target-selection#sharding-the-input-targets).
(This is typically less necessary when using [remote caching](doc:remote-caching-execution).)
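
As a rough illustration of the sharding idea, the sketch below assigns each target to one of `num_shards` CI jobs using a stable hash of its address, so every job can compute its own subset independently. This is not how Pants implements sharding (the linked page describes the supported mechanism); `shard_of`, `select_shard`, and the target addresses are hypothetical.

```python
import hashlib


def shard_of(target: str, num_shards: int) -> int:
    """Map a target address to a shard index that is stable across machines."""
    digest = hashlib.sha256(target.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def select_shard(targets, shard, num_shards):
    """Return the targets that belong to the given shard."""
    return [t for t in targets if shard_of(t, num_shards) == shard]


# Hypothetical target addresses, used only for this example.
targets = [
    "src/python/app:lib",
    "src/python/app:tests",
    "src/python/util:lib",
    "src/python/util:tests",
    "src/python/cli:bin",
]
num_shards = 3
shards = [select_shard(targets, i, num_shards) for i in range(num_shards)]
for i, members in enumerate(shards):
    print(f"shard {i}/{num_shards}: {members}")
```

Every target lands in exactly one shard, so running all shards covers the full input set once. A hash-based split does not balance runtime, only membership.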

### Approach #1: only run over changed files
