Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local actions in a remote execution build should be cachable #7932

Closed
EricBurnett opened this issue Apr 3, 2019 · 23 comments
Closed

Local actions in a remote execution build should be cachable #7932

EricBurnett opened this issue Apr 3, 2019 · 23 comments
Assignees
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@EricBurnett
Copy link

Description of the problem / feature request:

Actions that are run locally via "local" or "noremote" within the context of a remote execution build do not get looked up or cached in a remote cache. This means that for non-incremental builds incorporating some local actions, those actions must always be executed.

Use-cases:

  • Running a mostly-remote build, but some actions locally that are not remote compatible. (E.g. actions needing access to a system the workers don't have, or needing more cores, etc)
  • Running hybrid builds with remote compilation and local testing (e.g. because the local system has a GPU).

Workarounds:

  • Make it possible for all actions to be run remotely, and avoid the use of "local" at all.
  • Run multiple sequential builds, where the first does remote execution of the applicable subset, then a second incremental build that is is configured for local execution + remote caching and tries to run "everything" (with all the remotable actions already present in the local bazel cache, and so not re-run).

What operating system are you running Bazel on?

Linux

Bazel versions

Reports from 0.16.1 and 0.19.2, but I believe it holds true through at least 0.22 as well.

Notes

For now, non-critical feature-request - we've been able to find workarounds for our users stuck on this so far. But with the push towards more hybrid builds (e.g. dynamic strategy), it'll probably keep coming up.

@irengrig irengrig added team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged labels Apr 8, 2019
@buchgr
Copy link
Contributor

buchgr commented May 2, 2019

no-remote applies to both remote caching and remote execution. If we want to have this feature we'd need to have at least two different tags.

My understand of your request is that locally executed actions should always be uploaded to the remote cache when using remote execution. We have talked about this particular issue last december and the consensus between RBE and Bazel was to no support mixing remote execution and caching of locally executed results due to concerns about cache poisoning.

I am happy to reconsider if your stance has changed. Do you find the local tag being used for build actions or mostly for test actions?

@agoulti @ishikhman

@AustinSchuh
Copy link
Contributor

We have a couple of actions which require weird permissions or access. One of our actions needs to hit a server internal to our network which we can't make accessible to the workers. Another needs some KVM permissions. We have done a fair amount of testing of those actions and have determined they are reproducible (enough, but moving them to RBE won't fix that part). They are also rather expensive to rebuild. Today, there is no way to say "I understand the risks, but trust me, please put this in the cache". These are on CI machines which are pretty well controlled and understood.

Another use case which we have today which I can't support is that I have a custom piece of hardware that needs to be present to run a test attached to the machine running the bazel server. (Running tests against a uC.) It would save me countless hours of build time if the tests which need that hardware can run against the local machine and push the results from those tests to the cache. Again, I'm willing to spend the time to verify that the action is reproducible and worth caching before turning that feature on. Without this feature, all I can do is to wait until RBE supports my weird use cases, or implement my own build cluster. Neither of which are very good.

@agoulti
Copy link

agoulti commented May 2, 2019

Jakob, that's not how I remember our December conversation.

We agreed that local exec/remote cache combinations should not be easy to accidentally turn on but should be available to those who know what they are doing.

I thought we agreed to split "no-remote" tag into "no-remote-exec" and "no-remote-cache-upload" (with "no-remote" being a shortcut for both of them).

@buchgr
Copy link
Contributor

buchgr commented May 6, 2019

@agoulti thanks for pointing this out. My memory was wrong. sgtm.

@ishikhman
Copy link
Contributor

@EricBurnett, @agoulti, @buchgr: Do we know a use case for a flag --no-remote-cache?

no-remote - no interactions with remote systems for this target
no-remote-exec - no remote executions for this target, but we still want to use remote cache for local actions (both read and write). For the cases when some actions are always executed locally.

no-remote-cache - would mean smth like: execute action remotely, but do not cache it. Is it even possible? Could we just use no-cache instead?

@EricBurnett
Copy link
Author

EricBurnett commented May 23, 2019 via email

@buchgr
Copy link
Contributor

buchgr commented May 23, 2019

no-cache has a local meaning in that it prevents Bazel from caching the target locally. So I think we'll need no-cache and no-remote-cache

@ishikhman
Copy link
Contributor

@EricBurnett @buchgr thanks!

@ishikhman
Copy link
Contributor

TODO (during implementation): check whether new tags need to be white-listed (see #8612 for details)

@SrodriguezO
Copy link
Contributor

Is the design for this finalized somewhere? We have a huge interest in seeing this done soon, and I'd be happy to open a Pull Request :)

My understanding of the desired behaviors is:

  • no-remote-cache: prevent remote caching, but allow local caching (I believe this is the current behavior of no-cache)
  • no-remote-exec: prevent remote execution, but allow remote caching
  • no-remote: combine no-remote-cache and no-remote-exec
  • no-cache: extend no-remote-cache to also prevent local caching

I was poking around at the code on Friday and it seems that the behavior of no-remote-cache will match the current behavior of no-cache. Is this correct?

@buchgr
Copy link
Contributor

buchgr commented Jun 18, 2019

@SrodriguezO you are correct. I believe we agreed on the design and it's just a matter of implementation. @ishikhman would be the person to synchronize with about this. She'd also be handling reviews.

@SrodriguezO
Copy link
Contributor

Awesome. I'll create a Draft PR when I get a chance over the next couple days. I'm currently trying to familiarize myself with the code and with how the Bazel project is structured. I'll keep you posted :)

@ishikhman
Copy link
Contributor

Great, thanks! looking forward to it! ;) If you have any question - feel free to ping me here :)

@SrodriguezO
Copy link
Contributor

I'm noticing that there are two types of cache: one for Spawns (the SpawnCache), and another for Actions (the AbstractRemoteActionCache).

I'm not familiar with how these two types of cache interface.. The --disk_cache and --remote_cache options seem to exclusively affect the Action caches, but the current no-cache tag seems to exclusively affect the Spawn cache.

Is part of the goal to have these tags also influence Action caching somehow? If not, how can we control caching at that level?

The description for the no-cache tag states that the tag may also be set on an action to disable caching for the action, but I don't see anywhere in the code that supports that claim.

Thanks for any insight :)

(@ishikhman :))

@SrodriguezO
Copy link
Contributor

@ishikhman friendly ping ^ :)

@ishikhman
Copy link
Contributor

I'm noticing that there are two types of cache: one for Spawns (the SpawnCache), and another for Actions (the AbstractRemoteActionCache).

I'm not familiar with how these two types of cache interface.. The --disk_cache and --remote_cache options seem to exclusively affect the Action caches, but the current no-cache tag seems to exclusively affect the Spawn cache.

It actually affect both :)
Think of the SpawnCache as of a kind of wrapper around the AbstractRemoteActionCache, which allows an execution module to interact with a remote cache without exposing what kind of cache it is.
This is why only the SpawnCache is aware of those tags.

SpawnCache might be using GrpcRemoteCache inside, as well as SimpleBlobStoreActionCache which can use Http(--remote_cache=https://some.url), Disk(--disk_cache=) or Combined Blob Store (both--remote_cache and --disk_cache=).

Is part of the goal to have these tags also influence Action caching somehow? If not, how can we control caching at that level?

See above, no-cache affects the ActionCache via the SpawnCache.

The description for the no-cache tag states that the tag may also be set on an action to disable caching for the action, but I don't see anywhere in the code that supports that claim.

See here: to identify whether an action might be cached, we do check so-called executionInfo which is constructed by merging target-level tags with action or rule level execution requirements. Therefore it is possible to apply no-cache to the target(tags) or to the rule(execution_requirements). Or to the action, which is created inside the rule (well, if you have your own rule you can do that a well).

Might be useful
In case the changes are affecting the current behavior of one of the tags, please have a look at the incompatible changes policy.

@ishikhman
Copy link
Contributor

For the history purposes:
no-cache - forbids remote (both http and grpc) and no local(disk) cache
no-remote-cache - forbids remote (both http and grpc) cache only
no-remote-exec - forbids remote execution

When we refer to local cache in most cases we mean disk cache.

@buchgr
Copy link
Contributor

buchgr commented Jul 11, 2019

Awesome! Bonus points for adding this information to our documentation!

@SrodriguezO
Copy link
Contributor

I'll update the docs as part of #8710 :)

@SrodriguezO
Copy link
Contributor

This might still be an issue. Targets tagged with no-remote-exec still get cached if only --remote_cache is specified, but if --remote_executor is specified, then no-remote-exec also prevents caching.

Looking at RemoteSpawnRunner.java in my PR from a year ago, we made actions run locally before checking the remote cache if the action couldn't run remotely. I suspect this was a mistake.

The code's changed quite substantially since then, but the behavior seems to remain--the no-remote-exec tag prevents remote caching if --remote_executor is specified. This means no-remote-exec can't be used to cleanly disable remote execution for remote-incompatible targets while still caching their outputs, which was the goal of this issue.

Minimal repo to reproduce the behavior on Bazel 3.1.0

@SrodriguezO
Copy link
Contributor

Never mind, this seems to have been fixed in Bazel 3.2.0. I cannot reproduce the issue on either 3.2.0 or 3.4.1 :)

@keithl-stripe
Copy link
Contributor

@buchgr is there a ticket tracking the combination of remote execution and --disk_cache?

@brentleyjones
Copy link
Contributor

brentleyjones commented Aug 26, 2021

I believe that was just resolved in HEAD: cf57d03

Edit: And I now see you are also from stripe. I'm not sure if there was a ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants