Bazel uploads outputs of no-remote actions when used in combination with a disk-cache #14900

Closed
BalestraPatrick opened this issue Feb 24, 2022 · 7 comments
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@BalestraPatrick
Member

BalestraPatrick commented Feb 24, 2022

Description of the problem / feature request:

I seem to have discovered a bug that is slowing down our remote execution builds. Our build has some pretty large blobs that we don't want to upload to the remote cache since it would take way too long, so this is an example of how we prevent those specific actions from running remotely and from uploading to the remote cache:

build --modify_execution_info=^(CppLink|ObjcLink)$=+no-remote
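
For readers unfamiliar with the flag, here is a rough Python sketch of what the `regex=+key` entry above is meant to do to an action's execution info. It only illustrates the intended effect and is not Bazel's implementation; the function name `apply_modify_execution_info` is hypothetical, and splitting entries on commas is a simplification that would break for patterns containing a comma.

```python
import re

# The value passed to --modify_execution_info in the report above.
SPEC = "^(CppLink|ObjcLink)$=+no-remote"


def apply_modify_execution_info(spec, mnemonic, execution_info):
    """Return a copy of execution_info after applying 'regex=+key' / 'regex=-key' entries.

    Hypothetical helper for illustration only.
    """
    result = dict(execution_info)
    for entry in spec.split(","):
        # Split on the last '=' so the regex itself may contain '='.
        pattern, _, change = entry.rpartition("=")
        if not pattern or change[:1] not in ("+", "-"):
            raise ValueError(f"malformed entry: {entry!r}")
        sign, key = change[0], change[1:]
        # The pattern in the report is anchored with ^ and $, so a full match
        # and a search behave the same for it.
        if re.fullmatch(pattern, mnemonic):
            if sign == "+":
                result[key] = ""
            else:
                result.pop(key, None)
    return result


objc_link_info = apply_modify_execution_info(SPEC, "ObjcLink", {})
cpp_compile_info = apply_modify_execution_info(SPEC, "CppCompile", {})
```

With this spec, only actions whose mnemonic is exactly `CppLink` or `ObjcLink` pick up the `no-remote` key; every other mnemonic is left untouched.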

With Bazel 5, we also enabled the use of a disk cache with our remote exec build. This is now causing long uploads for actions that are marked as no-remote.

What seems to be happening is the following:

  1. When using a combined cache, Bazel wrongly asks the remote cache if it contains a certain blob without respecting the no-remote tag.
  2. If the remote cache doesn't contain it, Bazel will then upload it even though it shouldn't.

I've verified that disabling the disk cache or setting the action mnemonics to no-cache works around the issue.

Feature requests: what underlying problem are you trying to solve with this feature?

Building with remote execution and a disk cache should be efficient. This means being able to prevent the inputs/outputs of certain action mnemonics from being uploaded without disabling the disk cache completely.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

This needs a working remote exec cluster, but I believe it should be reproducible by building https://github.com/bazelbuild/rules_apple/blob/master/examples/ios/HelloWorld/BUILD with the following flags:

build --disk_cache=~/.cache/bazel_disk
build --modify_execution_info=^(CppLink|ObjcLink)$=+no-remote
build --remote_executor=your.remote.exec
build --remote_cache=your.remote.cache

What operating system are you running Bazel on?

macOS 12.1

What's the output of bazel info release?

release 5.0.0

Any other information, logs, or outputs that you want to share?

This is a snippet of our gRPC log with some of the interesting fields:

---------------------------------------------------------
metadata {
  tool_details {
    tool_name: "bazel"
    tool_version: "5.0.0"
  }
  action_mnemonic: "ObjcLink"
  target_id: "//:target"
}
status {
}
method_name: "build.bazel.remote.execution.v2.ContentAddressableStorage/FindMissingBlobs"
details {
  find_missing_blobs {
    request {
      blob_digests {
        hash: "1a0fe7ea9f46605fa721fd83d8498ccbae3b2bfa25c98420f429980076022c88"
        size_bytes: 324400280
      }
    }
    response {
      missing_blob_digests {
        hash: "1a0fe7ea9f46605fa721fd83d8498ccbae3b2bfa25c98420f429980076022c88"
        size_bytes: 324400280
      }
    }
  }
}

---------------------------------------------------------
metadata {
  tool_details {
    tool_name: "bazel"
    tool_version: "5.0.0"
  }
  action_mnemonic: "ObjcLink"
  target_id: "//:target"
}
status {
}
method_name: "google.bytestream.ByteStream/Write"
details {
  write {
    resource_names: "uploads/7c80b2be-c6fc-409a-accf-1715bc09417e/compressed-blobs/zstd/1a0fe7ea9f46605fa721fd83d8498ccbae3b2bfa25c98420f429980076022c88/324400280"
    resource_names: ""
    num_writes: 5084
    bytes_sent: 83291264
    response {
      committed_size: 83291264
    }
    offsets: 0
    finish_writes: 83291264
  }
}
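
To make the two log entries easier to relate, here is a small Python sketch that pulls the digest out of the ByteStream upload resource name and compares it with the byte counts in the log. It assumes the Remote Execution API layout `uploads/{uuid}/compressed-blobs/{compressor}/{uncompressed_hash}/{uncompressed_size}` (optionally prefixed by an instance name), and it assumes the snippet shows the complete write; `parse_upload_resource_name` is a hypothetical helper.

```python
def parse_upload_resource_name(name):
    """Split a ByteStream upload resource name into its parts (illustrative helper)."""
    parts = name.split("/")
    i = parts.index("uploads")  # anything before this is the instance name
    kind = parts[i + 2]
    if kind == "compressed-blobs":
        compressor, digest_hash, size = parts[i + 3], parts[i + 4], parts[i + 5]
    elif kind == "blobs":
        compressor, digest_hash, size = "identity", parts[i + 3], parts[i + 4]
    else:
        raise ValueError(f"unexpected resource name: {name!r}")
    return {
        "uuid": parts[i + 1],
        "compressor": compressor,
        "hash": digest_hash,
        "size_bytes": int(size),  # size of the uncompressed blob
    }


RESOURCE_NAME = (
    "uploads/7c80b2be-c6fc-409a-accf-1715bc09417e/compressed-blobs/zstd/"
    "1a0fe7ea9f46605fa721fd83d8498ccbae3b2bfa25c98420f429980076022c88/324400280"
)
BYTES_SENT = 83291264  # bytes_sent from the Write entry in the log

info = parse_upload_resource_name(RESOURCE_NAME)
ratio = BYTES_SENT / info["size_bytes"]
print(f"{info['size_bytes'] / 2**20:.1f} MiB blob, {BYTES_SENT / 2**20:.1f} MiB sent ({info['compressor']})")
```

The hash and size in the resource name match the digest that FindMissingBlobs reported as missing, so the Write is the upload of that same ObjcLink output: a blob of roughly 309 MiB, of which about 79 MiB went over the wire after zstd compression.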
@brentleyjones
Contributor

cc: @coeuvre

@brentleyjones
Contributor

@bazel-io fork 5.1

@aiuto aiuto added area-Bzlmod Bzlmod-specific PRs, issues, and feature requests team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged and removed area-Bzlmod Bzlmod-specific PRs, issues, and feature requests labels Feb 26, 2022
@coeuvre
Member

coeuvre commented Mar 10, 2022

Thanks for the detailed report. I can reproduce this bug.

The root cause is that findMissingDigests in the combined cache doesn't know whether the request comes from remote execution or from remote cache, and the combined cache should behave differently in the two scenarios (not to mention the BES case).

I have to admit that the design of hiding the combined cache behind a common interface causes a lot of trouble, and I have done many quick-and-dirty hacks to work around edge cases in the past.

I am going to explicitly introduce the concept of the disk cache at a higher level so we can handle these cases differently.
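
As a rough illustration of the difference, the following Python sketch models a combined cache whose upload path either ignores or respects the `no-remote` tag. It is a toy model and not Bazel's code: `CacheSketch`, `upload_outputs`, and the `respect_tags` switch are all hypothetical names, and the real fix may be structured quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class CacheSketch:
    """Toy combined cache: a disk store plus a remote store (hypothetical model)."""

    disk: dict = field(default_factory=dict)
    remote: dict = field(default_factory=dict)
    remote_calls: list = field(default_factory=list)

    def find_missing_remote(self, digests):
        # Stands in for the FindMissingBlobs call seen in the gRPC log.
        self.remote_calls.append(("FindMissingBlobs", tuple(digests)))
        return [d for d in digests if d not in self.remote]

    def upload_outputs(self, outputs, execution_info, respect_tags):
        # The disk cache is always allowed to store the outputs.
        for digest, blob in outputs.items():
            self.disk[digest] = blob
        # Only talk to the remote cache when the action permits it.
        if respect_tags and "no-remote" in execution_info:
            return
        for digest in self.find_missing_remote(list(outputs)):
            self.remote_calls.append(("Write", digest))
            self.remote[digest] = outputs[digest]


outputs = {"1a0fe7ea": b"large linked binary"}
tagged = {"no-remote": ""}

unaware = CacheSketch()
unaware.upload_outputs(outputs, tagged, respect_tags=False)  # behavior reported in this issue

aware = CacheSketch()
aware.upload_outputs(outputs, tagged, respect_tags=True)  # intended behavior
```

In the first case the sketch issues a FindMissingBlobs query followed by a Write, which mirrors the log in the report; in the second case the output only reaches the disk store and no remote calls are made.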

@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) type: bug and removed untriaged labels Mar 10, 2022
@brentleyjones
Contributor

@coeuvre Once you have a fix, can you re-open #14901?

@brentleyjones
Contributor

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 13, 2022
@ckolli5

ckolli5 commented Apr 13, 2022

@bazel-io fork 5.2.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 13, 2022
@brentleyjones
Contributor

@coeuvre looks like this isn't a clean cherry-pick onto 5.2. Can you take a look please 🙏.

coeuvre added a commit to coeuvre/bazel that referenced this issue May 10, 2022
...ploaded to remote cache when remote execution is enabled.

Fixes bazelbuild#14900.

Also fixes an issue where the action result of an action that was just executed remotely is not saved to the disk cache. The root cause is that the action result is inlined in the execution response, so it is not downloaded through the remote cache and therefore not saved to the disk cache. As a result, the second build misses the disk cache, but it can still hit the remote cache and fill the disk cache. The third build can then hit the disk cache.

Closes bazelbuild#15212.

PiperOrigin-RevId: 441426469
coeuvre added a commit to coeuvre/bazel that referenced this issue May 27, 2022
ckolli5 added a commit that referenced this issue May 27, 2022
… are u... (#15453)

* Remote: Fix a bug that outputs of actions tagged with no-remote are u…

…ploaded to remote cache when remote execution is enabled.

Fixes #14900.

Also fixes an issue where the action result of an action that was just executed remotely is not saved to the disk cache. The root cause is that the action result is inlined in the execution response, so it is not downloaded through the remote cache and therefore not saved to the disk cache. As a result, the second build misses the disk cache, but it can still hit the remote cache and fill the disk cache. The third build can then hit the disk cache.

Closes #15212.

PiperOrigin-RevId: 441426469

* Fix test

Co-authored-by: Chenchu K <[email protected]>
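
The disk-cache side effect described in the commit message above can be traced with a toy Python model. It assumes a single action, that outputs are cleaned between builds so every build has to consult a cache, and that remote execution leaves the action result in the remote cache; `run_builds` and its flag are hypothetical names used only for this illustration.

```python
def run_builds(count, save_inlined_result_to_disk):
    """Trace where one action's result comes from across consecutive builds (toy model)."""
    disk, remote = set(), set()
    key = "action-key"
    events = []
    for _ in range(count):
        if key in disk:
            events.append("disk cache hit")
        elif key in remote:
            # Downloading through the remote cache also fills the disk cache.
            disk.add(key)
            events.append("remote cache hit")
        else:
            # Remote execution: the result comes back inlined in the response.
            remote.add(key)
            if save_inlined_result_to_disk:
                disk.add(key)
            events.append("remote execution")
    return events


before_fix = run_builds(3, save_inlined_result_to_disk=False)
after_fix = run_builds(3, save_inlined_result_to_disk=True)
print(before_fix)
print(after_fix)
```

Under these assumptions the model without the fix needs three builds before the disk cache is hit, matching the sequence in the commit message, while saving the inlined result lets the second build hit the disk cache directly.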

6 participants