Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve symlink structure of remote execution inputs #23620

Open
tpudlik opened this issue Sep 13, 2024 · 4 comments
Open

Preserve symlink structure of remote execution inputs #23620

tpudlik opened this issue Sep 13, 2024 · 4 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@tpudlik
Copy link
Contributor

tpudlik commented Sep 13, 2024

Description of the feature request:

If source artifact inputs of build actions include symlinks, these symlinks are represented as regular files when the build action is executed remotely. This can break certain inputs, in particular LLVM built in the "busybox" configuration. The FR is to preserve the symlink structure instead.

Let me unpack this a little bit.

Current Bazel behavior

Let me quote @tjgq from an internal conversation we've had about this:

  1. Every artifact tracked by Bazel has a type (file, directory or symlink) associated with it, which defines how its identity is determined - for artifacts of file or directory type, the content matters; for artifacts of symlink type, only the textual target path of the symlink does

  2. However (and this is where it gets complicated) the "Bazel type" of an artifact is somewhat independent from its filesystem representation. It's possible to have an artifact of file or directory type which is materialized in the filesystem as a symlink, in which case Bazel cares about the bytes at the other end of the symlink, not the path the symlink points to

  3. Source artifacts are, at the moment, always of file type; only output artifacts can be of directory or symlink type (that would be a ctx.actions.declare_directory or a ctx.actions.declare_symlink, respectively)

  4. For artifacts of file type, when we construct the input representation of a remote action, we always represent them as regular files, never symlinks, even if their filesystem representation is a symlink; so their "symlinkness" is effectively lost when the action is executed remotely

How this breaks LLVM toolchains

We use a hermetic LLVM toolchain, and that toolchain is part of the build inputs. The toolchain includes a bunch of "binaries" like bin/clang, bin/clang++, bin/lld, etc. But in fact, the LLVM version we use employs a "busybox" architecture, where these binaries are all symlinks to bin/llvm. However! Invoking bin/clang is not actually equivalent to invoking bin/llvm: the binary examines its argv[0], and behaves differently when invoked via symlink.

What is more, in some situations llvm will re-invoke itself. On the first invocation, we need the argv[0] to be clang. On the re-invocation, llvm will use the path from /proc/self/exe, which needs to end in llvm. If we merely have a copy, argv[0] is clang both times, producing errors like https://pwbug.dev/issues/364781685. I am not a toolchain expert, but I discussed this with some, and they assure me this behavior (i.e., reading /proc/self/exe an assuming it points to llvm and not e.g. clang, rather than just setting it to llvm) is unfortunately necessary due to the treatment of Clang reproducers and -canonical-prefixes (although I confess I could not follow their explanation).

Workarounds

There are workarounds for this issue:

  1. Replace bin/clang (etc) with symlinks created by ctx.actions.declare_symlink. Such Bazel-created symlinks will be faithfully sent to RBE.

  2. Wrap bin/clang (etc) in bash scripts like,

    #!/bin/bash
    exec -a clang $(location //:bin/llvm)  "$@"
    

    This has the advantage that no custom rules are required, you just genrule the wrapper scripts into existence. These wrapper scripts (thanks to the exec -a clang) have the same magic property as the symlinks, i.e. that argv[0] is different from the actual executed binary basename. However, this requires bash (/bin/sh doesn't support the -a flag).

However, this is definitely a sharp edge and it would be nice to remove it.

Further reading for Googlers

See internal discussions of this problem for more details:

Which category does this issue belong to?

Remote Execution

What underlying problem are you trying to solve with this feature?

No response

Which operating system are you running Bazel on?

No response

What is the output of bazel info release?

development version

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

I'm on d62e0a0, fetched by Bazelisk (so, Bazel 8 pre-release).

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

Remarkably, not really, this seems to be a pretty edge-case issue!

Any other information, logs, or outputs that you want to share?

For folks who run into similar issues in the future to find this: the cryptic errors produced by clang are,

clang: error: unknown argument: '-cc1'
clang: error: unknown argument '-triple'; did you mean '-Xclang -triple'?
clang: error: unknown argument '-clear-ast-before-backend'; did you mean '-Xclang -clear-ast-before-backend'?
clang: error: unknown argument '-main-file-name'; did you mean '-Xclang -main-file-name'?
clang: error: unknown argument '-mrelocation-model'; did you mean '-Xclang -mrelocation-model'?
clang: error: unknown argument '-pic-level'; did you mean '-Xclang -pic-level'?
clang: error: unknown argument '-fhalf-no-semantic-interposition'; did you mean '-Xclang -fhalf-no-semantic-interposition'?
clang: error: unknown argument '-mframe-pointer=all'; did you mean '-Xclang -mframe-pointer=all'?
clang: error: unknown argument '-nostdsysteminc'; did you mean '-Xclang -nostdsysteminc'?
clang: error: unknown argument '-target-cpu'; did you mean '-Xclang -target-cpu'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-crc'; did you mean '-mcrc'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument: '-ras'
clang: error: unknown argument '-target-feature'; did you mean '-Xclang -target-feature'?
clang: error: unknown argument: '-sb'

To make progress debugging this issue, you need to run clang under strace.

@tjgq
Copy link
Contributor

tjgq commented Sep 16, 2024

Related Slack discussion: https://bazelbuild.slack.com/archives/C01E7TH8XK9/p1720796367796929 (interestingly, it's also an issue with clang, but in a different setting)

To recap the discussion on that thread: my opinion is that, if we're going to fix this, we should do it by providing some sort of API to mark certain source artifacts as "potentially symlinks". For these artifacts, a symlink would be textually sent to the remote execution environment; if it's meant to be resolved remotely, it would be the user's responsibility to ensure that the file at the other end is also present in the action inputs (as well as any other intervening symlinks, if there are multiple layers of indirection). In particular, this probably means that only relative symlinks would be expected to work.

I'm not convinced that we can redefine the behavior for all source artifacts, as it's possible that someone might be using symlinks that can only be resolved locally (and relying on the implicit conversion to a regular file). I'd also prefer to avoid solutions that require Bazel to interpret or transform the text of the symlink in any way.

@tjgq tjgq added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Sep 17, 2024
@tjgq
Copy link
Contributor

tjgq commented Sep 24, 2024

#16712 is the missing feature I think we'd need before we can tackle this.

@tpudlik
Copy link
Contributor Author

tpudlik commented Sep 24, 2024

Why is #16712 required? Since "the user's responsibility to ensure that the file at the other end is also present in the action inputs", we don't expect these symlinks to be dangling.

@tjgq
Copy link
Contributor

tjgq commented Sep 24, 2024

I think it's just difficult to phrase this accurately sometimes. The important distinction is between "Bazel cares about the contents" vs. "Bazel cares solely about the result of readlink()". In the second case, whether the symlink dangles or not doesn't matter (it looks the same from Bazel's perspective). If the user does want to dereference it, they have that responsibility, but it could equally be used in situations where it's expected to dangle.

github-merge-queue bot pushed a commit to carbon-language/carbon-lang that referenced this issue Nov 8, 2024
As described in symlink_helpers.bzl, copied here for visibility:

Symlinking busybox things needs special logic.

This is because Bazel doesn't cache the actual symlink, resulting in
essentially resolved symlinks being produced in place of the expected
tool. As a consequence, we can't rely on the symlink name when dealing
with busybox entries.

An example repro of this using a local build cache is:

    bazel build //toolchain
    bazel clean
    bazel build //toolchain

We could in theory get reasonable behavior with
`ctx.actions.declare_symlink`, but that's disallowed in our `.bazelrc`
for cross-environment compatibility.

The particular approach here uses the Python script as a launching pad
so that the busybox still receives an appropriate location in argv[0],
allowing it to find other files in the lib directory. Arguments are
inserted to get equivalent behavior as if symlink resolution had
occurred.

The underlying bug is noted at:
bazelbuild/bazel#23620
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request
Projects
None yet
Development

No branches or pull requests

5 participants