Preserve symlink structure of remote execution inputs #23620
Comments
Related Slack discussion: https://bazelbuild.slack.com/archives/C01E7TH8XK9/p1720796367796929 (interestingly, it's also an issue with clang, but in a different setting).
To recap the discussion on that thread: my opinion is that, if we're going to fix this, we should do it by providing some sort of API to mark certain source artifacts as "potentially symlinks". For these artifacts, a symlink would be textually sent to the remote execution environment; if it's meant to be resolved remotely, it would be the user's responsibility to ensure that the file at the other end is also present in the action inputs (as well as any other intervening symlinks, if there are multiple layers of indirection). In particular, this probably means that only relative symlinks would be expected to work.
I'm not convinced that we can redefine the behavior for all source artifacts, as it's possible that someone might be using symlinks that can only be resolved locally (and relying on the implicit conversion to a regular file). I'd also prefer to avoid solutions that require Bazel to interpret or transform the text of the symlink in any way.
#16712 is the missing feature I think we'd need before we can tackle this.
Why is #16712 required? Since it is "the user's responsibility to ensure that the file at the other end is also present in the action inputs", we don't expect these symlinks to be dangling.
I think it's just difficult to phrase this accurately sometimes. The important distinction is between "Bazel cares about the contents" vs. "Bazel cares solely about the result of
As described in symlink_helpers.bzl, copied here for visibility:
Symlinking busybox things needs special logic. This is because Bazel doesn't cache the actual symlink, resulting in essentially resolved symlinks being produced in place of the expected tool. As a consequence, we can't rely on the symlink name when dealing with busybox entries.
An example repro of this using a local build cache is:
    bazel build //toolchain
    bazel clean
    bazel build //toolchain
We could in theory get reasonable behavior with `ctx.actions.declare_symlink`, but that's disallowed in our `.bazelrc` for cross-environment compatibility.
The particular approach here uses the Python script as a launching pad so that the busybox still receives an appropriate location in argv[0], allowing it to find other files in the lib directory. Arguments are inserted to get equivalent behavior as if symlink resolution had occurred.
The underlying bug is noted at: bazelbuild/bazel#23620
Description of the feature request:
If source artifact inputs of build actions include symlinks, these symlinks are represented as regular files when the build action is executed remotely. This can break certain inputs, in particular LLVM built in the "busybox" configuration. The FR is to preserve the symlink structure instead.
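To make the difference concrete, here is a minimal local sketch in Python. It is only an analogy, not Bazel's remote execution code: it builds a toy `bin/` directory with a regular file `llvm` and a relative symlink `clang -> llvm`, then stages it twice with `shutil.copytree`, once dereferencing symlinks and once preserving them. The helper names (`make_toolchain`, `stage`, `invoked_and_resolved`) and the directory layout are made up for illustration.

```python
import os
import shutil
import tempfile


def make_toolchain(root):
    """Create a toy bin/ directory: a regular file `llvm` and a symlink `clang`."""
    bin_dir = os.path.join(root, "bin")
    os.makedirs(bin_dir)
    with open(os.path.join(bin_dir, "llvm"), "w") as f:
        f.write("placeholder for the real multicall binary\n")
    # Relative symlink, so it still resolves after the tree is moved elsewhere.
    os.symlink("llvm", os.path.join(bin_dir, "clang"))
    return bin_dir


def stage(src, dst, preserve_symlinks):
    """Copy an input tree; symlinks=False turns every link into a regular file."""
    shutil.copytree(src, dst, symlinks=preserve_symlinks)
    return dst


def invoked_and_resolved(tool_path):
    """Return (name the tool is invoked as, basename of the file it resolves to)."""
    resolved = os.path.realpath(tool_path)
    return os.path.basename(tool_path), os.path.basename(resolved)
```

With the dereferencing copy, `invoked_and_resolved` on the staged `clang` reports `("clang", "clang")`; with symlinks preserved it reports `("clang", "llvm")`. The second result is the shape a busybox-style toolchain relies on.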
Let me unpack this a little bit.
Current Bazel behavior
Let me quote @tjgq from an internal conversation we've had about this:
How this breaks LLVM toolchains
We use a hermetic LLVM toolchain, and that toolchain is part of the build inputs. The toolchain includes a bunch of "binaries" like `bin/clang`, `bin/clang++`, `bin/lld`, etc. But in fact, the LLVM version we use employs a "busybox" architecture, where these binaries are all symlinks to `bin/llvm`. However! Invoking `bin/clang` is not actually equivalent to invoking `bin/llvm`: the binary examines its `argv[0]`, and behaves differently when invoked via symlink.
What is more, in some situations llvm will re-invoke itself. On the first invocation, we need the `argv[0]` to be `clang`. On the re-invocation, llvm will use the path from `/proc/self/exe`, which needs to end in `llvm`. If we merely have a copy, the path from `/proc/self/exe` also ends in `clang`, so `argv[0]` is `clang` both times, producing errors like https://pwbug.dev/issues/364781685. I am not a toolchain expert, but I discussed this with some, and they assure me this behavior (i.e., reading `/proc/self/exe` and assuming it points to `llvm` and not e.g. `clang`, rather than just setting it to `llvm`) is unfortunately necessary due to the treatment of Clang reproducers and `-canonical-prefixes` (although I confess I could not follow their explanation).
Workarounds
There are workarounds for this issue:
- Replace `bin/clang` (etc) with symlinks created by `ctx.actions.declare_symlink`. Such Bazel-created symlinks will be faithfully sent to RBE.
- Wrap `bin/clang` (etc) in bash scripts like,
  This has the advantage that no custom rules are required: you just genrule the wrapper scripts into existence. These wrapper scripts (thanks to the `exec -a clang`) have the same magic property as the symlinks, i.e. that `argv[0]` is different from the actual executed binary basename. However, this requires bash (`/bin/sh` doesn't support the `-a` flag).
However, this is definitely a sharp edge and it would be nice to remove it.
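Where bash is not available, the `exec -a` idea from the second workaround can be approximated with a small Python launcher. The following is a sketch under assumptions rather than the script used in any real toolchain: it assumes the real multicall binary is named `llvm` and sits next to the wrapper, and the function names `wrapper_exec_args` and `launch` are hypothetical.

```python
import os
import sys


def wrapper_exec_args(wrapper_path, tool_name, extra_args, real_name="llvm"):
    """Return (executable, argv) for re-executing the multicall binary.

    Assumption: the real binary lives in the wrapper's directory and is
    called `real_name`. This layout is illustrative only.
    """
    wrapper_dir = os.path.dirname(os.path.abspath(wrapper_path))
    real_binary = os.path.join(wrapper_dir, real_name)
    # argv[0] carries the tool name, while the executed file is the real binary.
    return real_binary, [tool_name, *extra_args]


def launch(tool_name):
    """Replace the current process, similar in spirit to bash's `exec -a`."""
    executable, argv = wrapper_exec_args(sys.argv[0], tool_name, sys.argv[1:])
    os.execv(executable, argv)  # does not return on success
```

A wrapper installed as `bin/clang` would call `launch("clang")`. The executed file is then `bin/llvm` while `argv[0]` is `clang`, which is the same split between invoked name and executed binary that the symlink provides.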
Further reading for Googlers
See internal discussions of this problem for more details:
Which category does this issue belong to?
Remote Execution
What underlying problem are you trying to solve with this feature?
No response
Which operating system are you running Bazel on?
No response
What is the output of `bazel info release`?
development version
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
I'm on d62e0a0, fetched by Bazelisk (so, Bazel 8 pre-release).
What's the output of `git remote get-url origin; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
Remarkably, not really, this seems to be a pretty edge-case issue!
Any other information, logs, or outputs that you want to share?
For folks who run into similar issues in the future to find this: the cryptic errors produced by clang are,
To make progress debugging this issue, you need to run clang under `strace`.