-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conflicting actions for cc_proto_library
with transition in Starlark rule dependency
#13464
Comments
For context: This does break our build and blocks us from moving to Bazel 4. Since the breaking commit has been identified, could it be reverted? |
I debugged this a bit more and have some more information:
|
@sdtwigg will offer some guidance on what's going on for your build. |
@sdtwigg Friendly ping |
@gregestren Could you offer some guidance? Our current workaround is to use the target rather than the exec config, but that will no longer be feasible with cross-compilation. |
I'll up the visibility on this and we'll get you some kind of a response. |
There seems to be a broad class of issues related to action conflict detection when the owning 'rule' is an Aspect. Unfortunately, the actual mechanism of this has been tricky to track down since the systems for both aspects and action conflict detection are a bit obscured. |
I may have identified the root cause of this issue. The cc_proto_library rule depends on @sdtwigg Would you review a PR? If so, would you also like me to include the reproducer as a test? |
When the proto rules' depdencies were switched over to the exec config in 7acf9ea, the dependency of CcProtoAspect on the proto C++ toolchain was not updated. Due to this, cc_proto_library had two dependencies on @com_google_protobuf//:protobuf (through :protoc and directly) with potentially incompatible configurations, which lead to conflicting actions. Fixes bazelbuild#13464.
When the proto rules' depdencies were switched over to the exec config in 7acf9ea, the dependency of CcProtoAspect on the proto C++ toolchain was not updated. Due to this, cc_proto_library had two dependencies on @com_google_protobuf//:protobuf (through :protoc and directly) with potentially incompatible configurations, which lead to conflicting actions. Fixes bazelbuild#13464.
After a first misguided attempt to fix the issue, I now took a deeper look and produced a Starlark-only reproducer that does not rely on aspects (see below). The cause of the issue rather appears to be the particular chain of transitions in which the target, in this case
Based on this analysis, the root cause of this issue appears to be that the logic that updates @gregestren @sdtwigg If you could provide some guidance, I would happily research this further and see whether I can prepare a fix. The reproducer: #/usr/bin/env sh
touch WORKSPACE
cat << EOF > lib.h
int lib();
EOF
cat << EOF > lib.cc
#include "lib.h"
int lib() {
return 0;
}
EOF
cat << EOF > main.cc
#include "lib.h"
int main() {
return lib();
}
EOF
cat << EOF > rules.bzl
_COPT_BUILD_SETTING = "//command_line_option:copt"
def _set_copt_impl(settings, attr):
return {_COPT_BUILD_SETTING: ["-DFOOBAR"]}
_set_copt = transition(
implementation = _set_copt_impl,
inputs = [],
outputs = [_COPT_BUILD_SETTING],
)
def _apply_copt_transition_impl(ctx):
pass
apply_copt_transition = rule(
implementation = _apply_copt_transition_impl,
attrs = {
"target": attr.label(
cfg = _set_copt,
),
"_allowlist_function_transition": attr.label(
default = "@bazel_tools//tools/allowlists/function_transition_allowlist",
),
},
)
def _fake_cc_proto_library_impl(ctx):
return [
ctx.attr._runtime[DefaultInfo],
ctx.attr._runtime[CcInfo],
]
fake_cc_proto_library = rule(
_fake_cc_proto_library_impl,
attrs = {
"_compiler": attr.label(
default = "//:fake_protoc",
cfg = "exec",
executable = True,
),
"_runtime": attr.label(
default = "//:fake_protobuf",
),
},
)
def _fake_mock_impl(ctx):
ctx.actions.write(ctx.outputs.out, "")
fake_mock = rule(
_fake_mock_impl,
attrs = {
"library": attr.label(
cfg = "exec",
),
"out": attr.output(
mandatory = True,
),
},
)
EOF
cat << EOF > BUILD
load(":rules.bzl", "apply_copt_transition", "fake_cc_proto_library", "fake_mock")
cc_library(
name = "fake_protobuf",
srcs = [
"lib.cc",
],
hdrs = [
"lib.h",
],
)
cc_binary(
name = "fake_protoc",
srcs = [
"main.cc",
],
deps = [
":fake_protobuf",
],
)
fake_cc_proto_library(
name = "cc_proto_lib",
)
apply_copt_transition(
name = "cc_proto_lib_with_copt_transition",
target = ":cc_proto_lib",
)
fake_mock(
name = "mock",
out = "empty.gen",
library = ":cc_proto_lib_with_copt_transition",
)
EOF
bazel build //:mock |
With this commit, an execution transition will recompute the transition hash appended to the output directory name if set by any previous Starlark transition. This also takes into account that the Starlark transition may transitively affect an option --foo_bar after the exec transition if it previously affected the corresponding host option --host_foo_bar. Previously, an execution transition would reuse the existing hash, which no longer reflected the current state of native options correctly and thus led to action conflicts. Fixes bazelbuild#13464.
I prepared #13915, which resolves both this issue and the analogous one with |
With this commit, an execution transition will recompute the transition hash appended to the output directory name if set by any previous Starlark transition. This also takes into account that the Starlark transition may transitively affect an option --foo_bar after the exec transition if it previously affected the corresponding host option --host_foo_bar. Previously, an execution transition would reuse the existing hash, which no longer reflected the current state of native options correctly and thus led to action conflicts. Fixes bazelbuild#13464.
I will take a look at your PR today. It is surprising because we had just
(late last week) identified a separate issue with AspectFunction failing to
ask for a proper, post-transition value but this seems like a different bug
entirely.
…On Sun, Aug 29, 2021 at 3:13 AM Fabian Meumertzheim < ***@***.***> wrote:
I prepared #13915 <#13915>, which
resolves both this issue and the analogous one with --host_copt instead
of --copt. It updates the transitionDirectoryNameFragment also taking
into account that --host_copt becomes --copt under the exec transition.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#13464 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAHX76GMOTYW7X4N4YGILT3T7HMZ5ANCNFSM44UMY4LA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Why would users be modifying the host_* values in a transitions? Edit: Oh, I think I misread your description and see now that you are also introducing a mechanism to modify the exec transition via host options. I absolutely have to discuss this with jcater before proceeding. Edit2: I might have further misread and instead that behavior already exists but is not accounted for in the exec transition renaming. I still want to double check that this behavior is expected and not an artifact of the exec transition being built off infrastructure for the host transition. |
With this commit, an execution transition will recompute the transition hash appended to the output directory name if set by any previous Starlark transition. This also takes into account that the Starlark transition may transitively affect an option --foo_bar after the exec transition if it previously affected the corresponding host option --host_foo_bar. Previously, an execution transition would reuse the existing hash, which no longer reflected the current state of native options correctly and thus led to action conflicts. Fixes bazelbuild#13464.
Re your edit2: In fact, I also only learned about this being possible only while writing the tests for this PR. I'm not sure whether its intended or not. I could see the following use cases:
If it turns out that exec transitions should not behave this way, I could remove the host flag propagation from the PR. The call to |
@sdtwigg and I discussed this more today. Thanks particularly for your spot-on summary at #13464 (comment).
I agree completely. 99f8a8f#diff-85e08f9833efccc5033b847d466a7d3b6b7ed82c41755fcc2d1d14b3da3b8c55 was not the most rigorous patch and you're clearly showing its holes.
This makes sense. More broadly, I believe the biggest problem is that native transitions (transitions written directly in Java) have their own, independent algorithms for determining output paths vs. Starlark transitions. So when you interleave them you get the possibility of corner cases like you've described. Also see bazel/src/main/java/com/google/devtools/build/lib/analysis/config/OutputDirectories.java Line 172 in 09c621e
So I agree with your intuition. I'm just unclear on the right way to merge them. Calling into FunctionTransitionUtils from ExecutionTransitionFactory is one approach, but that's also muddling up a bit of the sequencing of
I remain confused about this too. Also see #13580. It doesn't fix this issue but it changes and improves how |
Thanks for providing the context.
I think that there is a trade-off to be made here:
If potentially wasteful recomputations of
I agree that added rule metadata would help here. It would first need to be clarified though whether separate
Thanks, this PR is indeed very important for us as we encounter frequent rebuilds that would be prevented by it. |
With this commit, an execution transition will recompute the transition hash appended to the output directory name if set by any previous Starlark transition. This also takes into account that the Starlark transition may transitively affect an option --foo_bar after the exec transition if it previously affected the corresponding host option --host_foo_bar. Previously, an execution transition would reuse the existing hash, which no longer reflected the current state of native options correctly and thus led to action conflicts. Fixes bazelbuild#13464.
With this commit, an execution transition will recompute the transition hash appended to the output directory name if set by any previous Starlark transition. This also takes into account that the Starlark transition may transitively affect an option --foo_bar after the exec transition if it previously affected the corresponding host option --host_foo_bar. Previously, an execution transition would reuse the existing hash, which no longer reflected the current state of native options correctly and thus led to action conflicts. Fixes bazelbuild#13464.
We may be arguing for the same thing. I can't remember where I mentioned this, but I'd like to prioritize a refactoring that computes There are some technical complications to realize that but I believe they're manageable. I should update in more detail here. I share your intuition that this would clean up a large class of performance degradations and action conflicts that we experience today. If that works I'd prefer it over #13915, although it's a deeper effort. Re: 5.0 - it may be possible to get a short-term cherrypick in but I don't want to rush something we're not fully comfortable with. I also don't think approach 2 above completely eliminates action conflict risk. There should still be ways, I believe, for native + Starlark transitions to sabotage each others' invariants. I can trace up an example demonstrating. I'm not worried about approach 2 wastefully recomputing work, particularly if it's computed in |
This is slated to be resolved imminently when I'll leave this open until that CL is submitted. |
…nValue As per discussion in b/203470434, transitionDirectoryNameFragment should completely depend on the current values of the (rest of) the BuildOptions class. Thus, it is far better to have this always computed from BuildOptions when building a BuildConfigurationValue than rely on users keeping it consistent. (Other results of BuildConfigurationValue itself are themselves wholly computed from BuildOptions so placement there is a natural fit.) This naturally fixes the exec transition forgetting to update transitionDirectoryNameFragment. This fixes and subsumes #13464 and #13915, respectively. This is related to #14023 PiperOrigin-RevId: 407913175
…nValue As per discussion in b/203470434, transitionDirectoryNameFragment should completely depend on the current values of the (rest of) the BuildOptions class. Thus, it is far better to have this always computed from BuildOptions when building a BuildConfigurationValue than rely on users keeping it consistent. (Other results of BuildConfigurationValue itself are themselves wholly computed from BuildOptions so placement there is a natural fit.) This naturally fixes the exec transition forgetting to update transitionDirectoryNameFragment. This fixes and subsumes bazelbuild#13464 and bazelbuild#13915, respectively. This is related to bazelbuild#14023 PiperOrigin-RevId: 407913175
* Move transitionDirectoryNameFragment calculation to BuildConfigurationValue As per discussion in b/203470434, transitionDirectoryNameFragment should completely depend on the current values of the (rest of) the BuildOptions class. Thus, it is far better to have this always computed from BuildOptions when building a BuildConfigurationValue than rely on users keeping it consistent. (Other results of BuildConfigurationValue itself are themselves wholly computed from BuildOptions so placement there is a natural fit.) This naturally fixes the exec transition forgetting to update transitionDirectoryNameFragment. This fixes and subsumes #13464 and #13915, respectively. This is related to #14023 PiperOrigin-RevId: 407913175 * Properly account for StarlarkOptions at their default (=null) when calculating ST-hash Previously, the hash calculation was skipping including StarlarkOptions that happened to be at their default values. This is wrong since those values may still be in "affected by Starlark transition" (because either the commandline set them and the Starlark transition reset them to their Starlark defaults thus still requiring a hash change OR the commandline did not set them but a series of Starlark transitions did an default->B->default anyways causing the Starlark option to still be 'stuck' in "affected by Starlark transition"). Resolves #14239 PiperOrigin-RevId: 408701552 Co-authored-by: twigg <[email protected]>
Fixed by 711c44e. |
Description of the problem / feature request:
As of 7acf9ea, Bazel generates conflicting actions when a custom Starlark rule depends on a
cc_proto_library
target in theexec
configuration if there is a transition applied to the library.Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
You can also clone https://github.com/fmeum/cc_proto_library_transition_issue.
The command will generate the following error:
What operating system are you running Bazel on?
Ubuntu 20.10
What's the output of
bazel info release
?The issue reproduces with
release 4.0.0
as well as master @ 7caa01f, but not withrelease 3.7.2
.Using
bazel-bisect.sh
, I have identified the breaking commit as 7acf9ea, which enabled exec transitions on proto rules.The text was updated successfully, but these errors were encountered: