Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flag flip for --experimental_sibling_repository_layout and --legacy_external_runfiles #12821

Closed
lberki opened this issue Jan 13, 2021 · 21 comments
Assignees
Labels
incompatible-change Incompatible/breaking change P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: process

Comments

@lberki
Copy link
Contributor

lberki commented Jan 13, 2021

We are changing the way external repositories are handled: they will be a sibling of the main repository instead living in the external/ directory under it, both in the execroot and in runfiles trees.

That is, files in them will be accessible at ../$REPO instead of external/$REPO.

This is a tracking bug for the imminent flag flip and a place to list tips for the migration and things that broke.

@lberki lberki added P2 We'll consider working on this in future. (Assignee optional) incompatible-change Incompatible/breaking change team-Core Skyframe, bazel query, BEP, options parsing, bazelrc labels Jan 13, 2021
@TLATER
Copy link

TLATER commented Mar 31, 2022

This was merged in 0080572 (and is present in 5.0.0 onwards, but missing in the release notes)

Nevermind, that's different. I guess the behavior just changed to the above for tests only, but without a feature flag in the first place? I'm not sure, trying to find out why that change happened, why it breaks our tests and what expected migration looks like.

@TLATER
Copy link

TLATER commented Apr 1, 2022

I've dug a bit more into this, and I'm convinced that is a regression by now.

In a nutshell, we're building this test from the aos repository, and we're seeing this test error:

[2022-03-31T10:42:05Z] (cd /mnt/buildkite-agent/builds/buildkite-prd-g-i-00afc7d8bffa69495-1/k8_output_base/execroot/shasta && \
[2022-03-31T10:42:05Z]   exec env - \
[2022-03-31T10:42:05Z]     EXPERIMENTAL_SPLIT_XML_GENERATION=1 \
[2022-03-31T10:42:05Z]     JAVA_RUNFILES=bazel-out/k8-dbg/bin/external/aos/aos/util/scoped_pipe_test.runfiles \
[2022-03-31T10:42:05Z]     PATH=/bin:/usr/bin:/usr/local/bin \
[2022-03-31T10:42:05Z]     PYTHON_RUNFILES=bazel-out/k8-dbg/bin/external/aos/aos/util/scoped_pipe_test.runfiles \
[2022-03-31T10:42:05Z]     RUNFILES_DIR=bazel-out/k8-dbg/bin/external/aos/aos/util/scoped_pipe_test.runfiles \
[2022-03-31T10:42:05Z]     RUN_UNDER_RUNFILES=1 \
[2022-03-31T10:42:05Z]     TEST_BINARY=../aos/aos/util/scoped_pipe_test \
[2022-03-31T10:42:05Z]     TEST_INFRASTRUCTURE_FAILURE_FILE=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.infrastructure_failure \
[2022-03-31T10:42:05Z]     TEST_LOGSPLITTER_OUTPUT_FILE=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.raw_splitlogs/test.splitlogs \
[2022-03-31T10:42:05Z]     TEST_NAME=@aos//aos/util:scoped_pipe_test \
[2022-03-31T10:42:05Z]     TEST_PREMATURE_EXIT_FILE=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.exited_prematurely \
[2022-03-31T10:42:05Z]     TEST_SHARD_INDEX=0 \
[2022-03-31T10:42:05Z]     TEST_SIZE=medium \
[2022-03-31T10:42:05Z]     TEST_SRCDIR=bazel-out/k8-dbg/bin/external/aos/aos/util/scoped_pipe_test.runfiles \
[2022-03-31T10:42:05Z]     TEST_TARGET=@aos//aos/util:scoped_pipe_test \
[2022-03-31T10:42:05Z]     TEST_TIMEOUT=300 \
[2022-03-31T10:42:05Z]     TEST_TMPDIR=_tmp/69b36daf54d68eeb8c4fe94db8b8ac98 \
[2022-03-31T10:42:05Z]     TEST_TOTAL_SHARDS=0 \
[2022-03-31T10:42:05Z]     TEST_UNDECLARED_OUTPUTS_ANNOTATIONS=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.outputs_manifest/ANNOTATIONS \
[2022-03-31T10:42:05Z]     TEST_UNDECLARED_OUTPUTS_ANNOTATIONS_DIR=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.outputs_manifest \
[2022-03-31T10:42:05Z]     TEST_UNDECLARED_OUTPUTS_DIR=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.outputs \
[2022-03-31T10:42:05Z]     TEST_UNDECLARED_OUTPUTS_MANIFEST=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.outputs_manifest/MANIFEST \
[2022-03-31T10:42:05Z]     TEST_UNDECLARED_OUTPUTS_ZIP=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.outputs/outputs.zip \
[2022-03-31T10:42:05Z]     TEST_UNUSED_RUNFILES_LOG_FILE=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.unused_runfiles_log \
[2022-03-31T10:42:05Z]     TEST_WARNINGS_OUTPUT_FILE=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.warnings \
[2022-03-31T10:42:05Z]     TEST_WORKSPACE=shasta \
[2022-03-31T10:42:05Z]     TZ=UTC \
[2022-03-31T10:42:05Z]     XML_OUTPUT_FILE=bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.xml \
[2022-03-31T10:42:05Z]   external/bazel_tools/tools/test/generate-xml.sh bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.log bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.xml 0 127)
[2022-03-31T10:42:05Z] # Configuration: f450dad876624742a3daaa97dcb304a242804d936865dd06302275a46c80bfe8
[2022-03-31T10:42:05Z] # Execution platform: @aos//tools/platforms:linux_x86
[2022-03-31T10:42:05Z] �[32m[126 / 127]�[0m Testing @aos//aos/util:scoped_pipe_test; 0s remote
[2022-03-31T10:42:05Z] 
�[1A�[K�[32m[126 / 127]�[0m Testing @aos//aos/util:scoped_pipe_test; 0s remote
[2022-03-31T10:42:06Z] 
�[1A�[K�[31m�[1mFAIL: �[0m@aos//aos/util:scoped_pipe_test (see /mnt/buildkite-agent/builds/buildkite-prd-g-i-00afc7d8bffa69495-1/k8_output_base/execroot/shasta/bazel-out/k8-dbg/testlogs/external/aos/aos/util/scoped_pipe_test/test.log)
[2022-03-31T10:42:06Z] �[32m[126 / 127]�[0m Testing @aos//aos/util:scoped_pipe_test; 0s remote
[2022-03-31T10:42:06Z] 
�[1A�[K�[32mINFO: �[0mFrom Testing @aos//aos/util:scoped_pipe_test:
[2022-03-31T10:42:06Z] ==================== Test output for @aos//aos/util:scoped_pipe_test:
[2022-03-31T10:42:06Z] /var/lib/engflow/worker/work/3/exec/bazel-out/k8-dbg/bin/external/aos/aos/util/scoped_pipe_test.runfiles/shasta/../aos/aos/util/scoped_pipe_test: error while loading shared libraries: libexternal_Saos_Saos_Sutil_Slibscoped_Upipe.so: cannot open shared object file: No such file or directory
[2022-03-31T10:42:06Z] ================================================================================

The shared library files are in the runfiles, but not found because the test is executed from a path where it can't see those libraries anymore.

We're using upstream rules_cc, no changes to LD variables or anything else strange - I believe this should be reproducible by anyone using the aos repository as an external dependency. I guess nobody besides us is testing external tests?

This seems like a regression, and our only recourse was reverting 0080572 - is there any reason this wasn't behind a feature flag, or noted in release notes or something?

@Wyverald
Copy link
Member

Wyverald commented May 3, 2022

@AustinSchuh @TLATER I've reported the breakage to the author of the original commit.

@Wyverald
Copy link
Member

Wyverald commented May 3, 2022

Their response:

it seems like a similar case in rules_rust was fixed easily by adding "linkstatic = 1" to their test.
https://github.com/bazelbuild/rules_rust/pull/763/commits/5b6e0059b6fcf7705b83218652983e1ff04d0cf0

Could you try the same trick?

@AustinSchuh
Copy link
Contributor

AustinSchuh commented May 3, 2022

Their response:

it seems like a similar case in rules_rust was fixed easily by adding "linkstatic = 1" to their test.
https://github.com/bazelbuild/rules_rust/pull/763/commits/5b6e0059b6fcf7705b83218652983e1ff04d0cf0

Could you try the same trick?

I tried linkstatic = 1 and while it works, I literally need to modify every cc_test in all external repositories that we run. That's going to mean touching grpc, flatbuffers, protobuf and beyond. I don't want to have to maintain patched versions of those, so I'd have to get all their tests updated too. I don't think that's going to work.

This suggests to me that the quoted patch is incompatible with the way the C++ rules setup the shared library path.

@Wyverald
Copy link
Member

@oquenchil Do you have any comments about how C++ rules set up the shared library path? (see above)

@lberki @jin The commit in question was authored by someone on the Roboleaf team (IIRC) who has since left the team. Do you have any comments on the breakage?

@jin
Copy link
Member

jin commented May 25, 2022

Is this an error that triggers without the flags associated to this issue? (or one, or both?)

@AustinSchuh
Copy link
Contributor

Correct, this triggers without the flags.

@AustinSchuh
Copy link
Contributor

@jin , what's the correct course of action here? I can send a revert out. We missed 5.2.

@AustinSchuh
Copy link
Contributor

Ping, @jin , @lberki , what's the best course of action? I can send a revert out. 5.2 and 5.1 are broken, and this looks to be the culprit.

@AustinSchuh
Copy link
Contributor

FYI, @TLATER , #16008 appears to fix the failed test for me! Very exciting.

@thirtyseven
Copy link
Contributor

thirtyseven commented Dec 16, 2022

The $(JAVABASE) make variable as documented by https://bazel.build/reference/be/make-variables#custom_variables appears to be broken with --nolegacy_external_runfiles.

Minimal reproducible BUILD file with empty WORKSPACE:

genrule(
  name = "foo",
  outs = ["javabase.sh"],
  toolchains = ["@bazel_tools//tools/jdk:current_java_runtime"],
  cmd = "echo $(JAVABASE)/bin/jar > $@"
)

sh_binary(
  name = "javabase",
  srcs = ["javabase.sh"],
  data = ["@bazel_tools//tools/jdk:current_java_runtime"],
)

Running on 6.0.0rc5 with default flags:

% bazel run :javabase
...
INFO: Running command line: bazel-bin/javabase
Usage: jar [OPTION...] [ [--release VERSION] [-C dir] files] ...
Try `jar --help' for more information.

With --nolegacy_external_runfiles:

% bazel run :javabase --nolegacy_external_runfiles
...
INFO: Running command line: bazel-bin/javabase
/private/var/tmp/_bazel_tkaplan/01714b17dfc74692be6b1cae94835333/execroot/__main__/bazel-out/darwin_arm64-fastbuild/bin/javabase: line 1: external/local_jdk/bin/jar: No such file or directory

@martis42
Copy link

Is there any chance --legacy_external_runfiles can be flipped in Bazel 8.0.0.
Duplication all symlinks for external dependencies in the runfiles tree is wasting resources throughout many Bazel projects.
This is especially annoying when working with the hermetic Python toolchains of rules_python, see bazelbuild/rules_python#1653 and https://github.com/martis42/show_hermetic_python_overhead.

@lberki
Copy link
Contributor Author

lberki commented Dec 21, 2023

Update: it looks like there is a general feeling that this flag should be flipped and also a general feeling that no one wants to invest into doing so.

If that remains the case, there is no point in pretending that we'll ever flip the flag since the cost of flipping will not be lower in the future so the calculus will not shift towards flipping it, unless something really important comes along that warrants that investment.

Given that this is unlikely, let's do this: we wait until Bazel 7.1 arrives, then delete --experimental_sibling_repository_layout without flipping it. It's sad, but it's better than keeping it on life support. If we ever need its code again, we can resurrect it from source control.

@martis42
Copy link

@lberki to my understanding --experimental_sibling_repository_layout and --legacy_external_runfiles are related as they both change how external dependencies are managed. However, they can be discussed independently from each other as --legacy_external_runfiles can be flipped no matter what happens to --experimental_sibling_repository_layout. Is this understanding correct?

While it is sad that --experimental_sibling_repository_layout will be dropped, I have no opinion here.
What I care about is --legacy_external_runfiles. There we are now for years in a migration phase which wastes resources for all projects with external dependencies. Symlinks are cheap, but they are not for free.

No matter what happens to --experimental_sibling_repository_layout we should flip --legacy_external_runfiles in Bazel 8.0.0. At least we then have a state where we no longer wastefully put each external dependency twice into the runfiles tree. Also flipping the flag will raise awareness on this issue. If projects rely on the legacy location they can easily set the current value of the flag, no harm done.

fmeum added a commit to fmeum/bazel that referenced this issue Dec 27, 2023
Prepares for the flip (bazelbuild#12821)
by ensuring that Bazel itself doesn't regress.
fmeum added a commit to fmeum/bazel that referenced this issue Dec 27, 2023
Prepares for the flip (bazelbuild#12821)
by ensuring that Bazel itself doesn't regress.
fmeum added a commit to fmeum/bazel that referenced this issue Dec 27, 2023
Prepares for the flip (bazelbuild#12821)
by ensuring that Bazel itself doesn't regress.
@fmeum
Copy link
Collaborator

fmeum commented Dec 27, 2023

I also support the --legacy_external_runfiles flip. I have submitted a PR to flip it for Bazel tests, which caught an actual bug and prevents further regressions: #20680.

I can start a downstream run when the mac runner situation has improved. I have fixed issues in rules_go and will fix them in gazelle soon.

fmeum added a commit to fmeum/bazel that referenced this issue Dec 29, 2023
Prepares for the flip (bazelbuild#12821)
by ensuring that Bazel itself doesn't regress.
@lberki
Copy link
Contributor Author

lberki commented Jan 2, 2024

Apparently, --legacy_external_runfiles is the silver lining of the cloud that is the fact that --experimental_sibling_repository_layout will apparently be left unflipped. I thought that we can't get away with flipping one but not the other, but #20680 is feasible, at least it's not a complete loss.

I am somewhat worried about the fact that if we do this, the paths source artifacts are accessible under will be inconsistent between runfiles trees and action inputs, but I guess that ship has already sailed with bzlmod.

@Wyverald @meteorcloudy @fmeum @martis42 WDYT?

@martis42
Copy link

martis42 commented Jan 2, 2024

My perspective is we have those options:

  1. Ignore the problem and do nothing
  2. Revert the whole progress until now. In other words remove both --experimental_sibling_repository_layout and --legacy_external_runfiles. However, this also requires a new breaking change flag which disables the "put each external file into 2 places" behavior and makes sure only the legacy location below /external is available.
  3. Accept the inconsistency and flip --legacy_external_runfiles. Open discussion point would be if we give completely up on --experimental_sibling_repository_layout or keep it around another year and see if the inconsistency increases the demand to work on this.

My favorite is option 3) and keeping --experimental_sibling_repository_layout around a bit longer.
Option 1) is undesired from a user perspective as it means continuing to wastefully create superfluous symlinks.
Option 2) is as well a breaking change to the runfiles tree layout. If we have to invest work either way, we should invest into the presumably better option as presented in https://github.com/bazelbuild/bazel/wiki/Updating-the-runfiles-tree-structure.

I have no good answer for the inconsistency issue. All I can say here is that to me it seems acceptable as action input and runfile tree are either way different worlds.
And who knows, maybe the inconsistency triggers someones OCD and --experimental_sibling_repository_layout will even be worked on by someone eventually.

@lberki
Copy link
Contributor Author

lberki commented Jan 2, 2024

Agreed; @Wyverald and @meteorcloudy might know something that makes (3) infeasible, though.

fmeum added a commit to fmeum/bazel that referenced this issue Jan 2, 2024
Prepares for the flip (bazelbuild#12821)
by ensuring that Bazel itself doesn't regress.
fmeum added a commit to fmeum/bazel that referenced this issue Jan 25, 2024
Prepares for the flip (bazelbuild#12821)
by ensuring that Bazel itself doesn't regress.
copybara-service bot pushed a commit that referenced this issue Jan 26, 2024
This requires fixing an actual bug that causes the Windows Java launcher to not find the Java runtime with `--nolegacy_external_runfiles`.

Work towards #12821

Closes #20680.

PiperOrigin-RevId: 601826685
Change-Id: Ib45e60901d47efc06bf875c334edafcaeabc5f1e
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 26, 2024
This requires fixing an actual bug that causes the Windows Java launcher to not find the Java runtime with `--nolegacy_external_runfiles`.

Work towards bazelbuild#12821

Closes bazelbuild#20680.

PiperOrigin-RevId: 601826685
Change-Id: Ib45e60901d47efc06bf875c334edafcaeabc5f1e
github-merge-queue bot pushed a commit that referenced this issue Jan 27, 2024
This requires fixing an actual bug that causes the Windows Java launcher
to not find the Java runtime with `--nolegacy_external_runfiles`.

Work towards #12821

Closes #20680.

Commit
3d4491c

PiperOrigin-RevId: 601826685
Change-Id: Ib45e60901d47efc06bf875c334edafcaeabc5f1e

Co-authored-by: Fabian Meumertzheim <[email protected]>
@lberki
Copy link
Contributor Author

lberki commented Sep 10, 2024

Closing this change in favor of #23574 and #23576 , which track the flipping of these two flags separately.

@lberki lberki closed this as completed Sep 10, 2024
@meteorcloudy meteorcloudy closed this as not planned Won't fix, can't repro, duplicate, stale Sep 10, 2024
@meteorcloudy
Copy link
Member

@lberki Thanks for filing the new issues, I'll follow up to flip them before Bazel 8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
incompatible-change Incompatible/breaking change P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: process
Projects
None yet
Development

No branches or pull requests