--enable_runfiles is not applied correctly without manual cleaning #19333

hofst · 2023-08-25T07:19:46Z

Description of the bug:

--enable_runfiles controls whether bazel will fill the runfiles directory with the specified dependencies (symlinked or copied).
It defaults to off on posix and to off on Windows.
When it is off, tests need to find their dependencies on their own, e.g., by inspecting the environment variables that bazel provides for this case.
Switching this flag, however, currently has a bug: It looks like bazel will do nothing unless the directory is manually cleaned.
E.g., if you previously did a build with runfiles enabled, and then turn it off, your test will run inside a filled runfiles directory. On the other hand, if you started with the flag off and then turned it on, your test will run in an empty runfiles directory. At least that is what I observe on Windows.
This is very problematic since it will cause test failures (very confusing ones).

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a test with runfiles dependencies. Turn the flag off, run the test, turn the flag on, run the test.

Which operating system are you running Bazel on?

Windows

What is the output of `bazel info release`?

6.3.1

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

No response

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

The text was updated successfully, but these errors were encountered:

tjgq · 2023-08-25T09:48:31Z

My suspicion is that this occurs because, even though flipping --enable_runfiles does invalidate the SymlinkTreeAction, only the runfiles output manifest is declared an output of the action, so Skyframe doesn't believe that the entire runfiles directory must be cleared prior to action (re-)execution.

A possible solution is to use a tree artifact to track the entire runfiles directory.

tjgq · 2023-08-25T09:50:22Z

Actually, there's a simpler solution: maybe SymlinkTreeAction should clear the runfiles directory (instead of simply emitting the output manifest) if it's running in --noenable_runfiles mode.

hofst · 2023-08-25T16:30:44Z

@tjgq wow that was a fast turn-around time, thank you very much.

Just to clarify: Both directions are important: when turning runfiles off, the directory should be deleted. But also when turning it on, the directory should be popuplated. The latter direction was also not working in case the runfiles directory pre-existed in an empty state. In your change, I only saw that you tested the "turning runfiles off" direction. The "turn runfiles on" direction would probably also make a good test.

tjgq · 2023-08-25T18:01:55Z

@hofst You're welcome! I happened to be working on something else that touches this part of the codebase, so I knew exactly what the problem was and how to fix it.

I'm happy to add a test for the reverse case, but I couldn't construct a scenario where it doesn't work already. Are you saying you have an example where building first with --noenable_runfiles and then wIth --enable_runfiles results in an empty symlink tree?

hofst · 2023-08-25T20:51:25Z

yes that is correct. I just reproed it again:
First I bazel test mytest with the Windows default parameters. I guess this implies --enable_runfiles=auto which translates to disabled runfiles on Windows. This will create a runfiles directory with only an empty __main__ folder and a manifest file inside of it.
Then I run bazel test mytest but with build --enable_runfiles added to my rcfiles.
This will run the tests again. Before running, it will recompile most things (which are consumed from my disk cache). The runfiles directory will not be filled with the test dependencies. If I delete the runfiles directory manually (or run bazel clean, it will popuplate the runfiles directory correctly.

Not sure if this is a caching issue, a Windows issue, or something else.

tjgq · 2023-08-28T09:59:10Z

@hofst, can I trouble you for a concrete repro case and the Bazel version you're using?

Here's how I attempted to repro it on Windows with Bazel 6.3.2:

sh_test(
  name = "test",
  srcs = ["test.sh"], # empty file
  data = ["data.txt"], # empty file
)

bazel test --noenable_runfiles --disk_cache=.disk_cache //:test

ls bazel-bin/test.exe.runfiles/__main__  # runfiles not present

bazel test --enable_runfiles --disk_cache=.disk_cache //:test

ls bazel-bin/test.exe.runfiles/__main__  # runfiles present

hofst · 2023-08-28T15:53:13Z

@tjgq
I'll try, but it may take some time to strip our quite complex test cases down to a minimal repro.

hofst · 2023-08-30T09:54:38Z

@tjgq
I was able to reproduce it inside our build system with your minimal test.
I listed all the additional flags that we set and played around with them.
It looks like we set the additional flag --nobuild_runfile_links which causes this behavior.
From reading the documentation I believe this is still a bug: I believe this flag should only prevent runfiles to be created during the build phase. But when we actually run test the runfiles should be created and updated. This appears to be broken: The runfiles appear to be created but they are not updated when the directory already exists and the enable_runfiles flag changes.

tjgq · 2023-08-31T10:49:29Z

Thanks, --nobuild_runfile_links was the missing bit. I'll also note that Windows and the disk cache are irrelevant; it repros just fine on Linux/MacOS without a disk cache. And, most importantly, it's not necessary to build with --noenable_runfiles first; running bazel clean && bazel test --enable_runfiles --nobuild_runfile_links //:test produces the same result. So this is really about what --nobuild_runfile_links is supposed to do, not about incremental correctness when flipping --enable_runfiles back and forth.

The reason you're seeing this behavior is that --nobuild_runfile_links is subtler than the documentation implies: the runfile symlinks will indeed be created just-in-time for actions that run locally, but they're created in the environment where the action runs - which, with sandboxing enabled, will be the sandbox, not the output tree. (You can convince yourself that the test is still able to access the runfiles by using[[ -f $RUNFILES_DIR/__main__/data.txt ]] as the test.sh; also observe that invoking Bazel with --spawn_strategy=standalone, which disables sandboxing, will cause the runfile symlinks to actually appear in the output tree.)

hofst · 2023-08-31T10:53:39Z

Hmm, if I understand you correctly you are saying that the --spawn_strategy=standalone would cause the runfiles symlinks to actually appear (i.e., my bug should not reproduce without sandboxing). However, on Windows, sandboxing is not supported anyways, but I still see the problem.

tjgq · 2023-08-31T11:26:30Z

Sorry, I should have actually tested on Windows before drawing conclusions from my experiments on Linux/MacOS. You're completely correct, the issue is definitely there on Windows, and it only occurs when incrementally toggling runfiles on (i.e., bazel clean && bazel test --enable_runfiles --nobuild_runfile_links //:test produces the runfiles, but bazel clean && bazel test --noenable_runfiles --nobuild_runfile_links //:test && bazel test --enable_runfiles --nobuild_runfile_links //:test does not.

tjgq · 2023-08-31T12:21:41Z

I suspect it has to do this code that decides whether to create runfiles just-in-time before action execution. It's checking whether the output manifest is a symlink, but on Windows we implement symlinks as a file copy by default. Setting the --windows_enable_symlinks startup option (which AIUI requires admin rights) fixes the bug, as does skipping the check entirely (i.e., unconditionally recreating the runfiles). But I don't fully understand the consequences of the latter; I worry that it might cause performance degradation in the case where the runfiles don't need to be recreated.

@hofst is --windows_enable_symlinks a viable workaround for you?

hofst · 2023-08-31T12:31:07Z

Unfortunately, --windows_enable_symlinks is already enabled in our build system but the bug still reproes for me.

Fixes two separate but related issues: * On all OSes, with `--nobuild_runfile_links`, flipping `--enable_runfiles` from on to off did not result in runfiles being cleared due to a missing call to `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to `SymlinkTreeStrategy` in f84329e. * On Windows only, with `--nobuild_runfile_links`, flipping `--enable_runfiles` from off to on did not result in runfiles being created since the logic in `RunfilesTreeUpdater` would see a copy of the manifest instead of a symlink. Work towards bazelbuild#20676 Fixes bazelbuild#19333

hofst added type: bug untriaged labels Aug 25, 2023

hofst assigned iancha1992, Pavank1992 and sgowroji Aug 25, 2023

Pavank1992 added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Aug 25, 2023

Pavank1992 unassigned sgowroji, Pavank1992 and iancha1992 Aug 25, 2023

copybara-service bot closed this as completed in f84329e Aug 25, 2023

tjgq reopened this Aug 25, 2023

tjgq mentioned this issue Sep 7, 2023

Tracking issue for "Remote Builds without the Bytes" #6862

Closed

19 tasks

fmeum mentioned this issue Dec 29, 2023

Make runfiles incrementally correct with --nobuild_runfile_links #20695

Closed

copybara-service bot closed this as completed in 84d1a72 Jan 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

--enable_runfiles is not applied correctly without manual cleaning #19333

--enable_runfiles is not applied correctly without manual cleaning #19333

hofst commented Aug 25, 2023

tjgq commented Aug 25, 2023

tjgq commented Aug 25, 2023

hofst commented Aug 25, 2023

tjgq commented Aug 25, 2023 •

edited

Loading

hofst commented Aug 25, 2023 •

edited

Loading

tjgq commented Aug 28, 2023

hofst commented Aug 28, 2023

hofst commented Aug 30, 2023

tjgq commented Aug 31, 2023

hofst commented Aug 31, 2023

tjgq commented Aug 31, 2023

tjgq commented Aug 31, 2023

hofst commented Aug 31, 2023

--enable_runfiles is not applied correctly without manual cleaning #19333

--enable_runfiles is not applied correctly without manual cleaning #19333

Comments

hofst commented Aug 25, 2023

Description of the bug:

Which category does this issue belong to?

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Which operating system are you running Bazel on?

What is the output of bazel info release?

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

tjgq commented Aug 25, 2023

tjgq commented Aug 25, 2023

hofst commented Aug 25, 2023

tjgq commented Aug 25, 2023 • edited Loading

hofst commented Aug 25, 2023 • edited Loading

tjgq commented Aug 28, 2023

hofst commented Aug 28, 2023

hofst commented Aug 30, 2023

tjgq commented Aug 31, 2023

hofst commented Aug 31, 2023

tjgq commented Aug 31, 2023

tjgq commented Aug 31, 2023

hofst commented Aug 31, 2023

What is the output of `bazel info release`?

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

tjgq commented Aug 25, 2023 •

edited

Loading

hofst commented Aug 25, 2023 •

edited

Loading