Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--enable_runfiles is not applied correctly without manual cleaning #19333

Closed
hofst opened this issue Aug 25, 2023 · 13 comments
Closed

--enable_runfiles is not applied correctly without manual cleaning #19333

hofst opened this issue Aug 25, 2023 · 13 comments
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: bug untriaged

Comments

@hofst
Copy link

hofst commented Aug 25, 2023

Description of the bug:

--enable_runfiles controls whether bazel will fill the runfiles directory with the specified dependencies (symlinked or copied).
It defaults to off on posix and to off on Windows.
When it is off, tests need to find their dependencies on their own, e.g., by inspecting the environment variables that bazel provides for this case.
Switching this flag, however, currently has a bug: It looks like bazel will do nothing unless the directory is manually cleaned.
E.g., if you previously did a build with runfiles enabled, and then turn it off, your test will run inside a filled runfiles directory. On the other hand, if you started with the flag off and then turned it on, your test will run in an empty runfiles directory. At least that is what I observe on Windows.
This is very problematic since it will cause test failures (very confusing ones).

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a test with runfiles dependencies. Turn the flag off, run the test, turn the flag on, run the test.

Which operating system are you running Bazel on?

Windows

What is the output of bazel info release?

6.3.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@Pavank1992 Pavank1992 added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Aug 25, 2023
@tjgq
Copy link
Contributor

tjgq commented Aug 25, 2023

My suspicion is that this occurs because, even though flipping --enable_runfiles does invalidate the SymlinkTreeAction, only the runfiles output manifest is declared an output of the action, so Skyframe doesn't believe that the entire runfiles directory must be cleared prior to action (re-)execution.

A possible solution is to use a tree artifact to track the entire runfiles directory.

@tjgq
Copy link
Contributor

tjgq commented Aug 25, 2023

Actually, there's a simpler solution: maybe SymlinkTreeAction should clear the runfiles directory (instead of simply emitting the output manifest) if it's running in --noenable_runfiles mode.

@hofst
Copy link
Author

hofst commented Aug 25, 2023

@tjgq wow that was a fast turn-around time, thank you very much.

Just to clarify: Both directions are important: when turning runfiles off, the directory should be deleted. But also when turning it on, the directory should be popuplated. The latter direction was also not working in case the runfiles directory pre-existed in an empty state. In your change, I only saw that you tested the "turning runfiles off" direction. The "turn runfiles on" direction would probably also make a good test.

@tjgq
Copy link
Contributor

tjgq commented Aug 25, 2023

@hofst You're welcome! I happened to be working on something else that touches this part of the codebase, so I knew exactly what the problem was and how to fix it.

I'm happy to add a test for the reverse case, but I couldn't construct a scenario where it doesn't work already. Are you saying you have an example where building first with --noenable_runfiles and then wIth --enable_runfiles results in an empty symlink tree?

@tjgq tjgq reopened this Aug 25, 2023
@hofst
Copy link
Author

hofst commented Aug 25, 2023

yes that is correct. I just reproed it again:
First I bazel test mytest with the Windows default parameters. I guess this implies --enable_runfiles=auto which translates to disabled runfiles on Windows. This will create a runfiles directory with only an empty __main__ folder and a manifest file inside of it.
Then I run bazel test mytest but with build --enable_runfiles added to my rcfiles.
This will run the tests again. Before running, it will recompile most things (which are consumed from my disk cache). The runfiles directory will not be filled with the test dependencies. If I delete the runfiles directory manually (or run bazel clean, it will popuplate the runfiles directory correctly.

Not sure if this is a caching issue, a Windows issue, or something else.

@tjgq
Copy link
Contributor

tjgq commented Aug 28, 2023

@hofst, can I trouble you for a concrete repro case and the Bazel version you're using?

Here's how I attempted to repro it on Windows with Bazel 6.3.2:

sh_test(
  name = "test",
  srcs = ["test.sh"], # empty file
  data = ["data.txt"], # empty file
)

bazel test --noenable_runfiles --disk_cache=.disk_cache //:test

ls bazel-bin/test.exe.runfiles/__main__  # runfiles not present

bazel test --enable_runfiles --disk_cache=.disk_cache //:test

ls bazel-bin/test.exe.runfiles/__main__  # runfiles present

@hofst
Copy link
Author

hofst commented Aug 28, 2023

@tjgq
I'll try, but it may take some time to strip our quite complex test cases down to a minimal repro.

@hofst
Copy link
Author

hofst commented Aug 30, 2023

@tjgq
I was able to reproduce it inside our build system with your minimal test.
I listed all the additional flags that we set and played around with them.
It looks like we set the additional flag --nobuild_runfile_links which causes this behavior.
From reading the documentation I believe this is still a bug: I believe this flag should only prevent runfiles to be created during the build phase. But when we actually run test the runfiles should be created and updated. This appears to be broken: The runfiles appear to be created but they are not updated when the directory already exists and the enable_runfiles flag changes.

@tjgq
Copy link
Contributor

tjgq commented Aug 31, 2023

Thanks, --nobuild_runfile_links was the missing bit. I'll also note that Windows and the disk cache are irrelevant; it repros just fine on Linux/MacOS without a disk cache. And, most importantly, it's not necessary to build with --noenable_runfiles first; running bazel clean && bazel test --enable_runfiles --nobuild_runfile_links //:test produces the same result. So this is really about what --nobuild_runfile_links is supposed to do, not about incremental correctness when flipping --enable_runfiles back and forth.

The reason you're seeing this behavior is that --nobuild_runfile_links is subtler than the documentation implies: the runfile symlinks will indeed be created just-in-time for actions that run locally, but they're created in the environment where the action runs - which, with sandboxing enabled, will be the sandbox, not the output tree. (You can convince yourself that the test is still able to access the runfiles by using[[ -f $RUNFILES_DIR/__main__/data.txt ]] as the test.sh; also observe that invoking Bazel with --spawn_strategy=standalone, which disables sandboxing, will cause the runfile symlinks to actually appear in the output tree.)

@hofst
Copy link
Author

hofst commented Aug 31, 2023

Hmm, if I understand you correctly you are saying that the --spawn_strategy=standalone would cause the runfiles symlinks to actually appear (i.e., my bug should not reproduce without sandboxing). However, on Windows, sandboxing is not supported anyways, but I still see the problem.

@tjgq
Copy link
Contributor

tjgq commented Aug 31, 2023

Sorry, I should have actually tested on Windows before drawing conclusions from my experiments on Linux/MacOS. You're completely correct, the issue is definitely there on Windows, and it only occurs when incrementally toggling runfiles on (i.e., bazel clean && bazel test --enable_runfiles --nobuild_runfile_links //:test produces the runfiles, but bazel clean && bazel test --noenable_runfiles --nobuild_runfile_links //:test && bazel test --enable_runfiles --nobuild_runfile_links //:test does not.

@tjgq
Copy link
Contributor

tjgq commented Aug 31, 2023

I suspect it has to do this code that decides whether to create runfiles just-in-time before action execution. It's checking whether the output manifest is a symlink, but on Windows we implement symlinks as a file copy by default. Setting the --windows_enable_symlinks startup option (which AIUI requires admin rights) fixes the bug, as does skipping the check entirely (i.e., unconditionally recreating the runfiles). But I don't fully understand the consequences of the latter; I worry that it might cause performance degradation in the case where the runfiles don't need to be recreated.

@hofst is --windows_enable_symlinks a viable workaround for you?

@hofst
Copy link
Author

hofst commented Aug 31, 2023

Unfortunately, --windows_enable_symlinks is already enabled in our build system but the bug still reproes for me.

fmeum added a commit to fmeum/bazel that referenced this issue Dec 29, 2023
Fixes two separate but related issues:
* On all OSes, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from on to off did not result in runfiles
  being cleared due to a missing call to
  `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to
  `SymlinkTreeStrategy` in f84329e.
* On Windows only, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from off to on did not result in runfiles
  being created since the logic in `RunfilesTreeUpdater` would see a
  copy of the manifest instead of a symlink.

Work towards bazelbuild#20676
Fixes bazelbuild#19333
fmeum added a commit to fmeum/bazel that referenced this issue Dec 29, 2023
Fixes two separate but related issues:
* On all OSes, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from on to off did not result in runfiles
  being cleared due to a missing call to
  `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to
  `SymlinkTreeStrategy` in f84329e.
* On Windows only, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from off to on did not result in runfiles
  being created since the logic in `RunfilesTreeUpdater` would see a
  copy of the manifest instead of a symlink.

Work towards bazelbuild#20676
Fixes bazelbuild#19333
fmeum added a commit to fmeum/bazel that referenced this issue Dec 29, 2023
Fixes two separate but related issues:
* On all OSes, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from on to off did not result in runfiles
  being cleared due to a missing call to
  `SymlinkTreeHelper#clearRunfilesDirectory` that was only added to
  `SymlinkTreeStrategy` in f84329e.
* On Windows only, with `--nobuild_runfile_links`, flipping
  `--enable_runfiles` from off to on did not result in runfiles
  being created since the logic in `RunfilesTreeUpdater` would see a
  copy of the manifest instead of a symlink.

Work towards bazelbuild#20676
Fixes bazelbuild#19333
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: bug untriaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants