Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel: PATH differs between ctx.actions.run and ctx.actions.run_shell #94222

Open
aherrmann opened this issue Jul 30, 2020 · 10 comments
Open

Bazel: PATH differs between ctx.actions.run and ctx.actions.run_shell #94222

aherrmann opened this issue Jul 30, 2020 · 10 comments
Labels
0.kind: bug Something is broken

Comments

@aherrmann
Copy link
Member

Describe the bug
The environment, $PATH specifically, differs between build actions that use ctx.action.run and those that use ctx.action.run_shell when Bazel is provided by nixpkgs. Using run_shell $PATH will point to tools in the Nix store, using run instead will point to tools on the host system. This is particularly an issue for MacOS users where coreutils provided by Nix are GNU coreutils, while system coreutils are BSD flavored. E.g. readlink -f works with Nix coreutils, but fails with stock MacOS coreutils.

To Reproduce
Steps to reproduce the behavior:

  1. Checkout aherrmann/nixpkgs-bazel-run-env@077e2bd
  2. Build targets calling a shell script using ctx.actions.run and ctx.actions.run_shell respectively.
    $ nix-shell --pure --run 'bazel build //:all'
    
  3. Observe the difference in outcome.
    $ cat bazel-bin/using-run.txt
    /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    readlink is /usr/bin/readlink
    $ cat bazel-bin/using-run-shell.txt
    /nix/store/58in3hsawf6g1adx99sphi460nzmaqxq-bash-4.4-p23/bin:/nix/store/vssdbs9s059qm5rqpw5q25z3c2d065f7-coreutils-8.31/bin:/nix/store/i13xf7dasqs66ylbnkbprg742a5izh8r-findutils-4.7.0/bin:/nix/store/nw699blhmjls6j1xbb938a1rjq6bw3jf-gawk-5.1.0/bin:/nix/store/dh5d3cpxwk5aixgphx0b61rxq9bfdmlr-gnugrep-3.4/bin:/nix/store/4mwsggbn3gijwixc09jwzh8l42h15n4m-gnutar-1.32/bin:/nix/store/zx7fxdqymhhgrqv6gfynsqzp7dmgn2hs-gnused-4.8/bin:/nix/store/bryws27lqm1j7vlssrd9xzqk9pf5rvlz-gzip-1.10/bin:/nix/store/vb47vacr597kabig0gvaqq2lk2wjfxnw-which-2.21/bin:/nix/store/jvbw3n6v3mp4yx33h7fa2x1zdak40g2a-unzip-6.0/bin:/nix/store/b081hh3zm012vqnwr3srgkvsbzmngkzf-file-5.39/bin:/nix/store/4ipxmrbv36fnaw6i5nfv2ccqdb4c94c2-zip-3.0/bin
    readlink is /nix/store/vssdbs9s059qm5rqpw5q25z3c2d065f7-coreutils-8.31/bin/readlink
    

Expected behavior
I would expect the environment to be consistent between both flavors of actions ctx.actions.run and ctx.actions.run_shell. I would expect $PATH to point to tools in the Nix store in both cases.

Notify maintainers

@Profpatsch @mboes

Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 5.3.0-64-generic, Ubuntu, 19.10 (Eoan Ermine)`
 - multi-user?: `no`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.2`
 - channels(aj): `"nixpkgs-20.09pre214374.1fe82110feb"`
 - nixpkgs: `/home/aj/.nix-defexpr/channels/nixpkgs`

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
@stale
Copy link

stale bot commented Jan 26, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 26, 2021
@ylecornec
Copy link
Contributor

On NixOS, this problem is even more important.

If we follow the steps to reproduce the bug, the build of the //:using-run target fails with:

bazel-out/host/bin/script: line 3: type: readlink: not found

whereas the build of the //:using-run-shell target works fine.

What seems to be happening

Using the bazel -s option, we can see that in the case that uses actions.run, the script is invoked with an empty environment as such:

  exec env - \
  bazel-out/host/bin/script bazel-out/k8-fastbuild/bin/using-run.txt

Since the script starts with #!/usr/bin/env bash and the PATH variable is not defined, env looks for bash in the system default directories returned by confstr(_CS_PATH). (We can observe this value with the getconf PATH command which returns /run/current-system/sw/bin:/bin:/usr/bin on NixOS).

Since PATH is still not set, the system bash will use its hard coded default, which is /no-such-path on NixOS and readlink is not found.

What to do ?

In the actions.run_shell case, it seems that the customBash binary is explicitly called, which is a wrapper that adds nix specific default folders to PATH and ensures it is set before calling bash.

Should we do something similar in the action.run case, i.e. find a way to modify the call to env so that PATH is always defined and contains these default folders ?

If so do you have some opinions/pointers about the best way to go about it ?

Metadata

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.10.57, NixOS, 21.11 (Porcupine)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.15`
 - channels(root): `"nixos-21.11pre309670.253aecf69ed"`
 - channels(stan): `""`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Aug 31, 2021
@gergelyfabian
Copy link

gergelyfabian commented Nov 17, 2021

Is there any workaround available for this issue? E.g. what about built in rules, like sh_binary?
I also think, that most of the rule definitions for most plugins use .run() in their implementation, thus making coreutils unavailable when running Bazel over Nix (even though I'm wondering, should you need coreutils, when running with run() instead of run_shell()?).

@gergelyfabian
Copy link

In our case, we had an implementation of the following rule:

https://github.com/lucidsoftware/rules_play_routes/blob/9e425415f60035fd23ec3b275d94d33208d1796e/play-routes/play-routes.bzl#L50-L62

This rule uses run() instead of run_shell(), but it does not have access to shell environment (I guess this is expected, but only made evident when using Nix).
There is a parameter for run(), called use_default_shell_env that you can set to True:

https://docs.bazel.build/versions/main/skylark/lib/actions.html#run

Once this is True, Nix doesn't experience the reported problem any more. I guess any rule implementations, that use run() and also depend on the shell environment should set this option.

@aherrmann
Copy link
Member Author

There is a parameter for run(), called use_default_shell_env that you can set to True:

We have found in the past that use_default_shell_env = True causes frequent cache invalidation as it invalidates build actions whenever an env-var like PATH changes. See e.g. tweag/rules_haskell#1096. In that specific case we worked around the issue by providing standard shell tools through a Bazel toolchain that is imported from Nix in a controlled way using rules_sh. See tweag/rules_haskell#1136 and tweag/rules_haskell#1143. rules_sh has since been factored out into a standalone rule set https://github.com/tweag/rules_sh with Nix integration implemented in rules_nixpkgs.

@ylecornec
Copy link
Contributor

I opened the draft PR #146389 which fixed this problem for me, but was not extensively tested.

@gergelyfabian
Copy link

There is a parameter for run(), called use_default_shell_env that you can set to True:

We have found in the past that use_default_shell_env = True causes frequent cache invalidation as it invalidates build actions whenever an env-var like PATH changes. See e.g. tweag/rules_haskell#1096. In that specific case we worked around the issue by providing standard shell tools through a Bazel toolchain that is imported from Nix in a controlled way using rules_sh. See tweag/rules_haskell#1136 and tweag/rules_haskell#1143. rules_sh has since been factored out into a standalone rule set https://github.com/tweag/rules_sh with Nix integration implemented in rules_nixpkgs.

Another way to fix the changing PATH issue affecting cache usage is to use --incompatible_strict_action_env, and this is the way that I have seen work well.

@aherrmann
Copy link
Member Author

Another way to fix the changing PATH issue affecting cache usage is to use --incompatible_strict_action_env, and this is the way that I have seen work well.

Agreed, generally I'd recommend setting this flag. AFAIK there are some cases where it can cause issues (bazelbuild/bazel#7026), so it may not be compatible with all use-cases. But, it has worked well in the cases where I used it as well.

It's worth pointing out that in this case changes to --action_env env-vars still invalidate cached actions with use_default_action_env=True (as is to be expected). So, there is still a benefit to keeping use_default_action_env=False from a caching perspective.

ylecornec added a commit to ylecornec/nixpkgs that referenced this issue Dec 13, 2021
@googleson78
Copy link

Writing here so that my temporary knowledge is not lost - --incompatible_strict_action_env might not work properly with a bazel from nixpkgs (it doesn't seem to be working for me), possibly due to this patch.

Strum355 added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this issue Jan 16, 2024
bazel (and its community of rules) is a bit of a mess when it comes to passing around env values, including $PATH, resulting in some actions not receiving the $PATH value set via `--action_path=PATH=...` that we rely on. Bazel 6 in nixpkgs had [the following patch](https://sourcegraph.com/github.com/NixOS/nixpkgs/-/blob/pkgs/development/tools/build-managers/bazel/bazel_6/actions_path.patch) that hardcodes a set of packages to include in that case, which is currently missing from Bazel 7 in nixpkgs, so we (temporarily, hopefully) reintroduce that patch locally. 

See: NixOS/nixpkgs#262152 (comment) and NixOS/nixpkgs#94222 for some reading

I plan to move more bazel stuff out of shell.nix perhaps, into bazel.nix, as its a tad messy all-in-one

## Test plan

`bazel build //client/web/dist` is successful
@Silic0nS0ldier
Copy link

There are 2 things which may be leading to this issue.

  1. execlp(), execvp(), and execvpe() have a OS-specified default for PATH (usually /bin:/usr/bin). See the docs.
  2. Bash (and other shells) have a default value for PATH (not this is not actually an environment variable, just a shell variable). For Bash 5.2 this is /usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:. https://github.com/bminor/bash/blob/4b0f8ba28449c60d6b3436d0c98b14e016b232c2/config-top.h#L66

These defaults can be avoided with PATH="". Best I can tell, this cannot be added to actions with use_default_action_env=False (the default) globally (--action_env is ignored). env = { "PATH": "" } has to be added manually to every affected action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken
Projects
None yet
Development

No branches or pull requests

5 participants