Broken nix flake on aarch64-darwin #1262

Open
SchahinRohani opened this issue Aug 13, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@SchahinRohani
Contributor

SchahinRohani commented Aug 13, 2024

Description

Running nix flake show --allow-import-from-derivation on macOS (aarch64-darwin) fails with an error that halts evaluation. It appears that Nix tries to build the nixpkgs-patched derivation for the wrong architecture (aarch64-linux instead of the host's aarch64-darwin).

Warnings & Errors

error:
       … while evaluating the attribute 'optionalValue.value'

         at /nix/store/l04ifn08y80fcd3iblpzfr0fj0pqc58z-source/modules.nix:856:5:

          855|
          856|     optionalValue =
             |     ^
          857|       if isDefined then { value = mergedValue; }

       … while evaluating a branch condition

         at /nix/store/l04ifn08y80fcd3iblpzfr0fj0pqc58z-source/modules.nix:857:7:

          856|     optionalValue =
          857|       if isDefined then { value = mergedValue; }
             |       ^
          858|       else {};

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: a 'aarch64-linux' with features {} is required to build '/nix/store/jrhk2sdi1dij7hka434irlb1rnw4xv51-nixpkgs-patched.drv', but I am a 'aarch64-darwin' with features {apple-virt, benchmark, big-parallel, nixos-test}

Environment

Operating System: macOS
Architecture: aarch64-darwin
Nix Version: nix (Nix) 2.19.2
Nativelink Version: 0.5.1

Steps to Reproduce

Run nix flake show --allow-import-from-derivation on macOS with the aarch64-darwin architecture.
Observe the warnings and errors during the evaluation.

Expected Behavior

The nix flake show --allow-import-from-derivation command should complete successfully or provide actionable feedback without encountering fatal errors.

@aaronmondal aaronmondal added bug Something isn't working good first issue Good for newcomers labels Aug 13, 2024
@aaronmondal
Member

I can reproduce on x86_64-linux. This is caused by the fact that it's not possible to cross-compile for darwin, not even between aarch64-darwin and x86_64-darwin. I thought I had cut these paths entirely in #1233, but apparently I overlooked something.

The solution to this is to remove the darwin flake outputs from the packages section on systems that don't support them. I.e.

  • remove both if the host is linux
  • remove x86_64-darwin if the host is aarch64-darwin
  • remove aarch64-darwin if the host is x86_64-darwin
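
The three rules above reduce to one condition: a darwin output is kept only when its system string equals the host's. A minimal sketch of that filter in plain Nix follows. The names isDarwin, buildableOn, filterForHost and the placeholder packages set are hypothetical and not taken from the nativelink flake. Flake evaluation is pure by default, so the sketch takes the host system as an argument; how that value would reach the flake is left open here.

```nix
# Hypothetical helper for illustration only; none of these names come from
# the nativelink flake.
let
  # True for system strings such as "aarch64-darwin" or "x86_64-darwin".
  isDarwin = system: builtins.match ".*-darwin" system != null;

  # A darwin target is only kept when the host has the identical system
  # string. Non-darwin targets are left untouched by this rule.
  buildableOn = host: target: !(isDarwin target) || target == host;

  # Remove every per-system attribute that the host cannot build.
  filterForHost = host: perSystem:
    builtins.removeAttrs perSystem (
      builtins.filter
        (target: !(buildableOn host target))
        (builtins.attrNames perSystem)
    );

  # Placeholder values; a real flake would hold derivations here.
  packages = {
    "x86_64-linux" = "linux-x86";
    "aarch64-linux" = "linux-arm";
    "x86_64-darwin" = "darwin-x86";
    "aarch64-darwin" = "darwin-arm";
  };
in
{
  # Linux host: both darwin entries are dropped.
  onLinux = builtins.attrNames (filterForHost "x86_64-linux" packages);
  # Apple Silicon host: only x86_64-darwin is dropped.
  onArmMac = builtins.attrNames (filterForHost "aarch64-darwin" packages);
  # Intel Mac host: only aarch64-darwin is dropped.
  onIntelMac = builtins.attrNames (filterForHost "x86_64-darwin" packages);
}
```

Saved to a file, the expression can be evaluated with nix-instantiate --eval --strict to inspect which systems survive for each host.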

@SchahinRohani Do you want to try solving this issue?

@SchahinRohani
Contributor Author

Thanks for the feedback on the flake error. I couldn't manage the quick fix due to the high complexity involved, so I decided to redesign the basic structure and work with Nix scripts. This led me to the following proposal:

[Image: Screenshot 2024-08-20 at 15 51 45]

The idea is to set up a multi-shell environment consisting of three shells: baseShell, nativeShell (for Rust developers), and webShell (for web projects like the docs). The baseShell would include only the essentials like linting and formatting tools, while the nativeShell would provide the full Rust development environment, including the necessary toolchains and bazel. Similarly, the webShell would include system toolchains required for things like Playwright. Most importantly, all scripts would be moved into separate modules so that the flake.nix remains clean and only organizes the outputs.

However, given the complexity involved, I believe this is something we should definitely tackle together as a team. There's a lot going on here, and having everyone's input will be crucial to get this right.

Additionally, the CI would benefit from this setup, as it could be configured to run with only the necessary packages for each task, leading to more efficient and faster builds.

@aaronmondal Let me know what you think!

@aaronmondal
Member

I can see that initially it might seem a bit curious why all our CI jobs take so long to "boot up" and it's tempting to try to reduce the size of the devshell to make some jobs quicker. However, there is actually not too much to gain here.

> The baseShell would include only the essentials like linting and formatting tools

The pre-commit hooks already skip the devshell entirely since they use nix flake check. So for the pre-commit hooks the workflow is already optimal and does what a "baseShell" would do. Hence the ~1 min runtime for pre-commit hooks in CI on cache hits.

> the nativeShell would provide the full Rust development environment, including the necessary toolchains and bazel. Similarly, the webShell would include system toolchains required for things like Playwright.

Compared to the "nativeShell" parts, the "webShell" is almost the same in size since both run a Bazel build in the same stdenv (which makes up the majority of the devshell size). Compared to that, the additional space requirements from webdev-specific tools (playwright being the large one here) are negligible.

You also wouldn't get too much of a speedup since setting up the devshell in CI is actually fairly fast. In this job it takes about 6 minutes: https://github.com/TraceMachina/nativelink/actions/runs/10424002033/job/28871873664#step:6:1678. Of that, 2.5 minutes are spent on building the native-cli due to a cache miss and 1 minute on playwright, also due to a cache miss. If we ignore those two cache misses we're at ~2.5 min startup time. Optimizing here doesn't seem too useful.

Maintenance-wise it's also a nice property to only have to check a single devshell. If we had multiple devshells we'd have to test each one. Since Nix's "core" dependencies like gcc, glibc, etc. are shared, we'd end up with an overall net increase of CPU cycles instead of a reduction. To elegantly handle multiple devshells we'd likely have to use multiple envrcs. But then it becomes unclear what happens when you're in one devshell and need to invoke a tool that might behave differently in another devshell.

In general, an improvement to CI performance should be an "efficient" one. That is, a speedup of 2x isn't a "real" speedup if it requires 2x the compute resources. If, however, we could get a 10% CPU cycle reduction in CI, that would be a fantastic gain. The tricky thing to keep in mind here is that parallelizing certain jobs is only useful when we don't regress in coverage. So initially it might seem like multiple devshells speed things up, but because of the added CI jobs required to cover all use cases we'd end up with a net loss of efficiency.

@aaronmondal aaronmondal mentioned this issue Aug 24, 2024
4 tasks
@aaronmondal aaronmondal removed the good first issue Good for newcomers label Oct 25, 2024
@aaronmondal
Member

Removing the "good first issue" label for now, as this might be highly complex to fix.
