Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustPlatform.fetchCargoVendor: init #349360

Merged
merged 7 commits into from
Nov 16, 2024

Conversation

TomaSajt
Copy link
Contributor

@TomaSajt TomaSajt commented Oct 17, 2024

Related: #327063


This PR adds an alternative method to vendor the dependencies of a cargo project.
It's currently implemented in python, but could be changed later, if need be.

I converted some packages to use this fetcher as an example.

If you can come up with a good name for this fetcher, please share it below.

If we use this fetcher we won't have to worry about cargo changing its vendoring implementation (since this is completely custom), thus we can replace all importCargoLock usages where it was only required because of the presence of git dependencies.

The main idea is that we try to create a minimal staging FOD.
We fetch the tarballs into $out/tarballs/${name}-${version}.tar.gz and we fetch the git repos to $out/git/${git_sha_rev}.

This is the most important part to get right, since we will only be able to minimally change this in the future.
We would still be able to add some extra behaviours in cases that currently just throw an error (e.g. fetching other registries).


The staging FOD will then be used to create the actual vendor dir.

Since this no longer requires network access, we can freely change the implementation of this part without worrying about that hash.

The cargo workspace inheritance logic was already reimplemented by a script written for importCargoLock, so I just reused that.


Since this is a two-step process and only the first step is a FOD, we won't have outputHash and friends on the result. If we want to override the outputHash, we would have to do so through the vendorStaging attribute.

TODO: it might be useful to add an easier way to override the outputHash of vendorStaging.


I added a commit that makes buildCargoPackage swap over to this new fetcher implementation. This is currently done through a boolean flag.
Should it maybe be an enum instead? e.g. cargoFetcher = "custom" vs cargoFetcher = "default";


Sidenote:
The current vendoring script only requires the Cargo.lock file and nothing else. This would mean that fetchCargoVendor may easily be changed to not take in src and friends, but only a path to the cargo lock, like ${src}/Cargo.lock. This is not IFD, since we never import the path from nix, only from a script. This would be similar to how it's done with fetchYarnDeps.

Having it as a normal mkDerivation, like it is now, we can also apply patches and use hooks, which might be preferred.


It would be possible to skip the staging phase and make the vendor dir the FOD, however, this would lock our output format to never be able to change again.


The current implementation uses nix-prefetch-git to fetch a reproducible git tree, as I wanted to make sure I didn't make any implementation mistakes. I wonder if there's a better method than this...


Nothing is stopping us from also compressing the final vendor directory into a tarball, just like with fetchCargoTarball.
Tarballs - obviously - take up less space, so ideally we would do this.
Though a better fetcher name might be needed in that case.
I'm keeping the output as a directory for now, until the testing is done.


I also worry about the fact that non-FOD derivations will rebuild whenever the inputs change. This would mean that whenever python needs to be rebuilt, every vendor directory created by this would also need to be rebuilt. The rebuilds of the vendor directories would not take much time, but the used up space would be a concern.
It would be great if we could tell hydra not to cache certain derivations.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch 2 times, most recently from 8f86bba to 8581797 Compare October 17, 2024 21:00
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
pkgs/build-support/rust/mk-cargo-vendor-deps-script.py Outdated Show resolved Hide resolved
@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch 3 times, most recently from 143eb2c to eac2cca Compare October 17, 2024 21:33
@ofborg ofborg bot requested a review from lucastso10 October 17, 2024 23:54
@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch from eac2cca to 06035ed Compare October 18, 2024 10:17
@ofborg ofborg bot requested review from figsoda and winterqt October 18, 2024 11:24
@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch from 06035ed to e4ece40 Compare October 18, 2024 11:48
@ofborg ofborg bot added the ofborg-internal-error Ofborg encountered an error label Oct 18, 2024
@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch from e4ece40 to 440b366 Compare October 18, 2024 14:40
@TomaSajt TomaSajt changed the title WIP: rustPlatform.mkCargoVendorDeps: init WIP: rustPlatform.fetchCargoVendor: init Oct 18, 2024
@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch from 440b366 to ca03b0b Compare October 18, 2024 14:41
@TomaSajt TomaSajt changed the title WIP: rustPlatform.fetchCargoVendor: init rustPlatform.fetchCargoVendor: init Oct 18, 2024
@TomaSajt TomaSajt marked this pull request as ready for review October 18, 2024 14:46
@nix-owners nix-owners bot requested review from Mic92 and zowoq October 18, 2024 14:47
@TomaSajt
Copy link
Contributor Author

@ofborg build granian popsicle

@TomaSajt
Copy link
Contributor Author

TomaSajt commented Oct 18, 2024

I wonder why ofborg is failing...
Edit: looks like #331167 hit master, so I need config.toml instead of config.

@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch 2 times, most recently from 6533da7 to 260c771 Compare October 18, 2024 21:47
@TomaSajt TomaSajt removed the ofborg-internal-error Ofborg encountered an error label Oct 18, 2024
@TomaSajt
Copy link
Contributor Author

@ofborg build granian popsicle

@TomaSajt TomaSajt force-pushed the new-cargo-vendor-fetcher branch from 260c771 to d3d5f70 Compare October 18, 2024 22:02
@Atemu
Copy link
Member

Atemu commented Nov 17, 2024

Nothing is stopping us from also compressing the final vendor directory into a tarball, just like with fetchCargoTarball.
Tarballs - obviously - take up less space, so ideally we would do this.

I don't think we should do that btw. Handling plain files is simpler and faster.

If you cared about space usage of intermediate build files etc. you should just use a filesystem that supports transparent compression. It's a >50% reduction in space usage on my Nix store with only zstd:1.
You'd also simply GC if you need space anyways which would clean up this build-time dep in any case.

The rebuilds of the vendor directories would not take much time, but the used up space would be a concern.

It's actually slightly worse for hydra/cache.nixos.org as it has to cache this for every system separately, so take everything here x4. And of course rebuilds, so multiply by ~20-30 per year.

For "home users" the same argument w.r.t. space use applies here though.

Tis and the hydra issue could be remedied by unpacking the tarballs in the stable FOD format and them simply symlinking the actual sources into the unstable format. That way the "heavy-weight" ground truth data is in the FOD while the system- and python-specific derivations are just a light-weight translation layers.
This might be something to consider for a V2.

This would also protect us against crates.io possibly changing their gzip implementation or version of downloaded tarballs. I don't know whether there is any "contract" around that. This has bitten us with GitHub tarballs before for instance.

It would be great if we could tell hydra not to cache certain derivations.

That's not possible with the current implementation of nix remote builds AFAIK.

@TomaSajt
Copy link
Contributor Author

TomaSajt commented Nov 17, 2024

Tis and the hydra issue could be remedied by unpacking the tarballs in the stable FOD format and them simply symlinking the actual sources into the unstable format. That way the "heavy-weight" ground truth data is in the FOD while the system- and python-specific derivations are just a light-weight translation layers.
This might be something to consider for a V2.

This would also protect us against crates.io possibly changing their gzip implementation or version of downloaded tarballs. I don't know whether there is any "contract" around that. This has bitten us with GitHub tarballs before for instance.

The cargo tarballs downloaded are checksummed, so it should be fine.

I did not think of the symlinking option, seems like a promising idea. The unmodified stuff would be symlinks (non-git) and the normalized stuff would be just a copy (git with workspace stuff).

Though, yes, that would require unpacking in the fod, which I was afraid to do, but should be viable.

@Atemu
Copy link
Member

Atemu commented Nov 17, 2024

that would require unpacking in the fod, which I was afraid to do

Is there any particular reason you were "afraid" to do it?

@TomaSajt
Copy link
Contributor Author

TomaSajt commented Nov 17, 2024

that would require unpacking in the fod, which I was afraid to do

Is there any particular reason you were "afraid" to do it?

I wanted to do the minimum possible processing in the staging phase, to lower the possibility of messing something up in the implementation.

@TomaSajt
Copy link
Contributor Author

Also, importCargoLock should also "suffer" from the same storage problem, since it doesn't use its internal FODs directly either.
(for git, it's workspace normalization, using python) (for non-git, it just uses tar, so it should encounter less rebuilds).

This means that fetchCargoVendor should not be worsening the storage situation by much, compared to importCargoLock

@Atemu
Copy link
Member

Atemu commented Nov 19, 2024

I wanted to do the minimum possible processing in the staging phase, to lower the possibility of messing something up in the implementation.

I see. That's reasonable but I also think unpacking a tarball is hard to get wrong. (If you remember the command)

importCargoLock should also "suffer" from the same storage problem, since it doesn't use its internal FODs directly either

Indeed. Given that it has more pressing issues, I don't think we should place much importance on it for sustainable use within Nixpkgs.

This means that fetchCargoVendor should not be worsening the storage situation by much, compared to importCargoLock

You are right of course. This is merely a low-hanging fruit optimisation beyond the status quo.

I'll open a separate issue to discuss optimisations for fetchCargoVendor.

@nix-backports
Copy link

nix-backports bot commented Dec 6, 2024

Successfully created backport PR for release-24.11:

@getchoo
Copy link
Member

getchoo commented Dec 6, 2024

Making a backport as this is going to prevent a lot of other backports from going through, and shouldn't be a breaking change in any way. It can also lead to some nasty surprises on stable, as only setting useFetchCargoVendor = true won't cause an evaluation failure about fetchCargoVendor missing (found this in #362266)

@donovanglover donovanglover mentioned this pull request Dec 9, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: python 6.topic: rust 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10 12.approvals: 1 This PR was reviewed and approved by one reputable person backport release-24.11 Backport PR automatically
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants