Investigate packaging Rust crates separately #333702
I also forgot to mention that we could maybe share more actual crate builds between packages if we did this, though I don’t know how much that’d actually help or how tricky it’d be to set up the machinery for it.
My idea would be to have Rust programs depend on packages containing source code for their dependencies, like "time_0_3". We'd have one package for each semver boundary, so "time_0_3", "time_0_4", "time_1", etc. cargoSetupHook would learn to assemble all Rust source inputs into a vendor directory. We'd have a script that, given a package for a Rust program, generated the required packages and output the list of dependencies to paste into the expression for the Rust program. The source packages could have an updateScript, so updating Rust packages could be taken care of via r-ryantm. I think this would be relatively straightforward to implement, and it's attractive because it makes Rust dependencies feel as much like normal dependencies as possible, even though they would just be source packages under the hood (because Cargo forces this).
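For concreteness, a minimal sketch of what such a per-crate source package could look like. Everything here is hypothetical except fetchCrate, which already exists in Nixpkgs; mkCrateSource, the file path, and the "time_0_3" naming are just illustrations of the idea, not a committed design:

```nix
# Hypothetical: pkgs/development/rust-crates/time/0.3.nix
# One source-only package per SemVer line ("time_0_3"); it carries no build
# logic of its own. cargoSetupHook would later collect all such inputs into a
# vendor directory before the Cargo build runs.
{ mkCrateSource, fetchCrate }:

mkCrateSource rec {
  pname = "time";
  version = "0.3.36"; # one representative version of the 0.3.* line

  src = fetchCrate {
    inherit pname version;
    hash = "sha256-..."; # placeholder
  };
}
```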
If we could make a scheme like that work, it’d be fantastic. I don’t want to add more generated Nix code in‐tree, or hand‐written boilerplate, than is necessary, though. If we can generate non‐code spec files for packages that Nix code turns into packages that work like that, that’d be great. For instance, we could have a generator that turns a hand‐written …
Rather than one big lock file, sharding crates into smaller files by some name prefix might be both easier to review (i.e. editors and the GitHub UI don't go mad because of the size) and more efficient to store in git (I remember @alyssais talking about some inefficiencies with all-packages.nix).
Right; my proposal in the original post (which I realize is tl;dr) was actually one file per package, to avoid Git conflicts. It’s only conceptually one big Cargo.lock.
Specifically, git stores files, not diffs/changes, so anything that reduces the filesize in a given commit will help massively. (Identical files are deduped.)
Git is a lot smarter than that; files that are only slightly different are also deduped, and quite efficiently so: I have yet to find a more efficient storage format than a bare git repository for a bunch of large text files that differ slightly in a bunch of places. Having one big file is actually quite efficient. I have not measured it, but I'd expect the per-file overhead (object ID, file name, mode) and the lack of Huffman coding across the files' contents to make many small files quite a lot less efficient than One Big File.
I think you might both be right, and it depends on whether the refs have been packed or not yet? But I don’t know that much about Git’s storage layer, so my knowledge might be terribly out of date. One single file doesn’t seem good for conflict handling, anyway. I think one file per package would be fine if it’s deduplicated across the whole tree, because after all Nixpkgs already consists basically entirely of files for individual packages. The more I think about (my version of) @alyssais’ proposal the more I am tempted to try and implement it. It would make things very normal. The only question is whether it can be automated enough to be comparably seamless to the status quo.
You can basically always expect git objects to be packed when size is of importance. As for conflict handling: I don't see why it'd be a factor, as the file should basically never be edited by hand, always by automation. We don't worry about conflicts in e.g. the Hackage packages file either.
It’d be edited by automation in independent PRs pretty frequently as crates are added to satisfy new dependencies in packages. That results in a lot of opportunity for conflicts because of mismatching diff context. Anyway, the main problem is that we need to avoid changes to the locked package set rebuilding all Rust crates, which is hard without CA derivations. Alyssa’s approach avoids that in a very simple way.
(Note: git does not deduplicate within …)
I like what the Ruby ecosystem in nixpkgs does, which is to have version-independent builders for gems. There are some crates I have had to fix over and over across different packages because they need some help building on Darwin (e.g., #328588, #328598, #328593). It would be better if fixes like that only needed to be done once.
Yes, I am hoping that we can attach native libraries and other build instructions to the relevant crates. I am hopeful that if we deduplicate packages by SemVer major by default that will be all the sharing we need; we should carry as few incompatible versions as possible, and those are likely to differ enough that sharing is less of a concern. I think my attempt to nerd‐snipe someone into working on this has successfully boomeranged back onto me…
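For illustration only, the kind of per-crate attachment that could mean might look roughly like the following sketch, loosely modelled on defaultCrateOverrides from the buildRustCrate world; the crate name "some-sys-crate" and the whole shape are invented, and nothing like this exists for the cargoHash-based builders today:

```nix
# Hypothetical per-crate build tweaks, declared once and applied to every
# package that depends on the crate, whatever the exact version.
{ pkgs, lib }:

{
  # Native dependencies attached to the crate instead of to each application.
  openssl-sys = {
    nativeBuildInputs = [ pkgs.pkg-config ];
    buildInputs = [ pkgs.openssl ];
  };

  # A typical Darwin-only fix, written down exactly once (crate name is
  # illustrative).
  some-sys-crate = lib.optionalAttrs pkgs.stdenv.isDarwin {
    buildInputs = [ pkgs.libiconv ];
  };
}
```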
I don't quite get why non-update operations on the package set would cause rebuilds. When you add or remove packages from the set, the existing entries should stay the same. Updates would of course need to be done in a staged manner like all of our package sets already do.
We're talking about large text files here, not large binary files (Binary Large OBject). Those get deduped just fine. I've deduped a dozen or two ~30MiB files into a few MiBs using a git repo before, where tar.xz, borg, bup and zpaq would all produce something on the order of 200MiB. You should generally never store BLOBs in git unless they're really tiny and perhaps not even then. We should probably enforce this in Nixpkgs btw, but that's for another topic.
I am sceptical of this. Using the Python package set is extremely annoying because I end up overriding it to lock dependencies to the version the package needs. So the approach hinted at earlier, with one package per minor version, would be much appreciated.
If we have one big locked crate set that every Rust package takes as an input, then any change to it invalidates every Rust build. Unfortunately we don’t have content‐addressed derivations, so we have to do the narrowing at Nix evaluation time, without access to the upstream Cargo.lock. In other words, we need explicit dependency lists of some kind. The trivial solution would be to just list every entry of the locked set that is relevant to the package’s build.
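One possible shape for those explicit lists, continuing the hypothetical sketch from earlier in the thread; cargoCrates and rustCrates are made-up names, and the real buildRustPackage has no such argument today:

```nix
# Hypothetical: the package expression names exactly the crates it needs, so
# only those source packages become inputs of the vendor-directory derivation.
# Bumping an unrelated crate in the tree-wide set then leaves this package's
# inputs untouched and does not trigger a rebuild.
{ rustPlatform, rustCrates, fetchFromGitHub }:

rustPlatform.buildRustPackage rec {
  pname = "example-tool"; # illustrative package
  version = "1.2.3";

  src = fetchFromGitHub {
    owner = "example";
    repo = "example-tool";
    rev = "v${version}";
    hash = "sha256-..."; # placeholder
  };

  # Hypothetical argument: the explicit dependency list discussed above,
  # generated and kept up to date by automation rather than written by hand.
  cargoCrates = with rustCrates; [ anyhow_1 clap_4 serde_1 time_0_3 ];
}
```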
The Python ecosystem is much worse about following SemVer and avoiding gratuitous breaking changes than Rust. Cargo assumes SemVer, and the convention in Rust is to just pin a minimum version and let Cargo automatically pick higher versions within that major bound. Rust developers are generally sticklers enough about breaking changes that this just works.

The idea is that we would package one minor version of every SemVer‐major version we need (i.e. 0.1.*, 0.2.*, 1.*, 2.*, …). There are still opportunities for Hyrum’s law issues when we pick versions that aren’t the exact ones pinned in upstream lock files, so we may end up having to package multiple minor versions of the same major version sometimes, but that will hopefully be rare enough that the small amount of manual intervention required won’t be too annoying.

Also, to be clear, I’m solely focused on in‐tree Nixpkgs use right now. It’s not (yet) my expectation that anyone outside of Nixpkgs would consume this package set rather than doing the same things they’d do now.
The biggest files in the git history are the generated node-packages and the hackage file.
Or minor. Also, some people love to pin exact patch versions of crates for no real reason, which would make this more difficult than necessary. We could end up in a situation like in Python land, where version constraints are treated as recommendations unless tests fail.
One thing that I have not yet seen brought up is how backporting package updates to stable would work. I suppose the easiest answer is that it would only be done manually. One reason I bring this up is that, while the Rust community is generally good about respecting semver, increasing the minimum required compiler version is often not considered a "breaking change". So any backport to stable also requires figuring out if dependencies can be built on the compiler version in stable. The rust-version key in Cargo.toml makes this easier if it's included.
I think these issues can be avoided with the approach I intend to explore.
Right. The architecture I have in mind could support arbitrarily‐precise version requirements if needed, but it’d be good to avoid that if possible. I don’t yet have an idea of how much of a problem it’d be and whether we’d feel the desire to patch such pins out.
I hadn’t really thought about backports but I guess they should probably just be handled by running the automation from scratch on the release branch (and, yeah, making sure it takes MSRV into account).
I should say: I don’t expect the Rust package set to be small, necessarily. I’m sure it will still take up a meaningful portion of the repository, even as we will be able to get rid of the vendored Cargo.lock files and cargoHash FODs.

The hope is that we will get a less redundant package set, deduplicating versions where possible instead of vendoring entirely separate lock files, while gaining insight into dependency trees for all packages rather than hiding them behind opaque FODs, and allowing Rust package maintenance to scale better by being able to apply patches, version bumps, and build tweaks on a per‐crate basis. It remains to be seen exactly how it will pan out, but I am optimistic that we will get much more value out of it than we currently do from the space we spend on vendored Cargo.lock files.

In any case, there will definitely not be one huge file prone to merge conflicts. In that sense, this issue is pretty badly named. (I just find One Big Lock File funny, and can’t think of a particularly good title.)
Please let us know if there are cargo or rustc features that could help with this situation. I know that in general cargo has been growing more features to handle the MSRV thing. |
How would this work in a nix shell? I do most of my rust development this way and it seems like there would need to be tooling to sync Cargo.toml to match nixpkgs?
For some crates we should be able to enable a superset of features. This doesn't work for crates that have mutually exclusive features, though.
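In case it helps make that concrete, a tiny illustrative helper for the feature-union idea; nothing like this exists today, it simply mirrors Cargo's own feature unification, and mutually exclusive features would still need separate builds:

```nix
# Hypothetical helper: build a shared crate with the union of the features its
# dependents request, so one build can serve all of them.
{ lib }:

{
  # unionFeatures [ [ "derive" ] [ "std" "derive" ] ]  =>  [ "derive" "std" ]
  unionFeatures = requestedFeatureSets:
    lib.unique (lib.concatLists requestedFeatureSets);
}
```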
Isn't that something dynamic derivations are meant to address? As far as I understand, that would let us process the upstream Cargo.lock inside a derivation. I'm not very in tune with Nix interpreter development, but checking the relevant issues it seems to be actively happening: the last necessary change is written but blocked on the resolution of another bug which it exposed. (Sorry for the repost; I split it from the thread I tagged as off-topic.)
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/state-of-haskell-nix-ecosystem-2024/53740/9 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/what-is-nixpkgs-preferred-programming-language/53848/33 |
Right now, every Rust package is an ecosystem unto itself, with dependency versions being selected from each package’s upstream Cargo.lock file (or a vendored one if none is present). This stands in contrast to how many language ecosystems in Nixpkgs work, and has caused us problems:

- Cargo.lock considered harmful #327063 – keeping Cargo.lock files in the repository bloats its size, but gives us more static insight into the dependencies used by packages and avoids hacky FOD duplication.
- Rust 1.80.0 breaks some packages #332957 – in an ideal world, we could bump time once to the fixed version, rather than playing whack‐a‐mole with all the broken packages.
- This hasn’t happened yet, but I’m dreading dealing with all the pinned ffmpeg-sys-next dependencies when I upgrade the default FFmpeg to 7.

In general, it’s just pretty painful for us to patch Rust dependencies in a way that it isn’t for many other language ecosystems.

I don’t think it’s practical for us to manually maintain a Rust package set like the Python one. However, I think we could do better here. The idea is that we could essentially have one big Cargo.lock file that pins versions of all crates used across the tree, and abandon both cargoHash and vendored Cargo.lock files. The hope is that this would give us the static advantages of vendored Cargo.lock files, let us reduce the number of versions of the same packages that we ship, and do treewide bumps of package versions with much less pain, while (I hope) still taking up less space in the repository than the status quo.

It wouldn’t be feasible to maintain exactly one version of every package. Many popular packages have incompatible major versions, and we may not want to keep a package exclusively on an older version just for some outdated software that pins an older version than most software is compatible with. However, I suspect we could vastly reduce the proliferation of alternate crate versions across the tree.

A downside would be that we would no longer be using the “known good” versions of packages from our various upstreams. For some packages with incompletely‐declared dependency ranges, this could result in broken builds or functionality. In those cases, we would still have the option to vendor a package‐specific Cargo.lock file. Note that this is how it works in most Linux distributions, so although we might package more Rust software than the average Linux distribution, these challenges aren’t unique to us.

This would not necessarily have to literally be one huge Cargo.lock file; we just need something we can turn into Cargo.lock files or a Cargo source to replace crates.io, as suggested in #327063 (comment). As @alyssais pointed out in the issue I linked, we don’t need the dependencies array, and as I pointed out, one single file is conflict‐prone. I suspect we would want something like a JSON or TOML‐based format with one file per package (or package version); a rough sketch of that direction follows below. That should be comparable in size and organization to our other language package sets, and minimize conflicts.
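To make that concrete, here is one possible shape for such spec files, shown as a hedged sketch only: the file layout, the paths, and mkCrateSource are all invented for illustration, while lib.importTOML and fetchCrate do already exist.

```nix
# Hypothetical: generic Nix code turning one small TOML spec per crate version
# into a source package. The spec file itself stays trivial to generate and
# review, e.g. rust-crates/specs/time/0.3.toml containing only:
#   version = "0.3.36"
#   hash = "sha256-..."
{ lib, fetchCrate, mkCrateSource }:

let
  spec = lib.importTOML ./specs/time/0.3.toml;
in
mkCrateSource {
  pname = "time";
  inherit (spec) version;

  src = fetchCrate {
    pname = "time";
    inherit (spec) version hash;
  };
}
```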
There are unsolved problems here, e.g.:

- We probably don’t want every bump of any Rust library to rebuild every Rust application. We’d need to figure out some way to narrow down the rebuilds to what’s required by each package. The best option I can currently think of is adapting the cargoHash‐style FOD stuff to the task of selecting the subset of our One Big Lock File that is present in the upstream Cargo.lock/Cargo.toml somehow. For instance, it may be acceptable if every crate version bump rebuilds Rust derivations across the tree that check against the src’s Cargo.toml and either succeed because the applicable versions remained constant, or fail because the hash of those versions no longer matches. We just need to be able to short‐circuit the actual builds.
- We’d need automation to manage the One Big Lock File. In particular, we’d want to be able to tell when a dependency bump is compatible with the version bounds in various packages so that we can decide between bumping vs. adding a new available version, and keep a set of crates that is consistent with the things we package. This could probably be as simple as just automating and unconditionally accepting SemVer‐compatible bumps, dealing with any fallout by hand, and trying SemVer‐incompatible bumps when we feel brave (a rough sketch of that compatibility check follows after this list).
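As a hedged sketch of what the "bump vs. add a new version" decision could look like in that automation — the helper names are invented, while lib.versions, lib.filter, and lib.head are existing Nixpkgs library functions:

```nix
# Hypothetical helper for the automation: crate versions share an entry when
# they are SemVer-compatible in Cargo's sense (same major, or same major.minor
# when the major version is 0).
{ lib }:

rec {
  sameSemverBucket = a: b:
    if lib.versions.major a == "0"
    then lib.versions.majorMinor a == lib.versions.majorMinor b
    else lib.versions.major a == lib.versions.major b;

  # Given the versions already in the tree-wide set and a newly requested one,
  # decide whether to reuse/bump an existing entry or add a new one, e.g.
  #   planFor [ "0.3.36" "1.2.0" ] "0.3.17"  =>  bump-or-reuse 0.3.36
  #   planFor [ "0.3.36" ] "0.4.0"           =>  add 0.4.0
  planFor = existing: requested:
    let compatible = lib.filter (v: sameSemverBucket v requested) existing;
    in if compatible == [ ]
       then { action = "add"; version = requested; }
       else { action = "bump-or-reuse"; version = lib.head compatible; };
}
```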
Ultimately, though, I think that the current status quo is causing a lot of problems, and that if we can successfully pursue this proposal, we’ll hopefully make all the groups here happier: the people who maintain Rust packages, the people who worry about the repository size and evaluation performance of Nixpkgs, and the people who worry about losing Nix’s static guarantees.
cc @matklad who suggested the source approach
cc @alyssais who said that we used to do this but stopped
cc @Atemu who opened the issue about Cargo.lock