flakes: mitigating the n nixpkgs problem by storing full nixpkgs remote locally #4602
Comments
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/flakes-solving-the-n-nixpkgs-problem/11773/6 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/flakes-solving-the-n-nixpkgs-problem/11773/1 |
I think all the logic for what you want is here: The git fetcher already keeps a copy of the git repository to speed up further calls (the logic for that being implemented here). The only reason this doesn't happen by default for nixpkgs, though, is because it's using the
@regnat, thanks. I understand that this is available as an option already, and am using a local copy right now on my machine. This may not matter too much for stable nix users since they typically only use one channel. But it makes a big difference on my flakes based machine, where each flake references and locks a different revision of nixpkgs. Perhaps we could have the daemon build a copy of the local derivation (as I understand getting the entire history would take a while) in the background, and use default fetching behavior until it's finished. |
What does “as the default” mean? Having
If we keep nixpkgs in the nix store, then it is immutable and cannot fetch new revisions. Therefore the cache is in
As a side-note: Fetching the github tarball of a nixpkgs revision (approx. 17 MB) takes me less time than fetching a new ref via git. Also, you'd need to compare shallow clones to the github fetcher: You can download the tarball over 120 times before you reach the > 2 GB size of the git repo.
I'd prefer no automatic background fetches, at least not by default. What if I'm sitting in a train and using my phone's hotspot as Internet? If you like you can always create some dummy flake/nix registry entry and type
I'd probably prefer making synchronization of flakes easier over making direct git repos the default. A nix command like
I figured a lot of users were probably using a local copy of nixpkgs on their machine for various reasons anyhow, so we might as well take advantage of it. Although maybe making it the default is a bit too invasive. Having the option, and allowing nix to search locally if a certain
I like this idea.
I think we may be able to mitigate this somewhat by allowing users the option of only storing certain branches locally. Something like
Not quite sure I follow what you mean by using a dummy registry. Could you, perhaps, elaborate?
Aren't inputs already updated via the registry if their url is not specified and their name exists in the registry? I think the bigger issue with follows, besides its verbosity, is that it breaks reproducibility by changing the inputs that have been tested upstream, making things more difficult to debug and possibly causing a lot of extra builds, as your local version won't match what exists in a binary cache. More than just eliminating the network call, this was actually my main motivation for posting this issue in the first place. I'd much rather use the original artifact intended by upstream and available in a cache. In general you have some valid points, and as I tried to elaborate in my original post, it is already quite possible to use a local git repo and simply pin it in the registry and/or link to it from a flake. I just think making it a discoverable option and allowing a simple update service would help a lot more people take advantage of it. As I recall from last year's NixCon, a big focus right now is on making things simpler and less convoluted for newcomers to help with mainstream adoption. While setting all this up manually isn't incredibly difficult, it isn't exactly the best UX either, as many people probably aren't even aware it's an option.
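For readers who have not used it, here is a minimal sketch of the follows mechanism discussed above. The input name some-tool and its URL are hypothetical placeholders and do not come from this thread.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    # Hypothetical upstream flake that locks its own nixpkgs revision.
    some-tool.url = "github:example/some-tool";

    # Replace that locked revision with ours. This leaves one nixpkgs
    # instead of two, but some-tool is no longer evaluated against the
    # revision its author tested, so upstream binary cache hits become
    # less likely.
    some-tool.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { self, nixpkgs, some-tool }: { };
}
```

Dropping the follows line restores the upstream-tested revision, at the cost of one more nixpkgs to fetch, which is the trade-off this issue is about.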
You mean you could then fetch in your local development repo from the Nix git cache? Yeah, adding this as one remote and fetching from there if it's sufficient is probably efficient.
What do you mean by search locally? Nix can only use the cached git dir if it's using the git scheme, not if it's using github's tarballs.
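To make the scheme distinction concrete, the sketch below declares the same branch twice, once per scheme. The input names are made up for illustration; only the URL schemes matter here.

```nix
{
  inputs = {
    # github scheme: downloads a tarball of a single revision, so there
    # is no git repository left behind to reuse.
    nixpkgs-tarball.url = "github:NixOS/nixpkgs/nixos-unstable";

    # git scheme: goes through the git fetcher, which keeps a cached
    # copy of the repository for later calls.
    nixpkgs-git.url = "git+https://github.com/NixOS/nixpkgs?ref=nixos-unstable";
  };

  outputs = { self, ... }: { };
}
```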
First, the ref fetch time is not limited by the amount of data. It applies also to branches whose whole history is already locally available. Probably what takes the time for a large repo like nixpkgs is determining what objects are to be sent. Here I timed shallow and unshallow clones and measured their size:
$ time git clone --bare --depth 1 --single-branch https://github.com/nixos/nixpkgs.git nixpkgs.shallow.git \
&& du -shc nixpkgs.shallow.git
[...]
git clone --bare --depth 1 --single-branch nixpkgs.shallow.git 2.48s user 1.13s system 20% cpu 17.677 total
28M nixpkgs.shallow.git
28M total
$ time git clone --bare --single-branch https://github.com/nixos/nixpkgs.git nixpkgs.single.git \
&& du -shc nixpkgs.single.git
[...]
git clone --bare --single-branch https://github.com/nixos/nixpkgs.git 490.03s user 41.96s system 87% cpu 10:07.85 total
1.3G nixpkgs.single.git
1.3G total
I don't mean a dummy registry but a dummy registry entry. Dummy here means that it's not for any concrete use other than fetching the master branch of nixpkgs. You could e.g. create a flake with no outputs but all refs you want to cache locally as inputs. Then, by typing
Yes, but I like to pin flakes to the version currently pinned in the registry using
If it is in a cache you have set up. Then you always end up with at least n nixpkgs for n flakes. But I don't think this is obviously the best. Maybe you value reducing closure size more and do your own testing (or skip it), or maybe you value reproducibility more.
As far as I understand, everything except for the auto-caching service/cron job/dummy flake is already set up. But to know about caching, you probably have to read the Nix source. I might write something about this in the NixOS Wiki. One problem of flakes is in my opinion that to discover many subtleties of its features, you have to read the source, and to begin with it's enough that the main Nixpkgs documentation is its source ;) At least for Nix, the people who have read the relevant sources should document flakes, and maybe the best place for such an unfinished feature is the wiki, which has probably the most thorough description of flake usage already. About UX: I think optional auto-caching is probably a good idea. But I think the distinction between the different schemes (git+*, github) should remain as it is really a functional difference, and git as default scheme wouldn't work so well, because of its obvious bandwidth/speed problems at least on some platforms, and especially for first-time users. However, currently the flake input url scheme is biased towards specific api fetchers like
{
nix.registry.nixpkgs.flake = nixpkgs;
} in |
@zseri, I have been doing this from the beginning of my flake usage. Unfortunately, this has no effect for nixpkgs that are pinned in a flake.lock, which will still be pulled from GitHub for a given revision, which is what this issue tries to address. Also, if the system flake doesn't refer to a specific ref, your fix makes it difficult to upgrade |
I personally don't like the global flake registry at all and avoid it most of the time by pinning nixpkgs manually in each flake, mostly to benefit more from the binary cache. Thus, the global flake registry / NIX_PATH and stuff for me is mostly relevant for nix shell and nix search...
Update
The original comment is not entirely accurate after all. After some initial testing it did appear to work as advertised, but I started to notice some confusing behavior where a ref like "nixpkgs/nixos-unstable" would first resolve to the local nixpkgs copy (which is what I want) but if I update the lock file a second time, then it calls the network. I thought removing the duplicate registry entry (from the default flake registry) would resolve the issue, but now it can't resolve the ref at all. This would make more sense if it never worked at all. But the fact that it works initially, but then falls back to the network on subsequent attempts is very confusing to me.
Original Post
I just realized that there is sufficient tooling in flakes to already accomplish exactly what I desired here:
# a nixos module that sets the nixpkgs registry value to a local copy of nixpkgs with all refs pulled
{
  nix.registry.nixpkgs = {
    url = "https://github.com/NixOS/nixpkgs.git";
    type = "git";
    allRefs = true;
  };
}
This allows one to then reference nixpkgs branches/commits from that registry value, and get the resulting nixpkgs ref without calling out to the network at all:
# a flake.nix which references the nixpkgs registry value
{
  inputs.nixos.url = "nixpkgs/nixos-unstable";
}
The two major caveats are:
I'd say both of these caveats could be greatly mitigated if we could ask nix to only pull a list of refs that we are interested in storing locally. Commonly used branches like At the very least, we may want to document this somewhere so users know it's an option. |
This would require a patch to |
POC module that pulls a local copy of nixpkgs and does some work periodically to keep it up to date: |
As an additional note, while it may seem that the issue is fully resolved without adding any features to Nix, there is one big caveat. If you use the module posted above, your flake.lock files will reference the local directory, which for most users probably won't exist. It'd be much nicer if Nix had a built-in concept of a "flake mirror" where if a flake.lock references a revision of nixpkgs, for example, but a local git remote of nixpkgs has been set up on the machine, nix will try to pull from the local mirror instead of "github:NixOS/nixpkgs". That way, users can get the cache benefits if they want, without breaking the lock file for anyone that doesn't have it set up. Perhaps something like:
For simplicity's sake, we could leave the handling of the local remote to the user (and maybe some premade NixOS modules like the POC above), and Nix will just fall back to the network if the revision doesn't exist on the mirror (or the directory doesn't exist, etc).
This isn't really true, see the benchmarks I did here: https://github.com/ngi-nix/rfcs/blob/54236bd41e4086da0001d2f999c425f9ef8337ec/rfcs/0100-sign-commits.md#speeding-up-git Regardless of the status of the RFC, this ought to be implemented. With this speed-up in place, I don't see the point of keeping the proprietary GitHub and GitLab fetchers, not to mention that we can likely speed up Git itself too.
I just realized something that now seems so obvious it's a little embarrassing I didn't include it in the original write-up. But if we could somehow leverage git worktrees into this equation, we could essentially get the best of both worlds here. Say we implement the concept of a "flake mirror" as I already suggested above. When this is the case, we could simply change the logic of
If this were the case, something like follows may not even be really necessary at all. In principle at least, it is a better solution, since it doesn't invalidate upstream caches as follows do. I think the trickiest part would be how to handle the master copy, since I think it would have to live in the nix store too, in order for the other worktree copies to reference it safely. I have some ideas of how it might be possible to update this "master" copy without having to do another complete checkout of it though. I have enough of the design in my head at this point that I may give a PR a shot.
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/nixos-flakes-dependency-on-github/16292/6 |
@nrdxp To store an updateable "master copy" in the store, you would have to support storing Git objects in the store natively. This would be a big change. |
I was actually imagining we could store a regular old bare git repo and generate a special fixed-output derivation that pulls in the existing master copy, runs a |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/multi-branch-flake-update-performance-improvement/19310/2 |
closing as this will likely never happen |
Is your feature request related to a problem? Please describe.
The problem is that, in the wild, various flakes will refer to various revisions and branches of nixpkgs in their flake.lock. Currently, we then have to rely on the network to pull all those revisions from GitHub. The user can manually alter their flake.nix and add proper follows to only refer to a single nixpkgs, but this means that the version of nixpkgs the original flake authors have tested against will not be used, defeating the reproducibility of flake artifacts.
Describe the solution you'd like
Essentially, instead of having to pull in various copies of nixpkgs over the network, as is inevitably going to happen with various flakes referring to various versions of nixpkgs, we could store a single, full copy of nixpkgs and all its branches locally, in a secure location (ideally inside the nix/store). This way, when flakes start referring to various branches and revisions, Nix can simply pull them from the local remote rather than relying on the network.
This has the potential to not only be a lot faster, but could skip a network request during evaluation. All that would really need to be modified at this point is for the default entry for nixpkgs in the registry to point to the locally managed copy.
As an additional, but not required feature, we could have nix call git fetch if a flake refers to a revision that is not in the local nixpkgs remote, updating the remote as the user updates his flake. Alternatively, we could ask the user to manually manage the remote, and fall back to pulling from GitHub if the revision isn't stored locally, or we could simply update the remote on a timer.
Describe alternatives you've considered
We could leave it as is, and users can pin their default registry manually if they want to take advantage of a local remote. I just feel like having this behavior as the default would save a lot of network bandwidth, and a lot of time.
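As a rough sketch of that manual alternative, a NixOS module could point the nixpkgs registry entry at a local clone. The path /var/lib/nixpkgs is an assumption chosen for illustration, and keeping the clone up to date (for example with a periodic git fetch) is left to the user.

```nix
# NixOS module sketch: resolve the "nixpkgs" registry entry from a local
# git clone instead of GitHub. The clone location is hypothetical.
{
  nix.registry.nixpkgs.to = {
    type = "git";
    url = "file:///var/lib/nixpkgs";
  };
}
```

A single flake can reference the same clone directly with an input URL of the form git+file:///var/lib/nixpkgs?ref=nixos-unstable. Either way, lock files produced like this record the local path, so they will not resolve on machines that lack the clone.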
Additional Context
I started a thread on the discourse about solving the n nixpkgs problem, and after some experimenting I realized most of the hard work is already implemented in Nix, thanks to the git+file:// flake ref.
Possible Implementation
Simply turn off network sandboxing as a special case to allow git to operate over the network while building a derivation, which consists solely of the entire nixpkgs tree. Since git is content addressed, this should be safe as compromised hashes can easily be verified.
For updates, pull the existing derivation as an input into the new derivation to save from downloading nixpkgs from scratch, instead merging changes to all branches.
A nix.conf boolean could be created: localpkgs. When set to true, the default value of the nixpkgs flake in the nix registry points to the user's local nixpkgs source derivation.