Impure derivations #520

Open
copumpkin opened this issue Apr 21, 2015 · 38 comments
Labels
feature Feature request or proposal UX The way in which users interact with Nix. Higher level than UI.

Comments

@copumpkin
Member

This might reveal a deep misunderstanding on my part, but as far as I can tell, nix fundamentally divides its derivations into "fixed-output" and "deterministic build", based on the presence/absence of outputHash. I'm wondering if there could be a third type of fundamental building block which could allow limited but trackable nondeterministic behavior. The main example I can think of right now is the new fetchTarball builtin, which has its own magic caching strategy, but you could imagine wanting to pull the latest git revision of something using fetchgit and the like. If you use fetchgit as a fixed-output derivation, you can't always get the latest version. If you have it "lie" and pretend not to be a fixed-output derivation, nix will only ever do the work once and not bother refreshing itself.

If nix supported this third type of derivation, I could imagine something like:

{
  fetchTarball = url: builtins.nondetDerivation {
    builder = ./fetchtarball.sh; # contains the actual download logic
    inherit url;
    cachingStrategy = "hourly"; # Perhaps it could take frequency specifiers like this, which would tell nix to incorporate evaluation time into the store hash, or possibly a more flexible mechanism that I haven't yet thought of
  };
}

Of course, it should be possible for you to take an expression and figure out all sources of nondeterminism in it (much like how this source downloader works) so as to better trust the evaluation.

Another possible feature of interest could be the notion of a nondetDerivation optionally (it's not possible with all sources of nondeterminism, but is obviously desirable) emitting some sort of an "anchor" allowing one to tie the nondeterministic evaluation down to something deterministic. Think how ruby's Gemfile ties itself down to Gemfile.lock (but we'd obviously provide hashes), and how when you fetch a git ref you can "lock it down" by resolving that ref to a hash. Another example is how the NixOS channel mechanism resolves the top-level redirect to a precise channel revision. Such an anchor file could then be maintained as a way to lock down nondeterminism to get reproducible system states, but you could also selectively (or in bulk) update the locked things (much like nix-channel --update) to get newer versions.
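A minimal sketch of the "anchor" idea, using only builtins that exist today rather than the proposed nondetDerivation. The file name anchor.json, its single "rev" field, and the repository URL are assumptions made up for illustration:

```nix
# Sketch only: anchor.json and the URL below are hypothetical.
# anchor.json is assumed to look like: { "rev": "<full commit hash>" }
let
  # The "moving target": a branch that changes over time.
  source = {
    url = "https://example.org/some/repo.git";
    ref = "master";
  };

  anchorFile = ./anchor.json;

  # Read the anchor if one has been recorded, otherwise use an empty set.
  anchor =
    if builtins.pathExists anchorFile
    then builtins.fromJSON (builtins.readFile anchorFile)
    else { };

  # Only the recorded revision is taken from the anchor.
  pin = if anchor ? rev then { inherit (anchor) rev; } else { };
in
# With a rev the fetch is tied to one commit; without it, it follows the branch.
builtins.fetchGit (source // pin)
```

In this sketch, rewriting anchor.json plays the role that nix-channel --update plays for channels: the policy (url and ref) stays put and only the resolved revision changes.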

A last example is just how magic path references in nix copy things into the store for you. We could retain the built-in syntax, but translate the syntax into implicit invocations of the same nondetDerivation primitive.

Is this too weird? I'm just trying to think of a principled way to track my nondeterminism, and possibly to unify the channel world into pure nix.

TBC: I'm not proposing adding more nondeterminism to the system. Just want to be able to track/unify the existing stuff better.

@copumpkin
Member Author

cc @edolstra @shlevy

@copumpkin
Member Author

To make things even weirder, hydra could use this for its job specification with nondeterministic calls to fetchgit and fetchsvn.

@copumpkin
Member Author

Does nobody have any comments? I can flesh out the idea more if it would help. I think it could be a pretty cool way to manage the (limited but often necessary) pieces of mutable state in a Nix-based system.

@vcunat
Member

vcunat commented May 1, 2015

To sum up, these derivations would:

  • need to have both impure access (mainly internet) and time-changing output, i.e. like fixed-output derivation without fixing the output hash;
  • be useful primarily to fetch latest version of foo from repository;
  • re-run their instantiation when they time-out (perhaps selectable as an attribute) or forced by some command-line flag.

Do I get this right?

Current status of code generators?

I'm certain there are already general tools that prefetch latest source and update hashes in *.nix files – currently I don't see a distinct advantage in having this built in. For example, @MarcWeber has these REGION AUTO UPDATE things IIRC, and there may be others. Putting the nondeterministic part into a separate tool seems easier to update exactly those things you want and let others locked down (shell-scripting your most common use cases).

@Ericson2314
Member

I talked about some similar things in my somewhat-recent fetchgitLocal PR: NixOS/nixpkgs#10176 (comment). I think the interplay between the two derivations (a trick that predates my PR to be fair) is like the "anchoring" you mention.

@Ericson2314 Ericson2314 mentioned this issue Nov 6, 2015
7 tasks
@Ericson2314
Member

From these issues with my new fetchgitlocal (NixOS/nixpkgs#10873), I am starting to think we need non-deterministic packages which run under the current user, to generalize things like putting private directories in the store.

@copumpkin
Member Author

I'll probably see if I can drum up some interest about this (and flesh out my proposal) at NixCon in Berlin. @Ericson2314, will you be there?

@Ericson2314
Member

That would be great! Unfortunately, school will keep me away from NixCon, but let me know how it goes.

@copumpkin
Member Author

copumpkin commented Apr 4, 2016

I've been tinkering with this recently, and might be able to put up a PR for a hypothetical implementation (subject to lots of implementation and design feedback) in the next week or so, if I get some time.

Edit: turned out to be more complicated than expected :(

@copumpkin
Member Author

Tagging #904 for posterity.

@shlevy
Member

shlevy commented Jan 12, 2017

@edolstra I'm considering working on this. Is there any chance I can get some assurance of a timely review and/or permission to merge myself before I put a large amount of work in?

@copumpkin
Member Author

I posted this in another ticket:

part of the reason I'm so interested in #520 is that I think that could be a cool model for channels as well as packages. The main properties I want out of a nondeterministic derivation are the ability to (somehow, programmatically) define how often I want it to update, and (most of the time) give myself a way of pinning to a particular version. Think of Ruby's Gemfile and Gemfile.lock distinction: Gemfile (on some level) defines an update policy (via bounds on package versions), and Gemfile.lock is an instantiation of that policy to exact versions that will be reproducible.

Think of what we want from channels:

    1. I want to point to e.g., github.com/nixos/nixpkgs-channels/tree/nixpkgs-unstable (basically an update policy; I want to update at most as often as the branch updates)
    2. The branch can be resolved to an exact hash for later reproducibility
    3. I want to know explicitly that somewhere in my (otherwise highly deterministic) Nix evaluation, a possibly nondeterministic "moving target" is involved, and be given the opportunity to lock it down to something that point 2 produces

I don't know of a great UI for this, but here's one not-so-great one that might inspire other ideas:

  • When you write a nondeterministic derivation, you generate a UUID and paste it into the expression source
  • Any evaluation of that nondeterministic derivation will get added to a top-level list of sources of nondeterminism in your expression, indexed by the associated UUID, and it's very clear when you evaluate an expression that your nondeterminism is included (so like when the top-level list of things to build and things to download from cache is printed, it could include a third category for these)
  • Any build of a nondeterministic derivation gets a sandbox that allows network access
  • The interface could (at first at least) basically be one that gives you a little "shim" to decide what to feed into a fixed-output derivation. That is, nondeterministic derivation = deterministic FO derivation + "decide (and record) which version to download". That would accommodate many common cases of git hashes and the like.
  • Nix maintains a central registry on your machine of current resolved UUIDs, and lets you request that a particular UUID be updated (this is the equivalent of nix-channel --update)

Then this mechanism can be used for channels, Hydra sources (don't have to make VCS into a first-class notion in Hydra anymore), packages that have sensible update semantics, and so on.

I realize this is still pretty sketchy and probably doesn't belong in this ticket, but I do think something in this direction would be a killer feature, allowing us to unify the deterministic Nix world with changing surroundings in a relatively painless manner.
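To make the "deterministic FO derivation + recorded decision" split a bit more concrete, here is a rough sketch in plain Nix. Nothing like this registry exists in Nix; the file registry.json, its format, and the UUID are all invented for illustration, and the "decide and record" half is assumed to be done by some external updater:

```nix
# Sketch only: the registry file, its format, and the UUID are hypothetical.
# registry.json is assumed to map UUIDs to resolved sources, e.g.
#   { "<uuid>": { "url": "https://...", "sha256": "..." } }
let
  registry = builtins.fromJSON (builtins.readFile ./registry.json);

  # Look up the currently recorded resolution for a moving target.
  resolve = uuid:
    registry.${uuid} or (throw "no resolution recorded for ${uuid}");

  # The deterministic half: an ordinary fetch with a known hash.
  fetchResolved = uuid:
    let r = resolve uuid;
    in builtins.fetchTarball { inherit (r) url sha256; };
in
fetchResolved "3f2b8c1e-0000-4000-8000-000000000000"
```

Updating a single entry in the registry would then correspond to requesting an update for one UUID, while everything that evaluates against an unchanged registry stays reproducible.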

edolstra added a commit that referenced this issue Feb 24, 2017
Impure derivations are derivations that can produce a different result
every time they're built. Example:

  stdenv.mkDerivation {
    name = "impure";
    __impure = true; # marks this derivation as impure
    buildCommand = "date > $out";
  };

Some important characteristics:

* Impure derivations are not "cached". Thus, running "nix-build" on
  the example above multiple times will cause a rebuild every time. In
  the future, we could implement some mechanism for reusing impure
  builds across invocations.

* The outputs of impure derivations are moved to a content-addressed
  location after the build (i.e., the resulting store path will
  correspond to the hash of the contents of the path). This way,
  multiple builds of the same impure derivation do not collide.

* Because of content-addressability, the output paths of an impure
  derivation recorded in its .drv file are "virtual" placeholders for
  the actual outputs which are not known in advance. This also means
  that "nix-store -q bla.drv" gives a meaningless path.

* Pure derivations are not allowed to depend on impure
  derivations. The only exception is fixed-output derivations. Because
  the latter always produce a known output, they can depend on impure
  shenanigans just fine. Also, repeatedly running "nix-build" on such
  a fixed-output derivation will *not* cause a rebuild of the impure
  dependency. After all, if the fixed output exists, its dependencies
  are no longer relevant. Thus, fixed-output derivations form an
  "impurity barrier" in the dependency graph.

* When sandboxing is enabled, impure derivations can access the
  network in the same way as fixed-output derivations. In relaxed
  sandboxing mode, they can access the local filesystem.

* Currently, the output of an impure derivation must have no
  references. This is because the content-addressing scheme must be
  extended to handle references, in particular self-references (as
  described in the ASE-2005 paper.)

* Currently, impure derivations can only have a single output. No real
  reason for this.

* "nix-build" on an impure derivation currently creates a result
  symlink to the incorrect, virtual output.

A motivating example is the problem of using "fetchurl" on a
dynamically generated tarball whose contents are deterministic, but
where the tarball does not have a canonical form. Previously, this
required "fetchurl" to do the unpacking in the same
derivation. (That's what "fetchzip" does.) But now we can say:

  tarball = stdenv.mkDerivation {
    __impure = true;
    name = "tarball";
    buildInputs = [ curl ];
    buildCommand =
      "curl --fail -Lk https://github.com/NixOS/patchelf/tarball/c1f89c077e44a495c62ed0dcfaeca21510df93ef > $out";
  };

  unpacked = stdenv.mkDerivation {
    name = "unpacked";
    outputHashAlgo = "sha256";
    outputHashMode = "recursive";
    outputHash = "1jl8n1n36w63wffkm56slcfa7vj9fxkv4ax0fr0mcfah55qj5l8s";
    buildCommand =
      "mkdir $out; tar xvf ${tarball} -C $out";
  };

I needed this because <nix/fetchurl.nix> does not support unpacking,
and adding untar/unzip functionality would be annoying (especially
since we can't just call "tar" or "unzip" in a sandbox).

#520
@chris-martin
Contributor

So Shea told me about fetchgit today and it seems rather upsetting. It seems convenient sometimes, but is there going to be a config option or CLI flag or something to turn determinism back on? When I run a build, how will I be able to tell whether it's a deterministic one or one with unpinned fetches?

@copumpkin
Member Author

Yeah, there's --pure as of a couple of days ago, I think. It should turn off all sources of impurity.

@shlevy
Member

shlevy commented Jan 25, 2018

Internally at Target we expose fetchGit through an interface that enforces specifying either a revision or a tag (we map tags to tags/${tag} in the ref and they're only trusted for internal repos our team controls)
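A wrapper along those lines could look roughly like the following. This is a guess at the shape of such an interface, not the actual internal code; the name fetchPinned and its argument names are made up here:

```nix
# Sketch only: fetchPinned and its arguments are hypothetical names.
let
  fetchPinned = { url, rev ? null, tag ? null }:
    # Refuse to fetch a moving target: a revision or a tag is required.
    assert rev != null || tag != null;
    builtins.fetchGit (
      { inherit url; }
      # Tags are mapped to tags/${tag} in the ref, as described above.
      // (if tag != null then { ref = "tags/${tag}"; } else { })
      // (if rev != null then { inherit rev; } else { })
    );
in
fetchPinned
```

Note that a tag on its own is only as stable as the repository's tagging discipline, which is presumably why tags are trusted only for repositories the team controls.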

@edolstra
Member

The motivation why fetchGit doesn't require a hash is that file system access doesn't require a hash either.
So evaluation was already impure at that level (you could edit a Nix expression and get a different result).

@Ericson2314
Member

@edolstra does --pure affect filesystem access? (E.g. only paths already in the store, etc.)

edolstra added a commit to edolstra/nix that referenced this issue Jan 31, 2019
FRidh pushed a commit to FRidh/nix that referenced this issue Feb 18, 2019
@deliciouslytyped

Is there any hope of seeing __impure merged into the main branch any time soon?

@Ericson2314
Member

@deliciouslytyped ca derivations make __impure a lot better, so we should wait for that.

@Ericson2314
Member

Ericson2314 commented Sep 29, 2020

ca derivations make __impure a lot better, so we should wait for that.

And now we have them! (#4087) So let's resurrect this. Should be quite easy, actually.

@Ericson2314
Member

Ericson2314 commented Sep 29, 2020

Looking at edolstra@690e06b, here are some notes:

  • We now have DerivationType which is specifically meant to make dealing with new sorts of derivations, like this, easier. The only hiccup is how to store the extra purity bool. I suppose I would be in favor of combining Derivation and ParsedDerivation if it helps. (That would mean enriching the in-memory Derivation while continuing the same tricks to not mess with the drv file and nix expr representations.)

  • Pure derivations actually can depend on impure derivations. We just need to be careful not to pollute any maps with anything that depends on the current impure drv -> output mapping. Incidentally, "Allow non-CA derivations to depend on CA ones" (#4056) faces similar issues (don't let prior resolutions leak to eval time) and surmounts them.

  • We can also do "pure fixed output derivations" for free. I think this is good. For example, fetchpatch can become two derivations:

    1. fetch impurely without output hash.
    2. Normalize purely with output hash.

So let's just wait for #4056 to land, and then we basically "do it again" for this!
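The two-derivation split for something like fetchpatch might be sketched as below. This is not how fetchpatch is implemented; the URL is a placeholder, the normalization step (dropping "index" lines) is a deliberately simple stand-in, and the hash is a dummy that would have to be replaced with the one reported by a first build. It assumes a Nix with the impure derivations feature enabled:

```nix
# Sketch only: the URL, names, and normalization step are illustrative.
{ pkgs ? import <nixpkgs> { } }:
let
  # Step 1: fetch impurely, without an output hash.
  rawPatch = pkgs.stdenv.mkDerivation {
    name = "raw.patch";
    __impure = true;
    nativeBuildInputs = [ pkgs.curl ];
    # Point curl at a CA bundle so that HTTPS verification can work.
    SSL_CERT_FILE = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";
    buildCommand = ''
      curl --fail -L https://example.org/some-change.patch > $out
    '';
  };
in
# Step 2: normalize without network access, with an output hash.
pkgs.stdenv.mkDerivation {
  name = "normalized.patch";
  outputHashAlgo = "sha256";
  outputHashMode = "flat";
  # Placeholder hash; replace with the value reported on the first build.
  outputHash = pkgs.lib.fakeSha256;
  buildCommand = ''
    # Drop "index" lines, whose abbreviated hashes can vary between downloads.
    grep -v '^index ' ${rawPatch} > $out
  '';
}
```

Because the second derivation has a fixed output, it acts as the impurity barrier: anything downstream only ever sees the normalized, hash-checked patch.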

CC @regnat

@stale

stale bot commented Mar 31, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Mar 31, 2021
@tomberek
Contributor

still interested

@stale stale bot removed the stale label Aug 19, 2021
@stale

stale bot commented Apr 17, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Apr 17, 2022
@MagicRB
Contributor

MagicRB commented May 12, 2022

Still interested

@tomberek
Contributor

Still interested

Does #6227 resolve your use-case?

@MagicRB
Contributor

MagicRB commented May 12, 2022

Wow that was a fast response 😆 and yes it does! I want to use them in Hydra actually. Thanks!

@Ericson2314
Member

Let's repurpose this to be a tracking issue for the now-merged unstable feature!

@stale stale bot removed the stale label May 12, 2022
@MagicRB
Contributor

MagicRB commented May 12, 2022

Let's! I'll start playing with impure drvs soon enough. If I hit any issues I'll report back here.

@Ericson2314
Member

I don't have perms to edit the issue or change its title, but impure-derivations is the name of the experimental feature added in the PR @tomberek linked.

@edolstra edolstra changed the title Nondeterministic derivations? Impure derivations May 13, 2022
@melvyn2

melvyn2 commented Jun 13, 2022

I'm not able to use impure derivations at all: nix-build: src/nix-build/nix-build.cc:594: void main_nix_build(int, char**): Assertion `maybeOutputPath' failed.
This happens with any derivation that has __impure = true, so is easily reproducible with the example in #6227

{ pkgs ? import <nixpkgs> {}, ... }:
pkgs.stdenv.mkDerivation {
  name = "impure";
  __impure = true; # marks this derivation as impure
  #outputHashAlgo = "sha256"; # optional, default is sha256
  #outputHashMode = "recursive"; # optional, default is recursive
  buildCommand = "date > $out";
}

@bryanhonof
Member

@melvyn2 I also get that error when running nix-build. But if I use the new cli, nix build --impure ..., it does seem to work?

/tmp/tmp.TMbuOx5fGy 
❯ cat default.nix 
{ pkgs ? import <nixpkgs> { } }:
pkgs.stdenv.mkDerivation {
  name = "impure";
  __impure = true;
  buildCommand = "date > $out";
}

/tmp/tmp.TMbuOx5fGy 
❯ nix build --impure --file default.nix

/tmp/tmp.TMbuOx5fGy 
❯ cat result 
Wed Aug  3 10:03:03 UTC 2022

/tmp/tmp.TMbuOx5fGy 
❯ nix-build 
this derivation will be built:
  /nix/store/2ylp1hynhl3902kjzii9ynvby9ljizwp-impure.drv
resolved derivation: '/nix/store/2ylp1hynhl3902kjzii9ynvby9ljizwp-impure.drv' -> '/nix/store/sm5kqqpsr9v7hk7hdxmhl4kxnd2mc3a6-impure.drv'...
building '/nix/store/sm5kqqpsr9v7hk7hdxmhl4kxnd2mc3a6-impure.drv'...
nix-build: src/nix-build/nix-build.cc:594: void main_nix_build(int, char**): Assertion `maybeOutputPath' failed.
Aborted (core dumped)

@fricklerhandwerk fricklerhandwerk added UX The way in which users interact with Nix. Higher level than UI. feature Feature request or proposal labels Sep 12, 2022
zolodev pushed a commit to zolodev/nix that referenced this issue Jan 1, 2024
@physics-enthusiast

Would these kinds of impure derivations be permitted in flakes pure eval mode?

@tomberek
Contributor

Would these kinds of impure derivations be permitted in flakes pure eval mode?

Yes, this would be safe because the outPath is not deterministic and the eval itself is not impure, only the build-phase.
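For illustration, a flake exposing such a derivation might look like the sketch below. The nixpkgs input, the system, and the attribute names are assumptions chosen for the example; evaluation uses no impure builtins, and only the build step runs `date`. As far as I know, building it still requires the impure-derivations experimental feature (and ca-derivations alongside it) to be enabled:

```nix
# Sketch only: input URL, system, and attribute names are illustrative.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in
    {
      packages.${system}.default = pkgs.stdenv.mkDerivation {
        name = "impure";
        __impure = true; # the build may differ each time; evaluation does not
        buildCommand = "date > $out";
      };
    };
}
```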
