Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce StoreReferences and ContentAddressWithReferences #3746

Merged
merged 59 commits into from
Apr 17, 2023

Conversation

Ericson2314
Copy link
Member

@Ericson2314 Ericson2314 commented Jun 25, 2020

Change itself

StoreReferences

This formalizes references where we don't necessarily know what we are called out, so self references are tracked with a bool not a StorePath in the set.

This is used with FixedOutputInfo (part of StorePathDescriptor) and internally the building logic.

ContentAddressWithReferences

This is useful to make any sort of content-addressed store path, whether it has references or not. It also makes initializing ValidPathInfo easier, by making sure the path and ca fields are in sync.

This is a sum type, not a pair of ContentAddress and References, because different types of content-addressing as different rules about what sorts of references are allowed.

Motivation

Constructing ValidPathInfos

The chief motivation today is constructing ValidPathInfo. Today, for content-addressed paths, we have to:

  1. Compute the path from some CA info.
  2. Set that path in the ValidPathInfo
  3. Set a slightly different CA in the ValidPathInfo
  4. Set references in the ValidPathInfo

It is easy for those steps where the same information must be provided again and again to drift out of sync.

With this change there is one data type for all content-addressed info (hashes and reference) that one can create with nice designated initializers (AKA explicit field names) and then use that to construct the ValidPathInfo all at once. This keeps everything in sync.

Unblocking future feature work

Deduplicating the addToStore methods

We currently have many method variations to add store objects to a store. In #4030 @roberth deduplicated this some in a new revision of the remote store protocol. With this + #3959, we can do something similar to the Store interface itself, cutting down on boilerplate.

Context

More philosophically, the idea we are inching towards is that ValidPathInfo and NarInfo are better as interfaces types (for talking to the SQLite DB and binary cache stores, respectively) than core data model types. For the CA world, StorePathDescriptor + Realisations is much better, as we are separating the trust free parts (what the data is) from the trustful parts (realisations, signatures etc.)

I think that even the older input-addressed code can be reformulated along these lines, but rest assured I would not attempt to relegate ValidPathInfo and NarInfo to the boundaries until I am confident I know how not make the input-addressed case worse.

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or bug fix: updated release notes

Further refinements to the content addressing data types are made in #3959, so it might be good to check that for reference.

@Ericson2314 Ericson2314 changed the title Super WIP: More datatype changes -- contains #3689 WIP: ValidPathInfo distinguishes references from self-references Jun 25, 2020
@Ericson2314 Ericson2314 changed the title WIP: ValidPathInfo distinguishes references from self-references ValidPathInfo distinguishes references from self-references Jun 29, 2020
@Ericson2314 Ericson2314 changed the title ValidPathInfo distinguishes references from self-references PathReferences and StorePathDescriptor Oct 7, 2020
@Ericson2314 Ericson2314 changed the title PathReferences and StorePathDescriptor Introduce PathReferences and StorePathDescriptor Oct 7, 2020
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this might be a little bit nicer representation in cases where you're special-casing the self reference, but we now have on the order of 40 call sites that are copying the references set, regardless of whether a self-reference is present. I've underestimated the weight of references before, so perhaps I'm a bit too wary. There's generally an order of magnitude more of those than store paths though.
If you could find more cases where the self-reference can/should be omitted when reading, that would help a lot. If this PR helps a lot with clarity it's also a win.

Found some surface level things, but this is not a proper review :)

Comment on lines 106 to 108
ValidPathInfo(const Store & store,
StorePathDescriptor && ca, Hash narHash);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd make more sense to me if this was a protected method on Store. I'm not 100% what the Valid in ValidPathInfo should mean, but regardless, I'd expect the bigger object of the two (Store) to manipulate smaller data object (ValidPathInfo), not the other way around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Store is here only used to get the store dir to compute a path from this info. "Valid" should probably just be dropped from the name, there are no special invariants here beyond what meets the eye (e.g. if something says "nar hash", it should be a hash of some nar, etc.).

#6236 would factor out a data type for "just" these store path functions, precisely so we can make clearer in the types for situations like this when some powerful IO might be done vs just a little more configure information is needed.

src/libstore/content-address.hh Outdated Show resolved Hide resolved
src/libstore/content-address.hh Outdated Show resolved Hide resolved
@Ericson2314
Copy link
Member Author

I suppose this might be a little bit nicer representation in cases where you're special-casing the self reference, but we now have on the order of 40 call sites that are copying the references set, regardless of whether a self-reference is present. I've underestimated the weight of references before, so perhaps I'm a bit too wary. There's generally an order of magnitude more of those than store paths though.

That makes sense. I'm happy to just std::shared_ptr everything.

@Ericson2314
Copy link
Member Author

@roberth upon further thought, I think all the referencesToSelf stuff is just used at interface boundaries where there is already some O(n) traversal, so this at last no worse than a constant factor slowdown. Does that sound right to you?

Also (as I've amended the PR description to say), consider checking out #3959 where further change the content address types. If those changes look good to you, I might back-port them to this PR so that one can just be about the derivations.

I'm happy to make the constructor change and rename fields, but I think I want to confirm with @edolstra before I spend the time making those sorts of final touches.

@roberth
Copy link
Member

roberth commented Oct 13, 2020

... constant factor slowdown. Does that sound right to you?

It seems to improve code quality, so I don't think it has to be an issue. It's not I/O so I think we have some constant factor budget to spend. I'll assume most copies are short-lived as well, so memory footprint shouldn't be an issue, and if I assume wrongly, they were already copied before this PR, because that's the only way to retain them.

@Ericson2314
Copy link
Member Author

The thing I'm not clear on, however, is the point of the change. It's adding a lot of accidental complexity, and I don't see the advantages. Care to elaborate a bit on that?

Sure.

First of all, does the part about trying to initialize a ValidPathInfo in one go make sense? That is the most "self-contained" motivation.

As to the other part, a problem I see today is that the fixed output hashing and text hashing uses very separate code paths despite being basically two instances of the same idea:

  • separate functions to construct store paths
  • separate functions to add to store

Meanwhile, we a good bit of duplicated code in make-content-addressed.cc and local-derivation-goal.cc, as you observed two years ago.

Finally, my RFCs, without further cleanups, would make this problem worse:

  • RFC 92 means the local-derivation-goal.cc stuff must also support text hashing.
  • RFC 133 adds git hashing, which means instead of 2 similar things we now have 3, and even more duplication if we stay the current course.

This RFC doesn't fix these latter problems by itself --- it is just added more data structures. But i think in doing so it sets us up to.


The simplest thing to do might be just looking at #3959 to see the extant instance of this stuff being put to use.

Also improve content-address.hh API docs.
We weren't because this ancient PR predated it!

This is actually a new version of the pattern which addresses some
issues identified in NixOS#7479.
Co-authored-by: Robert Hensing <[email protected]>
@github-actions github-actions bot added fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command store Issues and pull requests concerning the Nix store labels Apr 17, 2023
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

.others = std::move(references),
.self = false,
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-existing tech debt.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independently glad that you make these code improvements and that we're making a good step towards dynamic derivations, which is crucial for the sustainability of Nixpkgs.

@roberth roberth enabled auto-merge April 17, 2023 13:31
@roberth roberth merged commit e641de0 into NixOS:master Apr 17, 2023
@Ericson2314 Ericson2314 deleted the path-info branch April 17, 2023 14:00
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-04-17-nix-team-meeting-minutes-49/27379/1

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/rfc-92-status-update/27441/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/rfc-92-status-update/27441/1

@Ericson2314 Ericson2314 mentioned this pull request Jul 5, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor-experience Developer experience for Nix contributors fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command RFC Related to an accepted RFC store Issues and pull requests concerning the Nix store
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants