-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce StoreReferences
and ContentAddressWithReferences
#3746
Conversation
48c4cd0
to
d785c1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this might be a little bit nicer representation in cases where you're special-casing the self reference, but we now have on the order of 40 call sites that are copying the references set, regardless of whether a self-reference is present. I've underestimated the weight of references before, so perhaps I'm a bit too wary. There's generally an order of magnitude more of those than store paths though.
If you could find more cases where the self-reference can/should be omitted when reading, that would help a lot. If this PR helps a lot with clarity it's also a win.
Found some surface level things, but this is not a proper review :)
src/libstore/path-info.hh
Outdated
ValidPathInfo(const Store & store, | ||
StorePathDescriptor && ca, Hash narHash); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd make more sense to me if this was a protected method on Store
. I'm not 100% what the Valid in ValidPathInfo
should mean, but regardless, I'd expect the bigger object of the two (Store
) to manipulate smaller data object (ValidPathInfo
), not the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Store
is here only used to get the store dir to compute a path from this info. "Valid" should probably just be dropped from the name, there are no special invariants here beyond what meets the eye (e.g. if something says "nar hash", it should be a hash of some nar, etc.).
#6236 would factor out a data type for "just" these store path functions, precisely so we can make clearer in the types for situations like this when some powerful IO might be done vs just a little more configure information is needed.
That makes sense. I'm happy to just |
120c67b
to
39c11c5
Compare
@roberth upon further thought, I think all the Also (as I've amended the PR description to say), consider checking out #3959 where further change the content address types. If those changes look good to you, I might back-port them to this PR so that one can just be about the derivations. I'm happy to make the constructor change and rename fields, but I think I want to confirm with @edolstra before I spend the time making those sorts of final touches. |
It seems to improve code quality, so I don't think it has to be an issue. It's not I/O so I think we have some constant factor budget to spend. I'll assume most copies are short-lived as well, so memory footprint shouldn't be an issue, and if I assume wrongly, they were already copied before this PR, because that's the only way to retain them. |
Sure. First of all, does the part about trying to initialize a As to the other part, a problem I see today is that the fixed output hashing and text hashing uses very separate code paths despite being basically two instances of the same idea:
Meanwhile, we a good bit of duplicated code in Finally, my RFCs, without further cleanups, would make this problem worse:
This RFC doesn't fix these latter problems by itself --- it is just added more data structures. But i think in doing so it sets us up to.
The simplest thing to do might be just looking at #3959 to see the extant instance of this stuff being put to use. |
Also improve content-address.hh API docs.
We weren't because this ancient PR predated it! This is actually a new version of the pattern which addresses some issues identified in NixOS#7479.
Co-authored-by: Robert Hensing <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
.others = std::move(references), | ||
.self = false, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pre-existing tech debt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independently glad that you make these code improvements and that we're making a good step towards dynamic derivations, which is crucial for the sustainability of Nixpkgs.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-04-17-nix-team-meeting-minutes-49/27379/1 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
Change itself
StoreReferences
This formalizes references where we don't necessarily know what we are called out, so self references are tracked with a
bool
not aStorePath
in the set.This is used with
FixedOutputInfo
(part ofStorePathDescriptor
) and internally the building logic.ContentAddressWithReferences
This is useful to make any sort of content-addressed store path, whether it has references or not. It also makes initializing
ValidPathInfo
easier, by making sure thepath
andca
fields are in sync.This is a sum type, not a pair of
ContentAddress
andReferences
, because different types of content-addressing as different rules about what sorts of references are allowed.Motivation
Constructing
ValidPathInfo
sThe chief motivation today is constructing
ValidPathInfo
. Today, for content-addressed paths, we have to:ValidPathInfo
ValidPathInfo
ValidPathInfo
It is easy for those steps where the same information must be provided again and again to drift out of sync.
With this change there is one data type for all content-addressed info (hashes and reference) that one can create with nice designated initializers (AKA explicit field names) and then use that to construct the
ValidPathInfo
all at once. This keeps everything in sync.Unblocking future feature work
Derivations can output "text-hashed" data #3959 leverages this to support text-hashed derivations. This is a key part of accepted RFC 92 Tracking issue for RFC 92: Dynamic derivations #6316.
Use
StorePathDescriptor
to fetching from CA-only stores #3754 leverages this to allow fetching from stores which only support content addressing. This was done as part of the IPFS / git hashing work, and something that we would hopefully return to with [RFC 0133] Git hashing and Git-hashing-based remote stores rfcs#133.Deduplicating the
addToStore
methodsWe currently have many method variations to add store objects to a store. In #4030 @roberth deduplicated this some in a new revision of the remote store protocol. With this + #3959, we can do something similar to the
Store
interface itself, cutting down on boilerplate.Context
More philosophically, the idea we are inching towards is that
ValidPathInfo
andNarInfo
are better as interfaces types (for talking to the SQLite DB and binary cache stores, respectively) than core data model types. For the CA world,StorePathDescriptor
+Realisations
is much better, as we are separating the trust free parts (what the data is) from the trustful parts (realisations, signatures etc.)I think that even the older input-addressed code can be reformulated along these lines, but rest assured I would not attempt to relegate
ValidPathInfo
andNarInfo
to the boundaries until I am confident I know how not make the input-addressed case worse.Checklist for maintainers
Maintainers: tick if completed or explain if not relevant
tests/**.sh
src/*/tests
Further refinements to the content addressing data types are made in #3959, so it might be good to check that for reference.