Foundations: more focused description of linking. #153
I've written a new introduction of linking for that heading which focuses in on the parts we're sure about (and their upsides).

Some things seem more like wishlists or are unratified (and that in some cases because the complexity of implications hasn't been sufficiently explored), and I removed much of that content. (Likely a good place to explore these things further would be by making new files in the "exploration reports" design directories.)

Some of the wishlist content has also become fairly solved rather than being todos since the earlier rounds of this document's life (namely, some of the wishes for "slicing" can now actually be done using Selectors! which is fairly awesome), so that can also be dropped.

This addresses several earlier reviews:
- #146 (comment)
- #146 (comment)
- #146 (comment)
@@ -13,15 +13,12 @@ ensure the success of future improvements (especially type systems).

## Linked

IPLD must support linking to any IPLD node (even if the node is in the middle of
a block). That is, IPLD must support arbitrary IPLD paths in links.
The IPLD Data Model includes Links. A Link can be resolved to reach another IPLD Node.
"another" seems misplaced here. Will "an" suffice?
That reads weirder to me. It's definitely not reaching the same node we started on.
being able to *store* a pointer to a struct *inside* of another struct would be
severely limiting.

NOTE: We don't currently support arbitrary paths but, in the context of
I don't understand this, "arbitrary paths". By removing it are we missing something important?
Assuming that it's correct to parse that sentence as a continuation of the thought above about links which aim directly to mid-block nodes, my refutation is here: #146 (comment) -- in short, that we can't actually act as if this is fully and clearly defined; it isn't.
/ipld/QmFoobar/path/to/thing
is an arbitrary IPLD path. The requirement is to be able to use any arbitrary path as a link in an IPLD object.
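To make the CID + Path idea concrete, here is a minimal sketch of splitting such a path into the CID to load first and the segments to walk afterwards. The names `PathLink`, `parse_ipld_path`, and `walk` are hypothetical and are not taken from any IPLD library, the root is treated as an opaque string rather than a validated CID, and the walk only covers a single, already-decoded node held in plain Python containers.

```python
from typing import Any, List, NamedTuple


class PathLink(NamedTuple):
    """Hypothetical CID + Path link: a root identifier plus path segments."""
    root: str            # opaque CID string; not validated in this sketch
    segments: List[str]  # segments to walk after loading the root


def parse_ipld_path(path: str) -> PathLink:
    """Split "/ipld/<cid>/<seg>/..." into its root and path segments."""
    parts = path.split("/")
    if len(parts) < 3 or parts[0] != "" or parts[1] != "ipld" or not parts[2]:
        raise ValueError(f"not an /ipld/ path: {path!r}")
    # Drop empty segments produced by trailing or doubled slashes.
    return PathLink(root=parts[2], segments=[p for p in parts[3:] if p])


def walk(node: Any, segments: List[str]) -> Any:
    """Follow segments through decoded maps and lists inside one node."""
    for seg in segments:
        node = node[int(seg)] if isinstance(node, list) else node[seg]
    return node


link = parse_ipld_path("/ipld/QmFoobar/path/to/thing")
decoded_root = {"path": {"to": {"thing": 42}}}  # stand-in for the decoded block
target = walk(decoded_root, link.segments)
```

The open question in this thread is not the parsing, which is the easy part, but whether a value like `PathLink` should be storable wherever a plain CID link is.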
So this is the broader topic of whether a Link is a CID or a CID + Path. I think we agree that this needs further discussion. Could we perhaps re-add the TODO section of this document and put in that issue? I really don't want to have the overall document blocked on that discussion, which I think is a pretty big one.
I'm slowly becoming more convinced this is at least viable by further discussion over in #146 (comment) -- specifically, someone suggesting that implementing updates to structures by copying data from a CID+Path link to a new block would be considered acceptable -- but agree that this still needs a lot more discussion -- for example, how many people would consider that copy-to-a-new-block behavior wise or unsurprising? If that implication, and any further transitive implications it has, are surprising enough to enough people, is there a point where that means it's a feature we actually don't want to pursue after all?
This is also discussed in #145, and I'm singing the same tune there; it's really problematic to say "we're definitely doing X" for any value of "X" that isn't fully explored nor even partially prototyped. A commitment to "linking" in general is one thing and we all agree on it; a commitment to these particular features of linking seems... well, maybe.
I'd be supportive of adding back in a TODO so it's clear from within this document that we have work to do to resolve this. I'd also be receptive to putting text about mid-block linking back here in the top if we had a really good set of texts to describe what that implies, but I don't know if that's something that's easy to do in this PR; I'd rather do this narrowing first, and then have an expansion back to CID+Path/midblock-targets be proposed afterward.

## Non-Cyclic, Block-Local Relative Links

That is, relative links that don't traverse out the back of an object. See the
I think if you're going to remove this then maybe you need to re-introduce "block-local" and "relative" into the above section about Links since it's not explicitly covered but seems to be important from the perspective of the original authors of this doc. See that it's noted again at the very bottom of the section and #1 is dedicated to it.
I disagree holistically and completely that #1 is a good idea. This is one of the things that "seem more like wishlists or are unratified", and I propose removing for that reason.
It's certainly not something that we do now, it's not something we plan on doing soon (or ever) to my knowledge, and it got a lot of pushback in that issue itself from the very beginning. It's not even clear that it's philosophically directly compatible with our aims: it makes things not a DAG (or at least it pushes determining the non-cyclicness to a graph computation problem itself, rather than a given!), and we spend an awful lot of time talking about the joy of DAGs (and implicitly, how joyful they are when you don't have to do a graph algo first to prove you've got one!).
In short, I agree completely with Juan's comments on #1 which include "I don't like that."
The only thing I particularly like in that issue is this comment: #1 (comment) -- and specifically: "There are ways to represent cycles (on a data structure level) on an application level"... which is what I'd suggest we put in some sort of FAQ. IPLD built-in link loading does not do relative links, because that breaks the model. Applications can certainly invent a relative link concept and we'll try to help, or at the very least not be obstructive. Fin.
Without this feature, one can't form a link without breaking nodes into new blocks.
> In short, I agree completely with Juan's comments on #1 which include "I don't like that."

You're taking that out of context. The full comment is:

> On one hand, i think this violates the idea of the DAG and will be confusing to people. It's ultimately a dag, but one typical property of DAGs is that when you traverse down an edge, you won't find parts you've already traversed. This effectively turns the DAG into a DG with the possibility of traversal cycles. I don't like that.

All the push-back is around enabling cycles or links that cross block boundaries. This section clearly forbids:
- Cross-block relative links as they'd be impossible to validate properly.
- Cycles within a block.
- I don't want to implement, document, or try to explain to people that we have "links" and then we have, separately, "relative, but not cross-block" links
  - because it's complex and a mouthful
  - because I don't think there exist any applications that want this but couldn't better do it with application-level semantics of their own anyway
  - and because it goes against the general grain of the desire to make blocks a transparent detail as much as possible.
- I don't think saying "cycles within a block are forbidden" is the end of this.
  - As I commented above: declaring that that's a requirement now means that we'd have to implement a graph exploration that validates the non-existence of cycles on any content that has relative links. That's a nontrivial add to the demands of the system in terms of implementation work that needs doing, and in terms of features that need detailed documentation, and even in terms of runtime costs. I don't think we should do this without a serious set of user stories saying we really need to, and I don't think issue text so far, even if it has number "1", has done that to a sufficient degree for how much of an ask this is.
This all comes down to tradeoffs and we need to discuss it in terms of the tradeoff. In this case, the tradeoff is forcing arbitrary block boundaries or not.
- With this requirement (and the "arbitrary IPLD path" requirement), I can always collapse an entire IPLD graph into a single block.
- Without this requirement, I can't do that. There are cases where I need multiple blocks.

On the other hand, there is a wart here: forbidding non-local relative links forbids the following operation:

      A
     / \
    v   v
    B-->C

to

    A + C    // block 1
     \  ^
      v /
       B     // block 2

Without that rule, block boundaries would be truly arbitrary. An argument here would be that: If we can't make block boundaries truly arbitrary, why bother? My counter argument would be the example I gave at the top: being able to collapse an entire subdag is still useful (directory tree to a single block).
WRT validation,
The only validation required would be within a single block. The validation is literally: walk the graph and see if you hit a cycle, remembering all nodes you've touched. I agree that validation sucks but we'll have to validate data when importing it anyways and, unless I'm missing something, this will pretty much always involve walking the graph.
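The single-block validation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not an existing IPLD API: the block's local links are modelled as a plain mapping from node name to the names it points at, and `has_local_cycle` and `LocalLinks` are hypothetical names. One detail matters: remembering every node ever touched is not quite enough, because the diamond shape from the diagram (A to B, A to C, B to C) reaches C twice without any cycle. The sketch therefore tracks which nodes are on the current traversal path.

```python
from typing import Dict, Hashable, Iterable

# Hypothetical model: keys are node identifiers inside ONE block; values are
# the nodes each one points at via block-local relative links.
LocalLinks = Dict[Hashable, Iterable[Hashable]]


def has_local_cycle(links: LocalLinks) -> bool:
    """Return True if following block-local links can loop back on itself."""
    IN_PROGRESS, DONE = 1, 2
    state: Dict[Hashable, int] = {}

    def visit(node: Hashable) -> bool:
        state[node] = IN_PROGRESS            # node is on the current path
        for target in links.get(node, ()):
            seen = state.get(target)
            if seen == IN_PROGRESS:          # back-edge: we found a cycle
                return True
            if seen is None and visit(target):
                return True
        state[node] = DONE                   # fully explored, known cycle-free
        return False

    return any(state.get(node) is None and visit(node) for node in list(links))


diamond = {"A": ["B", "C"], "B": ["C"], "C": []}   # shared target, no cycle
looped = {"A": ["B"], "B": ["C"], "C": ["A"]}      # C points back at A
```

Each node and link is visited once, so the cost is linear in the size of the block, which is the same order as the import-time walk mentioned above. The recursion is fine for a sketch; a real validator would likely use an explicit stack for deeply nested blocks.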
I'm in favour of not having inner Block relative links without cycles in the foundational document. I would put the information that we have about it (this discussion and #1) nicely written up into a design history document. From there we can take it out when we see the need to have it. I currently don't see a fundamental need for it.
^ This. I don't want to say 'no' eternally and with finality, but I think we should carry this through a process that recruits a lot more detail and design exploration before we consider it a 'yes' with finality either.
Not quite. The point of "slices" was to be able to link to a subslice of some array as if it were a node itself.
To clear up any confusion, the requirements doc was about what IPLD must do (from my perspective), not what it currently does. We should argue from a standpoint of what IPLD must and must not do to be complete, not what we know or don't know how to do.
There are quite a few requirements here that are being turned into a general "principle". They need to remain explicit requirements. We can discuss them and/or change them but we need to argue about their merits directly.
NOTE: We don't currently support arbitrary paths but, in the context of
programming, we really need to.
**Motivation:** Linking makes it possible to build data structures which are
theoretically unbounded in size, while still being traversable, consistent,
Unbounded? Not without cycles.
How would you rather phrase this?
I thought we typically referred to the size of a merkle-dag as being "unbounded". The size of a blockchain certainly isn't often called "bounded"!
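For what "unbounded" means in the blockchain sense, a toy hash-linked chain may help. This is only a sketch: the in-memory `store`, the helper names `put`, `get`, `append`, and `walk_chain`, and the use of a hex SHA-256 digest of canonical JSON in place of a real CID are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Dict, Iterator, Optional

store: Dict[str, bytes] = {}  # toy content-addressed block store


def put(node: Dict[str, Any]) -> str:
    """Serialize a node, store it under its hash, and return that hash."""
    data = json.dumps(node, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(data).hexdigest()  # stand-in for a CID
    store[key] = data
    return key


def get(key: str) -> Dict[str, Any]:
    """Resolve a link (here, a hash string) back to its node."""
    return json.loads(store[key])


def append(head: Optional[str], value: Any) -> str:
    """Create a new head node that links to the previous head."""
    return put({"value": value, "prev": head})


def walk_chain(head: Optional[str]) -> Iterator[Any]:
    """Traverse from the newest node back to the first one."""
    while head is not None:
        node = get(head)
        yield node["value"]
        head = node["prev"]


head: Optional[str] = None
for i in range(3):
    head = append(head, i)
values = list(walk_chain(head))  # newest first
```

Any particular head reaches a finite set of nodes, but nothing caps how many times `append` can be called, and earlier blocks are never rewritten when the chain grows.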
In some cases, I think the problem is that we lack alignment about what “in IPLD” means. I think that for @warpfork this means, effectively, what we can do today with the layers of abstraction we’ve already solidified (Data Model, Block Layer, CIDs, etc.). But the vision for the project and what it will eventually provide is much larger.

Links with paths are a good illustration of this. These are not part of the Data Model, but we’ve implemented support for this kind of traversal and interpretation of paths in selectors, several path libraries, and in the experimental Composites library. It’s something the project will eventually provide in some form, I think we’re all agreed on that, but it’ll be provided in libraries and layers we have not solidified yet.

The challenge with this document is whether we consider it descriptive of solidified foundations and principles or if we consider it aspirational. The lines between these have traditionally been very blurry and something we’ve been trying to make clearer with the numerous reorganizations of the specs and the spec staging process.

My preference would be to remove things we haven’t delivered on and detail them more thoroughly in another prescriptive document we can work towards, in order to make this document a more consistent representation of how we operate and what we have delivered on. However, my preference isn’t strongly held and I’ll defer to people that have been putting more work into this document than I have.
+1 to this. There's a reason I care about this, and a reason I titled this PR to be about "focus". IPLD is, in general, a scope monster. It's incredibly easy to say more things are "in scope"; and if we do this too freely, we will fail to ship things. The path to making a product involves a very small number of incidents of saying "yes", and a much larger number of incidents of saying "no", and we need to come to terms with that. Therefore, holistically, I'd like to:
If we're not sure something doesn't violate the design, or we're not sure if it results in substantial complications, we should err on the side of caution (and then work out how we're going to reduce that uncertainty).

If we have aspirations, we can talk about those, but then the appropriate way to engage with those is by noting them as such, and exploring the motivations first, and ideally exploring more than one idea for how to satisfy the desires.

This set of debates we're having about whether or not links include CID+Path is remarkably far in the weeds if this is a document about principles, scope, and aspirations. (Compare it to the level of holisticness that the "no-nonlocal reasoning" heading is on! I like that one!) And if it's instead a document about things that are extremely detailed commitments, then I think it's clear that several people feel this detail hasn't reached a level of definition that's comfortable to us.

One way to address this is to take these ideas through some more work in the form of exploration reports, etc. Another way to address this is to find a short, not-overly-prescriptive way to describe the principles we consider important here, without picking the particular battle in the details that we seem to not be fully equipped and focused on resolving right now anyway. A way not to address this is to try to say it's absolutely a requirement while focusing in on details that haven't been explored very well, and therefore are hard to even say one way or the other whether they're then widely agreed upon.

It's the focus on the details, and on getting people to commit to specific details that aren't well-explored yet, that's generating feelings of concern here. Generating and documenting user stories can help with this. Exploration reports can help with this. We have lots and lots and lots of ideas in our design toolkit for how to improve and iterate on these things to make discourse about them more productive.
I think we all agree that there's work to be done here. We just need to resolve how we want to use them, who's going to put in that effort and when, and how much we're willing to carry commitments before that's done.
I'm not sure how to proceed with this. To try to unblock this, I could close this PR, but I think we still certainly have unresolved issues here, so then the hope would be it will be easier to follow this up in others that break it down a bit more? Would that help?

I originally thought it would make sense to tackle this set of subjects under one heading ("focus"), but it seems to have turned out problematic to tackle the CID+Path-as-link subject, the relative links (asterisk, asterisk, asterisk) subject, and the handful of other details all at the same time. While breaking this down even further (remember, this PR was already one of... 4? 5? breaking these topics down already) will make more work, and probably be difficult to sequence in such a way that it doesn't make merge conflicts due to the textual proximity of the topics... perhaps it's necessary.

If doing another take on this, I could also try to introduce exploration reports and design documents in the same diff, making it more of a "move" than a "remove". This might not be ideal -- it would be better if this work came from someone who's an advocate of this; it will be tricky for a non-advocate of the ideas to make a good case (and there's no small irony if I end up having to do this, since the majority of the reason I'm a non-advocate in the first place is that I'm not convinced these subjects are a good use of attention right now) -- but if it leads to progress in this, I can give it a shot.
This extracts some content from #148, and addresses several earlier reviews: