Iterate on requirements/cornerstones #148
@@ -1,27 +1,24 @@
# IPLD Requirements
# IPLD Cornerstones

This document outlines parts of IPLD that should and should not be changed to
ensure the success of future improvements (especially type systems).
ensure the success of future improvements and the continuity of direction.

**Definitions**

* Block: A block is a chunk of an IPLD DAG, encoded in a format. Blocks have CIDs.
* Fragment: A piece of an IPLD DAG. Blocks contain fragments.
* Node: A node is a *point* in an IPLD DAG (object, array, number, etc.).
* Link: A link is an IPLD Node that points to another IPLD Node.
* Path: A path is a human-readable pointer to an IPLD Node.
Many nodes can exist encoded inside one Block.
* Link: A link is a kind of IPLD Node that points to another IPLD Node.
* Path: A path is composed of segments which each specify a step across an IPLD Node.
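To make the Path definition more concrete, here is a minimal sketch that models a Node tree with plain JavaScript values (objects as maps, arrays as lists, primitives as scalars) and walks a Path one segment at a time. Each segment describes a single stride: looking up one map key, or stepping into a list by index. The helper names `parsePath`, `step`, and `walkPath` are hypothetical and not part of any IPLD library.

```javascript
// Hypothetical helpers: a Node tree is modelled with plain JS values
// (objects for maps, arrays for lists, primitives for scalars).

// Split a path string like "users/1/name" into segments; empty segments are dropped.
function parsePath(path) {
  return path.split("/").filter((segment) => segment.length > 0);
}

// Take a single step from `node` according to one path segment.
function step(node, segment) {
  if (Array.isArray(node)) {
    // Lists are indexed by non-negative integers.
    if (!/^\d+$/.test(segment)) {
      throw new Error(`list index expected, got "${segment}"`);
    }
    const index = Number(segment);
    if (index >= node.length) {
      throw new Error(`index ${index} out of range`);
    }
    return node[index];
  }
  if (node !== null && typeof node === "object") {
    // Maps are keyed by strings.
    if (!Object.prototype.hasOwnProperty.call(node, segment)) {
      throw new Error(`no such key "${segment}"`);
    }
    return node[segment];
  }
  // Scalars (strings, numbers, booleans, null) have no children to step into.
  throw new Error(`cannot step into a scalar with "${segment}"`);
}

// A Path is a sequence of segments; walking it is repeated stepping.
function walkPath(root, path) {
  return parsePath(path).reduce(step, root);
}

const root = { users: [{ name: "alice" }, { name: "bob" }] };
console.log(walkPath(root, "users/1/name")); // prints bob
```

An empty path walks zero segments and returns the root Node itself.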

## Linked

IPLD must support linking to any IPLD node (even if the node is in the middle of
a block). That is, IPLD must support arbitrary IPLD paths in links.
The IPLD Data Model includes Links. A Link can be resolved to reach another IPLD Node.

**Motivation:** Considering this in the context of programming languages, not
being able to *store* a pointer to a struct *inside* of another struct would be
severely limiting.

NOTE: We don't currently support arbitrary paths but, in the context of
programming, we really need to.
**Motivation:** Linking makes it possible to build data structures which are
theoretically unbounded in size, while still being traversable, consistent,
authenticated and immutable. This unlocks the potential for a host of
decentralized applications and is part of IPLD's fundamental purpose.
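A minimal sketch of what "a Link can be resolved to reach another IPLD Node" means in a content-addressed setting: a Node is serialized, stored under a key derived only from its content, and a Link to it is just that key. Several assumptions apply. The key is a toy FNV-1a hash of a JSON serialization, standing in for a real multihash-based CID. Links borrow DAG-JSON's `{"/": ...}` shape. `JSON.stringify` is not a canonical encoding (key order follows insertion order). `BlockStore` and `toyHash` are hypothetical names.

```javascript
// Toy content-addressed block store. Keys are hypothetical stand-ins for CIDs:
// an FNV-1a hash of the serialized node, NOT a real multihash-based CID.
function toyHash(text) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV-1a 32-bit prime
  }
  return hash.toString(16).padStart(8, "0");
}

class BlockStore {
  constructor() {
    this.blocks = new Map(); // key -> serialized block
  }

  // Serialize a node, store it, and return a DAG-JSON-style link to it.
  put(node) {
    const serialized = JSON.stringify(node);
    const key = "toy-" + toyHash(serialized);
    this.blocks.set(key, serialized);
    return { "/": key };
  }

  // Resolve a link by looking up its key and decoding the stored block.
  resolve(link) {
    const serialized = this.blocks.get(link["/"]);
    if (serialized === undefined) throw new Error(`block ${link["/"]} not found`);
    return JSON.parse(serialized);
  }
}

const store = new BlockStore();
const leaf = store.put({ greeting: "hello" });
const root = store.put({ child: leaf }); // a Link stored inside another Node
const rootNode = store.resolve(root);
console.log(store.resolve(rootNode.child).greeting); // prints hello
```

Because the key depends only on content, storing the same Node twice yields the same Link. That property is what lets linked structures stay immutable and authenticated.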

## Immutable

@@ -31,54 +28,89 @@ IPLD but there needs to be an immutable layer at the bottom.
**Motivation:** *Having* an immutable layer is important for a lot of analysis,
memoization, type checking, etc.

## Multicodecs Are Not Types
## Multicodecs Are Not Meant to Act As Types

Review comment: This is great. Probably the most concise text I’ve read so far on “why you don’t need to write a new codec.” Recently, I’ve run into some areas where I legitimately did need a new block format and codec for an application specific use case. Until I read this I didn’t have great language to distinguish why this case was different (it wasn’t about types, the motivation was compactness and limiting dependencies).

Review comment: Heh... Yeah, we could almost make a "## Multicodec nonproliferation treaty" and put this motivation hunk under that. "Multicodecs are not meant to act as types" could just have a "see above" for its motivation.

Multicodecs are used to indicate the format of data in a Block, and thus the
codec which transforms that serial data into a tree of Nodes conforming to the
IPLD Data Model. This is the limit of their purpose.

In particular, multicodecs should not be confused with a
[type system](https://en.wikipedia.org/wiki/Type_system).

It's impossible to understand IPLD data at a *structural* level if we don't know
the format. Therefore, we should avoid introducing new formats unnecessarily as
*every* IPLD implementation needs to support these new formats.
**Motivation:** It's impossible to understand IPLD data at a *structural* level
if we don't know the format. Therefore, multicodecs describe the format, and
we use this information to handle the transformation into the IPLD Data Model.
Beyond this, we don't want to use multicodecs further, because we should avoid
introducing new formats unnecessarily: *every* IPLD implementation needs to
support these new formats, and this is a burden it's preferable to minimize.
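A minimal sketch of the limited role described above: the multicodec code only selects the decoder that turns a Block's bytes into Data Model Nodes. The codes `0x55` (`raw`) and `0x0129` (`dag-json`) come from the multicodec table. The decoders are simplified stand-ins: the `dag-json` one is plain `JSON.parse` and ignores DAG-JSON's special handling of links and bytes. `decodeBlock` is a hypothetical name.

```javascript
const RAW = 0x55; // multicodec code for raw bytes
const DAG_JSON = 0x0129; // multicodec code for dag-json

// Map from multicodec code to a function: bytes -> Data Model Node.
const decoders = new Map([
  // raw: the whole block is a single bytes Node.
  [RAW, (bytes) => bytes],
  // dag-json (simplified): UTF-8 text parsed as JSON.
  [DAG_JSON, (bytes) => JSON.parse(new TextDecoder().decode(bytes))],
]);

// The codec picks the decoder and nothing else: no type or schema lookup.
function decodeBlock(codec, bytes) {
  const decode = decoders.get(codec);
  if (decode === undefined) {
    throw new Error(`unsupported codec 0x${codec.toString(16)}`);
  }
  return decode(bytes);
}

const bytes = new TextEncoder().encode('{"name":"alice","age":30}');
const node = decodeBlock(DAG_JSON, bytes);
console.log(node.name); // prints alice
```

Nothing in the codec says the decoded map is a "person". That kind of meaning belongs to higher layers such as Schemas, which is why a new kind of data rarely needs a new multicodec.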

## No Non-Local Reasoning

An IPLD block should never be interpreted in the context of *anything* not
contained in the block (and CID).
Transforming the content of a Block into the IPLD Data Model should never require
interpretation in the context of *anything* not contained in the Block plus CID.

For example, assuming we add support for relative links, the following
definition of `foo` would not be a valid IPLD block:
Similarly, traversing an IPLD Node according to a Path should not require
interpretation in the context of anything not already contained in that Node plus Path.

**Motivation:** IPLD needs to be easy to reason about.

**Negative Examples:**

```javascript
// This is an example of what is NOT possible.
var foo = {
  // points outside of the current block, into the parent's "baz" field.
  "baz": {"/": "../../baz"}
  "baz": Link("../../zot") // NOT legal: makes a non-local reference.
}
var bar = {
  "foo": CidOf(foo),
  // `/foo/baz` points here.
  "baz": "something"
  "zot": "something" // `./foo/baz` imagines pointing here.
}

// resolution through block `foo` depends on block `bar`.
// resolution through block `foo` depends on block `bar`...
Resolve("/ipld/${CidOf(bar)}/foo/baz/")

// meaning this would be undefined, which is why relative links are NOT allowed:
Resolve("/ipld/${CidOf(foo)}/baz/")
```

For the same reason, IPLD links can't rely on an authority (e.g., a blockchain).

Note: Links like this can still be encoded at the application level, but they
won't be handled by the IPLD resolver (and won't get the special "link" type).
**Note:** Concepts that seem similar to relative linking can still be encoded
at the application level. This is fine, but distinct from "IPLD Links", because
such linking won't be interpreted by IPLD path and link resolution (e.g., it
won't get the special "link" type, and won't violate the constraint that the
IPLD Data Model expresses a DAG).
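One way to see this rule in code: if link resolution is a function of only the link and a content-addressed store, there is no parent context from which a relative target such as `../../zot` could be interpreted. The sketch below mirrors the negative example above. Its assumptions: toy string keys stand in for CIDs, links use the DAG-JSON-style `{"/": ...}` shape, and `resolveLink` is a hypothetical name. It rejects any target that looks relative.

```javascript
// Resolve a link using only the link itself and the store: no parent context.
function resolveLink(store, link) {
  const target = link["/"];
  // A relative target would need the referring block's position in some
  // larger DAG, which this function deliberately has no access to.
  if (target.startsWith(".") || target.includes("/")) {
    throw new Error(`relative link "${target}" is not an IPLD Link`);
  }
  if (!store.has(target)) {
    throw new Error(`block ${target} not found`);
  }
  return store.get(target);
}

// The store maps toy content keys (stand-ins for CIDs) to decoded Nodes.
const store = new Map([
  ["toy-bar", { foo: { "/": "toy-foo" }, zot: "something" }],
  ["toy-foo", { baz: { "/": "../../zot" } }],
]);

const foo = resolveLink(store, store.get("toy-bar").foo); // fine: absolute key
try {
  resolveLink(store, foo.baz);
} catch (err) {
  console.log(err.message); // prints: relative link "../../zot" is not an IPLD Link
}
```

The relative string itself still decodes fine as ordinary data (`foo.baz["/"]` is just a string). Only treating it as an IPLD Link is refused, which matches the note about application-level linking.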

**Motivation:** IPLD needs to be easy to reason about.
### Moving beyond local reasoning

Review comment: Good idea putting those things into a new section.

**Caveat:**
The "no non-local reasoning" rule holds at the Data Model layer.
Some higher-level layers relax the rule.

We *may* want to relax this if we want to move schemas into separate,
deduplicated blocks (referenced by CID). If we do that, we'd need to fetch a
block's schema before being able to interpret it.
For example, Advanced Data Layouts, which split data across multiple blocks,
de facto carry some logical context with them as they wield their constituent
blocks (jumping into a HAMT mid-way through its trie with no context is unlikely
to make any semantic sense, even though the data can still be parsed in terms
of the Data Model).

However, we need to *thoroughly* discuss any changes to this requirement.
Schemas describe constraints around data and are typically applied over
a whole DAG which may span multiple Blocks, and are themselves usually
located in another Block (for ease of reference by CID). Schemas thus also
can be seen as using some forms of non-local reasoning.

1. The space savings may not be worth it given the size of CIDs (>40 bytes),
   compression, smart transports, and smart datastores.
2. This change would introduce some weird interface complexities and potential
   network dependencies.
Applications built on top of IPLD can also use their own contextual reasoning,
as described earlier in the relative linking example.

These are not contradictions of the "no non-local reasoning" rule; the rule is
simply relaxed for these higher-level systems, where the scope of "local" is
understood more broadly.

Since we can always interpret blocks structurally (e.g., parse them at least to
the Data Model layer) -- even in data that's also meant to be used with
Advanced Data Layouts, Schemas, or other application logic that uses contextual
concepts -- we can still have replication, hashing, DAG traversal,
and all the rest of the important promises of the IPLD Data Model regardless of
that other context. These systems are therefore purely value-add and do not
compromise any of the other core promises of IPLD.
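To illustrate why an Advanced Data Layout carries logic beyond the Data Model, here is a deliberately simplified sharded map. It uses a single level of buckets chosen by a toy hash of the key, so it is not a real HAMT. Each bucket is an ordinary Data Model map that any tool can parse, but finding a key requires knowing the bucketing rule. The bucket count, the hash, and the names `bucketOf`, `buildShards`, and `lookup` are hypothetical. In a real layout each bucket would be its own Block reached through a Link.

```javascript
// Simplified, single-level "sharded map" layout (not a real HAMT).
const BUCKETS = 4;

// Hypothetical bucketing rule: sum of UTF-16 code units modulo BUCKETS.
function bucketOf(key) {
  let sum = 0;
  for (let i = 0; i < key.length; i++) sum += key.charCodeAt(i);
  return sum % BUCKETS;
}

// Split a plain object into BUCKETS ordinary Data Model maps.
function buildShards(entries) {
  const shards = Array.from({ length: BUCKETS }, () => ({}));
  for (const [key, value] of Object.entries(entries)) {
    shards[bucketOf(key)][key] = value;
  }
  return shards;
}

// Lookup needs the layout logic (bucketOf): the shards alone don't say where to look.
function lookup(shards, key) {
  const shard = shards[bucketOf(key)];
  return Object.prototype.hasOwnProperty.call(shard, key) ? shard[key] : undefined;
}

const shards = buildShards({ apple: 1, banana: 2, cherry: 3 });
console.log(lookup(shards, "banana")); // prints 2
```

Every shard still parses as a plain map, so replication and traversal at the Data Model layer work unchanged. Only the "which bucket holds this key" question depends on the extra logic.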
Review comment: A possible "motivation" could be building higher level data structures (like HAMT) on top of the core IPLD Data Model.

## No Cycles

@@ -105,6 +137,34 @@ on top of IPLD.
**Motivation:** Deterministic computations on top of IPLD need to produce the
same result every time.

### Higher Level Pathing

The "stable pathing" rule holds at the Data Model layer.
Some higher-level layers relax the rule.

For example, Advanced Data Layouts operate by "feigning" an IPLD Node which
conforms to the behaviors specified by the Data Model in every way -- except that
internally it maps the Node's content onto Blocks using its own layout logic
rather than the basic Data Model mapping. This means we can "path" across an
Advanced Data Layout that acts like a map or a list as if it were a regular
Node. We still aim for stable pathing; however, at this layer, that stability
requires a fixed understanding of the Advanced Data Layout logic itself.

Schemas describe data in terms of both semantic types and a representation
strategy. In some cases the semantic type information contains a name
(such as a struct field name) even though the representation does not (such as
when a struct uses "tuple" representation, causing it to be transformed into
a list rather than a map when encoded). In these cases, we can "path" across
data interpreted in the context of a Schema using the field names, even if at the
Data Model layer it's been represented as a list (and thus has indexes instead
of map keys corresponding to the field names). This kind of pathing can be
stable and predictable, but (as with Advanced Data Layouts) that
stability now requires more: holding the Schema declaration.

Note that the core Data Model still maintains stable pathing even in these
examples of higher-level systems with alternative rules.
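A sketch of the Schema case: pathing by field name over a struct declared with a tuple representation. At the Data Model layer the data is a list, so the field name has to be translated into an index using the Schema. The schema object below is a hypothetical in-memory stand-in, not the IPLD Schema DSL or any library's parsed form, and `stepByField` is a hypothetical name.

```javascript
// Hypothetical in-memory stand-in for a Schema struct declaration whose
// representation is "tuple": fields are encoded positionally, in this order.
const personSchema = {
  kind: "struct",
  representation: "tuple",
  fields: ["name", "age", "email"],
};

// Data Model view of an encoded Person: just a list.
const encoded = ["alice", 30, "alice@example.com"];

// Schema-level pathing: translate a field name into a Data Model list index.
function stepByField(schema, dataModelNode, fieldName) {
  if (schema.representation !== "tuple") {
    throw new Error("only tuple representation is handled in this sketch");
  }
  const index = schema.fields.indexOf(fieldName);
  if (index === -1) throw new Error(`no field "${fieldName}"`);
  return dataModelNode[index];
}

console.log(stepByField(personSchema, encoded, "age")); // prints 30
// At the Data Model layer the same value is reached by index instead:
console.log(encoded[1]); // prints 30
```

Both paths are stable, but the field-name path only resolves for a reader who holds the Schema declaration. The index path needs nothing beyond the data.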
Review comment: The original sections all have a "Motivation" which I quite like. Here the motivation could be about different views on the data like IPFS is doing it with UnixFS.

Review comment: Good idea. I should keep that up.

## Link Transparent Pathing

Path resolution must transparently traverse links.
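Link-transparent path resolution can be sketched as a segment-by-segment walk in which, whenever the current value is a Link, the target is loaded from the block store before the next step. The caller never sees the block boundary. Links use the DAG-JSON-style `{"/": key}` shape. The keys are toy stand-ins for CIDs, and `isLink`, `load`, and `resolvePath` are hypothetical names.

```javascript
// A link is a single-entry map whose only key is "/" (DAG-JSON style).
function isLink(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value)
    && Object.keys(value).length === 1 && typeof value["/"] === "string";
}

// Follow links until we reach a value that is not itself a link.
function load(store, value) {
  while (isLink(value)) {
    const target = store.get(value["/"]);
    if (target === undefined) throw new Error(`block ${value["/"]} not found`);
    value = target;
  }
  return value;
}

function resolvePath(store, rootKey, path) {
  let node = load(store, { "/": rootKey });
  for (const segment of path.split("/").filter(Boolean)) {
    if (node === null || typeof node !== "object"
        || !Object.prototype.hasOwnProperty.call(node, segment)) {
      throw new Error(`cannot resolve segment "${segment}"`);
    }
    // Step, then transparently cross any link we land on.
    node = load(store, node[segment]);
  }
  return node;
}

// Three separate blocks, chained by links.
const store = new Map([
  ["toy-root", { docs: { "/": "toy-docs" } }],
  ["toy-docs", { readme: { "/": "toy-readme" } }],
  ["toy-readme", { title: "Hello" }],
]);
console.log(resolvePath(store, "toy-root", "docs/readme/title")); // prints Hello
```

The path `docs/readme/title` spans three blocks, yet it reads exactly like a path within one nested object. That is the point of the rule: large data can be split into blocks without changing how it is addressed.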

@@ -115,46 +175,4 @@ inline data into large objects (lots of duplication and copying).

## Primitives

The "recommended" IPLD format (currently DagCBOR) needs to support *at a minimum*:

* 32/64 bit integers without losing information.
* 32/64 bit floats without losing information.
* Unicode strings.
* Binary strings.
* Objects (with string keys, at least).
* Arrays.
* Booleans.
* A bottom type (null).

**Motivation:** Convenience, really.
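The "without losing information" requirement for 64-bit integers is easy to violate in practice. In JavaScript, for instance, a plain `Number` is an IEEE 754 double and cannot represent every 64-bit integer exactly, so an implementation handling large integers needs something like `BigInt`. A minimal sketch follows; the `toNumberExact` helper is hypothetical.

```javascript
const big = 9007199254740993n; // 2^53 + 1: fits easily in a signed 64-bit integer

// Converting to a Number silently rounds to the nearest representable double.
console.log(Number(big)); // prints 9007199254740992
console.log(BigInt(Number(big)) === big); // prints false

// A hypothetical helper that refuses lossy conversion instead of rounding.
function toNumberExact(value) {
  const asNumber = Number(value);
  if (!Number.isFinite(asNumber) || BigInt(asNumber) !== value) {
    throw new RangeError(`${value} cannot be represented exactly as a Number`);
  }
  return asNumber;
}

console.log(toNumberExact(42n)); // prints 42
```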

## Non-Cyclic, Block-Local Relative Links

That is, relative links that don't traverse out the back of an object. See the
conclusions from: [#1](https://github.com/ipld/specs/issues/1).

**Motivation:** This is required to efficiently represent a highly connected DAG
of tiny nodes.

**Caveat:** This brings in some sticky issues around mutability. Depending on
the implementation, relative links within an object may act like mutable
links (from the perspective of the user). The concern here is that we don't want
users to bundle nodes together into a single block *because* they want this
mutability.
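The constraint that relative links must not "traverse out the back" of an object can be sketched as resolving a relative path against the position of the referring node inside one block, while refusing any `..` that would climb above the block's root. Paths are assumed to be slash-separated segment lists, and `resolveBlockLocal` is a hypothetical name.

```javascript
// Resolve `relative` (e.g. "../c") against `basePath`, the position of the
// node containing the link inside the block (e.g. "a/b"), staying in the block.
function resolveBlockLocal(basePath, relative) {
  const stack = basePath.split("/").filter(Boolean);
  for (const segment of relative.split("/").filter(Boolean)) {
    if (segment === "..") {
      if (stack.length === 0) {
        throw new Error("relative link escapes the block root");
      }
      stack.pop();
    } else if (segment !== ".") {
      stack.push(segment);
    }
  }
  return stack.join("/");
}

console.log(resolveBlockLocal("a/b", "../c")); // prints a/c
```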

# To Do

Working through this, I realized we have a few things we really need to finish before we can
call IPLD ready.

* **Path links.** Pointers that can only point to objects at block boundaries
  are useful but severely limited. We've been fine up till now because we
  generally don't *edit* complicated data structures, but this will change.
  ([#83](https://github.com/ipld/specs/issues/83))
* **Slice links.** For the same reason, we really should support
  `/ipld/QmID/start..stop` as a syntax for slicing an array. Most programming
  languages support this, so *not* supporting it would be a bit awkward.
  ([#84](https://github.com/ipld/specs/issues/84))
* **Link Spec.** We need to specify a complete and formal link spec and stick
  with it.
* **Relative Links.** [#1](https://github.com/ipld/specs/issues/1).
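The proposed `start..stop` slice syntax could be applied as a path segment roughly as follows. This sketch assumes half-open `[start, stop)` semantics like JavaScript's `Array.prototype.slice`. The text above does not specify the bounds, so treat that as an assumption, along with the helper names `parseSlice` and `applySegment`.

```javascript
// Hypothetical: interpret a path segment of the form "start..stop" as a slice.
function parseSlice(segment) {
  const match = /^(\d+)\.\.(\d+)$/.exec(segment);
  if (match === null) return null; // not a slice segment
  const start = Number(match[1]);
  const stop = Number(match[2]);
  if (start > stop) throw new Error(`invalid slice ${segment}`);
  return { start, stop };
}

// Apply one segment: a slice on a list, otherwise an ordinary key/index lookup.
function applySegment(node, segment) {
  const slice = parseSlice(segment);
  if (slice !== null) {
    if (!Array.isArray(node)) throw new Error("slices apply only to lists");
    // Assumed half-open [start, stop) semantics.
    return node.slice(slice.start, slice.stop);
  }
  return Object.prototype.hasOwnProperty.call(node, segment) ? node[segment] : undefined;
}

console.log(applySegment(["a", "b", "c", "d"], "1..3").join(",")); // prints b,c
```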
See [the IPLD Data Model](/data-model-layer/data-model.md#kinds)

Review comment: I am struggling to grok "a step across an IPLD Node". How does one step across a point? Would it be more accurate to say "step between IPLD Node(s)"?

Review comment: Actually, I think it just clicked.

* A Node is:
* A Link is:
* A Path of `/A/B/C` represents:

So maybe "step over IPLD Node(s)" is a better way of saying this. @warpfork?

Review comment: Yeah, I agree, this bit of text struggled. "between" might do better.
I'm not sure about that diagram, because it's not necessary that it be a link in-between; remember that a whole tree of Nodes can be in a Block.
Maybe I should also break this down into bullet points for "Path" and "Path Segment", and that might help? E.g. one Path Segment moves from one Node to the next (but can only describe movement of a single stride: e.g. looking up one map key, or stepping into a list by index). A Path is just a collection of Path Segments, which we often use to describe a walk of a specific, er, well, path down a Node tree.

Review comment: Made another shot at this in: #152 -- does that one seem clearer to you?

Review comment: @warpfork Much better, made a note on a slight clarification.