From bea80f71e78fe97799526b1414931904010a397b Mon Sep 17 00:00:00 2001
From: Eric Myhre
Date: Tue, 30 Jul 2019 20:03:56 +0200
Subject: [PATCH] Iterate on the requirements doc.

We definitely need these concise definitions of goals (and nongoals;
if anything, I think we might do well to have more of those!), but
we have indeed refined a few things in the last 8 months or so.

This is mostly addressing the huge batch of comments I left in the
vicinity of https://github.com/ipld/specs/pull/146#pullrequestreview-268375431 .

Particularly tricky is that some of our most interesting and useful
ideas over the past few months *relax* some of the rules -- and so we
need to carefully come up with a way to describe that: it is mostly
recontextualizations consistent with the same underlying philosophies...
but boy is that hard to express in text sometimes!
---
 REQUIREMENTS.md | 182 ++++++++++++++++++++++++++----------------------
 1 file changed, 100 insertions(+), 82 deletions(-)

diff --git a/REQUIREMENTS.md b/REQUIREMENTS.md
index ff187e72..41ce85f8 100644
--- a/REQUIREMENTS.md
+++ b/REQUIREMENTS.md
@@ -1,27 +1,24 @@
-# IPLD Requirements
+# IPLD Cornerstones
 
 This document outlines parts of IPLD that should and should not be changed to
-ensure the success of future improvements (especially type systems).
+ensure the success of future improvements and the continuity of direction.
 
 **Definitions**
 
 * Block: A block is a chunk of an IPLD DAG, encoded in a format. Blocks have CIDs.
-* Fragment: A piece of an IPLD DAG. Blocks contain fragments.
 * Node: A node is a *point* in an IPLD DAG (object, array, number, etc.).
-* Link: A link is an IPLD Node that points to another IPLD Node.
-* Path: A paths a human readable pointer to an IPLD Node.
+  Many nodes can exist encoded inside one Block.
+* Link: A link is a kind of IPLD Node that points to another IPLD Node.
+* Path: A path is composed of segments which each specify a step across an IPLD Node.
 
 ## Linked
 
-IPLD must support linking to any IPLD node (even if the node is in the middle of
-a block). That is, IPLD must support arbitrary IPLD paths in links.
+The IPLD Data Model includes Links. A Link can be resolved to reach another IPLD Node.
 
-**Motivation:** Considering this in the context of programming languages, not
-being able to *store* a pointer to a struct *inside* of another struct would be
-severely limiting.
-
-NOTE: We don't currently support arbitrary paths but, in the context of
-programming, we really need to.
+**Motivation:** Linking makes it possible to build data structures which are
+theoretically unbounded in size, while still being traversable, consistent,
+authenticated and immutable. This unlocks the potential for a host of
+decentralized applications and is part of IPLD's fundamental purpose.
 
 ## Immutable
 
@@ -31,54 +28,89 @@ IPLD but there needs to be an immutable layer at the bottom.
 **Motivation:** *Having* an immutable layer is important for a lot of analysis,
 memoization, type checking, etc.
 
-## Multicodecs Are Not Types
+## Multicodecs Are Not Meant to Act As Types
+
+Multicodecs are used to indicate the format of data in a Block, and thus the
+codec which transforms that serial data into a tree of Nodes conforming to the
+IPLD Data Model. This is the limit of their purpose.
+
+In particular, multicodecs should not be confused with a
+[type system](https://en.wikipedia.org/wiki/Type_system).
 
-It's impossible to understand IPLD data at a *structural* level if we don't know
-the format. Therefore, we should avoid introducing new formats unnecessarily as
-*every* IPLD implementation needs to support these new formats.
+**Motivation:** It's impossible to understand IPLD data at a *structural* level
+if we don't know the format. Therefore, multicodecs describe the format, and
+we use this information to handle the transformation into the IPLD Data Model.
+Beyond this, we don't want to use multicodecs further, because we should avoid
+introducing new formats unnecessarily: *every* IPLD implementation needs to
+support these new formats, and this is a burden it's preferable to minimize.
 
 ## No Non-Local Reasoning
 
-An IPLD block should never be interpreted in the context of *anything* not
-contained in the block (and CID).
+Transforming content of a Block into the IPLD Data Model should never require
+interpretation in the context of *anything* not contained in the Block plus CID.
 
-For example, assuming we add support for relative links, the following
-definition of `foo` would not be a valid IPLD block:
+Similarly, traversing an IPLD Node according to a Path should not require
+interpretation in the context of anything not already contained in that Node plus Path.
 
-```
+**Motivation:** IPLD needs to be easy to reason about.
+
+**Negative Examples:**
+
+```javascript
+// This is an example of what is NOT possible.
 var foo = {
-  // points outside of the current block, into the parent's "baz" field.
-  "baz": {"/": "../../baz"}
+  "baz": Link("../../zot") // NOT legal: makes a non-local reference.
 }
 
 var bar = {
   "foo": CidOf(foo),
-  // `/foo/baz` points here.
-  "baz": "something"
+  "zot": "something" // `./foo/baz` imagines pointing here.
 }
 
-// resolution throug block `foo` depends on block `bar`.
+// resolution through block `foo` depends on block `bar`...
 Resolve("/ipld/${CidOf(bar)}/foo/baz/")
+
+// meaning this would be undefined, which is why relative links are NOT allowed:
+Resolve("/ipld/${CidOf(foo)}/baz/")
 ```
 
 For the same reason, IPLD links can't rely on an authority (e.g., a blockchain).
 
-Note: Links like this can still be encoded at the application level but they
-won't be handled by the IPLD resolver (and won't get the special "link" type).
+**Note:** Concepts that seem similar to relative linking can still be encoded
+at the application level.
+This is fine, but distinct from "IPLD Links", because
+such linking won't be interpreted by IPLD path and link resolution (e.g. they
+won't get the special "link" type, and won't violate the constraints that the
+IPLD Data Model expresses a DAG, etc).
 
-**Motivation:** IPLD needs to be easy to reason about.
+### Moving beyond local reasoning
 
-**Caveat:**
+The "no non-local reasoning" rule holds at the Data Model layer.
+Some higher-level layers relax the rule.
 
-We *may* want to relax this if we want to move schemas into separate,
-deduplicated blocks (referenced by CID). If we do that, we'd need to fetch a
-block's schema before being able to interpret the it.
+For example, Advanced Data Layouts which split data across multiple blocks
+defacto carry some logical information in mind as they wield their constituent
+blocks (jumping into a HAMT mid-way through its trie with no context is unlikely
+to make any semantic sense, for example -- even though the data can still be
+parsed in terms of the Data Model).
 
-However, we need to *thoroughly* discuss any changes to this requirement.
+Schemas describe constraints around data and are typically applied over
+a whole DAG which may span multiple Blocks, and are themselves usually
+located in another Block (for ease of reference by CID). Schemas thus also
+can be seen as using some forms of non-local reasoning.
 
-1. The space savings may not be worth it given the size of CIDs (>40 bytes),
-   compression, smart transports, and smart datastores.
-2. This change would introduce some weird interface complexities and potential
-   network dependencies.
+Applications built on top of IPLD can also use their own contextual reasoning,
+as described earlier in the relative linking example.
+
+These are not contradictions of the "no non-local reasoning" rule; it's just
+relaxed for these high-level systems, and the scope of "local" can be
+understood more broadly.
+
+Since we can always interpret block structurally (e.g., parse them at least to
+the Data Model layer) -- even in data that's also meant to be used with
+Advanced Data Layouts or Schemas other application logic that uses contextual
+concepts, etc -- we can still have replication and hashing and DAG traversal
+and all the rest of the important promises of the IPLD Data Model regardless of
+that other context, meaning these systems are purely value-add and do not
+compromise any of the other core promises of IPLD.
 
 ## No Cycles
 
@@ -105,6 +137,34 @@ on top of IPLD.
 **Motivation:** Deterministic computations on top of a IPLD need to produce the
 same result every time.
 
+### Higher Level Pathing
+
+The "stable pathing" rule holds at the Data Model layer.
+Some higher-level layers relax the rule.
+
+For example, Advanced Data Layouts operate by "feigning" an IPLD Node which
+conforms with the Data Model specified behaviors in every way -- except that
+they're internally implemented in some way that maps the Node content onto
+Blocks in a more advanced way than the basic Data Model way. This means we
+can "path" across an Advanced Data Layout that acts like a map or a list as
+if it's a regular Node. We still aim for stable pathing: however, at this
+layer, that stability now requires a fixed understanding of the Advanced Layout
+logic itself.
+
+Schemas describe data in terms of both semantic types and a representation
+strategy, and in some cases the semantic type information contains a name
+(such as a struct field name) even while the representation does not (such as
+when a struct uses "tuple" representation, causing it to be transformed into
+a list rather than a map when encoded). In these cases, we can "path" across
+data interpreted in context of a Schema using the field names, even if at the
+Data Model layer it's been represented as a list (and thus has indexes instead
+of map keys corresponding to the field names).
+This kind of pathing can be
+stable and predictable, but (as with the Advanced Data Layouts story), that
+stability now requires more: holding the Schema declaration.
+
+Note that regular, core Data Model still maintains stable pathing even in these
+examples of higher level systems with alternative rules.
+
 ## Link Transparent Pathing
 
 Path resolution must transparently traverse links.
@@ -115,46 +175,4 @@ inline data into large objects (lots of duplication and copying).
 
 ## Primitives
 
-The "recommended" IPLD format (currently DagCBOR) needs to support *at a minimum*:
-
-* 32/64 bit integers without losing information.
-* 32/64 bit floats without losing information.
-* Unicode strings.
-* Binary strings.
-* Objects (with string keys, at least).
-* Arrays.
-* Booleans.
-* A bottom type (null).
-
-**Motivation:** Convenience, really.
-
-## Non-Cyclic, Block-Local Relative Links
-
-That is, relative links that don't traverse out the back of an object. See the
-conclusions from: [#1](https://github.com/ipld/specs/issues/1).
-
-**Motivation:** This is required to efficiently represent a highly connected DAG
-of tiny nodes.
-
-**Caveat:** This brings in some sticky issues around mutability. Depending on
-the implementation, relative links within an object may be act like mutable
-links (from the perspective of the user). The concern here is that we don't want
-users to bundle nodes together into single block *because* they want this
-mutability.
-
-# To Do
-
-Working through this, I realized we have a few things we really need to finish a few things before we can
-call IPLD ready.
-
-* **Path links.** Pointers that can only point to objects at block boundaries
-  are useful but severely gimped. We've been fine up till now because we
-  generally don't *edit* complicated datastructures but this will change.
-  ([#83](https://github.com/ipld/specs/issues/83))
-* **Slice links.** For the same reason, we really should support
-  `/ipld/QmID/start..stop` as a syntax for slicing an array. Most programming
-  languages support this so *not* supporting it would be a bit awkward.
-  ([#84](https://github.com/ipld/specs/issues/84))
-* **Link Spec.** We need to specify a complete and formal link spec and stick
-  with it.
-* **Relative Links.** [#1](https://github.com/ipld/specs/issues/1).
+See [the IPLD Data Model](/data-model-layer/data-model.md#kinds)
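
The "No Non-Local Reasoning" and "Link Transparent Pathing" rules touched by this patch can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the in-memory `blocks` store, the `Link` class, `load`, `resolve`, and the `cid-foo` / `cid-bar` strings are all hypothetical stand-ins, not an IPLD API, and real CIDs are content hashes rather than readable names.

```javascript
// Hypothetical in-memory block store: CID string -> decoded Data Model content.
const blocks = new Map();

// Hypothetical link representation: a value that carries only the target CID.
class Link {
  constructor(cid) {
    this.cid = cid;
  }
}

// Load a block using nothing but the store and the CID itself.
function load(cid) {
  if (!blocks.has(cid)) throw new Error(`block not found: ${cid}`);
  return blocks.get(cid);
}

// Resolve a path such as "foo/baz" starting at the block named by rootCid.
// Each step consults only the current node and the current segment; when the
// current node is a Link, it is followed transparently before stepping.
function resolve(rootCid, path) {
  let node = load(rootCid);
  for (const segment of path.split("/").filter((s) => s.length > 0)) {
    if (node instanceof Link) node = load(node.cid);
    const steppable = node !== null && typeof node === "object";
    if (!steppable || !Object.prototype.hasOwnProperty.call(node, segment)) {
      throw new Error(`no such segment: ${segment}`);
    }
    node = node[segment];
  }
  return node;
}

blocks.set("cid-foo", { baz: "something" });
blocks.set("cid-bar", { foo: new Link("cid-foo") });

console.log(resolve("cid-bar", "foo/baz")); // crosses the link from bar into foo
console.log(resolve("cid-foo", "baz")); // same node, reached with no knowledge of bar
```

Because `foo` never refers back to whatever block linked to it, both calls reach the same value. A relative link like the one in the negative example would have no defined meaning in the second call.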