ipld · warpfork · Jul 30, 2019 · lanzafame · Aug 1, 2019 · lanzafame
diff --git a/REQUIREMENTS.md b/REQUIREMENTS.md
@@ -1,27 +1,24 @@
-# IPLD Requirements
+# IPLD Cornerstones
 
 This document outlines parts of IPLD that should and should not be changed to
-ensure the success of future improvements (especially type systems).
+ensure the success of future improvements and the continuity of direction.
 
 **Definitions**
 
 * Block: A block is a chunk of an IPLD DAG, encoded in a format. Blocks have CIDs.
-* Fragment: A piece of an IPLD DAG. Blocks contain fragments.
 * Node: A node is a *point* in an IPLD DAG (object, array, number, etc.).
-* Link: A link is an IPLD Node that points to another IPLD Node.
-* Path: A paths a human readable pointer to an IPLD Node.
+  Many nodes can exist encoded inside one Block.
+* Link: A link is a kind of IPLD Node that points to another IPLD Node.
+* Path: A path is composed of segments which each specify a step across an IPLD Node.
 
 ## Linked
 
-IPLD must support linking to any IPLD node (even if the node is in the middle of
-a block). That is, IPLD must support arbitrary IPLD paths in links.
+The IPLD Data Model includes Links. A Link can be resolved to reach another IPLD Node.
 
-**Motivation:** Considering this in the context of programming languages, not
-being able to *store* a pointer to a struct *inside* of another struct would be
-severely limiting.
-
-NOTE: We don't currently support arbitrary paths but, in the context of
-programming, we really need to.
+**Motivation:** Linking makes it possible to build data structures which are
+theoretically unbounded in size, while still being traversable, consistent,
+authenticated and immutable.  This unlocks the potential for a host of
+decentralized applications and is part of IPLD's fundamental purpose.
 
 ## Immutable
 
@@ -31,54 +28,89 @@ IPLD but there needs to be an immutable layer at the bottom.
 **Motivation:** *Having* an immutable layer is important for a lot of analysis,
 memoization, type checking, etc.
 
-## Multicodecs Are Not Types
+## Multicodecs Are Not Meant to Act As Types
+
+Multicodecs are used to indicate the format of data in a Block, and thus the
+codec which transforms that serial data into a tree of Nodes conforming to the
+IPLD Data Model.  This is the limit of their purpose.
+
+In particular, multicodecs should not be confused with a
+[type system](https://en.wikipedia.org/wiki/Type_system).
 
-It's impossible to understand IPLD data at a *structural* level if we don't know
-the format. Therefore, we should avoid introducing new formats unnecessarily as
-*every* IPLD implementation needs to support these new formats.
+**Motivation:** It's impossible to understand IPLD data at a *structural* level
+if we don't know the format.  Therefore, multicodecs describe the format, and
+we use this information to handle the transformation into the IPLD Data Model.
+Beyond this, we don't want to use multicodecs further, because we should avoid
+introducing new formats unnecessarily: *every* IPLD implementation needs to
+support these new formats, and this is a burden it's preferable to minimize.
 
 ## No Non-Local Reasoning
 
-An IPLD block should never be interpreted in the context of *anything* not
-contained in the block (and CID).
+Transforming content of a Block into the IPLD Data Model should never require
+interpretation in the context of *anything* not contained in the Block plus CID.
 
-For example, assuming we add support for relative links, the following
-definition of `foo` would not be a valid IPLD block:
+Similarly, traversing an IPLD Node according to a Path should not require
+interpretation in the context of anything not already contained in that Node plus Path.
 
-```
+**Motivation:** IPLD needs to be easy to reason about.
+
+**Negative Examples:**
+
+```javascript
+// This is an example of what is NOT possible.
 var foo = {
-  // points outside of the current block, into the parent's "baz" field.
-  "baz": {"/": "../../baz"}
+  "baz": Link("../../zot") // NOT legal: makes a non-local reference.
 }
 var bar = {
   "foo": CidOf(foo),
-  // `/foo/baz` points here.
-  "baz": "something"
+  "zot": "something" // the relative link at `/foo/baz` is imagined to point here.
 }
 
-// resolution throug block `foo` depends on block `bar`.
+// resolution through block `foo` depends on block `bar`...
 Resolve("/ipld/${CidOf(bar)}/foo/baz/")
+
+// meaning this would be undefined, which is why relative links are NOT allowed:
+Resolve("/ipld/${CidOf(foo)}/baz/")
 ```
 
 For the same reason, IPLD links can't rely on an authority (e.g., a blockchain).
 
-Note: Links like this can still be encoded at the application level but they
-won't be handled by the IPLD resolver (and won't get the special "link" type).
+**Note:** Concepts that seem similar to relative linking can still be encoded
+at the application level.  This is fine, but distinct from "IPLD Links", because
+such linking won't be interpreted by IPLD path and link resolution (e.g. they
+won't get the special "link" type, and won't violate the constraints that the
+IPLD Data Model expresses a DAG, etc).
 
-**Motivation:** IPLD needs to be easy to reason about.
+### Moving beyond local reasoning
 
-**Caveat:**
+The "no non-local reasoning" rule holds at the Data Model layer.
+Some higher-level layers relax the rule.
 
-We *may* want to relax this if we want to move schemas into separate,
-deduplicated blocks (referenced by CID). If we do that, we'd need to fetch a
-block's schema before being able to interpret the it.
+For example, Advanced Data Layouts which split data across multiple blocks
+de facto carry some logical context as they work with their constituent
+blocks: jumping into a HAMT mid-way through its trie with no context is
+unlikely to make any semantic sense, even though the data can still be
+parsed in terms of the Data Model.
 
-However, we need to *thoroughly* discuss any changes to this requirement.
+Schemas describe constraints around data and are typically applied over
+a whole DAG which may span multiple Blocks, and are themselves usually
+located in another Block (for ease of reference by CID).  Schemas thus also
+can be seen as using some forms of non-local reasoning.
 
-1. The space savings may not be worth it given the size of CIDs (>40 bytes),
-   compression, smart transports, and smart datastores.
-2. This change would introduce some weird interface complexities and potential
-   network dependencies.
+Applications built on top of IPLD can also use their own contextual reasoning,
+as described earlier in the relative linking example.
+
+These are not contradictions of the "no non-local reasoning" rule; it's just
+relaxed for these high-level systems, and the scope of "local" can be
+understood more broadly.
+
+Since we can always interpret blocks structurally (e.g., parse them at least
+to the Data Model layer), even when the data is also meant to be used with
+Advanced Data Layouts, Schemas, or other application logic that relies on
+contextual concepts, we still have replication, hashing, DAG traversal,
+and all the rest of the important promises of the IPLD Data Model regardless of
+that other context.  These systems are therefore purely value-add and do not
+compromise any of the other core promises of IPLD.
 
 ## No Cycles
 
@@ -105,6 +137,34 @@ on top of IPLD.
 **Motivation:** Deterministic computations on top of a IPLD need to produce the
 same result every time.
 
+### Higher Level Pathing
+
+The "stable pathing" rule holds at the Data Model layer.
+Some higher-level layers relax the rule.
+
+For example, Advanced Data Layouts operate by "feigning" an IPLD Node which
+conforms with the Data Model specified behaviors in every way -- except that
+they're internally implemented in some way that maps the Node content onto
+Blocks in a more advanced way than the basic Data Model way.  This means we
+can "path" across an Advanced Data Layout that acts like a map or a list as
+if it's a regular Node.  We still aim for stable pathing: however, at this
+layer, that stability now requires a fixed understanding of the Advanced Layout
+logic itself.
+
+Schemas describe data in terms of both semantic types and a representation
+strategy, and in some cases the semantic type information contains a name
+(such as a struct field name) even while the representation does not (such as
+when a struct uses "tuple" representation, causing it to be transformed into
+a list rather than a map when encoded).  In these cases, we can "path" across
+data interpreted in context of a Schema using the field names, even if at the
+Data Model layer it's been represented as a list (and thus has indexes instead
+of map keys corresponding to the field names).  This kind of pathing can be
+stable and predictable, but (as with the Advanced Data Layouts story), that
+stability now requires more: holding the Schema declaration.
+
+Note that the core Data Model itself still maintains stable pathing even in
+these examples of higher level systems with alternative rules.
+
 ## Link Transparent Pathing
 
 Path resolution must transparently traverse links.
@@ -115,46 +175,4 @@ inline data into large objects (lots of duplication and copying).
 
 ## Primitives
 
-The "recommended" IPLD format (currently DagCBOR) needs to support *at a minimum*:
-
-* 32/64 bit integers without losing information.
-* 32/64 bit floats without losing information.
-* Unicode strings.
-* Binary strings.
-* Objects (with string keys, at least).
-* Arrays.
-* Booleans.
-* A bottom type (null).
-
-**Motivation:** Convenience, really.
-
-## Non-Cyclic, Block-Local Relative Links
-
-That is, relative links that don't traverse out the back of an object. See the
-conclusions from: [#1](https://github.com/ipld/specs/issues/1).
-
-**Motivation:** This is required to efficiently represent a highly connected DAG
-of tiny nodes.
-
-**Caveat:** This brings in some sticky issues around mutability. Depending on
-the implementation, relative links within an object may be act like mutable
-links (from the perspective of the user). The concern here is that we don't want
-users to bundle nodes together into single block *because* they want this
-mutability.
-
-# To Do
-
-Working through this, I realized we have a few things we really need to finish a few things before we can
-call IPLD ready.
-
-* **Path links.** Pointers that can only point to objects at block boundaries
-  are useful but severely gimped. We've been fine up till now because we
-  generally don't *edit* complicated datastructures but this will change.
-  ([#83](https://github.com/ipld/specs/issues/83))
-* **Slice links.** For the same reason, we really should support
-  `/ipld/QmID/start..stop` as a syntax for slicing an array. Most programming
-  languages support this so *not* supporting it would be a bit awkward.
-  ([#84](https://github.com/ipld/specs/issues/84))
-* **Link Spec.** We need to specify a complete and formal link spec and stick
-  with it. 
-* **Relative Links.** [#1](https://github.com/ipld/specs/issues/1).
+See [the IPLD Data Model](/data-model-layer/data-model.md#kinds)
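
The two pathing rules in this change (traversal that depends only on the Node plus Path, and transparent traversal of Links) can be sketched as below. This is a minimal illustration under stated assumptions, not the IPLD implementation: the `blocks` map, the `cid-...` labels standing in for real CIDs, the `{"/": ...}` shape used to model a Link, and the `isLink`, `load`, `step`, and `resolve` helpers are all hypothetical names invented for the example.

```javascript
// Hypothetical in-memory block store: stand-in "CID" label -> decoded node.
// Real CIDs are derived from block content; plain labels are used here
// purely for illustration.
const blocks = new Map([
  ["cid-foo", { baz: "something" }],
  ["cid-bar", { foo: { "/": "cid-foo" }, zot: [10, 20, 30] }],
]);

// Assumption: a Link is modeled as a map with a single "/" key whose value
// is the target's CID label.
function isLink(node) {
  return (
    node !== null &&
    typeof node === "object" &&
    !Array.isArray(node) &&
    Object.keys(node).length === 1 &&
    typeof node["/"] === "string"
  );
}

// Follow links until a non-link node is reached (link transparent pathing).
function load(node) {
  while (isLink(node)) {
    const cid = node["/"];
    if (!blocks.has(cid)) throw new Error(`missing block: ${cid}`);
    node = blocks.get(cid);
  }
  return node;
}

// Take one step across a node. Only the node and the segment are consulted:
// there is no "parent" or "current block" argument to reason about.
function step(node, segment) {
  if (Array.isArray(node)) {
    const index = Number(segment);
    if (/^\d+$/.test(segment) && index < node.length) return node[index];
  } else if (node !== null && typeof node === "object") {
    if (Object.prototype.hasOwnProperty.call(node, segment)) return node[segment];
  }
  throw new Error(`cannot step across segment: ${segment}`);
}

// Resolve a slash-separated path, one segment at a time, from a root CID.
function resolve(rootCid, path) {
  let node = load({ "/": rootCid });
  for (const segment of path.split("/").filter((s) => s.length > 0)) {
    node = load(step(node, segment));
  }
  return node;
}
```

In this sketch, `resolve("cid-bar", "foo/baz")` and `resolve("cid-foo", "baz")` reach the same value, because nothing inside block `foo` refers back to whichever block linked to it. A relative link such as `../../zot` has no meaning here, since `step` never sees anything beyond the node and the segment.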