This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Iterate on requirements/cornerstones #148

Closed
182 changes: 100 additions & 82 deletions REQUIREMENTS.md
@@ -1,27 +1,24 @@
# IPLD Requirements
# IPLD Cornerstones

This document outlines parts of IPLD that should and should not be changed to
ensure the success of future improvements (especially type systems).
ensure the success of future improvements and the continuity of direction.

**Definitions**

* Block: A block is a chunk of an IPLD DAG, encoded in a format. Blocks have CIDs.
* Fragment: A piece of an IPLD DAG. Blocks contain fragments.
* Node: A node is a *point* in an IPLD DAG (object, array, number, etc.).
* Link: A link is an IPLD Node that points to another IPLD Node.
* Path: A paths a human readable pointer to an IPLD Node.
Many nodes can exist encoded inside one Block.
* Link: A link is a kind of IPLD Node that points to another IPLD Node.
* Path: A path is composed of segments which each specify a step across an IPLD Node.
@lanzafame lanzafame Aug 1, 2019

I am struggling to grok a step across an IPLD Node. How does one step across a point? Would it be more accurate to say step between IPLD Node(s)?

@lanzafame lanzafame Aug 1, 2019

Actually, I think it just clicked.
A Node is:

(A)

A Link is:

(A) -> (B)
    ^
   Link

A Path of /A/B/C represents:

(A) -> (B) -> (C)
 |_____________^
        ^
(B) gets stepped over 

So maybe step over IPLD Node(s) is a better way of saying this. @warpfork?

Contributor Author

Yeah, I agree, this bit of text struggled. "between" might do better.

I'm not sure about that diagram, because it's not necessary that it be a link in-between; remember that a whole tree of Nodes can be in a Block.

Maybe I should also break this down into bullet points for "Path" and "Path Segment", and that might help? E.g. one Path Segment moves from one Node to the next (but can only describe movement of a single stride: e.g. looking up one map key, or stepping into a list by index). A Path is just a collection of Path Segments, which we often use to describe a walk of a specific, er, well, path down a Node tree.
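
A minimal sketch of the Path / Path Segment distinction, assuming plain JavaScript objects and arrays stand in for map and list Nodes. The helper names `stepSegment` and `walkPath` are hypothetical and not part of any IPLD library, and Links are deliberately left out:

```javascript
// Hypothetical helpers: objects model map Nodes, arrays model list Nodes.
// One Path Segment describes exactly one stride from a Node to a child Node.
function stepSegment(node, segment) {
  if (Array.isArray(node)) {
    const index = Number(segment);
    if (!Number.isInteger(index) || index < 0 || index >= node.length) {
      throw new Error(`no list index "${segment}"`);
    }
    return node[index];
  }
  if (node !== null && typeof node === "object") {
    if (!Object.prototype.hasOwnProperty.call(node, segment)) {
      throw new Error(`no map key "${segment}"`);
    }
    return node[segment];
  }
  throw new Error(`cannot step into a scalar with segment "${segment}"`);
}

// A Path is just an ordered collection of Path Segments applied one after another.
function walkPath(root, path) {
  const segments = path.split("/").filter((s) => s.length > 0);
  return segments.reduce((node, segment) => stepSegment(node, segment), root);
}

const exampleTree = { a: { b: ["x", { c: 42 }] } };
console.log(walkPath(exampleTree, "/a/b/1/c")); // prints 42
```

Nothing in the walk cares whether two Nodes live in the same Block, which is the point made above about a whole tree of Nodes fitting inside one Block.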

Contributor Author

Made another shot at this in: #152 -- does that one seem clearer to you?

@warpfork Much better, made a note on a slight clarification.


## Linked

IPLD must support linking to any IPLD node (even if the node is in the middle of
a block). That is, IPLD must support arbitrary IPLD paths in links.
The IPLD Data Model includes Links. A Link can be resolved to reach another IPLD Node.

**Motivation:** Considering this in the context of programming languages, not
being able to *store* a pointer to a struct *inside* of another struct would be
severely limiting.

NOTE: We don't currently support arbitrary paths but, in the context of
programming, we really need to.
**Motivation:** Linking makes it possible to build data structures which are
theoretically unbounded in size, while still being traversable, consistent,
authenticated and immutable. This unlocks the potential for a host of
decentralized applications and is part of IPLD's fundamental purpose.

## Immutable

@@ -31,54 +28,89 @@ IPLD but there needs to be an immutable layer at the bottom.
**Motivation:** *Having* an immutable layer is important for a lot of analysis,
memoization, type checking, etc.

## Multicodecs Are Not Types
## Multicodecs Are Not Meant to Act As Types
Contributor

This is great. Probably the most concise text I’ve read so far on “why you don’t need to write a new codec.”

Recently, I’ve run into some areas where I legitimately did need a new block format and codec for an application specific use case. Until I read this I didn’t have great language to distinguish why this case was different (it wasn’t about types, the motivation was compactness and limiting dependencies).

Contributor Author

Heh... Yeah, we could almost make a "## Multicodec nonproliferation treaty" and put this motivation hunk under that. "Multicodecs are not meant to act as types" could just have a "see above" for its motivation.


Multicodecs are used to indicate the format of data in a Block, and thus the
codec which transforms that serial data into a tree of Nodes conforming to the
IPLD Data Model. This is the limit of their purpose.

In particular, multicodecs should not be confused with a
[type system](https://en.wikipedia.org/wiki/Type_system).

It's impossible to understand IPLD data at a *structural* level if we don't know
the format. Therefore, we should avoid introducing new formats unnecessarily as
*every* IPLD implementation needs to support these new formats.
**Motivation:** It's impossible to understand IPLD data at a *structural* level
if we don't know the format. Therefore, multicodecs describe the format, and
we use this information to handle the transformation into the IPLD Data Model.
Beyond this, we don't want to use multicodecs further, because we should avoid
introducing new formats unnecessarily: *every* IPLD implementation needs to
support these new formats, and this is a burden it's preferable to minimize.
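
As a rough illustration of "multicodecs describe the format, and nothing more", the sketch below dispatches on a multicodec code to pick a decoder. The codes `0x55` (raw) and `0x0129` (dag-json) come from the multicodec table. The registry, the `decodeBlock` name, and the decoders themselves are simplified assumptions: plain `JSON.parse` is only a stand-in and does not implement the real dag-json rules for links or bytes.

```javascript
// Multicodec codes as listed in the multicodec table.
const RAW = 0x55;
const DAG_JSON = 0x0129;

// Hypothetical registry: each entry only knows how to turn bytes into
// Data Model values. It says nothing about what the data "means".
const decoders = new Map([
  [RAW, (bytes) => bytes],
  // Simplified stand-in for a real dag-json decoder.
  [DAG_JSON, (bytes) => JSON.parse(new TextDecoder().decode(bytes))],
]);

function decodeBlock(codecCode, bytes) {
  const decode = decoders.get(codecCode);
  if (decode === undefined) {
    // Every new format must be added here, in every implementation.
    throw new Error(`unsupported codec 0x${codecCode.toString(16)}`);
  }
  return decode(bytes);
}

const encoded = new TextEncoder().encode('{"name":"example","items":[1,2]}');
console.log(decodeBlock(DAG_JSON, encoded).items[1]); // prints 2
```

The error branch is where the cost of proliferation shows up: a block whose codec is missing from the registry cannot be understood even structurally.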

## No Non-Local Reasoning

An IPLD block should never be interpreted in the context of *anything* not
contained in the block (and CID).
Transforming content of a Block into the IPLD Data Model should never require
interpretation in the context of *anything* not contained in the Block plus CID.

For example, assuming we add support for relative links, the following
definition of `foo` would not be a valid IPLD block:
Similarly, traversing an IPLD Node according to a Path should not require
interpretation in the context of anything not already contained in that Node plus Path.

```
**Motivation:** IPLD needs to be easy to reason about.

**Negative Examples:**

```javascript
// This is an example of what is NOT possible.
var foo = {
// points outside of the current block, into the parent's "baz" field.
"baz": {"/": "../../baz"}
"baz": Link("../../zot") // NOT legal: makes a non-local reference.
}
var bar = {
"foo": CidOf(foo),
// `/foo/baz` points here.
"baz": "something"
"zot": "something" // `./foo/baz` imagines pointing here.
}

// resolution throug block `foo` depends on block `bar`.
// resolution through block `foo` depends on block `bar`...
Resolve("/ipld/${CidOf(bar)}/foo/baz/")

// meaning this would be undefined, which is why relative links are NOT allowed:
Resolve("/ipld/${CidOf(foo)}/baz/")
```

For the same reason, IPLD links can't rely on an authority (e.g., a blockchain).

Note: Links like this can still be encoded at the application level but they
won't be handled by the IPLD resolver (and won't get the special "link" type).
**Note:** Concepts that seem similar to relative linking can still be encoded
at the application level. This is fine, but distinct from "IPLD Links", because
such linking won't be interpreted by IPLD path and link resolution (e.g. they
won't get the special "link" type, and won't violate the constraints that the
IPLD Data Model expresses a DAG, etc).
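
A minimal sketch of what such an application-level convention might look like. The `$rel` key and the `resolveRelative` helper are invented for illustration and are not an IPLD feature. For brevity `foo` is inlined into `bar` here rather than linked by CID. To the Data Model, `{ $rel: ... }` is just an ordinary map holding a string.

```javascript
// Hypothetical application-level relative reference. The application, not
// IPLD, must remember where it started and how it reached the holder.
function resolveRelative(root, pathToHolder, relative) {
  // Begin with the segments leading to the map that holds the reference.
  const stack = pathToHolder.split("/").filter(Boolean);
  for (const part of relative.split("/").filter(Boolean)) {
    if (part === "..") {
      if (stack.length === 0) {
        throw new Error("reference escapes the known context");
      }
      stack.pop();
    } else {
      stack.push(part);
    }
  }
  return stack.reduce((node, key) => node[key], root);
}

const bar = { foo: { baz: { $rel: "../../zot" } }, zot: "something" };

// Works only because the application kept `bar` as its context.
console.log(resolveRelative(bar, "/foo/baz", bar.foo.baz.$rel)); // prints something
```

Calling the same helper with `bar.foo` as the root and `"/baz"` as the holder path throws, which mirrors the undefined `Resolve("/ipld/${CidOf(foo)}/baz/")` case in the negative example: the answer depends on context that the block itself does not carry.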

**Motivation:** IPLD needs to be easy to reason about.
### Moving beyond local reasoning
Member

Good idea putting those things into a new section.


**Caveat:**
The "no non-local reasoning" rule holds at the Data Model layer.
Some higher-level layers relax the rule.

We *may* want to relax this if we want to move schemas into separate,
deduplicated blocks (referenced by CID). If we do that, we'd need to fetch a
block's schema before being able to interpret the it.
For example, Advanced Data Layouts which split data across multiple blocks
de facto carry some logical information in mind as they wield their constituent
blocks (jumping into a HAMT mid-way through its trie with no context is unlikely
to make any semantic sense, for example -- even though the data can still be
parsed in terms of the Data Model).

However, we need to *thoroughly* discuss any changes to this requirement.
Schemas describe constraints around data and are typically applied over
a whole DAG which may span multiple Blocks, and are themselves usually
located in another Block (for ease of reference by CID). Schemas thus also
can be seen as using some forms of non-local reasoning.

1. The space savings may not be worth it given the size of CIDs (>40 bytes),
compression, smart transports, and smart datastores.
2. This change would introduce some weird interface complexities and potential
network dependencies.
Applications built on top of IPLD can also use their own contextual reasoning,
as described earlier in the relative linking example.

These are not contradictions of the "no non-local reasoning" rule; it's just
relaxed for these high-level systems, and the scope of "local" can be
understood more broadly.

Since we can always interpret blocks structurally (e.g., parse them at least to
the Data Model layer) -- even in data that's also meant to be used with
Advanced Data Layouts, Schemas, or other application logic that uses contextual
concepts -- we can still have replication and hashing and DAG traversal
and all the rest of the important promises of the IPLD Data Model regardless of
that other context. These systems are therefore purely value-add and do not
compromise any of the other core promises of IPLD.
Member

A possible "motivation" could be building higher level data structures (like HAMT) on top of the core IPLD Data Model.


## No Cycles

@@ -105,6 +137,34 @@ on top of IPLD.
**Motivation:** Deterministic computations on top of IPLD need to produce the
same result every time.

### Higher Level Pathing

The "stable pathing" rule holds at the Data Model layer.
Some higher-level layers relax the rule.

For example, Advanced Data Layouts operate by "feigning" an IPLD Node which
conforms with the Data Model specified behaviors in every way -- except that
they're internally implemented in some way that maps the Node content onto
Blocks in a more advanced way than the basic Data Model way. This means we
can "path" across an Advanced Data Layout that acts like a map or a list as
if it's a regular Node. We still aim for stable pathing: however, at this
layer, that stability now requires a fixed understanding of the Advanced Layout
logic itself.

Schemas describe data in terms of both semantic types and a representation
strategy, and in some cases the semantic type information contains a name
(such as a struct field name) even while the representation does not (such as
when a struct uses "tuple" representation, causing it to be transformed into
a list rather than a map when encoded). In these cases, we can "path" across
data interpreted in context of a Schema using the field names, even if at the
Data Model layer it's been represented as a list (and thus has indexes instead
of map keys corresponding to the field names). This kind of pathing can be
stable and predictable, but (as with the Advanced Data Layouts story), that
stability now requires more: holding the Schema declaration.

Note that regular, core Data Model still maintains stable pathing even in these
examples of higher level systems with alternative rules.
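
The Schema case can be sketched as follows, assuming a hand-written object stands in for a Schema declaration of a struct with "tuple" representation. The `pointSchema` shape and the `stepByFieldName` helper are hypothetical and do not reflect the IPLD Schema syntax or any library API.

```javascript
// Hypothetical stand-in for a struct declared with "tuple" representation:
// field order in the declaration decides the list index in the encoded data.
const pointSchema = { fields: ["x", "y", "label"] };

// Path by field name over data that is a plain list at the Data Model layer.
function stepByFieldName(schema, listNode, fieldName) {
  const index = schema.fields.indexOf(fieldName);
  if (index === -1) {
    throw new Error(`unknown field "${fieldName}"`);
  }
  if (index >= listNode.length) {
    throw new Error(`field "${fieldName}" missing from data`);
  }
  return listNode[index];
}

const encodedPoint = [3, 4, "corner"]; // what the Data Model layer sees
console.log(stepByFieldName(pointSchema, encodedPoint, "y")); // prints 4
console.log(encodedPoint[1]); // prints 4, the Data Model path still works
```

Both lookups are stable, but the first one is only stable for a reader who holds the same `pointSchema`, which is the extra requirement described above.
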
Member

The original sections all have a "Motivation" which I quite like. Here the motivation could be about different views on the data, as IPFS does with UnixFS.

Contributor Author

Good idea. I should keep that up.


## Link Transparent Pathing

Path resolution must transparently traverse links.
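
A minimal sketch of link-transparent resolution, assuming an in-memory `Map` stands in for a block store and made-up strings stand in for CIDs. The `Link` class and the `loadBlock` and `resolve` helpers are hypothetical names for illustration, not an IPLD API.

```javascript
// Hypothetical Link wrapper; real Links carry a CID, here just a string.
class Link {
  constructor(cid) {
    this.cid = cid;
  }
}

function loadBlock(store, cid) {
  if (!store.has(cid)) {
    throw new Error(`block ${cid} not found`);
  }
  return store.get(cid);
}

function resolve(store, rootCid, path) {
  let node = loadBlock(store, rootCid);
  for (const segment of path.split("/").filter(Boolean)) {
    if (node === null || typeof node !== "object") {
      throw new Error(`cannot step into a scalar with segment "${segment}"`);
    }
    if (!Object.prototype.hasOwnProperty.call(node, segment)) {
      throw new Error(`cannot resolve segment "${segment}"`);
    }
    node = node[segment];
    // Crossing a Link is invisible to the caller: load the target and go on.
    while (node instanceof Link) {
      node = loadBlock(store, node.cid);
    }
  }
  return node;
}

const blockStore = new Map([
  ["cid-foo", { baz: "something" }],
  ["cid-bar", { foo: new Link("cid-foo") }],
]);
console.log(resolve(blockStore, "cid-bar", "/foo/baz")); // prints something
```

The path `/foo/baz` reads the same whether `foo` is stored inline in the `cid-bar` block or in a separate block behind a Link.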
@@ -115,46 +175,4 @@ inline data into large objects (lots of duplication and copying).

## Primitives

The "recommended" IPLD format (currently DagCBOR) needs to support *at a minimum*:

* 32/64 bit integers without losing information.
* 32/64 bit floats without losing information.
* Unicode strings.
* Binary strings.
* Objects (with string keys, at least).
* Arrays.
* Booleans.
* A bottom type (null).

**Motivation:** Convenience, really.

## Non-Cyclic, Block-Local Relative Links

That is, relative links that don't traverse out the back of an object. See the
conclusions from: [#1](https://github.com/ipld/specs/issues/1).

**Motivation:** This is required to efficiently represent a highly connected DAG
of tiny nodes.

**Caveat:** This brings in some sticky issues around mutability. Depending on
the implementation, relative links within an object may be act like mutable
links (from the perspective of the user). The concern here is that we don't want
users to bundle nodes together into single block *because* they want this
mutability.

# To Do

Working through this, I realized we have a few things we really need to finish a few things before we can
call IPLD ready.

* **Path links.** Pointers that can only point to objects at block boundaries
are useful but severely gimped. We've been fine up till now because we
generally don't *edit* complicated datastructures but this will change.
([#83](https://github.com/ipld/specs/issues/83))
* **Slice links.** For the same reason, we really should support
`/ipld/QmID/start..stop` as a syntax for slicing an array. Most programming
languages support this so *not* supporting it would be a bit awkward.
([#84](https://github.com/ipld/specs/issues/84))
* **Link Spec.** We need to specify a complete and formal link spec and stick
with it.
* **Relative Links.** [#1](https://github.com/ipld/specs/issues/1).
See [the IPLD Data Model](/data-model-layer/data-model.md#kinds)