This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Iterate on the requirements doc.
We definitely need these concise definitions of goals (and nongoals;
if anything, I think we might do well to have more of those!), but we
have indeed refined a few things in the last 8 months or so.

This is mostly addressing the huge batch of comments I left in the
vicinity of #146 (review).
Particularly tricky is that some of our most interesting and useful
ideas over the past few months *relax* some of the rules -- and so
we need to carefully come up with a way to describe that: it is
mostly recontextualizations consistent with the same underlying
philosophies... but boy is that hard to express in text sometimes!
warpfork committed Jul 30, 2019
1 parent d859302 commit bea80f7
Showing 1 changed file with 100 additions and 82 deletions.
182 changes: 100 additions & 82 deletions REQUIREMENTS.md
@@ -1,27 +1,24 @@
# IPLD Requirements
# IPLD Cornerstones

This document outlines parts of IPLD that should and should not be changed to
ensure the success of future improvements (especially type systems).
ensure the success of future improvements and the continuity of direction.

**Definitions**

* Block: A block is a chunk of an IPLD DAG, encoded in a format. Blocks have CIDs.
* Fragment: A piece of an IPLD DAG. Blocks contain fragments.
* Node: A node is a *point* in an IPLD DAG (object, array, number, etc.).
* Link: A link is an IPLD Node that points to another IPLD Node.
* Path: A paths a human readable pointer to an IPLD Node.
Many nodes can exist encoded inside one Block.
* Link: A link is a kind of IPLD Node that points to another IPLD Node.
* Path: A path is composed of segments which each specify a step across an IPLD Node.

## Linked

IPLD must support linking to any IPLD node (even if the node is in the middle of
a block). That is, IPLD must support arbitrary IPLD paths in links.
The IPLD Data Model includes Links. A Link can be resolved to reach another IPLD Node.

**Motivation:** Considering this in the context of programming languages, not
being able to *store* a pointer to a struct *inside* of another struct would be
severely limiting.

NOTE: We don't currently support arbitrary paths but, in the context of
programming, we really need to.
**Motivation:** Linking makes it possible to build data structures which are
theoretically unbounded in size, while still being traversable, consistent,
authenticated and immutable. This unlocks the potential for a host of
decentralized applications and is part of IPLD's fundamental purpose.

## Immutable

@@ -31,54 +28,89 @@ IPLD but there needs to be an immutable layer at the bottom.
**Motivation:** *Having* an immutable layer is important for a lot of analysis,
memoization, type checking, etc.

## Multicodecs Are Not Types
## Multicodecs Are Not Meant to Act As Types

Multicodecs are used to indicate the format of data in a Block, and thus the
codec which transforms that serial data into a tree of Nodes conforming to the
IPLD Data Model. This is the limit of their purpose.

In particular, multicodecs should not be confused with a
[type system](https://en.wikipedia.org/wiki/Type_system).

It's impossible to understand IPLD data at a *structural* level if we don't know
the format. Therefore, we should avoid introducing new formats unnecessarily as
*every* IPLD implementation needs to support these new formats.
**Motivation:** It's impossible to understand IPLD data at a *structural* level
if we don't know the format. Therefore, multicodecs describe the format, and
we use this information to handle the transformation into the IPLD Data Model.
Beyond this, we don't want to use multicodecs further, because we should avoid
introducing new formats unnecessarily: *every* IPLD implementation needs to
support these new formats, and this is a burden it's preferable to minimize.

## No Non-Local Reasoning

An IPLD block should never be interpreted in the context of *anything* not
contained in the block (and CID).
Transforming content of a Block into the IPLD Data Model should never require
interpretation in the context of *anything* not contained in the Block plus CID.

For example, assuming we add support for relative links, the following
definition of `foo` would not be a valid IPLD block:
Similarly, traversing an IPLD Node according to a Path should not require
interpretation in the context of anything not already contained in that Node plus Path.

```
**Motivation:** IPLD needs to be easy to reason about.

**Negative Examples:**

```javascript
// This is an example of what is NOT possible.
var foo = {
// points outside of the current block, into the parent's "baz" field.
"baz": {"/": "../../baz"}
"baz": Link("../../zot") // NOT legal: makes a non-local reference.
}
var bar = {
"foo": CidOf(foo),
// `/foo/baz` points here.
"baz": "something"
"zot": "something" // `./foo/baz` imagines pointing here.
}

// resolution throug block `foo` depends on block `bar`.
// resolution through block `foo` depends on block `bar`...
Resolve("/ipld/${CidOf(bar)}/foo/baz/")

// meaning this would be undefined, which is why relative links are NOT allowed:
Resolve("/ipld/${CidOf(foo)}/baz/")
```

For the same reason, IPLD links can't rely on an authority (e.g., a blockchain).

Note: Links like this can still be encoded at the application level but they
won't be handled by the IPLD resolver (and won't get the special "link" type).
**Note:** Concepts that seem similar to relative linking can still be encoded
at the application level. This is fine, but distinct from "IPLD Links", because
such linking won't be interpreted by IPLD path and link resolution (e.g. they
won't get the special "link" type, and won't violate the constraints that the
IPLD Data Model expresses a DAG, etc).

**Motivation:** IPLD needs to be easy to reason about.
### Moving beyond local reasoning

**Caveat:**
The "no non-local reasoning" rule holds at the Data Model layer.
Some higher-level layers relax the rule.

We *may* want to relax this if we want to move schemas into separate,
deduplicated blocks (referenced by CID). If we do that, we'd need to fetch a
block's schema before being able to interpret the it.
For example, Advanced Data Layouts which split data across multiple blocks
de facto carry some logical information in mind as they wield their constituent
blocks: jumping into a HAMT mid-way through its trie with no context is unlikely
to make any semantic sense, even though the data can still be
parsed in terms of the Data Model.

However, we need to *thoroughly* discuss any changes to this requirement.
Schemas describe constraints around data and are typically applied over
a whole DAG which may span multiple Blocks, and are themselves usually
located in another Block (for ease of reference by CID). Schemas thus also
can be seen as using some forms of non-local reasoning.

1. The space savings may not be worth it given the size of CIDs (>40 bytes),
compression, smart transports, and smart datastores.
2. This change would introduce some weird interface complexities and potential
network dependencies.
Applications built on top of IPLD can also use their own contextual reasoning,
as described earlier in the relative linking example.

These are not contradictions of the "no non-local reasoning" rule; it's just
relaxed for these high-level systems, and the scope of "local" can be
understood more broadly.

Since we can always interpret blocks structurally (e.g., parse them at least to
the Data Model layer), even in data that's also meant to be used with
Advanced Data Layouts, Schemas, or other application logic that uses contextual
concepts, we can still have replication, hashing, DAG traversal,
and all the rest of the important promises of the IPLD Data Model regardless of
that other context. These systems are therefore purely value-add and do not
compromise any of the other core promises of IPLD.

## No Cycles

@@ -105,6 +137,34 @@ on top of IPLD.
**Motivation:** Deterministic computations on top of IPLD need to produce the
same result every time.

### Higher Level Pathing

The "stable pathing" rule holds at the Data Model layer.
Some higher-level layers relax the rule.

For example, Advanced Data Layouts operate by "feigning" an IPLD Node which
conforms with the Data Model specified behaviors in every way -- except that
they're internally implemented in some way that maps the Node content onto
Blocks in a more advanced way than the basic Data Model way. This means we
can "path" across an Advanced Data Layout that acts like a map or a list as
if it's a regular Node. We still aim for stable pathing: however, at this
layer, that stability now requires a fixed understanding of the Advanced Layout
logic itself.

Schemas describe data in terms of both semantic types and a representation
strategy, and in some cases the semantic type information contains a name
(such as a struct field name) even while the representation does not (such as
when a struct uses "tuple" representation, causing it to be transformed into
a list rather than a map when encoded). In these cases, we can "path" across
data interpreted in context of a Schema using the field names, even if at the
Data Model layer it's been represented as a list (and thus has indexes instead
of map keys corresponding to the field names). This kind of pathing can be
stable and predictable, but (as with the Advanced Data Layouts story), that
stability now requires more: holding the Schema declaration.

Note that regular, core Data Model still maintains stable pathing even in these
examples of higher level systems with alternative rules.
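
The Schema case above can be sketched in a few lines. This is only an
illustration under stated assumptions: the `pointSchema` object, its shape, and
both helper functions are hypothetical names invented for this example, not
part of any IPLD library or of the Schema specification.

```javascript
// Hypothetical schema description: a struct using "tuple" representation.
// The field order is the only thing that ties field names to list indexes.
const pointSchema = {
  kind: "struct",
  representation: "tuple",
  fields: ["x", "y", "label"],
};

// The same value as seen at the Data Model layer: a plain list.
const encoded = [3, 4, "origin-ish"];

// Data Model pathing: the segment is a list index.
function pathDataModel(node, segment) {
  return node[Number(segment)];
}

// Schema-level pathing: the segment is a field name, which is translated
// to a list index by consulting the schema declaration.
function pathWithSchema(schema, node, fieldName) {
  const index = schema.fields.indexOf(fieldName);
  if (index === -1) {
    throw new Error(`no field named ${fieldName}`);
  }
  return node[index];
}
```

Both `pathDataModel(encoded, "2")` and
`pathWithSchema(pointSchema, encoded, "label")` reach the same value. The
second form is only stable for a reader who holds the same `pointSchema`; the
first needs nothing beyond the data itself.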

## Link Transparent Pathing

Path resolution must transparently traverse links.
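
A minimal sketch of what this might look like, assuming an in-memory block
store holding already-decoded Data Model nodes. The `Link` class, `blocks` map,
`load`, and `resolve` are hypothetical names for illustration only, and the
CID strings are placeholders rather than real CIDs.

```javascript
// Hypothetical link representation, used only in this sketch.
class Link {
  constructor(cid) {
    this.cid = cid;
  }
}

// Hypothetical block store: placeholder CID string -> decoded node.
const blocks = new Map();
blocks.set("cid-foo", { baz: "something" });
blocks.set("cid-bar", { foo: new Link("cid-foo") });

function load(cid) {
  if (!blocks.has(cid)) {
    throw new Error(`block not found: ${cid}`);
  }
  return blocks.get(cid);
}

// Walk the path segments starting at the block named by rootCid.
// Whenever the current node is a Link, load its target before continuing,
// so the caller never has to know where block boundaries fall.
function resolve(rootCid, segments) {
  let node = load(rootCid);
  for (const segment of segments) {
    while (node instanceof Link) {
      node = load(node.cid);
    }
    const isContainer = node !== null && typeof node === "object";
    if (!isContainer || !Object.prototype.hasOwnProperty.call(node, segment)) {
      throw new Error(`cannot step to segment: ${segment}`);
    }
    node = node[segment];
  }
  // In this sketch, a path that ends on a Link yields the linked node.
  while (node instanceof Link) {
    node = load(node.cid);
  }
  return node;
}
```

Here `resolve("cid-bar", ["foo", "baz"])` crosses from one block into another
without the path mentioning the link at all. Each step uses only the current
node and the next segment.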
@@ -115,46 +175,4 @@ inline data into large objects (lots of duplication and copying).

## Primitives

The "recommended" IPLD format (currently DagCBOR) needs to support *at a minimum*:

* 32/64 bit integers without losing information.
* 32/64 bit floats without losing information.
* Unicode strings.
* Binary strings.
* Objects (with string keys, at least).
* Arrays.
* Booleans.
* A bottom type (null).

**Motivation:** Convenience, really.

## Non-Cyclic, Block-Local Relative Links

That is, relative links that don't traverse out the back of an object. See the
conclusions from: [#1](https://github.com/ipld/specs/issues/1).

**Motivation:** This is required to efficiently represent a highly connected DAG
of tiny nodes.

**Caveat:** This brings in some sticky issues around mutability. Depending on
the implementation, relative links within an object may be act like mutable
links (from the perspective of the user). The concern here is that we don't want
users to bundle nodes together into single block *because* they want this
mutability.

# To Do

Working through this, I realized we have a few things we really need to finish a few things before we can
call IPLD ready.

* **Path links.** Pointers that can only point to objects at block boundaries
are useful but severely gimped. We've been fine up till now because we
generally don't *edit* complicated datastructures but this will change.
([#83](https://github.com/ipld/specs/issues/83))
* **Slice links.** For the same reason, we really should support
`/ipld/QmID/start..stop` as a syntax for slicing an array. Most programming
languages support this so *not* supporting it would be a bit awkward.
([#84](https://github.com/ipld/specs/issues/84))
* **Link Spec.** We need to specify a complete and formal link spec and stick
with it.
* **Relative Links.** [#1](https://github.com/ipld/specs/issues/1).
See [the IPLD Data Model](/data-model-layer/data-model.md#kinds)
