This repository has been archived by the owner on Jun 2, 2020. It is now read-only.

New IPFS explainer #170

Merged · 79 commits · Aug 30, 2019

Commits
358130b
Initial commit
meiqimichelle May 20, 2019
e7dc7d4
Add FPO images for draft new content
meiqimichelle May 21, 2019
0a84f3e
Add v1 intro and DHT explainer
meiqimichelle May 21, 2019
e78c5c9
Another version of the new content, still WIP
meiqimichelle May 21, 2019
6e45808
Finished v1 new content
meiqimichelle May 21, 2019
ee6c025
Merge latest edits to upstream branch 'fix/copyedit-ipfs-intro' into …
meiqimichelle May 21, 2019
df75cd7
A few quick edits to give more instal options in getting started.
meiqimichelle May 21, 2019
9d27faa
Change order of intro sections. Update overview next steps to reflect…
meiqimichelle May 21, 2019
4d6a2f7
Remove extra 'complex'
meiqimichelle May 24, 2019
2947e3f
Removes unecessary 'you could also...'
meiqimichelle May 28, 2019
d9e44fe
Rework explanation of content addressed p2p network
meiqimichelle May 28, 2019
da44b2a
Identity --> identify
meiqimichelle May 28, 2019
606a39f
Rework hash paragraph based on Hector's suggestions
meiqimichelle May 28, 2019
71e281e
Copyedit hash interoperability paragraph
meiqimichelle May 28, 2019
9e50177
Replace all dumb quotes with smart quotes
meiqimichelle May 28, 2019
fb19cf8
Rework IPFS linked data paragraph with some Oli style
meiqimichelle May 28, 2019
76d71d8
Rework and accept edits to first DAG paragraph
meiqimichelle May 28, 2019
2f88691
Combine Hector/Oli edits to DAG sharing
meiqimichelle May 28, 2019
af8ccfe
Rework DHT paragraph
meiqimichelle May 28, 2019
288e2d3
Incorporates edits to multiplexing paragraph
meiqimichelle May 28, 2019
3092c45
Add and edit changes to why multiplexing is useful paragraph
meiqimichelle May 28, 2019
1478b6c
Fix link to Aardvark
meiqimichelle May 28, 2019
8eeffd2
Fix IPFS link to Aardvark
meiqimichelle May 28, 2019
24fd26d
Add parenthetical reference to cray cray hash
meiqimichelle May 28, 2019
edd891d
Pulls two new copyedit commits into local from remote
meiqimichelle May 28, 2019
7d6d867
Fix Aardvark link
meiqimichelle May 28, 2019
829f40e
Fix Aardvark link
meiqimichelle May 28, 2019
86cfc6e
Edit to IPFS IPLD paragraph
meiqimichelle May 28, 2019
34433b6
Makes hash sentence cleaner
meiqimichelle May 28, 2019
decff14
Server-client --> client-server
meiqimichelle May 28, 2019
241d392
Incorpoartes Hector's edits to summary
meiqimichelle May 28, 2019
2a61c70
Incorporate Oli edits to DAG structure
meiqimichelle May 28, 2019
fb3746f
Incorporate Oli's edits to chunking
meiqimichelle May 28, 2019
1fd3067
Incorporate Oli edit to recap re: DAG and IPLD
meiqimichelle May 28, 2019
2325c00
More dumb quotes --> smart quotes.
meiqimichelle May 28, 2019
abf146c
Rework IPLD paragraph to focus more on 'traverse' rather than 'transl…
meiqimichelle May 28, 2019
e71cc55
Incorporate Hector's edits to chunking paragraph
meiqimichelle May 28, 2019
77972dc
Remove references to network stack; no longer helpful
meiqimichelle May 28, 2019
e655dc0
Remove first heading. Improve CID section.
meiqimichelle May 28, 2019
efc537b
Initial improvements to DAG section
meiqimichelle May 28, 2019
aad4a53
Add concept doc on Merkle-DAGs. Words from Merkle-CRDT paper.
meiqimichelle May 28, 2019
12a1b5a
Reworks DAG section.
meiqimichelle May 29, 2019
0b1ebe2
Final copyedit before further review
meiqimichelle May 29, 2019
8041639
Comments out placeholder images. They are distracting.
meiqimichelle May 29, 2019
d716671
Fix link in content/introduction/usage.md
lidel Jun 3, 2019
2aa10f5
Remove commented out images. They're not helpful, and we won't be usi…
meiqimichelle Aug 19, 2019
0f7942a
Changes Merkle-DAG example from comma to website file, as per @olizilla
meiqimichelle Aug 19, 2019
4e758be
A few edits to the bit that says database
meiqimichelle Aug 19, 2019
538337e
IPFS project --> libp2p as per @hsanjuan
meiqimichelle Aug 19, 2019
3756a1a
Adds Merkle to last paragraph
meiqimichelle Aug 19, 2019
659fa2d
Title --> often title
meiqimichelle Aug 19, 2019
b4f174e
Connectivity --> connection, as per @momack2
meiqimichelle Aug 19, 2019
9c414cd
Provides better link to Merkle-DAG paper, as per @lanzafame
meiqimichelle Aug 19, 2019
0074a9f
Remove some of the most informal words and structures. Too many excla…
meiqimichelle Aug 19, 2019
ce5b46a
IPLD link --> translate
meiqimichelle Aug 19, 2019
1482617
Remove efficiency claim from block paragraph
meiqimichelle Aug 19, 2019
9995c72
Add info on bitswap, ht @momack2 and @hsanjuan
meiqimichelle Aug 19, 2019
788b3ca
Simplify first sentence
meiqimichelle Aug 19, 2019
0d426c4
Spelling fix and much more --> more
meiqimichelle Aug 19, 2019
34c5379
Add link to DNSLink concept guide
meiqimichelle Aug 19, 2019
a4baa75
Add sentence about not being able to remove content from current web
meiqimichelle Aug 19, 2019
b217dd6
Edit to finding content sentence to make it sound less like you'll be…
meiqimichelle Aug 19, 2019
e61efaf
Your --> the, and removes more exclamation points
meiqimichelle Aug 19, 2019
b11dc2c
Remove TODO re expanding modularization section for now
meiqimichelle Aug 19, 2019
315b330
Comments out final paragraph for now, because we haven't written thos…
meiqimichelle Aug 19, 2019
b32647d
Merge branch 'master' into feature/new-ipfs-explainer
meiqimichelle Aug 19, 2019
3415430
Quick copyedit of overview.md to soften some statements, as per @hsan…
meiqimichelle Aug 19, 2019
915a3fb
Merge branch 'feature/new-ipfs-explainer' of https://github.com/ipfs/…
meiqimichelle Aug 19, 2019
5234572
Removes fun
meiqimichelle Aug 19, 2019
95a3ab5
Update overview.md
jessicaschilling Aug 30, 2019
3685309
Update how-ipfs-works.md
jessicaschilling Aug 30, 2019
544aa0f
Update how-ipfs-works.md
jessicaschilling Aug 30, 2019
76add37
Update overview.md
jessicaschilling Aug 30, 2019
a29da06
Update overview.md
jessicaschilling Aug 30, 2019
672126b
Update how-ipfs-works.md
jessicaschilling Aug 30, 2019
eb2d6bf
Update how-ipfs-works.md
jessicaschilling Aug 30, 2019
16d3b36
Update how-ipfs-works.md
jessicaschilling Aug 30, 2019
72b8e8c
Update how-ipfs-works.md
jessicaschilling Aug 30, 2019
864a892
Update how-ipfs-works.md
jessicaschilling Aug 30, 2019
8 changes: 2 additions & 6 deletions content/_index.md
@@ -5,19 +5,15 @@ title: IPFS Documentation

Welcome to the IPFS documentation portal! Whether you’re just learning about IPFS or are looking for detailed reference information, this is the place to start. You might have noticed that IPFS is a project with a big scope — and a *lot* of different tools, sites, and code.

Here's an overview of what you'll find in our documentation:
Here’s an overview of what you’ll find in our documentation:

## Introduction

Head over to the [introduction](/introduction) section to learn about the basics of IPFS. There are also instructions on how to install IPFS, and tips on basic IPFS usage.

## Guides

IPFS is a complex system that hopes to change how we use the internet, so it comes with many new concepts. The guides section has an overview of major [concepts](/guides/concepts) in IPFS (including terms and ideas associated with distributed file systems generally), and guides for specific IPFS use cases. The examples section is home to a number of [basic examples](/guides/examples) of ways to use the IPFS ecosystem, including:

* A simple [how-to on pinning](/guides/examples/pinning)
* Instructions for [making your own IPFS service](/guides/examples/api/service/readme)
* A guide to [hosting your website](/guides/examples/websites)
IPFS is a system that hopes to change how we use the internet, so it comes with many new concepts. The guides section has an overview of major [concepts](/guides/concepts) in IPFS (including terms and ideas associated with distributed file systems generally), guides for specific IPFS use cases, and example projects demonstrating various ways to use the IPFS ecosystem.

For detailed guidance on select topics, try out the interactive tutorials at [ProtoSchool](https://proto.school). You can learn about the decentralized web by solving code challenges.

29 changes: 29 additions & 0 deletions content/guides/concepts/merkle-DAG.md
@@ -0,0 +1,29 @@
---
title: "Merkle-DAGs"
menu:
guides:
parent: concepts
---

A _Directed Acyclic Graph_ (DAG) is a type of graph in which edges have direction and cycles are not allowed. For example, a linked list like _A→B→C_ is an instance of a DAG where _A_ references _B_ and so on. We say that _B_ is _a child_ or _a descendant of A_, and that _node A has a link to B_. Conversely, _A_ is a _parent of B_. We call nodes that are not children to any other node in the DAG _root nodes_.

A Merkle-DAG is a DAG where each node has an identifier and this is the result of hashing the node’s contents — any opaque payload carried by the node and the list of identifiers of its children — using a cryptographic hash function like SHA256. This brings some important considerations:

1. Merkle-DAGs can only be constructed from the leaves, that is, from nodes without children. Parents are added after children because the children’s identifiers must be computed in advance to be able to link them.
1. Every node in a Merkle-DAG is the root of a (sub)Merkle-DAG itself, and this subgraph is _contained_ in the parent DAG[9].
1. Merkle-DAG nodes are _immutable_. Any change in a node would alter its identifier and thus affect all the ancestors in the DAG, essentially creating a different DAG. Take a look at [this helpful illustration using bananas](https://media.consensys.net/ever-wonder-how-merkle-trees-work-c2f8b7100ed3) from our friends at Consensys.

Identifying a data object (like a Merkle-DAG node) by the value of its hash is referred to as _content addressing_. Thus, we call the node identifier a _Content Identifier_, or CID.

For example, the previous linked list, assuming that the payload of each node is just the CID of its descendant, would be: _A=Hash(B)→B=Hash(C)→C=Hash(∅)_. The properties of the hash function ensure that no cycles can exist when creating Merkle-DAGs[10].
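The construction above can be sketched in a few lines of Python. This is a toy illustration only: it hashes a node's payload together with its children's identifiers using SHA256, and is not the real IPFS CID format or node encoding.

```python
import hashlib

def cid(payload: bytes, child_cids=()):
    """Toy content identifier: SHA-256 over the node's payload
    plus the CIDs of its children (not IPFS's real CID format)."""
    h = hashlib.sha256()
    h.update(payload)
    for c in child_cids:
        h.update(c.encode())
    return h.hexdigest()

# Build the linked list C <- B <- A from the leaf up: children
# must be hashed before their parents can link to them.
cid_c = cid(b"C")            # leaf: no children
cid_b = cid(b"B", [cid_c])   # B links to C
cid_a = cid(b"A", [cid_b])   # A links to B

# Immutability: changing C's payload changes every ancestor's CID,
# effectively producing a different DAG.
cid_c2 = cid(b"C-modified")
cid_b2 = cid(b"B", [cid_c2])
cid_a2 = cid(b"A", [cid_b2])
assert cid_a != cid_a2
```

Note how the code mirrors point 1 above: the leaf is hashed first, because a parent's identifier cannot be computed until its children's identifiers exist.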

Merkle-DAGs are _self-verified_ structures. The CID of a node is univocally linked to the contents of its payload and those of all its descendants. Thus two nodes with the same CID univocally represent exactly the same DAG. This is a key property for efficiently syncing Merkle-CRDTs without having to copy the full DAG, as exploited by systems like IPFS. Merkle-DAGs are very widely used. Source control systems like Git [11] and others [6] use them to efficiently store the repository history, in a way that enables de-duplicating the objects and detecting conflicts between branches.

_Excerpted from Merkle-CRDT draft paper by @hsanjuan, @haadcode, and @pgte. Available: https://hector.link/presentations/merkle-crdts/merkle-crdts.pdf_


### Footnotes

[6] Merkle-DAGs are similar to Merkle Trees [20] but there are no balance requirements and every node can carry a payload. In DAGs, several branches can re-converge or, in other words, a node can have several parents.

[10] Hash functions are one-way functions. Creating a cycle should then be impossibly difficult, unless some weakness is discovered and exploited.
Binary file added content/introduction/assets/ipfs_stack-apps.png
Binary file added content/introduction/assets/ipfs_stack-data.png
Binary file added content/introduction/assets/ipfs_stack.png
75 changes: 75 additions & 0 deletions content/introduction/how-ipfs-works.md
@@ -0,0 +1,75 @@
---
title: How IPFS Works
weight: 2
---

IPFS is a peer-to-peer (p2p) storage network. Content is accessible through peers that might relay information or store it (or do both), and those peers can be located anywhere in the world. IPFS knows how to find what you ask for by its content address, rather than where it is.

## There are three important things to understand about IPFS

Let’s first look at _content addressing_ and how that content is _linked together_. This “middle” part of the IPFS stack is what connects the ecosystem together; everything is built on being able to find content via linked, unique identifiers.

### 1 \\ Content addressing and linked data

IPFS uses _content addressing_ to identify content by what’s in it, rather than by where it’s located. Looking for an item by content is something you already do all the time. For example, when you look for a book in the library, you often ask for it by the title; that’s content addressing because you’re asking for **what** it is. If you were using location addressing to find that book, you’d ask for it by **where** it is: “I want the book that’s on the second floor, first stack, third shelf from the bottom, four books from the left.” If someone moved that book, you’d be out of luck!

It’s the same on the internet and on your computer. Right now, content is found by location, such as…

- `https://en.wikipedia.org/wiki/Aardvark`
- `/Users/Alice/Documents/term_paper.doc`
- `C:\Users\Joe\My Documents\project_sprint_presentation.ppt`

By contrast, every piece of content that uses the IPFS protocol has a [*content identifier*]({{<relref "guides/concepts/cid.md">}}), or CID, that is its *hash*. The hash is unique to the content that it came from, even though it may look short compared to the original content. _If hashes are new to you, check out [the concept guide on hashes]({{<relref "guides/concepts/hashes.md">}}) for an introduction._
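As a rough illustration, here is a toy content address computed with Python's standard `hashlib`. Real IPFS CIDs add multihash and multibase prefixes on top of the raw digest, so this is only the core idea, not the actual format:

```python
import hashlib

def toy_cid(content: bytes) -> str:
    # Hash the bytes; the address depends only on what the content
    # is, never on where it is stored.
    return hashlib.sha256(content).hexdigest()

a = toy_cid(b"Hello from IPFS!")
b = toy_cid(b"Hello from IPFS!")
c = toy_cid(b"hello from IPFS!")

assert a == b        # same content, same address
assert a != c        # any change yields a different address
assert len(a) == 64  # fixed-length digest regardless of content size
```

Because the address is derived from the content itself, anyone who fetches the bytes can re-hash them and confirm they got exactly what they asked for.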

Content addressing through hashes has become a widely-used means of connecting data in distributed systems, from the commits that back your code to the blockchains that run cryptocurrencies. However, the underlying data structures in these systems are not necessarily interoperable.

This is where the [IPLD project](https://ipld.io/) comes in. **Hashes identify content, and IPLD translates between data structures**. Since different distributed systems structure their data in different ways, IPLD provides libraries for combining pluggable modules (parsers for each possible type of IPLD node) to resolve a path, selector, or query across many linked nodes (allowing you to explore data regardless of the underlying protocol). IPLD provides a way to translate between content-addressable data structures: “Oh you use Git-style, no worries, I can follow those links. Oh you use Ethereum, I got you, I can follow those links too!”

The IPFS protocol uses IPLD to get from raw content to an IPFS address. IPFS has its own preferences and conventions about how data should be broken up into a DAG (more on DAGs below!); IPLD links content on the IPFS network together using those conventions.

**Everything else in the IPFS ecosystem builds on top of this core concept: linked, addressable content is the fundamental connecting element that makes the rest work.**

### 2 \\ IPFS turns files into DAGs

IPFS and many other distributed systems take advantage of a data structure called [directed acyclic graphs](https://en.wikipedia.org/wiki/Directed_acyclic_graph), or DAGs. Specifically, they use _Merkle-DAGs_, which are DAGs where each node has an identifier that is a hash of the node’s contents. Sound familiar? This refers back to the _CID_ concept that we covered in the previous section. Another way to look at this CID-linked-data concept: identifying a data object (like a Merkle-DAG node) by the value of its hash is _content addressing_. _(Check out [the concept guide on Merkle-DAGs]({{<relref "guides/concepts/merkle-DAG.md">}}) for a more in-depth treatment of this topic.)_

IPFS uses a Merkle-DAG that is optimized for representing directories and files, but you can structure a Merkle-DAG in many different ways. For example, Git uses a Merkle-DAG that has many versions of your repo inside of it.

To build a Merkle-DAG representation of your content, IPFS often first splits it into _blocks_. Splitting it into blocks means that different parts of the file can come from different sources, and be authenticated quickly. (If you've ever used BitTorrent, you may have noticed that when you download a file, BitTorrent can fetch it from multiple peers at once; this is the same idea.)
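A minimal sketch of the chunking idea, assuming simple fixed-size blocks (IPFS uses much larger, and sometimes content-defined, chunk sizes; the tiny size here is just to keep the example readable):

```python
import hashlib

def chunk(data: bytes, size: int = 4):
    """Split data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def block_cids(data: bytes):
    # Hash each block independently: every block gets its own toy CID,
    # so each one can be fetched from a different peer and verified
    # on arrival without waiting for the whole file.
    return [hashlib.sha256(b).hexdigest() for b in chunk(data)]

cids = block_cids(b"hello world, hello ipfs!")
```

The list of block CIDs is what a parent DAG node would link to, which is how a single file becomes a small Merkle-DAG of its own.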

Merkle-DAGs are a bit of a [“turtles all the way down”](https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Turtles_all_the_way_down.html) scenario; that is, **everything** has a CID. You’ve got a file that has a CID. What if there are several files in a folder? That folder has a CID, and that CID contains the CIDs of the files underneath. In turn, those files are made up of blocks, and each of those blocks has a CID. You can see how a file system on your computer could be represented as a DAG. You can also see, hopefully, how Merkle-DAG graphs start to form. For a visual exploration of this concept, take a look at our [IPLD Explorer](https://explore.ipld.io/#/explore/QmSnuWmxptJZdLJpKRarxBMS2Ju2oANVrgbr2xWbie9b2D).

Another useful feature of Merkle-DAGs and breaking content into blocks is that if you have two similar files, they can share parts of the Merkle-DAG; i.e., parts of different Merkle-DAGs can reference the same data. For example, if you update a website, only the files that changed will get new content addresses. Your old version and your new version can refer to the same blocks for everything else. This can make transferring versions of large datasets (such as genomics research or weather data) more efficient because you only need to transfer the parts that are new or have changed instead of creating entirely new files each time.
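The deduplication property can be shown with the same toy block-hashing idea; the block size and data here are made up for the example:

```python
import hashlib

def block_cids(data: bytes, size: int = 4):
    # Split into fixed-size blocks and hash each one.
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    return [hashlib.sha256(b).hexdigest() for b in blocks]

v1 = block_cids(b"AAAABBBBCCCC")
v2 = block_cids(b"AAAABBBBDDDD")  # only the last block changed

# The unchanged blocks hash to the same CIDs in both versions,
# so two DAGs can simply point at the same stored blocks.
shared = set(v1) & set(v2)
assert len(shared) == 2
```

Only the third block would need to be transferred or stored anew when moving from version 1 to version 2.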


### 3 \\ The DHT

So, to recap, IPFS lets you give CIDs to content, and link that content together in a Merkle-DAG using IPLD. Now let’s move on to the last piece: how you find and move content.

To find which peers are hosting the content you’re after (_discovery_), IPFS uses a [_distributed hash table_](https://en.wikipedia.org/wiki/Distributed_hash_table), or DHT. A hash table is a database of keys to values. A _distributed_ hash table is one where the table is split across all the peers in a distributed network. To find content, you ask these peers.
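A toy sketch of the placement rule behind such a table, assuming a Kademlia-style XOR distance metric (the peer names and key below are hypothetical, and a real DHT involves iterative routing between many nodes):

```python
import hashlib

def h(s: str) -> int:
    """Hash a string to a big integer (stand-in for peer and key IDs)."""
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

peers = ["peer-a", "peer-b", "peer-c"]

def responsible_peer(key: str) -> str:
    # Toy placement rule: the peer whose hashed ID is closest to the
    # key's hash by XOR distance stores (or knows about) that key.
    return min(peers, key=lambda p: h(p) ^ h(key))

# Every node applies the same rule, so any node can work out which
# peer to ask about a given key without consulting a central index.
who = responsible_peer("some-content-cid")
assert who in peers
```

The important point is that the rule is deterministic: every participant computes the same answer from the key alone, which is what makes the table "distributed" rather than centrally indexed.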

The [libp2p project](https://libp2p.io/) is the part of the IPFS ecosystem that provides the DHT and handles peers connecting and talking to each other. (Note that, as with IPLD, libp2p can also be used as a tool for other distributed systems, not just IPFS.)

Once you know where your content is (i.e., which peer or peers are storing each of the blocks that make up the content you’re after), you use the DHT **again** to find the current location of those peers (_routing_). So, in order to get to content, you use libp2p to query the DHT twice.

You’ve discovered your content, and you’ve found the current location(s) of that content — now you need to connect to that content and get it (_exchange_). To request blocks from and send blocks to other peers, IPFS currently uses a module called [_Bitswap_](https://github.com/ipfs/specs/tree/master/bitswap). Bitswap allows you to connect to the peer or peers that have the content you want, send them your _wantlist_ (a list of all the blocks you're interested in), and have them send you the blocks you requested. Once those blocks arrive, you can verify them by hashing their content to get CIDs. (These CIDs also allow you to deduplicate blocks if needed.)
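A toy sketch of the wantlist exchange, with made-up block contents. Real Bitswap is a wire protocol with sessions, message framing, and peer strategies; this just shows the request-verify loop described above:

```python
import hashlib

def cid(block: bytes) -> str:
    # Toy CID: a plain SHA-256 digest of the block.
    return hashlib.sha256(block).hexdigest()

# A peer's local store: CID -> block bytes (contents are made up).
peer_store = {cid(b"block-1"): b"block-1",
              cid(b"block-2"): b"block-2"}

def serve_wantlist(store, wantlist):
    """Return whichever requested blocks this peer actually has."""
    return {c: store[c] for c in wantlist if c in store}

# We want two blocks; this peer only has one of them.
wantlist = [cid(b"block-1"), cid(b"block-3")]
received = serve_wantlist(peer_store, wantlist)

# Verify each arriving block by re-hashing it: the CID doubles as
# an integrity check, so no trust in the sender is required.
for c, block in received.items():
    assert cid(block) == c
```

Blocks the peer could not supply stay on the wantlist and can be requested from other peers, which is how a download naturally spreads across multiple sources.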

There are [other content replication protocols under discussion](https://github.com/ipfs/camp/blob/master/DEEP_DIVES/24-replication-protocol.md) as well, the most developed of which is [_Graphsync_](https://github.com/ipld/specs/blob/master/block-layer/graphsync/graphsync.md). There's also a proposal under discussion to [extend the Bitswap protocol](https://github.com/ipfs/go-bitswap/issues/186) to add functionality around requests and responses.

#### A note on libp2p

What makes libp2p especially useful for peer to peer connections is _connection multiplexing_. Traditionally, every service in a system would open a different connection to remotely communicate with other services of the same kind. Using IPFS, you open just one connection, and you multiplex everything on that. For everything your peers need to talk to each other about, you send a little bit of each thing, and the other end knows how to sort those chunks where they belong.

This is useful because connections are usually hard to set up and expensive to maintain. With multiplexing, once you have that connection, you can do whatever you need on it.
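The framing idea behind multiplexing can be sketched as tagging each chunk with a stream ID so that many logical streams share one connection (libp2p negotiates real stream muxers to do this; the stream names here are hypothetical):

```python
# Toy multiplexer: interleave chunks from several logical streams
# into one sequence of (stream_id, chunk) frames, as if they were
# all travelling over a single connection.
def mux(streams):
    frames = []
    for stream_id, chunks in streams.items():
        for chunk in chunks:
            frames.append((stream_id, chunk))
    return frames

# The receiving end sorts frames back into their streams by tag.
def demux(frames):
    out = {}
    for stream_id, chunk in frames:
        out.setdefault(stream_id, []).append(chunk)
    return out

streams = {"dht": [b"query"], "bitswap": [b"want", b"block"]}
assert demux(mux(streams)) == streams
```

The round trip losing nothing is the whole point: each protocol gets its own logical stream while the peers pay the setup cost of only one connection.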


## And everything is modular

As you may have noticed from this discussion, the IPFS ecosystem is made up of many modular libraries that support specific parts of any distributed system. You can certainly use any part of the stack independently, or combine them in novel ways.


## Summary

The IPFS ecosystem gives CIDs to content, and links that content together by generating IPLD Merkle-DAGs. You can discover content using a DHT that's provided by libp2p, and open a connection to any provider of that content and download it using a multiplexed connection. All of this is held together by the “middle” of the stack, which is linked, unique identifiers; that's the essential part that IPFS is built on.

<!--Next, we’ll look at how IPFS is an interconnected network of equal peers, each with the same abilities (no client-server relationships), and what that means for system architectures. We’ll also touch on another useful project in the ecosystem -- IPFS Cluster -- that can help make sure your content is always available, even on a network like IPFS that supports peers dropping in and out at will.-->