New Tutorial: CAR Files (IPLD / Filecoin) #412

terichadbourne · 2020-03-27T14:44:23Z

In brainstorming ProtoSchool content that would be useful for the Filecoin project, @mishmosh suggested a multiple-choice tutorial on CAR Files, more or less an IPLD version of zip files, optimized for flattening. They're important in this context because Filecoin only stores flat files, not objects. CAR files have a header with all the CIDs they contain and make it easier to search for a subset of data you want. The spec lives in IPLD, and both Filecoin implementations use it.

Topics for a tutorial might include:

what a CAR file is
how it's constructed from hashing
what it's composed of
what's contained in the CAR header
how you search more easily with IPLD selectors (search criteria)

My understanding is that @rvagg @whyrusleeping @Stebalien might be among the experts on this topic, so I'd love feedback on whether this would be a useful topic, common misconceptions we should try to address, etc. Thanks!

terichadbourne · 2020-03-27T21:10:52Z

@mikeal you just mentioned this in another channel:

both go-ipfs and js-ipfs have open PRs for an import/export that works with CAR files
probably want to center the tutorial around that

ipfs/js-ipfs#2953
Will these tools be used by Filecoin folks or just IPFS folks?

We could certainly present this as an IPLD tutorial and reference it from multiple other projects, just asking because the original suggestion came from the Filecoin team.

rvagg · 2020-03-30T04:59:54Z

a header with all the CIDs they contain and make it easier to search for a subset of data you want

Sadly, no. The header only contains a "roots" array that specifies the tips of the DAGs the file may contain; it doesn't list every CID in the file.
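To make the distinction concrete, here is a minimal conceptual model in Python. It is not the on-disk encoding (a real CARv1 header is DAG-CBOR and CIDs are binary), and the placeholder strings like `"cid-root"` are hypothetical. It only shows that the header can name the roots, so checking for an arbitrary block means scanning the block sections:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CarHeader:
    version: int
    roots: list[str]  # CIDs of the DAG root(s) only, not every block in the file


@dataclass
class CarFile:
    header: CarHeader
    # (cid, data) pairs in the order they appear after the header
    sections: list[tuple[str, bytes]] = field(default_factory=list)


def contains(car: CarFile, cid: str) -> bool:
    """Check whether a block is present in the file.

    The header can't answer this, so walk every section in order.
    """
    return any(section_cid == cid for section_cid, _ in car.sections)


car = CarFile(
    header=CarHeader(version=1, roots=["cid-root"]),  # hypothetical placeholder CIDs
    sections=[("cid-root", b"root node"), ("cid-child", b"child node")],
)
print("cid-child" in car.header.roots)  # False: the header only names the root
print(contains(car, "cid-child"))       # True: found by scanning the sections
```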

Resources:

Spec https://github.com/ipld/specs/blob/master/block-layer/content-addressable-archives.md
Go: https://github.com/ipld/go-car
JS: https://github.com/ipld/js-datastore-car
go-ipfs import/export (that's driving how the feature will work for js-ipfs too): https://github.com/ipfs/go-ipfs/pulls/ribasushi
js-ipfs (as above): (WIP) feat: dag import and export to and from CAR files ipfs/js-ipfs#2953

Will these tools be used by Filecoin folks or just IPFS folks?

That's a good question that we may have to wait to answer. I think once people have the ability to export whole DAGs from IPFS, a bunch of interesting use cases open up, well beyond Filecoin. For example, in that js-ipfs PR I showed the XKCD archive being exported into a single file; that kind of fully offline archiving has some interesting uses. Maybe taking a CAR backup and putting it into cold storage should become part of the normal workflow for people publishing websites to IPFS, rather than just hoping their IPFS daemon keeps the content properly pinned? We've also looked at the possibility of using this format as the basis of a generic block database store. The lack of built-in indexing makes that a bit tricky, but building an external index of block positions and then reading those blocks directly out is something we can already do in Go and JS.
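A rough Python sketch of the external-index idea, assuming the CARv1 layout from the spec linked above: a varint-length-prefixed header, then sections of varint(len(CID) + len(data)), CID, data. To keep it self-contained, the header is treated as opaque bytes rather than DAG-CBOR, only raw-codec sha2-256 CIDv1s are generated, and names like `index_blocks` and `read_block` are hypothetical, not the go-car or js-datastore-car API:

```python
from __future__ import annotations

import hashlib


def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used for CAR length prefixes."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at pos; return (value, position after the varint)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def cid_length(buf: bytes, pos: int) -> int:
    """Byte length of the binary CID starting at pos."""
    if buf[pos] == 0x12 and buf[pos + 1] == 0x20:
        return 34  # CIDv0: a bare sha2-256 multihash
    start = pos
    _version, pos = decode_uvarint(buf, pos)
    _codec, pos = decode_uvarint(buf, pos)
    _mh_code, pos = decode_uvarint(buf, pos)
    digest_len, pos = decode_uvarint(buf, pos)
    return pos + digest_len - start


def make_raw_cid(data: bytes) -> bytes:
    """CIDv1 (0x01), raw codec (0x55), sha2-256 multihash (0x12, 32-byte digest)."""
    return bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(data).digest()


def build_toy_car(header: bytes, blocks: list[bytes]) -> bytes:
    """Write a header plus one section per block (header kept opaque here)."""
    out = bytearray(encode_uvarint(len(header)) + header)
    for data in blocks:
        cid = make_raw_cid(data)
        out += encode_uvarint(len(cid) + len(data)) + cid + data
    return bytes(out)


def index_blocks(car: bytes) -> dict[bytes, tuple[int, int]]:
    """Map each CID to (offset, length) of its data, in one pass over the file."""
    header_len, pos = decode_uvarint(car, 0)
    pos += header_len  # skip the header entirely
    index: dict[bytes, tuple[int, int]] = {}
    while pos < len(car):
        section_len, body_start = decode_uvarint(car, pos)
        clen = cid_length(car, body_start)
        cid = car[body_start:body_start + clen]
        index[cid] = (body_start + clen, section_len - clen)
        pos = body_start + section_len
    return index


def read_block(car: bytes, index: dict[bytes, tuple[int, int]], cid: bytes) -> bytes:
    """Read a block directly by position, without rescanning the file."""
    offset, length = index[cid]
    return car[offset:offset + length]
```

Once the index is built, each lookup is a direct slice of the file; that is the "read those blocks directly out" step, with the index kept outside the CAR itself.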

mikeal · 2020-04-02T19:36:14Z

Will these tools be used by Filecoin folks or just IPFS folks?

A few months ago I’d have said “Filecoin”, but just in the last month we’ve seen the format gain traction across projects. I think it’s going to be a widely used format across our stack. It provides a very simple way to transfer data that should be useful even between IPFS instances.

mishmosh · 2020-04-03T17:17:19Z

+1 on presenting this as a general lesson. Learning about CARs will be helpful to Filecoin users either way, but I'm not stuck on labelling it as such.

The course outline originally proposed by @terichadbourne is really nice:

what a CAR file is
how it's constructed from hashing
what it's composed of
what's contained in the CAR header
how you search more easily with IPLD selectors (search criteria)

A few more suggestions:

At the beginning, why you might want a CAR file ("to serialize, aka flatten, graph-based data"?)
At the end, what systems currently use CAR files

terichadbourne · 2020-07-16T21:19:49Z

@terichadbourne @mikeal @ribasushi @rvagg @mishmosh met on Jun 30 to discuss the most appropriate content angle and staffing plan for this tutorial (full notes here).

Since we spoke, @mishmosh @har00ga @terichadbourne had a follow-up on staffing resources.

Key takeaways:

Primary focus for tutorial
How Merkle trees and DAGs make your data portable across different systems or networks
(CAR files aren't the key focus of the tutorial, just a means described for exporting from Filecoin to IPFS)

Format
Multiple-choice

Target audience
A person using IPFS for something and wondering how to get data into and out of Filecoin

Stuff to cover

What are Merkle trees and DAGs, and why should we care? Move the end part of Decentralized Data Structures into this new tutorial (https://proto.school/#/data-structures/05), adding context as needed.
Present CAR files as a format for transfer of DAGs between technologies (for example, lotus and go-ipfs)
The most approachable path to getting data into Filecoin / Lotus doesn't involve thinking about CAR files. Thanks to recent improvements from Textile, you can just start with an IPFS CID. (Don't also hand the same file to Lotus and have it recheck it, because you'll mess up the CID, possibly due to different chunk sizes?)
You care more about CAR files on the other end when retrieving data from Filecoin. Would be useful to teach how to take data out of Filecoin and put it in an IPFS node.
Everything in IPFS is a DAG, and a CAR file holds a DAG instead of a directory. When you care about the DAG, you might want to use a CAR file instead of a directory.
When you give Lotus a file to make a deal for, it turns it into a UnixFS file, which becomes the root. There's always one file at the top of the DAG. This works well for a single file: it doesn't need selectors, you don't need to assemble a CAR file, and Lotus will spit out the original file when you retrieve it.
When you store a directory in Filecoin (give it more than one file, or give it a CID for a graph in the IPFS network), you have to export it into a CAR file when retrieving. You can't put it into a regular file because the root of the DAG isn't a file. The root CID doesn't correspond to a single file.
Anyone exporting from standard Filecoin Lotus to IPFS without an intermediary will want to use the flag that makes Lotus export a CAR file, which can then be imported directly into IPFS.
Include some code samples without using coding challenges (see this example for a similar multiple-choice lesson)
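The notes above say that a file's CID depends on how it was chunked, that a directory's root CID doesn't correspond to a single file, and that a CAR file carries a whole DAG. A toy Python model can illustrate all three. It assumes plain SHA-256 hex ids and a made-up node layout, not UnixFS, dag-pb, or real CIDs, and every function name in it is hypothetical:

```python
import hashlib


def put(store: dict, payload: bytes, links: list) -> str:
    """Store a node; its id hashes its payload and its children's ids."""
    h = hashlib.sha256()
    h.update(len(payload).to_bytes(8, "big"))  # length prefix avoids ambiguity
    h.update(payload)
    for link in links:
        h.update(bytes.fromhex(link))
    node_id = h.hexdigest()
    store[node_id] = (payload, list(links))
    return node_id


def add_file(store: dict, data: bytes, chunk_size: int) -> str:
    """Split a file into leaf chunks under a single file root node."""
    leaves = [put(store, data[i:i + chunk_size], [])
              for i in range(0, len(data), chunk_size)]
    return put(store, b"file", leaves)


def add_directory(store: dict, files: dict) -> str:
    """A directory node links to file roots; the names live in its payload."""
    names = sorted(files)
    payload = ("dir:" + ",".join(names)).encode()
    return put(store, payload, [files[name] for name in names])


def export_dag(store: dict, root: str) -> list:
    """Collect every node reachable from root, root first (what a CAR carries)."""
    seen, order, stack = set(), [], [root]
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        payload, links = store[node_id]
        order.append((node_id, payload, links))
        stack.extend(reversed(links))
    return order


def import_dag(blocks: list) -> dict:
    """Rebuild the DAG on another system, verifying each node against its id."""
    store: dict = {}
    for node_id, payload, links in blocks:
        if put(store, payload, links) != node_id:
            raise ValueError("block does not match its id")
    return store
```

Chunking the same bytes at 256 and 512 bytes yields two different file roots, and the directory root is a node of its own rather than any one file. Because ids are recomputed on import, the exported node list can move between systems and still be checked against the root.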

Stuff not to cover

The original suggestion for teaching selectors won't work at present. We don't yet expose a way to do selectors in the retrieval market - won't exist before launch. (More in extended notes.)
Don't need to specifically talk about things like OrbitDB
Power-of-two sizes (why the size numbers don’t match up): covered in docs.filecoin.io; overhead can push you over the limit. This isn't the right place to surface it, since it's not essential for newcomers.

Mystery notes (Teri doesn't know where they belong)

Lotus is very tied to UnixFS right now (not everyone agrees); there are some flows that work well that way. Textile is doing non-UnixFS stuff.

Timeline

@mikeal would like to see this published before mainnet
@ribasushi can apply time while he's blocked on other projects

Next steps

All: Review notes above and let us know what doesn't look accurate to you.
@har00ga: Create a proposed outline of how the necessary content could be chunked out into brief lessons
@ribasushi @terichadbourne @har00ga (and others interested) Pick Peter's brain on a sync call about appropriateness of proposed outline, revise together as needed
@har00ga: Create lesson files and tutorial data using the ProtoWizard CLI, including preliminary lesson titles, content pulled from Decentralized Data Structures, etc. (@terichadbourne available to help with questions on ProtoSchool's structure)
@ribasushi Update the markdown files (lesson text) and JS files (quiz Q&A) with first draft of content
@har00ga 1st round review, re-chunk content as needed, make things beginner friendly, copy edit, etc. & confirm edits make sense to @ribasushi
@terichadbourne & others interested - 2nd round review & edits

Please let me know if these notes and proposed staffing approach sound appropriate!

har00ga · 2020-07-23T07:22:54Z

Here's what I have for basic summaries for 2 separate tutorials. Feel free to chop, slice, or add whatever you'd like. I also need to discuss a few things about selectors during our meeting tomorrow. Hopefully this suffices!

Storage basics on the Filecoin Network

Merkle Trees and DAGs: What are they?

Introductory information on both concepts, how Merkle trees and DAGs make your data portable across different systems or networks
Touch on how Merkle DAGs differ

DAGs, CARs and directories: What are the differences?

Conceptual overview of CAR files + the difference between standalone directories and a flat file, and why one is preferable over the other in the case of storage / retrieval on the FIL network.
Explain how CAR files 'hold' DAGs: everything in IPFS is a DAG, and a CAR file holds a DAG instead of a directory. Explain why that's preferable.

Introducing selectors
[need further discussion on this one, will talk in meeting]

Storing and retrieving on the Filecoin network

Storing CAR files

Clarify that the simplest way of getting data into Filecoin is via an IPFS CID, and note that CARs aren't something that need to be considered until retrieval if uploading via CID. Useful for miners being shipped drives. Mention Textile.
Explain how Lotus turns a single file into a UnixFS file that becomes the root, so there's always one file at the top of the DAG. This works well for a single file: it doesn't need selectors, you don't need to assemble a CAR file, and it'll spit out the original file when you retrieve it.

Retrieving CAR files

Explain that directories stored on the FIL network must be exported into a CAR file when retrieving. You can't put them into a regular file because the root of the DAG isn't a file. The root CID doesn't correspond to a single file.
howto: retrieve

Submitting data to Filecoin via IPFS
Retrieving data from Filecoin via IPFS

mishmosh · 2020-07-23T16:24:17Z

how Merkle trees and DAGs make your data portable across different systems or networks

Yes! This will be an important concept to feature.

Storing and retrieving on the Filecoin network

We should clarify (both in the eventual title of this section, and in its contents) that this tutorial is not intended to cover storing & retrieving data for general purposes, but specifically with flexible, selector-based subset retrieval in mind. This might affect how we think about 3 & 4.
+1 to the contents in 1 & 2.

terichadbourne · 2020-07-23T16:31:11Z

Thanks so much for pulling this together @har00ga!

The ordering of concepts here (other than selectors, see below) feels appropriate to me, just have a couple of initial reactions on structure and framing that I look forward to chatting about later today:

Based on the discussion I had originally with @mishmosh @mikeal @ribasushi @rvagg, I was envisioning this as a single tutorial framed around how Merkle trees and DAGs make your data portable across different systems or networks. If we did it this way, CAR files wouldn't be the key focus of the tutorial, just a means described for exporting from Filecoin to IPFS, which would in itself be presented as just one example of why the DAG format is useful. I think this is mostly a matter of framing (tutorial and lesson titles, project focus) and structure (1 versus 2 tutorials), not what content is appropriate to include.
Your proposed titles feel more Filecoin-focused than I was expecting, as we had talked about potentially labelling this as an IPLD tutorial and including it in multiple courses, including the Filecoin one. I'm definitely flexible on this, just reflecting what I thought I'd heard previously.
If I interpreted @mikeal correctly in that convo, we don't want to get into selectors because they're not accessible to most folks at present (see the "stuff not to cover" section above for why). It looks like @mishmosh may be saying the opposite of what I am above, however.

ribasushi · 2020-07-23T16:37:27Z

that this tutorial is not intended to cover storing & retrieving data for general purposes, but specifically with flexible, selector-based subset retrieval in mind.

👎

we don't want to get into selectors because they're not accessible to most folks at present

👍

Selectors, while cool, have a rather limited utility for end-users within the Fil story. We'll cover this in more depth during the meet.

har00ga · 2020-07-23T16:53:16Z

Noted on 1 & 2 @terichadbourne, happy to make changes to reflect both of those after the meeting 🙂 That was the question I had regarding selectors (re: 3): they were mentioned conceptually a bunch of times in the notes, but it was also noted that they should be held off for now. I'll remove that part 👍

terichadbourne · 2020-07-24T20:10:11Z

@har00ga @mishmosh @ribasushi and I met yesterday to work further on the outline for what's now 2 courses. @har00ga and @ribasushi will work together to finalize an outline for the first of these in our notes doc.

@mishmosh Based on what we discussed yesterday, do you envision the already-proposed "How Filecoin enhances IPFS" tutorial that you were planning to outline being different from the one we discussed re moving files between IPFS and Filecoin (2nd piece of what's been discussed in this issue)? Just figuring out whether we need a third issue to capture all the plans.

terichadbourne · 2020-10-29T20:35:03Z

On Monday I met with @mitchwagner, who'll be drafting this tutorial with SME support from @ribasushi. 🎉

Based on further discussion with Mosh, it does look like the 2nd piece of what's been described in this issue could be the "How Filecoin enhances IPFS" tutorial (#413). Mitch will be starting to refine an outline for that one while drafting this one.

terichadbourne · 2021-01-15T20:15:59Z

We published our tutorial on Merkle DAGs yesterday. I'm going to close out this issue although it has a lot of comments that we'll need while planning a follow-up tutorial on how to store and retrieve files on Filecoin via IPFS. For continued discussion on that one, please see #413.

terichadbourne added the new-tutorial, content:Filecoin, and content:IPLD labels Mar 27, 2020

terichadbourne mentioned this issue Jun 26, 2020

Visual / Diagram for Storage Prep Phase (Verifying Storage on Filecoin) #458

Open

terichadbourne assigned ribasushi and har00ga Jul 23, 2020

terichadbourne added the OKR-2020-Q3 and P1 - High labels Sep 3, 2020

This was referenced Sep 18, 2020

Explain what's so special about DAGs in Basics - Lesson 1 #186

Open

Separate Decentalized Data Structures into 2 separate tutorials (general and DAG-focused) #185

Closed

terichadbourne mentioned this issue Oct 29, 2020

New Tutorial: Storing and Retrieving Data on Filecoin via IPFS #413

Open

terichadbourne mentioned this issue Jan 9, 2021

Expanded Merkle DAG Tutorial #582

Merged

terichadbourne closed this as completed Jan 15, 2021
