New Tutorial: CAR Files (IPLD / Filecoin) #412

Closed
terichadbourne opened this issue Mar 27, 2020 · 13 comments

@terichadbourne
Member

terichadbourne commented Mar 27, 2020

In brainstorming ProtoSchool content that would be useful for the Filecoin project, @mishmosh suggested a multiple-choice tutorial on CAR Files, more or less an IPLD version of zip files, optimized for flattening. They're important in this context because Filecoin only stores flat files, not objects. CAR files have a header with all the CIDS they contain and make it easier to search for a subset of data you want. The spec lives in IPLD, and both Filecoin implementations use it.

Topics for a tutorial might include:

  • what a CAR file is
  • how it's constructed from hashing
  • how it's comprised
  • what's contained in the CAR header
  • how you search more easily with IPLD selectors (search criteria)

My understanding is that @rvagg @whyrusleeping @Stebalien might be among the experts on this topic, so I'd love feedback on whether this would be a useful topic, common misconceptions we should try to address, etc. Thanks!

@terichadbourne
Member Author

terichadbourne commented Mar 27, 2020

@mikeal you just mentioned this in another channel:

both go-ipfs and js-ipfs have open PR’s for an import/export that works with CAR files
probably want to center the tutorial around that

ipfs/js-ipfs#2953
Will these tools be used by Filecoin folks or just IPFS folks?

We could certainly present this as an IPLD tutorial and reference it from multiple other projects, just asking because the original suggestion came from the Filecoin team.

@rvagg
Collaborator

rvagg commented Mar 30, 2020

a header with all the CIDS they contain and make it easier to search for a subset of data you want

Sadly, no. The header only contains a "roots" array that specifies the tips of the DAGs that the file may contain.
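To make the distinction concrete, here is a toy model (not the real byte format) of a CAR-like container whose header names only the roots. The `ToyCar` class and the `toy_cid` helper are hypothetical names for illustration, and the hex SHA-256 string is only a stand-in for a real CID.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


def toy_cid(data: bytes) -> str:
    """Stand-in for a CID: hex SHA-256 of the block bytes (not a real CID)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ToyCar:
    """Toy model of a CAR: a header naming only the roots, then the blocks."""
    roots: List[str]
    blocks: List[Tuple[str, bytes]] = field(default_factory=list)

    def header(self) -> dict:
        # The header carries a version and the roots, nothing else.
        return {"version": 1, "roots": self.roots}

    def all_cids(self) -> List[str]:
        # Enumerating every identifier means walking the block sections.
        return [cid for cid, _ in self.blocks]


leaf_a = b"hello "
leaf_b = b"world"
# Toy parent block: just the child identifiers joined by newlines.
parent = "\n".join([toy_cid(leaf_a), toy_cid(leaf_b)]).encode()

car = ToyCar(roots=[toy_cid(parent)])
for block in (parent, leaf_a, leaf_b):
    car.blocks.append((toy_cid(block), block))
```

In this sketch `car.header()` lists one identifier while `car.all_cids()` lists three, which is why the header alone can't be used to search for a subset of the data.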

Resources:

Will these tools be used by Filecoin folks or just IPFS folks?

That's a good question that we may have to wait to get an answer to. I think once people have the ability to export whole DAGs from IPFS, a bunch of interesting use-cases open up, well beyond Filecoin. For example, in that js-ipfs PR I showed exporting the XKCD archive into a single file; that kind of fully-offline archiving capability has some interesting uses. Maybe it should become part of the normal workflow for people publishing websites to IPFS that they also take a CAR backup to stick into their cold storage rather than just hoping that their IPFS daemon will keep it properly pinned? We've also looked at the possibility of using this format as the basis of a generic block database store. The lack of inbuilt indexing makes that a bit tricky, but building an external index of block positions and then reading those blocks directly out is something we can already do in Go and JS.
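The external-index idea can be sketched in a few lines. This is a simplified illustration, not the Go or JS implementation: it skips the header entirely, and it assumes every block identifier is a raw 32-byte SHA-256 digest (real CIDs are variable-length and self-describing). Each section is framed as a varint length followed by the identifier and the block data; all function names here are hypothetical.

```python
import hashlib
import io

DIGEST_LEN = 32  # assumption: every toy ID is a raw 32-byte SHA-256 digest


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream):
    """Return the next varint, or None at a clean end of stream."""
    result, shift = 0, 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None
            raise ValueError("truncated varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def write_sections(blocks) -> bytes:
    """Frame each block as varint(len(id) + len(data)) | id | data."""
    buf = io.BytesIO()
    for data in blocks:
        block_id = hashlib.sha256(data).digest()
        buf.write(encode_varint(len(block_id) + len(data)))
        buf.write(block_id)
        buf.write(data)
    return buf.getvalue()


def build_index(raw: bytes) -> dict:
    """Scan once, recording where each block's data starts and its length."""
    index = {}
    stream = io.BytesIO(raw)
    while True:
        section_len = read_varint(stream)
        if section_len is None:
            return index
        block_id = stream.read(DIGEST_LEN)
        data_len = section_len - DIGEST_LEN
        index[block_id] = (stream.tell(), data_len)
        stream.seek(data_len, io.SEEK_CUR)  # skip the data without reading it


def read_block(raw: bytes, index: dict, block_id: bytes) -> bytes:
    """Jump straight to a block using the external index."""
    offset, length = index[block_id]
    return raw[offset:offset + length]


blocks = [b"first block", b"second block", b"x" * 300]
raw = write_sections(blocks)
index = build_index(raw)
wanted = hashlib.sha256(b"second block").digest()
found = read_block(raw, index, wanted)
```

The index lives outside the file, so it can be rebuilt at any time with one sequential scan; after that, individual blocks are read by offset without touching the rest of the file.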

@mikeal
Member

mikeal commented Apr 2, 2020

Will these tools be used by Filecoin folks or just IPFS folks?

A few months ago I’d say “Filecoin” but just in the last month we’ve seen the format gain traction across projects. I think it’s going to be a widely used format across our stack. It provides a very simple way to transfer data that should be useful even between IPFS instances.

@mishmosh
Contributor

mishmosh commented Apr 3, 2020

+1 on presenting this as a general lesson. Learning about CARs will be helpful to Filecoin users either way, but I'm not stuck on labelling it as such.

The course outline originally proposed by @terichadbourne is really nice:

what a CAR file is
how it's constructed from hashing
how it's comprised
what's contained in the CAR header
how you search more easily with IPLD selectors (search criteria)

A few more suggestions:

  • At the beginning, why you might want a CAR file ("to serialize, aka flatten, graph-based data"?)
  • At the end, what systems currently use CAR files

@terichadbourne
Member Author

@terichadbourne @mikeal @ribasushi @rvagg @mishmosh met on Jun 30 to discuss the most appropriate content angle and staffing plan for this tutorial (full notes here).

Since we spoke, @mishmosh @har00ga @terichadbourne had a follow-up on staffing resources.

Key takeaways:

Primary focus for tutorial
How Merkle trees and DAGs make your data portable across different systems or networks
(CAR files aren't the key focus of the tutorial, just a means described for exporting from Filecoin to IPFS)

Format
Multiple-choice

Target audience
A person using IPFS for something and wondering how to get data into and out of Filecoin

Stuff to cover

  • What are Merkle trees and DAGs and why should we care? Move the end part of Decentralized Data Structures into this new tutorial (https://proto.school/#/data-structures/05), adding context as needed.
  • Present CAR files as a format for transfer of DAGs between technologies (for example, lotus and go-ipfs)
  • The most approachable path to getting data into Filecoin / Lotus doesn't involve thinking about CAR files. Thanks to recent improvements from Textile you can just start with an IPFS CID. (Don't hand the same file to lotus and have them recheck it because you'll mess up the CID (with different chunk sizes??).)
  • You care more about CAR files on the other end when retrieving data from Filecoin. Would be useful to teach how to take data out of Filecoin and put it in an IPFS node.
  • Everything in IPFS is a DAG, and a CAR file holds a DAG instead of a directory. When you care about the DAG you might want to use a CAR file instead of a directory.
  • When you give Lotus a file to do a deal for, it turns it into a UnixFS file and then makes the root file. There's always one file at the top of the DAG. This works well for a single file: you don't need selectors or to assemble a CAR file, and it'll spit out the original file when you retrieve it.
  • When you store a directory in Filecoin (give it more than one file or give it a CID for a graph in the IPFS network) you have to export it into a CAR file when retrieving. You can't put it into a regular file because the root of the DAG isn't the file. The root CID doesn't correspond to a single file.
  • Anyone exporting from standard Filecoin Lotus to IPFS without an intermediary will want to use a flag that makes it export to a CAR file to directly import into IPFS.
  • Include some code samples without using coding challenges (see this example for a similar multiple choice lesson)
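The note above about different chunk sizes changing the CID can be illustrated with a toy one-level Merkle DAG. This is only a sketch under simplifying assumptions: plain SHA-256 hex digests stand in for CIDs, the root is the hash of the concatenated chunk hashes, and this is not how UnixFS actually lays out a file. The function names are hypothetical.

```python
import hashlib


def chunk(data: bytes, size: int):
    """Split data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_root(data: bytes, chunk_size: int) -> str:
    """Hash each chunk, then hash the concatenated chunk hashes."""
    leaf_hashes = [hashlib.sha256(c).digest() for c in chunk(data, chunk_size)]
    return hashlib.sha256(b"".join(leaf_hashes)).hexdigest()


payload = b"the same bytes, imported twice" * 10

# Same bytes, same chunking: the root is reproducible on any system.
root_small = toy_root(payload, chunk_size=64)
root_again = toy_root(payload, chunk_size=64)

# Same bytes, different chunking: the leaves differ, so the root differs.
root_large = toy_root(payload, chunk_size=128)
```

The first pair of roots match, which is the portability property; the third does not, which is why importing the same file again with different settings yields a different identifier.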

Stuff not to cover

  • The original suggestion for teaching selectors won't work at present. We don't yet expose a way to do selectors in the retrieval market - won't exist before launch. (More in extended notes.)
  • Don't need to specifically talk about things like OrbitDB
  • Power of two (why the size numbers don’t match up) - covered in docs.filecoin.io - overhead can put you over limit. This isn't the right place to surface as not essential to newcomers.

Mystery notes (Teri doesn't know where they belong)

  • Lotus is very tied to UnixFS now (don't all agree) - there are some flows that work well that way. Textile is doing non-UnixFS stuff.

Timeline

  • @mikeal would like to see this published before mainnet
  • @ribasushi can apply time while he's blocked on other projects

Next steps

  1. All: Review notes above and let us know what doesn't look accurate to you.
  2. @har00ga: Create a proposed outline of how the necessary content could be chunked out into brief lessons
  3. @ribasushi @terichadbourne @har00ga (and others interested) Pick Peter's brain on a sync call about appropriateness of proposed outline, revise together as needed
  4. @har00ga: Create lesson files and tutorial data using the ProtoWizard CLI, including preliminary lesson titles, content pulled from Decentralized Data Structures, etc. (@terichadbourne available to help with questions on ProtoSchool's structure)
  5. @ribasushi Update the markdown files (lesson text) and JS files (quiz Q&A) with first draft of content
  6. @har00ga 1st round review, re-chunk content as needed, make things beginner friendly, copy edit, etc. & confirm edits make sense to @ribasushi
  7. @terichadbourne & others interested - 2nd round review & edits

Please let me know if these notes and proposed staffing approach sound appropriate!

@har00ga

har00ga commented Jul 23, 2020

Here's what I have for basic summaries for 2 separate tutorials. Feel free to chop / slice / add whatever you'd like. We also need to discuss a few things about selectors during our meeting tomorrow. Hopefully this suffices.

Storage basics on the Filecoin Network

  1. Merkle Trees and DAGs: What are they?
  • Introductory information on both concepts, how Merkle trees and DAGs make your data portable across different systems or networks

  • Touch on how Merkle DAGs differ

  2. DAGs, CARs and directories: What are the differences?
  • Conceptual overview of CAR files + the difference between standalone directories and a flat file, and why one is preferable over the other in the case of storage / retrieval on the FIL network.

  • Explain how CAR files 'hold' DAGs: everything in IPFS is a DAG, and a CAR file holds a DAG instead of a directory + why it's preferable

  3. Introducing selectors
    [need further discussion on this one, will talk in meeting]

Storing and retrieving on the Filecoin network

  1. Storing CAR files
  • Clarify that the simplest way of getting data into Filecoin is via IPFS CID; note that CARs aren't something that need to be considered until retrieval if uploading via CID. Useful for miners being shipped drives. Mention Textile

  • Explain that Lotus turns a file into a UnixFS file and then makes the root file. There's always one file at the top of the DAG. This works well for a single file: you don't need selectors or to assemble a CAR file, and it'll spit out the original file when you retrieve it.

  2. Retrieving CAR files
  • Explain that directories stored on the FIL network must be exported into a CAR file when retrieving. You can't put it into a regular file because the root of the DAG isn't the file. The root CID doesn't correspond to a single file.

  • howto: retrieve

  3. Submitting data to Filecoin via IPFS

  4. Retrieving data from Filecoin via IPFS
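For the point in the outline about a directory's root not corresponding to a single file, a toy block store may help. This is a sketch with made-up conventions, not UnixFS: blocks are tagged with a leading `F` (file) or `D` (directory) byte, a directory block is just a JSON map of names to child IDs, and hex SHA-256 strings stand in for CIDs. All function names are hypothetical.

```python
import hashlib
import json


def put(store: dict, payload: bytes) -> str:
    """Store a block under the hex SHA-256 of its bytes (stand-in for a CID)."""
    block_id = hashlib.sha256(payload).hexdigest()
    store[block_id] = payload
    return block_id


def put_file(store: dict, content: bytes) -> str:
    return put(store, b"F" + content)


def put_directory(store: dict, entries: dict) -> str:
    """entries maps file name -> block ID; the directory block holds only links."""
    return put(store, b"D" + json.dumps(entries, sort_keys=True).encode())


def links(block: bytes) -> list:
    """Child IDs of a block: directories link to entries, files to nothing."""
    if block[:1] == b"D":
        return list(json.loads(block[1:]).values())
    return []


def export_dag(store: dict, root: str) -> list:
    """Collect every block reachable from root, root first (depth-first)."""
    ordered, stack, seen = [], [root], set()
    while stack:
        block_id = stack.pop()
        if block_id in seen:
            continue
        seen.add(block_id)
        ordered.append((block_id, store[block_id]))
        stack.extend(reversed(links(store[block_id])))
    return ordered


store = {}
readme = put_file(store, b"read me first")
photo = put_file(store, b"pretend image bytes")
root = put_directory(store, {"README.txt": readme, "photo.png": photo})

exported = export_dag(store, root)          # directory: three blocks
single = export_dag(store, readme)          # single file: one block
```

Exporting from the file's ID gives back one block that is the file, while exporting from the directory's ID gives a root block of links plus the blocks it points to, so there is no single file to write out and a container for the whole DAG is needed.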

@mishmosh
Contributor

mishmosh commented Jul 23, 2020

how Merkle trees and DAGs make your data portable across different systems or networks

Yes! This will be an important concept to feature.

Storing and retrieving on the Filecoin network

  • We should clarify (both in the eventual title of this section, and in its contents) that this tutorial is not intended to cover storing & retrieving data for general purposes, but specifically with flexible, selector-based subset retrieval in mind. This might affect how we think about 3 & 4.
  • +1 to the contents in 1 & 2.

@terichadbourne
Member Author

Thanks so much for pulling this together @har00ga!

The ordering of concepts here (other than selectors, see below) feels appropriate to me, just have a couple of initial reactions on structure and framing that I look forward to chatting about later today:

  • Based on the discussion I had originally with @mishmosh @mikeal @ribasushi @rvagg, I was envisioning this as a single tutorial framed around how Merkle trees and DAGs make your data portable across different systems or networks. If we did it this way, CAR files wouldn't be the key focus of the tutorial, just a means described for exporting from Filecoin to IPFS, which would in itself be presented as just one example of why the DAG format is useful. I think this is mostly a matter of framing (tutorial and lesson titles, project focus) and structure (1 versus 2 tutorials), not what content is appropriate to include.

  • Your proposed titles feel more Filecoin-focused than I was expecting, as we had talked about potentially labelling this as an IPLD tutorial and including it in multiple courses including the Filecoin one. I'm definitely flexible on this, just reflecting what I thought I'd heard previously.

  • If I interpreted @mikeal correctly in that convo, we don't want to get into selectors because they're not accessible to most folks at present. (See the "stuff not to cover" section above for why.) It looks like @mishmosh may be saying the opposite of what I am above, however.

@ribasushi
Collaborator

that this tutorial is not intended to cover storing & retrieving data for general purposes, but specifically with flexible, selector-based subset retrieval in mind.

👎

we don't want to get into selectors because they're not accessible to most folks at present

👍

Selectors, while cool, have a rather limited utility for end-users within the Fil story. We'll cover this in more depth during the meet.

@har00ga

har00ga commented Jul 23, 2020

Noted on 1 & 2 @terichadbourne, happy to make changes to reflect both of those after the meeting 🙂 That was the question I had about selectors (RE: 3): they were mentioned conceptually a bunch of times in the notes, but it was also noted that they should be held off for now. Shall remove that part 👍

@terichadbourne
Member Author

@har00ga @mishmosh @ribasushi and I met yesterday to work further on the outline for what's now 2 courses, the first of which @har00ga and @ribasushi will work together to finalize an outline for in our notes doc.

@mishmosh Based on what we discussed yesterday, do you envision the already-proposed "How Filecoin enhances IPFS" tutorial that you were planning to outline being different from the one we discussed re moving files between IPFS and Filecoin (2nd piece of what's been discussed in this issue)? Just figuring out whether we need a third issue to capture all the plans.

@terichadbourne
Member Author

terichadbourne commented Oct 29, 2020

On Monday I met with @mitchwagner, who'll be drafting this tutorial with SME support from @ribasushi. 🎉

Based on further discussion with Mosh, it does look like the 2nd piece of what's been described in this issue could be the "How Filecoin enhances IPFS" tutorial (#413). Mitch will be starting to refine an outline for that one while drafting this one.

@terichadbourne
Member Author

We published our tutorial on Merkle DAGs yesterday. I'm going to close out this issue although it has a lot of comments that we'll need while planning a follow-up tutorial on how to store and retrieve files on Filecoin via IPFS. For continued discussion on that one, please see #413.


6 participants