MAR/ipld1.6 (car files) #430

Merged
merged 1 commit into from
Dec 12, 2022
11 changes: 10 additions & 1 deletion config/_default/goals.json
@@ -249,7 +249,16 @@
},
"1.6": {
"description": "Learn about the CAR format and how it helps data distribution",
"subgoals": [{}],
"subgoals": [
{
"id": "1.61",
"description": "Get an idea of how the CAR format is beneficial to IPLD and how it is used today"
},
{
"id": "1.62",
"description": "See the differences between the two CAR version formats"
}
],
"levels": ["deep"]
}
},
15 changes: 15 additions & 0 deletions content/en/curriculum/ipld/the-car-format/index.md
@@ -9,6 +9,13 @@ weight: 270
category: lecture
level:
- deep
objectives:
show: true
goals:
- "1.6"
subgoals:
- 1.61
- 1.62
---

![](intro.png)
@@ -48,6 +55,14 @@ CARv2 has a flexible approach to index formats. The header provides details abou

The index at the end of the format provides information about what blocks are stored within the CARv1 data payload and _where_ they exist within the archive. A CARv2 reader implementation can load the index and then use its CID->offset mapping information to seek directly to the requested block and not have to hunt for it. The index _format_ is flexible, in that the first byte of the index identifies the format (which a given CARv2 implementation may or may not understand how to read) and the rest of the bytes conform to that format. There are currently two well-specified index formats, but there are a number of additional experimental index formats. Index formats may be selected depending on the suitability for a particular application or set of data - generation speed, usage performance, size, etc. Indexes typically only store the _Multihash_ of a block, rather than the entire CID, for efficiency reasons (but there are other interesting characteristics enabled by being able to look up a block by multihash rather than the entire CID, even if the _Multicodec_ is useful for decoding the block once it's found).
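The lookup idea in the paragraph above can be sketched as follows. This is a toy illustration only, not the CARv2 index encoding: the names `build_index` and `read_block` are hypothetical, the keys are plain byte strings standing in for multihashes, and the index is an in-memory dictionary rather than one of the binary index formats defined in the specification.

```python
import io


def build_index(sections):
    """Concatenate blocks into one payload and record where each one starts.

    `sections` is a list of (key, block_bytes) pairs; each key stands in
    for the multihash of the block (an assumption made for this sketch).
    """
    payload = io.BytesIO()
    index = {}
    for key, block in sections:
        # Remember the offset and length before writing the block.
        index[key] = (payload.tell(), len(block))
        payload.write(block)
    return payload.getvalue(), index


def read_block(payload, index, key):
    """Seek straight to a block using the index instead of scanning."""
    offset, length = index[key]
    stream = io.BytesIO(payload)
    stream.seek(offset)
    return stream.read(length)


payload, index = build_index([(b"mh-a", b"hello"), (b"mh-b", b"world!")])
print(read_block(payload, index, b"mh-b"))
```

Keying the lookup on a multihash stand-in rather than a full CID mirrors the efficiency point made above: the same entry can serve any CID that shares that multihash.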

## Performance
Some considerations regarding performance:

* Streaming: the CAR format is ideal for dumping blocks via streaming reads as the Header can be loaded first and minimal state is required for ongoing parsing.
* Individual block reads: as the CARv1 format contains no index information, reads require either a partial scan to discover the location of a required block, or an external index that is maintained and referenced for a seek and partial read of that data. The CARv2 index described above addresses this.
* DAG traversal: without an index, traversing a DAG specified by a "root" CID requires either dumping all blocks into a more convenient data store or performing a partial scan to find each block as required, which will likely be too inefficient to be practical.
* Modification: CARs may be appended after initial write as there is no constraint in the Header regarding total length. Care must be taken in appending if a CAR is intended to contain coherent DAG data.
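
The streaming and appending points above can be illustrated with a minimal sketch of length-prefixed framing. It assumes the CARv1 layout of an unsigned varint length followed by that many bytes, and it treats the header and every section as opaque bytes. A real reader would decode the DAG-CBOR header and split each section into a CID and block data. The function names are hypothetical.

```python
import io


def encode_varint(n):
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream):
    """Read one varint; return None if the stream ends cleanly first."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        byte = raw[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def frame(payload):
    """Prefix a payload with its varint-encoded length."""
    return encode_varint(len(payload)) + payload


def iter_sections(stream):
    """Yield (offset, payload) pairs, holding only one section at a time."""
    while True:
        offset = stream.tell()
        length = read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated section")
        yield offset, payload


# Opaque stand-ins for the header and two blocks.
archive = frame(b"header") + frame(b"block-1") + frame(b"x" * 200)
# Appending needs no header rewrite: just add another framed section.
archive += frame(b"late")

for offset, payload in iter_sections(io.BytesIO(archive)):
    print(offset, len(payload))
```

The offsets yielded during the scan are exactly what an external index would record, which is why a single pass is enough to build one. `stream.tell()` assumes a seekable stream. A reader fed from a socket would need to count bytes itself.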

#### Further Reading

The CARv1 and CARv2 specifications, including specifications for CARv2 index formats, can be found on the IPLD specifications site: [ipld.io/specs/transport/car](https://ipld.io/specs/transport/car/)