This package is no longer maintained. Please use the new @ipld/car
package for dealing with CAR archives in JavaScript.
A JavaScript Content ARchive (CAR) file reader and writer for IPLD blocks. See original Go implementation.
The interface wraps a Datastore, similar to datastore-zipcar and has multiple create-modes for different use-cases, including memory-efficient read and write options.
import fs from 'fs'
import multiformats from 'multiformats/basics'
import car from 'datastore-car'
import dagCbor from '@ipld/dag-cbor'
// dag-cbor is required for the CAR root block
multiformats.add(dagCbor)
const CarDatastore = car(multiformats)
async function example () {
  const binary = new TextEncoder().encode('random meaningless bytes')
  const mh = await multiformats.multihash.hash(binary, 'sha2-256')
  const cid = multiformats.CID.create(1, multiformats.get('raw').code, mh)
  const outStream = fs.createWriteStream('example.car')
  const writeDs = await CarDatastore.writeStream(outStream)
  // set the header with a single root
  await writeDs.setRoots(cid)
  // store a new block, creates a new entry in the CAR archive
  await writeDs.put(cid, binary)
  await writeDs.close()
  const inStream = fs.createReadStream('example.car')
  // read and parse the entire stream so we have `get()` and `has()` methods
  // use readStreaming(inStream) to support efficient stream decoding with
  // just query() available for iterative reads.
  const readDs = await CarDatastore.readStreamComplete(inStream)
  // read the list of roots from the header
  const roots = await readDs.getRoots()
  // retrieve a block, as a Uint8Array, reading from the CAR archive
  const got = await readDs.get(roots[0])
  // also possible: for await (const { key, value } of readDs.query()) { ... }
  console.log('Retrieved [%s] from example.car with CID [%s]',
    new TextDecoder().decode(got),
    roots[0].toString())
  await readDs.close()
}
example().catch((err) => {
  console.error(err)
  process.exit(1)
})
Will output:
Retrieved [random meaningless bytes] from example.car with CID [bafkreihwkf6mtnjobdqrkiksr7qhp6tiiqywux64aylunbvmfhzeql2coa]
In this example, the writeStream() create-mode is used to generate the CAR file. This allows for an iterative write process where first the roots are set (setRoots()) and then all of the blocks are written (put()). After it is created, we use the readStreamComplete() create-mode to read the contents. Other create-modes are useful where the environment, data and needs demand:
CarDatastore.readBuffer(buffer): read a CAR archive from a Uint8Array. Does not support mutation operations, only reads. This mode is not efficient for large data sets but does support get() and has() operations since it caches the entire archive in memory. This mode is the only mode available in a browser environment.
CarDatastore.readFileComplete(file): read a CAR archive directly from a file. Does not support mutation operations, only reads. This mode is not efficient for large data sets but does support get() and has() operations since it caches the entire archive in memory. This mode is not available in a browser environment.
CarDatastore.readStreamComplete(stream): read a CAR archive directly from a stream. Does not support mutation operations, only reads. This mode is not efficient for large data sets but does support get() and has() operations since it caches the entire archive in memory. This mode is not available in a browser environment.
CarDatastore.readStreaming(stream): read a CAR archive directly from a stream. Does not support mutation operations, and only supports iterative reads via query() (i.e. no get() and has()). This mode is very efficient for large data sets. This mode is not available in a browser environment.
async CarDatastore.readFileIndexed(stream): read a CAR archive from a local file, index its contents and use that index to support random access reads (has(), get() and query()) without fitting the entire contents in memory as readFileComplete() does. Uses more memory than readStreaming() and less than readFileComplete(). Will be slower to initialize than readStreaming() but suitable where random access reads are required from a large file.
CarDatastore.writeStream(stream): write a CAR archive to a stream (e.g. fs.createWriteStream(file)). Does not support read operations, only writes, and the writes are append-only (i.e. no delete()). However, this mode is very efficient for dumping large data sets, with no caching and streaming writes. This mode is not available in a browser environment.
Other create-modes may be supported in the future, such as writing to a Uint8Array (although this is already possible if you couple writeStream() with a BufferListStream) or a read/write mode such as datastore-zipcar makes available.
async CarDatastore.readBuffer(buffer)
async CarDatastore.readFileComplete(file)
async CarDatastore.readStreamComplete(stream)
async CarDatastore.readStreaming(stream)
async CarDatastore.readFileIndexed(stream)
async CarDatastore.writeStream(stream)
async CarDatastore.completeGraph(root, get, car[, concurrency])
class CarDatastore
async CarDatastore#get(key)
async CarDatastore#has(key)
async CarDatastore#put(key, value)
async CarDatastore#delete(key)
async CarDatastore#setRoots(comment)
async CarDatastore#getRoots()
async CarDatastore#close()
async CarDatastore#query([q])
async CarDatastore.indexer(input)
async CarDatastore.readRaw(fd, blockIndex)
async CarDatastore.readBuffer(buffer)
Read a CarDatastore from a Uint8Array containing the contents of an existing CAR archive. Mutation operations (put(), delete() and setRoots()) are not available.
Because the entire CAR archive is represented in memory after being parsed, this create-mode is not suitable for large data sets. readStreaming() should be used instead for a streaming read supporting only query() for an iterative decode.
However, this create-mode is currently the only mode supported in a browser environment.
Parameters:
buffer (Uint8Array): the byte contents of a CAR archive
Return value (CarDatastore): a read-only CarDatastore.
async CarDatastore.readFileComplete(file)
Read a CAR archive from a file and return a CarDatastore. The CarDatastore returned will only support read operations: getRoots(), get(), has() and query(). Caching makes get() and has() possible, as the entire file is read and decoded before the CarDatastore is returned. Mutation operations (put(), delete() and setRoots()) are not available as there is no ability to modify the archive.
This create-mode is functionally similar to calling:
CarDatastore.readStreamComplete(fs.createReadStream(path))
However, this create-mode uses raw fs.read() operations to seek through the file as required rather than wrapping the consumption in a ReadableStream with its fixed chunk size. This distinction is unlikely to make a difference until a non-buffering readFile() create-mode is exposed.
Because the entire CAR archive is represented in memory after being parsed, this create-mode is not suitable for large data sets. readStreaming() should be used instead for a streaming read supporting only query() for an iterative decode.
This create-mode is not available in the browser environment.
Parameters:
file (string): a path to a file containing CAR archive data.
Return value (CarDatastore): a read-only CarDatastore.
async CarDatastore.readStreamComplete(stream)
Read a CAR archive as a CarDatastore from a ReadableStream. The CarDatastore returned will only support read operations: getRoots(), get(), has() and query(). Caching makes get() and has() possible, as the entire stream is read and decoded before the CarDatastore is returned. Mutation operations (put(), delete() and setRoots()) are not available as there is no ability to modify the archive.
Because the entire CAR archive is represented in memory after being parsed, this create-mode is not suitable for large data sets. readStreaming() should be used instead for a streaming read supporting only query() for an iterative decode.
This create-mode is not available in the browser environment.
Parameters:
stream (ReadableStream): a ReadableStream that provides an entire CAR archive as a binary stream.
Return value (CarDatastore): a read-only CarDatastore.
async CarDatastore.readStreaming(stream)
Read a CAR archive as a CarDatastore from a ReadableStream. The CarDatastore returned will only support getRoots() and an iterative query() call. As there is no caching, individual get() or has() operations are not possible, and mutation operations (put(), delete() and setRoots()) are not available as there is no ability to modify the archive.
readStreaming() is an efficient create-mode, useful for reading large CAR archives without using much memory. Because it supports only a simple iterative query() method, its utility as a general Datastore is very limited.
readStreamComplete() is an alternative stream decoding create-mode that uses buffering to decode an entire stream into an in-memory representation of the CAR archive. This may be used if get() and has() operations are required and the amount of data is manageable in memory.
This create-mode is not available in the browser environment.
Parameters:
stream (ReadableStream): a ReadableStream that provides an entire CAR archive as a binary stream.
Return value (CarDatastore): a read-only CarDatastore.
async CarDatastore.readFileIndexed(stream)
Read a CAR archive as a CarDatastore from a local file. The CarDatastore returned will only support read operations: getRoots(), get(), has() and query(). Indexing makes get() and has() possible, as the entire file is read and indexed before the CarDatastore is returned. Mutation operations (put(), delete() and setRoots()) are not available as there is no ability to modify the archive.
The indexing operation uses indexer to catalogue the contents of the CAR and store a mapping of CID to byte locations for each entry. This method of parsing is not as memory intensive as readStreamComplete() as only the index is stored in memory. When blocks are read, the index tells the reader where to fetch the block from within the CAR file.
This mode is suitable for large files where random-access operations are required. Where only a full sequential read is required, use readStreaming(), which consumes the file in a single pass with no memory used for indexing.
This create-mode is not available in the browser environment.
Parameters:
stream (ReadableStream): a ReadableStream that provides an entire CAR archive as a binary stream.
Return value (CarDatastore): a read-only CarDatastore.
async CarDatastore.writeStream(stream)
Create a CarDatastore that writes a CAR archive to a WritableStream. The CarDatastore returned will only support append operations (put() and setRoots(), but not delete()) and no caching will be performed, with entries written directly to the provided stream.
Because the roots are encoded in the header of a CAR file, a call to setRoots() must be made prior to any put() operation. Absent a setRoots() call, the header will be encoded with an empty list of root CIDs. A call to setRoots() after one or more calls to put() will result in an Error being thrown.
writeStream() is an efficient create-mode, useful for writing large amounts of data to a CAR archive as long as the roots are known before writing.
This create-mode is not available in a browser environment.
Parameters:
stream (WritableStream): a writable stream
Return value (CarDatastore): an append-only, streaming CarDatastore.
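The ordering rule for setRoots() and put() described above can be sketched with a toy writer. This is only a model of the rule, assuming a hypothetical ToyAppendOnlyWriter class that records chunks in an array; it does not encode a real CAR header or use the library.

```javascript
// Toy model of the ordering rule only; ToyAppendOnlyWriter is hypothetical
// and does not produce real CAR bytes.
class ToyAppendOnlyWriter {
  constructor () {
    this.chunks = [] // everything "written" so far, in order
    this.headerWritten = false
    this.roots = []
  }

  setRoots (roots) {
    if (this.headerWritten) {
      throw new Error('setRoots() must be called before any put()')
    }
    this.roots = Array.isArray(roots) ? roots : [roots]
  }

  put (key, value) {
    if (!this.headerWritten) {
      // The header goes out first, with whatever roots are known (possibly none).
      this.chunks.push({ header: { roots: this.roots } })
      this.headerWritten = true
    }
    this.chunks.push({ key, value })
  }
}

const writer = new ToyAppendOnlyWriter()
writer.put('cid-a', 'block a') // no setRoots() yet: header gets an empty root list
console.log(writer.chunks[0].header.roots.length) // 0
try {
  writer.setRoots('cid-a')
} catch (err) {
  console.log(err.message) // setRoots() must be called before any put()
}
```

Because the header is the first thing in the stream, a streaming writer has no way to go back and amend it, which is why the roots need to be known up front.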
async CarDatastore.completeGraph(root, get, car[, concurrency])
Read a complete IPLD graph from a provided datastore and store the blocks in a CAR file.
Parameters:
root (CID): the CID of the root of the graph to start at; this block will be included in the CAR and the CID will be set as the single root.
get (AsyncFunction): an async function that takes a CID and returns a Block. Can be used to attach to an arbitrary data store.
car (CarDatastore): a writable CarDatastore that has not yet been written to (setRoots() will be called on it, which requires that no data has been written).
concurrency (number, optional, default=1): how many asynchronous get operations to perform at once.
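To illustrate how the root, get and concurrency parameters interact, here is a toy graph walk. It is a sketch under assumptions, not the library's implementation: copyGraph is a hypothetical name, blocks are plain objects of the made-up form { data, links }, and the sink is any object with setRoots() and put().

```javascript
// Toy graph copy with bounded concurrency. Blocks here are hypothetical plain
// objects { data, links }, where links lists the keys of child blocks; real
// IPLD blocks expose their links through a codec instead.
async function copyGraph (root, get, sink, concurrency = 1) {
  await sink.setRoots(root) // roots first, before anything is written
  const seen = new Set([root])
  const queue = [root]
  let active = 0

  await new Promise((resolve, reject) => {
    const next = () => {
      if (queue.length === 0 && active === 0) return resolve()
      // Start fetches until the concurrency limit is reached.
      while (active < concurrency && queue.length > 0) {
        const key = queue.shift()
        active++
        get(key).then(async (block) => {
          await sink.put(key, block.data)
          for (const link of block.links) {
            if (!seen.has(link)) { // visit each block only once
              seen.add(link)
              queue.push(link)
            }
          }
          active--
          next()
        }).catch(reject)
      }
    }
    next()
  })
}

const graph = new Map([
  ['root', { data: 'r', links: ['left', 'right'] }],
  ['left', { data: 'l', links: ['leaf'] }],
  ['right', { data: 'rt', links: ['leaf'] }],
  ['leaf', { data: 'lf', links: [] }]
])
const written = []
const sink = {
  setRoots: (r) => written.push(['roots', r]),
  put: (k) => written.push(['put', k])
}
copyGraph('root', async (key) => graph.get(key), sink, 2)
  .then(() => console.log(written.length)) // 5: one setRoots plus four puts
```

The shared 'leaf' block is written once even though two parents link to it, since keys are tracked in a set before being queued.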
class CarDatastore
CarDatastore is a class to manage reading from, and writing to, a CAR archive using CIDs as keys and file names in the CAR and binary block data as the file contents.
async CarDatastore#get(key)
Retrieve a block from this archive. keys are converted to CID automatically, whether you provide a native Datastore Key object, a String or a CID. keys that cannot be converted will throw an error.
This operation may not be supported in some create-modes; a write-only mode may throw an error if unsupported.
Parameters:
key (string|Key|CID): a CID or CID-convertible object to identify the block.
Return value (Uint8Array): the IPLD block data referenced by the CID.
async CarDatastore#has(key)
Check whether a block exists in this archive. keys are converted to CID automatically, whether you provide a native Datastore Key object, a String or a CID. keys that cannot be converted will throw an error.
This operation may not be supported in some create-modes; a write-only mode may throw an error if unsupported.
Parameters:
key (string|Key|CID): a CID or CID-convertible object to identify the block.
Return value (boolean): indicating whether the key exists in this Datastore.
async CarDatastore#put(key, value)
Store a block in this archive. keys are converted to CID automatically, whether you provide a native Datastore Key object, a String or a CID. keys that cannot be converted will throw an error.
Only supported by the CarDatastore.writeStream() create-mode. CarDatastores constructed by other create-modes will not support put() and an Error will be thrown when it is called.
Parameters:
key (string|Key|CID): a CID or CID-convertible object to identify the value.
value (Uint8Array): an IPLD block matching the given key CID.
async CarDatastore#delete(key)
Currently not supported by any create-mode. CarDatastore is currently an append-only and read-only construct.
Parameters:
key (string|Key|CID): a CID or CID-convertible object to identify the block.
async CarDatastore#setRoots(comment)
Set the list of roots on this CAR archive. As described for writeStream(), the roots are encoded in the header of the CAR archive, so setRoots() must be called before any put().
Only supported by the CarDatastore.writeStream() create-mode. CarDatastores constructed by other create-modes will not support setRoots() and an Error will be thrown when it is called.
Parameters:
comment (CID): the root CID to set, as in the example above where a single CID is passed.
async CarDatastore#getRoots()
Get the list of roots set on this CAR archive if they exist. See CarDatastore#setRoots.
Return value (Array.<CID>): an array of CIDs
async CarDatastore#close()
Close this archive, free resources and write its new contents if required and supported by the create-mode used.
This may or may not have any effect on the use of the underlying resource depending on the create-mode of the CarDatastore.
async CarDatastore#query([q])
Create an async iterator for the entries of this CarDatastore. Ideal for use with for await ... of to lazily iterate over the entries.
By default, each element returned by the iterator will be an object with a key property with the string CID of the entry and a value property with the binary data.
Supply { keysOnly: true } as an argument and the elements will only contain the keys, without needing to load the values from storage.
The filters parameter is also supported as per the Datastore interface.
This operation may not be supported in some create-modes; a write-only mode may throw an error if unsupported.
Parameters:
q (Object, optional): query parameters
Return value (AsyncIterator.<key, value>)
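The element shapes described above, with and without keysOnly, can be illustrated with a stand-in async generator. This sketch does not use the library: the query and collect functions and the in-memory Map are hypothetical, and only the { key, value } and { key } shapes follow the description.

```javascript
// Stand-in for a datastore's query(): an async generator over an in-memory Map.
// The Map and its contents are hypothetical.
async function * query (store, q = {}) {
  for (const [key, value] of store) {
    // With keysOnly, values are never touched.
    yield q.keysOnly ? { key } : { key, value }
  }
}

// Drain an async iterator into an array using for await ... of.
async function collect (iterator) {
  const results = []
  for await (const element of iterator) {
    results.push(element)
  }
  return results
}

const store = new Map([
  ['cid-a', new TextEncoder().encode('a')],
  ['cid-b', new TextEncoder().encode('b')]
])

collect(query(store, { keysOnly: true })).then(console.log)
// [ { key: 'cid-a' }, { key: 'cid-b' } ]
```

Since the generator is lazy, a consumer that breaks out of the for await loop early never pays for the remaining entries.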
async CarDatastore.indexer(input)
Index a CAR without decoding entire blocks. This operation is similar to CarDatastore.readStreaming() except that it doesn't return a CarDatastore and it skips over block data. It returns the array of root CIDs as well as an AsyncIterator that will yield index data for each block in the CAR.
The index data provided by the AsyncIterator can be stored externally and used to read individual blocks directly from the CAR (using CarDatastore.readRaw()).
// full multiformats omitted, you'll need codecs, bases and hashes that
// appear in your CAR files if you want full information
const multiformats = ...
const { indexer } = require('datastore-car')(multiformats)
async function run () {
  // prefix each CID string with the name of its codec
  const cidStr = (cid) => `${multiformats.get(cid.code).name}:${cid.toString()}`
  const index = await indexer('big.car')
  index.roots = index.roots.map(cidStr)
  console.log('roots:', index.roots)
  for await (const blockIndex of index.iterator) {
    blockIndex.cid = cidStr(blockIndex.cid)
    console.log(JSON.stringify(blockIndex))
  }
}
run().catch((err) => {
  console.error(err)
  process.exit(1)
})
Might output something like:
roots: [
'dag-cbor:bafyreihyrpefhacm6kkp4ql6j6udakdit7g3dmkzfriqfykhjw6cad5lrm',
'dag-cbor:bafyreidj5idub6mapiupjwjsyyxhyhedxycv4vihfsicm2vt46o7morwlm'
]
{"cid":"dag-cbor:bafyreihyrpefhacm6kkp4ql6j6udakdit7g3dmkzfriqfykhjw6cad5lrm","length":92,"blockLength":55,"offset":100,"blockOffset":137}
{"cid":"dag-pb:QmNX6Tffavsya4xgBi2VJQnSuqy9GsxongxZZ9uZBqp16d","length":133,"blockLength":97,"offset":192,"blockOffset":228}
{"cid":"raw:bafkreifw7plhl6mofk6sfvhnfh64qmkq73oeqwl6sloru6rehaoujituke","length":41,"blockLength":4,"offset":325,"blockOffset":362}
{"cid":"dag-pb:QmWXZxVQ9yZfhQxLD35eDR8LiMRsYtHxYqTFCBbJoiJVys","length":130,"blockLength":94,"offset":366,"blockOffset":402}
{"cid":"raw:bafkreiebzrnroamgos2adnbpgw5apo3z4iishhbdx77gldnbk57d4zdio4","length":41,"blockLength":4,"offset":496,"blockOffset":533}
{"cid":"dag-pb:QmdwjhxpxzcMsR3qUuj7vUL8pbA7MgR3GAxWi2GLHjsKCT","length":82,"blockLength":47,"offset":537,"blockOffset":572}
{"cid":"raw:bafkreidbxzk2ryxwwtqxem4l3xyyjvw35yu4tcct4cqeqxwo47zhxgxqwq","length":41,"blockLength":4,"offset":619,"blockOffset":656}
{"cid":"dag-cbor:bafyreidj5idub6mapiupjwjsyyxhyhedxycv4vihfsicm2vt46o7morwlm","length":55,"blockLength":18,"offset":660,"blockOffset":697}
...
When indexing files, performance may vary when providing a file path compared to a ReadableStream of the same file. In the latter case, all of the bytes of the file will be read from disk, whereas a direct file read may be able to skip over much of the block data and increase indexing speed. However, the reads use a buffer, so there will be extraneous data read in the process, and if a CAR contains only small blocks then the entire file may end up being read into memory.
Parameters:
input (string|ReadableStream): either a string path name to a CAR file or a ReadableStream that provides CAR archive data.
Return value (Object.<Array.<roots:CID>, iterator:AsyncIterator>): an object containing a roots array of CIDs and an iterator AsyncIterator that will yield Objects of the form { cid:CID, offset:number, length:number, blockOffset:number, blockLength:number } indicating the CID of the block located at blockOffset with a length of blockLength in the CAR archive provided.
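The numbers in the sample output above hint at how the four fields relate. The check below assumes, as an inference from those numbers rather than anything the documentation states, that offset and length span a whole entry while blockOffset and blockLength span only its block data. The isContiguous helper is hypothetical.

```javascript
// First three index entries from the sample output above, with the CIDs
// shortened to their codec names for brevity.
const sampleIndex = [
  { cid: 'dag-cbor', length: 92, blockLength: 55, offset: 100, blockOffset: 137 },
  { cid: 'dag-pb', length: 133, blockLength: 97, offset: 192, blockOffset: 228 },
  { cid: 'raw', length: 41, blockLength: 4, offset: 325, blockOffset: 362 }
]

// Returns true when every entry ends exactly where its block data ends, and
// each entry starts where the previous one finished.
function isContiguous (index) {
  return index.every((entry, i) => {
    const entryEnd = entry.offset + entry.length
    const blockEnd = entry.blockOffset + entry.blockLength
    const next = index[i + 1]
    return entryEnd === blockEnd && (!next || next.offset === entryEnd)
  })
}

console.log(isContiguous(sampleIndex)) // true
```

Under that reading, the gap between offset and blockOffset (37 bytes for the first entry) is whatever precedes the block data within the entry.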
async CarDatastore.readRaw(fd, blockIndex)
Read a block directly from a CAR file given a block index provided by CarDatastore.indexer() (i.e. an object with the minimal form: { cid:CID, blockOffset:number, blockLength:number }).
Parameters:
fd (number|FileHandle): an open file descriptor, either an integer from fs.open() or a FileHandle from fs.promises.open().
blockIndex (Object): an index object of the style provided by CarDatastore.indexer() ({ cid, blockOffset, blockLength }).
Return value (object): an IPLD block of the form { cid, binary }.
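The idea behind reading a block by its index entry, a positioned read of blockLength bytes at blockOffset, can be sketched with Node's fs module alone. This is not the library's readRaw(): readRange and demo are hypothetical names, and the temporary file is made-up data rather than a real CAR.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Read `blockLength` bytes starting at `blockOffset` from an open FileHandle.
// A stand-in for the idea behind readRaw(), not the library's implementation.
async function readRange (fileHandle, { blockOffset, blockLength }) {
  const buffer = new Uint8Array(blockLength)
  const { bytesRead } = await fileHandle.read(buffer, 0, blockLength, blockOffset)
  if (bytesRead !== blockLength) {
    throw new Error(`expected ${blockLength} bytes, read ${bytesRead}`)
  }
  return buffer
}

async function demo () {
  // Hypothetical file: 6 bytes of "header" followed by a 5-byte "block".
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'range-demo-'))
  const file = path.join(dir, 'demo.bin')
  await fs.promises.writeFile(file, 'HEADERhello')
  const fileHandle = await fs.promises.open(file, 'r')
  try {
    const bytes = await readRange(fileHandle, { blockOffset: 6, blockLength: 5 })
    return new TextDecoder().decode(bytes)
  } finally {
    // Clean up the handle and the temporary file and directory.
    await fileHandle.close()
    await fs.promises.unlink(file)
    await fs.promises.rmdir(dir)
  }
}

demo().then(console.log) // hello
```

Only the requested range is read, so memory use depends on the block size and not on the size of the file.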
Copyright 2019 Rod Vagg
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.