-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Conversation
ipfs dag export <root>
added still no tests yet, though it all works according to basic manual testing. |
3da1fee
to
cf772c9
Compare
@achingbrain can I get some advice on this please? I tried to mirror the go-ipfs approach Maybe I should be implementing this somewhere else and deferring the input and output streams somehow? I have 8 test fixtures totalling 2.9M (1.1M if I compress them) for this feature, but I'm noticing that ipfs-cli relies almost exclusively on mocks to do its testing, which is probably not ideal for these. So I'll need suggestions on where all that should go and examples for how to set things up for repetitive runs against reset block stores. |
@rvagg typically you'd add functionality to The testing strategy - what sort of tests go where and how different components are tested - is documented here: https://github.com/ipfs/js-ipfs/blob/master/docs/DEVELOPMENT.md#testing-strategy
There's a small world of documentation here: https://github.com/ipfs/js-ipfs/tree/master/docs
Yes, go-ipfs generates it's CLI and HTTP API from the same code whereas for better or for worse, js-ipfs has a more manual process.
Ouch, can these be generated on the fly by the tests? |
@achingbrain the other dilemma I have here is the multiformats & ipld codec stack. So the choice is something like this:
That might depend on timing for said upgrade so I'd appreciate your input on this. Re test fixtures - I was wanting to import the ones that go-ipfs is using as they are, so we ensure both that the feature works properly and we have some level of parity with go-ipfs for it. |
Status: Still needed: unit tests in cli, http-{client,server}, and some integration tests to match go-ipfs, plus it needs some additional docs. I'd really like to get |
@rvagg where does the |
Ah, I see - it's in an un-merged PR to go-ipfs: ipfs/kubo#8237 |
👌 looking really good, thanks @achingbrain |
Anecdotally for the fixture data the js implementation is 2-3.5x faster than go (ignoring test setup & teardown times): $ npm run test:interface:http-go -- -t node
> [email protected] test:interface:http-go /Users/alex/Documents/Workspaces/ipfs/js-ipfs/packages/ipfs
> aegir test -f test/interface-http-go.js "-t" "node"
Test Node.js
interface-ipfs-core over ipfs-http-client tests against go-ipfs
.dag.export
✔ should export a car file (265ms)
✔ export of shuffled devnet export identical to canonical original (4412ms)
✔ export of shuffled testnet export identical to canonical original (32176ms)
.dag.import
✔ should import a car file (488ms)
✔ should import a car file without pinning the roots (254ms)
✔ should import multiple car files (671ms)
✔ should import car with roots but no blocks (33776ms)
✔ should import lotus devnet genesis shuffled nulroot
8 passing (1m) vs $ npm run test:interface:http-js -- -t node
> [email protected] test:interface:http-js /Users/alex/Documents/Workspaces/ipfs/js-ipfs/packages/ipfs
> aegir test -f test/interface-http-js.js "-t" "node"
Test Node.js
interface-ipfs-core over ipfs-http-client tests against js-ipfs
.dag.export
✔ should export a car file (136ms)
✔ export of shuffled devnet export identical to canonical original (1939ms)
✔ export of shuffled testnet export identical to canonical original (12438ms)
.dag.import
✔ should import a car file (134ms)
✔ should import a car file without pinning the roots (99ms)
✔ should import multiple car files (197ms)
✔ should import car with roots but no blocks (14021ms)
✔ should import lotus devnet genesis shuffled nulroot
8 passing (37s) |
@achingbrain any idea what the difference might be? could it be on the rpc side, or is this only explainable by the core impl, and perhaps block store mechanism? Would be interesting to try this same thing from the go http api, maybe. |
The original perf of this in js was really bad because we were doing sequential block writes in
If I understand you correctly, this is what the test runs above are doing - using the go http api and the js http api to run the same tests. |
well, 👌 , whatever's going on, this is pretty sweet, thanks again for picking it up! |
// blocks that we're OK with not inspecting for links | ||
/** @type {number[]} */ | ||
const NO_LINKS_CODECS = [ | ||
dagCbor.code, // CBOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dagCbor.code, // CBOR |
This should be 0x51
(cbor), not 0x71
(dag-cbor) like here, but it's probably best to just remove it since we don't have an official cbor encoder (yet, no big demand atm). We need to inspect links of dagCbor blocks, since it has links, but cbor blocks don't (or .. maybe they could do, 🤷, it's not really clear what "cbor" means and its scope, but at least in Go we're going with super-minimal cbor with no tags or extras).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense, I did wonder.
const codec = await codecs.getCodec(cid.code) | ||
|
||
if (codec) { | ||
const block = Block.createUnsafe({ bytes, cid, codec }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One difference we might find with Go is that they're going to be obsessively checking CIDs are correct for the bytes whenever they're loaded, at least in cases like this. The Unsafe
is giving us a shortcut, basically saying we don't care if it doesn't match, and in fact we probably trust the underlying source of blocks to have already done that check. The new go-ipld-prime integration work is providing some opportunities for "trusted" data sources that skip these checks too so they may get slight perf improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I don't think we need to as the blockstore creates the path to the block from the multihash extracted from the CID. I suppose you could mess around with the fs/level backing store to change the contents of the buffer stored at the relevant path but I think solving that problem is outside the scope of dag export.
Re-hashing/verifying blocks on every export would be quite expensive, definitely something we should put behind a flag if people need those kind of guarantees.
go-ipfs has a ipfs repo verify
command which I guess would load every block in the repo and rehash it to ensure we've got the right bytes, there's no equivalent in js land yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related: the old Datastore.HashOnRead
option that is false
by default, due to the perf hit and "i trust my datastore" assumption https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#datastorehashonread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go will be getting the same treatment when the ipld-prime work makes its way through: https://github.com/ipfs/go-fetcher/blob/64b1f390e7ae13d96494f15a10e07527369a521d/impl/blockservice/fetcher.go#L45
I'm pretty sure (but far from certain) that any time you interact with an IPLD Node
at the moment it does the whole-hog of verification and you can't bypass it. In fact there's a lot of places where you can't even avoid decoding the bytes even if you just want the bytes. Things should get a little more sophisticated as ipld-prime moves through.
🎉 |
Adds `ipfs.dag.import` and `ipfs.dag.export` commands to import/export CAR files, e.g. single-file archives that contain blocks and root CIDs. Supersedes #2953 Fixes #2745 Co-authored-by: achingbrain <[email protected]>
New version of #2953
Fixes: #2745
WIP, just
export
implemented to far, no tests yet.It has to lean on the new multiformats and I've pulled in the new IPLD codecs too in anticipation of #3556 eventually sorting out the slight dependency bloat introduced by this (i.e. we have both versions of the codecs in this branch for ipfs-cli).