This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

Support for bigger dag nodes out of the box #52

Closed
magik6k opened this issue May 4, 2017 · 18 comments
Assignees
Labels
exp/expert (Having worked on the specific codebase is important), help wanted (Seeking public contribution on this issue), kind/bug (A bug in existing code, including security flaws), P2 (Medium: Good to have, but can wait until someone steps up), status/ready (Ready to be worked)

Comments

@magik6k

magik6k commented May 4, 2017

Currently one can't ipfs.dag.get a CBOR node larger than 64k. This is an issue when building structures such as large hash maps. The solution, IMO, would be to reallocate the borc heap when a CBOR node bigger than 64k is encountered, so that memory usage is kept as low as possible.

Stack trace I got when attempting to dag.get large cbor:

decoder.js:552 Uncaught (in promise) RangeError: Source is too large
    at Uint8Array.set (native)
    at Decoder._decode (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:52712:18)
    at Decoder.decodeFirst (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:52736:11)
    at Object.exports.deserialize (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:51925:29)
    at waterfall (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:20381:19)
    at nextTask (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:24233:15)
    at next (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:24240:10)
    at http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:23409:17
    at store.get (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:103105:10)
    at http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:28188:8
@daviddias daviddias added the kind/bug A bug in existing code (including security flaws) label May 4, 2017
@daviddias daviddias added the status/deferred Conscious decision to pause or backlog label Jul 3, 2017
@daviddias
Member

@dignifiedquire this was fixed, correct? Can you confirm, @wanderer?

@daviddias daviddias added status/ready Ready to be worked exp/expert Having worked on the specific codebase is important help wanted Seeking public contribution on this issue P1 High: Likely tackled by core team if no one steps up and removed status/deferred Conscious decision to pause or backlog labels Oct 13, 2017
@ProjectAtlantis-dev

Is there a workaround? I need to load some real industry data for a demo - which we can all benefit from

@daviddias
Member

ping @dignifiedquire

@daviddias
Member

@dignifiedquire didn't you fix this already?

@mikeal
Contributor

mikeal commented Jun 22, 2018

I'm still running into this in the latest release. I was able to fix it locally by adding a size attribute to the cbor decoder options: https://github.com/ipld/js-ipld-dag-cbor/blob/master/src/util.js#L33

I'm not sure what the permanent fix should be. What's a reasonable default size? How do we expose a way for people to increase it?
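The fix described above passes a larger heap size into the CBOR decoder options. A minimal sketch of that sizing rule, with invented names (DEFAULT_HEAP, heapSizeFor are mine, not borc's), doubling per the note later in this thread that the heap needs roughly twice the serialized node size:

```javascript
// Sketch: size the decoder heap from the incoming buffer instead of
// borc's fixed 64k default. Illustrative only, not borc's API.
const DEFAULT_HEAP = 64 * 1024

function heapSizeFor (serialized) {
  // roughly twice the serialized node size, never below the default
  return Math.max(DEFAULT_HEAP, 2 * serialized.length)
}

// Presumed usage inside util.js (an assumption, check borc's docs):
//   const dec = new cbor.Decoder({ size: heapSizeFor(buf) })
//   const node = dec.decodeFirst(buf)
```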

@daviddias
Member

@mikeal there is an arbitrary max size of 2MB per node imposed by Bitswap, anything bigger must be sharded so that it doesn't clog the pipes.

@mikeal
Contributor

mikeal commented Jun 25, 2018 via email

@mikeal
Contributor

mikeal commented Jun 25, 2018

@diasdavid long term, is the plan to standardize how dag nodes are split up to stay under this size or is the plan to make changes to bitswap to move around parts of a block?

@daviddias
Member

daviddias commented Jun 25, 2018

It is hard to define a single standard for IPLD sharding (different use cases require different data structures), but we think we can make it happen by specifying multiple layouts and incorporating metadata about the layout into a root node. That way an IPLD graph would be self-describing in how it is serialized, and would also pack a reference to the layout schema in its root node (ideally, graph traversal would all be in WebAssembly so that we only write these once).

Read more about this on the IPLD Sharding discussions - ipfs/notes#76

Meanwhile, what we have done for Unixfs is to use a HAMT (Hash Array Mapped Trie). See the implementation at -- https://github.com/ipfs/js-ipfs-unixfs-engine/tree/master/src/hamt -- and notes from convos at: ipfs/specs#32 && ipfs/notes#216 && ipfs/kubo#3042.

@mikeal
Contributor

mikeal commented Jun 25, 2018

@diasdavid thanks, I've been getting deep into the current IPLD API and will have a broader issue filed about this and a few other things. I think that in order to handle this we may need some interface changes.

@mikeal
Contributor

mikeal commented Jun 25, 2018

One thing I want to point out, though: we keep talking about "sharding" as a single topic, but I think there are a few different use cases being lumped together here that conflict.

  1. Inserting and retrieving a single node in a very large collection (Billions of keys).
  2. Segmenting a single node across multiple blocks (Thousands of keys).

In the case of 1, the full set of keys in the collection can't fit in memory. In the case of 2, they can easily fit in memory but just serialize to larger than 2MB. If we are lumping 2 into 1 we're essentially saying that once you hit just a few thousand links to a parent you're off in user level sharding land, which I don't think is productive.

For 1 we probably want to build some form of sharding on top of the underlying IPLD implementation. These can be very use case specific and the underlying dag implementation is unaware, it's just getting lookups into /:hash1/:hash2/key.

For 2 we need changes to the IPLD interface so that node serialization can return multiple blocks (working on a POC we can poke at). This is implemented at that layer so that lookups are still just /key.
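A possible shape for case 2, sketched purely as a guess at what such an interface change could look like (the limit constant and function name are hypothetical, not the IPLD API):

```javascript
// Hypothetical serializer for case 2: returns one block when the
// encoded node fits under the transport limit, several when it does
// not. Reads would reassemble, still exposing plain /key lookups.
const LIMIT = 4 * 1024 * 1024 // assumed transport ceiling

function toBlocks (encoded, limit = LIMIT) {
  const blocks = []
  for (let offset = 0; offset < encoded.length; offset += limit) {
    blocks.push(encoded.subarray(offset, offset + limit))
  }
  return blocks
}
```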

@rklaehn

rklaehn commented Oct 25, 2018

What's wrong with just increasing the size of the buffer to 4 megs for now, so any block that can be transferred over IPFS can also be decoded as dag-cbor? There is a single instance of the decoder, so I would think this would not be a big problem.

As a second step optimize borc so it allocates on demand.
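The allocate-on-demand idea might look like this (a sketch with invented names, not borc's internals): keep one shared heap and grow it geometrically only when a larger block shows up, so small nodes keep the current memory footprint.

```javascript
// Sketch of allocate-on-demand: a single heap that doubles until the
// incoming block fits. Heap contents are per-decode scratch space,
// so the old buffer can simply be dropped on growth.
class GrowableHeap {
  constructor (initialSize = 64 * 1024) {
    this.buf = new Uint8Array(initialSize)
  }

  ensure (needed) {
    if (needed > this.buf.length) {
      let size = this.buf.length
      while (size < needed) size *= 2
      this.buf = new Uint8Array(size)
    }
    return this.buf
  }
}
```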

@vmx
Member

vmx commented Oct 25, 2018

I think the limit should be increased to 2MB (the Bitswap limitation) by default, but also made configurable in case you use IPLD outside a context where Bitswap is used. Any objections?

@magik6k
Author

magik6k commented Oct 25, 2018

AFAIK the Bitswap limit is slightly less than 4MB: the real limit is the libp2p message size of 4MB (https://github.com/libp2p/go-libp2p-net/blob/70a8d93f2d8c33b5c1a5f6cc4d2aea21663a264c/interface.go#L20) minus Bitswap headers.

@rklaehn

rklaehn commented Oct 25, 2018

I agree about using the bitswap limitation by default. But is bitswap actually limited to 2MB? I did some experiments and it seems that the limit at which I am able to transfer a block is at 4MB...

$ head -c 4000000 /dev/urandom | ipfs block put
QmS1NELQph4KzgXzTHWSk4ioPLuC2MjNyW8KkZ57feAns1
$ ssh someothernode ipfs block get QmS1NELQph4KzgXzTHWSk4ioPLuC2MjNyW8KkZ57feAns1 | wc -c
4000000

For 5MB it does not work.

@vmx
Member

vmx commented Oct 25, 2018

Can well be 4MB, I don't know where I got the 2MB number from :)

@mikeal
Contributor

mikeal commented Oct 30, 2018

Keep in mind that the setting in the actual CBOR JS code needs to be double the desired max size of the serialized node.

@vmx vmx added P2 Medium: Good to have, but can wait until someone steps up and removed P1 High: Likely tackled by core team if no one steps up labels Nov 14, 2018
@chafey

chafey commented Feb 26, 2020

Note: commit 1f7b7f1 increased the max size to 64MB!

@rvagg rvagg closed this as completed Mar 2, 2020
9 participants