Support for bigger dag nodes out of the box #52
Comments
@dignifiedquire this was fixed, correct? Do you confirm, @wanderer?
Is there a workaround? I need to load some real industry data for a demo, which we can all benefit from.
ping @dignifiedquire
@dignifiedquire didn't you fix this already?
I'm still running into this in the latest release. I was able to fix it locally by adding a [...]. I'm not sure what the permanent fix should be. What's a reasonable default size? How do we expose a way for people to increase the size?
@mikeal there is an arbitrary max size of 2MB per node imposed by Bitswap; anything bigger must be sharded so that it doesn't clog the pipes.
Ah, good to know. Is there anything in dag-cbor that standardizes a way to break up a node so that you can shard it like this and still support the {cid}/path/subpath lookups without change?
@diasdavid long term, is the plan to standardize how dag nodes are split up to stay under this size, or is the plan to make changes to Bitswap so it can move around parts of a block?
It is hard to define a single standard for IPLD sharding (different use cases require different data structures), although we think we can make it happen by specifying multiple layouts and incorporating metadata about the layout in a root node. This way an IPLD graph would be self-describing about how it is serialized, and would also pack a schema reference at the root node for the layout used (ideally, the traversal of the graph is all in WebAssembly so that we only write these once). Read more about this in the IPLD sharding discussions: ipfs/notes#76

Meanwhile, what we have done for unixfs is to use a HAMT (Hash Array Mapped Trie). See the implementation at https://github.com/ipfs/js-ipfs-unixfs-engine/tree/master/src/hamt and notes from conversations at ipfs/specs#32, ipfs/notes#216 and ipfs/kubo#3042.
@diasdavid thanks, I've been getting deep into the current IPLD API and will have a broader issue filed about this and a few other things. I think that in order to handle this we may need some interface changes.
One thing I want to point out though: we keep talking about "sharding" as a single topic, but I think there are a few different use cases getting lumped in here that are conflicting.

In the case of 1, the full set of keys in the collection can't fit in memory. In the case of 2, they can easily fit in memory but just serialize to larger than 2MB. If we lump 2 into 1 we're essentially saying that once you hit just a few thousand links to a parent you're off in user-level sharding land, which I don't think is productive.

For 1 we probably want to build some form of sharding on top of the underlying IPLD implementation. These can be very use-case specific, and the underlying dag implementation is unaware of them; it's just getting lookups.

For 2 we need changes to the IPLD interface so that node serialization can return multiple blocks (working on a POC we can poke at). This is implemented at that layer so that lookups are still just {cid}/path lookups.
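A rough sketch of case 2, assuming a hypothetical codec layer (the names splitNode/lookup are illustrative, not any real API): the serializer splits one logical node into several blocks, while the caller's lookup remains a single logical operation.

```javascript
// Hypothetical sketch of case 2: the keys fit in memory, but the node
// would serialize past the ~2-4MB block limit, so the codec layer
// splits it into several blocks behind one logical node. A real
// implementation would link child blocks by CID, not array index.
function splitNode (entries, maxPerBlock) {
  const blocks = []
  for (let i = 0; i < entries.length; i += maxPerBlock) {
    blocks.push(Object.fromEntries(entries.slice(i, i + maxPerBlock)))
  }
  // the root records the child blocks in order (indices stand in
  // for CIDs in this sketch)
  return { root: { parts: blocks.map((_, i) => i) }, blocks }
}

function lookup ({ root, blocks }, key) {
  // the caller still performs one logical lookup; the split is
  // invisible at the {cid}/path level
  for (const i of root.parts) {
    if (key in blocks[i]) return blocks[i][key]
  }
  return undefined
}
```

The point of doing this below the IPLD interface is that path resolution never has to know a node was split.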
What's wrong with just increasing the size of the buffer to 4MB for now, so that any block that can be transferred over IPFS can also be decoded as dag-cbor? There is a single instance of the decoder, so I would think this would not be a big problem. As a second step, optimize borc so it allocates on demand.
I think the limit should be increased to 2MB (the Bitswap limitation) by default, but also made configurable in case you use IPLD outside a context where Bitswap is used. Any objections? |
AFAIK the Bitswap limit is slightly less than 4MB, as the real limit is the libp2p message size, which is 4MB (https://github.com/libp2p/go-libp2p-net/blob/70a8d93f2d8c33b5c1a5f6cc4d2aea21663a264c/interface.go#L20) minus the Bitswap headers.
I agree about using the Bitswap limitation by default. But is Bitswap actually limited to 2MB? I did some experiments, and it seems the limit at which I am able to transfer a block is 4MB; for 5MB it does not work.
Can well be 4MB, I don't know where I got the 2MB number from :) |
Keep in mind that the setting in the actual CBOR JS code needs to be double the desired max size of the serialized node. |
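Under that doubling rule, sizing the decoder heap is simple arithmetic. This is a sketch only; the `size` option name is an assumption modeled on borc's decoder options, so check the library's README before relying on it.

```javascript
// Sketch: if the largest serialized node you expect is MAX_NODE_SIZE
// bytes, allocate roughly double that for the CBOR decoder heap, per
// the doubling rule noted above. The { size } option name mirrors
// borc's decoder options and is an assumption here, not a verified API.
const MAX_NODE_SIZE = 2 * 1024 * 1024              // ~ the Bitswap block limit
const decoderOptions = { size: MAX_NODE_SIZE * 2 } // 4MB heap for 2MB nodes
```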
Note: commit 1f7b7f1 increased the max size to 64MB! |
As of now one can't ipfs.dag.get a cbor node larger than 64k. This is an issue when building structures such as large hash-maps. The solution IMO would be to reallocate the borc heap when cbor bigger than 64k is encountered, so that memory usage is kept as low as possible.

Stack trace I got when attempting to dag.get a large cbor:
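To make the failure mode concrete, here is a hedged sketch (plain JS, no IPFS calls) of the kind of hash-map node that blows past a 64k decoder heap once serialized. The entry counts and sizes are illustrative, and JSON length is used only as a rough stand-in for the CBOR size.

```javascript
// Sketch: build a large flat map like the ones described above. With
// a few thousand short entries, the serialized form easily exceeds a
// 64k decoder heap. JSON length approximates the CBOR size, since the
// two are within a small constant factor for maps of short strings.
function bigMap (n) {
  const map = {}
  for (let i = 0; i < n; i++) {
    map['key-' + i] = 'x'.repeat(32)
  }
  return map
}

const node = bigMap(2000)
const approxBytes = JSON.stringify(node).length
```

Putting a node like this and reading it back with dag.get is the scenario that produced the stack trace above.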