This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

Support for bigger dag nodes out of the box #52

Closed
magik6k opened this issue May 4, 2017 · 18 comments
Assignees
Labels
exp/expert (Having worked on the specific codebase is important), help wanted (Seeking public contribution on this issue), kind/bug (A bug in existing code, including security flaws), P2 (Medium: Good to have, but can wait until someone steps up), status/ready (Ready to be worked)

Comments

@magik6k

magik6k commented May 4, 2017

Currently one can't ipfs.dag.get a CBOR node larger than 64k. This is an issue when building structures such as large hash maps. The solution, IMO, would be to reallocate the borc heap when a CBOR node bigger than 64k is encountered, so that memory usage is kept as low as possible.

Stack trace I got when attempting to dag.get large cbor:

decoder.js:552 Uncaught (in promise) RangeError: Source is too large
    at Uint8Array.set (native)
    at Decoder._decode (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:52712:18)
    at Decoder.decodeFirst (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:52736:11)
    at Object.exports.deserialize (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:51925:29)
    at waterfall (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:20381:19)
    at nextTask (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:24233:15)
    at next (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:24240:10)
    at http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:23409:17
    at store.get (http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:103105:10)
    at http://localhost:63342/root/target/scala-2.12/scalajs-bundler/main/root-fastopt-bundle.js:28188:8
@daviddias daviddias added the kind/bug A bug in existing code (including security flaws) label May 4, 2017
@daviddias daviddias added the status/deferred Conscious decision to pause or backlog label Jul 3, 2017
@daviddias
Member

@dignifiedquire this was fixed, correct? Can you confirm, @wanderer?

@daviddias daviddias added status/ready Ready to be worked exp/expert Having worked on the specific codebase is important help wanted Seeking public contribution on this issue P1 High: Likely tackled by core team if no one steps up and removed status/deferred Conscious decision to pause or backlog labels Oct 13, 2017
@ProjectAtlantis-dev

Is there a workaround? I need to load some real industry data for a demo - which we can all benefit from

@daviddias
Member

ping @dignifiedquire

@daviddias
Member

@dignifiedquire didn't you fix this already?

@mikeal
Contributor

mikeal commented Jun 22, 2018

I'm still running into this in the latest release. I was able to fix it locally by adding a size attribute to the cbor decoder options: https://github.com/ipld/js-ipld-dag-cbor/blob/master/src/util.js#L33

I'm not sure what the permanent fix should be. What's a reasonable default size? How do we expose a way for people to increase it?
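The fix described above passes a larger heap size into the CBOR decoder options. A minimal sketch of that sizing rule, with invented names (DEFAULT_HEAP, heapSizeFor are mine, not borc's), doubling per the note later in this thread that the heap needs roughly twice the serialized node size:

```javascript
// Sketch: size the decoder heap from the incoming buffer instead of
// borc's fixed 64k default. Illustrative only, not borc's API.
const DEFAULT_HEAP = 64 * 1024

function heapSizeFor (serialized) {
  // roughly twice the serialized node size, never below the default
  return Math.max(DEFAULT_HEAP, 2 * serialized.length)
}

// Presumed usage inside util.js (an assumption, check borc's docs):
//   const dec = new cbor.Decoder({ size: heapSizeFor(buf) })
//   const node = dec.decodeFirst(buf)
```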

@daviddias
Member

@mikeal there is an arbitrary max size of 2MB per node imposed by Bitswap, anything bigger must be sharded so that it doesn't clog the pipes.

@mikeal
Contributor

mikeal commented Jun 25, 2018 via email

@mikeal
Contributor

mikeal commented Jun 25, 2018

@diasdavid long term, is the plan to standardize how dag nodes are split up to stay under this size or is the plan to make changes to bitswap to move around parts of a block?

@daviddias
Member

daviddias commented Jun 25, 2018

It is hard to define a single standard for IPLD sharding (different use cases require different data structures), but we think we can make it happen by specifying multiple layouts and incorporating metadata about the layout into a root node. That way an IPLD graph would be self-describing in how it is serialized, and would also pack a reference to the layout schema in its root node (ideally, graph traversal would all be in WebAssembly so that we only write these once).

Read more about this on the IPLD Sharding discussions - ipfs/notes#76

Meanwhile, what we have done for Unixfs is to use a HAMT (Hash Array Mapped Trie). See the implementation at -- https://github.com/ipfs/js-ipfs-unixfs-engine/tree/master/src/hamt -- and notes from convos at: ipfs/specs#32 && ipfs/notes#216 && ipfs/kubo#3042.

@mikeal
Contributor

mikeal commented Jun 25, 2018

@diasdavid thanks, I've been getting deep into the current IPLD API and will have a broader issue filed about this and a few other things. I think that in order to handle this we may need some interface changes.

@mikeal
Contributor

mikeal commented Jun 25, 2018

One thing I want to point out, though: we keep talking about "sharding" as a single topic, but I think there are a few different use cases being lumped together here that conflict.

  1. Inserting and retrieving a single node in a very large collection (Billions of keys).
  2. Segmenting a single node across multiple blocks (Thousands of keys).

In the case of 1, the full set of keys in the collection can't fit in memory. In the case of 2, they can easily fit in memory but just serialize to larger than 2MB. If we are lumping 2 into 1 we're essentially saying that once you hit just a few thousand links to a parent you're off in user level sharding land, which I don't think is productive.

For 1 we probably want to build some form of sharding on top of the underlying IPLD implementation. These can be very use case specific and the underlying dag implementation is unaware, it's just getting lookups into /:hash1/:hash2/key.

For 2 we need changes to the IPLD interface so that node serialization can return multiple blocks (working on a POC we can poke at). This is implemented at that layer so that lookups are still just /key.
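A possible shape for case 2, sketched purely as a guess at what such an interface change could look like (the limit constant and function name are hypothetical, not the IPLD API):

```javascript
// Hypothetical serializer for case 2: returns one block when the
// encoded node fits under the transport limit, several when it does
// not. Reads would reassemble, still exposing plain /key lookups.
const LIMIT = 4 * 1024 * 1024 // assumed transport ceiling

function toBlocks (encoded, limit = LIMIT) {
  const blocks = []
  for (let offset = 0; offset < encoded.length; offset += limit) {
    blocks.push(encoded.subarray(offset, offset + limit))
  }
  return blocks
}
```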

@rklaehn

rklaehn commented Oct 25, 2018

What's wrong with just increasing the size of the buffer to 4 megs for now, so any block that can be transferred over IPFS can also be decoded as dag-cbor? There is a single instance of the decoder, so I would think this would not be a big problem.

As a second step optimize borc so it allocates on demand.
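The allocate-on-demand idea might look like this (a sketch with invented names, not borc's internals): keep one shared heap and grow it geometrically only when a larger block shows up, so small nodes keep the current memory footprint.

```javascript
// Sketch of allocate-on-demand: a single heap that doubles until the
// incoming block fits. Heap contents are per-decode scratch space,
// so the old buffer can simply be dropped on growth.
class GrowableHeap {
  constructor (initialSize = 64 * 1024) {
    this.buf = new Uint8Array(initialSize)
  }

  ensure (needed) {
    if (needed > this.buf.length) {
      let size = this.buf.length
      while (size < needed) size *= 2
      this.buf = new Uint8Array(size)
    }
    return this.buf
  }
}
```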

@vmx
Member

vmx commented Oct 25, 2018

I think the limit should be increased to 2MB (the Bitswap limitation) by default, but also made configurable in case you use IPLD outside a context where Bitswap is used. Any objections?

@magik6k
Author

magik6k commented Oct 25, 2018

AFAIK the Bitswap limit is slightly less than 4MB: the real limit is the libp2p message size of 4MB (https://github.com/libp2p/go-libp2p-net/blob/70a8d93f2d8c33b5c1a5f6cc4d2aea21663a264c/interface.go#L20) minus Bitswap headers.

@rklaehn

rklaehn commented Oct 25, 2018

I agree about using the bitswap limitation by default. But is bitswap actually limited to 2MB? I did some experiments and it seems that the limit at which I am able to transfer a block is at 4MB...

$ head -c 4000000 /dev/urandom | ipfs block put
QmS1NELQph4KzgXzTHWSk4ioPLuC2MjNyW8KkZ57feAns1
$ ssh someothernode ipfs block get QmS1NELQph4KzgXzTHWSk4ioPLuC2MjNyW8KkZ57feAns1 | wc -c
4000000

For 5MB it does not work.

@vmx
Member

vmx commented Oct 25, 2018

Can well be 4MB, I don't know where I got the 2MB number from :)

@mikeal
Contributor

mikeal commented Oct 30, 2018

Keep in mind that the setting in the actual CBOR JS code needs to be double the desired max size of the serialized node.

@vmx vmx added P2 Medium: Good to have, but can wait until someone steps up and removed P1 High: Likely tackled by core team if no one steps up labels Nov 14, 2018
@chafey

chafey commented Feb 26, 2020

Note: commit 1f7b7f1 increased the max size to 64MB!

@rvagg rvagg closed this as completed Mar 2, 2020
9 participants