Length Limit for CID/Multihash? #4918
For now the wrapper blockstore enforces a limit of 64 bytes. |
Why do we need a limit? |
Very large keys are likely to create a problem somewhere. Right now the biggest problem is displaying them. I am not against allowing identity hashes of a few kilobytes in directory entries, as long as we do something about displaying them; one idea is to replace the key with something like |
Yeah, the issue is for people. |
BTW: the 64 byte (or 512 bit) limit comes from the fact that it is the largest output size of modern cryptographic hash functions. |
Another idea is to impose a length limit on the display of hashes; that limit would be the size required to display a CID of a 64-byte hash (which will vary depending on the multibase used). Beyond that length, the hash will be truncated with a |
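The display-truncation idea could be sketched as follows; this is only an illustration, and the 90-character limit (approximating the base58 length of a CID wrapping a 64-byte digest) and the `...` marker are assumptions, not anything decided in this thread:

```python
def truncate_cid_for_display(cid: str, max_len: int = 90) -> str:
    """Shorten an over-long CID string for display only.

    max_len is a hypothetical cutoff roughly matching the encoded
    length of a CID over a 64-byte digest; longer strings keep both
    ends (which aid recognition) and elide the middle.
    """
    if len(cid) <= max_len:
        return cid
    keep = (max_len - 3) // 2  # room for the "..." marker
    return cid[:keep] + "..." + cid[-keep:]
```

The underlying full CID would still be used everywhere else (paths, wire format); only the rendered string is shortened.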
We can use the link's index in the block for display, or to get the data/link.
We already have limits on block size; there is no need for limits on its parts. If a link does not fit within the block limit, move it to a sub-block; if it still does not fit within the block limit, that is an error. This could be used for micro tar files in a directory block: |
The issue here is that people need to use CIDs. For unixfs2, we may want to consider inline files, but that changes some of the semantics of ipfs (e.g., not all files will be directly addressable without indirecting through a directory). |
@Stebalien if we allow long id-hashes we effectively allow inline files: you could access one via a very long hash string, but no sane person will want to do that, so you effectively have to access it via the directory entry. If we allow this, the only issue is with display. I can think of two possibilities: we replace the hash with something like
or we truncate long hashes
where we determine some metric for the maximum length. |
1MB is way too large. See below.
They might want to in a pinch (i.e. useful when the IPFS network is unavailable for some reason). What we need to design for is ensuring that it is always possible to do so. The easiest hard limitation that comes to mind is the URL length. Given there is desire to move over to base32 encoding in the future, this translates to a max of:
To allow larger leading URLs we should probably cap this at 1200. There is one more consideration: bookmarks may be capped at 260 characters in some browsers. If we take that as the basis (should we?) we get a maximum data length of only 141 bytes |
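The arithmetic behind figures like these can be sketched as below. Assumptions (not stated exactly in the thread): RFC 4648 base32 without padding (5 bytes per 8 characters), a `/ipfs/` path prefix, and a hypothetical `header_bytes` allowance for the multibase prefix plus the CID version, codec, and multihash `<code, length>` varints. The exact byte budget depends on those overheads, so this reproduces the method, not necessarily the precise 141-byte figure:

```python
import math

def base32_chars(n_bytes: int) -> int:
    # RFC 4648 base32, no padding: every 5 bytes become 8 characters.
    return math.ceil(n_bytes * 8 / 5)

def max_payload_bytes(url_budget_chars: int, header_bytes: int = 6) -> int:
    # header_bytes is a hypothetical allowance for multibase/CID/multihash
    # framing; the rest of the character budget carries the digest.
    chars_for_cid = url_budget_chars - len("/ipfs/")
    return (chars_for_cid * 5) // 8 - header_bytes
```

With a 260-character bookmark budget this lands in the same low-hundreds range as the figure above; the spread comes entirely from what one counts as framing overhead.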
Pin the underlying block. The index of a link in a block might be used to limit the pin. The data URI scheme is the web's analog of an identity hash. |
So, for unixfs2, we actually don't need to use CIDs for this. Instead, we can just allow inline files:
The primary motivation for inline CIDs is that they allow us to concisely point to inlined objects. If we can't do that (e.g., we have a large, unwieldy CID), then there isn't much of a point in using CIDs. |
@Stebalien what about inline data in files in any place? |
@Stebalien unixfs2 is not defined yet, and there is no clear consensus on what to include (that is, unless I am missing something).
In particular I am not sure allowing inline files is such a good idea; at the very least it requires careful thought. I am more inclined to allow larger identity hashes. |
CIDs need to remain usable in paths. I agree that we'd need to be careful about inline files (and I'm not even sure if we should support them), but CIDs must remain usable by humans. We need to focus on the motivations:
|
@Stebalien do you have thoughts on the ~140byte figure I derived in #4918 (comment) ? |
This needs to be decided on; changing it later will create compatibility problems. I would prefer to use a nice power-of-two size. Right now I am trying to decide between 64 and 128 bytes for the length of the hash component. If we continue to use 256-bit (= 32-byte) hashes, then 64 satisfies the requirement "CIDs shouldn't be more than ~50% of the size of the file (max): Inline CIDs." 128 bytes will be a bit more flexible; while it will create annoyingly long hashes, they are still manageable, for example when part of a URL as described above. @Stebalien @whyrusleeping @Kubuxu (others?) thoughts? |
One thing to remember is that we're talking about a cutoff, not a maximum. That is, we're not forbidding users from creating larger CIDs, we're saying that all files smaller than X will be embedded directly in the CID. So, if we say 140 or even 128, we will generate a ton of 140/128 byte CIDs automatically. 128 looks like:
On the other hand, a 64 byte (base58 encoded) CID looks like:
Currently, a V1 CID looks like:
Honestly, even the 64 byte CID is a bit long. We may even want to consider 52 bytes as the resulting path (including the prefix) will fit in 78 bytes (under 80 characters):
However, that doesn't give us much room to work with. So, I'd say that 64 is a max, at the very least. |
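The encoded lengths being compared above can be approximated without generating actual CIDs. A sketch, assuming plain base58btc with no multibase prefix and a hypothetical 4-byte CID header (version, codec, and multihash code/length varints); real prefixes may add a few characters:

```python
import math

def base58_len(n_bytes: int) -> int:
    # Upper bound on base58 length: log(256)/log(58) ~ 1.37 chars per byte.
    return math.ceil(n_bytes * math.log(256) / math.log(58))

CID_HEADER = 4  # hypothetical: version + codec + hash code + length varints

for digest in (32, 52, 64, 128):
    print(digest, "byte digest ->", base58_len(digest + CID_HEADER), "chars")
```

This makes the trade-off concrete: each candidate digest size maps to a character count that can be checked against the URL and terminal-width budgets discussed earlier.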
Actually that was part of the plan, unless we create a special rule that says id hashes can not be longer than XX while other CIDs can be, which doesn't make sense to me. We could just allow longer CIDs and simply not use them by default, as I don't think that will break anything; now that I think about it, that might be the best way forward. Also, to be clear, when I say 64 bytes I mean a 64-byte digest length; the complete CID with the prefix (including the multihash one) will likely be a bit longer. If we do set a hard limit, then 64 should be an absolute minimum, in case someday we want to use the full 512 bits of some cryptographic hashes. Also note, if we really want to consider things such as display width (which I don't think is such a good idea), consider that we may switch to base32, which is slightly longer. I now see three options
I am now leaning towards (1), and possibly going with (2). (3) is an option, but I don't see a technical reason to force CID length to this. |
It is important to me that the program can understand an identifier of any size, and does not discard blocks containing them. |
I've moved this to the CID repo, as it's a spec issue and we need to involve people outside of go. |
New issue: multiformats/cid#21 |
Right now (and to my surprise) there doesn't seem to be any length limit. When identity hashes are used (see #4910), an 18k file hashed with the identity hash just works. I am even able to add that file to a directory entry with the files API and retrieve it.
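For context, the reason an 18k "identity hash" works at all: in the multihash format the identity function has code `0x00`, and its "digest" is the data itself, so the multihash (and hence the CID) grows linearly with the payload. A minimal sketch, assuming the standard multiformats layout of `<varint code><varint length><digest>`:

```python
def varint(n: int) -> bytes:
    # Unsigned LEB128 varint, as used throughout multiformats.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)

def identity_multihash(data: bytes) -> bytes:
    # Identity hash (code 0x00): the "digest" is the data itself,
    # so nothing bounds the multihash size except an external limit.
    return varint(0x00) + varint(len(data)) + data
```

Nothing in the format itself caps `len(data)`, which is exactly why an explicit length limit has to be imposed by the implementation or the spec.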