SPEC.md
- 0: Array. Tuple containing two integers, the `start` and `end` offsets of the content.
  - 0: Integer: start offset.
  - 1: Integer: end offset.
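A minimal sketch of how an implementation might interpret such a part entry. The function name and bounds checks here are illustrative, not part of the spec:

```python
# Hypothetical helper: slice the content described by a [start, end]
# offset tuple, as defined in the spec snippet above.

def read_part(file_bytes: bytes, part: list) -> bytes:
    """Return the bytes covered by a [start, end] offset tuple."""
    start, end = part
    if not (0 <= start <= end <= len(file_bytes)):
        raise ValueError("part range out of bounds")
    return file_bytes[start:end]

data = b"hello world"
assert read_part(data, [0, 5]) == b"hello"
assert read_part(data, [6, 11]) == b"world"
```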
An alternative that would be less error-prone for implementations would be to give the size of each chunk rather than its end offset.
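To illustrate why sizes are less error-prone: with per-chunk sizes, the offsets are derived by a running sum, so entries cannot contradict one another. A sketch (names assumed, not from the spec):

```python
# Derive (start, end) offset pairs from per-chunk sizes via a prefix sum.
# With this encoding there is no way to record overlapping or gapped
# ranges, which is the failure mode end-offset encodings allow.

def offsets_from_sizes(sizes):
    out, pos = [], 0
    for size in sizes:
        out.append((pos, pos + size))
        pos += size
    return out

assert offsets_from_sizes([4, 2, 3]) == [(0, 4), (4, 6), (6, 9)]
```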
If there are no objections in the next week I'm going to merge this into the
So do I read correctly that if we compose a tree of file-data... the topmost node in the tree will have increasingly large lengths for each part entry, because it will be the sum of all children lengths? That sounds pretty simple and seek friendly indeed. 👍
Correct, but there is actually no requirement that the balance of the tree be symmetrical. For instance, it's probably more performant to always keep the first few parts in the root of the tree so that you can start reading a file from the beginning without another branch lookup in the tree. In general, I think the most efficient algorithm for building the tree will be to compact it backwards.
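The seek-friendliness described above can be sketched as follows. This is an illustrative model only (the node shape and field names are assumed, not specced): each part carries the total byte length of its subtree, so finding an offset is a walk down the tree subtracting lengths:

```python
# Hedged sketch of seeking by cumulative part lengths. Interior nodes are
# dicts with a "parts" list of (length, child) pairs; leaves are opaque
# chunk identifiers. None of these names come from the spec.

def seek(node, offset):
    """Descend to the chunk containing `offset`; return (chunk, local_offset)."""
    for length, child in node["parts"]:
        if offset < length:
            if isinstance(child, dict):   # interior node: recurse into subtree
                return seek(child, offset)
            return child, offset          # leaf chunk reached
        offset -= length                  # skip past this part
    raise IndexError("offset beyond end of file")

leaf = {"parts": [(5, "A"), (5, "B")]}
root = {"parts": [(10, leaf), (3, "C")]}
assert seek(root, 7) == ("B", 2)
assert seek(root, 11) == ("C", 1)
```

Note the example deliberately puts a bare chunk ("C") directly in the root alongside a subtree, matching the point that the tree need not be balanced symmetrically.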
We're going to need to be really careful with specing any tree layout behaviors like that to make sure there's one non-contentious Right Answer, for the sake of reproducibility/convergence when multiple uncoordinated users upload the same content. (True regardless of any tree balancing choice. Just wanted to say it out loud. 😬 )
Yup. One thing to keep in mind is that anything we change in the chunking algorithm, or even the way we write the parts for that algorithm, will change the hash of the final file object, so everything optional we do should be written into the metadata somewhere so that another implementation could reproduce the same file object.
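As a purely illustrative sketch of that idea (none of these field names exist in any spec; they stand in for whatever optional choices affect the final hash):

```python
# Hypothetical record of every optional chunking choice, so another
# implementation could replay them and arrive at the same file object.
import json

chunking_metadata = {
    "chunker": "rabin",                  # assumed chunker name
    "min_chunk_size": 256 * 1024,
    "max_chunk_size": 1024 * 1024,
    "tree_layout": "compact-backwards",  # hypothetical layout identifier
}

# Serialized deterministically so the record itself is stable across writers.
record = json.dumps(chunking_metadata, sort_keys=True)
```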
…On Sat, Oct 6, 2018, 11:07 AM, Eric Myhre approved this pull request.
That's slightly different than what I suggested. Logging a bunch of meta info does not give reproducibility/convergence when multiple uncoordinated users upload the same content. The "uncoordinated" part is important. But this is probably something to be hashed out in not-this-PR.
Relevant: #18

Otherwise, do we want to support offsets? That is,

Second, do we implicitly support holes?

Also, do we want to make this its own spec? That is, a sharded byte array spec? (kind of like HAMT being a generic sharded map spec)
Eventually, but we should probably punt on it for now in order to get this out quickly. We still don't even have a HAMT spec.
I could see how this would be useful in theory but I struggle to imagine the tooling that would end up creating it. Also, you'd want another property for the end of the range in the link.
Do unix filesystems support "holes?" If they don't then I'd argue we should try not to. I can't articulate the exact issues but my gut says this could lead to some odd security issues.
I assume the range would be implied by

One use-case would be @warpfork's "alternative dag" thing. That is, we'd be able to, e.g., take an existing file (in IPFS) and transform it into a TAR file while keeping the original blocks. With the current system, we can do this if we start out knowing we want to represent both a tar and a file, but we can't necessarily start with one and convert to the other. Another use-case is slicing video (although the keyframes make this tricky).
They usually do. They're called sparse files.
Ah, yes, that makes sense. What kind of error condition do we want to have when the linked data is smaller than the described range? |
I actually also struggle to imagine using an
I'd like to exclude these from our definition of unixfs, and dissent that they're usually supported by unix filesystems. Sparse files are supported by some filesystems, but they're not generally present in the POSIX APIs. You can't generally ask a filesystem if a file has sparse hunks in it. You could just infer it from a long run of zeros: but A) that's something I don't think we should really be doing, and B) I don't think most filesystems even do that; you have to use a

And we don't need to special-case this anyway. Long ranges of zeros in a file will... turn into a set of chunks of zeros... which will all dedup with each other.
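The dedup argument above is easy to demonstrate. A sketch assuming simple fixed-size chunking and content addressing by hash (chunk size and hash choice here are arbitrary illustrations):

```python
# A long run of zeros chunks into identical blocks, which all share one
# content hash, so only a single block is actually stored.
import hashlib

def chunk_hashes(data, size=4):
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

hashes = chunk_hashes(b"\x00" * 16, size=4)
assert len(hashes) == 4          # four chunks of zeros...
assert len(set(hashes)) == 1     # ...but one unique block after dedup
```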
Here's a new take on the file data representation that attempts to address many of the points raised in the discussion above.
`data` field for special file types. This is a pretty compact structure, but it's quite simple and makes seeking into file ranges rather trivial.