Why are we chunking files and not chunking blocks? #300
I'm cc'ing @Stebalien since it's basically his fault I started thinking about all this :). Does any of this make sense? Do you see a path for implementing this at the block layer (maybe as an experimental option) without major breakage? Do we have an estimate of the performance cost this chunking brings to the table?
So, a chunked file means we can:
I'm not sure what you mean by a "block layer". Operating systems have block storage devices, but filesystems are always responsible for the actual "chunking" and for managing blocks.

Now, if you mean "we should have a layer that just does the right thing and abstracts away sharding", I agree in principle, but it's a bit tricky. In the past there was a long debate on whether we should just build this into IPLD, and the decision was to build it on top of IPLD instead. However, the application (in this case, unixfs) still needs to control which sharding algorithm is used for which files (depending on the expected access patterns).

UnixFS 2.0 is actually trying to do this, to some extent. It will use a reusable HAMT implementation for sharded directories and, hopefully, a generalizable "sharded blob" (see ipld/legacy-unixfs-v2#13) for storing data. If we do this right, we should be able to create a nice layer between unixfs and IPLD that hides all this sharding under the covers.
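To make the idea concrete, here is a minimal Go sketch of what such a sharding-hiding layer could look like: callers ask for a logical byte range and the layer resolves which chunks back it. Everything here (`blockGetter`, `chunkRef`, `Blob`) is a hypothetical name for illustration, not an actual UnixFS 2.0 or go-ipfs API.

```go
package shardedblob

import (
	"context"
	"fmt"
)

// blockGetter stands in for an IPLD block service (hypothetical interface).
type blockGetter interface {
	Get(ctx context.Context, cid string) ([]byte, error)
}

// chunkRef records where one chunk sits within the logical file.
type chunkRef struct {
	cid    string
	offset int64 // logical start offset of this chunk
	size   int64
}

// Blob hides the chunk layout behind a plain byte-range read.
type Blob struct {
	chunks []chunkRef // sorted by offset, assumed contiguous
	blocks blockGetter
}

// ReadAt fills p with bytes starting at logical offset off, fetching and
// slicing whichever chunks overlap that range; callers never see chunk
// boundaries.
func (b *Blob) ReadAt(ctx context.Context, p []byte, off int64) (int, error) {
	n := 0
	for _, c := range b.chunks {
		if n == len(p) {
			break // request satisfied
		}
		if c.offset+c.size <= off {
			continue // chunk ends before the requested range
		}
		if c.offset >= off+int64(len(p)) {
			break // chunk starts after the requested range
		}
		data, err := b.blocks.Get(ctx, c.cid)
		if err != nil {
			return n, fmt.Errorf("fetching chunk %s: %w", c.cid, err)
		}
		// Copy only the part of this chunk that overlaps the request.
		start := off + int64(n) - c.offset
		n += copy(p[n:], data[start:])
	}
	return n, nil
}
```

The point of the sketch is only that the chunking/sharding decision lives below this interface, so unixfs (or an application) can swap sharding strategies without the caller noticing.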
Note, the solutions to the performance issues are things like:
Basically:
That is, we can erase chunking at all layers where it's a performance issue, but having it means we can use it when we need it.
(but yeah, our current approach of "next chunk please" is a killer in terms of performance)
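To illustrate why the sequential "next chunk please" pattern hurts, here is a hedged Go sketch contrasting one-at-a-time fetching (one full round trip per 256 KB chunk) with a bounded pipeline that keeps several requests in flight. `CID` and `fetchFunc` are stand-ins for illustration, not real Bitswap APIs.

```go
package prefetch

import (
	"context"
	"sync"
)

type CID string

// fetchFunc stands in for one Bitswap round trip (hypothetical signature).
type fetchFunc func(ctx context.Context, c CID) ([]byte, error)

// fetchSequential is the "next chunk please" pattern: latency adds up
// linearly with the number of chunks.
func fetchSequential(ctx context.Context, cids []CID, fetch fetchFunc) ([][]byte, error) {
	out := make([][]byte, len(cids))
	for i, c := range cids {
		data, err := fetch(ctx, c)
		if err != nil {
			return nil, err
		}
		out[i] = data
	}
	return out, nil
}

// fetchPipelined keeps up to `window` requests in flight, so round trips
// overlap instead of stacking; results still come back in order.
func fetchPipelined(ctx context.Context, cids []CID, fetch fetchFunc, window int) ([][]byte, error) {
	out := make([][]byte, len(cids))
	errs := make([]error, len(cids))
	sem := make(chan struct{}, window) // bounds concurrent fetches
	var wg sync.WaitGroup
	for i, c := range cids {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int, c CID) {
			defer wg.Done()
			defer func() { <-sem }()
			out[i], errs[i] = fetch(ctx, c)
		}(i, c)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}
```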
(This is a naive question, but I haven't found a conclusive answer for it yet; sorry if I'm duplicating another issue.)

While testing the new go-ipfs version (ipfs/kubo#5715) I confirmed once more how much impact the chunk size can have while transferring files. The best design explanation I found so far for chunking files is in ipfs/kubo#3104, and most of the issues there seem to be related to problems that should be handled by other layers of the stack: holding large files in memory or on disk and choosing the right block size for caching seem to be the job of the OS, and partitioning a file to send it to a peer seems to be the job of the transport layer. The most definitive answer I've found in that issue is a DoS concern: I would need the entire (e.g., 4 GB) file before knowing whether it's valid or not. But the entire file contents, if stored in a single block, could be logically (but not physically) partitioned to compute partial hashes; e.g., the block could have a prefix with a list (or tree) of hashes of its different sectors, and we could progressively verify whether the data is valid (I've found a similar idea in a discuss thread; a rough sketch of this is included below).

My main point is that this seems to be a problem for the block layer and not the file (UnixFS) layer. Partitioning a directory (e.g., with a HAMT) makes sense to me because files are loosely coupled, but the contents of a single file seem coupled enough to avoid the partition (I mean, it's common to request a single file from a server without pulling the entire directory listing, but you'd normally want the entire file). I'm not against partitioning files, since there are valid cases for it (e.g., when streaming or when handling very large files), but a 256 KB default block size puts a hard constraint on the transport and storage layers, which has a considerable performance impact on bitswap, the datastore, the pinning system, and potentially other components I'm not aware of.
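As a rough sketch (not a proposal for an actual wire format) of the sector-hash-prefix idea above: compute per-sector SHA-256 hashes once, ship them ahead of the data, and verify each sector as it arrives instead of buffering the whole file first. The 256 KB sector size and all names here are illustrative assumptions.

```go
package sectorhash

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

const sectorSize = 256 * 1024 // hypothetical 256 KB sectors

// sectorHashes computes the hash-list prefix for a payload: one SHA-256
// per fixed-size sector.
func sectorHashes(data []byte) [][sha256.Size]byte {
	var hashes [][sha256.Size]byte
	for off := 0; off < len(data); off += sectorSize {
		end := off + sectorSize
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[off:end]))
	}
	return hashes
}

// verifySector checks one sector as soon as it has been received, so the
// receiver can reject bad data without waiting for the rest of the file.
func verifySector(index int, sector []byte, hashes [][sha256.Size]byte) error {
	if index >= len(hashes) {
		return fmt.Errorf("sector %d out of range", index)
	}
	sum := sha256.Sum256(sector)
	if !bytes.Equal(sum[:], hashes[index][:]) {
		return fmt.Errorf("sector %d failed verification", index)
	}
	return nil
}
```

For this to provide real DoS protection, the hash-list prefix itself would of course have to be covered by the block's identifier, so a sender can't swap in arbitrary sector hashes.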
The best argument I can find for chunking is something I've learned from PL documents and presentations: deployment and adoption matter a lot. Right now the web (it seems to me) is structured mostly around atomic files. Maybe chunking a file is ultimately the way to go, but we should strive for a transition that is as smooth as possible for the user, and right now we are enforcing a 256 KB partition that has a very clear performance impact (something the user will notice).