
Why are we chunking files and not chunking blocks? #300

schomatis opened this issue Nov 2, 2018 · 4 comments
schomatis commented Nov 2, 2018

(This is a naive question, but I haven't found a conclusive answer to it yet; sorry if I'm duplicating another issue.)

While testing the new go-ipfs version (ipfs/kubo#5715) I confirmed once more how much impact the chunk size can have when transferring files. The best design explanation I've found so far for chunking files is in ipfs/kubo#3104, and most of the issues raised there seem to concern problems that should be handled by other layers of the stack (e.g., holding large files in memory or on disk and choosing the right block size for caching seem like the OS's job; partitioning a file to send it to a peer seems like the transport layer's job). The most definite answer I found in that issue is a DoS concern: I would need the entire (e.g., 4 GB) file before knowing whether it's valid or not. But the file contents, even if stored in a single block, could be logically (though not physically) partitioned to compute partial hashes; e.g., the block could have a prefix with a list (or tree) of hashes of its different sectors, and we could progressively verify whether the data is valid (I've found a similar idea in a discuss thread).
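
A minimal sketch of that idea in Go, assuming fixed-size sectors and a flat SHA-256 hash list as the prefix (the sector size and all names here are hypothetical, not an existing IPFS format):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Hypothetical sector size for logically partitioning a large block.
const sectorSize = 1 << 20 // 1 MiB

// buildPrefix computes the per-sector hash list that would be stored
// ahead of the data inside the single large block.
func buildPrefix(data []byte) [][sha256.Size]byte {
	var hashes [][sha256.Size]byte
	for off := 0; off < len(data); off += sectorSize {
		end := off + sectorSize
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[off:end]))
	}
	return hashes
}

// verifySector checks one sector against the already-received prefix, so a
// bad peer is detected after at most one sector, not after the whole file.
func verifySector(prefix [][sha256.Size]byte, i int, sector []byte) bool {
	if i >= len(prefix) {
		return false
	}
	sum := sha256.Sum256(sector)
	return bytes.Equal(sum[:], prefix[i][:])
}

func main() {
	data := bytes.Repeat([]byte{0xab}, 2*sectorSize+42)
	prefix := buildPrefix(data)
	fmt.Println("sectors:", len(prefix))                                    // 3
	fmt.Println("sector 0 ok:", verifySector(prefix, 0, data[:sectorSize])) // true
}
```

For the check to be trustworthy before the whole block arrives, the block's identifier would also have to commit to the prefix itself (e.g., by hashing the hash list first); pinning that down is part of what such a design would need to decide.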

My main point is that this seems like a problem for the block layer, not the file (UnixFS) layer. Partitioning a directory (e.g., with a HAMT) makes sense to me because its files are loosely coupled, but the contents of a single file seem coupled enough to avoid partitioning (it's common to request a single file from a server without pulling the entire directory listing, but you'd normally want the entire file). I'm not against partitioning files, and there are valid cases for it (e.g., streaming, or handling very large files), but a 256 KB default block size puts a hard constraint on the transport and storage layers, with a considerable performance impact on bitswap, the datastore, the pinning system, and potentially other components I'm not aware of.

The best argument I can find for this is something I've learned from PL documents and presentations: deployment and adoption matter a lot. Right now the web (it seems to me) is structured around mostly atomic files. Maybe chunking a file is ultimately the way to go, but we should strive for a transition that is as smooth as possible for the user, and right now we are enforcing a 256 KB partition that has a very clear performance impact (something the user will notice).

schomatis (Author) commented

I'm cc'ing @Stebalien since it's basically his fault I started thinking about all this :). Does any of this make sense? Do you see a path for implementing this at the block layer (maybe as an experimental option) without major breakage? Do we have an estimate of the performance cost this chunking brings to the table?

Stebalien (Member) commented

So, a chunked file means we can:

  1. Use special chunking algorithms for better dedup. E.g., we can chunk a disk image/tar into the individual files (a content-defined chunking sketch follows this list).
  2. Append to a file without creating an entirely new file.
  3. Modify a file without rewriting the entire file.
  4. Stream a video without downloading the entire file.
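
To illustrate point 1, here is a toy content-defined chunker: a buzhash-style rolling hash over a small window declares a boundary whenever it matches a bit mask, so cut points depend on the content and an insertion only shifts nearby boundaries. The window, mask, and size bounds below are illustrative, not the parameters of go-ipfs's rabin or buzhash chunkers:

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	windowSize = 64        // rolling-hash window (illustrative)
	minChunk   = 16 << 10  // don't cut before 16 KiB
	maxChunk   = 256 << 10 // force a cut at 256 KiB
	boundary   = 0x1FFF    // 13 bits: ~8 KiB average gap between matches
)

// table assigns each byte value a fixed pseudo-random code (buzhash-style).
var table [256]uint32

func init() {
	rng := rand.New(rand.NewSource(1))
	for i := range table {
		table[i] = rng.Uint32()
	}
}

// cutPoints returns the end offsets of content-defined chunks.
func cutPoints(data []byte) []int {
	var cuts []int
	start := 0
	var h uint32
	for i := range data {
		h = (h << 1) | (h >> 31) // rotate the hash left by one bit
		h ^= table[data[i]]
		if i-start >= windowSize {
			// Remove the byte leaving the window. Its code has been
			// rotated windowSize times, and windowSize % 32 == 0, so
			// it appears unrotated.
			h ^= table[data[i-windowSize]]
		}
		size := i - start + 1
		if (size >= minChunk && h&boundary == 0) || size >= maxChunk {
			cuts = append(cuts, i+1)
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		cuts = append(cuts, len(data)) // trailing partial chunk
	}
	return cuts
}

func main() {
	data := make([]byte, 1<<20)
	rand.New(rand.NewSource(2)).Read(data)
	fmt.Println("chunks:", len(cutPoints(data)))
}
```

Because the boundary condition only looks at the last windowSize bytes, inserting data early in a tar or disk image re-aligns the cuts shortly afterward, so the unchanged files inside it keep their chunk hashes and dedupe.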

> Do you see a path for implementing this at the block layer (maybe as an experimental option) without major breakage?

I'm not sure what you mean by a "block layer". Operating systems have block storage devices, but filesystems are always responsible for the actual "chunking" and for managing blocks.

Now, if you mean "we should have a layer that just does the right thing and abstracts away sharding", I agree in principle, but it's a bit tricky. In the past, there was a long debate on whether or not we should build this into IPLD itself, and the decision was to build it on top of IPLD instead.

However, the application (in this case, unixfs) still needs to control which sharding algorithm is used for which files (depending on the expected access patterns).

UnixFS 2.0 is actually trying to do this, to some extent. It will use a reusable HAMT implementation for sharded directories and, hopefully, a generalizable "sharded blob" (see ipld/legacy-unixfs-v2#13) for storing data. If we do this right, we should be able to create a nice layer between unixfs and IPLD that hides all this sharding under the covers.
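
As a sketch, such a layer might expose something like the interface below, where callers read and modify a blob as one contiguous object while the implementation maps offsets onto shards underneath. Everything here is hypothetical; this is not the UnixFS 2.0 API:

```go
package blob

import (
	"context"
	"io"
)

// Cid stands in for a real content identifier (e.g. github.com/ipfs/go-cid),
// defined locally to keep the sketch self-contained.
type Cid string

// Blob is a hypothetical "sharded blob" abstraction: the application picks a
// chunking strategy at creation time, and after that the sharding is
// invisible to callers.
type Blob interface {
	io.ReaderAt // random-access reads that may span chunk boundaries
	Size() int64

	// Append reuses the existing chunks and only adds new ones,
	// returning the new root.
	Append(ctx context.Context, r io.Reader) (Cid, error)

	// WriteAt rewrites only the chunks the range [off, off+len(p)) touches.
	WriteAt(ctx context.Context, p []byte, off int64) (Cid, error)

	// Root is the current root of the underlying DAG.
	Root() Cid
}
```

The point of an interface like this is that appends and in-place edits (points 2 and 3 above) become cheap DAG operations rather than whole-file rewrites, while the caller never sees a chunk.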

Stebalien (Member) commented

Note, the solutions to the performance issues are things like:

  1. Send multiple blocks in a single bitswap message (we're finally doing this).
  2. Ask for more data at once (we currently limit the wantlist size to, I believe, 8). Upping this number depends on having bitswap sessions implemented, as we currently send this wantlist to all neighbors.
  3. Better prefetching. That is, we need a prefetcher that not only looks ahead in the direct children of some shard but also looks ahead in neighboring shards (a sketch follows this list).
  4. DagSync. This will allow us to say "give me all chunks in this range, file, etc". While these pieces will still technically be chunked, the chunking will effectively disappear at the transport layer.
  5. Better datastores. We can write datastores that store materialized files (or even just cache them), preload related chunks, etc.
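
For point 3, a prefetcher can be as simple as walking the DAG ahead of the reader with a bounded number of concurrent fetches, so requests for sibling and neighboring shards overlap instead of being issued one at a time. A minimal sketch with stand-in types (the real interfaces live in go-ipld-format and bitswap):

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Node and Getter are stand-ins for IPLD nodes and a block-fetching service.
type Node struct {
	ID       string
	Children []string
}

type Getter interface {
	Get(ctx context.Context, id string) (*Node, error)
}

// Prefetch walks the DAG under root with up to width concurrent fetches, so
// blocks are local (or at least in flight) before a sequential reader asks
// for them.
func Prefetch(ctx context.Context, g Getter, root string, width int) {
	sem := make(chan struct{}, width) // bounds in-flight fetches
	var wg sync.WaitGroup

	var walk func(id string)
	walk = func(id string) {
		defer wg.Done()
		sem <- struct{}{} // acquire a fetch slot
		n, err := g.Get(ctx, id)
		<-sem // release it before descending
		if err != nil || n == nil {
			return
		}
		for _, c := range n.Children {
			wg.Add(1)
			go walk(c)
		}
	}

	wg.Add(1)
	go walk(root)
	wg.Wait()
}

// mapGetter is a toy in-memory Getter for demonstration.
type mapGetter map[string]*Node

func (m mapGetter) Get(_ context.Context, id string) (*Node, error) {
	return m[id], nil
}

func main() {
	g := mapGetter{
		"root": {ID: "root", Children: []string{"a", "b"}},
		"a":    {ID: "a"},
		"b":    {ID: "b"},
	}
	Prefetch(context.Background(), g, "root", 4)
	fmt.Println("done")
}
```

A real prefetcher would also rank candidates by the expected read order and cap how far ahead it runs, but the shape is the same: fan the requests out instead of serializing them.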

Basically:

> But the file contents, even if stored in a single block, could be logically (though not physically) partitioned to compute partial hashes; e.g., the block could have a prefix with a list (or tree) of hashes of its different sectors, and we could progressively verify whether the data is valid (I've found a similar idea in a discuss thread).

That is, we can erase chunking at every layer where it's a performance issue, but having it means we can use it when we need it.

Stebalien (Member) commented

(but yeah, our current approach of "next chunk please" is a killer in terms of performance)
