we have had a fixed-blocksize chunker in the master branch for a while now; it is good for reading raw disk files (e.g. virtual machine disks). also, it is rather simple Cython code (not hard-to-change/maintain C code like the variable-size buzhash chunker).
often, these disk files are sparse, so unused blocks are not stored on disk.
for sparse files, there is an `os.lseek(..., SEEK_HOLE/SEEK_DATA)` API to discover the ranges that actually have data (are stored on disk) and the ranges that are holes and have no data (the fs just generates zeros when reading them).
the chunker currently reads input files completely. this could be optimized so it does not read holes from the fs, saving some all-zeros data shuffling from the fs code (kernel) to the borg code (userspace).
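to illustrate the idea, here is a minimal sketch of how the data/hole layout of a file could be discovered with `os.lseek`. the function name `sparse_ranges` and the `(start, length, is_data)` tuple layout are made up for this example and are not existing borg code. it assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (e.g. Linux); on filesystems without hole support, the whole file is usually reported as a single data range.

```python
import errno
import os


def sparse_ranges(fd):
    """Yield (start, length, is_data) tuples covering the whole file behind fd.

    Hypothetical helper, only meant to illustrate SEEK_DATA / SEEK_HOLE.
    """
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            # Find the next offset >= pos that contains data.
            data_start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            # ENXIO: there is no data after pos, the rest of the file is a hole.
            data_start = size
        if data_start > pos:
            yield pos, data_start - pos, False  # a hole
        if data_start >= size:
            break
        # Every file has an implicit hole at its end, so this finds an offset.
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        yield data_start, hole_start - data_start, True  # a data range
        pos = hole_start
```

a caller would only need to read the ranges flagged as data; for the hole ranges it already knows that the content is all zeros.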
with the chunker adapted to read only some ranges of the file (those which have data we want) and to skip other ranges (those which do not), this could also be used for other purposes in the future (e.g. if we get a "changed blocks" list from some CBT ("changed blocks tracking") mechanism outside of borg).
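a rough sketch of what such a range-driven fixed-size chunker could look like follows. this is plain Python, not the actual Cython chunker; the name `chunk_ranges`, its parameters and the `(offset, size, data)` output format are assumptions made for this example. the range list could come from a hole/data scan or from a CBT mechanism, and ranges are assumed to start on block boundaries. it uses `os.pread`, which is only available on Unix-like platforms.

```python
import os


def chunk_ranges(fd, ranges, block_size=4096):
    """Yield (offset, size, data) blocks for the given ranges of fd.

    ranges is an iterable of (start, length, wanted) tuples. Blocks in wanted
    ranges are read from the file; blocks in other ranges are not read at all
    and are yielded with data=None.
    Hypothetical sketch, not borg's chunker API.
    """
    for start, length, wanted in ranges:
        end = start + length
        pos = start
        while pos < end:
            n = min(block_size, end - pos)
            if wanted:
                data = os.pread(fd, n, pos)
                if not data:
                    return  # unexpected end of file, stop chunking
                yield pos, len(data), data
                pos += len(data)
            else:
                # Skipped range: no read, the caller decides what it means
                # (all zeros for a hole, "unchanged" for a CBT list).
                yield pos, n, None
                pos += n
```

yielding `None` instead of a zero-filled buffer keeps the chunker agnostic about why a range was skipped, so the same loop serves both the sparse-file case and the changed-blocks case.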
💰 there is a bounty for this