improve the fixed chunker for better sparse file support #5565

Closed
ThomasWaldmann opened this issue Dec 13, 2020 · 1 comment · Fixed by #5561

ThomasWaldmann (Member) commented Dec 13, 2020

We have had a fixed-blocksize chunker in the master branch for a while. It is good for reading raw disk files (e.g. virtual machine disks). It is also rather simple Cython code, unlike the variable-size buzhash chunker, which is C code that is harder to change and maintain.
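
A minimal sketch of the fixed-blocksize idea in plain Python, assuming a buffered binary file object. `fixed_chunks` is a hypothetical name for illustration, not borg's actual Cython chunker API.

```python
from typing import BinaryIO, Iterator


def fixed_chunks(f: BinaryIO, block_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive block_size-byte chunks from f; only the last may be shorter.

    Assumes a buffered binary file, whose read(n) returns fewer than n bytes
    only at EOF.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    while True:
        block = f.read(block_size)
        if not block:  # EOF reached
            return
        yield block
```

Because every chunk boundary sits at a multiple of `block_size`, an unchanged block at the same offset always yields the same chunk, which suits disk images where data does not shift around.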

Often, these disk files are sparse, i.e. unused blocks are not stored on disk.

For sparse files, there is an os.lseek(..., SEEK_HOLE/SEEK_DATA) API to discover the ranges that actually contain data (i.e. are stored on disk) and the ranges that are holes without data (the fs just returns zeros when they are read).
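
A minimal sketch of discovering the data ranges with Python's `os.lseek` and the `os.SEEK_DATA` / `os.SEEK_HOLE` constants, which are only available on some platforms (e.g. Linux). `data_ranges` is a hypothetical helper, and treating the whole file as data when hole detection is unavailable is an assumption of this sketch.

```python
import errno
import os
from typing import Iterator, Tuple


def data_ranges(fd: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) byte ranges of fd that contain data."""
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        # platform has no hole detection: treat the whole file as data
        if size:
            yield (0, size)
        return
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                break  # no data at or after pos: the remainder is a hole
            if e.errno in (errno.EINVAL, errno.EOPNOTSUPP) and pos == 0:
                yield (0, size)  # fs does not support hole detection
                break
            raise
        # there is always an implicit hole at EOF, so this terminates
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield (start, min(end, size))
        pos = end
```

The holes are the gaps between consecutive data ranges, plus the gap between the last range and the file size.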

The chunker currently reads input files completely. This could be optimized so that it does not read holes from the fs, which saves copying all-zero data from the fs code (kernel) to the borg code (userspace).
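
A sketch of how a fixed chunker could skip holes, assuming the data ranges were already obtained (e.g. via SEEK_DATA/SEEK_HOLE) as sorted, non-overlapping half-open `(start, end)` byte ranges. `sparse_fixed_chunks` and its `("data", ...)` / `("hole", ...)` output tuples are hypothetical, not borg's API: blocks that touch a data range are read, while blocks lying entirely inside a hole are reported by offset and length without any read.

```python
from typing import BinaryIO, Iterator, List, Tuple, Union

Range = Tuple[int, int]  # half-open [start, end)


def sparse_fixed_chunks(f: BinaryIO, size: int, data: List[Range],
                        block_size: int = 4096
                        ) -> Iterator[Tuple[str, int, Union[bytes, int]]]:
    """Yield ("data", offset, bytes) for blocks overlapping a data range and
    ("hole", offset, length) for blocks entirely inside a hole (never read)."""
    data = sorted(data)
    i = 0
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        end = offset + length
        # drop data ranges that end at or before this block's start
        while i < len(data) and data[i][1] <= offset:
            i += 1
        if i < len(data) and data[i][0] < end:
            # block overlaps data: read it (any hole part reads back as zeros)
            f.seek(offset)
            yield ("data", offset, f.read(length))
        else:
            yield ("hole", offset, length)
```

A consumer could then treat each "hole" item as an all-zero block of that length without ever fetching its bytes from the kernel.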

With the chunker adapted to read only some ranges of the file (those with data we want) and skip the others (those without data we want), this could also be used for other purposes in the future, e.g. if we get a list of changed blocks from some CBT ("changed block tracking") mechanism outside of borg.
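
To illustrate the "changed blocks" idea, a sketch that reads only the fixed-size blocks touched by an externally supplied list of byte ranges. The range-list format (half-open `(start, end)` tuples) and the helper names are hypothetical; a real CBT source would have its own format. Aligning reads to block boundaries keeps chunk offsets identical to those of a full read.

```python
from typing import BinaryIO, Iterable, Iterator, List, Tuple


def blocks_for_ranges(ranges: Iterable[Tuple[int, int]], size: int,
                      block_size: int) -> List[int]:
    """Sorted indices of blocks touched by half-open byte ranges, clipped to [0, size)."""
    wanted = set()
    for start, end in ranges:
        start, end = max(0, start), min(end, size)
        if start >= end:
            continue  # empty or out-of-file range
        wanted.update(range(start // block_size, (end - 1) // block_size + 1))
    return sorted(wanted)


def read_wanted_blocks(f: BinaryIO, ranges: Iterable[Tuple[int, int]], size: int,
                       block_size: int = 4096) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) only for blocks touched by ranges; other blocks are never read."""
    for index in blocks_for_ranges(ranges, size, block_size):
        offset = index * block_size
        f.seek(offset)
        yield offset, f.read(min(block_size, size - offset))
```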

💰 There is a bounty for this.

ThomasWaldmann (Member, Author) commented:

Note: this is a sub-task; for more, see #14.

ThomasWaldmann changed the title from "improve sparse file support" to "improve the fixed chunker for better sparse file support" on Dec 13, 2020
ThomasWaldmann added this to the hydrogen-rc1 milestone on Dec 13, 2020