chunker: sparsify if all-zero data was read #5566

Closed
ThomasWaldmann opened this issue Dec 13, 2020 · 1 comment · Fixed by #5620

ThomasWaldmann commented Dec 13, 2020

related to #14:

once we have better sparse handling inside borg (communicating data vs. hole status between the chunker, hashing, ...), it makes sense to add sparsify functionality.

assume we read all-zero chunks from a file:

  1. could be a hole (no space allocated on disk) [we can avoid this with the upcoming change from #5561, "sparse map / file map support for fixed size chunker"]
  2. could be file space allocated, but never used (fallocate)
  3. could be zeros stored on disk

for the last 2 cases, we could add a --sparsify option to borg create:

just compare the chunk content to an all-zero bytearray.

if it compares equal and --sparsify was given, the chunker shall (at least) yield "this is a hole of length X".
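
The comparison described above could look roughly like the following. This is only a minimal sketch of the idea, not borg's chunker: the function name `chunkify`, the `("data", ...)` / `("hole", ...)` tuples and the fixed chunk size are assumptions made for illustration.

```python
import io
from typing import BinaryIO, Iterator, Tuple, Union

# Hypothetical chunk representation: ("data", bytes) or ("hole", length).
Chunk = Tuple[str, Union[bytes, int]]


def chunkify(f: BinaryIO, chunk_size: int = 4096, sparsify: bool = False) -> Iterator[Chunk]:
    """Read fixed-size chunks; with sparsify, report all-zero chunks as holes."""
    zeros = bytes(chunk_size)  # all-zero reference block to compare against
    while True:
        data = f.read(chunk_size)
        if not data:
            break  # end of file
        # The last chunk may be short, so compare against a prefix of zeros.
        if sparsify and data == zeros[:len(data)]:
            yield ("hole", len(data))
        else:
            yield ("data", data)


# 8 zero bytes, 2 data bytes, 6 zero bytes, read in chunks of 4 bytes.
src = io.BytesIO(bytes(8) + b"ab" + bytes(6))
chunks = list(chunkify(src, chunk_size=4, sparsify=True))
# chunks == [("hole", 4), ("hole", 4), ("data", b"ab\x00\x00"), ("hole", 4)]
```

Note that a chunk is only reported as a hole if all of its bytes are zero, so the third chunk above stays a data chunk.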

later it could yield one of the following (if we can somehow detect which case applies):

  • "this is a not-allocated hole of length X" (case 1)
  • "this is fallocated unused space of length X" (case 2)
  • "this is on-disk all-zero space of length X" (case 3)
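
One possible way to recognize case 1 is `lseek` with `SEEK_DATA`, which Python exposes as `os.SEEK_DATA` on platforms that support it (e.g. Linux). The sketch below is an assumption about how such a check might be written, not something taken from borg; the helper name `is_unallocated` is hypothetical. It cannot tell case 2 from case 3, and on filesystems without hole support every range is reported as data.

```python
import errno
import os
import tempfile


def is_unallocated(fd: int, offset: int, length: int) -> bool:
    """True if SEEK_DATA finds no data in [offset, offset + length).

    Side effect: moves the file position of fd.
    """
    try:
        next_data = os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as e:
        if e.errno == errno.ENXIO:
            # No data at or after offset (or offset is beyond end of file).
            return True
        raise
    return next_data >= offset + length


with tempfile.TemporaryFile() as f:
    f.write(b"x" * 4096)
    f.flush()
    # The range was just written, so it must be reported as data.
    data_is_hole = is_unallocated(f.fileno(), 0, 4096)
```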

ThomasWaldmann added this to the hydrogen milestone on Jan 3, 2021

ThomasWaldmann (Member, Author) commented:

this is implemented by #5620 within the buzhash and fixed chunkers.

currently, the chunker detects all-zero chunks and yields them as described above.

the hasher maintains an LRU cache of (hashalgo, size_of_allzero_chunk) -> hashvalue to avoid computing hashes of all-zero chunks again and again.
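
A cache like this can be sketched with `functools.lru_cache`. This is only an illustration of the idea, not the code from #5620: the function name `zero_chunk_hash`, the cache size and the use of plain `hashlib` are assumptions, and a real implementation would also have to account for any key material used by the hash.

```python
import hashlib
from functools import lru_cache


@lru_cache(maxsize=64)  # cache size chosen arbitrarily for this sketch
def zero_chunk_hash(hash_algo: str, size: int) -> bytes:
    """Hash of an all-zero chunk, cached per (hash_algo, size)."""
    return hashlib.new(hash_algo, bytes(size)).digest()


first = zero_chunk_hash("sha256", 4096)   # computed
second = zero_chunk_hash("sha256", 4096)  # served from the cache
```

Since all-zero chunks of the same size always have the same content, the (algorithm, size) pair is enough to identify the result.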

note that this does not affect storage (yet?): in any case, borg will compress and store the zeros as if they were normal data. so the archived item does not record whether a zero range was a sparse hole, fallocated unused space, or just normal on-disk zeros.
