Add support for --sparse to only create holes in files that had holes in the original #1355

Closed
textshell opened this issue Jul 21, 2016 · 6 comments

Comments

@textshell
Member

  • Add an is-sparse metadata bool that is true if the original file had any holes.
  • On extraction, use this flag to pick sparse extraction mode if the original was sparse, or normal extraction mode if it was not.
  • Maybe add a new option to select between never using sparse files and this mode (always using sparse files already has an option).
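
A minimal sketch of how the flag could drive extraction, not actual borg code. The names `write_chunks`, `extract`, `item_is_sparse` and the mode values `"never"`, `"auto"`, `"always"` are all hypothetical and only illustrate the three bullets above:

```python
import os


def write_chunks(path, chunks, sparse):
    """Write chunks to path; in sparse mode, seek over all-zero chunks."""
    with open(path, "wb") as fd:
        for chunk in chunks:
            if sparse and chunk.strip(b"\0") == b"":
                # Skipping the write lets the filesystem leave a hole here
                # (if it supports holes at all).
                fd.seek(len(chunk), os.SEEK_CUR)
            else:
                fd.write(chunk)
        # Fix the final size in case the file ends with a skipped chunk.
        fd.truncate(fd.tell())


def extract(path, chunks, item_is_sparse, mode="auto"):
    """Pick the extraction mode from the (hypothetical) is-sparse flag.

    mode is a made-up option: "never", "auto" (follow the flag), "always".
    Returns whether sparse extraction was used.
    """
    if mode not in ("never", "auto", "always"):
        raise ValueError("unknown mode: %r" % (mode,))
    sparse = mode == "always" or (mode == "auto" and bool(item_is_sparse))
    write_chunks(path, chunks, sparse)
    return sparse
```

Either way the file content is identical; only the on-disk allocation can differ.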
@ThomasWaldmann
Member

Note: one needs to be careful when determining sparsity; see the unit tests for how to do that (at least until we use SEEK_HOLE/SEEK_DATA, which would tell us for sure).
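
For illustration, a rough sketch of a SEEK_HOLE-based check using Python's `os.lseek`. The function name `has_hole` is made up here, and this is not how the unit tests do it. It assumes a platform where `os.SEEK_HOLE` exists, and a `False` result is only as reliable as the filesystem's hole reporting:

```python
import os


def has_hole(path):
    """Return True if the FS reports a hole before EOF, False if it reports
    none, or None if SEEK_HOLE is unavailable or fails on this platform/FS.

    Caveat: a filesystem without real hole support may report the whole
    file as data, so False does not prove the file is fully allocated.
    """
    if not hasattr(os, "SEEK_HOLE"):
        return None
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        if size == 0:
            return False  # nothing to seek in; lseek would fail with ENXIO
        try:
            # Offset of the first hole at or after 0; a file without holes
            # yields the implicit hole at end-of-file, i.e. its size.
            first_hole = os.lseek(fd, 0, os.SEEK_HOLE)
        except OSError:
            return None
        return first_hole < size
```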

@ThomasWaldmann
Member

Btw, this is not a black-and-white thing: a file could have large runs of sparse zeros and also large runs of non-sparse zeros. VM software might do clever tricks here for performance reasons.
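
To make the mixed case concrete, here is a small sketch that lists the regions the filesystem reports as data, using `os.SEEK_DATA` and `os.SEEK_HOLE`. The helper name `data_segments` is hypothetical, and it assumes a platform and filesystem that support these seek modes:

```python
import errno
import os


def data_segments(path):
    """Return [(start, end), ...] for regions reported as data,
    or None if SEEK_DATA/SEEK_HOLE is unavailable or fails."""
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return None
    segments = []
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
                end = os.lseek(fd, start, os.SEEK_HOLE)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # no data after pos: the rest is a trailing hole
                return None
            segments.append((start, end))
            pos = end
    return segments
```

Zero bytes inside a reported segment are allocated (non-sparse) zeros; only the gaps between segments are holes. A single is-sparse bool cannot tell these two kinds of zero runs apart.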

@textshell
Member Author

Maybe there is some VM software that is really smart about it. Is there any example of one?
Some cases might be served better by perfectly restoring the holes, like in #14.
This idea is easier to implement, but it might not be worth it if full sparse file replication is implemented.

@enkore
Contributor

enkore commented Jul 21, 2016

I dunno, the layout of the holes depends on what the allocator of the FS decided to do and you can't force it. So I'm not even sure if that (#14) is really a feasible thing to do, or why it would make a difference (unless some software decides to track its sparse files internally and breaks if that changes?!)

(this really belongs in #14)

@textshell
Member Author

I think everything that encodes semantics into holes is really broken. I can somewhat understand that an advanced application might do smart things with performance vs. storage requirement trade-offs. But I wonder whether an application that is really smart about this wouldn't need a "reoptimize" feature anyway, because files sometimes need to be copied. Depending on a special copy procedure that faithfully reproduces holes does not seem like something that would be popular.

@enkore
Contributor

enkore commented Oct 1, 2016

I'm closing this due to the complexities involved and because there doesn't seem to be any real use for this, but if someone really wants it -> reopen

@enkore enkore closed this as completed Oct 1, 2016

3 participants