-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tart pull: try to re-use APFS blocks by cloning the base image #864
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO seems like a great improvement especially since there is a way to punch a hole in a file. I just pulled two images using the latest released version of Tart and one from this branch. I pulled ghcr.io/cirruslabs/macos-sonoma-vanilla:latest
and ghcr.io/cirruslabs/macos-sonoma-base:latest
which is basically vanilla with couple brew installes.
With the released version of Tart these two images took 42.03 GB of space and with this change (without punch hole thing) it takes 41.63 GB which seems to liitle of a win.
PS will be great to log that Tart found a base image to verify the logic and give a user some feedback.
|
||
if chunk != actualContentsOnDisk { | ||
try disk.seek(toOffset: offset) | ||
disk.write(chunk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we can validate that the new chunk are zeroes here and use g
syscall to erase a range. Here is an example that might work:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in c110158.
Implemented in a346ea6. |
* Show how much deduplication happening Improvement to the APFS deduplication logic which checks whether a disk image file `mayShareFileContent` with some other file, and then we put a custom attribute to track the deduplication since there is no way to get this information from APFS itself. It's not 100% accurate but given that OCI cache is immutable the actual disk usage can only be lover than that. * Use string attribute * Update Sources/tart/URL+Prunable.swift Co-authored-by: Nikolay Edigaryev <[email protected]> * Added SizeOnDisk colume --------- Co-authored-by: Nikolay Edigaryev <[email protected]>
To be honest, I'm not a fan of this approach, because it's incredibly hard to measure the real-world impact of this, both in terms of CPU overhead and disk usage savings1.
The reason being that these variables are highly influenced by the contents of the base VM image (from which we're de-duplicating) and by the contents of the VM image to be pulled, yet we only have a few publicly available to run the tests on.
1: It feels that in some cases even losses, because we can't skip writing a zero block if the file we're basing off contains some other data in that place.