Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tart pull: try to re-use APFS blocks by cloning the base image #864

Merged
merged 10 commits into from
Jul 25, 2024

Conversation

edigaryev
Copy link
Collaborator

@edigaryev edigaryev commented Jul 15, 2024

To be honest, I'm not a fan of this approach, because it's incredibly hard to measure the real-world impact of this, both in terms of CPU overhead and disk usage savings1.

The reason being that these variables are highly influenced by the contents of the base VM image (from which we're de-duplicating) and by the contents of the VM image to be pulled, yet we only have a few publicly available to run the tests on.

1: It feels that in some cases even losses, because we can't skip writing a zero block if the file we're basing off contains some other data in that place.

@edigaryev edigaryev requested a review from fkorotkov as a code owner July 15, 2024 19:07
Copy link
Contributor

@fkorotkov fkorotkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO seems like a great improvement especially since there is a way to punch a hole in a file. I just pulled two images using the latest released version of Tart and one from this branch. I pulled ghcr.io/cirruslabs/macos-sonoma-vanilla:latest and ghcr.io/cirruslabs/macos-sonoma-base:latest which is basically vanilla with couple brew installes.

With the released version of Tart these two images took 42.03 GB of space and with this change (without punch hole thing) it takes 41.63 GB which seems to liitle of a win.

PS will be great to log that Tart found a base image to verify the logic and give a user some feedback.


if chunk != actualContentsOnDisk {
try disk.seek(toOffset: offset)
disk.write(chunk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we can validate that the new chunk are zeroes here and use g syscall to erase a range. Here is an example that might work:

https://github.com/niw/HolePunch/blob/0f6716324915ddf073acb37e1dc268f38b071d3e/Sources/HolePunch/Command.swift#L76-L84

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in c110158.

@edigaryev edigaryev requested a review from fkorotkov July 17, 2024 09:19
@edigaryev
Copy link
Collaborator Author

PS will be great to log that Tart found a base image to verify the logic and give a user some feedback.

Implemented in a346ea6.

edigaryev added a commit that referenced this pull request Jul 17, 2024
fkorotkov pushed a commit that referenced this pull request Jul 17, 2024
edigaryev and others added 2 commits July 18, 2024 19:07
* Show how much deduplication happening

Improvement to the APFS deduplication logic which checks whether a disk image file `mayShareFileContent` with some other file, and then we put a custom attribute to track the deduplication since there is no way to get this information from APFS itself.

It's not 100% accurate but given that OCI cache is immutable the actual disk usage can only be lover than that.

* Use string attribute

* Update Sources/tart/URL+Prunable.swift

Co-authored-by: Nikolay Edigaryev <[email protected]>

* Added SizeOnDisk colume

---------

Co-authored-by: Nikolay Edigaryev <[email protected]>
@fkorotkov fkorotkov enabled auto-merge (squash) July 25, 2024 15:22
@fkorotkov fkorotkov merged commit 1b81b12 into main Jul 25, 2024
6 checks passed
@fkorotkov fkorotkov deleted the apfs-reuse branch July 25, 2024 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants