tart push: use fixed size chunks to allow for better deduplication #821

Merged (1 commit) on May 14, 2024

edigaryev
Collaborator

With this PR, tart push will yield exactly 2 layers per GB of VM disk image, which currently equates to:

  • 100 layers for ghcr.io/cirruslabs/macos-sonoma-base:latest instead of the current 38 layers
  • 256 layers for ghcr.io/cirruslabs/macos-runner:sonoma instead of the current 162 layers
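
The layer counts above follow from a ceiling division of the disk size by the chunk size. A minimal sketch of that arithmetic, assuming a 500 MB chunk means 500 * 1000 * 1000 bytes and that the two images have 50 GB and 128 GB disks (both are assumptions inferred from the numbers in this description, and layerCount is a hypothetical helper, not a function from the PR):

```swift
// Hypothetical helper: number of fixed-size layers needed for a disk image.
func layerCount(diskSizeBytes: UInt64, chunkSizeBytes: UInt64) -> UInt64 {
  precondition(chunkSizeBytes > 0, "chunk size must be positive")
  // Ceiling division: a trailing partial chunk still becomes its own layer.
  return (diskSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes
}

// Assumed values, not taken from the PR's code.
let chunkSize: UInt64 = 500 * 1000 * 1000 // 500 MB, decimal
let gb: UInt64 = 1000 * 1000 * 1000       // 1 GB, decimal

print(layerCount(diskSizeBytes: 50 * gb, chunkSizeBytes: chunkSize))  // 100
print(layerCount(diskSizeBytes: 128 * gb, chunkSizeBytes: chunkSize)) // 256
```

Under these assumptions the result is exactly 2 layers per GB whenever the disk size is a whole number of GB.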

This slightly increases bookkeeping costs, but allows for better deduplication in tart pull (to be implemented in an upcoming PR).

Our calculations show that the current layer deduplication ratios are:

  • 0% for distant and 63% for nearby tags for ghcr.io/cirruslabs/macos-sonoma-base:latest images
  • 0% for distant and 14% for nearby tags for ghcr.io/cirruslabs/macos-runner:sonoma images

With fixed-size chunks of 500 MB, the deduplication ratios will be as follows:

  • 46% for distant and 73% for nearby tags for ghcr.io/cirruslabs/macos-sonoma-base:latest images
  • 36% for distant and 36% for nearby tags for ghcr.io/cirruslabs/macos-runner:sonoma images
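
One way to picture how such a ratio could be measured: split both images at the same fixed offsets and count how many chunks of the newer image already exist in the older one. The sketch below is only an illustration on toy data. The helper names are hypothetical, it compares raw chunk contents instead of layer digests, and it uses stride instead of the chunks(ofCount:) call seen in the diff so that it has no dependencies.

```swift
// Split a buffer into fixed-size chunks; the last chunk may be shorter.
func fixedSizeChunks(_ bytes: [UInt8], chunkSize: Int) -> [[UInt8]] {
  precondition(chunkSize > 0, "chunk size must be positive")
  return stride(from: 0, to: bytes.count, by: chunkSize).map { start in
    Array(bytes[start..<min(start + chunkSize, bytes.count)])
  }
}

// Fraction of chunks in `new` whose content also appears in `old`.
// A real tool would compare content digests rather than raw bytes.
func deduplicationRatio(old: [UInt8], new: [UInt8], chunkSize: Int) -> Double {
  let known = Set(fixedSizeChunks(old, chunkSize: chunkSize))
  let newChunks = fixedSizeChunks(new, chunkSize: chunkSize)
  guard !newChunks.isEmpty else { return 0 }
  let reused = newChunks.filter { known.contains($0) }.count
  return Double(reused) / Double(newChunks.count)
}

// Toy "disks": four chunks of four bytes each; only the third chunk differs.
let oldDisk: [UInt8] = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
var newDisk = oldDisk
newDisk[9] = 42

print(deduplicationRatio(old: oldDisk, new: newDisk, chunkSize: 4)) // 0.75
```

Because chunk boundaries depend only on offsets, a local change affects only the chunk that contains it, and the remaining chunks stay identical and reusable.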

@edigaryev edigaryev requested a review from fkorotkov as a code owner May 14, 2024 09:05
while let (compressedData, uncompressedSize, uncompressedDigest) = try compressNextLayerOfLimitBytesOrMore(mappedDisk: mappedDisk, offset: offset) {
offset += uncompressedSize
// each equal ``Self.layerLimitBytes`` bytes or less due to LZ4 compression
for data in mappedDisk.chunks(ofCount: layerLimitBytes) {
Contributor

Should we put layerLimitBytes in the manifest in case it's gonna change at some point?

Collaborator Author

@edigaryev edigaryev May 14, 2024

I think it's already there by virtue of passing UInt64(data.count) as the uncompressedSize argument to OCIManifestLayer(...):

if let uncompressedSize = uncompressedSize {
  annotations[uncompressedSizeAnnotation] = String(uncompressedSize)
}
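
To illustrate the point, here is a rough sketch of how a reader of the manifest could recover each layer's position on disk from per-layer size annotations alone, without knowing the chunk size used at push time. The Layer type, the annotation key value, and layerOffsets are hypothetical simplifications for this example and are not the actual OCIManifestLayer API.

```swift
// Hypothetical stand-in for the real annotation key.
let uncompressedSizeAnnotation = "example.uncompressed-size"

// Hypothetical, simplified layer descriptor.
struct Layer {
  var annotations: [String: String] = [:]

  init(uncompressedSize: UInt64? = nil) {
    if let uncompressedSize = uncompressedSize {
      annotations[uncompressedSizeAnnotation] = String(uncompressedSize)
    }
  }

  // Parse the size back from the annotation, if present and well-formed.
  var uncompressedSize: UInt64? {
    annotations[uncompressedSizeAnnotation].flatMap { UInt64($0) }
  }
}

// Compute the disk offset of every layer from the recorded sizes.
// Returns nil if any layer lacks a usable size annotation.
func layerOffsets(_ layers: [Layer]) -> [UInt64]? {
  var offsets: [UInt64] = []
  var offset: UInt64 = 0
  for layer in layers {
    guard let size = layer.uncompressedSize else { return nil }
    offsets.append(offset)
    offset += size
  }
  return offsets
}

// Two full chunks plus a shorter trailing chunk (assumed 500 MB chunk size).
let layers = [
  Layer(uncompressedSize: 500_000_000),
  Layer(uncompressedSize: 500_000_000),
  Layer(uncompressedSize: 123_456),
]
print(layerOffsets(layers) ?? []) // [0, 500000000, 1000000000]
```

If the chunk size changes later, the offsets still come out right in this sketch, since each layer carries its own size.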

@edigaryev edigaryev requested a review from fkorotkov May 14, 2024 12:15
@edigaryev edigaryev merged commit dbbd716 into main May 14, 2024
7 checks passed
@edigaryev edigaryev deleted the fixed-size-chunks branch May 14, 2024 15:23
2 participants