Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chunked: support converting existing images #1673

Merged
merged 2 commits into from
Jul 28, 2023

Conversation

giuseppe
Copy link
Member

if the "convert_images" option is set in the configuration file, then convert traditional images to the chunked format on the fly.

This is very expensive at the moment since the entire zstd:chunked file is created and then processed.

@giuseppe giuseppe force-pushed the chunked-convert-on-the-fly branch 3 times, most recently from b006a54 to 14dbdf7 Compare July 25, 2023 08:37
@giuseppe giuseppe marked this pull request as ready for review July 25, 2023 09:53
@giuseppe giuseppe force-pushed the chunked-convert-on-the-fly branch from 14dbdf7 to 32b4edd Compare July 25, 2023 09:59
@giuseppe giuseppe marked this pull request as draft July 25, 2023 10:00
@giuseppe giuseppe marked this pull request as ready for review July 25, 2023 12:40
@rhatdan
Copy link
Member

rhatdan commented Jul 25, 2023

@nalind @mtrmac @vrothberg PTAL

@mtrmac
Copy link
Collaborator

mtrmac commented Jul 25, 2023

What does this do, really? I don’t understand what “convert traditional images” means; GetDiffer is used on the “apply a remote layer to local c/storage” path AFAIK.

Best I can tell this creates a “staging directory” for ApplyDiffFromStagingDirectory from non-chunked data. Why is that useful?

To the extent that this creates a zstd:chunked-formatted tar file from some other tar file, how does this differ from skopeo copy --dest-compress-format=zstd:chunked? In the extreme, even a skopeo copy containers-storage:… containers-storage:… ?

@giuseppe
Copy link
Member Author

What does this do, really? I don’t understand what “convert traditional images” means; GetDiffer is used on the “apply a remote layer to local c/storage” path AFAIK.

the current code in chunked handles only layers that are either in zstd:chunked or eStargz format, so that each file can be retrieved individually.

The current PR instead enables pkg/chunked to handle traditional images that are not in these formats. Of course, in this case, we won't be able to pull only the missing files but we still need to retrieve the entire layer. On the other hand, we can now use these files for host deduplication, and more importantly, use them with composefs that use a different format to store the files (a CAS where the file name is its checksum).

It could be implemented in a more performant way where we pull the layer and handle directly the tarball stream, instead, the current implementation first converts the layer to the zstd:chunked format and then use the local copy with the existing code path.

This is different than using skopeo copy since the conversion is just an internal detail

@mtrmac
Copy link
Collaborator

mtrmac commented Jul 25, 2023

So this is, a very likely completely temporary way to get data in a convenient format for creating composefs layers, until an implementation building the relevant file structure directly is built? And composefs images are going to be built via ApplyDiffFromStagingDirectory or something very similar?

@giuseppe
Copy link
Member Author

And composefs images are going to be built via ApplyDiffFromStagingDirectory or something very similar?

yes, the composefs images are built from ApplyDiffFromStagingDirectory

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(A fairly brief look I’m afraid), code looks reasonable.

I’m a bit concerned about the code structure / lifetimes (how conversion is not done in GetDiffer, how many fields in chunkedDiffer are set to values that refer to a temporary file that will be gone by the time ApplyDiff returns. Some of that might be inevitable if we don’t want to break external API, and I didn’t really check whether the internal data structures in this file could be restructured.


c.fileType = fileTypeZstdChunked
c.manifest = manifest
c.convertToZstdChunked = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the conversion is deferred to ApplyDiff primarily because the object returned by GetDiffer does not have a Close method that could be used to clean up?

But then… why set this to false if fileSource only lives for the duration of this function? If ApplyDiff is ever called again (which… should not happen?), the conversion will need to be done again, won’t it?


I’d prefer sharing more code with makeZstdChunkedDiffer, if this PR is not explicitly intended as a very short-term hack that will be completely reimplemented in a short order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped the c.convertToZstdChunked = false as it is misleading, ApplyDiff is not called recursively.

The conversion is deferred to ApplyDiff mostly because GetDiffer doesn't have access to the staging directory for the layer yet. In this way, we can write the temporary tar file there

pkg/chunked/storage_linux.go Outdated Show resolved Hide resolved
pkg/chunked/storage_linux.go Outdated Show resolved Hide resolved
giuseppe added 2 commits July 26, 2023 10:46
if the "convert_images" option is set in the configuration file, then
convert traditional images to the chunked format on the fly.

This is very expensive at the moment since the entire zstd:chunked
file is created and then processed.

Signed-off-by: Giuseppe Scrivano <[email protected]>
if the image is created locally there is no need to validate again the
files.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the chunked-convert-on-the-fly branch from 32b4edd to 58405a5 Compare July 26, 2023 08:46
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM, with the caveat that I haven’t kept up with the other changes over the last month, I’m afraid.

@giuseppe
Copy link
Member Author

@rhatdan fine to merge?

@rhatdan
Copy link
Member

rhatdan commented Jul 28, 2023

LGTM

@rhatdan rhatdan merged commit c3da76f into containers:main Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants