Skip to content

Commit

Permalink
Add FIXMEs about handling of zstd:chunked blob annotations on blob ch…
Browse files Browse the repository at this point in the history
…anges

It's completely undefined whether the OCI blob annotations apply to
the object as a concept, regardless of representation, or to the specific
binary representation. So it's unclear whether we should preserve or drop them
when compressing/decompressing/substituting blobs.

In particular, we currently don't truly correctly handle the zstd:chunked
annotations on:
- decompression (should be dropped)
- recompression (should be dropped)
- substitution (should be replaced by data about the other blob, if any; we don't record that)

Right now, we drop all annotations on decompression and recompression (which
happens to work fine), and preserve annotations on substitution (which is technically
invalid).

Luckily, the zstd:chunked use is opportunistic, and if the annotations are invalid
or not applicable, the manifest checksum fails, and we fall through to an ordinary pull;
so, that is not quite a deal breaker.

So, for now, just add FIXMEs recording the pain points.

To fix this truly correctly, we would need:
- a new metadataCleaner field in pkg/compression/internal.Algorithm
- a new pkg/compression.CleanMetadata
- turning public manifest.Manifest into internal/manifest.Manifest where we can add methods
- adding internal/manifest.Manifest.LayerInfosWithCompression that turns MIME types into compression.Algorithm values
- (using that in copy.copyLayer instead of the current hard-coded switch)
- then either defining a new alternative to UpdatedImage that can handle these annotations naturally,
  or all the marked users that need to clean the annotations themselves calling LayerInfosWithCompression
  and CleanMetadata on the affected blobs.
- recording the zstd annotations in the blob info cache
- reading those annotations when substituting blobs based on the cache

We should do all that long-term, but that's quite a lot of work to fix a metadata inconsistency
which we can currently silently, with moderate cost, hide from the user.

Signed-off-by: Miloslav Trmač <[email protected]>
  • Loading branch information
mtrmac committed Apr 3, 2023
1 parent 94fecf6 commit 73dc790
Show file tree
Hide file tree
Showing 4 changed files with 7 additions and 7 deletions.
4 changes: 2 additions & 2 deletions copy/compression.go
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,7 @@ func (ic *imageCopier) bpcRecompressCompressed(stream *sourceStream, detected bp
recompressed, annotations := ic.compressedStream(decompressed, *ic.compressionFormat)
// Note: recompressed must be closed on all return paths.
stream.reader = recompressed
stream.info = types.BlobInfo{ // FIXME? Should we preserve more data in src.info?
stream.info = types.BlobInfo{ // FIXME? Should we preserve more data in src.info? Notably the current approach correctly removes zstd:chunked metadata annotations.
Digest: "",
Size: -1,
}
Expand All @@ -203,7 +203,7 @@ func (ic *imageCopier) bpcDecompressCompressed(stream *sourceStream, detected bp
}
// Note: s must be closed on all return paths.
stream.reader = s
stream.info = types.BlobInfo{ // FIXME? Should we preserve more data in src.info?
stream.info = types.BlobInfo{ // FIXME? Should we preserve more data in src.info? Notably the current approach correctly removes zstd:chunked metadata annotations.
Digest: "",
Size: -1,
}
Expand Down
6 changes: 3 additions & 3 deletions copy/single.go
Original file line number Diff line number Diff line change
Expand Up @@ -739,9 +739,9 @@ func updatedBlobInfoFromReuse(inputInfo types.BlobInfo, reusedBlob private.Reuse
res := types.BlobInfo{
Digest: reusedBlob.Digest,
Size: reusedBlob.Size,
URLs: nil, // This _must_ be cleared if Digest changes; clear it in other cases as well, to preserve previous behavior.
Annotations: inputInfo.Annotations,
MediaType: inputInfo.MediaType, // Mostly irrelevant, MediaType is updated based on Compression*/CryptoOperation.
URLs: nil, // This _must_ be cleared if Digest changes; clear it in other cases as well, to preserve previous behavior.
Annotations: inputInfo.Annotations, // FIXME: This should remove zstd:chunked annotations (but those annotations being left with incorrect values should not break pulls)
MediaType: inputInfo.MediaType, // Mostly irrelevant, MediaType is updated based on Compression*/CryptoOperation.
CompressionOperation: reusedBlob.CompressionOperation,
CompressionAlgorithm: reusedBlob.CompressionAlgorithm,
CryptoOperation: inputInfo.CryptoOperation, // Expected to be unset anyway.
Expand Down
2 changes: 1 addition & 1 deletion pkg/blobcache/src.go
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ func (s *blobCacheSource) LayerInfosForCopy(ctx context.Context, instanceDigest
case types.Compress:
info.MediaType = v1.MediaTypeImageLayerGzip
info.CompressionAlgorithm = &compression.Gzip
case types.Decompress:
case types.Decompress: // FIXME: This should remove zstd:chunked annotations (but those annotations being left with incorrect values should not break pulls)
info.MediaType = v1.MediaTypeImageLayer
info.CompressionAlgorithm = nil
}
Expand Down
2 changes: 1 addition & 1 deletion storage/storage_src.go
Original file line number Diff line number Diff line change
Expand Up @@ -293,7 +293,7 @@ func buildLayerInfosForCopy(manifestInfos []manifest.LayerInfo, physicalInfos []
if nextPhysical >= len(physicalInfos) {
return nil, fmt.Errorf("expected more than %d physical layers to exist", len(physicalInfos))
}
res[i] = physicalInfos[nextPhysical]
res[i] = physicalInfos[nextPhysical] // FIXME? Should we preserve more data in manifestInfos? Notably the current approach correctly removes zstd:chunked metadata annotations.
nextPhysical++
}
}
Expand Down

0 comments on commit 73dc790

Please sign in to comment.