
Update ACR UploadBlob method to upload a blob in chunks #32059

Merged
annelo-msft merged 23 commits into Azure:main on Nov 11, 2022

Conversation

annelo-msft
Member

annelo-msft commented Oct 27, 2022

Fixes #32339

The current implementation of ContainerRegistryBlobClient.UploadBlob() uploads an artifact layer in a single request. When the layer is large, the service can close the connection during the upload, causing it to fail.

This PR reimplements UploadBlob() to upload the layer in chunks, and optionally allows the caller to specify the chunk size.
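
For illustration, here is a minimal sketch of the chunked-upload pattern this PR moves to. It is not the shipped implementation; the helper name, the per-chunk upload delegate, and the default chunk size are hypothetical stand-ins.

```csharp
// Sketch only: reads the layer stream in fixed-size chunks, uploads each chunk,
// and computes the SHA-256 digest incrementally so the whole layer is never
// buffered in memory. uploadChunkAsync and the chunk size are hypothetical.
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class ChunkedUploadSketch
{
    public static async Task<string> UploadInChunksAsync(
        Stream content,
        Func<ReadOnlyMemory<byte>, Task> uploadChunkAsync,
        int chunkSize = 4 * 1024 * 1024)
    {
        byte[] buffer = new byte[chunkSize];
        using IncrementalHash sha256 = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        int bytesRead;
        while ((bytesRead = await content.ReadAsync(buffer, 0, chunkSize).ConfigureAwait(false)) > 0)
        {
            // A real implementation would fill the buffer before sending so that
            // every chunk except the last has a uniform size.
            sha256.AppendData(buffer, 0, bytesRead);
            await uploadChunkAsync(new ReadOnlyMemory<byte>(buffer, 0, bytesRead)).ConfigureAwait(false);
        }

        // OCI-style digest string for the completed blob.
        return "sha256:" + Convert.ToHexString(sha256.GetHashAndReset()).ToLowerInvariant();
    }
}
```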

@azure-sdk
Collaborator

API change check

APIView has identified API-level changes in this PR and created the following API reviews.

Azure.Containers.ContainerRegistry

annelo-msft marked this pull request as ready for review on November 1, 2022 16:34
annelo-msft changed the title from "[Draft - in progress] Upload blob in chunks" to "Update ACR UploadBlob method to upload a blob in chunks" on Nov 1, 2022
@pallavit
Contributor

pallavit commented Nov 3, 2022

public partial class ContainerRegistryBlobClient

This client is also a perfect candidate for SDK perf testing.


Refers to: sdk/containerregistry/Azure.Containers.ContainerRegistry/api/Azure.Containers.ContainerRegistry.netstandard2.0.cs:230 in commit 2cda954.

@JoshLove-msft
Member

Can you point me to more information about not buffering the stream? Is this something we've typically done in other libraries for large files?

By buffering the stream, I mean reading the whole stream into a memory stream. I could see how this could cause OOM issues. But we weren't doing this in the previous implementation, so I was curious where the memory issue was actually occurring. I would imagine it is probably in the transport layer if the transport ends up attempting to send a single request.
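
(For clarity, a buffered upload in this sense looks roughly like the sketch below, where the entire payload is held in memory before a single request is sent. This is an illustration only, not the library's code; `content` stands in for the caller's stream.)

```csharp
// Sketch of "buffering the stream": the whole payload is copied into a
// MemoryStream before one request is sent, which is where an OOM could
// occur for very large layers.
using var buffered = new MemoryStream();
await content.CopyToAsync(buffered);
buffered.Position = 0;
// ... send `buffered` as a single request body ...
```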

@annelo-msft
Member Author

This is also a perfect client candidate for perf testing for the SDK.

@pallavit, tracking with #32415

@annelo-msft
Member Author

annelo-msft commented Nov 10, 2022

By buffering the stream, I mean reading the whole stream into a memory stream. I could see how this could cause OOM issues. But we weren't doing this in the previous implementation, so I was curious where the memory issue was actually occurring. I would imagine it is probably in the transport layer if the transport ends up attempting to send a single request.

I still haven't heard back from the service team on this, but based on the information they've given us, it looks like the OOM issue they were experiencing was on download of a large blob. The only thing I've been able to repro on upload is the service closing the connection, presumably because we're sending too much data, or sending it too slowly:

----> System.Net.Http.HttpRequestException : Error while copying content to a stream.
----> System.IO.IOException : Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host..
----> System.Net.Sockets.SocketException : An existing connection was forcibly closed by the remote host.

Uploading in smaller chunks does mitigate this.

I'll update the PR description to more accurately reflect the issue as well - thanks, @JoshLove-msft!
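
For context, chunked blob upload to a container registry follows the Docker/OCI distribution convention of sending each chunk as a separate request with a Content-Range header of inclusive byte offsets, which keeps each individual request small. A rough sketch is below; the chunk sequence and send helper are hypothetical.

```csharp
// Sketch: each chunk is sent to the upload location with a "start-end"
// Content-Range (inclusive offsets); the upload is then completed with the
// blob's digest. SendChunkAsync is a hypothetical helper.
long start = 0;
foreach (ReadOnlyMemory<byte> chunk in chunks)
{
    long end = start + chunk.Length - 1;
    string contentRange = $"{start}-{end}";
    await SendChunkAsync(chunk, contentRange);
    start = end + 1;
}
```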

@annelo-msft
Member Author

Ingestion is going to use an ArrayPool (which is better for large amounts of data): https://learn.microsoft.com/en-us/dotnet/api/system.buffers.arraypool-1?view=net-6.0. Maybe that would work here as well? :)

@nisha-bhatia, I plan to re-work the implementation to possibly use ArrayPool when I implement chunked/parallel download. Thanks again for the suggestion!
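
The ArrayPool pattern suggested above would look roughly like this. It is a sketch assuming a per-chunk read loop; `content` and `chunkSize` stand in for the actual stream and chunk size.

```csharp
// Sketch: rent a reusable buffer from the shared pool instead of allocating
// a new array per chunk, and always return it when done. Rent may hand back
// an array larger than requested, so only buffer[0..bytesRead) is used.
using System.Buffers;

byte[] buffer = ArrayPool<byte>.Shared.Rent(chunkSize);
try
{
    int bytesRead = await content.ReadAsync(buffer, 0, chunkSize);
    // ... upload buffer[0..bytesRead) ...
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
```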

annelo-msft merged commit d060f7b into Azure:main on Nov 11, 2022
sofiar-msft pushed a commit to sofiar-msft/azure-sdk-for-net that referenced this pull request Dec 7, 2022
* added Size to UploadBlobResult

* Add test for pull and record tests

* first pass on chunked upload; plus basic large file upload test

* compute digest in chunks

* fix bugs

* refactor and export API

* re-record failing tests

* add sync tests

* EOD WIP

* bug fix - incr chunk count

* small refactor of Content-Range header

* don't compute digest 2x; validate chunkLength input

* refactor to not seek streams we don't own

* validate digest on blob upload and add tests for blob and manifest uploads on non-seekable streams

* re-record tests for new chunked-upload approach and add recordings for new tests.

* fix failing tests

* don't allocate a large buffer to upload a small chunk, and PR fb.

* re-record failing tests

* fix missed config

* missed test
Successfully merging this pull request may close the following issue: Enable support for large blob upload in the BlobClient