Open Write #13344

Merged: 35 commits, Aug 12, 2020

seanmcc-msft
Member

Continued from #12779 and #12662

@ghost added the Storage (Storage Service: Queues, Blobs, Files) label on Jul 9, 2020
@seanmcc-msft
Member Author

@PaulVrugt, we are continuing Open Write here.

@PaulVrugt

Cool, thanks for the heads up!

Member

@jaschrep-msft left a comment

Mostly nitpicking, but there are some potential bugs to be addressed. Also some questions on tiny design elements (nothing major).
Will review tests when they are completed.

(3 resolved, outdated review threads on sdk/storage/Azure.Storage.Blobs/src/AppendBlobClient.cs)
@seanmcc-msft
Member Author

> We should test StorageWriteStream on its own in Azure.Storage.Common.Tests, mocking out the abstract behavior. I'm particularly interested in testing edge cases on combining Stream.WriteAsync and Stream.FlushAsync calls, observing when/how they trigger AppendInternal calls that actually send data to the service.

A lot of this logic is handled in the implementation streams, so I'm not sure how valuable tests just for StorageWriteStream would be.
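
For readers following the testing question, here is a rough sketch of what "mocking out the abstract behavior" could look like. It does not use the SDK's StorageWriteStream; `BufferedWriteStream`, `RecordingStream`, and the `Append` hook are hypothetical stand-ins, and the sketch is synchronous for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for a buffered write stream; NOT the SDK's StorageWriteStream.
abstract class BufferedWriteStream : Stream
{
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly int _bufferSize;

    protected BufferedWriteStream(int bufferSize) => _bufferSize = bufferSize;

    // The abstract behavior a test would replace: send buffered data to the service.
    protected abstract void Append(byte[] data);

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int space = _bufferSize - (int)_buffer.Length;
            int n = Math.Min(space, count);
            _buffer.Write(buffer, offset, n);
            offset += n;
            count -= n;
            if (_buffer.Length == _bufferSize) Flush(); // buffer is full, send it
        }
    }

    public override void Flush()
    {
        if (_buffer.Length == 0) return; // nothing buffered, nothing to send
        Append(_buffer.ToArray());
        _buffer.SetLength(0);
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}

// Test double that records the size of every Append call.
sealed class RecordingStream : BufferedWriteStream
{
    public List<int> AppendSizes { get; } = new List<int>();
    public RecordingStream(int bufferSize) : base(bufferSize) { }
    protected override void Append(byte[] data) => AppendSizes.Add(data.Length);
}

static class Program
{
    static void Main()
    {
        var stream = new RecordingStream(bufferSize: 4);
        stream.Write(new byte[10], 0, 10); // two full buffers are sent, 2 bytes stay buffered
        stream.Flush();                    // sends the remaining 2 bytes
        stream.Flush();                    // no-op: buffer is empty
        Console.WriteLine(string.Join(", ", stream.AppendSizes)); // prints "4, 4, 2"
    }
}
```

A test along these lines only exercises the buffering rules of the base class, which is the trade-off raised in the reply above: type-specific behavior still needs tests against each implementation stream.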

Member

@tg-msft left a comment

Does the Java implementation share the exact semantics around which conditions are used, updated, and cleared at various points in the life cycle? What about Track 1?

#pragma warning restore AZC0015 // Unexpected client method return type.
bool overwrite,
long position,
long? size = default,
Member Author

I chose to add an optional size parameter and noted in the XML comments that it is required if overwrite is set to true or if the Page Blob is being created for the first time.

Member

We should move at least size and maybe position (if it can default to 0 when creating a new page blob) into PageBlobOpenWriteOptions. Having optional params in both places ends up being a little confusing.

Member Author

@seanmcc-msft, Aug 11, 2020

Moved Size to PageBlobOpenWriteOptions. I'd like to keep position a required parameter to prevent users from overwriting the data at the beginning of a Page Blob by default.
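
The shape being settled on here (a required position, with the optional size carried in an options bag) might be sketched as follows. The types `OpenWriteOptionsSketch` and `OpenWriteSketch`, the `blobExists` flag, and the validation messages are all hypothetical; this is not the SDK's PageBlobOpenWriteOptions or its actual validation logic.

```csharp
using System;

// Hypothetical options bag; the name and shape are illustrative only.
sealed class OpenWriteOptionsSketch
{
    // Size of the blob to create, in bytes. Optional when an existing blob is kept.
    public long? Size { get; set; }
}

static class OpenWriteSketch
{
    // position has no default value, so every caller must state where writing starts.
    // Returns the size to create the blob with, or null when the existing blob is kept.
    public static long? SizeToCreate(
        bool overwrite,
        bool blobExists,
        long position,
        OpenWriteOptionsSketch options = null)
    {
        if (position < 0)
            throw new ArgumentOutOfRangeException(nameof(position));

        bool creating = overwrite || !blobExists;
        if (!creating)
            return null;

        if (options?.Size == null)
            throw new ArgumentException(
                "Size is required when overwriting or creating the blob.",
                nameof(options));

        return options.Size;
    }
}

static class Program
{
    static void Main()
    {
        // Writing into an existing blob: no size needed.
        long? kept = OpenWriteSketch.SizeToCreate(overwrite: false, blobExists: true, position: 512);
        Console.WriteLine(kept.HasValue); // prints "False"

        // Overwriting: the size comes from the options bag.
        long? created = OpenWriteSketch.SizeToCreate(
            overwrite: true,
            blobExists: true,
            position: 0,
            options: new OpenWriteOptionsSketch { Size = 4096 });
        Console.WriteLine(created); // prints "4096"
    }
}
```

Keeping position out of the options bag means there is no implicit default of 0, which matches the stated goal of not overwriting the start of a Page Blob by accident.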

private async Task<Stream> OpenWriteInternal(
bool overwrite,
long position,
long? maxSize,
Member Author

I chose to add an optional maxSize parameter and noted in the XML comments that it is required if overwrite is set to true or if the File is being created for the first time.

Member

Makes sense, but I think we should still move it to the options.

Member Author

Fixed in next iteration.

Member

@tg-msft left a comment

This is awesome progress - I think there are just a couple of other spots where we need to update the ETag.

Comment on lines 51 to 52
_conditions.IfAppendPositionEqual = null;
_conditions.IfMaxSizeLessThanOrEqual = null;
Member

I would expect these lines to be removed now that we're splitting the concepts of OpenConditions and WriteConditions?

Comment on lines +53 to +86
// We need a multiple of 512 to flush.
if (_buffer.Length % Constants.Blob.Page.PageSizeBytes != 0)
{
    int bytesToWrite = (int)(Constants.Blob.Page.PageSizeBytes - _buffer.Length % Constants.Blob.Page.PageSizeBytes);
    await WriteToBufferInternal(buffer, offset, bytesToWrite, async, cancellationToken).ConfigureAwait(false);
    remaining -= bytesToWrite;
    offset += bytesToWrite;
}

// Flush the buffer.
await AppendInternal(async, cancellationToken).ConfigureAwait(false);

while (remaining > 0)
{
    await WriteToBufferInternal(
        buffer,
        offset,
        (int)Math.Min(remaining, _bufferSize),
        async,
        cancellationToken).ConfigureAwait(false);

    // Remaining bytes won't fit in buffer.
    if (remaining > _bufferSize)
    {
        await AppendInternal(async, cancellationToken).ConfigureAwait(false);
        remaining -= (int)_bufferSize;
        offset += (int)_bufferSize;
    }

    // Remaining bytes will fit in buffer.
    else
    {
        remaining = 0;
    }
Member

Don't you need the same 512 alignment on the Write/Append calls in the loop?

Member Author

No, because the number of bytes we write to the buffer is (int)Math.Min(remaining, _bufferSize). When remaining > _bufferSize, we have just written _bufferSize bytes to the buffer, which is a multiple of 512.
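
The arithmetic behind this answer can be checked with a small simulation. The sketch below mirrors the excerpt's control flow using plain numbers instead of streams; `PageAlignment`, `BytesToNextPage`, and `Simulate` are hypothetical names, and it assumes (as the reply implies) that the buffer size is itself a multiple of 512 and that the incoming write is large enough to cover the initial top-up.

```csharp
using System;
using System.Collections.Generic;

static class PageAlignment
{
    public const int PageSizeBytes = 512;

    // Bytes needed to bring a buffer of the given length up to the next 512-byte boundary.
    public static int BytesToNextPage(long bufferedLength)
    {
        long rem = bufferedLength % PageSizeBytes;
        return rem == 0 ? 0 : (int)(PageSizeBytes - rem);
    }

    // Returns the size of every flush, plus the bytes left in the buffer afterwards.
    public static (List<long> Flushes, long LeftInBuffer) Simulate(long buffered, long count, int bufferSize)
    {
        if (bufferSize % PageSizeBytes != 0)
            throw new ArgumentException("Sketch assumes a buffer size that is a multiple of 512.", nameof(bufferSize));

        var flushes = new List<long>();
        long remaining = count;

        // Step 1: top the buffer up to a 512-byte boundary, then flush it.
        int topUp = BytesToNextPage(buffered);
        if (topUp > remaining)
            throw new ArgumentException("Sketch assumes the write covers the top-up.", nameof(count));
        buffered += topUp;
        remaining -= topUp;
        if (buffered > 0) flushes.Add(buffered);
        buffered = 0;

        // Step 2: full buffers are flushed; the final partial chunk stays buffered.
        while (remaining > 0)
        {
            buffered = Math.Min(remaining, bufferSize);
            if (remaining > bufferSize)
            {
                flushes.Add(buffered);
                buffered = 0;
                remaining -= bufferSize;
            }
            else
            {
                remaining = 0;
            }
        }
        return (flushes, buffered);
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(PageAlignment.BytesToNextPage(700)); // prints "324"
        var (flushes, left) = PageAlignment.Simulate(buffered: 700, count: 3000, bufferSize: 1024);
        Console.WriteLine(string.Join(", ", flushes)); // prints "1024, 1024, 1024"
        Console.WriteLine(left);                       // prints "628"
    }
}
```

Every flush inside the loop is exactly one buffer, so it stays aligned as long as the buffer size is a multiple of 512; the unaligned tail (628 bytes here) is never flushed by this code path and waits in the buffer.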


if (!overwrite)
{
throw new ArgumentException($"{nameof(BlockBlobClient)}.{nameof(BlockBlobClient.OpenWrite)} only supports overwrite.");
Member

We should include the overwrite name as another param to ArgEx. We should also maybe change the message to "only supports overwriting" to make it slightly clearer it's the specific value of the param and not just using the param with any value.

Member Author

Changed to "only supports overwriting".
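
For reference, the other half of the suggestion (passing the parameter name to the exception) could look roughly like this. `OverwriteGuard.EnsureOverwrite` is a hypothetical helper used only for illustration; the two-argument `ArgumentException(message, paramName)` constructor is standard .NET.

```csharp
using System;

static class OverwriteGuard
{
    // Hypothetical helper: rejects overwrite == false and names the offending parameter.
    public static void EnsureOverwrite(bool overwrite)
    {
        if (!overwrite)
        {
            throw new ArgumentException(
                "BlockBlobClient.OpenWrite only supports overwriting.",
                nameof(overwrite));
        }
    }
}

static class Program
{
    static void Main()
    {
        try
        {
            OverwriteGuard.EnsureOverwrite(false);
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.ParamName); // prints "overwrite"
        }
    }
}
```

With the parameter name set, callers and logs can tell which argument was rejected without parsing the message text.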

async: async,
cancellationToken: cancellationToken)
.ConfigureAwait(false);
}
Member

@tg-msft, Aug 11, 2020

https://docs.microsoft.com/en-us/rest/api/storageservices/put-block-list says:

You can call Put Block List to update a blob by uploading only those blocks that have changed, then committing the new and existing blocks together. You can do this by specifying whether to commit a block from the committed block list or from the uncommitted block list, or to commit the most recently uploaded version of the block, whichever list it may belong to.

Comment on lines +42 to +60
if (async)
{
    await _fileClient.UploadRangeAsync(
        range: httpRange,
        content: _buffer,
        progressHandler: _progressHandler,
        conditions: _conditions,
        cancellationToken: cancellationToken)
        .ConfigureAwait(false);
}
else
{
    _fileClient.UploadRange(
        range: httpRange,
        content: _buffer,
        progressHandler: _progressHandler,
        conditions: _conditions,
        cancellationToken: cancellationToken);
}
Member

Don't these need to update the etag?

Member Author

The Files Service doesn't support ETags.

@seanmcc-msft
Member Author

> https://docs.microsoft.com/en-us/rest/api/storageservices/put-block-list says:
>
> You can call Put Block List to update a blob by uploading only those blocks that have changed, then committing the new and existing blocks together. You can do this by specifying whether to commit a block from the committed block list or from the uncommitted block list, or to commit the most recently uploaded version of the block, whichever list it may belong to.

BlockBlobClient.CommitBlockList() always commits blocks in the block list as "Latest". When I was experimenting with this a few days ago, it appeared that blocks not included in the block list are removed.
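
The behavior described here can be pictured with a simplified in-memory model. This is a sketch of the commit semantics as understood from the quoted documentation and the observation above, not of the service itself; `BlockBlobModel`, `StageBlock`, and `CommitLatest` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Simplified model: a blob is whatever the most recent commit listed, in that order.
sealed class BlockBlobModel
{
    private Dictionary<string, byte[]> _committed = new Dictionary<string, byte[]>();
    private readonly Dictionary<string, byte[]> _uncommitted = new Dictionary<string, byte[]>();
    private List<string> _order = new List<string>();

    public IReadOnlyList<string> CommittedIds => _order;

    public void StageBlock(string id, byte[] data) => _uncommitted[id] = data;

    // "Latest": prefer the uncommitted copy of a block, else fall back to the committed one.
    public void CommitLatest(IEnumerable<string> ids)
    {
        var next = new Dictionary<string, byte[]>();
        var order = new List<string>();
        foreach (string id in ids)
        {
            if (!_uncommitted.TryGetValue(id, out byte[] data) && !_committed.TryGetValue(id, out data))
                throw new InvalidOperationException($"Unknown block '{id}'.");
            next[id] = data;
            order.Add(id);
        }
        // Blocks that were not listed are dropped from the blob.
        _committed = next;
        _order = order;
        _uncommitted.Clear();
    }
}

static class Program
{
    static void Main()
    {
        var blob = new BlockBlobModel();
        blob.StageBlock("a", new byte[1]);
        blob.StageBlock("b", new byte[1]);
        blob.CommitLatest(new[] { "a", "b" });

        blob.StageBlock("c", new byte[1]);
        blob.CommitLatest(new[] { "c" }); // "a" and "b" are not listed, so they are gone
        Console.WriteLine(string.Join(", ", blob.CommittedIds)); // prints "c"
    }
}
```

Under this model, extending an existing blob requires listing the existing block IDs alongside the new ones in the commit.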

@seanmcc-msft
Member Author

> It looks like we're still not pulling the updated etag after committing the block list? We should add a test with multiple "append, then flush, then append, then append some more, then flush" actions to verify that this works correctly for all blobs.

Fixed BlockBlobWriteStream to update IfMatch, and added a test for this case for all object types.
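
A minimal sketch of why the IfMatch value has to be refreshed after each write, using an in-memory stand-in for the service. `FakeBlob` and `ConditionalWriter` are hypothetical, and the ETag format is invented for the example.

```csharp
using System;

// In-memory stand-in for a service that issues a new ETag on every successful write.
sealed class FakeBlob
{
    private int _version;
    public string ETag => $"\"v{_version}\"";
    public int Length { get; private set; }

    // Rejects the write unless ifMatch is null or equals the current ETag.
    public string Write(int byteCount, string ifMatch)
    {
        if (ifMatch != null && ifMatch != ETag)
            throw new InvalidOperationException("412 Precondition Failed: ETag mismatch.");
        Length += byteCount;
        _version++;
        return ETag;
    }
}

sealed class ConditionalWriter
{
    private readonly FakeBlob _blob;
    private string _ifMatch;

    public ConditionalWriter(FakeBlob blob)
    {
        _blob = blob;
        _ifMatch = blob.ETag; // pin to the version seen when the stream was opened
    }

    public void Flush(int byteCount)
    {
        // Carry the returned ETag forward so the next write still matches.
        _ifMatch = _blob.Write(byteCount, _ifMatch);
    }
}

static class Program
{
    static void Main()
    {
        var blob = new FakeBlob();
        string staleETag = blob.ETag;

        var writer = new ConditionalWriter(blob);
        writer.Flush(10);
        writer.Flush(20); // succeeds because the writer refreshed its ETag
        Console.WriteLine(blob.Length); // prints "30"

        try
        {
            blob.Write(5, staleETag); // a writer that kept the original ETag fails here
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("stale ETag rejected"); // prints "stale ETag rejected"
        }
    }
}
```

The repeated flush sequence is the case the requested test targets: without the refresh, the second flush would be the first one to fail.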

Member

@tg-msft left a comment

Looks great. Thanks Sean!

@seanmcc-msft seanmcc-msft merged commit f0265cb into Azure:master Aug 12, 2020
@yahorsi

yahorsi commented Jan 3, 2021

Guys, stupid question: is this functionality released? I can't find the OpenWrite method on the BlobClient.
Please point me in the right direction.

@seanmcc-msft
Member Author

> Guys, stupid question: is this functionality released? I can't find the OpenWrite method on the BlobClient.
> Please point me in the right direction.

It's not present on the BlobClient; it is on Azure.Storage.Blobs.Specialized.BlockBlobClient.

@seanmcc-msft seanmcc-msft deleted the feature/storage/openWrite branch August 17, 2021 19:54