Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added UTF-8 encoding for blob names in methods that are used to build… #5943

Merged
merged 14 commits into from
Oct 23, 2019
Merged

Added UTF-8 encoding for blob names in methods that are used to build… #5943

merged 14 commits into from
Oct 23, 2019

Conversation

vcolin7
Copy link
Member

@vcolin7 vcolin7 commented Oct 18, 2019

… blob clients or manipulate blobs. Also added documentation where blob names are part of the flow so that the user knows what is happening in case they are thinking of passing an already encoded name.

… blob clients or manipulate blobs. Also added documentation where blob names are part of the flow so that the user knows what is happening in case they are thinking of passing an already encoded name.
@vcolin7
Copy link
Member Author

vcolin7 commented Oct 18, 2019

This PR is for issue #4324 and also provides a fix for #5920

…ue-4324

� Conflicts:
�	sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java
�	sdk/storage/azure-storage-blob-cryptography/src/main/java/com/azure/storage/blob/specialized/cryptography/EncryptedBlobClientBuilder.java
�	sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobClientBuilder.java
�	sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobContainerAsyncClient.java
�	sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobUrlParts.java
�	sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/specialized/SpecializedBlobClientBuilder.java
…s. Removed the use of the Utility.urlEncoder() in a couple methods where it was not necessary and would actually cause double encoding. Added one use of said method to BlobAsyncClient.
…ted session records did not account for the newly added encoding
@vcolin7 vcolin7 requested a review from alzimmermsft October 21, 2019 08:03
@vcolin7 vcolin7 self-assigned this Oct 21, 2019
…ncoded when putting together the blobUrl that is passed as a a constructor argument for specialized blob clients. Added blobName decoding before encoding it so that if users pass encoded names we can properly deal with them; little to no overhead was added by this.
@joshfree
Copy link
Member

Thanks @vcolin7 for making this fix to Storage. Once the tests are updated and you get signoff from at least 1 developer from Storage (@rickle-msft @jaschrep-msft and/or @gapra-msft) please proceed with squashing and merging this PR (hopefully today)

…ue-4324

� Conflicts:
�	sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java
�	sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobContainerAsyncClient.java
@@ -136,12 +139,15 @@
* @throws UnsupportedOperationException If this batch has already added an operation of another type.
*/
public Response<Void> deleteBlob(String containerName, String blobName) {
return deleteBlobHelper(String.format(PATH_TEMPLATE, containerName, blobName), null, null);
return deleteBlobHelper(String.format(PATH_TEMPLATE, containerName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just call into the next overload so we don't have to worry about missing formatting/encoding logic in one of them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, let's make it call the next overload.

@@ -217,12 +225,15 @@
* @throws UnsupportedOperationException If this batch has already added an operation of another type.
*/
public Response<Void> setBlobAccessTier(String containerName, String blobName, AccessTier accessTier) {
return setBlobAccessTierHelper(String.format(PATH_TEMPLATE, containerName, blobName), accessTier, null);
return setBlobAccessTierHelper(String.format(PATH_TEMPLATE, containerName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about calling the next overload.

@@ -324,6 +329,8 @@ public EncryptedBlobClientBuilder connectionString(String connectionString) {
* with blobs in the root container, it is best to set the endpoint to the account url and specify the blob name
* separately using the {@link EncryptedBlobClientBuilder#blobName(String) blobName} method.</p>
*
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's setting us up for double encoding since they're already handing us an endpoint. I thought we agreed to have an overload that accepts a doNotEncodeBlobName flag or whatever it should be called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlobUrlParts.parse() takes care of parsing the endpoint and decoding its blob name from UTF-8 in case it is already encoded, to then finally encode it. The reason behind this is that since it's part of the public API, it is entirely possible that people might pass a non-encoded URL as an argument too.

I think there was also a communication mishap on my behalf when I told you about the last approach I discussed with Alan. There are several reasons why I think it's best to just implicitly do encoding for the user instead of letting the responsibility rest with them:

  • It makes the API simple to use if they never have to worry about passing encoded or non-encoded blob names, since with the changes from my last commit we will decode first to avoid double encoding. From running some tests I noticed this added little to no overhead.
  • It can be argued that having overloaded methods that add a flag is unnecessary work and would make the API look less simple and straightforward, since we could just modify the already existing methods to add an encodeBlobName flag as an argument.
  • If we added a flag to either an overloaded method or the already existing ones, I don't think it would be much different from letting the users know we need the blobName to be encoded and that they can manually use Utility.urlEncode() to do so, as you and Alan first discussed in the original issue.

The bottomline is that I believe implicit encoding is the most user-friendly way as long as we make sure we don't double encode anywhere in the flow.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see that you were doing a urlDecode before urlEncode in the setters. In that case, I think I'm fine with the PR as long as you add the update to return the unencoded name since we're assuming that will be the most common case

And can you actually also add to either the existing test or a new test that passes in an already encoded name and ensures it doesn't get double encoded. And also getting the name off of one client and passing it to another and again checking it doesn't get double encoded?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add said getter and test changes.

@@ -76,6 +76,8 @@
/**
* Package-private constructor for use by {@link BlobClientBuilder}.
*
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these comments on the constructors are accurate because the encoding happens in the builder. It seems like it's more accurate to say these constructors expect that the blobName is already properly encoded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of adding said comments wherever a blobName or URL/endpoint is passed as an argument to generate a blob client so that the user knew what would happen to it. In hindsight, I think we could omit these comments wherever the blobName is not directly altered. What do you think?

As a side note, there are some cases where the blobName is set in the constructor, such as BlobAsyncClientBase, which is used to create our specialized blob clients.

* @param blobName The blob name.
* @return the updated BlobUrlParts object.
*/
public BlobUrlParts setBlobName(String blobName) {
this.blobName = blobName;
this.blobName = Utility.urlEncode(Utility.urlDecode(blobName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overload to skip encoding?

Copy link
Member Author

@vcolin7 vcolin7 Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See answer about overloads and double encoding above.

@@ -267,6 +273,8 @@ public URL toUrl() {
* is no path element for the container, the name of this blob in the root container will be set as the
* containerName field in the resulting {@code BlobURLParts}.</p>
*
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place that feels likely to give us double encoding since we are taking in a URL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See answer about overloads and double encoding above.

@@ -110,7 +113,7 @@ protected BlobAsyncClientBase(HttpPipeline pipeline, String url, BlobServiceVers

this.accountName = accountName;
this.containerName = containerName;
this.blobName = blobName;
this.blobName = Utility.urlEncode(Utility.urlDecode(blobName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we encode in the builder, so encoding in the constructor again would be double encoding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See answer about overloads and double encoding above.

@@ -159,6 +162,8 @@ public final String getContainerName() {
/**
* Get the blob name.
*
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't know that we should specify what method is doing the encoding since Utilities are supposed to be implementation details

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also probably be decoding on gets if we are encoding on sets. For one, we want it to round trip. For another, what if someone gets the name of one blob so they can set it as the name of a blob in a different container. Now we encode the name on setting it for the first blob, don't decode it in the getter, then encode it again in the second builder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can leave out which method is used for encoding as long as we specify is UTF-8. As for the getters, even though we won't incur in any double encoding (see comment about that and overloads above), it sounds good to me so that users will always get to see and manipulate the original name of the blob.

blobs.next().getName() == name
blobs.next().getName() == name + "2"
blobs.next().getName() == name + "3"
Utility.urlDecode(blobs.next().getName()) == name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment on decoding in getters. We should actually be validating that get without decoding is equal to the name.

Copy link
Member Author

@vcolin7 vcolin7 Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above answer about this topic.

…ue-4324

� Conflicts:
�	sdk/storage/azure-storage-blob/src/test/resources/session-records/ContainerAPITestlistblobshierdelim.json
…d by getters and modified certain builders to account for this. Added encoding for setting the blobName in BlobServiceSasSignatureValues. Modified and added tests to account for encoding and added the generated session records as well. Corrected a test that did not compile on BlockBlobAPITest.
@joshfree joshfree added blocking-release Blocks release Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files) labels Oct 23, 2019
…t in telling the user we are doing this, so I removed parts of the Javadoc that referenced the fact. For the sake of clarity, I added the the Javadoc in blob name getters so that users know the returned blob name will always be decoded (it's possible they set an encoded name and expect to see the same value from the get method). Also added a clarification that states blob names must be encoded to UTF-8 to the Javadoc of methods that take blob URLs as an argument.
…ed. According to Rick Ley from the Storage team: "storage account name and the resource name must be URL-decoded".
@vcolin7 vcolin7 merged commit e37d8a5 into Azure:master Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocking-release Blocks release Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants