Added UTF-8 encoding for blob names in methods that are used to build… #5943

vcolin7 · 2019-10-18T22:48:49Z

… blob clients or manipulate blobs. Also added documentation where blob names are part of the flow so that the user knows what is happening in case they are thinking of passing an already encoded name.

vcolin7 · 2019-10-18T22:56:34Z

This PR is for issue #4324 and also provides a fix for #5920

…ue-4324
Conflicts:
    sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java
    sdk/storage/azure-storage-blob-cryptography/src/main/java/com/azure/storage/blob/specialized/cryptography/EncryptedBlobClientBuilder.java
    sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobClientBuilder.java
    sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobContainerAsyncClient.java
    sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobUrlParts.java
    sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/specialized/SpecializedBlobClientBuilder.java

…s. Removed the use of Utility.urlEncode() in a couple of methods where it was not necessary and would actually cause double encoding. Added one use of said method to BlobAsyncClient.

…ted session records did not account for the newly added encoding

…t would cause blob URLs to be encoded twice.

…ode style guidelines.

…style checker.

sdk/storage/azure-storage-blob/src/test/java/com/azure/storage/blob/ContainerAPITest.groovy

…ncoded when putting together the blobUrl that is passed as a constructor argument for specialized blob clients. Added blobName decoding before encoding it so that if users pass encoded names we can properly deal with them; little to no overhead was added by this.

joshfree · 2019-10-21T22:03:08Z

Thanks @vcolin7 for making this fix to Storage. Once the tests are updated and you get signoff from at least 1 developer from Storage (@rickle-msft @jaschrep-msft and/or @gapra-msft) please proceed with squashing and merging this PR (hopefully today)

…ue-4324
Conflicts:
    sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java
    sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobContainerAsyncClient.java

rickle-msft · 2019-10-22T16:35:22Z

sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java

@@ -136,12 +139,15 @@
     * @throws UnsupportedOperationException If this batch has already added an operation of another type.
     */
    public Response<Void> deleteBlob(String containerName, String blobName) {
-        return deleteBlobHelper(String.format(PATH_TEMPLATE, containerName, blobName), null, null);
+        return deleteBlobHelper(String.format(PATH_TEMPLATE, containerName,


This should just call into the next overload so we don't have to worry about missing formatting/encoding logic in one of them

You are right, let's make it call the next overload.
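To illustrate the suggestion, here is a minimal sketch of overload delegation. BatchPathSketch, the PATH_TEMPLATE value, the String return type, and the deleteOptions parameter are hypothetical simplifications rather than the real BlobBatch API, and java.net.URLEncoder stands in for Utility.urlEncode.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for BlobBatch: it only builds the request path.
final class BatchPathSketch {
    // Assumed template shape; the real constant may differ.
    private static final String PATH_TEMPLATE = "%s/%s";

    // Short overload: delegates, so it cannot drift from the full one.
    static String deleteBlob(String containerName, String blobName) {
        return deleteBlob(containerName, blobName, null);
    }

    // Full overload: the single place where the path is formatted and encoded.
    // deleteOptions is ignored here; the sketch only shows where encoding lives.
    static String deleteBlob(String containerName, String blobName, String deleteOptions) {
        return String.format(PATH_TEMPLATE, containerName, encode(blobName));
    }

    // Stand-in for Utility.urlEncode (an assumption, not the SDK implementation).
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this shape, a later change to the formatting or encoding only has to be made in the full overload.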

rickle-msft · 2019-10-22T16:35:45Z

sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java

@@ -217,12 +225,15 @@
     * @throws UnsupportedOperationException If this batch has already added an operation of another type.
     */
    public Response<Void> setBlobAccessTier(String containerName, String blobName, AccessTier accessTier) {
-        return setBlobAccessTierHelper(String.format(PATH_TEMPLATE, containerName, blobName), accessTier, null);
+        return setBlobAccessTierHelper(String.format(PATH_TEMPLATE, containerName,


Same comment about calling the next overload.

rickle-msft · 2019-10-22T16:39:24Z

...rc/main/java/com/azure/storage/blob/specialized/cryptography/EncryptedBlobClientBuilder.java

@@ -324,6 +329,8 @@ public EncryptedBlobClientBuilder connectionString(String connectionString) {
     * with blobs in the root container, it is best to set the endpoint to the account url and specify the blob name
     * separately using the {@link EncryptedBlobClientBuilder#blobName(String) blobName} method.</p>
     *
+     * <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>
+     *


This seems like it's setting us up for double encoding since they're already handing us an endpoint. I thought we agreed to have an overload that accepts a doNotEncodeBlobName flag or whatever it should be called.

BlobUrlParts.parse() parses the endpoint and decodes its blob name from UTF-8 in case it is already encoded, then encodes it. Because this method is part of the public API, it is entirely possible that people will pass a non-encoded URL as an argument too.

I think there was also a miscommunication on my part when I told you about the last approach I discussed with Alan. There are several reasons why I think it's best to implicitly do the encoding for the user instead of letting the responsibility rest with them:

It makes the API simple to use if they never have to worry about passing encoded or non-encoded blob names, since with the changes from my last commit we will decode first to avoid double encoding. From running some tests I noticed this added little to no overhead.

It can be argued that having overloaded methods that add a flag is unnecessary work and would make the API look less simple and straightforward, since we could just modify the already existing methods to add an encodeBlobName flag as an argument.

If we added a flag to either an overloaded method or the already existing ones, I don't think it would be much different from letting the users know we need the blobName to be encoded and that they can manually use Utility.urlEncode() to do so, as you and Alan first discussed in the original issue.

The bottom line is that I believe implicit encoding is the most user-friendly way as long as we make sure we don't double encode anywhere in the flow.

What do you think?

I didn't see that you were doing a urlDecode before urlEncode in the setters. In that case, I think I'm fine with the PR as long as you add the update to return the unencoded name since we're assuming that will be the most common case

Can you also extend the existing test, or add a new one, so that it passes in an already encoded name and ensures it doesn't get double encoded? And also add a check that gets the name off of one client, passes it to another, and again confirms it doesn't get double encoded?

Will add said getter and test changes.
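For reference, the decode-before-encode idea discussed in this thread can be sketched as follows. BlobNameNormalizer is a hypothetical name, and java.net.URLDecoder/URLEncoder are used as stand-ins for Utility.urlDecode/Utility.urlEncode, whose exact behavior may differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper showing why decoding first makes encoding idempotent.
final class BlobNameNormalizer {
    // Decode, then encode: raw and already-encoded input end up in the same form.
    static String normalize(String blobName) {
        try {
            String decoded = URLDecoder.decode(blobName, "UTF-8");
            return URLEncoder.encode(decoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String raw = "my blob \u00fc.txt";
        String once = normalize(raw);
        String twice = normalize(once);
        // Normalizing an already normalized name changes nothing.
        System.out.println(once.equals(twice));
    }
}
```

One caveat of this stand-in: java.net.URLDecoder treats a literal "+" as a space and rejects a stray "%", so raw names containing those characters are ambiguous under this approach.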

rickle-msft · 2019-10-22T16:40:51Z

sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobAsyncClient.java

@@ -76,6 +76,8 @@
    /**
     * Package-private constructor for use by {@link BlobClientBuilder}.
     *
+     * <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>


I don't think these comments on the constructors are accurate because the encoding happens in the builder. It seems like it's more accurate to say these constructors expect that the blobName is already properly encoded.

I thought of adding said comments wherever a blobName or URL/endpoint is passed as an argument to generate a blob client so that the user knew what would happen to it. In hindsight, I think we could omit these comments wherever the blobName is not directly altered. What do you think?

As a side note, there are some cases where the blobName is set in the constructor, such as BlobAsyncClientBase, which is used to create our specialized blob clients.

rickle-msft · 2019-10-22T16:43:25Z

sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobUrlParts.java

     * @param blobName The blob name.
     * @return the updated BlobUrlParts object.
     */
    public BlobUrlParts setBlobName(String blobName) {
-        this.blobName = blobName;
+        this.blobName = Utility.urlEncode(Utility.urlDecode(blobName));


Overload to skip encoding?

See answer about overloads and double encoding above.

rickle-msft · 2019-10-22T16:44:55Z

sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobUrlParts.java

@@ -267,6 +273,8 @@ public URL toUrl() {
     * is no path element for the container, the name of this blob in the root container will be set as the
     * containerName field in the resulting {@code BlobURLParts}.</p>
     *
+     * <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>


Another place that feels likely to give us double encoding since we are taking in a URL

See answer about overloads and double encoding above.

rickle-msft · 2019-10-22T16:46:06Z

...azure-storage-blob/src/main/java/com/azure/storage/blob/specialized/BlobAsyncClientBase.java

@@ -110,7 +113,7 @@ protected BlobAsyncClientBase(HttpPipeline pipeline, String url, BlobServiceVers

        this.accountName = accountName;
        this.containerName = containerName;
-        this.blobName = blobName;
+        this.blobName = Utility.urlEncode(Utility.urlDecode(blobName));


Don't we encode in the builder, so encoding in the constructor again would be double encoding?

See answer about overloads and double encoding above.

rickle-msft · 2019-10-22T16:56:57Z

...azure-storage-blob/src/main/java/com/azure/storage/blob/specialized/BlobAsyncClientBase.java

@@ -159,6 +162,8 @@ public final String getContainerName() {
    /**
     * Get the blob name.
     *
+     * <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p>


I also don't know that we should specify what method is doing the encoding since Utilities are supposed to be implementation details

We should also probably be decoding on gets if we are encoding on sets. For one, we want it to round trip. For another, what if someone gets the name of one blob so they can set it as the name of a blob in a different container. Now we encode the name on setting it for the first blob, don't decode it in the getter, then encode it again in the second builder.

I agree we can leave out which method is used for encoding as long as we specify that it is UTF-8. As for the getters, even though we won't incur any double encoding (see the comment about that and overloads above), decoding there sounds good to me so that users will always get to see and manipulate the original name of the blob.
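The round trip described in this thread can be sketched like this. BlobNameHolder and getEncodedName are hypothetical, and java.net.URLDecoder/URLEncoder again stand in for the SDK's Utility methods.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical holder mimicking a client that stores an encoded blob name.
final class BlobNameHolder {
    private final String encodedName;

    // Setter side: decode first, then encode, so encoded input is not encoded twice.
    BlobNameHolder(String blobName) {
        this.encodedName = encode(decode(blobName));
    }

    // Getter side: return the decoded name so it round-trips between clients.
    String getBlobName() {
        return decode(encodedName);
    }

    // Exposed only so the sketch can inspect the stored form.
    String getEncodedName() {
        return encodedName;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        BlobNameHolder first = new BlobNameHolder("my blob.txt");
        // Take the name off one holder and pass it to another.
        BlobNameHolder second = new BlobNameHolder(first.getBlobName());
        System.out.println(first.getEncodedName().equals(second.getEncodedName()));
    }
}
```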

rickle-msft · 2019-10-22T17:01:24Z

sdk/storage/azure-storage-blob/src/test/java/com/azure/storage/blob/ContainerAPITest.groovy

-        blobs.next().getName() == name
-        blobs.next().getName() == name + "2"
-        blobs.next().getName() == name + "3"
+        Utility.urlDecode(blobs.next().getName()) == name


See above comment on decoding in getters. We should actually be validating that get without decoding is equal to the name.

See above answer about this topic.

…ue-4324
Conflicts:
    sdk/storage/azure-storage-blob/src/test/resources/session-records/ContainerAPITestlistblobshierdelim.json

…d by getters and modified certain builders to account for this. Added encoding for setting the blobName in BlobServiceSasSignatureValues. Modified and added tests to account for encoding and added the generated session records as well. Corrected a test that did not compile on BlockBlobAPITest.

…t in telling the user we are doing this, so I removed parts of the Javadoc that referenced the fact. For the sake of clarity, I added a note to the Javadoc of blob name getters so that users know the returned blob name will always be decoded (it's possible they set an encoded name and expect to see the same value from the get method). Also added a clarification to the Javadoc of methods that take blob URLs as an argument, stating that blob names must be encoded to UTF-8.

…ed. According to Rick Ley from the Storage team: "storage account name and the resource name must be URL-decoded".
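As a rough sketch of what that commit describes: the name that goes into the string to sign is decoded first, even if the client holds an encoded name. SasCanonicalNameSketch, canonicalName, and the "/blob/{account}/{container}/{blob}" layout are assumptions for illustration, not the SDK's code, and java.net.URLDecoder stands in for Utility.urlDecode.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: builds a canonical resource name from a possibly encoded blob name.
final class SasCanonicalNameSketch {
    // The blob name is URL-decoded before it is placed in the name to be signed.
    // The "/blob/..." layout is an assumed format for this sketch.
    static String canonicalName(String accountName, String containerName, String blobName) {
        return "/blob/" + accountName + "/" + containerName + "/" + decode(blobName);
    }

    // Stand-in for Utility.urlDecode (an assumption, not the SDK implementation).
    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```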

Added UTF-8 encoding for blob names in methods that are used to build…

9230cc1

… blob clients or manipulate blobs. Also added documentation where blob names are part of the flow so that the user knows what is happening in case they are thinking of passing an already encoded name.

vcolin7 requested review from rickle-msft and alzimmermsft October 18, 2019 22:48

vcolin7 requested review from gapra-msft, jaschrep-msft and sima-zhu as code owners October 18, 2019 22:48

alzimmermsft approved these changes Oct 19, 2019

vcolin7 added 6 commits October 20, 2019 16:11

Added missing comments about encryption to the Javadoc of some classe…

a665c1a

…s. Removed the use of Utility.urlEncode() in a couple of methods where it was not necessary and would actually cause double encoding. Added one use of said method to BlobAsyncClient.

Changed a couple of Groovy tests that were failing because the expec…

1227465

…ted session records did not account for the newly added encoding

Removed a call to Utility.urlEncode() in BlobContainerAsyncClient tha…

55ac267

…t would cause blob URLs to be encoded twice.

Broke a couple lines in BlobBatch into two parts to comply with the c…

b18fbed

…ode style guidelines.

Added missing space to a line in BlobBatch. It was flagged by the CI …

009d5ab

…style checker.

vcolin7 requested a review from alzimmermsft October 21, 2019 08:03

alzimmermsft reviewed Oct 21, 2019

sdk/storage/azure-storage-blob/src/test/java/com/azure/storage/blob/ContainerAPITest.groovy

alzimmermsft approved these changes Oct 21, 2019

vcolin7 self-assigned this Oct 21, 2019

joshfree approved these changes Oct 21, 2019

rickle-msft reviewed Oct 22, 2019

vcolin7 added 2 commits October 22, 2019 20:02

Merge branch 'master' of github.com:Azure/azure-sdk-for-java into iss…

4b527de

…ue-4324
Conflicts:
    sdk/storage/azure-storage-blob/src/test/resources/session-records/ContainerAPITestlistblobshierdelim.json

joshfree added the blocking-release, Client, and Storage labels Oct 23, 2019

vcolin7 added 3 commits October 23, 2019 11:32

Made sure that the blobName used to generate a SAS signature is decod…

6489555

…ed. According to Rick Ley from the Storage team: "storage account name and the resource name must be URL-decoded".

Merge branch 'master' into issue-4324

dfb9416

rickle-msft approved these changes Oct 23, 2019

vcolin7 merged commit e37d8a5 into Azure:master Oct 23, 2019

gapra-msft mentioned this pull request Oct 31, 2019

URL encoding for datalake #6130

Closed
