-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added UTF-8 encoding for blob names in methods that are used to build… #5943
Conversation
… blob clients or manipulate blobs. Also added documentation where blob names are part of the flow so that the user knows what is happening in case they are thinking of passing an already encoded name.
…ue-4324 � Conflicts: � sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java � sdk/storage/azure-storage-blob-cryptography/src/main/java/com/azure/storage/blob/specialized/cryptography/EncryptedBlobClientBuilder.java � sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobClientBuilder.java � sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobContainerAsyncClient.java � sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobUrlParts.java � sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/specialized/SpecializedBlobClientBuilder.java
…s. Removed the use of the Utility.urlEncoder() in a couple methods where it was not necessary and would actually cause double encoding. Added one use of said method to BlobAsyncClient.
…ted session records did not account for the newly added encoding
…t would make blob URLs to be encoded twice.
…ode style guidelines.
sdk/storage/azure-storage-blob/src/test/java/com/azure/storage/blob/ContainerAPITest.groovy
Outdated
Show resolved
Hide resolved
…ncoded when putting together the blobUrl that is passed as a a constructor argument for specialized blob clients. Added blobName decoding before encoding it so that if users pass encoded names we can properly deal with them; little to no overhead was added by this.
Thanks @vcolin7 for making this fix to Storage. Once the tests are updated and you get signoff from at least 1 developer from Storage (@rickle-msft @jaschrep-msft and/or @gapra-msft) please proceed with squashing and merging this PR (hopefully today) |
…ue-4324 � Conflicts: � sdk/storage/azure-storage-blob-batch/src/main/java/com/azure/storage/blob/batch/BlobBatch.java � sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobContainerAsyncClient.java
@@ -136,12 +139,15 @@ | |||
* @throws UnsupportedOperationException If this batch has already added an operation of another type. | |||
*/ | |||
public Response<Void> deleteBlob(String containerName, String blobName) { | |||
return deleteBlobHelper(String.format(PATH_TEMPLATE, containerName, blobName), null, null); | |||
return deleteBlobHelper(String.format(PATH_TEMPLATE, containerName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just call into the next overload so we don't have to worry about missing formatting/encoding logic in one of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, let's make it call the next overload.
@@ -217,12 +225,15 @@ | |||
* @throws UnsupportedOperationException If this batch has already added an operation of another type. | |||
*/ | |||
public Response<Void> setBlobAccessTier(String containerName, String blobName, AccessTier accessTier) { | |||
return setBlobAccessTierHelper(String.format(PATH_TEMPLATE, containerName, blobName), accessTier, null); | |||
return setBlobAccessTierHelper(String.format(PATH_TEMPLATE, containerName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about calling the next overload.
@@ -324,6 +329,8 @@ public EncryptedBlobClientBuilder connectionString(String connectionString) { | |||
* with blobs in the root container, it is best to set the endpoint to the account url and specify the blob name | |||
* separately using the {@link EncryptedBlobClientBuilder#blobName(String) blobName} method.</p> | |||
* | |||
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p> | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's setting us up for double encoding since they're already handing us an endpoint. I thought we agreed to have an overload that accepts a doNotEncodeBlobName flag or whatever it should be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BlobUrlParts.parse()
takes care of parsing the endpoint and decoding its blob name from UTF-8 in case it is already encoded, to then finally encode it. The reason behind this is that since it's part of the public API, it is entirely possible that people might pass a non-encoded URL as an argument too.
I think there was also a communication mishap on my behalf when I told you about the last approach I discussed with Alan. There are several reasons why I think it's best to just implicitly do encoding for the user instead of letting the responsibility rest with them:
- It makes the API simple to use if they never have to worry about passing encoded or non-encoded blob names, since with the changes from my last commit we will decode first to avoid double encoding. From running some tests I noticed this added little to no overhead.
- It can be argued that having overloaded methods that add a flag is unnecessary work and would make the API look less simple and straightforward, since we could just modify the already existing methods to add an
encodeBlobName
flag as an argument. - If we added a flag to either an overloaded method or the already existing ones, I don't think it would be much different from letting the users know we need the
blobName
to be encoded and that they can manually useUtility.urlEncode()
to do so, as you and Alan first discussed in the original issue.
The bottomline is that I believe implicit encoding is the most user-friendly way as long as we make sure we don't double encode anywhere in the flow.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see that you were doing a urlDecode before urlEncode in the setters. In that case, I think I'm fine with the PR as long as you add the update to return the unencoded name since we're assuming that will be the most common case
And can you actually also add to either the existing test or a new test that passes in an already encoded name and ensures it doesn't get double encoded. And also getting the name off of one client and passing it to another and again checking it doesn't get double encoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add said getter and test changes.
@@ -76,6 +76,8 @@ | |||
/** | |||
* Package-private constructor for use by {@link BlobClientBuilder}. | |||
* | |||
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these comments on the constructors are accurate because the encoding happens in the builder. It seems like it's more accurate to say these constructors expect that the blobName is already properly encoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of adding said comments wherever a blobName
or URL/endpoint is passed as an argument to generate a blob client so that the user knew what would happen to it. In hindsight, I think we could omit these comments wherever the blobName
is not directly altered. What do you think?
As a side note, there are some cases where the blobName
is set in the constructor, such as BlobAsyncClientBase
, which is used to create our specialized blob clients.
* @param blobName The blob name. | ||
* @return the updated BlobUrlParts object. | ||
*/ | ||
public BlobUrlParts setBlobName(String blobName) { | ||
this.blobName = blobName; | ||
this.blobName = Utility.urlEncode(Utility.urlDecode(blobName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overload to skip encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See answer about overloads and double encoding above.
@@ -267,6 +273,8 @@ public URL toUrl() { | |||
* is no path element for the container, the name of this blob in the root container will be set as the | |||
* containerName field in the resulting {@code BlobURLParts}.</p> | |||
* | |||
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place that feels likely to give us double encoding since we are taking in a URL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See answer about overloads and double encoding above.
@@ -110,7 +113,7 @@ protected BlobAsyncClientBase(HttpPipeline pipeline, String url, BlobServiceVers | |||
|
|||
this.accountName = accountName; | |||
this.containerName = containerName; | |||
this.blobName = blobName; | |||
this.blobName = Utility.urlEncode(Utility.urlDecode(blobName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we encode in the builder, so encoding in the constructor again would be double encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See answer about overloads and double encoding above.
@@ -159,6 +162,8 @@ public final String getContainerName() { | |||
/** | |||
* Get the blob name. | |||
* | |||
* <p>Blob name is encoded to UTF-8 using the {@link com.azure.storage.common.Utility#urlEncode(String)} method.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't know that we should specify what method is doing the encoding since Utilities are supposed to be implementation details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also probably be decoding on gets if we are encoding on sets. For one, we want it to round trip. For another, what if someone gets the name of one blob so they can set it as the name of a blob in a different container. Now we encode the name on setting it for the first blob, don't decode it in the getter, then encode it again in the second builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we can leave out which method is used for encoding as long as we specify is UTF-8. As for the getters, even though we won't incur in any double encoding (see comment about that and overloads above), it sounds good to me so that users will always get to see and manipulate the original name of the blob.
blobs.next().getName() == name | ||
blobs.next().getName() == name + "2" | ||
blobs.next().getName() == name + "3" | ||
Utility.urlDecode(blobs.next().getName()) == name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment on decoding in getters. We should actually be validating that get without decoding is equal to the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above answer about this topic.
…ue-4324 � Conflicts: � sdk/storage/azure-storage-blob/src/test/resources/session-records/ContainerAPITestlistblobshierdelim.json
…d by getters and modified certain builders to account for this. Added encoding for setting the blobName in BlobServiceSasSignatureValues. Modified and added tests to account for encoding and added the generated session records as well. Corrected a test that did not compile on BlockBlobAPITest.
…t in telling the user we are doing this, so I removed parts of the Javadoc that referenced the fact. For the sake of clarity, I added the the Javadoc in blob name getters so that users know the returned blob name will always be decoded (it's possible they set an encoded name and expect to see the same value from the get method). Also added a clarification that states blob names must be encoded to UTF-8 to the Javadoc of methods that take blob URLs as an argument.
…ed. According to Rick Ley from the Storage team: "storage account name and the resource name must be URL-decoded".
… blob clients or manipulate blobs. Also added documentation where blob names are part of the flow so that the user knows what is happening in case they are thinking of passing an already encoded name.