Feature add ability to turn off canonicalization in System.Uri #52628
Comments
Tagging subscribers to this area: @dotnet/ncl
Issue Details
Background and Motivation
This request comes from the AWS SDK for .NET, to support a use case for Amazon S3, the AWS object storage service.
S3 stores objects under an object key name. The key name supports any UTF-8 character and is used as part of the resource path. Some users store objects with "./" or "../" in the object key. The other AWS SDKs handle this fine. In .NET we can't handle the scenario because the System.Uri class always canonicalizes the resource path.
For example, if a user is trying to get an object at "http://mybucket/s3.amazonaws.com/foo/../bar", the Uri class transforms this to "http://mybucket/s3.amazonaws.com/bar". This changes the object key from "foo/../bar" to "bar", so the user ends up accessing a different object than intended. Also, since our requests are signed, they get a signature error because the request changed after signing happened.
I'm not saying Uri canonicalization is wrong, but I would like a way to tell the Uri class to turn it off. I see the Uri class does have the Canonicalize method, which ideally we could have overridden in a subclass to skip that code, but that method is deprecated and is not called.
Proposed API
For flexibility's sake, and to avoid adding yet another boolean to the constructor, I propose adding another enum called UriParserOptions that can be passed into the constructor. That would leave room in the future for other options that need to be set.
Usage Examples
[Flags]
enum UriParserOptions { DisableCanonicalization = 1 };
Uri s3Uri = new Uri("http://mybucket/s3.amazonaws.com/foo/../bar", UriParserOptions.DisableCanonicalization);
Alternative Designs
I thought about adding constructors with an extra boolean. I think this is infeasible because there is already a constructor that takes a boolean for dontEscape. Another boolean would likely get confused with the dontEscape boolean, and you would have to extend the dontEscape constructors to avoid collisions, but those constructors are obsolete. The boolean approach is also too single-purpose and leaves no room for handling other future scenarios.
Another alternative design I decided against was adding a property on the Uri class to disable canonicalization, but that would violate the design of keeping Uri immutable.
Risks
Adds new constructors to the Uri class. Because the Uri class is so low level, this might break somebody's reflection code that expects the class to never change. That seems unlikely, but I recognize this is some of the really old code in the .NET codebase.
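The behavior described in the issue can be reproduced with a small console program. This is only an illustrative sketch: the class name CanonicalizationDemo and the helper ParsedPath are hypothetical names introduced here, and the URI string is the one used in the issue.

```csharp
using System;

public static class CanonicalizationDemo
{
    // Hypothetical helper: returns the path that System.Uri reports
    // after parsing an absolute URI string.
    public static string ParsedPath(string uriString) => new Uri(uriString).AbsolutePath;

    public static void Main()
    {
        // URI string taken from the issue text above.
        const string original = "http://mybucket/s3.amazonaws.com/foo/../bar";
        var uri = new Uri(original);

        // OriginalString keeps the input untouched.
        Console.WriteLine(uri.OriginalString);
        // AbsoluteUri and AbsolutePath reflect the dot-segment removal,
        // so "foo/../bar" collapses to "bar".
        Console.WriteLine(uri.AbsoluteUri);
        Console.WriteLine(uri.AbsolutePath);
    }
}
```

OriginalString still holds the uncanonicalized text, but HTTP clients generally send the parsed form, which is why the signed request no longer matches.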
|
Have you considered encoding the key into the query instead?
"http://mybucket/s3.amazonaws.com/?key=" + Uri.EscapeDataString("foo/../bar")
// http://mybucket/s3.amazonaws.com/?key=foo%2F..%2Fbar
What is stopping users from deciding to use other problematic characters as part of the key? There are lots of ways to change or invalidate the Uri when just appending random data. There are other problems with such Uris: they will fall over when hitting proxies, when used in a browser, etc. With properly escaped queries, on the other hand, you can store arbitrary bytes safely and still be confident that the shape will be preserved through different systems.
If you are only looking at the validation part of
const string AwsScheme = "aws-object";
UriParser.Register(new GenericUriParser(GenericUriParserOptions.DontCompressPath), AwsScheme, 443);
string uriString = $"{AwsScheme}://mybucket/s3.amazonaws.com/foo/../bar";
Console.WriteLine(new Uri(uriString).AbsoluteUri);
// aws-object://mybucket/s3.amazonaws.com/foo/../bar
|
What about using |
@MihaZupan I can't change S3 to use the query string for object key. It is too built into the nature of S3 at this point. Escaping problematic characters is generally not a problem because that doesn't really change the request since the resource path gets unescaped on the S3 side. @karelz Ideally |
I think we need to first understand this better.
Overall this feels kind of ugly and hacky to me. I am afraid that it may not be the only thing needed to make .NET AWS SDK work well with S3. @geoffkizer any additional thoughts here? |
Answering what I can
var uri = new Uri("http://s3.amazonaws.com/foo/bar/%2E%2E/text.txt");
Console.WriteLine(uri.ToString());
I get the following
So even though I escaped the periods, canonicalization still happens, removing the parent folder "bar" from the resource path. |
From looking at the source, I suspect IriHelper unescapes the %2E when it shouldn't. Can the IRI parsing be disabled? |
Decoding %2E and interpreting it as |
If you fix that behavior I'm fine making the SDK encode periods. |
It is not a bug. See RFC3986 Section 2.3. Unreserved Characters (emphasis mine)
Further, see the C# Uri docs (Remarks section)
Regarding the overall discussion, I would also point out RFC3986 Section 6.2.2.3. Path Segment Normalization
It is critical from a security perspective (path traversal) that Uri decodes sequences like
I was not implying to replace all usages.
It can be for custom schemes, but not for internally-recognized ones like http. |
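For readers unfamiliar with the path segment normalization that RFC 3986 describes, here is a simplified sketch of dot-segment removal. It is not System.Uri's actual implementation; the names DotSegments and RemoveDotSegments are hypothetical, and the sketch assumes an absolute path that starts with "/" and does not decode percent-encoded dots first.

```csharp
using System;
using System.Collections.Generic;

public static class DotSegments
{
    // Simplified sketch of dot-segment removal for an absolute path.
    // Assumption: the path starts with '/' and "%2E" has already been decoded if desired.
    public static string RemoveDotSegments(string path)
    {
        var output = new List<string>();
        string[] segments = path.Split('/');

        for (int i = 0; i < segments.Length; i++)
        {
            string segment = segments[i];
            bool isLast = i == segments.Length - 1;

            if (segment == ".")
            {
                // "." is dropped; a trailing "." leaves a trailing slash.
                if (isLast) output.Add("");
            }
            else if (segment == "..")
            {
                // ".." removes the previous segment, but never the leading root.
                if (output.Count > 1) output.RemoveAt(output.Count - 1);
                if (isLast) output.Add("");
            }
            else
            {
                output.Add(segment);
            }
        }

        return string.Join("/", output);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDotSegments("/foo/../bar"));       // /bar
        Console.WriteLine(RemoveDotSegments("/a/b/c/./../../g"));  // /a/g
    }
}
```

Applied to the S3 example, the key "foo/../bar" collapses to "bar", which is exactly the change the AWS SDK is trying to avoid.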
If you percent-encode the slashes instead, will that preserve the dots and keep S3 happy? |
I am still worried that this is just focusing on one specific symptom (case of Gathering from the documentation it seems that other problematic characters are also allowed. What happens if the key contains:
|
@normj looking at RFC snippets from @MihaZupan above in #52628 (comment), it seems that removing canonicalization from I am hesitant to create an API option to make Can you help us understand what are the scenarios where the customers hit these problems in production? Did S3 force them to use |
using System;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
foreach (string uriString in args)
{
var uri = new Uri(uriString);
Console.WriteLine($"Input: {uriString}");
Console.WriteLine($"OriginalString: {uri.OriginalString}");
Console.WriteLine($"AbsoluteUri: {uri.AbsoluteUri}");
Console.WriteLine($"AbsolutePath: {uri.AbsolutePath}");
Console.WriteLine($"LocalPath: {uri.LocalPath}");
Console.WriteLine();
}
}
}
}
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<startup>
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.8"/>
</startup>
<uri>
<iriParsing enabled="false"/>
</uri>
</configuration>
I get the same results with each of the following:
So the IRI parsing is not relevant after all, and the behavior is not a regression. AbsoluteUri does preserve "%2F", which may then be viable as a workaround, if S3 treats it as equivalent to an unencoded slash. |
HTTP demands URIs, so this is technically also asking for This is the kind of thing non-validating LLHTTP would be perfect for. |
I guess LLHTTP means #525. |
@karelz Yes, I recognize this is going against the RFC, which is why I definitely don't want to change any existing behavior, just add a back door for use cases that are not RFC compatible. S3 is not forcing anybody to use ./ and ../ in their scenarios. Some customers are just choosing to have object keys with those characters in them. Since S3 considers the object key an opaque string at the service level, ./ and ../ mean nothing. I wish S3 had placed more restrictions on the characters of an object key than just saying UTF-8, but we are 15 years too late to change that decision. I talked to our Go team and Go allows you to set a |
@normj could you please confirm whether other cases pointed out in #52628 (comment) are also affecting you |
We have no issue using the characters
using System;
using System.IO;
using System.Linq;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using var s3Client = new AmazonS3Client(RegionEndpoint.USEast1);
var bucketName = "normj-east1";
var objectKey = "%?#";
await s3Client.PutObjectAsync(new PutObjectRequest
{
BucketName = bucketName,
Key = objectKey,
ContentBody = "hello"
});
var response = await s3Client.ListObjectsAsync(bucketName);
Console.WriteLine($"Found Key: {response.S3Objects.Any(x => string.Equals(objectKey, x.Key))}");
using var getResponse = await s3Client.GetObjectAsync(bucketName, objectKey);
Console.WriteLine($"Content: {new StreamReader(getResponse.ResponseStream).ReadToEnd()}");
The output:
In this case the special characters are percent encoded so the previous SDK calls go to the following URI
But with |
Does S3 differentiate between
new Uri("http://foo/one%2Ftwo%2F..%2F%three").AbsoluteUri
// http://foo/one%2Ftwo%2F..%2F%25three |
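The idea of percent-encoding the slashes in the key could be sketched roughly as follows. EscapedKeyDemo, EscapeKey, and BuildObjectUri are hypothetical names introduced for illustration, the base address is the placeholder from the issue, and whether S3 treats "%2F" as equivalent to an unencoded slash is not established by this sketch.

```csharp
using System;

public static class EscapedKeyDemo
{
    // Hypothetical helper: percent-encodes the whole key, including '/'.
    // Dots are unreserved characters, so they are left as-is.
    public static string EscapeKey(string objectKey) => Uri.EscapeDataString(objectKey);

    // Hypothetical helper: appends the escaped key to a base address that ends in '/'.
    public static Uri BuildObjectUri(string baseAddress, string objectKey) =>
        new Uri(baseAddress + EscapeKey(objectKey));

    public static void Main()
    {
        Uri uri = BuildObjectUri("http://mybucket/s3.amazonaws.com/", "foo/../bar");

        // The string handed to the Uri constructor.
        Console.WriteLine(uri.OriginalString);
        // What System.Uri reports after parsing; compare the two to see
        // whether the ".." survived on the runtime in use.
        Console.WriteLine(uri.AbsoluteUri);
    }
}
```

Comparing OriginalString with AbsoluteUri on the target runtime is a quick way to check whether a given key shape survives parsing.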
@MihaZupan That solution comes close to working, but it doesn't work when the S3 object starts with periods like |
PowerShell 7.1.3 using .NET 5.0.4:
i.e. it did not decode to |
@normj I can confirm the behavior Kalle described: for a Uri like Can you please confirm that those are the right examples? |