-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to virtual style addressing #1387
Conversation
Botocore currently couples `s3v4` signing and path style addressing. Using the older `s3` signature version results in the `auto` style addressing (virtual hosted by default with a fallback to path style if needed). Support for `s3v4` was added in b30a393 in support for the new `cn-north-1` region. The virtual hosting logic didn't work with the new `.aws.cn` TLD so the code was updated to just use path style addressing when using `s3v4`. This created an inconsistency in the default behavior where if you're using the old signature version we'll use virtual hosted addressing, but if you use s3v4 you get path style by default. Combine this with some regions only supporting s3v4, additional edge cases such as us-gov-west-1, and dual stack and accelerate endpoints, the logic was becoming hard to follow. The new logic is simpler because it's decoupled from region and signature version. We'll try to use virtual hosted addressing by default, and fall back to path style if necessary.
This removes the hard coded references to 's3.amazonaws.com' when using the virtual hosted addressing mode of S3 and instead uses regionalized endpoints when converting to virtual hosted addressing. For example, given a bucket 'foo' in us-west-2 and a key 'bar', we would convert the URL from `s3.us-west-2.amazonaws.com/foo/bar` to `foo.s3.amazonaws.com/bar`. With this change we'll now convert the URL to `foo.s3.us-west-2.amazonaws.com/bar`. When the initial code for 's3.amazonaws.com' was first added to botocore, it provided a number of benefits: 1. You could avoid a 301 response by using `.s3.amazonaws.com`. This is because the DNS would resolve to an endpoint in the correct region, and the older signature version `s3` didn't include a region name as part of its signing process. The end result is that a user did not have to correctly configure a region for an S3 bucket, they'd automatically get to the correct region due to the DNS resolution. 2. The 301 PermanentRedirect responses did not include any structured data about the correct region to use so it wasn't easy to know which region you _should_ be sending the request to. As a result, 301 responses weren't automatically handled and ended up just raising an exception back to the user. Since this code was first introduced there were several things that have changed: 1. The introduction of the `s3v4` signature version, which requires a correct region name as part of its signature. Signing a request with the wrong region results in a 400 response. As a result, it didn't matter if `foo.s3.amazonaws.com` got you to the right region, if the request was _signed_ with the wrong region, you'd get a 400 response. 2. The 301 response (as well as most responses from S3) contain request metadata that tell you which region a bucket is in. This means that it's now possible to automatically handle 301 responses because we know which region to send the request to. 3. The introduction of various partitions outside of the `aws` partition, such as `aws-cn` meant there were other TLDs we needed to handle. The "hack" put in place in botocore was to just disable virtual hosted addressing in certain scenarios. Given all this, it makes sense to no longer hardcode `s3.amazonaws.com` when converting to virtual hosted addressing. There's already a growing number of edge cases where we have to disable this, and most importantly it's not needed anymore.
Similar to cn-north-1 (though implemented differently), using a govcloud region would disable virtual hosted addressing. This removes the whole "allowed regions" concept in the fix_s3_host code. Now the addressing style logic is completely separate of both signature version and region.
When you sign a HeadBucket/HeadObject request with the wrong region you'll get a 400 response with no body (expected for HEAD requests). We need to update the special casing for these operations to also catch this case and redirect to the new region appropriately.
Codecov Report
@@ Coverage Diff @@
## develop #1387 +/- ##
===========================================
+ Coverage 80.52% 80.54% +0.02%
===========================================
Files 87 87
Lines 12121 12119 -2
===========================================
+ Hits 9760 9761 +1
+ Misses 2361 2358 -3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. I think the decrease in coverage is just because we removed lines.
I want to spend a deeper time reviewing this, but skimming over the more iffy change of removing
If I then make a presign request:
And then try to curl it, I get a permanent redirect (and before it worked fine because
To get it working, I had to set my region correctly:
I think we will have to add some special logic for |
Yeah I'm aware, @joguSD and I chatted offline about this last week. You technically don't need to set your region correctly you have to set it to us-east-1 to get the Let me think on this. If we do add this logic back I'd still want it out of the general auth code and instead moved into the presigner code. |
Makes sense. For what it is worth, I did add logic recently in the |
This puts back the original behavior to use the global endpoint for s3 presigning for backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks fine. Just had a couple questions related to the tests.
tests/functional/test_s3.py
Outdated
yield t.case(region='us-west-2', bucket='bucket', key='key', | ||
signature_version=None, | ||
s3_config={'use_dualstack_endpoint': True}, | ||
expected_url='https://bucket.s3.amazonaws.com/key') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm the expected url does not look like a dualstack endpoint. Should that be the case for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, it looks like the existing behavior is buggy for presigned URLs and dualstack. On the develop branch this is what I'm seeing:
Region | Signature | URL |
---|---|---|
us-east-1 | (default) | https://bucket.s3.amazonaws.com/foo |
us-east-1 | s3 | https://bucket.s3.dualstack.us-east-1.amazonaws.com/foo |
us-east-1 | s3v4 | https://s3.dualstack.us-east-1.amazonaws.com/bucket/foo |
which is clearly wrong. In testing this out, it doesn't appear that you can use bucket.s3.dualstack.amazonaws.com
(i.e a regionless endpoint for dualstack) so I think I'll update this test case to just use bucket.s3.dualstack.us-east-1.amazonaws.com
. Let me know if you see any issues with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah seems fine with me.
yield t.case(region='us-west-50', bucket='bucket', key='key', | ||
signature_version=None, | ||
expected_url='https://bucket.s3.amazonaws.com/key') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should add a test case for a user provided url?
@kyleknap Updated. I ended up moving all the checks of whether or not to use the global endpoint into the presigner code, since that's what's triggering the special casing anyways. That makes the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I like how you lowered the special logic into the presigner and out of the handler. 🚢
It might be worth adding a changelog for this as well. So we can point back to when this change happened. Otherwise, it should be good to merge. |
botocore/signers.py
Outdated
@@ -586,6 +587,8 @@ def generate_presigned_url(self, ClientMethod, Params=None, ExpiresIn=3600, | |||
prepare_request_dict( | |||
request_dict, endpoint_url=self.meta.endpoint_url, context=context) | |||
|
|||
request_dict['context']['partition'] = self.meta.partition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crap, I meant to take this out. We don't need partition
in the request context.
As part of this change I moved out the logic for whether to use a global endpoint for fix_s3_host() into the presigner code, which is where the special casing happens anyways.
41f6a16
to
61dacf4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. 🚢
* virtualhost-default: Add changelog entry for #1387 Use regional endpoint for dualstack endpoints Use global endpoint for s3 presigning Account for 400 response from HeadBucket/HeadObject Remove special case govcloud handling Remove hardcoded 's3.amazonaws.com' for virtual hosted addressing Decouple signature version of S3 addressing style
Rebased and merged via 939fbeb |
* release-1.9.0: Bumping version to 1.9.0 Update to latest models Added changelog for metadata stubbing support setting response metadata when adding client error to stub Add changelog entry for #1387 Use regional endpoint for dualstack endpoints Use global endpoint for s3 presigning Account for 400 response from HeadBucket/HeadObject Remove special case govcloud handling Remove hardcoded 's3.amazonaws.com' for virtual hosted addressing Decouple signature version of S3 addressing style
This PR updates botocore to always default to virtual hosted addressing
regardless of signature version and/or region. A user can always explicitly
configure which addressing mode they'd like to use but the behavior of botocore
is now much simpler and more consistent. Previously, botocore would use
virtual hosted addressing for the
s3
signature version, but usings3v4
would switch to path style addressing. Additionally, the GovCloud regions
would also use to path style addressing by default as would unsigned
requests.
Now the logic is much simpler and actually consistent:
path style if needed. This happens regardless of which signature version
or region you have configured.
The commit messages have more details about the background and rationale
for these changes. The one big change worth pointing out is that the
hardcoded
s3.amazonaws.com
suffix used by default for virtual hostedaddressing has been removed. This caused a number of inconsistencies
depending on whether or not you're in the
aws
partition. Also, theoriginal motiviation for adding this no longer applies. See cfeaae6
for a detailed explanation.
This will also require a change one of the AWS CLI tests.