Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to virtual style addressing #1387

Closed
wants to merge 7 commits into from

Conversation

jamesls
Copy link
Member

@jamesls jamesls commented Feb 16, 2018

This PR updates botocore to always default to virtual hosted addressing
regardless of signature version and/or region. A user can always explicitly
configure which addressing mode they'd like to use but the behavior of botocore
is now much simpler and more consistent. Previously, botocore would use
virtual hosted addressing for the s3 signature version, but using s3v4
would switch to path style addressing. Additionally, the GovCloud regions
would also use to path style addressing by default as would unsigned
requests.

Now the logic is much simpler and actually consistent:

  1. If the user explicitly configures an addressing style, use that value.
  2. Otherwise, try to use virtual hosted addressing and fall back to
    path style if needed. This happens regardless of which signature version
    or region you have configured.

The commit messages have more details about the background and rationale
for these changes. The one big change worth pointing out is that the
hardcoded s3.amazonaws.com suffix used by default for virtual hosted
addressing has been removed. This caused a number of inconsistencies
depending on whether or not you're in the aws partition. Also, the
original motiviation for adding this no longer applies. See cfeaae6
for a detailed explanation.

This will also require a change one of the AWS CLI tests.

Botocore currently couples `s3v4` signing and path style
addressing.  Using the older `s3` signature version results
in the `auto` style addressing (virtual hosted by default with
a fallback to path style if needed).

Support for `s3v4` was added in b30a393 in support for the
new `cn-north-1` region.  The virtual hosting logic didn't work
with the new `.aws.cn` TLD so the code was updated to just use
path style addressing when using `s3v4`.

This created an inconsistency in the default behavior where if
you're using the old signature version we'll use virtual hosted
addressing, but if you use s3v4 you get path style by default.
Combine this with some regions only supporting s3v4, additional
edge cases such as us-gov-west-1, and dual stack and accelerate
endpoints, the logic was becoming hard to follow.

The new logic is simpler because it's decoupled from region and
signature version.  We'll try to use virtual hosted addressing
by default, and fall back to path style if necessary.
This removes the hard coded references to 's3.amazonaws.com' when
using the virtual hosted addressing mode of S3 and instead uses
regionalized endpoints when converting to virtual hosted
addressing.

For example, given a bucket 'foo' in us-west-2 and a key 'bar', we would
convert the URL from `s3.us-west-2.amazonaws.com/foo/bar` to
`foo.s3.amazonaws.com/bar`.  With this change we'll now convert
the URL to `foo.s3.us-west-2.amazonaws.com/bar`.

When the initial code for 's3.amazonaws.com' was first added to
botocore, it provided a number of benefits:

1. You could avoid a 301 response by using `.s3.amazonaws.com`.  This is
   because the DNS would resolve to an endpoint in the correct region,
   and the older signature version `s3` didn't include a region name
   as part of its signing process.  The end result is that a user did
   not have to correctly configure a region for an S3 bucket, they'd
   automatically get to the correct region due to the DNS resolution.
2. The 301 PermanentRedirect responses did not include any structured
   data about the correct region to use so it wasn't easy to know
   which region you _should_ be sending the request to.  As a result,
   301 responses weren't automatically handled and ended up just
   raising an exception back to the user.

Since this code was first introduced there were several things that
have changed:

1. The introduction of the `s3v4` signature version, which requires
   a correct region name as part of its signature.  Signing a request
   with the wrong region results in a 400 response.  As a result,
   it didn't matter if `foo.s3.amazonaws.com` got you to the right
   region, if the request was _signed_ with the wrong region, you'd
   get a 400 response.
2. The 301 response (as well as most responses from S3) contain
   request metadata that tell you which region a bucket is in.  This
   means that it's now possible to automatically handle 301 responses
   because we know which region to send the request to.
3. The introduction of various partitions outside of the `aws`
   partition, such as `aws-cn` meant there were other TLDs we needed
   to handle.  The "hack" put in place in botocore was to just disable
   virtual hosted addressing in certain scenarios.

Given all this, it makes sense to no longer hardcode `s3.amazonaws.com`
when converting to virtual hosted addressing.  There's already a
growing number of edge cases where we have to disable this, and
most importantly it's not needed anymore.
Similar to cn-north-1 (though implemented differently),
using a govcloud region would disable virtual hosted
addressing.  This removes the whole "allowed regions" concept
in the fix_s3_host code.  Now the addressing style logic
is completely separate of both signature version and region.
When you sign a HeadBucket/HeadObject request with the wrong
region you'll get a 400 response with no body (expected for HEAD
requests).  We need to update the special casing for these operations
to also catch this case and redirect to the new region appropriately.
@codecov-io
Copy link

codecov-io commented Feb 16, 2018

Codecov Report

Merging #1387 into develop will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1387      +/-   ##
===========================================
+ Coverage    80.52%   80.54%   +0.02%     
===========================================
  Files           87       87              
  Lines        12121    12119       -2     
===========================================
+ Hits          9760     9761       +1     
+ Misses        2361     2358       -3
Impacted Files Coverage Δ
botocore/client.py 99.74% <ø> (-0.01%) ⬇️
botocore/signers.py 98.5% <100%> (+0.03%) ⬆️
botocore/utils.py 95.95% <100%> (-0.05%) ⬇️
botocore/credentials.py 98.8% <0%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fce9678...61dacf4. Read the comment docs.

Copy link
Contributor

@joguSD joguSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. I think the decrease in coverage is just because we removed lines.

@kyleknap
Copy link
Contributor

I want to spend a deeper time reviewing this, but skimming over the more iffy change of removing s3.amazonaws.com, I think I found an edge case related to presigning. Specifically, I have the following region configuration and bucket location (note that the two differ):

$ aws configure get region
us-west-2
$ aws s3api get-bucket-location --bucket mybucketfoo
{
    "LocationConstraint": null
}

If I then make a presign request:

$ aws s3 presign s3://mybucketfoo/biz
https://mybucketfoo.s3.us-west-2.amazonaws.com/biz?AWSAccessKeyId=AKIAIMDHGBMJ6PKIJQTA&Expires=1519076096&Signature=TwfeAN%2BTiMerj8X051dC3LxbmaI%3D

And then try to curl it, I get a permanent redirect (and before it worked fine because s3.amazonaws.com was being used):

$ curl 'https://mybucketfoo.s3.us-west-2.amazonaws.com/biz?AWSAccessKeyId=AKIAIMDHGBMJ6PKIJQTA&Expires=1519076096&Signature=TwfeAN%2BTiMerj8X051dC3LxbmaI%3D'
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Bucket>mybucketfoo</Bucket><Endpoint>s3.amazonaws.com</Endpoint><RequestId>CEFC339344B2A003</RequestId><HostId>ULmKYbul3svOKubqb9gL2kA8LY62xMCsMZKCNJmC95YhOfkiDEgh3XgvfVP7wkLgZP3klSJpijM=</HostId></Error>

To get it working, I had to set my region correctly:

$ aws configure set region us-east-1
$ aws s3 presign s3://mybucketfoo/biz
https://mybucketfoo.s3.amazonaws.com/biz?AWSAccessKeyId=AKIAIMDHGBMJ6PKIJQTA&Expires=1519076136&Signature=RaUfQZAaNbs89lnBPoKvDnVNlU4%3D
$ curl 'https://mybucketfoo.s3.amazonaws.com/biz?AWSAccessKeyId=AKIAIMDHGBMJ6PKIJQTA&Expires=1519076136&Signature=RaUfQZAaNbs89lnBPoKvDnVNlU4%3D'
dfgadfg

I think we will have to add some special logic for v2 presigning still unfortunately. Let me know what you think.

@jamesls
Copy link
Member Author

jamesls commented Feb 19, 2018

Yeah I'm aware, @joguSD and I chatted offline about this last week. You technically don't need to set your region correctly you have to set it to us-east-1 to get the .s3.amazonaws.com suffix. This also only applies to the s3 signature version scheme, but I think we missed the fact that we specifically switch to s3 for presigned requests. I was thinking that presigns defaulted to s3v4 which would have already had the requirement of the correct region.

Let me think on this. If we do add this logic back I'd still want it out of the general auth code and instead moved into the presigner code.

@kyleknap
Copy link
Contributor

Makes sense. For what it is worth, I did add logic recently in the context of a request object to signal if a request is for presigining. So you may be able to leverage that in the fix_s3_host handler.

This puts back the original behavior to use the global endpoint
for s3 presigning for backwards compatibility.
Copy link
Contributor

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks fine. Just had a couple questions related to the tests.

yield t.case(region='us-west-2', bucket='bucket', key='key',
signature_version=None,
s3_config={'use_dualstack_endpoint': True},
expected_url='https://bucket.s3.amazonaws.com/key')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm the expected url does not look like a dualstack endpoint. Should that be the case for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, it looks like the existing behavior is buggy for presigned URLs and dualstack. On the develop branch this is what I'm seeing:

Region Signature URL
us-east-1 (default) https://bucket.s3.amazonaws.com/foo
us-east-1 s3 https://bucket.s3.dualstack.us-east-1.amazonaws.com/foo
us-east-1 s3v4 https://s3.dualstack.us-east-1.amazonaws.com/bucket/foo

which is clearly wrong. In testing this out, it doesn't appear that you can use bucket.s3.dualstack.amazonaws.com (i.e a regionless endpoint for dualstack) so I think I'll update this test case to just use bucket.s3.dualstack.us-east-1.amazonaws.com. Let me know if you see any issues with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah seems fine with me.

yield t.case(region='us-west-50', bucket='bucket', key='key',
signature_version=None,
expected_url='https://bucket.s3.amazonaws.com/key')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should add a test case for a user provided url?

@jamesls
Copy link
Member Author

jamesls commented Feb 23, 2018

@kyleknap Updated. I ended up moving all the checks of whether or not to use the global endpoint into the presigner code, since that's what's triggering the special casing anyways. That makes the fix_s3_host much less coupled to the various s3 configuration settings.

Copy link
Contributor

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I like how you lowered the special logic into the presigner and out of the handler. 🚢

@kyleknap
Copy link
Contributor

It might be worth adding a changelog for this as well. So we can point back to when this change happened. Otherwise, it should be good to merge.

@@ -586,6 +587,8 @@ def generate_presigned_url(self, ClientMethod, Params=None, ExpiresIn=3600,
prepare_request_dict(
request_dict, endpoint_url=self.meta.endpoint_url, context=context)

request_dict['context']['partition'] = self.meta.partition
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crap, I meant to take this out. We don't need partition in the request context.

As part of this change I moved out the logic for whether to use
a global endpoint for fix_s3_host() into the presigner code, which
is where the special casing happens anyways.
@jamesls jamesls force-pushed the virtualhost-default branch from 41f6a16 to 61dacf4 Compare February 23, 2018 21:32
Copy link
Contributor

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. 🚢

jamesls added a commit that referenced this pull request Feb 23, 2018
jamesls added a commit that referenced this pull request Feb 23, 2018
* virtualhost-default:
  Add changelog entry for #1387
  Use regional endpoint for dualstack endpoints
  Use global endpoint for s3 presigning
  Account for 400 response from HeadBucket/HeadObject
  Remove special case govcloud handling
  Remove hardcoded 's3.amazonaws.com' for virtual hosted addressing
  Decouple signature version of S3 addressing style
@jamesls
Copy link
Member Author

jamesls commented Feb 23, 2018

Rebased and merged via 939fbeb

@jamesls jamesls closed this Feb 23, 2018
awstools added a commit that referenced this pull request Feb 27, 2018
* release-1.9.0:
  Bumping version to 1.9.0
  Update to latest models
  Added changelog for metadata stubbing
  support setting response metadata when adding client error to stub
  Add changelog entry for #1387
  Use regional endpoint for dualstack endpoints
  Use global endpoint for s3 presigning
  Account for 400 response from HeadBucket/HeadObject
  Remove special case govcloud handling
  Remove hardcoded 's3.amazonaws.com' for virtual hosted addressing
  Decouple signature version of S3 addressing style
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants