ENH: make it possible to specify "host" option for boto.connect_s3
While trying to crawl the dandiarchive bucket with authentication, in order to also
fetch files that are not publicly available, I ran into

  <Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.<

an issue that was already being discussed in 2017 in jschneier/django-storages#28.
A workaround that worked for me was to pass the host option to boto.connect_s3 so
that it points to the region-specific endpoint. With this fix, the option can now
be set in the provider configuration, e.g.

	[provider:dandi-s3]
	url_re = s3://dandiarchive($|/.*)
	credential = dandi-s3-backup
	authentication_type = aws-s3
	aws-s3_host = s3.us-east-2.amazonaws.com
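How the aws-s3_host entry reaches the authenticator is not part of this diff. As a rough sketch, assuming that options prefixed with the value of authentication_type are forwarded to the authenticator as keyword arguments (the helper authenticator_kwargs below is hypothetical, not DataLad's actual API), the section above could be parsed with the standard-library configparser:

```python
import configparser

CONFIG_TEXT = """
[provider:dandi-s3]
url_re = s3://dandiarchive($|/.*)
credential = dandi-s3-backup
authentication_type = aws-s3
aws-s3_host = s3.us-east-2.amazonaws.com
"""


def authenticator_kwargs(section):
    """Hypothetical helper: turn '<authentication_type>_<option>' entries into kwargs.

    Assumes (not confirmed by this commit) that such prefixed options are
    passed through to the authenticator constructor.
    """
    prefix = section["authentication_type"] + "_"
    return {
        key[len(prefix):]: value
        for key, value in section.items()
        if key.startswith(prefix)
    }


# interpolation=None so characters in url_re are taken literally
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)
kwargs = authenticator_kwargs(parser["provider:dandi-s3"])
print(kwargs)  # {'host': 's3.us-east-2.amazonaws.com'}
```

Under that assumption, the resulting dict maps directly onto the new host=None parameter of S3Authenticator.__init__.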

Since we might want to add other options later on, I did not store host in a
dedicated attribute but placed it directly in the dictionary of optional kwargs
for connect_s3.
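Keeping optional settings in a dict, and adding a key only when the option is configured, also means the connection call falls back to its own default endpoint when no host is given. A minimal sketch of that pattern, using a hypothetical fake_connect_s3 stub in place of boto.connect_s3 (the DEFAULT_HOST value is only a stand-in for the library default):

```python
DEFAULT_HOST = "s3.amazonaws.com"  # stand-in for the library's own default endpoint


def fake_connect_s3(*args, host=DEFAULT_HOST, **kwargs):
    """Hypothetical stand-in for boto.connect_s3: reports which host it would use."""
    return host


class HostOptionSketch:
    """Stores only the connection options that were explicitly configured."""

    def __init__(self, host=None):
        self._conn_kwargs = {}
        if host:  # omit 'host' entirely so the library default applies
            self._conn_kwargs["host"] = host

    def connect(self):
        # Forward whatever optional kwargs were collected
        return fake_connect_s3(**self._conn_kwargs)


print(HostOptionSketch().connect())  # s3.amazonaws.com
print(HostOptionSketch(host="s3.us-east-2.amazonaws.com").connect())  # s3.us-east-2.amazonaws.com
```

Supporting a further option later would only need one more conditional entry in the dict, with no new attribute or change to the connect call.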
yarikoptic committed Mar 3, 2020
1 parent 43c671d commit 6de6289
Showing 1 changed file with 14 additions and 3 deletions: datalad/downloaders/s3.py
@@ -46,10 +46,21 @@ class S3Authenticator(Authenticator):
     allows_anonymous = True
     DEFAULT_CREDENTIAL_TYPE = 'aws-s3'
 
-    def __init__(self, *args, **kwargs):
+    def __init__(self, *args, host=None, **kwargs):
+        """
+        Parameters
+        ----------
+        host: str, optional
+            In some cases it is necessary to provide host to connect to. Passed
+            to boto.connect_s3
+        """
         super(S3Authenticator, self).__init__(*args, **kwargs)
         self.connection = None
         self.bucket = None
+        self._conn_kwargs = {}
+        if host:
+            self._conn_kwargs['host'] = host
 
     def authenticate(self, bucket_name, credential, cache=True):
         """Authenticates to the specified bucket using provided credentials
@@ -72,7 +83,7 @@ def authenticate(self, bucket_name, credential, cache=True):
         # credential might contain 'session' token as well
         # which could be provided as security_token=<token>.,
         # see http://stackoverflow.com/questions/7673840/is-there-a-way-to-create-a-s3-connection-with-a-sessions-token
-        conn_kwargs = {}
+        conn_kwargs = self._conn_kwargs.copy()
         if bucket_name.lower() != bucket_name:
             # per http://stackoverflow.com/a/19089045/1265472
             conn_kwargs['calling_format'] = OrdinaryCallingFormat()
@@ -87,7 +98,7 @@ def authenticate(self, bucket_name, credential, cache=True):
             conn_args = []
             conn_kwargs['anon'] = True
         if '.' in bucket_name:
-            conn_kwargs['calling_format']=OrdinaryCallingFormat()
+            conn_kwargs['calling_format'] = OrdinaryCallingFormat()
 
         lgr.info(
             "S3 session: Connecting to the bucket %s %s", bucket_name, conn_kind
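The switch from conn_kwargs = {} to self._conn_kwargs.copy() in authenticate matters because the method then adds per-bucket entries such as calling_format and anon. A simplified sketch, using a hypothetical SketchS3Authenticator class and a placeholder string instead of OrdinaryCallingFormat(), shows that copying keeps those additions out of the stored dict. For brevity it merges the two calling_format checks from the diff into a single condition:

```python
class SketchS3Authenticator:
    """Hypothetical, boto-free illustration of the per-call copy pattern."""

    def __init__(self, host=None):
        self._conn_kwargs = {}
        if host:
            self._conn_kwargs["host"] = host

    def build_conn_kwargs(self, bucket_name, anonymous):
        # Start from a copy so per-bucket tweaks never leak into the shared dict
        conn_kwargs = self._conn_kwargs.copy()
        if bucket_name.lower() != bucket_name or "." in bucket_name:
            # placeholder for OrdinaryCallingFormat()
            conn_kwargs["calling_format"] = "ordinary"
        if anonymous:
            conn_kwargs["anon"] = True
        return conn_kwargs


auth = SketchS3Authenticator(host="s3.us-east-2.amazonaws.com")
print(auth.build_conn_kwargs("My.Bucket", anonymous=True))
# {'host': 's3.us-east-2.amazonaws.com', 'calling_format': 'ordinary', 'anon': True}
print(auth.build_conn_kwargs("dandiarchive", anonymous=False))
# {'host': 's3.us-east-2.amazonaws.com'}
```

Without the copy, the first call would leave calling_format and anon in the stored dict, and every later connection would inherit them.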
