-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
az:// URL scheme isn't really a URL because it is not unique #157
Comments
This is a great point, thanks @analog-cbarber. Do you know if there is a list of all the different kinds of storage URLs we might need to generate to work with Azure storage services somewhere? It would be ideal to make the design of this comprehensive. Also, curious if as an ugly workaround passing a |
I can indeed construct an
or
I don't know of a list of cloud storage container URLs, nor do I think that URL schemes necessarily even exist. Microsoft has not defined an Since Azure cloud uses a REST api with https urls, you probably should also support the raw http url as well:
|
FYI, here is quick hack that extends the existing implementation to support the @register_client_class('azure2')
class AzureClient(AzureBlobClient):
pass
@register_path_class('azure2')
class AzurePath(AzureBlobPath):
client: 'AzureClient'
cloud_prefix: str = "azure://"
def __init__(self, cloud_path: Union[str, CloudPath],
client: Optional[AzureBlobClient] = None,
token: Optional[str] = None
):
if isinstance(cloud_path, str):
parsed = urlparse(cloud_path)
m = re.match(r'(?P<account>[a-z0-9]+)(\.(?P<type>blob|dfs)(\.core\.windows\.net)?)?',
parsed.netloc,
flags=re.IGNORECASE)
if m is None:
raise ValueError(f"Bad azure path '{cloud_path}'")
account = m.group('account')
fstype = m.group('type') or 'blob'
account_url = f'https://{account}.{fstype}.core.windows.net/'
optional_type = '' if fstype == 'blob' else '.' + fstype
cloud_path = f"azure://{account}{optional_type}/{parsed.path.lstrip('/')}"
if client is None or parsed.query or token or client.service_client.account_name != account:
if token is not None:
token = '?' + token.lstrip('?')
elif parsed.query:
token = '?' + parsed.query
client = AzureClient(account_url, token)
super().__init__(cloud_path, client = client)
@classmethod
def is_valid_cloudpath(cls, path: Union[str, CloudPath], raise_on_error=False) -> bool:
valid = bool(re.match(r'(azure://|https://[a-z0-9]+\.(blob|dfs)\.core\.windows\.net)', str(path).lower()))
if raise_on_error and not valid:
raise InvalidPrefixError(
f"'{path}' is not a valid path since it does not start with '{cls.cloud_prefix}' "
"or valid Azure https blob or dfs location."
)
return valid
@property
def container(self) -> str:
return self._no_prefix.split('/',2)[1]
@property
def blob(self) -> str:
return super().blob.split('/',1)[1] |
The existing 'az://' pseudo-URL scheme violates the principal of URLs in that it is not universal and unique.
The 's3://' bucket syntax is valid because S3 buckets are universally unique. You may need account credentials to access an S3 bucket but you don't need additional information to locate it.
With Azure blob storage you need the storage account as well. Without that you have no way to locate the blobs even if they are publically accessible.
A better scheme would be something like:
az://<storage-account>/<container>/...
which would turn into:
https://<storage-account>.blob.core.windows.net/<container>/...
You could allow the optional addition of
.blob
or.dfs
, so one could write:az://<storage-account>.dfs/<container>/...
The text was updated successfully, but these errors were encountered: